Selenoproteins are a special group of proteins containing the amino acid selenocysteine. This amino acid is encoded by the UGA codon, which usually acts as a stop codon. This dual role of UGA often leads to the misannotation of selenoproteins in recently sequenced genomes.
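The effect of this dual role can be illustrated with a minimal sketch. It is not part of our pipeline: the function name `translate` and the flag `sec_at_tga` are hypothetical, and the example sequence is invented. It only shows how a standard translation truncates a selenoprotein at the in-frame TGA, whereas reading TGA as selenocysteine (U) recovers the rest of the protein.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna, sec_at_tga=False):
    """Translate a coding DNA sequence in frame 0.

    With sec_at_tga=False, TGA is a stop codon (standard annotation).
    With sec_at_tga=True, TGA is read as selenocysteine ('U').
    Translation stops at the first remaining stop codon.
    """
    protein = []
    for i in range(0, len(dna) - 2, 3):
        codon = dna[i:i + 3].upper()
        if codon == "TGA" and sec_at_tga:
            protein.append("U")
            continue
        aa = CODON_TABLE[codon]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Invented toy sequence: ATG TGC TGA GGT TAA
toy = "ATGTGCTGAGGTTAA"
standard = translate(toy)                    # truncated at TGA
recoded = translate(toy, sec_at_tga=True)    # TGA read as Sec
```

A gene predictor that behaves like the first call reports a truncated protein, which is how selenoproteins end up misannotated.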

Consequently, the aim of this project was to properly annotate each of the selenoproteins of Latimeria chalumnae. To do this, we used the selenoproteomes of closely related organisms, Xenopus tropicalis and Danio rerio. These organisms were not chosen at random: coelacanths and lungfish (both sarcopterygians) are the nearest living relatives of tetrapods. We therefore chose Xenopus tropicalis as an amphibian, and Danio rerio as an actinopterygian, a lineage whose common ancestor with the sarcopterygians predates the one that sarcopterygians share with tetrapods. It is worth mentioning that the selenoproteins we could not find in one of these two organisms were found in the other, which enabled us to compare a larger number of selenoproteins against the Latimeria chalumnae genome. In some cases we had to use Mus musculus selenoproteins instead, either because the alignments obtained with the other two species did not contain the selenocysteine or because the query proteins themselves had no selenocysteine.

We found 25 putative selenoproteins in the Latimeria chalumnae genome, some of which belong to protein families such as the glutathione peroxidases (GPx), iodothyronine deiodinases (DI) and thioredoxin reductases (TR). Glutathione peroxidases play an important role in physiology by protecting organisms from oxidative damage. We found four distinct GPx proteins in Latimeria chalumnae: GPx1, GPx2, GPx4a and GPx4b. All of them are selenoproteins except GPx4a, which is a homolog of Danio rerio GPx4a with a cysteine instead of a selenocysteine. Iodothyronine deiodinases are enzymes involved in the activation and deactivation of thyroid hormones. We found five DIs for Danio rerio in the NCBI database, whereas Latimeria chalumnae was found to have three, like Xenopus tropicalis. Naming proteins within a family was difficult because the same protein can have different names depending on the organism.

We could not obtain perfect alignments for all the selenoproteins, so not all of the selenoproteins we found are fully annotated. Some of them lack the first (N-terminal) region, so we could not find the initial methionine, although we still observe an aligned selenocysteine. This could be explained either by the protein not starting with a methionine, which is highly unlikely, or by an intron lying just upstream, with the program failing to find the exon that precedes it. Sel15 and Fep15 fall into this group. Other annotated proteins show highly conserved regions around the selenocysteine, whereas other regions differ significantly (SelM, SelP1a, SelP1b, SelT). These regions, conserved together with the selenocysteine throughout evolution, are likely to be crucial for protein function. On the other hand, other proteins show very high similarity along the whole sequence (GPx family, SelK, DI3), which suggests that the overall protein structure is important.
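The two checks described above (is the selenocysteine aligned, and does the prediction start with a methionine) can be sketched as follows. This is only an illustration under stated assumptions, not the code we ran: the function names are hypothetical, the alignment is given as two gapped strings of equal length, and we assume that an in-frame TGA in the target is rendered either as 'U' or as '*', which depends on the aligner.

```python
def classify_sec_columns(query_aln, target_aln):
    """For every selenocysteine ('U') in the gapped query, report what the
    gapped target has in the same alignment column.

    Returns a list of (column, target_residue, label) tuples.
    """
    if len(query_aln) != len(target_aln):
        raise ValueError("aligned sequences must have the same length")
    calls = []
    for col, (q, t) in enumerate(zip(query_aln, target_aln)):
        if q.upper() != "U":
            continue
        t = t.upper()
        if t in ("U", "*"):
            label = "sec_candidate"   # in-frame TGA opposite the query Sec
        elif t == "C":
            label = "cys_homolog"     # cysteine in place of Sec
        elif t == "-":
            label = "unaligned"       # Sec falls in a gap
        else:
            label = "other"
        calls.append((col, t, label))
    return calls


def has_initial_met(target_aln):
    """True if the predicted protein (gaps removed) starts with methionine."""
    residues = target_aln.replace("-", "")
    return residues[:1].upper() == "M"


# Invented toy alignment: the target lacks the first two residues.
query = "MSLCUGK"
target = "--LC*GK"
sec_calls = classify_sec_columns(query, target)
starts_with_met = has_initial_met(target)
```

In this toy case the selenocysteine is aligned but the initial methionine is missing, which is the situation described for Sel15 and Fep15.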

Although we could not fully annotate every protein, searching for the SECIS element downstream of our target sequences helped us confirm whether or not we had a selenoprotein. We found a SECIS element for almost all of our annotated proteins, but for some of them we obtained low COVE scores even though the alignments were good. These low COVE scores may indicate that the region searched for the SECIS element was not chosen correctly, that there is no SECIS element at all, or that the SECIS element does not follow any pattern that SECISearch can detect.
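Since one possible cause of a low score is a badly chosen search region, the sketch below shows one way to cut the region downstream of a predicted gene, taking the strand into account. It is a hedged illustration only: the function name, the 0-based half-open coordinates and the `length` parameter are assumptions made for this example, and no particular window size is implied.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def downstream_window(genome, start, end, strand, length):
    """Return up to `length` bases located 3' of a gene, in gene orientation.

    genome : sequence of the scaffold (forward strand)
    start, end : 0-based, half-open gene coordinates on the forward strand
    strand : '+' or '-'
    The window is clipped at the scaffold ends.
    """
    if strand == "+":
        return genome[end:end + length]
    if strand == "-":
        # On the minus strand, 3' of the gene lies before `start`
        # on the forward strand and must be reverse complemented.
        left = max(0, start - length)
        return reverse_complement(genome[left:start])
    raise ValueError("strand must be '+' or '-'")


# Invented toy scaffold.
scaffold = "AAACCCGGGTTT"
plus_region = downstream_window(scaffold, 3, 6, "+", 4)
minus_region = downstream_window(scaffold, 6, 9, "-", 4)
```

Forgetting the reverse complement for minus-strand genes, or taking the window on the wrong side of the gene, would hand the SECIS search a region that cannot contain the element.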

This selenoprotein search was performed manually for every protein shown in our results. Because this process is slow and repetitive, we decided to write a program to automate it. Despite our efforts and our tutors' help, we could not run the Selenoprofiles and GeneWise programs, so our automation only uses Exonerate. The results obtained with our program matched those of the manual method, although some cases still had to be analysed manually. Our program is available in full in the results.
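As an illustration of what such an automation involves, the following sketch loops over query proteins and calls Exonerate for each one. It is not our actual program: the function names and file names are hypothetical, and the options shown (the protein2genome model with GFF output) are one plausible configuration rather than the exact settings we used.

```python
import subprocess


def build_exonerate_command(query_fasta, genome_fasta, exhaustive=False):
    """Build an Exonerate command line aligning one protein to a genome."""
    cmd = [
        "exonerate",
        "--model", "protein2genome",   # spliced protein-to-genome alignment
        "--query", query_fasta,
        "--target", genome_fasta,
        "--showtargetgff", "yes",      # report gene structure as GFF
    ]
    if exhaustive:
        cmd += ["--exhaustive", "yes"]  # slower, more sensitive search
    return cmd


def run_all(query_fastas, genome_fasta):
    """Run Exonerate once per query and collect its text output.

    Requires the exonerate executable to be installed and on the PATH.
    """
    outputs = {}
    for query in query_fastas:
        result = subprocess.run(
            build_exonerate_command(query, genome_fasta),
            capture_output=True, text=True, check=True,
        )
        outputs[query] = result.stdout
    return outputs


# Hypothetical file names, used only to show the resulting command.
example_cmd = build_exonerate_command("GPx1_query.fa", "latimeria_genome.fa")
```

Keeping the command construction separate from its execution makes it easy to inspect what will be run before launching it on the whole query set.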

We also provide the protein sequences of the machinery required for selenoprotein synthesis. All the crucial proteins were found in the Latimeria chalumnae genome using the same methodology as for the selenoproteins. Interestingly, we could not find Danio rerio Sbp2 or PstK in any database, even though both are necessary to synthesize selenoproteins, which shows that the annotation of some genomes is incomplete.
