Selenoproteins incorporate selenium directly into the polypeptide chain, as part of the amino acid selenocysteine (Sec), rather than binding it as a cofactor as happens with metals. They have great potential as phylogenetic tracers owing to their low frequency and their conservation between species.
The aim of this study was to find selenoproteins in the annotated genome of Neotoma lepida. To do this, we compared the genome against Mus musculus selenoproteins from SelenoDB 2.0 using a bioinformatic approach. We chose Mus musculus because it is a well-annotated organism that is phylogenetically close to our species. However, for some selenoproteins in which we could not find a SECIS element, we also compared our genome with reference proteins from other species, including Homo sapiens, as it has the best-annotated genome, and Rattus norvegicus, as it is phylogenetically closer.
Our script could predict the proteins automatically. On the one hand, we found 19 selenoproteins: Sel15, GPx1, GPx2, GPx3, DIO1, DIO2, DIO3, SelenoH, SelenoI, SelenoK, SelenoM, SelenoN, SelenoO, MSRB1, SelenoT, SelenoW, TXNRD1, TXNRD2 and TXNRD3. Mus musculus has 22 selenoproteins, so we expected to find around 22 in our genome, but we could not align SelenoS, SelenoP and SEPHS.
On the other hand, we predicted 9 cysteine-containing homologs: GPx6, GPx7, GPx, MsrA, MSRB2, MSRB3, SelenoU1, SelenoU2 and SelenoU3. All of them are also cysteine-containing homologs in Mus musculus.
Regarding the machinery proteins, we predicted SecS, eEFsec, PSTK, SBP2, Secp43 and SEPHS2. The last of these is a selenoprotein in Neotoma lepida, as it is in Mus musculus, and the rest are cysteine-containing homologs in both species.
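The distinction between a selenoprotein and a cysteine-containing homolog can be illustrated with a minimal Python sketch. It is not the script used in this study: the function name classify_prediction is hypothetical, and it assumes that the reference and predicted proteins are given as two aligned strings of equal length, that Sec is written as U in the reference, and that an in-frame TGA codon may appear as * in the predicted protein.

```python
def classify_prediction(query_aln: str, target_aln: str) -> str:
    """Classify a prediction by the residues aligned to the query's Sec (U).

    query_aln  -- aligned reference protein (assumed to mark Sec as 'U')
    target_aln -- aligned predicted protein (assumed: 'U' or '*' for a TGA codon)
    """
    if len(query_aln) != len(target_aln):
        raise ValueError("aligned sequences must have the same length")

    # Alignment columns where the reference protein carries selenocysteine.
    sec_columns = [i for i, aa in enumerate(query_aln) if aa == "U"]
    if not sec_columns:
        return "no Sec in query"

    residues = [target_aln[i] for i in sec_columns]
    if any(aa in ("U", "*") for aa in residues):
        return "selenoprotein"
    if all(aa == "C" for aa in residues):
        return "cysteine homolog"
    # Gaps or any other residue at the Sec position.
    return "other"


print(classify_prediction("MKUGA", "MK*GA"))
print(classify_prediction("MKUGA", "MKCGA"))
```

In this sketch a gap or any other residue at the Sec position is reported as "other", so that such predictions can be inspected by hand.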
Our study is not free of limitations. The presence of SECIS elements was used only to confirm whether a prediction was a selenoprotein. However, it is not clear why no SECIS element could be found for some predicted selenoproteins. One possible explanation is that the genome is not perfectly annotated and some scaffolds are too short to include both the gene sequence and the SECIS element. Another is that SECIS elements might have been lost during evolution in this species.
It is important to note that, in this analysis, the criteria used to choose one scaffold over another could change the results. We tried to reduce this source of error by applying a single prediction method throughout. First, we prioritized the quality of the alignment, judged by the T-Coffee score and by the alignment of the selenocysteine residue. Second, we used the e-value and identity percentage to compare hits with similar alignments.
For this reason, when a hit had a high identity percentage and a favorable e-value but was badly aligned, we discarded it in favor of hits with higher alignment scores.
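The selection rule described above can be sketched as a sort key in Python. This is only an illustration under stated assumptions, not the script used in the study: the Hit class, its field names, the scaffold names and all numeric values are hypothetical, and "similar alignments" is simplified here to an exact tie on the alignment criteria.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    scaffold: str        # hypothetical scaffold identifier
    tcoffee_score: int   # T-Coffee alignment score (higher is better)
    sec_aligned: bool    # True if the Sec position of the query is aligned
    evalue: float        # e-value of the hit (lower is better)
    identity: float      # identity percentage (higher is better)


def rank_hits(hits):
    """Sort hits by alignment quality first, then by e-value and identity."""
    return sorted(
        hits,
        key=lambda h: (
            not h.sec_aligned,   # hits with an aligned Sec come first
            -h.tcoffee_score,    # then the best T-Coffee score
            h.evalue,            # ties broken by the lowest e-value
            -h.identity,         # and finally by the highest identity
        ),
    )


# Made-up example values, for illustration only.
hits = [
    Hit("scaffold_A", 620, False, 1e-80, 92.0),
    Hit("scaffold_B", 980, True, 1e-60, 88.0),
    Hit("scaffold_C", 980, True, 1e-75, 90.0),
]
best = rank_hits(hits)[0]
print(best.scaffold)  # scaffold_C
```

With these made-up values, scaffold_A has the best e-value and identity but is ranked last because its alignment is poor, which mirrors the choice described above.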
In conclusion, this study makes a small contribution to current knowledge of selenoproteins by describing the selenoproteome of an animal that had not been analyzed until now.