|Group components: Toni de Dios, Marc Gordo, Anna Llopart, Xavier Martí|
Materials and Methods
One of the goals of this work is to computationally identify all selenoproteins in the Gavialis gangeticus genome. To do so, we used a homology-based approach: sequences of known selenoproteins served as queries to locate the regions of the G. gangeticus genome that contain homologues.
Obtaining Gavialis gangeticus genome
The genome was provided by the university and can be found in the following directory:
As mentioned in the introduction, we used Gallus gallus (chicken) as the preferred reference genome. SelenoDB lists all the selenoproteins annotated for that genome, including the gene, promoter, transcript, and protein sequences. We used the protein sequences of the different selenoproteins to search for homologues in our genome. Where a selenoprotein could not be found in the G. gallus genome, we resorted to the human genome, because it is the best studied to date.
Search regions of the genome containing selenoproteins: BLAST
BLAST (Basic Local Alignment Search Tool) is a program that uses a heuristic algorithm to align sequences of interest against other known sequences and find regions of similarity. It also calculates the statistical significance of the alignments it finds.
Depending on the BLAST program, amino acid or nucleotide sequences can be compared. In this project we used tblastn, which compares a protein query against a nucleotide database translated in all six reading frames. More specifically, we compared protein sequences of known selenoproteins (from the chicken genome) with nucleotide sequences from the Gavialis gangeticus genome.
For each alignment we obtained multiple parameters:
Scaffold: the scaffold of the G. gangeticus genome in which the alignment is located
Identity: percentage of identical residues between the aligned sequences
Start and End position: start and end position of the hit within the scaffold
E-value: the number of alignments with an equal or better score that are expected to occur by chance in a database of that size
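The parameters listed above correspond to columns of BLAST's tabular output. The following is a minimal sketch, assuming the hits are written in the 12-column tabular format (option -m 8 in legacy blastall, which is not part of the command used in this project). The function name hit_summary and the example line are hypothetical and serve only as illustration.

```shell
# Summarise tabular BLAST hits (12 tab-separated columns).
# ASSUMPTION: the hits were produced in tabular format (-m 8 in legacy
# blastall); the original command does not set this option.
# Columns used: 2 = subject (scaffold), 3 = % identity,
#               9 = subject start, 10 = subject end, 11 = E-value.
hit_summary() {
    awk -F'\t' 'BEGIN { OFS = "\t" } { print $2, $3, $9, $10, $11 }' "$@"
}

# Made-up example line, for illustration only (not a real result).
printf 'SelK\tscaffold_12\t85.00\t90\t13\t0\t1\t90\t1500\t1769\t2e-40\t150\n' \
    | hit_summary
```

The function reads from standard input or from one or more files, so it could also be called as hit_summary outputfile if the output file is in tabular format.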
The command used is:
$ blastall -p tblastn -i query.fa -d databaseBLAST -o outputfile
-p: specifies the BLAST program to use (here, tblastn)
-i: specifies the input (query) file. In our project the queries are protein sequences from the chicken genome
-d: specifies the database to search. In our project this is the G. gangeticus genome provided by the university
-o: specifies the file to which the output is written
In this project we used an E-value threshold of < 0.0001.
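Applying the threshold can be sketched as a small filter. This is only an illustration under the assumption that the BLAST hits are available in the 12-column tabular format (-m 8 in legacy blastall), where column 11 holds the E-value. The function name filter_by_evalue and the example lines are hypothetical.

```shell
# Keep only the hits whose E-value (column 11) is below a threshold.
# ASSUMPTION: input is 12-column tabular BLAST output (-m 8).
# Usage: filter_by_evalue THRESHOLD [FILE...]
filter_by_evalue() {
    threshold=$1
    shift
    # "+ 0" forces a numeric comparison, so values such as 1e-20 are
    # compared as numbers rather than as strings.
    awk -F'\t' -v t="$threshold" '($11 + 0) < (t + 0)' "$@"
}

# Made-up example lines, for illustration only (not real results).
printf 'SelK\tscaffold_12\t85.00\t90\t13\t0\t1\t90\t1500\t1769\t2e-40\t150\n' >  /tmp/example_hits.tsv
printf 'SelK\tscaffold_7\t35.00\t40\t26\t0\t5\t44\t800\t919\t0.8\t25\n'      >> /tmp/example_hits.tsv
filter_by_evalue 0.0001 /tmp/example_hits.tsv
```

Passing the threshold as an argument keeps the cut-off visible in the command line and easy to change.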
Sequence of interest: FASTAINDEX, FASTAFETCH, FASTASUBSEQ
To extract the sequence found in the genome region, we need several programs (fastaindex, fastafetch, and fastasubseq).
We focus on the scaffold with the best alignment reported by BLAST. To retrieve it, we must first run fastaindex, which indexes the individual sequences (scaffolds) of the gavial genome.
$ fastaindex genome.fa genome.index
The first argument, genome.fa, is the input (the genome in FASTA format)
The second argument, genome.index, is the output (the index file that is created)
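Choosing the scaffold with the best alignment for each query can be sketched as follows. This is a hedged illustration, not the procedure used in the project: it assumes 12-column tabular BLAST output (-m 8 in legacy blastall) and interprets "best alignment" as the hit with the lowest E-value. The function name best_scaffold and the example lines are hypothetical.

```shell
# For each query (column 1), report the scaffold (column 2) of the hit
# with the smallest E-value (column 11). On ties, the first hit wins.
# ASSUMPTIONS: tabular BLAST input (-m 8); "best" means lowest E-value.
best_scaffold() {
    awk -F'\t' '
        !($1 in best) || ($11 + 0) < best[$1] {
            best[$1] = $11 + 0
            scaffold[$1] = $2
        }
        END { for (q in scaffold) print q "\t" scaffold[q] }
    ' "$@" | sort
}

# Made-up example lines, for illustration only (not real results).
printf 'SelK\tscaffold_7\t60.00\t80\t32\t0\t1\t80\t100\t339\t1e-12\t70\n'   >  /tmp/example_best.tsv
printf 'SelK\tscaffold_12\t85.00\t90\t13\t0\t1\t90\t1500\t1769\t2e-40\t150\n' >> /tmp/example_best.tsv
best_scaffold /tmp/example_best.tsv
```

The scaffold name reported for a query is the identifier that would then be looked up in the indexed genome.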