Discussion about file formats.
Which standards to use, which libraries, and best practices.
Bioinfo standard: GFF File
Revision saved on 19 January 2017 at 16:20 by Sébastien Letort.
GFF: Generic Feature Format Version 3

Specifications
https://github.com/The-Sequence-Ontology/Specifications/blob/master/gff3.md

Samples

S1
##gff-version 3.2.1
##sequence-region ctg123 1 1497228
ctg123	.	gene	1000	9000	.	+	.	ID=gene00001;Name=EDEN
ctg123	.	TF_binding_site	1000	1012	.	+	.	ID=tfbs00001;Parent=gene00001
ctg123	.	mRNA	1050	9000	.	+	.	ID=mRNA00001;Parent=gene00001;Name=EDEN.1
ctg123	.	mRNA	1050	9000	.	+	.	ID=mRNA00002;Parent=gene00001;Name=EDEN.2
ctg123	.	mRNA	1300	9000	.	+	.	ID=mRNA00003;Parent=gene00001;Name=EDEN.3
ctg123	.	exon	1300	1500	.	+	.	ID=exon00001;Parent=mRNA00003
ctg123	.	exon	1050	1500	.	+	.	ID=exon00002;Parent=mRNA00001,mRNA00002

S2
ctg123	.	cDNA_match	1050	9000	6.2e-45	+	.	ID=match00001;Target=cdna0123 12 2964;Gap=M451 D3499 M501 D1499 M2001

Recommended library?

Comments

Element length
In S1, what is the length of element mRNA00001? Is base 9000 part of the element? The specification is not explicit on this point. However, the other samples in the specification imply the answer: L = end - start + 1, which means the base at 'end' is part of the element. For mRNA00001, L = 9000 - 1050 + 1 = 7951 bases.

Argument
In S2, the target length is the sum of the Gap M bases: M451 + M501 + M2001 = 2953 bases = 2964 - 12 + 1. Therefore target length = target end - target start + 1.
In S2, the reference length is the sum of the Gap M and D bases: M451 + D3499 + M501 + D1499 + M2001 = 7951 bases = 9000 - 1050 + 1. Therefore reference length = ref end - ref start + 1.
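The length rules above can be checked mechanically. Below is a minimal Python sketch (Python 3.9+) that makes several assumptions:

- Coordinates are 1-based, with 'end' included.
- The Gap attribute only uses the M, I and D operations. The frameshift codes F and R are not handled.
- All function names are hypothetical helpers written for this page, not part of any library.
- The attribute parser does no percent-decoding, so it is not a complete GFF3 reader.

```python
def feature_length(start: int, end: int) -> int:
    """Length of a feature, assuming 1-based coordinates where 'end' is included."""
    if start > end:
        raise ValueError("GFF3 requires start <= end")
    return end - start + 1


def parse_attributes(column9: str) -> dict[str, str]:
    """Split the 9th column into tag -> raw value (no percent-decoding, no comma splitting)."""
    attributes = {}
    for field in column9.strip().split(";"):
        if field:
            tag, _, value = field.partition("=")
            attributes[tag] = value
    return attributes


def parse_gap(gap: str) -> list[tuple[str, int]]:
    """Parse a Gap value such as 'M451 D3499' into [('M', 451), ('D', 3499)]."""
    operations = []
    for token in gap.split():
        op, count = token[0], int(token[1:])
        if op not in ("M", "I", "D"):
            # Frameshift codes F and R are deliberately not handled in this sketch.
            raise ValueError(f"unsupported Gap operation: {token}")
        operations.append((op, count))
    return operations


def aligned_lengths(operations: list[tuple[str, int]]) -> tuple[int, int]:
    """Return (reference_length, target_length) spanned by a Gap alignment.

    M consumes bases on both sequences, D only on the reference
    (gap in the target), I only on the target (gap in the reference).
    """
    reference = sum(n for op, n in operations if op in ("M", "D"))
    target = sum(n for op, n in operations if op in ("M", "I"))
    return reference, target


def check_match(line: str) -> dict[str, int]:
    """Compare the lengths implied by the coordinates with those implied by Gap."""
    columns = line.rstrip("\n").split("\t")
    if len(columns) != 9:
        raise ValueError("expected 9 tab-separated columns")
    attributes = parse_attributes(columns[8])
    # Target value: "target_id target_start target_end [strand]"
    _target_id, target_start, target_end = attributes["Target"].split()[:3]
    gap_reference, gap_target = aligned_lengths(parse_gap(attributes["Gap"]))
    return {
        "ref_from_coords": feature_length(int(columns[3]), int(columns[4])),
        "ref_from_gap": gap_reference,
        "target_from_coords": feature_length(int(target_start), int(target_end)),
        "target_from_gap": gap_target,
    }


# The S2 sample, rebuilt with the tab separators GFF3 requires.
S2 = "\t".join([
    "ctg123", ".", "cDNA_match", "1050", "9000", "6.2e-45", "+", ".",
    "ID=match00001;Target=cdna0123 12 2964;Gap=M451 D3499 M501 D1499 M2001",
])

if __name__ == "__main__":
    print(feature_length(1050, 9000))  # mRNA00001 from S1: 7951
    print(check_match(S2))
```

Run as a script, the sketch first prints 7951 for mRNA00001. It then prints a dictionary in which the coordinates and the Gap string agree on both sequences: 7951 reference bases and 2953 target bases. The comparison also serves as a sanity check of the convention. If 'end' were excluded, both coordinate-based lengths would come out one base short of the Gap totals.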