Discussion about file formats.
The standards to use, the libraries, the best practices.

Bioinfo standard : GFF File

You are currently viewing a revision titled "Bioinfo standard : GFF File", saved on 19 January 2017 at 16 h 20 min by Sébastien Letort

GFF : Generic Feature Format Version 3

Specifications

https://github.com/The-Sequence-Ontology/Specifications/blob/master/gff3.md

Samples

S1

##gff-version 3.2.1
##sequence-region ctg123 1 1497228
ctg123 . gene 1000 9000 . + . ID=gene00001;Name=EDEN
ctg123 . TF_binding_site 1000 1012 . + . ID=tfbs00001;Parent=gene00001
ctg123 . mRNA 1050 9000 . + . ID=mRNA00001;Parent=gene00001;Name=EDEN.1
ctg123 . mRNA 1050 9000 . + . ID=mRNA00002;Parent=gene00001;Name=EDEN.2
ctg123 . mRNA 1300 9000 . + . ID=mRNA00003;Parent=gene00001;Name=EDEN.3
ctg123 . exon 1300 1500 . + . ID=exon00001;Parent=mRNA00003
ctg123 . exon 1050 1500 . + . ID=exon00002;Parent=mRNA00001,mRNA00002

S2

ctg123 . cDNA_match 1050 9000 6.2e-45 + . ID=match00001;Target=cdna0123 12 2964;Gap=M451 D3499 M501 D1499 M2001

Recommended library

?

Comments

Element length

In S1, what is the length of the element mRNA00001? Is base 9000 part of the element? The specification is not that clear on this point, but other samples found in the specification lead to the answer: L = end - start + 1, i.e. the base 'end' is part of the element.
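
The inclusive-end rule above can be sketched in a few lines of Python. This is only an illustration: the function name feature_length is hypothetical and does not come from any GFF library, and the start <= end check reflects my reading of the specification.

```python
def feature_length(start, end):
    """Length of a feature whose 1-based coordinates include both start and end."""
    if start > end:
        # Assumption: GFF3 expects start <= end for every feature.
        raise ValueError("start must be less than or equal to end")
    return end - start + 1


# mRNA00001 in sample S1 spans 1050..9000, base 9000 included.
print(feature_length(1050, 9000))  # 7951
```

With the same rule, the TF_binding_site of S1 (1000..1012) is 13 bases long, not 12.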

Argument

In S2, the target length is the sum of the M operations of the Gap attribute: M451 + M501 + M2001 = 2953 bases = 2964 - 12 + 1.
Target length = target end - target start + 1.
In S2, the reference length is the sum of the M and D operations of the Gap attribute: M451 + D3499 + M501 + D1499 + M2001 = 7951 bases = 9000 - 1050 + 1.
Reference length = reference end - reference start + 1.
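
The arithmetic above can be checked with a short Python sketch. The helper name gap_lengths is hypothetical. It assumes that M consumes bases on both sequences and D only on the reference, as used in the argument above. The handling of I (bases on the target only) is my reading of the specification and is not exercised by S2. The frameshift operations F and R are deliberately rejected.

```python
def gap_lengths(gap):
    """Return (reference_length, target_length) implied by a Gap attribute value."""
    ref_len = 0
    target_len = 0
    for op in gap.split():
        code, count = op[0], int(op[1:])
        if code == "M":    # match: consumes reference and target
            ref_len += count
            target_len += count
        elif code == "D":  # consumes reference only
            ref_len += count
        elif code == "I":  # assumption: consumes target only
            target_len += count
        else:
            raise ValueError("unsupported Gap operation: " + op)
    return ref_len, target_len


# Values taken from sample S2.
ref_len, target_len = gap_lengths("M451 D3499 M501 D1499 M2001")
assert ref_len == 9000 - 1050 + 1     # 7951, reference span of the cDNA_match
assert target_len == 2964 - 12 + 1    # 2953, span given in Target=cdna0123 12 2964
```

Both assertions only hold with the "+ 1" term, which supports the conclusion that the end base is included.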