Discussion about file formats.
Which standards to use, which libraries, and best practices.

Bioinfo standard: AGP File

AGP: A Golden Path, version 2.0

Specifications

https://www.ncbi.nlm.nih.gov/assembly/agp/AGP_Specification/

Samples

S1

##agp-version 2.0
# ORGANISM: Homo sapiens
# TAX_ID: 9606
# ASSEMBLY NAME: EG1
# ASSEMBLY DATE: 09-November-2011
# GENOME CENTER: NCBI
# DESCRIPTION: Example AGP specifying the assembly of scaffolds from WGS contigs
EG1_scaffold1 1 3043 1 W AADB02037551.1 1 3043 +
EG1_scaffold2 1 40448 1 W AADB02037552.1 1 40448 +
EG1_scaffold2 40449 40661 2 N 213 scaffold yes paired-ends
EG1_scaffold2 40662 117642 3 W AADB02037553.1 1 76981 +
EG1_scaffold2 117643 117718 4 N 76 scaffold yes paired-ends
EG1_scaffold2 117719 145387 5 W AADB02037554.1 1 27669 +
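
Since no library is recommended yet, here is a minimal Python sketch of how a sample like S1 could be read with the standard library only. It assumes the AGP 2.0 column layout: nine columns, where columns 6 to 9 describe either a component or a gap depending on column 5 (N or U meaning gap). The names Component, Gap, parse_agp and SAMPLE are illustrative, not part of any existing library. The specification requires tab-delimited columns; the sketch splits on any whitespace so that it also accepts the space-separated rendering shown on this page.

```python
from typing import Iterable, Iterator, NamedTuple, Union


class Component(NamedTuple):
    """A sequence line (component_type other than N or U)."""
    obj: str
    obj_beg: int
    obj_end: int
    part_number: int
    component_type: str
    component_id: str
    component_beg: int
    component_end: int
    orientation: str


class Gap(NamedTuple):
    """A gap line (component_type N or U)."""
    obj: str
    obj_beg: int
    obj_end: int
    part_number: int
    component_type: str
    gap_length: int
    gap_type: str
    linkage: str
    linkage_evidence: str


def parse_agp(lines: Iterable[str]) -> Iterator[Union[Component, Gap]]:
    """Yield one record per data line, skipping comments and blank lines."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Assumption: lenient whitespace split instead of a strict tab split.
        f = line.split()
        if len(f) != 9:
            raise ValueError(f"expected 9 columns, got {len(f)}: {line!r}")
        obj, beg, end, part, ctype = f[0], int(f[1]), int(f[2]), int(f[3]), f[4]
        if ctype in ("N", "U"):
            yield Gap(obj, beg, end, part, ctype, int(f[5]), f[6], f[7], f[8])
        else:
            yield Component(obj, beg, end, part, ctype,
                            f[5], int(f[6]), int(f[7]), f[8])


# Excerpt of sample S1, used here as test input.
SAMPLE = """\
##agp-version 2.0
# ORGANISM: Homo sapiens
EG1_scaffold1 1 3043 1 W AADB02037551.1 1 3043 +
EG1_scaffold2 1 40448 1 W AADB02037552.1 1 40448 +
EG1_scaffold2 40449 40661 2 N 213 scaffold yes paired-ends
EG1_scaffold2 40662 117642 3 W AADB02037553.1 1 76981 +
"""

records = list(parse_agp(SAMPLE.splitlines()))
```

A useful sanity check on each record is that the object span (obj_end - obj_beg + 1) equals the gap length or the component span (component_end - component_beg + 1); every line of S1 satisfies it.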

Recommended library

?

Comments

  • To load the data, I used a personal class, RegionLink, which handles files of the form “chrom,start,end <=> chrom’,start’,end’”.
    It is available at https://gitlab.inria.fr/sletort/Brassica_scripts/blob/master/lib/RegionLink.py
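
To illustrate the “chrom,start,end <=> chrom’,start’,end’” idea, here is a hypothetical Python sketch that turns one non-gap AGP line into a pair of linked regions. It is not the RegionLink class linked above, whose interface is not described here; Region, RegionPair, pair_from_agp_fields and format_pair are names invented for this example.

```python
from typing import NamedTuple, Sequence


class Region(NamedTuple):
    chrom: str
    start: int
    end: int


class RegionPair(NamedTuple):
    left: Region   # region on the assembled object (AGP columns 1 to 3)
    right: Region  # region on the component (AGP columns 6 to 8)


def pair_from_agp_fields(fields: Sequence[str]) -> RegionPair:
    """Build a region pair from the 9 columns of a non-gap AGP line."""
    if len(fields) != 9:
        raise ValueError(f"expected 9 columns, got {len(fields)}")
    if fields[4] in ("N", "U"):
        raise ValueError("gap lines have no component region to link")
    left = Region(fields[0], int(fields[1]), int(fields[2]))
    right = Region(fields[5], int(fields[6]), int(fields[7]))
    return RegionPair(left, right)


def format_pair(pair: RegionPair) -> str:
    """Render the pair in the 'chrom,start,end <=> chrom,start,end' form."""
    l, r = pair
    return f"{l.chrom},{l.start},{l.end} <=> {r.chrom},{r.start},{r.end}"


# Third line of scaffold2 in sample S1.
example = pair_from_agp_fields(
    "EG1_scaffold2 40662 117642 3 W AADB02037553.1 1 76981 +".split()
)
```

The orientation column is dropped in this sketch; a real mapping would likely need to keep it to convert coordinates on the minus strand.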