Discussion of file formats.
Standards to use, libraries, and best practices.
Bioinfo standard: AGP file
AGP: A Golden Path, version 2.0
Specifications
https://www.ncbi.nlm.nih.gov/assembly/agp/AGP_Specification/
Samples
S1
##agp-version 2.0
# ORGANISM: Homo sapiens
# TAX_ID: 9606
# ASSEMBLY NAME: EG1
# ASSEMBLY DATE: 09-November-2011
# GENOME CENTER: NCBI
# DESCRIPTION: Example AGP specifying the assembly of scaffolds from WGS contigs
EG1_scaffold1	1	3043	1	W	AADB02037551.1	1	3043	+
EG1_scaffold2	1	40448	1	W	AADB02037552.1	1	40448	+
EG1_scaffold2	40449	40661	2	N	213	scaffold	yes	paired-ends
EG1_scaffold2	40662	117642	3	W	AADB02037553.1	1	76981	+
EG1_scaffold2	117643	117718	4	N	76	scaffold	yes	paired-ends
EG1_scaffold2	117719	145387	5	W	AADB02037554.1	1	27669	+
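In an AGP 2.0 file, each data line has nine tab-separated columns, and the meaning of columns 6 to 9 depends on the component type in column 5: N and U mark gap lines, while other types (such as W) mark component lines. The sketch below is one possible way to read such lines in Python. The names Component, Gap and parse_agp_line are illustrative assumptions, not part of any existing library, and no validation beyond the column count is attempted.

```python
from typing import NamedTuple, Optional, Union


class Component(NamedTuple):
    """A sequence line (component type other than N or U)."""
    obj: str
    obj_beg: int
    obj_end: int
    part_number: int
    comp_type: str
    comp_id: str
    comp_beg: int
    comp_end: int
    orientation: str


class Gap(NamedTuple):
    """A gap line (component type N or U)."""
    obj: str
    obj_beg: int
    obj_end: int
    part_number: int
    comp_type: str
    gap_length: int
    gap_type: str
    linkage: str
    evidence: str


def parse_agp_line(line: str) -> Optional[Union[Component, Gap]]:
    """Parse one AGP 2.0 line; return None for comment or blank lines."""
    line = line.rstrip("\n")
    if not line.strip() or line.startswith("#"):
        return None
    fields = line.split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    obj, beg, end, part = fields[0], int(fields[1]), int(fields[2]), int(fields[3])
    comp_type = fields[4]
    if comp_type in ("N", "U"):
        # Gap line: columns 6-9 are length, gap type, linkage, evidence.
        return Gap(obj, beg, end, part, comp_type,
                   int(fields[5]), fields[6], fields[7], fields[8])
    # Component line: columns 6-9 are id, begin, end, orientation.
    return Component(obj, beg, end, part, comp_type,
                     fields[5], int(fields[6]), int(fields[7]), fields[8])
```

Coordinates are kept 1-based and inclusive, as in the file, so a record spans obj_end - obj_beg + 1 bases.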
Recommended library
?
Comments
- To load the data, I used a personal class, RegionLink, designed to handle files of the form “chrom,start,end <=> chrom',start',end'”.
It can be found at https://gitlab.inria.fr/sletort/Brassica_scripts/blob/master/lib/RegionLink.py
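To illustrate the “chrom,start,end <=> chrom',start',end'” idea, here is a minimal sketch of how AGP component lines could be reduced to such region pairs. It is not the RegionLink class, whose interface is not described here; the function name agp_region_pairs and the tuple layout are hypothetical. Gap lines are skipped because they have no component region, and orientation is ignored for brevity.

```python
from typing import Iterable, Iterator, Tuple

Region = Tuple[str, int, int]  # (name, start, end), 1-based inclusive


def agp_region_pairs(lines: Iterable[str]) -> Iterator[Tuple[Region, Region]]:
    """Yield (object region, component region) for each non-gap AGP line."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        if fields[4] in ("N", "U"):
            continue  # gap lines carry no component coordinates
        obj_region = (fields[0], int(fields[1]), int(fields[2]))
        comp_region = (fields[5], int(fields[6]), int(fields[7]))
        yield obj_region, comp_region
```

Each yielded pair links a region of the assembled object (for example a scaffold) to the region of the contig it was built from.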