Previous seminars – 2019

(archives from old symbiose site)

  • Modeling the hormonal regulation of food intake and body weight
    Marine Jacquier (IGDR)
    Thursday, December 12, 2019 – 10:30 to 12:00
    Room Aurigny
    Talk abstract: 

    Modeling body weight dynamics is used in particular to evaluate treatments such as caloric restriction or anti-obesity drugs. Under normal conditions, food intake, energy expenditure and body weight are regulated, in particular by hormones, so as to limit large changes in body weight. I will present two models based on ordinary and delay differential equations that describe the dynamics of food intake, body weight and energy expenditure as a function of the levels of various hormones. The results of these models are compared with experimental data in rats; the models reproduce and predict the evolution of body weight, notably in response to changes in diet. I will also show that perturbations of food intake or of hormone levels can lead to resistance to the effect of hormones and, consequently, to the development of obesity.

  • DKM seminar. SAT: solving one hard problem to solve them all
    Laurent Simon (LABRI)
    Thursday, November 28, 2019 – 10:30
    Room Markov
    Talk abstract: 

    Progress in the practical solving of SAT, the canonical NP-complete problem, has been spectacular in some application domains. Although strong limits remain on some highly combinatorial problems, this talk will present a few key applications that have benefited from this progress. We will also show how propositional logic, at the heart of SAT, makes it possible to model and solve reasoning problems well beyond this initial formalism. The talk will conclude with recent progress in knowledge compilation, a powerful, general and elegant formalism for reasoning.
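
As a reminder of what a SAT instance looks like in practice, here is a minimal sketch of the textbook DPLL procedure (unit propagation plus branching). It is only an illustration and not one of the solvers discussed in the talk; the clause encoding (lists of signed integers) and the names `assign` and `dpll` are assumptions made for this example.

```python
from typing import Dict, List, Optional

Clause = List[int]  # literal k means "variable k is true", -k means "variable k is false"


def assign(clauses: List[Clause], lit: int) -> Optional[List[Clause]]:
    """Make `lit` true: drop satisfied clauses and shorten the others.

    Returns None if some clause becomes empty (a conflict).
    """
    reduced = []
    for clause in clauses:
        if lit in clause:
            continue  # clause already satisfied
        rest = [l for l in clause if l != -lit]
        if not rest:
            return None  # every literal of this clause is false
        reduced.append(rest)
    return reduced


def dpll(clauses: List[Clause], model: Optional[Dict[int, bool]] = None) -> Optional[Dict[int, bool]]:
    """Return a (possibly partial) satisfying assignment, or None if unsatisfiable."""
    model = dict(model or {})
    if any(len(c) == 0 for c in clauses):
        return None
    # Unit propagation: a clause with a single literal forces that literal.
    while True:
        unit = next((c[0] for c in clauses if len(c) == 1), None)
        if unit is None:
            break
        model[abs(unit)] = unit > 0
        clauses = assign(clauses, unit)
        if clauses is None:
            return None
    if not clauses:
        return model  # all clauses satisfied
    # Branch on the first literal of the first remaining clause.
    lit = clauses[0][0]
    for choice in (lit, -lit):
        reduced = assign(clauses, choice)
        if reduced is not None:
            found = dpll(reduced, {**model, abs(choice): choice > 0})
            if found is not None:
                return found
    return None


# (x1 or x2) and (not x1 or x3) and (not x2 or not x3)
formula = [[1, 2], [-1, 3], [-2, -3]]
print(dpll(formula))
```

Modern solvers add clause learning, restarts and careful branching heuristics on top of this skeleton; none of that is shown here.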

  • neXtProt: the knowledge platform of the SIB Swiss Institute of Bioinformatics on human proteins
    Lydie Lane (SIB)
    Thursday, November 21, 2019 – 10:30
    Room Aurigny
    Talk abstract: 

    The neXtProt knowledgebase (www.nextprot.org) was created in 2011 to cope with the influx of "omics" data on human proteins (1). It incorporates all human sequences from UniProtKB/Swiss-Prot with their associated annotations, and adds a large body of genomic, transcriptomic and proteomic data selected according to particularly strict quality criteria. The current version of neXtProt includes more than 6 million genetic variants, nearly 2 million peptides identified by mass spectrometry, and extensive data on the localization and function of human proteins (2). Free and open access (under a "Creative Commons Attribution" license), neXtProt was chosen in 2013 as the reference database for the "Human Proteome Project" of the HUPO consortium (3)(4). neXtProt’s RDF data model, its application programming interface (API) and its SPARQL endpoint provide good interoperability with other resources. To help our users write SPARQL queries, we offer a list of nearly 200 queries ready to be modified (5)(6). However, we would still like to improve the usability of our search engine and are open to collaborations in this area.

    Lydie Lane and the neXtProt team.
    CALIPHO group, SIB Swiss Institute of Bioinformatics & Department of Microbiology and Molecular Medicine, University of Geneva, Switzerland

    1. Lane,L., Argoud-Puy,G., Britan,A., Cusin,I., Duek,P.D., Evalet,O., Gateau,A., Gaudet,P., Gleizes,A., Masselot,A., et al. (2012) NeXtProt: A knowledge platform for human proteins. Nucleic Acids Res., 40.
    2. Gaudet,P., Michel,P.-A., Zahn-Zabal,M., Britan,A., Cusin,I., Domagalski,M., Duek,P.D., Gateau,A., Gleizes,A., Hinard,V., et al. (2017) The neXtProt knowledgebase on human proteins: 2017 update. Nucleic Acids Res., 45, D177–D182.
    3. Gaudet,P., Argoud-Puy,G., Cusin,I., Duek,P., Evalet,O., Gateau,A., Gleizes,A., Pereira,M., Zahn-Zabal,M., Zwahlen,C., et al. (2013) NeXtProt: Organizing protein knowledge in the context of human proteome projects. J. Proteome Res., 12.
    4. Omenn,G.S., Lane,L., Overall,C.M., Corrales,F.J., Schwenk,J.M., Paik,Y.-K., Van Eyk,J.E., Liu,S., Pennington,S., Snyder,M.P., et al. (2019) Progress on Identifying and Characterizing the Human Proteome: 2019 Metrics from the HUPO Human Proteome Project. J. Proteome Res., 10.1021/acs.jproteome.9b00434.
    5. Duek,P., Gateau,A., Bairoch,A. and Lane,L. (2018) Exploring the Uncharacterized Human Proteome Using neXtProt. J. Proteome Res., 17, 4211–4226.
    6. Zahn-Zabal,M. and Attwood,T.K. (2019) A Critical Guide to the neXtProt knowledgebase: querying using SPARQL. F1000Research, 8.

    • The genomic consequences of the evolution towards facultative sexual reproduction
      Sylvain Glémin (Ecobio)
      Thursday, November 14, 2019 – 10:30 to 12:00
      Room Aurigny
      Talk abstract: 

      The vast majority of eukaryotic species reproduce sexually, involving two partners. However, exclusively or predominantly uniparental modes of reproduction evolve recurrently. In many plants, for example, the transition from outcrossing to selfing (hermaphroditic individuals reproducing alone) is very frequent. Another frequent transition is towards parthenogenesis (females reproducing without males through various modifications of meiosis). Although they can be advantageous in the short term, these reproductive strategies are regarded as evolutionary dead ends, and no large group of completely asexual organisms is known. The genetic and genomic consequences of these modes of reproduction are numerous, in particular the arrest or reduction of recombination, and often negative. They would explain why these selfing and asexual lineages are doomed to rapid extinction. After presenting the general theoretical context of the evolution of reproductive systems, I will develop two examples: 1) the genomic consequences of the evolution towards selfing in wild relatives of wheat (genera Aegilops and Triticum) and 2) the genomic consequences of the evolution towards automixis (a form of parthenogenesis) in Artemia brine shrimp (crustaceans).

    • Learning clinical networks from medical records based on information estimates in mixed-type data
      Hervé Isambert (Institut Curie)
      Thursday, October 17, 2019 – 10:30
      Room Aurigny
      Talk abstract: 

      Network reconstruction aims at disentangling direct from indirect dependences in information-rich data and has become ubiquitous to analyze the rapidly expanding resources of genomic and clinical data. However, direct and indirect interdependences in mixed-type (continuous / categorical) clinical data are notoriously difficult to assess. To this end, we developed and implemented an efficient computational approach to simultaneously compute and assess the significance of multivariate information between any combination of mixed-type variables. The method is then used to uncover direct, indirect and possibly causal relationships between mixed-type data from medical records, by extending a recent machine learning method to reconstruct graphical models beyond simple categorical datasets. The method is shown to outperform existing tools on benchmark mixed-type datasets, before being applied to analyze the medical records of elderly patients with cognitive disorders from La Pitié-Salpêtrière Hospital, Paris, and breast cancer patients from Institut Curie hospitals.

    • Estimating the microbial communities involved in a biowaste anaerobic digestion process
      Patrick Dabert (IRSTEA)
      Thursday, October 3, 2019 – 10:30
      Room Aurigny
      Talk abstract: 

      The recovery of organic waste through anaerobic digestion is booming. It captures the natural gaseous emissions of waste and turns them into a renewable energy source, biogas. It also produces a stabilized organic residue that can be used in agriculture, the digestate. Anaerobic digestion is a biological process that degrades organic matter in the absence of oxygen. Microbiologically, it is a trophic chain involving hundreds of microbial species that "work" in synergy or in competition. Despite major technological developments and increased knowledge of the metabolic pathways involved, our command of the microbial communities remains insufficient to steer the processes correctly. After a quick overview of the issues and of current knowledge on the metabolic pathways of anaerobic digestion, the talk will present the results obtained while monitoring a biowaste anaerobic digestion process for one year (monitoring of performance indicators: biogas, pH, volatile fatty acids, etc.; characterization of microbial communities by high-throughput sequencing of 16S rDNA) and the work carried out by Théo Combe (M1 internship in collaboration with S. Blanquart and A. Siegel) to try to identify the species required for the process to operate, to infer the potential metabolic pathways of these species from their 16S rDNA sequence, and to analyze how the microbial community changes with the operating parameters of the process. Finally, this case study will be used to present the current bottlenecks and open questions of microbial ecologists regarding the interpretation of 16S rDNA high-throughput sequencing data. Keywords: anaerobic digestion, microbiome, 16S rRNA, metabolism

    • Depicting microbial genomic diversity via a Partitioned Pangenome Graph
      Guillaume Gautreau (genoscope)
      Thursday, September 26, 2019 – 10:30
      Room Aurigny
      Talk abstract: 

      Thanks to the surge of newly sequenced genomes, genomics studies in microbiology now frequently rely on the comparison of hundreds to thousands of genomes of a single species. A consensus representation of multiple genomes would provide a better analytical framework than using individual reference genomes. This leads to a paradigm shift from the usual linear representation of reference genomes to a pangenome graph representation bringing together all the different known variations as multiple alternative paths. Classical pangenomic approaches (Medini et al. 2005, Tettelin et al. 2005) use isolated sets of gene families partitioned into the core genome (genes present in all the genomes of a species) and the accessory genome (genes present in at least one genome of a species). Inspired by the methods released in the last few years, we propose to update Tettelin’s insights by organizing gene families in a pangenome graph to depict microbial diversity. Some approaches have been developed to factorize pangenomes at the sequence level only (reviewed in Marschall et al. 2016). However, these approaches lack direct information about genes, which complicates functional analyses based on the graph. The method introduced here, named PPanGGOLiN, can be considered the missing link between the usual pangenomics approach (set of isolated gene families) and the pangenome graph at the sequence level.

      In current pangenomics approaches, core genes are most often defined as the set of ubiquitous genes in a clade. However, this definition has 2 major flaws: it is not robust against poorly sampled data because it is highly reliant on the presence/absence of genes in a single genome; and it misses many core genes because of the high probability of losing at least one of the core genes due to sequencing, assembly or annotation artifacts. As a consequence, the core genome obtained from a large set of genomes can be very small, requiring a relaxed definition of the core genome (generally using a fixed presence threshold equal to 95% of the genomes). Unlike the few statistical approaches available to estimate a relaxed core genome without fixing an arbitrary threshold, PPanGGOLiN does not rely on the frequencies of gene family presence but uses the patterns of presence/absence and the pangenome graph to perform the partitioning. This original approach is able to discriminate 2 sets of genes having the same frequencies of presence albeit coming from 2 different subsets of genomes. Moreover, the usual dichotomy between core and accessory genomes does not faithfully report the diverse ranges of gene frequencies in a pangenome. Therefore, as proposed by Koonin et al. 2008 and formally modeled by Collins et al. 2012, the pangenome can be split into 3 groups. This choice helps to shed light on genes potentially associated with positive environmental adaptations while avoiding confusion with potentially randomly acquired ones. For that purpose, based on the patterns of presence/absence and the pangenome graph, PPanGGOLiN divides the pangenome into (1) persistent genome, equivalent to a relaxed core genome (genes conserved in almost all genomes); (2) shell genome, moderately conserved genes potentially associated with environmental adaptation capabilities; (3) cloud genome, rare genes.

      Based on this partitioned pangenome representation, we can annotate nodes in the graph to highlight alternative paths and associate relevant metadata with them. In a way, drawing genomes on rails like a subway map may help biologists to browse the pangenome and compare their genomes of interest to the overall pangenomic diversity.
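
The limits of frequency-based core definitions mentioned in this abstract can be illustrated with a small sketch. It computes a strict core and a 95% relaxed core from a toy presence/absence table, then shows two families with identical frequencies but disjoint presence patterns. This is not PPanGGOLiN's partitioning method; the function name `core_families`, the family names and the 20 toy genomes are all hypothetical.

```python
from typing import Dict, Set


def core_families(presence: Dict[str, Set[str]], n_genomes: int, min_percent: int = 100) -> Set[str]:
    """Families present in at least `min_percent` percent of the genomes.

    Integer arithmetic is used so that the 95% boundary is exact.
    """
    return {
        family
        for family, genomes in presence.items()
        if 100 * len(genomes) >= min_percent * n_genomes
    }


genomes = [f"g{i}" for i in range(20)]
presence = {
    "famA": set(genomes),            # truly ubiquitous
    "famB": set(genomes) - {"g7"},   # missed once, e.g. by an annotation artifact
    "famC": set(genomes[:10]),       # present in one half of the genomes
    "famD": set(genomes[10:]),       # present in the other half
}

strict_core = core_families(presence, len(genomes))                    # famA only
relaxed_core = core_families(presence, len(genomes), min_percent=95)   # famA and famB

# famC and famD have the same frequency but completely different patterns,
# which a purely frequency-based partition cannot tell apart.
same_frequency = len(presence["famC"]) == len(presence["famD"])
disjoint_patterns = presence["famC"].isdisjoint(presence["famD"])
print(strict_core, relaxed_core, same_frequency, disjoint_patterns)
```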

    • bistro: a library to build large-scale workflows in computational biology
      Philippe Veber (LBBE)
      Thursday, June 13, 2019 – 10:30 to 11:00
      Room Aurigny
      Talk abstract: 

      Computational pipelines for analyzing high-throughput genomics datasets typically consist of tens to hundreds of shell commands, generating thousands of files and running for days or weeks. While becoming rather complex pieces of software, they are most of the time still programmed using rudimentary tools like shell scripts, which offer very little help to develop large and reusable programs. In addition to being error-prone, implementing computational pipelines using shell scripts leaves lots of tedious aspects to the programmer, diverting her/his attention from data analysis considerations. In this work, I propose to leverage a modern, statically typed programming language to implement as a simple library a comfortable environment to develop bioinformatics pipelines. This library is named bistro and is written in the OCaml language. Among other features, it provides dependency tracking, parallel execution, resume-on-failure, automatic naming of intermediate files, and easy deployment of pipelines using Docker or Singularity for enhanced reproducibility. Thanks to the compiler type checker, errors on file formats or typos in command arguments are detected at compile-time, that is, even before running the pipeline. I’ll show various benefits of embedding a pipeline development framework in a generalist language. Among other things, it becomes very easy to integrate a pipeline into a web server, or write extensible libraries of highly configurable pipelines.

    • From QC to isoform characterization : Evaluation and improvements of Nanopore sequencing in a RNASeq context
      Sophie Lemoine (IBENS)
      Thursday, June 6, 2019 – 10:30 to 11:00
      Room Aurigny
      Talk abstract: 

      Transcript identification is a real challenge with short read sequencing. With Oxford Nanopore Technologies (ONT), our aim is to sequence full-length cDNA to directly access isoforms. We have successfully validated the analysis of differentially expressed genes on a mouse model of myelination blockage following the standard ONT protocol. The mean length of our reads was 1.2kb, which is lower than the estimated 2kb mean length of the transcripts and even worse if we consider the TSL1 tagged transcripts (2.6kb). To improve our results, we combined SmartSeq and ONT technologies to synthesize full-length cDNA from total RNA. The cDNAs were barcoded in order to sequence multiple samples on a single MinION run and allow differential expression analyses. The SmartSeq/ONT protocol allowed us to sequence much longer cDNAs. The mean length of the reads was then about 2.6kb and the small reads that made up the majority of the population with the ONT protocol were eliminated. We were able to detect more differentially expressed targets. The targets detected were longer than those from the ONT protocol. The optimized protocol globally achieved a better 5’-3’ transcript coverage, not surprisingly for transcripts longer than 2kb. Although it does not guarantee full-length cDNAs, it can be reliable for cDNA sequencing and improve isoform annotation and quantification using dedicated pipelines, such as FLAIR or Pinfish. The goal of my talk is to give an idea of: the evolution of the protocols tested and improved; the developments we had to perform to make the QC of our runs; the ongoing evaluation of FLAIR and Pinfish in our context.

    • Genomic approaches to studying the evolution of sex determination systems in fish
      Yann Guiguen (INRA IPGP)
      Thursday, May 16, 2019 – 10:30 to 12:00
      Room Aurigny
      Talk abstract: 

      Fish display a wide variety of sex determination mechanisms, ranging from purely genetic systems to systems determined wholly or partly by the environment (temperature, density, etc.). Curiously, this variability follows no obvious phylogenetic pattern, with rapid transitions among closely related species, and even among different populations of the same species. To better understand this diversity and the mechanisms governing the evolution of sex chromosomes, we applied partial (RAD-Sequencing) or whole (Pool-Sequencing) genome sequencing approaches to a large number of fish species in order to characterize sex determination systems, delimit the chromosomal regions of the sex loci and identify candidate genes as master sex determinants. These strategies led to the identification of the type of sex determination in many species with simple monofactorial systems (XX/XY or ZZ/ZW), but also in species with more complex sex determination systems. These results also allowed us to identify new master sex-determining genes and to show that they are often "recruited" from a relatively small number of signaling pathways.

    • From alignment-free heuristics to an interactive visualization: V(D)J repertoire analysis in the Vidjil platform
      Mikaël Salson (CRIStAL U. Lille)
      Thursday, April 25, 2019 – 10:30 to 12:00
      Room Aurigny
      Talk abstract: 

      The diversity of the immune repertoire is grounded on V(D)J recombinations. Many algorithms and software identify these recombinations inside high-throughput sequencing data. We introduce new Aho-Corasick based heuristics to speed up the detection of V(D)J sequences in high-throughput sequencing data. We also show how those heuristics can speed up the identification of V(D)J recombinations. Our experiments show that those new heuristics improve time and space consumption of our previous algorithm — Vidjil-algo — while keeping its sensitivity and specificity. Such improvements are of importance when dozens of samples are to be analysed as is commonly the case in a clinical setting. In such a case users launch their analyses and interpret their results through a web application we have designed for this purpose.
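
For readers unfamiliar with the Aho-Corasick automaton mentioned in this abstract, here is a minimal textbook sketch of multi-pattern matching in a single pass over a read. It is not the Vidjil-algo heuristic itself; the function names and the toy patterns are assumptions chosen for illustration, and empty patterns are not supported.

```python
from collections import deque
from typing import Dict, List, Tuple


def build_automaton(patterns: List[str]):
    """Build the trie (goto), failure links and output lists for `patterns`."""
    goto: List[Dict[str, int]] = [{}]
    fail: List[int] = [0]
    out: List[List[int]] = [[]]
    for idx, pattern in enumerate(patterns):
        state = 0
        for ch in pattern:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(idx)
    # Breadth-first traversal: failure links of shallower states are known first.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit the patterns that end at the failure target.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_all(text: str, patterns: List[str]) -> List[Tuple[int, str]]:
    """Return (start position, pattern) for every occurrence, in one pass over `text`."""
    goto, fail, out = build_automaton(patterns)
    state = 0
    hits = []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for idx in out[state]:
            hits.append((i - len(patterns[idx]) + 1, patterns[idx]))
    return hits


print(find_all("GATTACAGATT", ["ATT", "TTA", "GATT", "ACA"]))
```

The scan time is linear in the length of the text plus the number of matches, whatever the number of patterns, which is what makes this family of automata attractive when many seeds must be looked up in every read.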

    • Hi-C guided assemblies reveal conserved regulatory topologies on X and autosomes despite extensive genome shuffling
      Gautier Richard (IGEPP INRA)
      Thursday, April 18, 2019 – 10:30
      Room Aurigny
      Talk abstract: 

      Genome rearrangements that occur during evolution impose major challenges on regulatory mechanisms that rely on three-dimensional genome architecture. Here, we developed a scaffolding algorithm and generated chromosome-length assemblies from Hi-C data for studying genome topology in three distantly related Drosophila species. We observe extensive genome shuffling between these species with one synteny breakpoint after approximately every six genes. A/B compartments, a set of large gene-dense topologically associating domains (TADs) and spatial contacts between high-affinity sites (HAS) located on the X chromosome are maintained over 40 million years, indicating architectural conservation at various hierarchies. Evolutionarily conserved genes cluster in the vicinity of HAS, while HAS locations appear evolutionarily flexible, thus uncoupling functional requirement of dosage compensation from individual positions on the linear X chromosome. Therefore, 3D architecture is preserved even in scenarios of thousands of rearrangements, highlighting its relevance for essential processes such as dosage compensation of the X chromosome.

    • Influence of urbanization on the human gut and oral microbiome
      Laure Ségurel (mnhn)
      Thursday, April 11, 2019 – 10:30 to 12:00
      Room Aurigny
      Talk abstract: 

      Industrialization has been associated with a loss of human gut microbiota diversity. As decreased gut microbiome diversity is also correlated with a number of modern diseases, understanding what factors drive this loss is vital for public health. It is also of evolutionary interest to understand how gut bacteria are adapting to rapidly changing environments. However, industrialized and non-industrialized populations differ in many ways, making it practically impossible to disentangle the effects of diet, sanitary conditions, medical practices or other factors. Moreover, gut protozoa, which have likely shaped the human-gut microbiota interactions throughout their coevolutionary history but are virtually absent from industrialized populations, are rarely taken into account. Finally, even less is known about the effects of industrialization on other microbiomes, including the oral microbiome, another important health-associated microbial community. To address some of these limitations, we examined oral and gut microbiomes of 140 individuals from Cameroon along a small-scale urbanization gradient. Apart from metagenetic and metagenomic data, we collected a number of ethnological, medical, sanitary and parasitological parameters in order to identify factors that influence microbiome diversity and variation.

       

    • Alignments and edit distances with fragmentations, from bioinformatics to computer music
      Mathieu Giraud (U. Lille, CRIStAL)
      Thursday, March 28, 2019 – 10:30 to 12:00
      Room Aurigny
      Talk abstract: 

      Sequence comparisons play a major role in bioinformatics but also in computer music: why and how should one measure the similarity of two DNA sequences or of two melodies? The usual operations of substitution, insertion and deletion can be extended to better model these similarities. The Mongeau-Sankoff algorithm (1990) thus introduced fragmentation (and consolidation) operations, which match one note to a set of notes, just as one can match a nucleotide to a homopolymer resulting from a sequencing error. I will present some results on the study of variations using fragmentations, on the correspondence between alignments and edit distance in this setting, and on the challenges of computing such distances. This work was carried out in collaboration with Henry Boisgibault and Florent Jacquemard as well as with Emilios Cambouropoulos and Ken Déguernel.
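
The idea of adding fragmentation and consolidation to the usual edit operations can be sketched with a small dynamic program. This is a deliberately simplified illustration, not the Mongeau-Sankoff algorithm, which weights operations by pitch and duration. Here a fragmentation is only allowed when one symbol faces a run of identical copies of itself (the homopolymer analogy), and the constant cost `frag_cost` and the bound `max_run` are assumptions made for the example.

```python
from typing import Sequence


def fragment_distance(a: Sequence, b: Sequence, frag_cost: float = 0.5, max_run: int = 4) -> float:
    """Edit distance with unit-cost substitution/insertion/deletion, plus
    fragmentation (one symbol of `a` against a run of 2..max_run equal symbols
    of `b`) and consolidation (the symmetric operation)."""
    n, m = len(a), len(b)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = float(i)
    for j in range(1, m + 1):
        d[0][j] = float(j)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best = min(
                d[i - 1][j] + 1,                           # deletion
                d[i][j - 1] + 1,                           # insertion
                d[i - 1][j - 1] + (a[i - 1] != b[j - 1]),  # match or substitution
            )
            if a[i - 1] == b[j - 1]:
                # Fragmentation: a[i-1] faces the run b[j-k .. j-1].
                for k in range(2, min(j, max_run) + 1):
                    if b[j - k] != a[i - 1]:
                        break
                    best = min(best, d[i - 1][j - k] + frag_cost)
                # Consolidation: the run a[i-k .. i-1] faces b[j-1].
                for k in range(2, min(i, max_run) + 1):
                    if a[i - k] != b[j - 1]:
                        break
                    best = min(best, d[i - k][j - 1] + frag_cost)
            d[i][j] = best
    return d[n][m]


# A homopolymer error (C read as CCC) costs one fragmentation instead of two insertions.
print(fragment_distance("ACGT", "ACCCGT"))
print(fragment_distance("ACGT", "ACCCGT", max_run=1))  # fragmentation disabled
```

Setting `max_run=1` disables the two extra operations, so the function falls back to the ordinary Levenshtein distance.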

       
    • Livestock genome annotation: transcriptome and chromatin structure profiling in cattle, goat, chicken and pig.
      Sarah Djebali-Quelen (INRA GenPhySE)
      Thursday, March 14, 2019 – 10:30 to 12:00
      Room Aurigny
      Talk abstract: 

      Functional annotation of livestock genomes is a critical step to decipher the genotype-to-phenotype relationship underlying complex traits. As part of the Functional Annotation of Animal Genomes (FAANG) action, the FR-AgENCODE project aims at profiling the landscape of transcription (RNA-seq) and chromatin accessibility and conformation (ATAC-seq and Hi-C) in four livestock species representing ruminants (cattle, goat), monogastrics (pig) and birds (chicken), using three target samples related to metabolism (liver) and immunity (CD4+ and CD8+ T cells). Standardized protocols were applied to produce transcriptome and chromatin datasets for the four species. RNA-seq assays made it possible to considerably extend the available catalog of protein-coding and non-coding transcripts. Gene expression profiles were consistent with known metabolic/immune functions and revealed differentially expressed transcripts with unknown function, including new lncRNAs in syntenic regions. The majority of ATAC-seq peaks of chromatin accessibility mapped to putative regulatory regions, with an enrichment of predicted transcription factor binding sites in differentially accessible peaks. Hi-C provided the first set of genome-wide maps of three-dimensional interactions across livestock and showed consistency with results from gene expression and chromatin accessibility in topological compartments of the genomes. We report the first multi-species and multi-assay genome annotation results obtained by a FAANG pilot project. The global consistency between gene expression and chromatin structure data in these four livestock species adds to previous findings in model animals. Overall, these results emphasize the value of FAANG for the research on domesticated animals and strengthen the importance of future meta-analyses of the reference datasets being generated by this community on different species.

       

    • CG-alcode: exploring alternative gene expression
      Jean-Stéphane Varré (CRIStAL U. Lille)
      Thursday, March 7, 2019 – 10:30 to 12:00
      Room Aurigny
      Talk abstract: 

      In this talk we will present the CG-alcode method, which compares two sets of transcripts for a pair of orthologous genes in two species by building a model for each gene and then, using that model, identifies "splicing orthologs" and infers putative transcripts. We will focus on the algorithm that identifies known and predicted functional signals for building the model. We will then present two directions that can use CG-alcode’s modeling results: exploring the set of potential transcripts by identifying "regulators", and identifying alternative transcripts from third-generation sequencing data.

       

    • Analyses of thousands of molecular events, example of RNA metabolism in Fronto-Temporal Dementias
      Vincent Anquetil (ICM)
      Thursday, February 28, 2019 – 10:30 to 12:00
      Room Aurigny
      Talk abstract: 

      FrontoTemporal dementias (FTD) are characterized by progressive behavioral and language changes, associated with an atrophy of the frontal and temporal lobes. Amyotrophic lateral sclerosis (ALS) is a rapidly progressive and fatal motor neuron disease. While ALS is a poorly heritable disorder (about 10%), up to 50% of FTD cases correspond to forms with genetic transmission. Mutations in 3 genes are responsible for most of the FTD genetic cases: microtubule associated protein tau (MAPT), progranulin (PGRN) and chromosome 9 open reading frame 72 (C9orf72). Genetic or sporadic FTDs share common neuropathological features such as neuronal Tubulin-Associated Unit (TAU), Tar-DNA binding Protein 43 (TDP43), or Fused in Sarcoma (FUS) inclusions. TDP43 and FUS neuronal inclusions are common to FTD and ALS. Up to 50% of ALS patients develop FTD symptoms and around 15% of FTD patients display motor neuron dysfunction typical of ALS. To date, no treatment is available for these disorders, and the molecular mechanisms at stake in the different pathological subtypes remain elusive.

      We analyzed, at the molecular level, the affected (frontal) and preserved (occipital) cortices of FTD +/- ALS patients. High-throughput RNA sequencing was performed to analyze transcriptome, splicing profiles and microRNA misregulation for a subset. The samples were sorted according to their genetic mutation (C9orf72, MAPT, PGRN), neuropathology (TAU+, TDP+, FUS+), phenotype (FTD, FTD+ALS, ALS) and compared to a set of controls. Gene expression data made it possible to differentiate the three phenotypes: pure FTD, pure ALS and FTD/ALS. Hundreds of differential RNA maturation profiles (splicing) were observed between mutations. Globally, less than 10% of the computed changes in RNA processing lead to modification of RNA expression. Therefore, the differently processed mRNAs can lead to 1) a different ratio of existing proteins, 2) mis-localization of the newly synthesized protein, or 3) the synthesis of aberrant proteins in FTD patients. So, these diseases known as proteinopathies can be due to an accumulation of RNA processing defects, making FTD +/- ALS also general RNAopathies.

       

    • Towards personalized medicine for amyloidoses.
      Christian Delamarche (IGDR)
      Thursday, January 31, 2019 – 10:30 to 12:00
      Room Aurigny
      Talk abstract: 

      Amyloidoses form a large group of degenerative diseases characterized by the self-aggregation of certain proteins, or fragments thereof, into fibrillar tangles. In humans, about thirty distinct proteins are known that are characteristic of diseases as diverse as Alzheimer’s disease, Parkinson’s disease, Charcot’s disease (amyotrophic lateral sclerosis), Huntington’s chorea and prion diseases, but also of more or less severe diseases that affect almost all tissues and organs (heart, kidneys, nerves, lungs, liver, skin, etc.). For about fifteen years we have pursued a global approach aimed at the early diagnosis, prognosis and therapy of amyloidoses, using techniques at the crossroads of biochemistry, computer science and electronics. The talk will focus on the use of motif discovery and molecular modeling applied to the study of point mutations responsible for amyloidoses. The project of creating a network of expertise for the study of amyloidoses may be discussed.

       

    • Facilitating long non-coding RNAs (lncRNAs) annotation using FEELnc and its application to the dog transcriptome
      Thomas Derrien (IGDR)
      Thursday, January 24, 2019 – 10:30 to 12:00
      Room Aurigny
      Talk abstract: 

      Whole transcriptome sequencing (RNA-seq) has become a standard for cataloguing and monitoring RNA populations. One of the main bottlenecks consists in correctly identifying the different classes of RNAs among the plethora of reconstructed transcripts, particularly those that will be translated (mRNAs) from the class of long non-coding RNAs (lncRNAs). Here, I will present FEELnc (FlExible Extraction of LncRNAs), an alignment-free program that accurately annotates lncRNAs based on a Random Forest model. FEELnc, freely available (https://github.com/tderrien/FEELnc), moves beyond conventional coding potential classifiers by providing a standardized and complete solution for annotating lncRNAs. Finally, I will develop the use of FEELnc to characterize lncRNAs in the domestic dog (Canis lupus familiaris) and to pinpoint lncRNAs involved in diseases.

       

    • Handling dependence or not in SNP-set testing approaches of Genome-Wide Association Studies
      David Causeur (Agrocampus Ouest)
      Thursday, January 17, 2019 – 10:30 to 12:00
      Room Aurigny
      Talk abstract: 

      The proper way to handle dependence across features in high-throughput genomic data has raised fundamental discussions with unclear general conclusions or final recommendations. One of the most obvious illustrations of this point is the tremendous effort of the statistics research community to address the impact of dependence on the False Discovery Rate (FDR)-controlling method by Benjamini and Hochberg (1995), which was initially designed under an independence assumption. Another famous example is provided by the strikingly good performance of a naïve Bayes procedure ignoring dependence in a comparative study of machine learning methods by Dudoit et al. (2002) to predict classes from gene expression data.

      Addressing the dependence issue has often consisted in assessing its detrimental impact on the performance of standard methods designed to be optimal under independence, and deducing ad hoc improvements. To be valid for arbitrarily complex dependence patterns, such approaches, in which dependence is viewed as a curse, can lead to procedures with low power. Therefore, both for machine learning and testing issues, a new generation of methods has emerged, advocating for an ad hoc handling of dependence consisting in a preliminary whitening of the data (see Ahdesmäki and Strimmer, 2010, Hall and Jin, 2010). However, disentangling the dependent noise and the true association signal is very challenging, and decorrelation can then lead to an alteration of the true association signal.

      For the purpose of global testing, where the objective is to test for the significance of an association signal between a set of features and a covariate, Arias-Castro et al. (2011) suggest that the optimal handling of dependence should be specific to the pattern of the true association signal, especially through its sparsity rate. This global testing framework covers a wide scope of applications, such as functional Analysis of Variance (fANOVA) and association tests between a region of the genome formed by contiguous Single Nucleotide Polymorphisms (SNP) and a case/control response variable in Genome-Wide Association Studies. Interestingly, in these two fields of application, many popular methods are just based on simple aggregation of pointwise test statistics ignoring their dependence.

      In SNP-set approaches of GWAS, both the dependence pattern and the association signal can be very different between regions of the genome. After a general discussion on the performance of testing methods ignoring dependence or whitening the pointwise test statistics, the presentation will show that those two extreme choices cannot be uniformly powerful over the variety of dependence and association patterns. We therefore introduce a new class of aggregation methods spanning the range between ignorance of dependence and complete decorrelation, and propose a method minimizing a distance between the null and non-null moment generating functions of the test statistics within this class to choose the most appropriate handling of dependence. We also discuss the applications of these general principles to prediction in high dimension.

      Keywords: Dependence, Genome-Wide Association Studies, Global Testing, Functional Analysis of Variance, High dimension, Statistical learning.
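
      For readers unfamiliar with the FDR-controlling method of Benjamini and Hochberg (1995) cited in the abstract, the standard step-up procedure can be sketched as below. This is only an illustrative sketch of that baseline, not the speaker's aggregation method; the function name `benjamini_hochberg`, the parameter `q` and the toy p-values are assumptions made for the example.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure (illustrative sketch).

    Returns a list of booleans, True where the hypothesis is rejected
    at FDR level q. The name and signature are hypothetical.
    """
    m = len(p_values)
    # Indices of the p-values sorted in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k such that p_(k) <= k * q / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank
    # Reject the k_max smallest p-values, including any that
    # individually missed their own threshold (the "step-up" part).
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Toy p-values (made up for illustration).
p_values = [0.02, 0.03, 0.035, 0.9]
print(benjamini_hochberg(p_values))  # [True, True, True, False]
```

      In this toy example the two smallest p-values exceed their own thresholds (0.0125 and 0.025), yet they are rejected because the third one passes its threshold (0.0375). The procedure uses only the ranks of the p-values, which is why its behaviour under dependence between features needed the separate study mentioned in the abstract.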

       

 
