Previous seminars – 2012

(archives from old symbiose site)

GenoLego: A fast and sensitive method to annotate genes for applications in phylogenetics

Olivier Mirabeau (INRA, Versailles)

Thursday, December 20, 2012 – 10:30

Room Aurigny

Talk abstract:

Over the last 15 years, a large number of genomes have become available, efficient gene tree reconstruction tools have been developed, and computational power has increased dramatically. As a result, it has become possible to address the question of the ancient evolution of large multi-gene families, including metazoan G protein-coupled receptors.

Typically, the first step in phylogenetic studies is to annotate genes that code for known protein families, which can be difficult in the case of divergent genes or diversified protein families. I have developed a hidden Markov model-based method, called GenoLego, designed to efficiently annotate unknown members of large, diversified gene families in genomes. The program uses a built-in gene model and takes as input genomic sequences and a protein alignment. It sequentially selects portions of the alignment to be modeled, builds a protein profile HMM, constructs a joint gene-protein model, and labels the DNA sequences using a sensitive dynamic programming algorithm to locally predict exons that code for portions of divergent protein members of that multi-gene family. I will give an overview of the model and the algorithms, and present some results from a preliminary benchmark of GenoLego against GeneWise, the annotation suite used in Ensembl. I will then briefly illustrate how this program can be used with two case studies: the evolution of bilaterian rhodopsin beta GPCRs and of insect olfactory receptors.

Arc orientation in Network Inference

Patrick Meyer (Machine Learning Group, Université Libre de Bruxelles)

Thursday, December 13, 2012 – 10:30

Room Aurigny

Talk abstract:

In this work, we propose a new algorithm for orienting the arcs of a network inferred from expression data. The information on arc orientation comes neither from time series nor from prior knowledge about transcription factors. The value of the approach is therefore that it makes it possible to recover the transcription factors of a cell from expression data alone. Our heuristic can handle networks of several thousand variables.

Towards a mathematical and algorithmic exploration of intra- and inter-organism interactions

Marie-France Sagot (INRIA, Lyon)

Thursday, November 29, 2012 – 10:30

Room Aurigny

Talk abstract:

In this talk I will give an overview of the activities carried out in the Inria Bamboo team in Lyon.

“From the apple genome sequence to deciphering how it works” & “Biology of seed-associated microbial communities”

Jean-Pierre Renou & Marie-Agnès Jacques (INRA / Agrocampus-ouest / Université d’Angers)

Thursday, November 22, 2012 – 10:30

Room Aurigny

Talk abstract:

From the apple genome sequence to deciphering how it works

The apple genome was recently sequenced (Velasco et al. 2010 Nat. Genet.). This first sequence, although imperfect, considerably widens the range of research possible in this species, in particular for identifying the genes involved in various traits of interest and elucidating their functions. We nevertheless decided to refine the functional annotation of the genome by high-throughput sequencing of transcripts (RNA-seq), which allowed us to refine the position and length of transcription units and to identify new transcripts, instances of alternative splicing and even of trans-splicing. We then designed a microarray (NimbleGen technology) covering all annotated apple transcripts in order to carry out transcriptome studies on large series of samples. One of the first of these studies was the construction of a reference atlas of gene expression across all organs. The data are made accessible through the genome browser of the “Genome Database for Rosaceae”. Probes were selected for the best expression specificity, with the choice of covering each locus with a sense probe and its antisense equivalent. Surprisingly, about 65% of expressed genes show antisense transcription, with great diversity depending on the gene families concerned and the organs studied. Finally, sequencing of small RNAs revealed a perfect correlation between the simultaneous presence of transcripts on both strands and the number of small RNAs produced at a given locus, indicating probable negative post-transcriptional regulation of a majority of genes by “natural antisense siRNAs”. This result, validated by strand-specific qRT-PCR on a few genes, raises the question of the exact meaning of gene expression measurements, which are usually obtained by non-strand-specific qRT-PCR and in fact report the total expression level of the two transcripts in opposite orientations.

Biology of seed-associated microbial communities

Seeds carry a diverse microflora that includes, among others, microorganisms pathogenic to plants but also to humans, as well as bacteria beneficial to plant development. The sources of seed inoculum and the role of the seed in establishing the microbial communities associated with seedlings are poorly understood. Because of its particular characteristics (a dehydrated organ), the seed is a selective environment for the microflora. The main objective of the project we are developing is to establish functional links between the microflora of the seed and those of the phyllosphere and the seedling, in order to shed light on the role of the phyllosphere as a source of seed contamination and the role of the seed as a source of inoculum for the seedling, and to identify the functions specific to seed-associated bacteria that allow them to survive in this extreme environment, characterized among other things by its low level of hydration.

Formal model reduction

Jérôme Feret (ENS Ulm, Paris)

Thursday, November 15, 2012 – 10:30

Room Aurigny

Talk abstract:

Post-translational modifications and complex formation generate a combinatorial explosion of protein states. Rule-based models provide a powerful alternative to approaches that require an explicit enumeration of all possible molecular species of a system. Such models consist of formal rules stipulating the (partial) contexts for specific protein-protein interactions to occur. These contexts specify molecular patterns that are usually less detailed than molecular species. Yet, the execution of rule-based dynamics requires stochastic simulation, which can be very costly. It thus appears desirable to convert a rule-based model into a reduced system of differential equations by exploiting the lower resolution at which rules specify interactions.

In this talk, we present a formal framework for constructing coarse-grained systems. We track the flow of information between different regions of chemical species, so as to detect and abstract away some useless correlations between the states of sites of molecular species.

The result of our abstraction is a set of molecular patterns, called fragments, and a system which describes exactly the concentration (or population) evolution of these fragments. The method never requires the execution of the concrete rule-based model, and the soundness of the approach is described and proved by abstract interpretation.

EVA (Exome Variation Analyzer)

Thierry Lecroq (LITIS, Rouen)

Thursday, October 25, 2012 – 10:30

Room Aurigny

Talk abstract:

In this talk I will describe EVA (Exome Variation Analyzer), a user-friendly, web-based software tool dedicated to filtering strategies for medical Whole Exome Sequencing. Through its different modules, EVA (i) integrates and stores annotated exome variation data coming from Next Generation Sequencing, (ii) allows users to combine the main filters dealing with common variations, molecular types, inheritance mode and multiple samples, (iii) offers browsing of annotated data and filtered results in various interactive tables, graphical visualizations and statistical charts, and (iv) offers export files and cross-links to useful external databases and software for further prioritization of the small subset of sorted candidate variations and genes. I will report a demonstrative case study that led to the identification of a new candidate gene related to a rare form of Alzheimer's disease.
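As a rough illustration of the kind of filter combination described in (ii), the sketch below chains a frequency filter, a molecular-type filter and a simple multi-sample carrier filter. It is not EVA's code or interface: the record fields (`freq`, `type`, `carriers`), the function name and the default thresholds are all assumptions made for this example, and the carrier rule is only a crude stand-in for a real inheritance-mode filter.

```python
def filter_variants(variants, max_freq=0.01,
                    keep_types=("missense", "nonsense", "frameshift"),
                    affected=(), unaffected=()):
    """Keep rare variants of the selected molecular types that are carried
    by every affected sample and by no unaffected sample.

    Each variant is a dict with hypothetical keys:
      "freq"     - population frequency of the variant
      "type"     - molecular type (e.g. "missense")
      "carriers" - set of sample names carrying the variant
    """
    kept = []
    for variant in variants:
        # Common-variation filter: drop variants that are frequent in the population.
        if variant["freq"] > max_freq:
            continue
        # Molecular-type filter: keep only the selected consequence types.
        if variant["type"] not in keep_types:
            continue
        # Multi-sample filter: shared by all affected, absent from all unaffected.
        carriers = variant["carriers"]
        if not all(sample in carriers for sample in affected):
            continue
        if any(sample in carriers for sample in unaffected):
            continue
        kept.append(variant)
    return kept


# Toy data, invented for the example.
variants = [
    {"gene": "G1", "freq": 0.2, "type": "missense", "carriers": {"p1", "p2"}},
    {"gene": "G2", "freq": 0.001, "type": "synonymous", "carriers": {"p1", "p2"}},
    {"gene": "G3", "freq": 0.001, "type": "missense", "carriers": {"p1", "p2"}},
    {"gene": "G4", "freq": 0.0, "type": "nonsense", "carriers": {"p1", "p2", "c1"}},
]
candidates = filter_variants(variants, affected=("p1", "p2"), unaffected=("c1",))
print([v["gene"] for v in candidates])  # prints ['G3']
```

Applying the filters in sequence keeps each criterion independent, so any one of them can be relaxed without touching the others.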

Functional Analysis and Comparison of Gene Sets: Benefits of Using Semantic Similarity for Clustering DAVID Results

Olivier Dameron (INSERM, Rennes 1) & Frédéric Hérault (INRA, UMR PEGASE)

Thursday, October 11, 2012 – 10:30

Room Aurigny

Talk abstract:

Functional analysis of a set of genes consists in identifying its underlying biological features and is a challenging task. DAVID generates a list of enriched Gene Ontology (GO) terms and groups similar annotations into clusters ranked according to their enrichment score. However, its limitations are twofold: it produces a large number of clusters of GO terms, and it ignores the underlying Gene Ontology semantics that relate these terms. We hypothesize that leveraging the semantics of Gene Ontology addresses both the quantity and the redundancy problems of DAVID and improves the functional analysis and comparison of sets of genes. We propose to compute the semantic similarity of the clusters of GO terms returned by DAVID and to use it to group similar clusters. We applied this approach to two sets of genes from a porcine muscular transcriptome study. To analyze a set of genes, it reduced the number of clusters respectively from twelve to four and from seventeen to five “super clusters”. These “super clusters” correspond respectively to four and four biologically-relevant processes. To compare the sets of genes, our approach successfully identified three similar functions shared by the two sets, as well as one significant function related to nucleic acid metabolic process that was specific to the second set. These results show that post-processing DAVID results using semantic similarity-based hierarchical clustering is relevant for the functional analysis and comparison of large sets of genes.

Keywords: Functional gene analysis, Gene Ontology, semantic similarity, hierarchical clustering.
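The idea of grouping clusters of terms by semantic similarity can be sketched on a toy ontology. The abstract does not say which similarity measure or linkage was used, so everything here is an assumption chosen for simplicity: term similarity is the Jaccard index of ancestor sets, cluster similarity is a best-match average, and grouping is single linkage cut at a fixed threshold. The term names are invented and are not real GO identifiers.

```python
def ancestors(term, parents):
    """Return the term together with all of its ancestors in the ontology."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def term_similarity(a, b, parents):
    """Jaccard index of the two ancestor sets (an assumed, simple measure)."""
    anc_a, anc_b = ancestors(a, parents), ancestors(b, parents)
    return len(anc_a & anc_b) / len(anc_a | anc_b)


def cluster_similarity(c1, c2, parents):
    """Best-match average between two sets of terms."""
    best_1 = [max(term_similarity(a, b, parents) for b in c2) for a in c1]
    best_2 = [max(term_similarity(a, b, parents) for a in c1) for b in c2]
    return (sum(best_1) + sum(best_2)) / (len(c1) + len(c2))


def group_clusters(clusters, parents, threshold=0.5):
    """Single-linkage grouping: merge two groups as soon as any pair of their
    member clusters reaches the similarity threshold. Returns lists of indices."""
    groups = [[i] for i in range(len(clusters))]
    merged = True
    while merged:
        merged = False
        for x in range(len(groups)):
            for y in range(x + 1, len(groups)):
                if any(cluster_similarity(clusters[i], clusters[j], parents) >= threshold
                       for i in groups[x] for j in groups[y]):
                    groups[x].extend(groups.pop(y))
                    merged = True
                    break
            if merged:
                break
    return groups


# Hypothetical toy ontology: child term -> list of parent terms.
parents = {
    "muscle_contraction": ["muscle_process"],
    "muscle_development": ["muscle_process"],
    "muscle_process": ["root"],
    "dna_repair": ["nucleic_acid_metabolism"],
    "rna_splicing": ["nucleic_acid_metabolism"],
    "nucleic_acid_metabolism": ["root"],
}
clusters = [{"muscle_contraction"}, {"muscle_development"},
            {"dna_repair"}, {"rna_splicing"}]
print(group_clusters(clusters, parents))  # prints [[0, 1], [2, 3]]
```

In this toy case, four clusters collapse into two "super clusters", one per branch of the ontology. A real analysis would need an actual GO graph and a justified choice of similarity measure.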

A combinatorial and integrated method to analyse RNA-seq reads

Nicolas Philippe (Équipe MAB, Lirmm, Montpellier)

Thursday, September 20, 2012 – 10:30

Room Aurigny

Talk abstract:

RNA sequencing enables a complete investigation covering the full dynamic spectrum of a transcriptome. It thus paves the way to a better understanding of the function of gene expression in different tissues, during development or pathological states. However, the splicing process, which generates both co-linear and non co-linear RNAs, together with sequencing errors, somatic mutations, polymorphisms, and rearrangements, makes the reads differ from the reference genome in a variety of ways. This complicates the task of comparing reads with a genome. Currently, the analysis paradigm consists in:

1. mapping the reads to a reference genome contiguously, allowing as many differences as one expects to be necessary to accommodate sequence errors and small polymorphisms;
2. using uniquely mapped reads to determine covered genomic regions, either for computing a local coverage to predict mutations and filter out sequence errors (cf. program ERANGE), or for delimiting expressed exons approximately (cf. program TopHat);
3. re-aligning unmapped reads, which were not mapped contiguously at step one, to reveal splicing junctions.

Limitations of this approach include lack of precision, redundant computations due to multi-mapping steps, error propagation due to heuristics and the absence of back-tracking. We propose a novel, integrated approach to analyze today's longer reads (> 50 bp). The idea is to adopt a k-mer approach that combines the genomic positions and local coverage to perform a complex analysis of each read and detect, in a single step, mutations, indels, errors, as well as both normal and chimeric splice junctions. Comparisons with other tools demonstrate the feasibility of this approach, which yields both sensitive and highly specific inferences.
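One way to picture the k-mer idea is to attach two values to every k-mer of a read: its position in the genome and the number of reads that contain it. The sketch below is a simplified illustration written for this page, not the speaker's tool. The function names, the toy genome and the rule "a break in consecutive genomic positions suggests a junction" are all assumptions, and the sketch does not use the coverage values to separate sequencing errors from true variants.

```python
from collections import defaultdict


def index_kmers(genome, k):
    """Map every k-mer of the genome to the list of its start positions."""
    index = defaultdict(list)
    for i in range(len(genome) - k + 1):
        index[genome[i:i + k]].append(i)
    return index


def kmer_support(reads, k):
    """Count, for every k-mer, the number of reads that contain it."""
    support = defaultdict(int)
    for read in reads:
        for kmer in {read[i:i + k] for i in range(len(read) - k + 1)}:
            support[kmer] += 1
    return support


def read_profile(read, index, support, k):
    """For each k-mer of the read, return (genomic position, support).
    The position is None when the k-mer is absent or not uniquely located."""
    profile = []
    for i in range(len(read) - k + 1):
        kmer = read[i:i + k]
        hits = index.get(kmer, [])
        position = hits[0] if len(hits) == 1 else None
        profile.append((position, support.get(kmer, 0)))
    return profile


def location_jumps(profile):
    """Report pairs of genomic positions where located k-mers stop being
    collinear with the read, e.g. across a candidate splice junction."""
    jumps = []
    prev_offset = prev_position = None
    for offset, (position, _) in enumerate(profile):
        if position is None:
            continue
        if prev_position is not None and position - prev_position != offset - prev_offset:
            jumps.append((prev_position, position))
        prev_offset, prev_position = offset, position
    return jumps


# Toy genome: two 8-base "exons" separated by an 8-base "intron".
genome = "ACGTTGCA" + "GGGGCCCC" + "TATCAGCT"
read = "TTGCA" + "TATC"  # spans the junction between the two exons
k = 4
index = index_kmers(genome, k)
support = kmer_support([read], k)
profile = read_profile(read, index, support, k)
print(location_jumps(profile))  # prints [(4, 16)]
```

The k-mers overlapping the junction have no genomic location, and the located k-mers on either side are 12 positions apart in the genome but only 4 apart in the read, which is what the jump reports.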

High-Throughput Transcriptomics

Micha Sammeth (Centre Nacional d’Anàlisi Genòmica (CNAG), Barcelona)

Thursday, June 14, 2012 – 11:30

Room Aurigny

Talk abstract:

In this seminar I will provide an introduction to the fascinating technique called RNA-Seq. After a brief historical overview of the complementary techniques used earlier, we will review the elementary preprocessing steps of these experiments. I will then outline several different applications of RNA-Seq and summarize some possible ways to analyze the data while avoiding known pitfalls.

Towards a functional model of a bacterial community in arsenic-polluted marine sediments

Frédéric Plewniak (G.M.G.M – UdS/CNRS UMR7156 Strasbourg)

Thursday, May 31, 2012 – 10:30

Room Aurigny

Talk abstract:

Arsenic, which causes major water pollution in industrial and post-industrial areas around the world, poses serious health risks to populations. Environmental genomics techniques now make it possible to study the adaptive and cooperative strategies of microbial communities in polluted environments.

We sequenced metagenomes from harbor sediments at l’Estaque, close to a former metallurgical site heavily polluted with arsenic near Marseille, and at St Mandrier, near Toulon. Analysis of the resulting sequences with the RAMMCAP protocol established the functional and taxonomic profiles of the two metagenomes and of four control metagenomes available in public databases.

Biodiversity is higher in the two sediment communities than in those of the control sites, which are more than 80% dominated by two orders. The order Desulfobacterales accounts for 54.7% at l’Estaque and 31.7% at St Mandrier, with all other orders present being relatively evenly distributed. However, microbial diversity is slightly higher at St Mandrier than at the heavily polluted l’Estaque site.

The Gene Ontology (GO) sets describing the functional profiles were compared in order to highlight the categories over-represented in the two metagenomes of interest relative to the four controls. This reveals an over-representation, at l’Estaque, of categories related to arsenic resistance and to oxidative stress responses. In addition, the metagenomic data and the physico-chemical measurements of environmental parameters made it possible to propose a descriptive model of how the prokaryotic communities function, highlighting the importance of the sulfur cycle in arsenic detoxification in connection with the presence of sulfate-reducing bacteria.

Fast and Accurate RNA-Seq read alignments with PALMapper

Géraldine Jean (LINA, Université de Nantes)

Thursday, May 10, 2012 – 10:30

Room Minquiers

Talk abstract:

High-throughput sequencing of mRNA enhances transcriptome analysis and offers great opportunities for the discovery of new genes and the identification of alternative transcripts. However, the sheer amount of high-throughput sequencing data requires efficient methods for accurate spliced alignment of reads against the reference genome, a task made harder by the limited length and quality of the sequence reads.

In this talk, I will present an original RNA-Seq read mapper, called PALMapper, that combines a faster extension of the highly accurate alignment method QPALMA with the fast short read aligner GenomeMapper. PALMapper quickly carries out an initial read mapping, which then guides a banded semi-global alignment algorithm that allows for long gaps corresponding to introns. It computes both spliced and unspliced alignments at high accuracy by taking advantage of base quality information and computational splice site predictions, brought together in an extended alignment scoring model.

LNA: Fast Protein Classification Using A Laplacian Characterization of Tertiary Structure

Nicolas Bonnel (Université Bretagne Sud)

Thursday, May 3, 2012 – 10:30

Room Aurigny

Talk abstract:

In the last two decades, many protein 3D shapes have been discovered, characterized and made available thanks to the Protein Data Bank (PDB), which is nevertheless growing very quickly. New scalable methods are thus urgently required to search through the PDB efficiently. We present an approach called LNA (Laplacian Norm Alignment) that performs structural comparison of two proteins with dynamic programming algorithms. This is achieved by characterizing each residue in the protein with scalar features. The feature values are calculated using a Laplacian operator applied on the graph corresponding to the adjacency matrix of the residues. The weighted Laplacian operator we use estimates, at various scales, local deformations of the topology where each residue is located. On some benchmarks widely shared by the community, we obtain qualitatively similar results compared to other competing approaches, but with an algorithm that is one or two orders of magnitude faster. 180,000 protein comparisons can be done within 1 second with a single recent GPU, which makes our algorithm very scalable and suitable for real-time database querying across the Web.
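To make the notion of a per-residue Laplacian feature concrete, here is a minimal sketch under simplifying assumptions. It uses an unweighted graph Laplacian at a single distance cutoff, whereas the abstract describes a weighted operator evaluated at several scales. The function name, the cutoff value and the toy coordinates are invented for illustration. Each residue receives one scalar: the norm of the Laplacian applied to the residue coordinates.

```python
import math


def laplacian_norm_features(coords, cutoff):
    """Return one scalar per residue: the norm of sum_j (x_i - x_j) over all
    residues j within `cutoff` of residue i (unweighted graph Laplacian
    applied to the coordinates)."""
    features = []
    for i, xi in enumerate(coords):
        acc = [0.0, 0.0, 0.0]
        for j, xj in enumerate(coords):
            # Residues i and j are adjacent when they lie within the cutoff.
            if i != j and math.dist(xi, xj) <= cutoff:
                for axis in range(3):
                    acc[axis] += xi[axis] - xj[axis]
        features.append(math.sqrt(sum(c * c for c in acc)))
    return features


# Three residues on a straight line, one unit apart (toy coordinates).
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
print(laplacian_norm_features(coords, cutoff=1.5))  # prints [1.0, 0.0, 1.0]
```

The middle residue sits exactly at the centre of its neighbours and scores 0, while the two end residues score 1. Sequences of such scalars are the kind of input a dynamic programming alignment can compare, although the scoring used in the talk is not specified here.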

Human metagenomics: clinical impacts

Nicolas Pons (INRA Jouy en Josas)

Thursday, April 26, 2012 – 10:30

Room Aurigny

Talk abstract:

Human metagenomics consists in characterizing the associations between microbial species and genes and human phenotypes, in order to develop diagnostic and prognostic tools and approaches for modulating microbial populations, with the aim of optimizing each person's health. Metagenomic studies have been made easier in recent years by the development of very-high-throughput sequencing and screening technologies. This seminar will present the four main strands of metagenomics: functional metagenomics, phylogenetic metagenomics, so-called “whole sequencing” metagenomics and quantitative metagenomics. Greater attention will be paid to the last two, with a detailed illustration of the latest results obtained in the MicroObese and MetaHIT projects, which aim in particular to identify associations between microbial populations and obesity.
