(archives from old symbiose site)
-
GenoLego: A fast and sensitive method to annotate genes for applications in phylogenetics
Olivier Mirabeau (INRA, Versailles)
Thursday, December 20, 2012 – 10:30
Room Aurigny
Talk abstract:
Over the last 15 years a large number of genomes have become available, efficient gene tree reconstruction tools have been developed and computational power has increased dramatically; it has now become possible to address the question of ancient evolution of large multi-gene families, including metazoan G protein-coupled receptors.
Typically the first step in phylogenetics studies is to annotate genes that code for known protein families, and this can be difficult in the case of divergent genes or diversified protein families. I have developed a hidden Markov model-based method, called GenoLego, designed to efficiently annotate, in genomes, unknown members of large diversified gene families. The program uses a built-in gene model and takes as input genomic sequences and a protein alignment. It sequentially selects portions of the alignments to be modeled, builds a protein profile HMM, constructs a joint gene-protein model, and labels the DNA sequences using a sensitive dynamic programming algorithm to locally predict exons that code for portions of divergent protein members of that multi-gene family. I will give an overview of the model and the algorithms, and present some results on a preliminary benchmark of GenoLego against Genewise, the annotation suite used in Ensembl. I will then briefly illustrate how this program can be used with two case studies, the evolution of bilaterian rhodopsin beta GPCRs and insect olfactory receptors.
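The labelling step mentioned in this abstract relies on dynamic programming over a hidden Markov model. As a rough illustration only (this is not GenoLego's joint gene-protein model), the sketch below runs the standard Viterbi algorithm on a hypothetical two-state HMM that labels each base as coding (`C`) or non-coding (`N`). The states, the probabilities and the function name `viterbi` are all assumptions made for the example.

```python
import math

# Hypothetical two-state HMM: "C" (coding) and "N" (non-coding).
# All probabilities are illustrative assumptions, not GenoLego parameters.
STATES = ("C", "N")
START = {"C": 0.5, "N": 0.5}
TRANS = {
    "C": {"C": 0.9, "N": 0.1},
    "N": {"C": 0.1, "N": 0.9},
}
EMIT = {
    "C": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},  # toy GC-rich state
    "N": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},  # toy AT-rich state
}


def viterbi(seq):
    """Return the most probable state path for seq (log-space Viterbi)."""
    if not seq:
        return ""
    # score[s] = best log-probability of any path ending in state s
    score = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    backptrs = []
    for base in seq[1:]:
        new_score, ptr = {}, {}
        for s in STATES:
            # Best predecessor state for s at this position
            prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            new_score[s] = (score[prev] + math.log(TRANS[prev][s])
                            + math.log(EMIT[s][base]))
            ptr[s] = prev
        backptrs.append(ptr)
        score = new_score
    # Trace back from the best final state
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    return "".join(reversed(path))


labels = viterbi("ATATATGCGCGCGCATATAT")
print(labels)
```

With these toy parameters the GC-rich stretch in the middle is labelled `C` and the AT-rich flanks `N`. A real gene model would use many more states (exon phases, splice sites) and protein-profile emissions.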
-
Arc orientation in Network Inference
Patrick Meyer – Machine Learning Group (Université Libre de Bruxelles)
Thursday, December 13, 2012 – 10:30
Room Aurigny
Talk abstract:
In the work presented, we propose a new algorithm for orienting the arcs of a network inferred from expression data. The information on arc orientation comes neither from time series nor from prior knowledge of transcription factors. The appeal of the approach is therefore that it makes it possible to recover a cell’s transcription factors from expression data alone. Our heuristic can handle networks of several thousand variables.
-
Towards a mathematical and algorithmic exploration of intra- and inter-organism interactions
Marie France Sagot (INRIA, Lyon)
Thursday, November 29, 2012 – 10:30
Room Aurigny
Talk abstract:
In this talk I will give an overview of the activities carried out in the Inria Bamboo team in Lyon.
-
“From the apple genome sequence to deciphering how it functions” & “Biology of seed-associated microbial communities”
Jean Pierre Renou & Marie Agnès Jacques (INRA / Agrocampus-ouest / Université d’Angers)
Thursday, November 22, 2012 – 10:30
Room Aurigny
Talk abstract:
From the apple genome sequence to deciphering how it functions
The apple genome was recently sequenced (Velasco et al. 2010 Nat. Genet.). This first sequence, although imperfect, considerably broadens the range of research possible in this species, in particular for identifying the genes involved in various traits of interest and elucidating their functions. We nevertheless decided to refine the functional annotation of the genome by high-throughput sequencing of transcripts (RNA-seq), which allowed us to pin down the position and length of transcription units and to identify new transcripts, occurrences of alternative splicing and even trans-splicing. We then designed a microarray (Nimblegen technology) covering all annotated apple transcripts in order to carry out transcriptome studies on large series of samples. One of the first of these studies was the construction of a reference atlas of gene expression across all organs. The data are made accessible through the genome browser of the “Genome Database for Rosaceae”. Probes were selected for best expression specificity, with the choice of covering each locus with a sense probe and its antisense counterpart. Surprisingly, about 65% of expressed genes show antisense transcription, with great diversity depending on the gene families concerned and the organs studied. Finally, small RNA sequencing revealed a perfect correlation between the simultaneous presence of transcripts on both strands and the number of small RNAs produced at a given locus, pointing to probable post-transcriptional negative regulation of a majority of genes by “natural antisense Si-RNAs”.
This result, validated by strand-specific qRT-PCR on a few genes, raises the question of what gene expression measurements really mean: they are usually made by non-strand-specific qRT-PCR, which in fact reports the total expression level of the two transcripts in opposite orientations.
Biology of seed-associated microbial communities
Seeds carry a diverse microflora that includes, among others, microorganisms pathogenic to plants and also to humans, as well as bacteria beneficial to plant development. The sources of seed inoculum and the role of the seed in establishing the microbial communities associated with seedlings are poorly understood. Because of its particular characteristics (a dehydrated organ), the seed is a selective environment for the microflora. The main objective of the project we are developing is to establish functional links between the microflora of the seed and those of the phyllosphere and the seedling, in order to shed light on the role of the phyllosphere as a source of seed contamination and the role of the seed as a source of inoculum for the seedling, and to identify the functions specific to seed-associated bacteria that allow them to survive in this extreme environment, characterized among other things by its low level of hydration.
-
Formal model reduction
Jerome Feret (ENS ULM, Paris)
Thursday, November 15, 2012 – 10:30
Room Aurigny
Talk abstract:
Post-translational modifications and complex formation generate a combinatorial explosion of protein states. Rule-based models provide a powerful alternative to approaches that require an explicit enumeration of all possible molecular species of a system. Such models consist of formal rules stipulating the (partial) contexts for specific protein-protein interactions to occur. These contexts specify molecular patterns that are usually less detailed than molecular species. Yet, the execution of rule-based dynamics requires stochastic simulation, which can be very costly. It thus appears desirable to convert a rule-based model into a reduced system of differential equations by exploiting the lower resolution at which rules specify interactions.
In this talk, we present a formal framework for constructing coarse-grained systems. We track the flow of information between different regions of chemical species, so as to detect and abstract away some useless correlations between the state of sites of molecular species.
The result of our abstraction is a set of molecular patterns, called fragments, and a system which describes exactly the concentration (or population) evolution of these fragments. The method never requires the execution of the concrete rule-based model and the soundness of the approach is described and proved by abstract interpretation.
-
EVA (Exome Variation Analyzer)
Thierry Lecroq (LITIS, Rouen)
Thursday, October 25, 2012 – 10:30
Room Aurigny
Talk abstract:
In this talk I will describe EVA (Exome Variation Analyzer), a user-friendly web-based tool dedicated to filtering strategies for medical Whole Exome Sequencing. Through its different modules, EVA (i) integrates and stores annotated exome variation data coming from Next Generation Sequencing, (ii) makes it possible to combine the main filters dealing with common variations, molecular types, inheritance mode and multiple samples, (iii) offers browsing of annotated data and filtered results in various interactive tables, graphical visualizations and statistical charts, and (iv) offers export files and cross-links to useful external databases and software for further prioritization of the small subset of sorted candidate variations and genes. I will report a demonstrative case study that led to the identification of a new candidate gene related to a rare form of Alzheimer’s disease.
-
Functional Analysis and Comparison of Gene Sets: Benefits of Using Semantic Similarity for Clustering DAVID Results
Olivier Dameron (INSERM, Rennes 1) & Frédéric Hérault (INRA, UMR PEGASE)
Thursday, October 11, 2012 – 10:30
Room Aurigny
Talk abstract:
Functional analysis of a set of genes consists in identifying its underlying biological features and is a challenging task. DAVID generates a list of enriched Gene Ontology (GO) terms and groups similar annotations into clusters ranked according to their enrichment score. However, its limitations are twofold: it produces a lot of clusters of GO terms and it ignores the underlying semantics of Gene Ontology between these terms. We hypothesize that leveraging the semantics of Gene Ontology addresses both the quantity and the redundancy problems of DAVID and improves the functional analysis and comparison of sets of genes. We propose to compute the semantic similarity of the clusters of GO terms returned by DAVID and to use it to group similar clusters. We applied this approach to two sets of genes from a porcine muscular transcriptome study. To analyze a set of genes, it reduced the number of clusters respectively from twelve to four and from seventeen to five “super clusters”. These “super clusters” correspond respectively to four and four biologically-relevant processes. To compare the sets of genes, our approach successfully identified three similar functions shared by the two sets, as well as one significant function related to nucleic acid metabolic process that was specific to the second set. These results show that post-processing DAVID results using semantic similarity-based hierarchical clustering is relevant for the functional analysis and comparison of large sets of genes.
Keywords: Functional gene analysis, Gene Ontology, semantic similarity, hierarchical clustering.
-
A combinatorial and integrated method to analyse RNA-seq reads
Nicolas Philippe (Équipe MAB, Lirmm, Montpellier)
Thursday, September 20, 2012 – 10:30
Room Aurigny
Talk abstract:
RNA sequencing enables a complete investigation covering the full dynamic spectrum of a transcriptome. It thus paves the way to a better understanding of the function of gene expression in different tissues, during development or pathological states. However, the splicing process, which generates both co-linear and non co-linear RNAs, the inclusion of sequencing errors, somatic mutations, polymorphisms, and rearrangements make the reads differ from the reference genome in a variety of ways. This complicates the task of comparing reads with a genome. Currently, the analysis paradigm consists in:
1. mapping the reads to a reference genome contiguously, allowing as many differences as one expects to be necessary to accommodate sequence errors and small polymorphisms;
2. using uniquely mapped reads to determine covered genomic regions, either for computing a local coverage to predict mutations and filter out sequence errors (cf. program ERANGE), or for delimiting expressed exons approximately (cf. program TopHat);
3. re-aligning unmapped reads, which were not mapped contiguously at step one, to reveal splicing junctions.
Limitations of this approach include lack of precision, redundant computations due to multi-mapping steps, error propagation due to heuristics and the absence of back-tracking. We propose a novel, integrated approach to analyze today’s longer reads (> 50 bp). The idea is to adopt a k-mer approach that combines the genomic positions and local coverage to perform a complex analysis of each read and detect, in a single step, mutations, indels, errors, as well as both normal and chimeric splice junctions. Comparisons with other tools demonstrate the feasibility of this approach, which yields both sensitive and highly specific inferences.
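To make the k-mer idea in this abstract more concrete, here is a minimal sketch under strong simplifying assumptions: a toy genome, a tiny k, exact k-mer lookup, and "local coverage" approximated by the number of reads that contain each k-mer. The function names (`index_genome`, `kmer_support`, `profile_read`, `find_break`) are hypothetical and are not taken from the speaker's software.

```python
from collections import Counter, defaultdict

K = 4  # toy k-mer length; real analyses use a much larger k


def kmers(seq, k=K):
    """All overlapping k-mers of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def index_genome(genome, k=K):
    """Map each k-mer to the list of its start positions in the genome."""
    index = defaultdict(list)
    for pos, kmer in enumerate(kmers(genome, k)):
        index[kmer].append(pos)
    return index


def kmer_support(reads, k=K):
    """Count how many reads contain each k-mer (a crude local coverage)."""
    support = Counter()
    for read in reads:
        support.update(set(kmers(read, k)))
    return support


def profile_read(read, index, support, k=K):
    """For each k-mer of the read: (genomic positions, support)."""
    return [(index.get(kmer, []), support[kmer]) for kmer in kmers(read, k)]


def find_break(profile):
    """Return (start, end) of the first run of unlocated k-mers, or None."""
    start = None
    for i, (positions, _) in enumerate(profile):
        if not positions and start is None:
            start = i
        elif positions and start is not None:
            return (start, i - 1)
    return (start, len(profile) - 1) if start is not None else None


genome = "ACGTTGCAAGCTTAGGCATC"
read = genome[2:16]                    # error-free read
mutated = read[:7] + "T" + read[8:]    # same read with one substitution
reads = [mutated, read, read, read]

index = index_genome(genome)
support = kmer_support(reads)
profile = profile_read(mutated, index, support)
print(find_break(profile))             # (4, 7)
```

In this toy case the substitution makes exactly K consecutive k-mers unlocatable, while the k-mers on either side map to positions that stay co-linear with the read. The k-mers inside the break are also seen in only one read, which in this simplified picture would point to a sequencing error rather than a variant shared across reads.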
-
High-Throughput Transcriptomics
Micha Sammeth (Centre Nacional d’Anàlisi Genòmica (CNAG), Barcelona)
Thursday, June 14, 2012 – 11:30
Room Aurigny
Talk abstract:
In the seminar I will provide an introduction to the fascinating technique called RNA-Seq; after a brief historical overview about complementary techniques used earlier, we will review elementary preprocessing steps of these experiments. Then I will outline several different applications of RNA-Seq, and I will summarize some possible ways to analyze the data avoiding known pitfalls.
-
Towards a functional model of a bacterial community in marine sediments polluted by arsenic
Frédéric Plewniak (G.M.G.M – UdS/CNRS UMR7156 Strasbourg)
Thursday, May 31, 2012 – 10:30
Room Aurigny
Talk abstract:
Arsenic, a cause of major water pollution in industrial and post-industrial areas worldwide, poses serious health risks to populations. Environmental genomics techniques now make it possible to study the adaptive and cooperative strategies of microbial communities in polluted environments.
We sequenced metagenomes from harbour sediments at l’Estaque, close to a former metallurgical site near Marseille that is highly polluted with arsenic, and at St Mandrier, near Toulon. Analysis of the resulting sequences with the RAMMCAP protocol yielded functional and taxonomic profiles for the two metagenomes and for four control metagenomes available in public databases.
Biodiversity is higher in the two sediment communities than in those of the control sites, which are more than 80% dominated by two orders. The order Desulfobacterales accounts for 54.7% at l’Estaque and 31.7% at St Mandrier, with all other orders present being relatively evenly distributed. Microbial diversity is nonetheless slightly higher at St Mandrier than at the highly polluted l’Estaque site.
The Gene Ontology (GO) sets describing the functional profiles were compared to highlight the categories over-represented in the two metagenomes of interest relative to the four controls. This shows an over-representation at l’Estaque of categories related to arsenic resistance and to oxidative stress responses.
In addition, the metagenomic data and physico-chemical measurements of environmental parameters made it possible to propose a descriptive model of how the prokaryotic communities function, highlighting the importance of the sulfur cycle in arsenic detoxification in connection with the presence of sulfate-reducing bacteria.
-
Fast and Accurate RNA-Seq read alignments with PALMapper
Géraldine Jean (LINA, Université de Nantes)
Thursday, May 10, 2012 – 10:30
Room Minquiers
Talk abstract:
High throughput sequencing of mRNA enhances transcriptome analysis and offers great opportunities for the discovery of new genes and the identification of alternative transcripts. However, the sheer amount of high throughput sequencing data requires efficient methods for accurate spliced alignments of reads against the reference genome, which is further challenged by the limited length and quality of the sequence reads.
In this talk, I will present an original RNA-Seq read mapper, called PALMapper, that combines a faster extension of the highly accurate alignment method QPALMA with the fast short read aligner GenomeMapper. PALMapper quickly carries out an initial read mapping which then guides a Banded Semi-Global alignment algorithm that allows for long gaps corresponding to introns. It computes both spliced and unspliced alignments at high accuracy by taking advantage of base quality information and computational splice site predictions brought together in an extended alignment scoring model.
-
LNA: Fast Protein Classification Using A Laplacian Characterization of Tertiary Structure
Nicolas Bonnel (Université Bretagne Sud)
Thursday, May 3, 2012 – 10:30
Room Aurigny
Talk abstract:
In the last two decades, a lot of protein 3D shapes have been discovered, characterized and made available thanks to the Protein Data Bank (PDB), which is growing very quickly. New scalable methods are thus urgently required to search through the PDB efficiently. We present in this paper an approach entitled LNA (Laplacian Norm Alignment) that performs structural comparison of two proteins with dynamic programming algorithms. This is achieved by characterizing each residue in the protein with scalar features. The feature values are calculated using a Laplacian operator applied on the graph corresponding to the adjacency matrix of the residues. The weighted Laplacian operator we use estimates, at various scales, local deformations of the topology where each residue is located. On some benchmarks widely shared by the community we obtain qualitatively similar results compared to other competing approaches, but with an algorithm one or two orders of magnitude faster. 180,000 protein comparisons can be done within 1 second with a single recent GPU, which makes our algorithm very scalable and suitable for real-time database querying across the Web.
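As an illustration of the feature-extraction step described in this abstract, the sketch below computes one scalar per residue as the norm of a weighted graph Laplacian applied to the 3D coordinates. This is only one plausible reading of the abstract: the Gaussian weighting, the `sigma` scale parameter and the function name `laplacian_norms` are assumptions, and the dynamic-programming alignment of the resulting feature sequences is not shown.

```python
import math


def laplacian_norms(coords, sigma=4.0):
    """Per-residue scalar feature: norm of the weighted graph Laplacian
    applied to the 3D coordinates, i.e. || sum_j w_ij * (x_i - x_j) ||.

    The weights w_ij = exp(-d_ij^2 / (2 * sigma^2)) are a hypothetical
    choice; varying sigma changes the scale at which shape is probed.
    """
    features = []
    for i, xi in enumerate(coords):
        acc = [0.0, 0.0, 0.0]
        for j, xj in enumerate(coords):
            if i == j:
                continue
            diff = [a - b for a, b in zip(xi, xj)]
            w = math.exp(-sum(d * d for d in diff) / (2.0 * sigma ** 2))
            for axis in range(3):
                acc[axis] += w * diff[axis]
        features.append(math.sqrt(sum(a * a for a in acc)))
    return features


# Toy "backbone": five evenly spaced points on a straight line.
straight = [(4.0 * i, 0.0, 0.0) for i in range(5)]
feats = laplacian_norms(straight)
print([round(f, 3) for f in feats])
```

On this straight toy chain the middle point has symmetric neighbours, so its feature is essentially zero, while the end points get a larger value. Two proteins reduced to such one-dimensional feature sequences can then be compared with a standard sequence-alignment dynamic program.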
-
Human metagenomics: clinical impacts
Nicolas Pons (INRA Jouy en Josas)
Thursday, April 26, 2012 – 10:30
Room Aurigny
Talk abstract:
Human metagenomics consists in characterizing the associations between microbial species and genes and human phenotypes, in order to develop diagnostic and prognostic tools and approaches for modulating microbial populations, with the aim of optimizing everyone’s health. Metagenomic studies have been made easier in recent years by the development of very-high-throughput sequencing and screening technologies. This seminar will present the four main strands of metagenomics: functional metagenomics, phylogenetic metagenomics, so-called “whole sequencing” metagenomics, and quantitative metagenomics. Particular attention will be given to the last two, with a detailed illustration of the latest results from the MicroObese and MetaHIT projects, which aim in particular to identify associations between microbial populations and obesity.
-