(archives from old symbiose site)
-
PhD thesis defense of Gaëlle Garet
Gaëlle Garet
Tuesday, December 16, 2014 – 10:00
Room Markov
Talk abstract: TBA
-
PhD thesis defense of Sylvain Prigent
Sylvain Prigent
Friday, November 14, 2014 – 14:00
Room Métivier
Talk abstract: Combinatorial completion for the reconstruction of metabolic networks, and application to the brown alga model Ectocarpus siliculosus.
In this thesis we developed a comprehensive method for building metabolic networks for non-classical biological species about which little information is available. Such a reconstruction classically has three stages: creating a draft metabolic network from a genome, completing the network, and verifying the result. We focused in particular on the hard combinatorial optimization problem posed by the network completion stage, and solved it with a constraint programming paradigm: answer set programming (ASP). Our modifications to a pre-existing method improved both the computation time needed to solve this combinatorial problem and the quality of the modeling.
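The completion stage described above can be illustrated with a small sketch. It is not the ASP encoding used in the thesis: it is a brute-force Python search, under the assumption that a metabolite is producible when it lies in the forward closure ("scope") of some seed metabolites. The function names `scope` and `minimal_completions`, and the toy reactions, are hypothetical.

```python
from itertools import combinations

def scope(reactions, seeds):
    """Return every metabolite producible from `seeds` using `reactions`.

    Each reaction is a (substrates, products) pair of frozensets; a reaction
    fires once all of its substrates are already in the scope.
    """
    reached = set(seeds)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions:
            if substrates <= reached and not products <= reached:
                reached |= products
                changed = True
    return reached

def minimal_completions(draft, database, seeds, targets):
    """Brute-force search for the smallest sets of database reactions
    whose addition to the draft makes every target producible."""
    targets = set(targets)
    names = sorted(database)
    for size in range(len(names) + 1):
        found = []
        for combo in combinations(names, size):
            network = list(draft.values()) + [database[n] for n in combo]
            if targets <= scope(network, seeds):
                found.append(set(combo))
        if found:
            return found  # all completions of minimal size
    return []

# Hypothetical toy data: a draft network and a reference reaction database.
draft = {"r1": (frozenset({"A"}), frozenset({"B"}))}
database = {
    "d1": (frozenset({"B"}), frozenset({"C"})),
    "d2": (frozenset({"C"}), frozenset({"D"})),
    "d3": (frozenset({"B", "X"}), frozenset({"D"})),
    "d4": (frozenset({"A"}), frozenset({"X"})),
}
print(minimal_completions(draft, database, seeds={"A"}, targets={"D"}))
```

The exhaustive enumeration grows exponentially with the size of the database, which is why a declarative solver such as ASP is attractive for real networks.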
This whole metabolic network reconstruction process was applied to the brown alga model Ectocarpus siliculosus, which allowed us to reconstruct the first metabolic network for a brown macroalga. Reconstructing this network improved our understanding of the metabolism of this species and the annotation of its genome.
Reviewers:
– Marie Beurton-Aimar, Associate Professor, Université de Bordeaux 2
– Hubert Charles, Professor, INSA Lyon
– Claudine Médigue, CNRS Research Director, CEA-Génoscope
Jury:
– Alexander Bockmayr, Professor, Freie Universität, Berlin
– Arnaud Martin, Professor, Université de Rennes 1
– Anne Siegel, CNRS Research Director (thesis supervisor)
– Thierry Tonon, Associate Professor, UMPC (thesis co-supervisor)
-
PhD thesis defense of Valentin Wucher
Valentin Wucher (INRA/Irisa)
Monday, November 3, 2014 – 14:00
Room Métivier
Talk abstract: Modeling of a gene network between mRNAs and miRNAs to predict gene functions involved in phenotypic plasticity in the pea aphid
Abstract
This thesis aims to discriminate, at the genomic level, between embryo development towards sexual reproduction and towards asexual reproduction in the pea aphid, Acyrthosiphon pisum. This discrimination relies on building the post-transcriptional regulation network between microRNAs and mRNAs whose expression kinetics differ between the two embryogeneses, and on studying this network’s interaction modules with formal concept analysis. To do so, a three-step strategy was set up. First, an interaction network between the pea aphid’s microRNAs and mRNAs is created. The network is then reduced to the microRNAs and mRNAs that show different kinetics between the two embryogeneses, identified from high-throughput sequencing expression data. Finally, the reduced network is analysed using formal concept analysis. This analysis identified several functions of potential importance, such as oogenesis, transcriptional regulation and the neuroendocrine system. In addition to network analysis, formal concept analysis was used to define a method for repairing a bipartite graph based on its concept topology, and a method for visualising a bipartite graph through its formal concepts.
-
Modeling the dynamics of SBGN-AF signaling networks using normal logic programs.
Adrien Rougny (LRI)
Thursday, October 16, 2014 – 10:30
Room Minquiers
Talk abstract: Many signaling networks are available in the literature or in databases in the form of interaction graphs. To understand the systems underlying these networks and to be able to modify them, mainly for medical purposes, their dynamic behavior must be understood. Many techniques have therefore been developed to model the dynamics of these molecular networks. In particular, the dynamics of these systems can be modeled with Boolean networks. Building such Boolean networks from interaction graphs requires parameterizing the Boolean functions, which is most often done by interpreting experimental results. In this talk we will present a parameterization method that uses no experimental data and relies instead on general biological principles governing the dynamics of signaling networks. In our method, Boolean networks are expressed as normal logic programs, from which their steady states and trajectories are computed.
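To make the notions of steady state and trajectory concrete, here is a minimal Python sketch of a synchronous Boolean network. It enumerates states exhaustively rather than using normal logic programs as in the talk, and the four-node motif `toy_rules` is a hypothetical example, not a network from the presentation.

```python
from itertools import product

def steady_states(rules):
    """Enumerate the fixed points of a synchronous Boolean network.

    `rules` maps each node name to a function that takes the current state
    (a dict of node -> bool) and returns that node's next value.
    """
    nodes = sorted(rules)
    fixed = []
    for values in product([False, True], repeat=len(nodes)):
        state = dict(zip(nodes, values))
        if all(rules[n](state) == state[n] for n in nodes):
            fixed.append(state)
    return fixed

def trajectory(rules, state, steps):
    """Return the list of states visited over `steps` synchronous updates."""
    path = [dict(state)]
    for _ in range(steps):
        # Every node is updated from the same previous state.
        state = {n: rules[n](state) for n in rules}
        path.append(state)
    return path

# Hypothetical motif: a ligand activates a receptor, which activates a
# target unless an inhibitor is present. Inputs keep their current value.
toy_rules = {
    "ligand": lambda s: s["ligand"],
    "inhibitor": lambda s: s["inhibitor"],
    "receptor": lambda s: s["ligand"],
    "target": lambda s: s["receptor"] and not s["inhibitor"],
}

for state in steady_states(toy_rules):
    print(state)
```

Exhaustive enumeration visits 2^n states for n nodes, so it only suits small examples; a logic-programming formulation lets a solver search this space instead.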
-
Modeling dynamics of cell-to-cell variability in TRAIL-induced apoptosis explains fractional killing and predicts reversible resistance
Gregory Batt (INRIA, Rocquencourt)
Thursday, October 9, 2014 – 10:30
Room Minquiers
Talk abstract: TRAIL induces apoptosis selectively in cancer cells and is currently being tested in clinical trials. A mechanistic understanding of TRAIL resistance could help to limit its emergence. Several observations suggest that protein level fluctuations play an important role in TRAIL resistance and its acquisition. However, quantitative, systems-level approaches to investigate their role in cellular decision-making processes are lacking. We propose a generic and principled approach to extend signal transduction models with protein fluctuation models for all proteins in the pathway. The key aspect is to use standard protein fluctuation models for long-lived proteins. We show that its application to TRAIL-induced apoptosis provides a quantitative, mechanistic explanation for previously published but so far unexplained critical observations.
-
Annoyances in metagenomic data analysis and interpretation
Thomas Bruls (Genoscope)
Thursday, September 25, 2014 – 10:30
Room Minquiers
Talk abstract: Sustained developments in sequencing technology have fueled a range of new applications in various fields of the life sciences, among them metagenomics (aka community genomics or environmental genomics), whose promise is to deliver deeper insights into the so-called "unseen majority". Beyond issues common to other data-intensive applications, metagenomics faces difficulties arising from the "in situ" structure of microbial communities, which hinder the generation of accurate assemblies from even moderately complex metagenomes. These limitations have prompted or renewed interest in assembly-free methods for sequence analysis, which are nowadays intensively studied from both statistical and algorithmic points of view. In this talk, we will discuss and illustrate through various real-world examples (and some less real ones) how such methods, including so-called "binning" methods, can help to increase the biological interpretability of metagenome datasets. We will also sketch the layout of a software development project that aims at scalable variable selection for biomarker discovery from large and complex metagenomic datasets. It involves a nested clustering procedure combining two types of features extracted from the sequences: a coverage-related signal (captured using long k-mers) and a composition-related one (captured using shorter k-mers). If time permits, we will mention an intriguing spectral algorithm that has roots in the pre-genomic era, where it was successfully applied in physical mapping efforts, and that could be amenable to solving some sequence assembly problems.
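As an illustration of the composition-related signal mentioned above, the following sketch computes a normalised short k-mer frequency vector for a sequence. It is a generic example and is not taken from the software project described in the talk; the function names `kmer_profile` and `euclidean` are hypothetical, and sequences are assumed to be upper-case DNA.

```python
from collections import Counter
from itertools import product

def kmer_profile(sequence, k):
    """Normalised k-mer frequency vector over the alphabet ACGT.

    K-mers containing any other character (for example N) are ignored.
    The vector follows the lexicographic order of the 4**k possible k-mers.
    """
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]

def euclidean(u, v):
    """Euclidean distance between two profiles of equal length."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

# Hypothetical reads: two AT-rich fragments and one GC-rich fragment.
reads = ["ATATATTAATAT", "TATAATATTATA", "GCGGCCGCGGCG"]
profiles = [kmer_profile(r, 2) for r in reads]
print(euclidean(profiles[0], profiles[1]))  # similar composition
print(euclidean(profiles[0], profiles[2]))  # different composition
```

Sequences with similar composition get nearby profiles, which is the kind of feature a binning procedure can cluster on; a coverage-related signal would instead track how often long k-mers occur across samples.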
-
Swarm: robust and fast clustering method for amplicon-based studies
Frédéric Mahé (Department of Ecology, Technische Universität Kaiserslautern)
Thursday, September 4, 2014 – 10:30
Room Minquiers
Talk abstract: Popular de novo amplicon clustering methods suffer from two fundamental flaws: arbitrary global clustering thresholds, and input-order dependency induced by centroid selection. Swarm was developed to address these issues by first clustering nearly identical amplicons iteratively using a local threshold, and then by using clusters’ internal structure and amplicon abundances to refine its results. This fast, scalable, and input-order independent approach reduces the influence of clustering parameters and produces robust operational taxonomic units, improving the amount of meaningful biological information that can be extracted from amplicon-based studies.
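The first phase described above, iterative clustering with a local threshold, can be sketched as follows. This is a simplified illustration and not the Swarm implementation: it assumes a local threshold of one difference (substitution, insertion or deletion), omits the refinement step based on abundances, and uses hypothetical names (`within_one_edit`, `swarm_like_clusters`) and toy data.

```python
def within_one_edit(a, b):
    """True if a and b differ by at most one substitution, insertion or deletion."""
    if abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a  # make `a` the shorter (or equal-length) sequence
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1
    if len(a) == len(b):
        return a[i + 1:] == b[i + 1:]  # at most one substitution
    return a[i:] == b[i + 1:]          # exactly one insertion in b

def swarm_like_clusters(abundances):
    """Grow clusters by repeatedly linking amplicons one difference apart.

    `abundances` maps each amplicon sequence to its read count. Seeds are
    taken by decreasing abundance (ties broken alphabetically), so the
    result does not depend on the order of the input.
    """
    ordered = sorted(abundances, key=lambda s: (-abundances[s], s))
    unassigned = set(ordered)
    clusters = []
    for seed in ordered:
        if seed not in unassigned:
            continue
        unassigned.remove(seed)
        cluster, frontier = [seed], [seed]
        while frontier:
            current = frontier.pop()
            for s in sorted(x for x in unassigned if within_one_edit(current, x)):
                unassigned.remove(s)
                cluster.append(s)
                frontier.append(s)
        clusters.append(cluster)
    return clusters

# Hypothetical amplicons: ACCA is two differences away from the seed ACGT
# but joins its cluster through the intermediate ACGA.
toy = {"ACGT": 10, "ACGA": 3, "ACCA": 1, "TTTT": 5, "TTTA": 2}
print(swarm_like_clusters(toy))
```

Because links are made between neighbouring amplicons rather than against a single centroid, a cluster can extend beyond any fixed global distance from its seed.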
-
Arthropod Genome Sequencing at the Baylor College of Medicine Human Genome Sequencing Center.
Stephen Richards (Baylor College of Medicine Human Genome Sequencing Center)
Thursday, June 26, 2014 – 10:30
Room Aurigny
Talk abstract: We have long pioneered the sequencing of insect genomes, from Drosophila melanogaster to aphids, beetles and centipedes. As decreasing sequencing costs have allowed, we are expanding our investigations to the arthropod phylum. As a pilot for the insect 5,000 genomes project, we are sequencing 30 arthropod genomes to identify practical issues and solutions for the selection, DNA isolation, sequencing, assembly, annotation, analysis and publication of multiple arthropod genomes. Here we describe examples demonstrating the power of the de novo genome to drive biology, and the successes, problems and lessons learned so far from our pilot project. We also present the automated annotation pipeline used for the project. We hope that this project will inform larger projects in the future.
-
Deciphering respective genome wide roles of bacteria within a community responsible for copper bioleaching metabolic processes: an integrative systems ecology approach
Philippe Bordon (Univ. of Chile)
Thursday, May 22, 2014 – 10:30
Room Aurigny
Talk abstract: Bioleaching is the extraction of metals from ores through the cooperative participation of several extremophile microorganisms. Because of its great industrial interest, many studies have focused on identifying the isolated contributions of single strains to the process. Even though these studies achieved important advances, the functioning of a bioleaching consortium as a whole remains far from understood. From a holistic perspective, this presentation proposes a novel integrative systems ecology approach that aims to give a functional sense to a metagenomic consortium by integrating genomic and metabolic knowledge at genome scale. Using public genome data for five bacterial strains involved in copper bioleaching (Acidiphilium cryptum, Acidithiobacillus ferrooxidans, Acidithiobacillus thiooxidans, Leptospirillum ferriphilum and Sulfobacillus thermosulfidooxidans), we first reconstructed a global integrative metabolic network. Next, using a parsimony assumption, we identified a set of genes, called SGS, that take an active part in metabolic pathways related to bioleaching and are consecutive on their respective genomes, with the added constraint that the associated metabolic reactions are also closely connected within metabolic networks. Finally, SGS analysis showed that no segment is shared by all five bacteria, suggesting that no single organism can carry out copper bioleaching alone. It also pinpoints the combination of bacterial interactions necessary for promoting these pathways, as well as the major hub role of A. cryptum. Overall, the SGS paradigm depicts genomic functional units and their respective roles in maintaining metabolic pathways, information that is crucial to genetically monitor bacterial participation as a whole in environmental processes.
-
Enhancing reuse in scientific workflows
Sarah Cohen-Boulakia (LRI, Université Paris-Sud)
Thursday, May 15, 2014 – 10:45
Room Aurigny
Talk abstract: Scientific workflows have been introduced to enhance the reproducibility, sharing and reuse of in-silico experiments. Their simple programming model appeals to bioinformaticians, who can use them to specify complex data processing pipelines.
In this talk, I will first present the results of a study we performed on workflow (re)use based on a large set of public scientific workflows: while the number of available scientific workflows is increasing along with their popularity, workflows are not (re)used and shared as much as they could be.
I will then present several projects that aim at enhancing workflow reuse, focusing more specifically on the recent DistillFlow project. DistillFlow proposes to reduce the structural complexity of workflows to make them easier for users to understand. The refactoring approach followed in DistillFlow has given very interesting results both on the 1,500 public workflows from myexperiment.org and on the more curated workflow sets from the BioVel project (workflows to analyze biodiversity data).
-
Inferring metabolic pathways in non-model species: from genomics to metabolomics
Gabriel Markov (Tuebingen)
Tuesday, April 15, 2014 – 10:30
Room Aurigny
Talk abstract: Currently, to find out whether a known metabolic pathway is present in a non-model species, bioinformaticians focus on searching for orthologous enzymes in the closest model species. Often, the presence of a few orthologous enzymes is taken as sufficient evidence that the metabolic pathway of interest is conserved, but this shortcut is not always justified. What information does comparative genomics provide about the conservation of metabolic pathways, and why is metabolomics an indispensable complement for the high-throughput study of metabolic diversity in non-model species?
-
Predicting the folding nucleus of globular proteins
Jacques Chomilier (BiBiP, IMPMC, Université Pierre et Marie Curie, Paris)
Thursday, April 10, 2014 – 10:30
Room Aurigny
Talk abstract: Several models describe protein folding, that is, the formation of a compact globule after the peptide chain has been synthesized in the ribosome. Among them, the nucleation-condensation model states that, under thermal agitation, fluctuations of the backbone bring into contact amino acids that are spread along the sequence. These amino acids then form the folding nucleus, and we are interested in predicting them from the sequence by simulating folding in a discrete space with a Monte Carlo technique. We call MIR (Most Interacting Residues) the positions occupied by amino acids engaged in a large number of non-covalent contacts. Their comparison with experimental data will be presented.
-
Formalizing signaling networks in logic
Christine Froideveaux (LRI – INRIA AMIB – Université Paris Sud)
Thursday, March 27, 2014 – 10:30
Room Aurigny
Talk abstract: In the first part of the talk we will present a method based on domain knowledge that builds the topology of molecular networks by exploiting experimental data and general reasoning rules provided by experts. We will show how this method, applied to signaling networks, makes it possible to discover new relations in the FSH network. In the second part, we will introduce a translation of the standard language Systems Biology Graphical Notation Activity Flow (SBGN-AF) into logic programming. We will show how this translation can be used to analyze the dynamics of SBGN-AF networks.
-
Operator-valued kernels for network inference
Florence d’Alché-Buc (Université d’Evry-Val d’Essonne)
Thursday, March 20, 2014 – 10:30
Room Aurigny
Talk abstract: Reverse engineering of gene regulatory networks remains a central challenge in computational systems biology, despite recent advances facilitated by benchmark in-silico challenges that have aided in calibrating the performance of inference methods. A number of approaches using either perturbation (knock-out) or wild-type time series data have appeared in the literature addressing this problem, with the latter employing linear temporal models. Nonlinear dynamical models are particularly appropriate for this inference task given the generation mechanism of the time series data. In this study, we introduce a novel nonlinear autoregressive model based on operator-valued kernels that simultaneously learns the model parameters and the network structure. Like all kernel-based methods, this new model benefits from the regularization framework and great flexibility. The empirical estimate of the model’s Jacobian matrix provides an estimate of the network structure. We propose a new learning method based on boosting. The performance of the proposed algorithm is evaluated on a number of benchmark data sets from the DREAM3 challenge and then on real datasets related to the IRMA and T-cell networks.
-
A framework based on probabilistic context-free grammars and a genetic algorithm for analysis of protein sequences
Witold Dyrka (Inria Bordeaux)
Thursday, February 27, 2014 – 10:30
Room Aurigny
Talk abstract: Hidden Markov Models power many state-of-the-art tools in the field of protein bioinformatics. While excelling in their tasks, these methods of protein analysis do not directly convey information on medium- and long-range residue-residue interactions. Capturing such interactions requires an expressive power of at least context-free grammars. However, application of more powerful grammar formalisms to protein analysis has been surprisingly limited. To address this problem, we have developed a probabilistic grammatical framework for problem-specific protein languages. The core of the model consists of a probabilistic context-free grammar (PCFG), automatically inferred by a genetic algorithm from only a generic set of expert-based rules and positive training sequences represented by physico-chemical properties. We tested the PCFG framework in the context of detection of ligand binding sites [1] and classification of helix–helix contact sites, where it outperformed the state of the art [2]. Recently, we used the model to distinguish between amyloidogenic and non-amyloidogenic protein fragments and achieved good results (AUROC up to 0.80). A significant feature of the PCFG approach is the explanatory power of grammar rules and parse trees, which could provide biologically meaningful information. This is joint work with Jean-Christophe Nebel, Malgorzata Kotulska and Florence Thirion.
[1] Dyrka and Nebel. BMC Bioinformatics 2009, 10:323
[2] Dyrka et al. Algorithms for Molecular Biology 2013, 8:31
-
Beyond N-gram modelling of documents
Matthias Gallé (Xerox Grenoble)
Thursday, February 6, 2014 – 10:30
Room Aurigny
Talk abstract: The traditional way of modeling textual documents for text analytics is the bag-of-words or bag-of-n-grams approach. Despite the good performance of this lossy representation in machine learning applications, it has some well-known shortcomings due to the independence assumption made for each n-gram. We propose an alternative representation based on repeated substrings of unbounded length (infinity-grams). In this talk we will show some applications, show how to overcome some computational challenges, and concentrate on the problem of recovering bigger chunks of text when the only available information is n-grams.
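The contrast between fixed-length n-grams and repeated substrings of unbounded length can be sketched in a few lines of Python. This is a naive illustration under our own assumptions, not the representation or algorithms presented in the talk (efficient approaches typically rely on suffix structures); the function names `bag_of_ngrams` and `repeated_ngrams` are hypothetical.

```python
from collections import Counter

def bag_of_ngrams(tokens, n):
    """Count every run of n consecutive tokens (overlapping runs included)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def repeated_ngrams(tokens, min_count=2):
    """Collect token sequences of any length occurring at least `min_count` times.

    The loop stops at the first length with no repeat: a repeated sequence of
    length n + 1 always contains a repeated sequence of length n.
    """
    repeats = {}
    n = 1
    while True:
        frequent = {g: c for g, c in bag_of_ngrams(tokens, n).items() if c >= min_count}
        if not frequent:
            return repeats
        repeats.update(frequent)
        n += 1

tokens = "the cat sat on the mat the cat ran".split()
print(bag_of_ngrams(tokens, 2)[("the", "cat")])
print(repeated_ngrams(tokens))
```

A bag of n-grams fixes n in advance and discards order beyond that window, whereas the second function lets the data decide how long the retained sequences are.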
-
An analogy between symbolic extensions and the multiscale structure of genomes
Alejandro Maass (University of Chile)
Thursday, January 30, 2014 – 10:30
Room Aurigny
Talk abstract: The genome of a living organism consists of a long string of symbols over a finite alphabet carrying critical information for the organism. This includes its ability to control postnatal growth, homeostasis, adaptation to changes in the surrounding environment, and biochemical responses at the cellular level to various specific regulatory signals. In this sense, a genome represents a symbolic encoding of a highly organized system of information whose functioning may be revealed as a natural multilayer structure in terms of complexity and prominence. In this talk we use the mathematical theory of symbolic extensions to “speculate” on a framework that sheds light on how this multilayer organization is reflected in the symbolic coding of the genome. The distribution of data in an element of a standard symbolic extension of a dynamical system has a specific form: the symbolic sequence is divided into several subsequences (which we call layers) encoding the dynamics on various “scales”. We propose that a similar structure resides within genomes, building our analogy on some of the most recent findings in the field of regulation of genomic DNA functioning.
-
The evolution of tandem repeats in eukaryotic proteomes
Elke Schaper (Institute for Integrative Biology, ETH Zurich)
Thursday, January 23, 2014 – 10:30
Room Aurigny
Talk abstract: Tandem repeats (TRs) are a major element of protein and nucleic acid sequences in all domains of life. High generation-scale duplication and deletion rates have been reported for nucleic acid TR units. However, it is not known whether protein TR units can also be frequently lost or gained, perhaps providing a source of variation for rapid adaptation of protein function, or whether they instead tend to keep conserved TR unit configurations over long evolutionary times. To obtain a systematic picture for protein TRs, we performed a proteome-wide analysis of the mode of evolution of eukaryotic TRs.
In my talk, I’ll walk you through our analysis:
– What are the obstacles to genome-wide TR detection and annotation, and what can be done about them?
– How can orthologous TRs be detected without bias, so as to perform a comparative analysis?
– How did we use TR unit phylogenies to discern the mode of evolution of TRs?
– And finally the surprise: what were the results of our analyses? How fast do eukaryotic protein TRs evolve, and why?
-
Formal Analogy: Analogical Proportion, Analogy and Formal Concept Analysis.
Nelly Barbot, Laurent Miclet (IRISA) and Henri Prade (IRIT)
Thursday, January 16, 2014 – 10:30
Room Aurigny
Talk abstract:
This work aims to define how the notion of analogical proportion
can be applied in a concept lattice obtained from a formal context.
Recall that an analogical proportion reads “a is to b as c is to d”, with four elements of the same nature,
for example “the calf is to the bull what the foal is to the stallion” for four mammals.
The notion of analogical proportion is now well explored in the case of Boolean lattices,
where distributivity yields good properties. For example,
the number of solutions of an analogical equation (find x such that “a is to b as c is to x”)
is always 0 or 1. Moreover, if x exists, it can be computed explicitly.
In a concept lattice there is in general no distributivity, and no analogical proportion
in the sense of the so-called “factorization” definition introduced by Yvon and Stroppa.
We therefore propose a weaker definition (“Weak Analogical Proportion”),
stated as follows:
(a,b,c,d) are in WAP iff
(a join d) = (b join c) and (a meet d) = (b meet c).
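The WAP condition can be checked mechanically on a small formal context. The sketch below is our own illustration, not the authors' code: it represents a concept as an (extent, intent) pair, computes meets and joins with the standard formal concept analysis constructions, and uses a hypothetical context whose attributes (bovine, equine, young, adult) are assumptions chosen to match the calf and bull example.

```python
def common_attributes(context, objects):
    """Attributes shared by every object in `objects` (all attributes if empty)."""
    shared = set().union(*context.values())
    for o in objects:
        shared &= context[o]
    return frozenset(shared)

def common_objects(context, attributes):
    """Objects that have every attribute in `attributes`."""
    return frozenset(o for o, attrs in context.items() if attributes <= attrs)

def concept_of(context, objects):
    """Smallest concept (extent, intent) whose extent contains `objects`."""
    intent = common_attributes(context, objects)
    return (common_objects(context, intent), intent)

def meet(context, c1, c2):
    """Greatest lower bound: intersect the extents, then derive the intent."""
    extent = c1[0] & c2[0]
    return (extent, common_attributes(context, extent))

def join(context, c1, c2):
    """Least upper bound: intersect the intents, then derive the extent."""
    intent = c1[1] & c2[1]
    return (common_objects(context, intent), intent)

def is_wap(context, a, b, c, d):
    """(a, b, c, d) are in WAP iff a join d = b join c and a meet d = b meet c."""
    return (join(context, a, d) == join(context, b, c)
            and meet(context, a, d) == meet(context, b, c))

# Hypothetical formal context: objects mapped to their attributes.
context = {
    "calf": {"bovine", "young"},
    "bull": {"bovine", "adult"},
    "foal": {"equine", "young"},
    "stallion": {"equine", "adult"},
}
calf, bull, foal, stallion = (concept_of(context, {o}) for o in
                              ("calf", "bull", "foal", "stallion"))
print(is_wap(context, calf, bull, foal, stallion))  # calf : bull :: foal : stallion
print(is_wap(context, calf, bull, stallion, foal))  # permuted, no longer a WAP
```

In this context the four object concepts are pairwise incomparable, which corresponds to the case of concepts not linked by any inclusion relation.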
We are particularly interested in the cases where the four concepts are not linked by any inclusion relation.
In that case, we show that there exists a minimal context producing a WAP between concepts.
We are also interested in the notion of analogy, which is also stated
“a is to b as c is to d”, but with only a and c on the one hand, and b and d on the other, being of the same nature,
for example “the fin is to the fish what the wing is to the bird”.
The framework of formal contexts seems to allow this figure of speech to be modeled quite naturally.
The practical goals of this study concern the use of analogical proportion and analogy
in reasoning and learning, within the theory of formal concepts.
-
MACSE, MapNH and HomeoSplitter: 3 tools for the analysis of coding nucleotide sequences (exons, CDS)
Vincent Ranwez (Montpellier SupAgro)
Thursday, January 9, 2014 – 10:30
Room Aurigny
Talk abstract: This seminar will present the methodological foundations and example applications (to mammals and to durum wheat) of 3 tools that we recently developed for the analysis of coding nucleotide sequences. These tools can all be downloaded free of charge: http://bioweb.supagro.inra.fr/homeoSplitter/, http://biopp.univ-montp2.fr/forge/testnh, http://bioweb.supagro.inra.fr/macse/
MACSE (Multiple Alignment of Coding SEquences) [1] is an algorithmic solution specifically designed for the multiple alignment of coding nucleotide sequences. MACSE takes the amino acid translation of these sequences into account when aligning them, while allowing frameshifts and stop codons to appear (Ranwez et al. 2011). The value of MACSE lies in its ability to align sequences containing frameshifts that are real (e.g. pseudogenes) or apparent (sequencing errors). Because they preserve the reading frame, the alignments inferred by MACSE can in particular be used directly for dN/dS or πN/πS studies.
MapNH [2] infers, on the basis of a homogeneous evolutionary model, the nucleotide changes that most likely occurred on each branch of a phylogeny. MapNH thus yields dN/dS estimates as reliable as those of PAML, but much faster.
Using MACSE and MapNH, we conducted a study of CDS evolution in mammalian genomes which indicates that the ancestor of mammals was probably a long-lived species [3].
With high-throughput sequencing tools, CDS assembly and SNP detection have become relatively routine tasks for diploid species. They remain problematic, however, for polyploid species, in particular because of confusion between homeologous loci, which can be erroneously assembled into a single contig. HomeoSplitter [4] uses a maximum likelihood approach to efficiently split such chimeric contigs into two homologous contigs on the basis of their differential expression. We validated HomeoSplitter on real RNAseq data from thirty accessions of durum wheat (Triticum turgidum, a tetraploid containing the A and B genomes, 2n=4x=28). The transcriptomes of the diploid species that donated the elementary genomes, Aegilops speltoides (close to the B genome) and Triticum urartu (close to the A genome), were used as a comparison to validate the method. HomeoSplitter is a practical solution to the problem of mixed homeo-genomes in allo-tetraploid species, and enables more effective SNP detection in these species.
1. MACSE: Multiple Alignment of Coding SEquences accounting for frameshifts and stop codons. Vincent Ranwez, Sébastien Harispe, Frédéric Delsuc, Emmanuel JP Douzery. PLoS ONE (2011) 6(9): e22594.
2. Fast and robust characterization of time-heterogeneous sequence evolutionary processes using substitution mapping. Jonathan Romiguier, Emeric Figuet, Nicolas Galtier, Emmanuel JP Douzery, Bastien Boussau, Julien Y Dutheil, Vincent Ranwez. PLoS ONE (2012) 7(3): e33852.
3. Genomic Evidence for Large, Long-Lived Ancestors to Placental Mammals. J. Romiguier, V. Ranwez, E.J.P. Douzery, N. Galtier. Molecular Biology and Evolution (2013) 30(1):5-13.
4. Disentangling homeologous contigs in allo-tetraploid assembly: application to durum wheat. V Ranwez, Y Holtz, G Sarah, M Ardisson, S Santoni, S Glémin, M Tavaud-Pirra. BMC Bioinformatics 14 (Suppl 15), S15 (RECOMB-CG 2013 special issue).
-