The MAG reconstruction problem: a constrained clustering approach

Benjamin Chrucheward (université de Nantes)

08/10/2020 10:30 - 12:00
Location: Aurigny Room

Constraint programming is a paradigm in which a problem is modeled as a set of rules, called constraints, that every solution must satisfy. These constraints can take different forms, such as must-link/cannot-link constraints or optimization criteria. Unlike imperative programming, the goal is to model the problem as faithfully as possible: the user gives no instructions on how the problem should be solved, since finding a solution is the solver's task. In this presentation, I will talk about Metagenome-Assembled Genomes (MAGs), which are individual genomes reconstructed from metagenomic data. In this problem, a metagenome is a pool of nucleotide sequences (contigs) belonging to several genomes. The main step of the reconstruction, binning, aims to group sequences belonging to the same genome into one cluster while keeping sequences from different genomes in separate clusters. In our model, genomes are represented as clusters of points (each point being a sequence) that share similarities in their nucleotide composition and their relative abundance within the metagenomes. These metrics can be represented as distances, so the objective is to minimize intracluster distances and maximize intercluster distances. The problem is implemented in ASP (Answer Set Programming). I will present preliminary results obtained on toy datasets, and the difficulties we encountered with this model.
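
To make the idea of constrained clustering concrete, here is a minimal sketch in Python. It is not the ASP model presented in the talk: a brute-force search stands in for the solver, and only the intracluster part of the objective is minimized. The function name `best_binning`, the two-dimensional (composition, abundance) encoding of each contig, and the toy values are all assumptions made for illustration.

```python
from itertools import product
from math import dist


def best_binning(points, k, must_link=(), cannot_link=()):
    """Try every assignment of points to k clusters (toy sizes only).

    must_link / cannot_link are pairs of point indices (hypothetical format).
    Returns the feasible labelling with the smallest intracluster cost.
    """
    n = len(points)
    best_labels, best_cost = None, None
    for labels in product(range(k), repeat=n):
        # Constraints: discard any assignment that breaks a rule.
        if any(labels[i] != labels[j] for i, j in must_link):
            continue
        if any(labels[i] == labels[j] for i, j in cannot_link):
            continue
        # Objective: sum of pairwise distances inside each cluster.
        cost = sum(
            dist(points[i], points[j])
            for i in range(n)
            for j in range(i + 1, n)
            if labels[i] == labels[j]
        )
        if best_cost is None or cost < best_cost:
            best_labels, best_cost = labels, cost
    return best_labels, best_cost


# Hypothetical toy contigs: (composition score, relative abundance).
contigs = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.1, 1.0)]

labels, cost = best_binning(contigs, k=2)
print(labels, cost)

# A cannot-link constraint forces contigs 0 and 1 into different bins.
print(best_binning(contigs, k=2, cannot_link=[(0, 1)]))
```

As in the declarative setting described above, the caller states only the constraints and the criterion to optimize; how the search proceeds is left to the solving routine. The exhaustive enumeration grows as k to the power of the number of contigs, so it is only usable on toy inputs.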