A reference-free solution to the viral quasispecies assembly problem

Jasmijn Baaijens (Harvard Medical School)

18/02/2021 14:30 - 16:00

The viral quasispecies assembly problem is to assemble fragments originating from the RNA of the ensemble of virus strains populating an individual patient into full-length, strain-resolved genomes. The challenge is that RNA viruses have high mutation rates, so strain-specific genomes often diverge substantially from existing reference genomes. We developed a collection of tools that together provide a reference-free solution to this problem. We first assemble strain-specific contigs using overlap graphs; we then construct variation graphs from these contigs and define a flow-like optimization problem to build full-length, strain-specific genomes, along with estimates of their relative abundances. Benchmarking experiments show that our workflow outperforms state-of-the-art approaches on mixed samples of viral genomes in terms of both assembly accuracy and abundance estimation. Experiments on longer, bacterial-sized genomes point to future applications in bacterial genomics.
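
To make the notion of an overlap graph concrete, here is a minimal Python sketch. It is a toy illustration only, not the tools presented in the talk: it assumes error-free reads and exact suffix-prefix matches, compares all pairs of reads, and ignores reverse complements. The names `longest_overlap`, `build_overlap_graph`, and `min_len` are hypothetical and chosen for this example.

```python
from collections import defaultdict


def longest_overlap(a: str, b: str, min_len: int) -> int:
    """Return the length of the longest suffix of `a` that equals a prefix
    of `b`, or 0 if no such overlap has at least `min_len` characters.

    Assumption: exact matching only; a practical assembler would tolerate
    sequencing errors.
    """
    max_len = min(len(a), len(b))
    # Try the longest candidate first so the first hit is the maximum.
    for k in range(max_len, min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def build_overlap_graph(reads, min_len=3):
    """Build a directed overlap graph as {read index: {read index: overlap}}.

    An edge i -> j means a suffix of reads[i] matches a prefix of reads[j].
    This all-pairs comparison is quadratic and only suitable for toy input.
    """
    graph = defaultdict(dict)
    for i, a in enumerate(reads):
        for j, b in enumerate(reads):
            if i != j:
                k = longest_overlap(a, b, min_len)
                if k:
                    graph[i][j] = k
    return dict(graph)


# Three toy reads that tile a single short sequence.
reads = ["ACGTAC", "GTACGG", "ACGGTT"]
graph = build_overlap_graph(reads, min_len=3)
print(graph)  # {0: {1: 4}, 1: {2: 4}}
```

Following the edges 0 -> 1 -> 2 and merging each pair of reads along its overlap reconstructs the sequence the reads tile. In a mixed sample, reads from different strains disagree at variant positions, so with exact matching they do not overlap across those positions, which is the intuition behind using overlap graphs for strain-specific contigs.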