Assembly and phasing of long reads using pangenome graph alignment

Francesco Andreace (Institut Pasteur)

05/06/2025 10:30 - 12:00
Location: Aurigny Room


The talk will discuss a recently developed and still unpublished pangenome-based method. The goal is to assemble and phase long reads (both Nanopore R10 and PacBio HiFi) using the information obtained by aligning them to a pangenome variation graph. The method rests on two intuitions. First, reads traverse different alleles in the bubbles (regions of variation) of the graph, which induces a natural phasing. Second, by recording the alleles each read traverses, it is possible to compute ‘anchor-points’ between reads that can be fed to an assembler such as Shasta.
The method is therefore one component of a prospective pipeline that, after aligning reads to a pangenome, can also produce a good-quality assembly, especially for medically relevant regions.
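
To make the two intuitions concrete, here is a minimal toy sketch in Python. It is not the speaker's implementation, which is unpublished. It assumes each read has already been reduced to a record of which allele it traverses in each bubble. The read names, bubble identifiers, and the functions `compare_reads` and `candidate_anchors` are all hypothetical. Two reads are treated as sharing candidate anchors when they agree on every bubble they both cover, and as belonging to different haplotypes when they disagree.

```python
from itertools import combinations

# Hypothetical toy input: for each read, the allele it traverses in each
# bubble (bubble id -> allele index). Real records would be derived from
# alignments to a pangenome variation graph.
read_alleles = {
    "read1": {"b1": 0, "b2": 1, "b3": 0},
    "read2": {"b2": 1, "b3": 0, "b4": 1},
    "read3": {"b1": 1, "b2": 0, "b3": 1},
}

def compare_reads(a, b):
    """Return (shared, conflicting) bubbles among those both reads cover."""
    common = sorted(a.keys() & b.keys())
    shared = [bub for bub in common if a[bub] == b[bub]]
    conflicting = [bub for bub in common if a[bub] != b[bub]]
    return shared, conflicting

def candidate_anchors(reads):
    """Map each pair of allele-consistent reads to the bubbles they agree on."""
    anchors = {}
    for (n1, a1), (n2, a2) in combinations(sorted(reads.items()), 2):
        shared, conflicting = compare_reads(a1, a2)
        # Assumption of this sketch: a single conflict rules the pair out.
        if shared and not conflicting:
            anchors[(n1, n2)] = shared
    return anchors

anchors = candidate_anchors(read_alleles)
print(anchors)
```

In this toy input, read1 and read2 agree on both bubbles they share, while read3 disagrees with each of them everywhere they overlap. Real reads contain sequencing and alignment errors, so a practical method would need to tolerate some disagreement instead of rejecting a pair on a single conflict.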
This work was started during a visit to Benedict Paten's lab at UCSC, together with Adam Novak, Paolo Carnevali, Shloka Negi and Benedict Paten.

For internal attendees.