The Human Pangenome Reference Consortium (HPRC) proved that pangenome graphs represent a population's genetic variability more efficiently and accurately than linear references. Graphs can intrinsically encode variations as alternative paths inside a directed set of sequence nodes connected by edges. Despite their higher complexity, graph-based genome analysis pipelines are gaining significant interest, and the first sequence-to-graph aligners have already shown improvements in semi-global alignment. However, in pangenomics studies, the global alignment of long reads is fundamental for identifying structural variations and haplotype phasing. In this context, the Graph Wavefront Alignment (GWFA) algorithm emerged as the fastest strategy for aligning long reads to genomic graphs. However, the available GWFA implementation does not support alignment backtracking, a crucial feature in real-case studies. In this paper, we propose a new open-source1 implementation of the GWFA algorithm that computes and reports the complete traceback in the standard GAF format. Our work achieves a 20× speedup in execution time compared to the state-of-the-art tool GraphAligner and competitive memory usage.
On the optimization of GWFA algorithm: enabling real-case applications supporting alignment backtracking
Coggi, Mirko;Sgarlata, Antonio;Di Donato, Guido Walter;Santambrogio, Marco Domenico
2024-01-01
Abstract
The Human Pangenome Reference Consortium (HPRC) proved that pangenome graphs represent a population's genetic variability more efficiently and accurately than linear references. Graphs can intrinsically encode variations as alternative paths inside a directed set of sequence nodes connected by edges. Despite their higher complexity, graph-based genome analysis pipelines are gaining significant interest, and the first sequence-to-graph aligners have already shown improvements in semi-global alignment. However, in pangenomics studies, the global alignment of long reads is fundamental for identifying structural variations and haplotype phasing. In this context, the Graph Wavefront Alignment (GWFA) algorithm emerged as the fastest strategy for aligning long reads to genomic graphs. However, the available GWFA implementation does not support alignment backtracking, a crucial feature in real-case studies. In this paper, we propose a new open-source1 implementation of the GWFA algorithm that computes and reports the complete traceback in the standard GAF format. Our work achieves a 20× speedup in execution time compared to the state-of-the-art tool GraphAligner and competitive memory usage.| File | Dimensione | Formato | |
|---|---|---|---|
|
On_the_optimization_of_GWFA_algorithm_enabling_real-case_applications_supporting_alignment_backtracking.pdf
accesso aperto
:
Publisher’s version
Dimensione
304.8 kB
Formato
Adobe PDF
|
304.8 kB | Adobe PDF | Visualizza/Apri |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.


