A Novel Methodology for a Comprehensive Analysis of Genomic Sequence-to-Graph Alignment Tools
Coggi, Mirko; Di Donato, Guido Walter; Santambrogio, Marco Domenico
2025-01-01
Abstract
Genome graphs have proved to be a more compact and efficient way of representing inter- and intra-individual genetic variability. Although they outperform traditional sequence-based reference genomes in many use cases, analyzing genome graphs introduces new computational challenges. The workhorse of graph-based genome analysis is sequence-to-graph alignment, which consists of finding the path in the graph that best represents a query sequence. This search is highly computationally intensive, and different solutions have been proposed to perform it efficiently, either by adapting sequence-to-sequence strategies or by exploiting novel graph-specific algorithms. However, comparing sequence-to-graph alignment tools is challenging because of the complexity and relative novelty of the task and the resulting lack of standardization. We therefore propose a methodology for a comprehensive and structured comparison of such tools. First, we define a set of key performance indicators (KPIs) for the qualitative analysis of an aligner's usability, accuracy, and performance. Then, we introduce the first open-source benchmark suite for the quantitative analysis of multiple sequence-to-graph aligners. We test the proposed methodology on state-of-the-art tools, showing that it readily provides valuable insights into the compared aligners. Finally, we conclude the paper by drawing some guidelines to drive the improvement of this promising research field.


