Speculative data-parallel algorithms for language recognition have been widely experimented for various types of finite-state automata (FA), deterministic (DFA) and nondeterministic (NFA), often derived from regular expressions (RE). Such an algorithm cuts the input string into chunks, independently recognizes each chunk in parallel by means of identical FAs, and at last joins the chunk results and checks the overall consistency. In chunk recognition, it is necessary to speculatively start the FAs in any state, thus causing an overhead that reduces the speedup over a serial algorithm. The existing data-parallel DFA-based recognizers suffer from an excessive number of starting states, and the NFA-based ones suffer from the number of nondeterministic transitions. Our data-parallel algorithm is based on the new FA type called reduced-interface DFA (RI-DFA), which minimizes the speculation overhead without incurring in the penalty of nondeterministic transitions or of impractically enlarged DFA machines. The algorithm is theoretically efficient, because it combines the state-reduction of an NFA with the speed of deterministic transitions, thus improving on both DFA-based and NFA-based existing implementations. The practical applicability of the RI-DFA approach is confirmed by a quantitative comparison of the number of starting states for a large public benchmark of complex FAs. On multi-core computing architectures, the RI-DFA recognizer is considerably faster than the NFA-based one on all benchmarks, while it matches the DFA-based one on some benchmarks and performs much better on some others. The extra time needed to construct RI-DFA vs DFA is moderate and is compatible with a practical use.
Minimizing speculation overhead in a parallel recognizer for regular texts
Luca Breveglieri;Angelo Morzenti;Stefano Crespi Reghizzi
2025-01-01
Abstract
Speculative data-parallel algorithms for language recognition have been widely experimented for various types of finite-state automata (FA), deterministic (DFA) and nondeterministic (NFA), often derived from regular expressions (RE). Such an algorithm cuts the input string into chunks, independently recognizes each chunk in parallel by means of identical FAs, and at last joins the chunk results and checks the overall consistency. In chunk recognition, it is necessary to speculatively start the FAs in any state, thus causing an overhead that reduces the speedup over a serial algorithm. The existing data-parallel DFA-based recognizers suffer from an excessive number of starting states, and the NFA-based ones suffer from the number of nondeterministic transitions. Our data-parallel algorithm is based on the new FA type called reduced-interface DFA (RI-DFA), which minimizes the speculation overhead without incurring in the penalty of nondeterministic transitions or of impractically enlarged DFA machines. The algorithm is theoretically efficient, because it combines the state-reduction of an NFA with the speed of deterministic transitions, thus improving on both DFA-based and NFA-based existing implementations. The practical applicability of the RI-DFA approach is confirmed by a quantitative comparison of the number of starting states for a large public benchmark of complex FAs. On multi-core computing architectures, the RI-DFA recognizer is considerably faster than the NFA-based one on all benchmarks, while it matches the DFA-based one on some benchmarks and performs much better on some others. The extra time needed to construct RI-DFA vs DFA is moderate and is compatible with a practical use.File | Dimensione | Formato | |
---|---|---|---|
main-poster-PPoPP-formato-A1-ver-7.pdf
accesso aperto
Descrizione: poster
:
Altro materiale allegato
Dimensione
2.06 MB
Formato
Adobe PDF
|
2.06 MB | Adobe PDF | Visualizza/Apri |
main-PPoPP-paper.pdf
Accesso riservato
:
Post-Print (DRAFT o Author’s Accepted Manuscript-AAM)
Dimensione
134.72 kB
Formato
Adobe PDF
|
134.72 kB | Adobe PDF | Visualizza/Apri |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.