Multi-entry DFA with reduced initial states to speedup parallel recognition

Luca Breveglieri; Stefano Crespi Reghizzi; Angelo Morzenti
2025-01-01

Abstract

Parallel algorithms for regular language recognition have been widely experimented with for various types of finite automata (FA), deterministic (DFA) and nondeterministic (NFA), often derived from regular expressions (RE). Such algorithms cut the input string into finitely many sections, or chunks, recognize each one independently and in parallel by means of identical FAs, and finally join the results to check overall consistency. Each FA, except the first, must speculatively start a run from every state, treating it as a possible initial state, though most runs abort or are rejected when the computations of adjacent chunks are joined. The ensuing overhead may nullify the gain over a serial algorithm. Existing parallel DFA-based recognizers suffer from the excessive number of starting states, while NFA-based ones suffer from the number of nondeterministic transitions. Our algorithm is based on a new type of multi-entry DFA, called reduced interface DFA (RI-DFA), which can reduce speculation by cutting the number of initial states down to the size of an equivalent NFA. The algorithm thus combines the smaller size of an NFA with the speed of deterministic transitions. We have also developed a new initial-state reduction method for RI-DFAs. A quantitative comparison of the number of starting states of RI-DFA vs. DFA on a large public benchmark of complex FAs confirms the effectiveness of our approach. Speed measurements of text recognition on a multi-core computer show that the RI-DFA recognizer matches the DFA-based one on some benchmarks and is much faster on others. The moderate extra time needed to convert an NFA into an RI-DFA does not hinder practical use.
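For illustration only, the following Python sketch shows the generic speculative chunk-parallel recognition scheme that the abstract describes and that the RI-DFA is designed to improve: every chunk after the first is run from all DFA states, and the per-chunk state maps are composed in a join phase. All names here (DFA, run_chunk, recognize_parallel) are hypothetical and not taken from the paper; the RI-DFA construction itself is not reproduced.

```python
# Minimal sketch of speculative chunk-parallel DFA recognition (illustrative only).
from concurrent.futures import ThreadPoolExecutor

class DFA:
    def __init__(self, states, delta, initial, finals):
        self.states = states    # iterable of states
        self.delta = delta      # dict: (state, symbol) -> state
        self.initial = initial  # single initial state
        self.finals = finals    # set of accepting states

def run_chunk(dfa, chunk, start_states):
    """Run the DFA on one chunk from every candidate start state.
    Returns a map start_state -> end_state; blocked runs are dropped."""
    result = {}
    for q0 in start_states:
        q = q0
        for a in chunk:
            q = dfa.delta.get((q, a))
            if q is None:       # run aborts on an undefined transition
                break
        if q is not None:
            result[q0] = q
    return result

def recognize_parallel(dfa, text, n_chunks):
    """Split text into chunks, process them in parallel, then join.
    The first chunk starts only from dfa.initial; the others speculatively
    start from every state (this is the overhead the paper aims to reduce)."""
    if not text:
        return dfa.initial in dfa.finals
    size = max(1, len(text) // n_chunks)
    chunks = [text[i:i + size] for i in range(0, len(text), size)]
    starts = [{dfa.initial}] + [set(dfa.states)] * (len(chunks) - 1)
    with ThreadPoolExecutor() as pool:
        maps = list(pool.map(lambda args: run_chunk(dfa, *args),
                             zip(chunks, starts)))
    # Join phase: compose the per-chunk maps left to right.
    q = maps[0].get(dfa.initial)
    for m in maps[1:]:
        if q is None:
            return False
        q = m.get(q)
    return q in dfa.finals

# Tiny usage example: binary strings with an even number of 1s.
even_ones = DFA(states={0, 1},
                delta={(0, '0'): 0, (0, '1'): 1, (1, '0'): 1, (1, '1'): 0},
                initial=0, finals={0})
print(recognize_parallel(even_ones, "0110101101", n_chunks=4))  # True
```

In this naive scheme the number of speculative start states per chunk equals the DFA size; according to the abstract, the paper's contribution is to shrink that set down to the size of an equivalent NFA while keeping deterministic transitions.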
Year: 2025
Published in: Implementation and Application of Automata
ISBN: 978-3-032-02601-9; 978-3-032-02602-6
Keywords: regular language, parallel recognition, finite automata, multi-entry DFA, speculation overhead, multi-core computer architecture
Files in this product:
main-CIAA25-final.pdf: Post-Print (DRAFT or Author's Accepted Manuscript, AAM), Adobe PDF, 257.5 kB, restricted access
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1297506
Citations
  • Scopus: 0