In almost all language processing applications, languages are parsed employing classical algorithms (such as the LR(1) parsers generated by Bison), which are sequential due to their left-to-right state-dependent nature. Although early theoretical studies on parallel parsing algorithms delineated potential speed- ups on abstract parallel machines using a data-parallel approach, practical devel- opments have not materialized, except in recent experiments on ad hoc parsers for large XML files. We describe a general-purpose practical generator (PA- PAGENO) able to produce efficient deterministic parallel parsers, which exhibit significant speedups when parsing large texts on modern multi-core machines, while not penalizing sequential operation. The generated parser relies on the properties of Floyd’s operator precedence grammars, to provide a naturally paral- lel implementation of the parsing process. Parsing of each text portion proceeds in parallel and independently, without communication and synchronization, until all partial parse stacks are recombined into the final result. Since Floyd’s grammars can express most syntaxes with little adaptation, we have performed extensive ex- periments, on both synthetically generated texts and real JSON documents. The effective parallel code portion in the generated parsers exceeds 80% for most of the tested scenarios.
PAPAGENO: A Parallel Parser Generator for Operator Precedence Grammars
BARENGHI, ALESSANDRO;CRESPI REGHIZZI, STEFANO;MANDRIOLI, DINO;PRADELLA, MATTEO
2013-01-01
Abstract
In almost all language processing applications, languages are parsed employing classical algorithms (such as the LR(1) parsers generated by Bison), which are sequential due to their left-to-right state-dependent nature. Although early theoretical studies on parallel parsing algorithms delineated potential speed- ups on abstract parallel machines using a data-parallel approach, practical devel- opments have not materialized, except in recent experiments on ad hoc parsers for large XML files. We describe a general-purpose practical generator (PA- PAGENO) able to produce efficient deterministic parallel parsers, which exhibit significant speedups when parsing large texts on modern multi-core machines, while not penalizing sequential operation. The generated parser relies on the properties of Floyd’s operator precedence grammars, to provide a naturally paral- lel implementation of the parsing process. Parsing of each text portion proceeds in parallel and independently, without communication and synchronization, until all partial parse stacks are recombined into the final result. Since Floyd’s grammars can express most syntaxes with little adaptation, we have performed extensive ex- periments, on both synthetically generated texts and real JSON documents. The effective parallel code portion in the generated parsers exceeds 80% for most of the tested scenarios.File | Dimensione | Formato | |
---|---|---|---|
main.pdf
accesso aperto
:
Post-Print (DRAFT o Author’s Accepted Manuscript-AAM)
Dimensione
219.03 kB
Formato
Adobe PDF
|
219.03 kB | Adobe PDF | Visualizza/Apri |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.