In almost all language processing applications, languages are parsed employing classical algorithms (such as the LR(1) parsers generated by Bison), which are sequential due to their left-to-right state-dependent nature. Although early theoretical studies on parallel parsing algorithms delineated potential speed- ups on abstract parallel machines using a data-parallel approach, practical devel- opments have not materialized, except in recent experiments on ad hoc parsers for large XML files. We describe a general-purpose practical generator (PA- PAGENO) able to produce efficient deterministic parallel parsers, which exhibit significant speedups when parsing large texts on modern multi-core machines, while not penalizing sequential operation. The generated parser relies on the properties of Floyd’s operator precedence grammars, to provide a naturally paral- lel implementation of the parsing process. Parsing of each text portion proceeds in parallel and independently, without communication and synchronization, until all partial parse stacks are recombined into the final result. Since Floyd’s grammars can express most syntaxes with little adaptation, we have performed extensive ex- periments, on both synthetically generated texts and real JSON documents. The effective parallel code portion in the generated parsers exceeds 80% for most of the tested scenarios.

PAPAGENO: A Parallel Parser Generator for Operator Precedence Grammars

BARENGHI, ALESSANDRO;CRESPI REGHIZZI, STEFANO;MANDRIOLI, DINO;PRADELLA, MATTEO
2013-01-01

Abstract

In almost all language processing applications, languages are parsed employing classical algorithms (such as the LR(1) parsers generated by Bison), which are sequential due to their left-to-right state-dependent nature. Although early theoretical studies on parallel parsing algorithms delineated potential speed- ups on abstract parallel machines using a data-parallel approach, practical devel- opments have not materialized, except in recent experiments on ad hoc parsers for large XML files. We describe a general-purpose practical generator (PA- PAGENO) able to produce efficient deterministic parallel parsers, which exhibit significant speedups when parsing large texts on modern multi-core machines, while not penalizing sequential operation. The generated parser relies on the properties of Floyd’s operator precedence grammars, to provide a naturally paral- lel implementation of the parsing process. Parsing of each text portion proceeds in parallel and independently, without communication and synchronization, until all partial parse stacks are recombined into the final result. Since Floyd’s grammars can express most syntaxes with little adaptation, we have performed extensive ex- periments, on both synthetically generated texts and real JSON documents. The effective parallel code portion in the generated parsers exceeds 80% for most of the tested scenarios.
2013
Software Language Engineering (SLE 2012)
9783642360886
9783642360893
File in questo prodotto:
File Dimensione Formato  
main.pdf

accesso aperto

: Post-Print (DRAFT o Author’s Accepted Manuscript-AAM)
Dimensione 219.03 kB
Formato Adobe PDF
219.03 kB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11311/709734
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 7
  • ???jsp.display-item.citation.isi??? ND
social impact