The Tomita’s Generalized LR(1) parsing algorithm (GLR), later improved in many ways, runs in a linear time on LR(1) grammars and degrades to a polynomial-time bound if the grammar is not deterministic. We address a useful feature not present in the current GLR(1) methods: the ability to accept grammars of the Extended BNF type (EBNF), the rules of which contain regular expressions. An EBNF grammar is conveniently represented by a collection of finite automata called a Transition Net (TN). We define, analyze and evaluate a new GLR(1) algorithm, called GELR, that combines the recent LR(1) parsing algorithm for TNs with the classical GLR data structures: the Graph-Structured Stack representing multiple stacks, and the Shared Packed Parse Forest for multiple syntax trees. The GELR algorithm is proved correct and an efficient implementation incorporating the state-of-the-art Right-Nulled parsing optimization is available. Experimental measures of the GELR parser size, speed and memory footprint are reported for current programming and web languages, and are compared with those of other parsing algorithms. The findings prove that directly parsing EBNF grammars does not penalize speed. Performance comparisons for different computer languages should also be of interest.

Fast GLR Parsers for Extended BNF Grammars and Transition Networks

LUCA BREVEGLIERI;STEFANO CRESPI REGHIZZI;ANGELO MORZENTI
2021-01-01

Abstract

The Tomita’s Generalized LR(1) parsing algorithm (GLR), later improved in many ways, runs in a linear time on LR(1) grammars and degrades to a polynomial-time bound if the grammar is not deterministic. We address a useful feature not present in the current GLR(1) methods: the ability to accept grammars of the Extended BNF type (EBNF), the rules of which contain regular expressions. An EBNF grammar is conveniently represented by a collection of finite automata called a Transition Net (TN). We define, analyze and evaluate a new GLR(1) algorithm, called GELR, that combines the recent LR(1) parsing algorithm for TNs with the classical GLR data structures: the Graph-Structured Stack representing multiple stacks, and the Shared Packed Parse Forest for multiple syntax trees. The GELR algorithm is proved correct and an efficient implementation incorporating the state-of-the-art Right-Nulled parsing optimization is available. Experimental measures of the GELR parser size, speed and memory footprint are reported for current programming and web languages, and are compared with those of other parsing algorithms. The findings prove that directly parsing EBNF grammars does not penalize speed. Performance comparisons for different computer languages should also be of interest.
2021
GLR parsing
regular right-hand side grammar
parser performance comparison
graph-structured stack
GSS
shared packed parse forest
SPPF
infinite ambiguity
File in questo prodotto:
File Dimensione Formato  
1-s2.0-S2590118421000149-main (1).pdf

Accesso riservato

: Publisher’s version
Dimensione 1.33 MB
Formato Adobe PDF
1.33 MB Adobe PDF   Visualizza/Apri
main - final.pdf

Accesso riservato

: Post-Print (DRAFT o Author’s Accepted Manuscript-AAM)
Dimensione 398.42 kB
Formato Adobe PDF
398.42 kB Adobe PDF   Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11311/1208976
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 7
  • ???jsp.display-item.citation.isi??? 5
social impact