RE.PUBLIC@POLIMI pubblicazioni di ricerca del Politecnico di Milano

The Tomita’s Generalized LR(1) parsing algorithm (GLR), later improved in many ways, runs in a linear time on LR(1) grammars and degrades to a polynomial-time bound if the grammar is not deterministic. We address a useful feature not present in the current GLR(1) methods: the ability to accept grammars of the Extended BNF type (EBNF), the rules of which contain regular expressions. An EBNF grammar is conveniently represented by a collection of finite automata called a Transition Net (TN). We define, analyze and evaluate a new GLR(1) algorithm, called GELR, that combines the recent LR(1) parsing algorithm for TNs with the classical GLR data structures: the Graph-Structured Stack representing multiple stacks, and the Shared Packed Parse Forest for multiple syntax trees. The GELR algorithm is proved correct and an efficient implementation incorporating the state-of-the-art Right-Nulled parsing optimization is available. Experimental measures of the GELR parser size, speed and memory footprint are reported for current programming and web languages, and are compared with those of other parsing algorithms. The findings prove that directly parsing EBNF grammars does not penalize speed. Performance comparisons for different computer languages should also be of interest.

Fast GLR Parsers for Extended BNF Grammars and Transition Networks

ANGELO BORSOTTI;LUCA BREVEGLIERI;STEFANO CRESPI REGHIZZI;ANGELO MORZENTI

2021-01-01

Abstract

The Tomita’s Generalized LR(1) parsing algorithm (GLR), later improved in many ways, runs in a linear time on LR(1) grammars and degrades to a polynomial-time bound if the grammar is not deterministic. We address a useful feature not present in the current GLR(1) methods: the ability to accept grammars of the Extended BNF type (EBNF), the rules of which contain regular expressions. An EBNF grammar is conveniently represented by a collection of finite automata called a Transition Net (TN). We define, analyze and evaluate a new GLR(1) algorithm, called GELR, that combines the recent LR(1) parsing algorithm for TNs with the classical GLR data structures: the Graph-Structured Stack representing multiple stacks, and the Shared Packed Parse Forest for multiple syntax trees. The GELR algorithm is proved correct and an efficient implementation incorporating the state-of-the-art Right-Nulled parsing optimization is available. Experimental measures of the GELR parser size, speed and memory footprint are reported for current programming and web languages, and are compared with those of other parsing algorithms. The findings prove that directly parsing EBNF grammars does not penalize speed. Performance comparisons for different computer languages should also be of interest.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno di pubblicazione
	
				2021
			
	Titolo della rivista
	
				JOURNAL OF COMPUTER LANGUAGES
			
	Parole chiave
	
				GLR parsing
regular right-hand side grammar
parser performance comparison
graph-structured stack
GSS
shared packed parse forest
SPPF
infinite ambiguity
			
	Appare nelle tipologie:
	
				01.1 Articolo in Rivista

File in questo prodotto:

File	Dimensione	Formato
1-s2.0-S2590118421000149-main (1).pdf Accesso riservato : Publisher’s version Dimensione 1.33 MB Formato Adobe PDF Visualizza/Apri	1.33 MB	Adobe PDF	Visualizza/Apri
main - final.pdf accesso aperto : Post-Print (DRAFT o Author’s Accepted Manuscript-AAM) Dimensione 398.42 kB Formato Adobe PDF Visualizza/Apri	398.42 kB	Adobe PDF	Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11311/1208976

Citazioni

ND

7

6

social impact