The context-free grammars extended with regular expressions (RE), known as ECF or EBNF grammars, are commonly used as they often allow for terser language definitions. Yet for such grammars the notion of syntax tree, and consequently of ambiguity, lacks an agreed definition. The simplified tree structures returned by the existing parsing algorithms do not faithfully represent all the ways a sentence is derivable by means of the REs present in the grammar rules. We contribute a precise definition of the regular parts in the structures of the EBNF syntax trees, which is aligned with the tree representations that have been adopted by the recent algorithms for RE matching. For an EBNF rule, the finest representation shows all the RE operators as nodes in the sub-tree, while the flat representation simply appends the string generated by the RE to the nonterminal node. A consequent notion of representation-dependent ambiguity follows. The above representations are incorporated into a Tomita-style parsing algorithm, i.e., a GLR(1) parser. To construct such a parser, we follow the positional method of the Berry-Sethi parsers for regular languages. Given an EBNF grammar, our parser-generator produces a parser compliant with the choices expressed by the user about the representation of the syntax trees. We also report the parsing performance, over a few representative sets of languages (benchmarks) for programming and data representation, in comparison with existing parsers for EBNF grammars.
2022 Best Paper Award Runner Up - Journal of Computer Languages
Luca Breveglieri;Stefano Crespi Reghizzi;Angelo Morzenti
2023-01-01
Abstract
The context-free grammars extended with regular expressions (RE), known as ECF or EBNF grammars, are commonly used as they often allow for terser language definitions. Yet for such grammars the notion of syntax tree, and consequently of ambiguity, lacks an agreed definition. The simplified tree structures returned by the existing parsing algorithms do not faithfully represent all the ways a sentence is derivable by means of the REs present in the grammar rules. We contribute a precise definition of the regular parts in the structures of the EBNF syntax trees, which is aligned with the tree representations that have been adopted by the recent algorithms for RE matching. For an EBNF rule, the finest representation shows all the RE operators as nodes in the sub-tree, while the flat representation simply appends the string generated by the RE to the nonterminal node. A consequent notion of representation-dependent ambiguity follows. The above representations are incorporated into a Tomita-style parsing algorithm, i.e., a GLR(1) parser. To construct such a parser, we follow the positional method of the Berry-Sethi parsers for regular languages. Given an EBNF grammar, our parser-generator produces a parser compliant with the choices expressed by the user about the representation of the syntax trees. We also report the parsing performance, over a few representative sets of languages (benchmarks) for programming and data representation, in comparison with existing parsers for EBNF grammars.File | Dimensione | Formato | |
---|---|---|---|
COLA_Best Paper 2022 Runner Up Certificate_Borsotti et al.pdf
accesso aperto
:
Altro materiale allegato
Dimensione
281.36 kB
Formato
Adobe PDF
|
281.36 kB | Adobe PDF | Visualizza/Apri |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.