RE.PUBLIC@POLIMI pubblicazioni di ricerca del Politecnico di Milano

We describe a new tool, named REgen, that generates regular expressions (RE) to be used as test cases, and that generates also synthetic benchmarks for exercising and measuring the performance of RE-based software libraries and applications. Each group of REs is randomly generated and satisfies a user-specified set of constraints, such as length, nesting depth, operator arity, repetition depth, and syntax tree balancing. In addition to such parameters, other features are chosen by the tool. An RE group may include REs that are ambiguous, or that define the same regular language but differ with respect to their syntactic structure. A benchmark is a collection of RE groups that have a user-specified numerosity and distribution, together with a representative sample of texts for each RE in the collection. We present two generation algorithms for RE groups and for benchmarks. Experimental results are reported for a large benchmark we used to compare the performance of different RE parsing algorithms. The tool REgen and the RE benchmark are publicly available and fill a gap in supporting tools for the development and evaluation of RE applications.

A Benchmark Production Tool for Regular Expressions

Borsotti A.;Breveglieri L.;CrespiReghizzi S.;Morzenti A.

2019-01-01

Abstract

We describe a new tool, named REgen, that generates regular expressions (RE) to be used as test cases, and that generates also synthetic benchmarks for exercising and measuring the performance of RE-based software libraries and applications. Each group of REs is randomly generated and satisfies a user-specified set of constraints, such as length, nesting depth, operator arity, repetition depth, and syntax tree balancing. In addition to such parameters, other features are chosen by the tool. An RE group may include REs that are ambiguous, or that define the same regular language but differ with respect to their syntactic structure. A benchmark is a collection of RE groups that have a user-specified numerosity and distribution, together with a representative sample of texts for each RE in the collection. We present two generation algorithms for RE groups and for benchmarks. Experimental results are reported for a large benchmark we used to compare the performance of different RE parsing algorithms. The tool REgen and the RE benchmark are publicly available and fill a gap in supporting tools for the development and evaluation of RE applications.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno di pubblicazione
	
				2019
			
	Titolo del libro
	
				Implementation and Application of Automata
			
	Titolo della collana
	
				LECTURE NOTES IN COMPUTER SCIENCE
			
	ISBN (International Standard Book Number)
	
				978-3-030-23678-6
978-3-030-23679-3
			
	Parole chiave
	
				Benchmark for regular expressions; Regular expression generation; Regular expression tool
			
	Appare nelle tipologie:
	
				04.1 Contributo in Atti di convegno

File in questo prodotto:

File	Dimensione	Formato
banchmark_production_tool_for_regexp.pdf Accesso riservato Descrizione: Articolo principale : Publisher’s version Dimensione 530.17 kB Formato Adobe PDF Visualizza/Apri	530.17 kB	Adobe PDF	Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11311/1120729

Citazioni

ND

2

2

social impact