An ever present, common sense idea in language modelling research is that, for a word to be a valid phrase, it should comply with multiple constraints at once. A new language definition model is studied, based on agreement or consensus between similar strings. Considering a regular set of strings over a bipartite alphabet made by pairs of unmarked/marked symbols, a match relation is introduced, in order to specify when such strings agree. Then a regular set over the bipartite alphabet can be interpreted as specifying another language over the unmarked alphabet, called the consensual language. A word is in the consensual languages if a set of corresponding atching strings is in the original language. The family thus defined includes the regular languages and also interesting non-semilinear ones. The word problem can be solved in NLOGSPACE, hence in P time. The emptiness problem is undecidable. Closure properties are proved for intersection with regular sets and inverse alphabetical homomorphism. Several conditions for a consensual definition to yield a regular language are presented, and it is shown that the size of a consensual specification of regular languages can be in a logarithmic ratio with respect to a DFA. The family is incomparable with context-free and tree-adjoining grammar families.

Consensual languages and matching finite-state computations

SAN PIETRO, PIERLUIGI;CRESPI REGHIZZI, STEFANO
2011-01-01

Abstract

An ever present, common sense idea in language modelling research is that, for a word to be a valid phrase, it should comply with multiple constraints at once. A new language definition model is studied, based on agreement or consensus between similar strings. Considering a regular set of strings over a bipartite alphabet made by pairs of unmarked/marked symbols, a match relation is introduced, in order to specify when such strings agree. Then a regular set over the bipartite alphabet can be interpreted as specifying another language over the unmarked alphabet, called the consensual language. A word is in the consensual languages if a set of corresponding atching strings is in the original language. The family thus defined includes the regular languages and also interesting non-semilinear ones. The word problem can be solved in NLOGSPACE, hence in P time. The emptiness problem is undecidable. Closure properties are proved for intersection with regular sets and inverse alphabetical homomorphism. Several conditions for a consensual definition to yield a regular language are presented, and it is shown that the size of a consensual specification of regular languages can be in a logarithmic ratio with respect to a DFA. The family is incomparable with context-free and tree-adjoining grammar families.
File in questo prodotto:
File Dimensione Formato  
crespi-sanpietro.pdf

Accesso riservato

: Pre-Print (o Pre-Refereeing)
Dimensione 195.06 kB
Formato Adobe PDF
195.06 kB Adobe PDF   Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11311/636674
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 7
  • ???jsp.display-item.citation.isi??? ND
social impact