Knowledge Extraction from Natural Language Processing
Sbattella, Licia; Tedesco, Roberto
2012-01-01
Abstract
This chapter presents a model for knowledge extraction from documents written in natural language. The model relies on a clear distinction between a conceptual level, which models the domain knowledge, and a lexical level, which represents the domain vocabulary. An advanced stochastic model, which combines two well-known approaches in a novel way, stores the mapping between these levels, taking into account the linguistic context of words. This stochastic model is then used to disambiguate the words of documents during the indexing phase. The engine supports simple keyword-based queries as well as natural-language queries. The system can also extend the domain knowledge by means of a production-rule engine. Validation tests indicate that the system extracts concepts with good accuracy, even when the training set is small.