This paper presents the analysis of quality regarding a textual Big Data Analytics approach developed within a Project dealing with a platform for Big Data shared among three companies. In particular, the paper focuses on documental Big Data. In the context of the Project, the work presented here deals with extraction of knowledge from document and process data in a Big Data environment, and focuses on the quality of processed data. Performance indexes, like correctness, precision, and efficiency parameters are used to evaluate the quality of the extraction and classification process. The novelty of the approach is that no document types are predefined but rather, after manual processing of new types, datasets are continuously set up as training sets to be processed by a Machine Learning step that learns the new documents types. The paper presents the document management architecture and discusses the main results.

Quality Evaluation for Documental Big Data Tools

M. Fugini;J. Finocchi
2020

Abstract

This paper presents the analysis of quality regarding a textual Big Data Analytics approach developed within a Project dealing with a platform for Big Data shared among three companies. In particular, the paper focuses on documental Big Data. In the context of the Project, the work presented here deals with extraction of knowledge from document and process data in a Big Data environment, and focuses on the quality of processed data. Performance indexes, like correctness, precision, and efficiency parameters are used to evaluate the quality of the extraction and classification process. The novelty of the approach is that no document types are predefined but rather, after manual processing of new types, datasets are continuously set up as training sets to be processed by a Machine Learning step that learns the new documents types. The paper presents the document management architecture and discusses the main results.
Proceedings of the 22nd International Conference on Enterprise Information Systems - Volume 1: ICEIS,
978-989-758-423-7
Text Analytics, Big Data Analytics, Enterprise Content Management, Document Management, Machine Learning for Document Processing.
File in questo prodotto:
File Dimensione Formato  
ICEIS_2020_94.pdf

Accesso riservato

: Publisher’s version
Dimensione 646.09 kB
Formato Adobe PDF
646.09 kB Adobe PDF   Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11311/1141038
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus ND
  • ???jsp.display-item.citation.isi??? 0
social impact