TETYS: Towards the Next-Generation Open-Source Web Topic Explorer

Anna Bernasconi; Francesco Invernici; Stefano Ceri
2024-01-01

Abstract

Users who search the web for specialized content typically lack knowledge of the precise topology of the dataset on which the search is performed. TETYS, funded by the European Union as a beneficiary of the Next Generation Internet (NGI) Search Initiative, aims to build the next-generation open-source Web topic explorer. Our architecture inspects large textual corpora and is composed of 1) a pipeline that ingests huge data corpora and extracts highly relevant topics, clustered along orthogonal dimensions; and 2) an interactive dashboard that supports topic visualization as word clouds and exploration of temporal series, with easy-to-drive statistical testing. The first prototype, CORToViz, explores the CORD-19 dataset (research abstracts on COVID-19 and the SARS-CoV-2 virus). TETYS will be applied to many other domains (e.g., climate change and controversial debates on social media).
2024
Proceedings of the Research Projects Exhibition Papers Presented at the 36th International Conference on Advanced Information Systems Engineering (CAiSE 2024)
Keywords: Big Data Analytics; Scientific Literature; Natural Language Processing; Topic Modeling; Time Series
File in this item: paper4.pdf (open access; Publisher's version; Adobe PDF; 1.53 MB)

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1266724