TETYS: Towards the Next-Generation Open-Source Web Topic Explorer
Anna Bernasconi;Francesco Invernici;Stefano Ceri
2024-01-01
Abstract
Users who search the web for specialized content typically lack knowledge of the precise topology of the dataset upon which the search is performed. Funded by the European Union, TETYS is a beneficiary of the Next Generation Internet (NGI) Search Initiative; it proposes to build the next-generation open-source Web topic explorer. Our architecture inspects large textual corpora; it is composed of 1) a pipeline that ingests huge data corpora and extracts highly relevant topics, clustered along orthogonal dimensions; and 2) an interactive dashboard that supports topic visualization as word clouds and exploration of time series, with easy-to-use statistical testing. The first prototype, CORToViz, explores the CORD-19 dataset (COVID-19 / SARS-CoV-2 virus research abstracts). TETYS will be used to explore many other domains (e.g., climate change and controversial debates on social media).