TETYS: Configurable Topic Modeling Exploration for Big Corpora of Text Documents

Francesco Invernici; Anna Bernasconi; Francesca Curati; Jelena Jakimov; Amirhossein Samavi
2025-01-01

Abstract

Exploring vast text corpora is typically very time-consuming. Topic modeling makes it possible to discover key concepts in massive text datasets without prior knowledge of their content. We built TETYS, an easily configurable end-to-end topic modeling pipeline for processing and visualizing datasets. We demonstrate it on five datasets encompassing research on the Sustainable Development Goals, which define the world’s most pressing social, economic, and environmental challenges. TETYS is based on neural topic modeling and exploits LLMs to be proficient in many domains, including research publications that range from human sciences to engineering and technology. In this demo, participants will be able to interact with the dashboard to discover insights about the datasets and to examine and test temporal trends in their research topics. Tool: http://gmql.eu/tetys. Video: https://tinyurl.com/tetys-video. Code: https://github.com/FrInve/TETYS.
2025
Proceedings of the 28th International Conference on Extending Database Technology (EDBT)
File in this item: paper-323.pdf (publisher’s version, open access, Adobe PDF, 1.4 MB)

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1285219