Capturing research literature attitude towards sustainable development goals: an LLM-based topic modeling approach

Invernici, Francesco; Curati, Francesca; Jakimov, Jelena; Samavi, Amirhossein; Bernasconi, Anna
2025-01-01

Abstract

The world is facing a multitude of challenges that hinder the development of human civilization and the well-being of humanity on the planet. The Sustainable Development Goals (SDGs) were formulated by the United Nations in 2015 to address these global challenges by 2030. Natural language processing techniques can help uncover discussions of SDGs within the research literature. We propose a fully automated pipeline that (1) fetches content from the academic literature and prepares datasets dedicated to five groups of SDGs; (2) performs topic modeling, a statistical technique used to identify topics in large collections of textual data; and (3) enables topic exploration through keyword-based search and the extraction of topic frequency time series. For topic modeling, we leverage the BERTopic stack, scaled up to large corpora of textual documents (we find hundreds of topics across hundreds of thousands of documents), and introduce (i) a novel LLM-based embedding computation that represents scientific abstracts in a continuous vector space, and (ii) a hyperparameter optimizer that efficiently finds the best configuration for any new dataset. We additionally visualize the results on interactive dashboards that report the topics’ temporal evolution. Results are made inspectable and explorable, contributing to the interpretability of the topic modeling process. The proposed LLM-based topic modeling pipeline allows users to gain insights into how the attitude toward SDGs within scientific abstracts evolved over the 2006–2023 time span. All the results are reproducible using our system, and the workflow can be generalized to any large corpus of text data at any point in time.
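Step (3) of the pipeline can be illustrated with a minimal Python sketch, assuming the topic model has already assigned one topic to each abstract and described each topic with a list of keywords. The functions `topic_frequency_series` and `search_topics`, and the toy data, are hypothetical illustrations; they do not reproduce the authors' implementation or the BERTopic API. By default, the series divides each topic's yearly document count by the total number of documents published that year, so that growth in overall publication volume is not mistaken for growing interest in a topic.

```python
from collections import Counter, defaultdict

# Hypothetical helpers for illustration only (not the authors' code or the BERTopic API).


def topic_frequency_series(assignments, normalize=True):
    """Build a per-topic time series from (year, topic_id) pairs.

    Returns {topic_id: {year: value}}, where value is the number of
    documents assigned to the topic in that year or, if normalize is True,
    that number divided by the total documents published in that year.
    """
    docs_per_year = Counter(year for year, _ in assignments)
    counts = defaultdict(Counter)
    for year, topic in assignments:
        counts[topic][year] += 1
    return {
        topic: {
            year: (n / docs_per_year[year] if normalize else n)
            for year, n in sorted(by_year.items())
        }
        for topic, by_year in counts.items()
    }


def search_topics(topic_keywords, query):
    """Return the sorted ids of topics whose keywords contain every query term."""
    terms = {term.lower() for term in query.split()}
    return sorted(
        topic
        for topic, words in topic_keywords.items()
        if terms <= {w.lower() for w in words}
    )


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    assignments = [(2006, 0), (2006, 1), (2007, 0), (2007, 0), (2007, 1), (2007, 2)]
    topic_keywords = {
        0: ["renewable", "energy", "solar"],
        1: ["water", "sanitation"],
        2: ["energy", "poverty"],
    }
    series = topic_frequency_series(assignments)
    print(series[1])  # {2006: 0.5, 2007: 0.25}
    print(search_topics(topic_keywords, "energy"))  # [0, 2]
```

Normalizing by yearly volume is a design choice of this sketch; raw counts (`normalize=False`) may be preferable when absolute research output is the quantity of interest.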
2025
Keywords: Topic modeling; Text embeddings; LLM; Sustainable development goals; Textual data analysis; Temporal trends
Files in this item:
s40537-025-01189-4.pdf (open access; type: publisher's version; size: 2.54 MB; format: Adobe PDF)

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1291627
Citations
  • PubMed Central: not available
  • Scopus: 3
  • Web of Science (ISI): 2