Measuring Controversy in Social Networks Through NLP

Di Giovanni M.; Brambilla M.
2020-01-01

Abstract

Controversial topics on social media are now often linked to hate speech, the propagation of fake news, and the spread of biased information or misinformation. Detecting controversy in online discussions is a challenging task, but it is essential for stopping these unhealthy behaviours. In this work, we develop a general pipeline to quantify controversy on social media through content analysis, and we test it extensively on Twitter. Our approach consists of four phases: an initial graph-building phase, a community identification phase based on graph partitioning, an embedding phase that uses language models, and a final phase that computes the controversy score. The result is an index that quantifies the intuitive notion of controversy. To test that our method is general and does not depend on domain, language, geography, or size, we collect, clean, and analyze 30 Twitter datasets on different topics, half controversial and half not, covering varied domains and magnitudes, in six languages from all over the world. The results confirm that our pipeline correctly quantifies the notion of controversy, reaching a ROC AUC score of 0.996 over the score distributions of controversial and non-controversial datasets. It outperforms state-of-the-art approaches in terms of both accuracy and computational speed.
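The ROC AUC figure quoted in the abstract can be read as the probability that a randomly chosen controversial dataset receives a higher score than a randomly chosen non-controversial one. The following is a minimal sketch of that pairwise reading, not the authors' evaluation code: the function name `roc_auc` and the score values are hypothetical and chosen only for illustration; they are not results from the paper.

```python
from itertools import product


def roc_auc(positive_scores, negative_scores):
    """Return ROC AUC as the fraction of (positive, negative) pairs
    in which the positive item scores higher; ties count as half."""
    if not positive_scores or not negative_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for pos, neg in product(positive_scores, negative_scores):
        if pos > neg:
            wins += 1.0
        elif pos == neg:
            wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical controversy scores (illustrative only, not from the paper).
controversial = [0.82, 0.75, 0.91, 0.68]
non_controversial = [0.31, 0.44, 0.70, 0.25]

auc = roc_auc(controversial, non_controversial)
print(f"ROC AUC = {auc:.4f}")
```

In this toy example only one of the 16 pairs is ranked the wrong way round (0.68 against 0.70), so the value falls just short of 1. An AUC close to 1 means the two score distributions barely overlap, so a single threshold separates them almost perfectly.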
2020
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
978-3-030-59211-0
978-3-030-59212-7
Controversy
NLP
Polarization
Social networks
Files in this item:
File: 10.1007@978-3-030-59212-7.pdf
Access: Restricted access
Description: Main article; Publisher’s version
Size: 2.47 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1169929
Citations
  • Scopus 8