Measuring Controversy in Social Networks Through NLP
Di Giovanni M.; Brambilla M.
2020-01-01
Abstract
Controversial topics on social media are often linked to hate speech, fake-news propagation, and the spread of biased content or misinformation. Detecting controversy in online discussions is a challenging task, but an essential one for curbing these unhealthy behaviours. In this work, we develop a general pipeline to quantify controversy on social media through content analysis, and we test it extensively on Twitter. Our approach can be outlined in four phases: an initial graph-building phase, a community-identification phase based on graph partitioning, an embedding phase that uses language models, and a final controversy-score computation phase. The result is an index that quantifies the intuitive notion of controversy. To verify that our method is general and not dependent on domain, language, geography, or dataset size, we collect, clean, and analyze 30 Twitter datasets on different topics, half controversial and half not, spanning several domains and magnitudes, in six different languages from all over the world. The results confirm that our pipeline correctly quantifies the notion of controversy, reaching a ROC AUC score of 0.996 over the distributions of controversial and non-controversial scores. It outperforms state-of-the-art approaches in both accuracy and computational speed.
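The abstract describes the four-phase pipeline only at a high level, so the sketch below is an illustrative reconstruction rather than the authors' implementation. It uses networkx for graph building and community detection, substitutes TF-IDF vectors for the language-model embeddings, and assumes a simple scoring rule (one minus the cosine similarity of the two community centroids). The function name `controversy_score`, the partitioning method, and the scoring formula are all assumptions for illustration.

```python
# Hypothetical sketch of the four-phase controversy pipeline.
# Library choices (networkx, scikit-learn) and the scoring formula are
# illustrative assumptions, not the paper's actual implementation.
import numpy as np
import networkx as nx
from networkx.algorithms import community
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def controversy_score(edges, tweets_by_user):
    """edges: list of (user_a, user_b) interactions, e.g. retweets;
    tweets_by_user: dict mapping user id -> concatenated tweet text.
    Assumes the conversation graph splits into at least two communities."""
    # Phase 1: graph building from user interactions.
    graph = nx.Graph()
    graph.add_edges_from(edges)

    # Phase 2: community identification via graph partitioning
    # (stand-in: modularity-based communities, keeping the two largest).
    parts = sorted(community.greedy_modularity_communities(graph),
                   key=len, reverse=True)[:2]

    # Phase 3: embedding of each community's content
    # (stand-in: TF-IDF vectors instead of a neural language model).
    docs, labels = [], []
    for side, users in enumerate(parts):
        for user in users:
            if user in tweets_by_user:
                docs.append(tweets_by_user[user])
                labels.append(side)
    vectors = TfidfVectorizer().fit_transform(docs).toarray()
    labels = np.array(labels)

    # Phase 4: controversy score (assumed formulation): one minus the
    # cosine similarity of the two community centroids, so communities
    # that talk about the topic in very different terms score higher.
    centroid_a = vectors[labels == 0].mean(axis=0, keepdims=True)
    centroid_b = vectors[labels == 1].mean(axis=0, keepdims=True)
    return 1.0 - float(cosine_similarity(centroid_a, centroid_b)[0, 0])
```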
| File | Description | Type | Access | Size | Format |
|---|---|---|---|---|---|
| 10.1007@978-3-030-59212-7.pdf | Main article | Publisher's version | Restricted access | 2.47 MB | Adobe PDF |