Measuring Controversy in Social Networks Through NLP
Di Giovanni M.; Brambilla M.
2020-01-01
Abstract
Controversial topics on social media are often linked to hate speech, fake-news propagation, and the spread of biased content or misinformation. Detecting controversy in online discussions is a challenging task, but an essential one for curbing these unhealthy behaviours. In this work, we develop a general pipeline to quantify controversy on social media through content analysis, and we test it extensively on Twitter. Our approach can be outlined in four phases: an initial graph-building phase, a community-identification phase based on graph partitioning, an embedding phase that uses language models, and a final controversy-score computation phase. The result is an index that quantifies the intuitive notion of controversy. To verify that our method is general and not dependent on domain, language, geography, or dataset size, we collect, clean, and analyze 30 Twitter datasets on different topics, half controversial and half not, spanning several domains and magnitudes, in six different languages from all over the world. The results confirm that our pipeline correctly quantifies the notion of controversy, reaching a ROC AUC score of 0.996 over the distributions of controversial and non-controversial scores. It outperforms state-of-the-art approaches in both accuracy and computational speed.
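The abstract describes the four-phase pipeline only at a high level, so the sketch below is an illustrative reconstruction rather than the authors' implementation. It uses networkx for graph building and community detection, substitutes TF-IDF vectors for the language-model embeddings, and assumes a simple scoring rule (one minus the cosine similarity of the two community centroids). The function name `controversy_score`, the partitioning method, and the scoring formula are all assumptions for illustration.

```python
# Hypothetical sketch of the four-phase controversy pipeline.
# Library choices (networkx, scikit-learn) and the scoring formula are
# illustrative assumptions, not the paper's actual implementation.
import numpy as np
import networkx as nx
from networkx.algorithms import community
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def controversy_score(edges, tweets_by_user):
    """edges: list of (user_a, user_b) interactions, e.g. retweets;
    tweets_by_user: dict mapping user id -> concatenated tweet text.
    Assumes the conversation graph splits into at least two communities."""
    # Phase 1: graph building from user interactions.
    graph = nx.Graph()
    graph.add_edges_from(edges)

    # Phase 2: community identification via graph partitioning
    # (stand-in: modularity-based communities, keeping the two largest).
    parts = sorted(community.greedy_modularity_communities(graph),
                   key=len, reverse=True)[:2]

    # Phase 3: embedding of each community's content
    # (stand-in: TF-IDF vectors instead of a neural language model).
    docs, labels = [], []
    for side, users in enumerate(parts):
        for user in users:
            if user in tweets_by_user:
                docs.append(tweets_by_user[user])
                labels.append(side)
    vectors = TfidfVectorizer().fit_transform(docs).toarray()
    labels = np.array(labels)

    # Phase 4: controversy score (assumed formulation): one minus the
    # cosine similarity of the two community centroids, so communities
    # that talk about the topic in very different terms score higher.
    centroid_a = vectors[labels == 0].mean(axis=0, keepdims=True)
    centroid_b = vectors[labels == 1].mean(axis=0, keepdims=True)
    return 1.0 - float(cosine_similarity(centroid_a, centroid_b)[0, 0])
```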
| File | Description | Type | Access | Size | Format |
|---|---|---|---|---|---|
| 10.1007@978-3-030-59212-7.pdf | Main article | Publisher's version | Restricted access | 2.47 MB | Adobe PDF |