RE.PUBLIC@POLIMI pubblicazioni di ricerca del Politecnico di Milano

We tackle the problem of classifying news articles pertaining to disinformation vs mainstream news by solely inspecting their diffusion mechanisms on Twitter. This approach is inherently simple compared to existing text-based approaches, as it allows to by-pass the multiple levels of complexity which are found in news content (e.g. grammar, syntax, style). As we employ a multi-layer representation of Twitter diffusion networks where each layer describes one single type of interaction (tweet, retweet, mention, etc.), we quantify the advantage of separating the layers with respect to an aggregated approach and assess the impact of each layer on the classification. Experimental results with two large-scale datasets, corresponding to diffusion cascades of news shared respectively in the United States and Italy, show that a simple Logistic Regression model is able to classify disinformation vs mainstream networks with high accuracy (AUROC up to 94%). We also highlight differences in the sharing patterns of the two news domains which appear to be common in the two countries. We believe that our network-based approach provides useful insights which pave the way to the future development of a system to detect misleading and harmful information spreading on social media.

A multi-layer approach to disinformation detection in US and Italian news spreading on Twitter

Pierri F.;Piccardi C.;Ceri S.

2020-01-01

Abstract

We tackle the problem of classifying news articles pertaining to disinformation vs mainstream news by solely inspecting their diffusion mechanisms on Twitter. This approach is inherently simple compared to existing text-based approaches, as it allows to by-pass the multiple levels of complexity which are found in news content (e.g. grammar, syntax, style). As we employ a multi-layer representation of Twitter diffusion networks where each layer describes one single type of interaction (tweet, retweet, mention, etc.), we quantify the advantage of separating the layers with respect to an aggregated approach and assess the impact of each layer on the classification. Experimental results with two large-scale datasets, corresponding to diffusion cascades of news shared respectively in the United States and Italy, show that a simple Logistic Regression model is able to classify disinformation vs mainstream networks with high accuracy (AUROC up to 94%). We also highlight differences in the sharing patterns of the two news domains which appear to be common in the two countries. We believe that our network-based approach provides useful insights which pave the way to the future development of a system to detect misleading and harmful information spreading on social media.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno di pubblicazione
	
				2020
			
	Titolo della rivista
	
				EPJ DATA SCIENCE
			
	Parole chiave
	
				Computational social science
Disinformation
Multi-layer networks
Twitter
			
	Appare nelle tipologie:
	
				01.1 Articolo in Rivista

File in questo prodotto:

File	Dimensione	Formato
epjds1.pdf accesso aperto Descrizione: Articolo : Publisher’s version Dimensione 1.81 MB Formato Adobe PDF Visualizza/Apri	1.81 MB	Adobe PDF	Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11311/1154006

Citazioni

ND

31

22

social impact