Content-based Stance Classification of Tweets about the 2020 Italian Constitutional Referendum

Di Giovanni, Marco;Brambilla, Marco
2021-01-01

Abstract

In September 2020 a constitutional referendum was held in Italy. In this work we collect a dataset of 1.2M tweets related to this event, with particular interest in the textual content shared, and we design a hashtag-based semi-automatic approach to label them as Supporters or Against the referendum. We use the labelled dataset to train a transformer-based classifier, pre-trained in an unsupervised manner on Italian corpora. Our model generalizes well to tweets that cannot be labelled by the hashtag-based approach. We check that no length, lexicon, or sentiment biases are present that could affect the performance of the classifier. Finally, we discuss the discrepancy between the volumes of tweets expressing each stance, obtained using both the hashtag-based approach and our trained classifier, and the real outcome of the referendum: the referendum was approved by 70% of the voters, while the number of tweets against the referendum is four times greater than the number of tweets supporting it. We conclude that the 2020 Italian constitutional referendum was an example of an event where the minority was very loud on social media, strongly influencing the perception of the event. Based on our findings, we suggest that conclusions drawn from social media analysis alone should be treated with caution, since they can lead to extremely wrong forecasts.
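The hashtag-based labelling step described in the abstract could be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the seed hashtags, the function name `label_tweet`, and the rule of leaving tweets with mixed or no seed hashtags unlabelled are all assumptions made for the example.

```python
import re

# Hypothetical seed hashtags, used only for illustration.
# The lists actually used in the paper are not stated in this record.
SUPPORT_TAGS = {"#iovotosi"}
AGAINST_TAGS = {"#iovotono"}

HASHTAG_RE = re.compile(r"#\w+")


def label_tweet(text):
    """Return "Supporters", "Against", or None if the tweet stays unlabelled."""
    # Hashtags are compared case-insensitively.
    tags = {tag.lower() for tag in HASHTAG_RE.findall(text)}
    has_support = bool(tags & SUPPORT_TAGS)
    has_against = bool(tags & AGAINST_TAGS)
    if has_support and not has_against:
        return "Supporters"
    if has_against and not has_support:
        return "Against"
    # Assumption: tweets with no seed hashtag, or with hashtags from both
    # sides, are left unlabelled and handed to the trained classifier.
    return None
```

Under these assumptions, tweets for which the function returns `None` correspond to those that "cannot be labelled by the hashtag-based approach" and are the ones on which the trained classifier would be applied.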
2021
Proceedings of the Ninth International Workshop on Natural Language Processing for Social Media
Files in this item:
2021.socialnlp-1.2.pdf (open access, publisher's version, Adobe PDF, 454.26 kB)

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1198719