Content-based Stance Classification of Tweets about the 2020 Italian Constitutional Referendum

Di Giovanni, Marco; Brambilla, Marco

doi:10.18653/v1/2021.socialnlp-1.2

In September 2020 a constitutional referendum was held in Italy. In this work we collect a dataset of 1.2M tweets related to this event, with particular interest in the textual content shared, and we design a hashtag-based semiautomatic approach to label them as Supporters or Against the referendum. We use the labelled dataset to train a transformer-based classifier, pre-trained in an unsupervised fashion on Italian corpora. Our model generalizes well to tweets that cannot be labelled by the hashtag-based approach. We check that no length, lexicon, or sentiment biases are present that could affect the performance of the classifier. Finally, we discuss the discrepancy between the number of tweets expressing each stance, obtained using both the hashtag-based approach and our trained classifier, and the real outcome of the referendum: the referendum was approved by 70% of the voters, while the number of tweets against the referendum is four times greater than the number of tweets supporting it. We conclude that the 2020 Italian constitutional referendum was an example of an event where the minority was very loud on social media, strongly influencing the perception of the event. Based on our findings, we suggest that conclusions drawn from social media analysis alone should be treated with care, since they can lead to extremely wrong forecasts.
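As an illustration of the hashtag-based labelling step mentioned in the abstract, the following is a minimal sketch and not the authors' implementation. The seed hashtags, the function name `label_tweet`, and the rule that tweets with hashtags from both sides (or from neither) are left unlabelled are all assumptions made here for illustration; the abstract does not specify them.

```python
import re

# Hypothetical seed hashtags, chosen only for illustration.
# The abstract does not list the hashtags actually used.
SUPPORT_TAGS = {"#iovotosi"}
AGAINST_TAGS = {"#iovotono"}

HASHTAG_RE = re.compile(r"#\w+")


def label_tweet(text):
    """Return 'Supporter', 'Against', or None if the tweet cannot be labelled."""
    # Lowercase the hashtags so matching is case-insensitive.
    tags = {tag.lower() for tag in HASHTAG_RE.findall(text)}
    has_support = bool(tags & SUPPORT_TAGS)
    has_against = bool(tags & AGAINST_TAGS)
    if has_support and not has_against:
        return "Supporter"
    if has_against and not has_support:
        return "Against"
    # Ambiguous or hashtag-free tweets stay unlabelled (assumed rule);
    # these are the tweets a trained classifier would have to handle.
    return None
```

Tweets for which such a rule returns `None` correspond to the ones that, according to the abstract, cannot be labelled by the hashtag-based approach and are instead handled by the trained classifier.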