Hierarchical Transformers for User Semantic Similarity

Di Giovanni, Marco;Brambilla, Marco
2023-01-01

Abstract

The investigation of users’ behaviour on Web and Social Media platforms usually requires analyzing many heterogeneous features, such as shared textual content, social connections, demographic traits, and temporal attributes. This work aims to compute accurate user similarities on Twitter using only the textual content shared by users, a feature known to be easy and quick to collect. We design and train a two-stage hierarchical Transformer-based model, whose first stage independently processes individual tweets and whose second stage combines the tweet embeddings to obtain user-level representations. To evaluate our model, we design a ranking task involving many accounts, automatically collected and labeled without the need for human annotators. We extensively investigate hyper-parameters to obtain the best model configuration. Finally, we check whether the obtained embeddings reflect our idea of similarity by testing them on further tasks, including community visualization, outlier detection, and polarization quantification.
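
The two-stage data flow described in the abstract can be illustrated with a small sketch. It is not the authors' model: both stages are replaced by deliberately simple stand-ins (token counts instead of a tweet-level Transformer, mean pooling instead of a learned user-level stage), and all function names, the toy accounts, and the use of cosine similarity for ranking are assumptions made for illustration only.

```python
import math
from collections import Counter


def embed_tweet(text):
    """Stage 1 stand-in: sparse token-count vector for a single tweet.

    The paper uses a Transformer here; this toy encoder only mimics the interface.
    """
    return Counter(text.lower().split())


def embed_user(tweets):
    """Stage 2 stand-in: average the tweet vectors into one user-level vector.

    Mean pooling is an assumption; the paper trains a second Transformer stage.
    """
    total = Counter()
    for tweet in tweets:
        total.update(embed_tweet(tweet))
    n = max(len(tweets), 1)  # avoid division by zero for accounts with no tweets
    return {token: count / n for token, count in total.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * b.get(token, 0.0) for token, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_users(query_tweets, candidates):
    """Return candidate names sorted by decreasing similarity to the query user."""
    query = embed_user(query_tweets)
    scores = {name: cosine(query, embed_user(tweets))
              for name, tweets in candidates.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy accounts, invented for this example.
users = {
    "alice": ["new transformer model released", "training language model today"],
    "bob": ["great football match tonight", "football season starts"],
}
ranking = rank_users(["transformer language model paper"], users)
print(ranking)
```

In this sketch the query account shares vocabulary only with the first toy account, so that account is ranked first. Replacing the two stand-in functions with learned encoders would keep the same overall structure: tweets are embedded independently, then combined into one vector per user before similarities are computed.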
2023
Web Engineering. ICWE 2023
9783031344435
9783031344442


Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1261619