Improving Cybersecurity Awareness: Tweet Classification using Multilingual Sentence Embeddings and Contextual Features

Cotov A.;Bono C.;Cappiello C.;Pernici B.
2023-01-01

Abstract

Accounts managed by cybersecurity experts, professionals, and organizations make social media a valuable source for computer security awareness. By regularly capturing and analyzing posts on emerging cyber threats, individuals and organizations can understand potential dangers in a timely manner and implement mitigation strategies effectively. However, retrieving relevant and informative posts from a social network is challenging because a high percentage of posts contain uninformative content. This paper proposes a novel approach based on supervised classifiers for selecting relevant social media posts and categorizing them according to different types of vulnerabilities. To accomplish this task, we designed a pipeline that combines text classifiers in cascade and trained them on manually labelled data. We analyzed various neural network techniques that leverage language-agnostic sentence-level embeddings and past user activity, and validated them in a cross-validation setup. With an accuracy of 87%, our approach offers effective filtering and classification of social media posts, helping cybersecurity professionals stay informed and take appropriate measures.
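The cascade idea in the abstract can be illustrated with a minimal sketch: a first classifier filters out irrelevant posts, and a second one assigns a vulnerability category to the posts that pass. This is not the authors' implementation. The bag-of-words `embed` function is a toy stand-in for the language-agnostic sentence embeddings, the nearest-centroid classifier replaces the neural networks, past user activity is not modelled, and the example posts and the category names `injection` and `memory` are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a sentence embedding: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class CentroidClassifier:
    """Nearest-centroid classifier (assumed here in place of a neural model)."""

    def fit(self, texts, labels):
        self.centroids = {}
        for text, label in zip(texts, labels):
            # Summing count vectors per label gives an unnormalised centroid;
            # cosine similarity is unaffected by the missing normalisation.
            self.centroids.setdefault(label, Counter()).update(embed(text))
        return self

    def predict(self, text):
        vec = embed(text)
        return max(self.centroids, key=lambda lab: cosine(vec, self.centroids[lab]))


class Cascade:
    """Stage 1 filters for relevance; stage 2 categorises relevant posts."""

    def __init__(self):
        self.relevance = CentroidClassifier()
        self.category = CentroidClassifier()

    def fit(self, posts):
        # posts: list of (text, category); category None marks an irrelevant post.
        texts = [t for t, _ in posts]
        flags = ["relevant" if c is not None else "irrelevant" for _, c in posts]
        self.relevance.fit(texts, flags)
        relevant = [(t, c) for t, c in posts if c is not None]
        self.category.fit([t for t, _ in relevant], [c for _, c in relevant])
        return self

    def predict(self, text):
        if self.relevance.predict(text) == "irrelevant":
            return None  # filtered out by the first stage
        return self.category.predict(text)


# Hypothetical, hand-written training posts (not data from the paper).
TRAIN = [
    ("new sql injection vulnerability found in login form", "injection"),
    ("sql injection exploit published for popular cms", "injection"),
    ("buffer overflow vulnerability allows remote code execution", "memory"),
    ("heap overflow exploit crashes the server", "memory"),
    ("great coffee this morning with friends", None),
    ("watching the football match tonight", None),
]

if __name__ == "__main__":
    model = Cascade().fit(TRAIN)
    for post in ["sql injection found in cms plugin", "coffee with friends tonight"]:
        print(post, "->", model.predict(post))
```

Training the second stage only on relevant posts keeps the category classifier from having to model uninformative content, which is the usual motivation for a cascade over a single multi-class model.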
Year: 2023
Published in: Proceedings - 2023 IEEE International Conference on Big Data, BigData 2023
Keywords: machine learning; posts classification; security vulnerabilities; social media analysis

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1261163