Improving Cybersecurity Awareness: Tweet Classification using Multilingual Sentence Embeddings and Contextual Features
Cotov A.;Bono C.;Cappiello C.;Pernici B.
2023-01-01
Abstract
The presence of accounts managed by cybersecurity experts, professionals, and organizations makes social media a valuable source for computer security awareness. By regularly capturing and analyzing posts on emerging cyber threats, individuals and organizations can understand potential dangers in a timely manner and implement mitigation strategies effectively. However, retrieving relevant and informative posts from a social network is challenging because a high percentage of posts contain uninformative content. This paper proposes a novel approach based on supervised classifiers for selecting relevant social media posts and categorizing them according to different types of vulnerabilities. To accomplish this task, we designed a pipeline that combines text classifiers in cascade and trained them on manually labelled data. We analyzed various neural network techniques, leveraging language-agnostic sentence-level embeddings and past user activity, and validated these techniques in a cross-validation setup. With an accuracy of 87%, our approach offers effective filtering and classification of social media posts, empowering cybersecurity professionals to stay informed and take appropriate measures.
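The cascade described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the bag-of-words `embed` function is a toy stand-in for language-agnostic sentence embeddings, `CentroidClassifier` is a placeholder for the neural classifiers, the past-user-activity features are omitted, and the training posts and category names are hypothetical. It shows only the control flow: a relevance filter first, then a vulnerability-type classifier applied to the posts that pass.

```python
import math
from collections import Counter, defaultdict


def embed(text):
    """Toy stand-in for a sentence embedding: L2-normalised word counts."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(v * v for v in counts.values())) or 1.0
    return {tok: v / norm for tok, v in counts.items()}


def dot(a, b):
    """Dot product of two sparse vectors stored as dicts."""
    return sum(w * b.get(tok, 0.0) for tok, w in a.items())


class CentroidClassifier:
    """Nearest-centroid classifier; a placeholder for a neural model."""

    def fit(self, texts, labels):
        sums = defaultdict(Counter)
        for text, label in zip(texts, labels):
            sums[label].update(embed(text))  # accumulate per-class vectors
        self.centroids = {}
        for label, vec in sums.items():
            norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
            self.centroids[label] = {t: v / norm for t, v in vec.items()}
        return self

    def predict(self, text):
        query = embed(text)
        return max(self.centroids, key=lambda c: dot(query, self.centroids[c]))


def classify_post(text, relevance_clf, category_clf):
    """Cascade: stage 1 filters relevance, stage 2 assigns a category."""
    if relevance_clf.predict(text) != "relevant":
        return None  # uninformative posts never reach the second stage
    return category_clf.predict(text)


# Hypothetical labelled posts, invented for illustration only.
relevant_posts = [
    ("new ransomware campaign encrypts hospital files", "ransomware"),
    ("critical sql injection vulnerability patched in web framework", "injection"),
    ("phishing emails steal bank credentials", "phishing"),
]
irrelevant_posts = [
    "great coffee with friends this morning",
    "watching the football match tonight",
    "happy birthday to my best friend",
]

relevance_clf = CentroidClassifier().fit(
    [t for t, _ in relevant_posts] + irrelevant_posts,
    ["relevant"] * len(relevant_posts) + ["irrelevant"] * len(irrelevant_posts),
)
# The second stage is trained only on relevant posts.
category_clf = CentroidClassifier().fit(
    [t for t, _ in relevant_posts], [c for _, c in relevant_posts]
)

print(classify_post("ransomware encrypts company files", relevance_clf, category_clf))
print(classify_post("coffee and football tonight with friends", relevance_clf, category_clf))
```

Training the second stage only on relevant posts keeps the category classifier focused on distinguishing vulnerability types, while the first stage absorbs the noise of uninformative content. In a realistic setup, `embed` would be replaced by a multilingual sentence encoder and each stage would be evaluated with cross-validation.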