EFSG: Evolutionary Fooling Sentences Generator

Di Giovanni M.; Brambilla M.
2021-01-01

Abstract

Large pre-trained language representation models (LMs) have recently achieved remarkable success in many NLP tasks. In 2018, BERT, and later its successors (e.g. RoBERTa), obtained state-of-the-art results on classical benchmarks such as GLUE. Adversarial-attack studies have since been published to test their generalization properties and robustness. In this study, we propose the Evolutionary Fooling Sentences Generator (EFSG), a black-box, task-agnostic adversarial attack algorithm designed in an evolutionary fashion to generate false-positive sentences for binary classification tasks. We successfully apply EFSG to single-sentence (CoLA) and sentence-pair (MRPC) classification tasks, on BERT and RoBERTa. The results reveal weak spots in state-of-the-art LMs. To complete the analysis, we perform transferability tests and an ablation study. Finally, adversarial training, used as a data-augmentation defence against EFSG, yields stronger models with no loss of accuracy.
Year: 2021
Published in: Proceedings - 2021 IEEE 15th International Conference on Semantic Computing, ICSC 2021
ISBN: 978-1-7281-8899-7
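The abstract describes EFSG only at a high level: a black-box, evolutionary search for sentences that a binary classifier misclassifies as positive. As a rough illustration only (not the authors' implementation), the sketch below shows a generic evolutionary loop of that kind: candidates are mutated by word substitutions and selected by the positive-class probability returned by an opaque scoring function. All names (`score_fn`, `mutate`, population sizes, the toy classifier) are illustrative assumptions.

```python
import random


def mutate(sentence, vocab, rng):
    # Illustrative mutation: replace one random token with a random vocabulary word.
    tokens = sentence.split()
    tokens[rng.randrange(len(tokens))] = rng.choice(vocab)
    return " ".join(tokens)


def evolutionary_fooling_search(score_fn, seed_sentence, vocab,
                                pop_size=20, generations=50, seed=0):
    """Generic black-box evolutionary search for high-scoring (false-positive) sentences.

    score_fn: callable returning the model's positive-class probability for a sentence.
    The model is only queried, never inspected (black-box setting).
    """
    rng = random.Random(seed)
    population = [seed_sentence] * pop_size
    for _ in range(generations):
        # Mutate every individual, pool parents and offspring, keep the fittest.
        offspring = [mutate(s, vocab, rng) for s in population]
        population = sorted(population + offspring, key=score_fn, reverse=True)[:pop_size]
    return population[0]


if __name__ == "__main__":
    # Toy stand-in for a binary classifier: scores sentences by the fraction of "cue" words.
    cues = {"the", "a", "is"}
    toy_score = lambda s: sum(t in cues for t in s.split()) / max(len(s.split()), 1)
    vocab = ["the", "a", "is", "cat", "runs", "blue", "quickly"]
    best = evolutionary_fooling_search(toy_score, "cat runs quickly blue cat", vocab)
    print(best, toy_score(best))
```

In practice the toy scorer would be replaced by a query to the attacked model (e.g. BERT or RoBERTa fine-tuned on CoLA or MRPC), and the paper's actual mutation and selection operators may differ from this sketch.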
Files in this record:

09364637.pdf
Description: Main article (Publisher's version)
Access: restricted
Size: 448.87 kB
Format: Adobe PDF

11311-1169928_Di Giovanni.pdf
Description: Post-print (draft or Author's Accepted Manuscript, AAM)
Access: open
Size: 459.69 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1169928
Citations
  • PMC: not available
  • Scopus: 2
  • Web of Science: 1