Fine-Tuning Language Models to Mitigate Gender Bias in Sentence Encoders
Dolci T.
2022-01-01
Abstract
Language models are used for a variety of downstream applications, such as improving web search results or parsing CVs to identify the best candidate for a job position. At the same time, concern is growing around word and sentence embeddings, the representations produced by popular language models, which have been shown to exhibit large amounts of social bias. In this work, by leveraging the possibility of further training state-of-the-art pre-trained embedding models, we propose to mitigate gender bias by fine-tuning sentence encoders on a semantic similarity task built around gender-stereotype sentences and their corresponding gender-swapped anti-stereotypes, in order to enforce similarity between the two categories. We test our intuition on two popular language models, BERT-Base and DistilBERT, and measure the amount of gender bias mitigation using the Sentence Encoder Association Test (SEAT). Our solution shows promising results despite using a small amount of training data, demonstrating that post-processing bias mitigation techniques based on fine-tuning can effectively reduce gender bias in sentence encoders.
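As a rough illustration of the approach summarized in the abstract, the sketch below fine-tunes a sentence encoder so that a stereotype sentence and its gender-swapped counterpart are pushed toward near-identical embeddings. It is a minimal sketch, not the authors' code: the sentence-transformers library, the mean-pooled BERT-Base backbone, the two example sentence pairs, and all hyperparameters are assumptions made here for illustration only.

```python
# Minimal sketch (assumed setup, not the paper's exact pipeline): fine-tune a
# BERT-based sentence encoder so that stereotype / gender-swapped sentence
# pairs are driven toward cosine similarity 1.0.
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

# Wrap BERT-Base with mean pooling to obtain a sentence encoder.
model = SentenceTransformer("bert-base-uncased")

# Hypothetical stereotype / anti-stereotype pairs; label 1.0 asks the encoder
# to treat the two sentences as semantically equivalent.
train_examples = [
    InputExample(texts=["He is a brilliant engineer.",
                        "She is a brilliant engineer."], label=1.0),
    InputExample(texts=["She stayed home to look after the children.",
                        "He stayed home to look after the children."], label=1.0),
]

train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=2)
train_loss = losses.CosineSimilarityLoss(model)  # regression on cosine similarity

# One short fine-tuning pass; a real run would use the full pair dataset.
model.fit(train_objectives=[(train_dataloader, train_loss)],
          epochs=1, warmup_steps=10)
```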
File | Description | Size | Format
---|---|---|---
2022-Fine-Tuning_Language_Models_to_Mitigate_Gender_Bias_in_Sentence_Encoders.pdf | Publisher's version (restricted access) | 151.41 kB | Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.