Decisions based on algorithms and systems generated from data have become essential tools that pervade all aspects of our daily lives; for these advances to be reliable, the results should be accurate but should also respect all the facets of data equity [11]. In this context, the concepts of Fairness and Diversity have become relevant topics of discussion within the field of Data Science Ethics and, in general, in Data Science. Although data equity is desirable, reconciling this property with accurate decision-making is a critical tradeoff, because applying a repair procedure to restore equity might modify the original data in such a way that the final decision is inaccurate w.r.t. the ultimate objective of the analysis. In this work, we propose E-FAIR-DB, a novel solution that, exploiting the notion of Functional Dependency - a type of data constraint - aims at restoring data equity by discovering and solving discrimination in datasets. The proposed solution is implemented as a pipeline that, first, mines functional dependencies to detect and evaluate fairness and diversity in the input dataset, and then, based on these understandings and on the objective of the data analysis, mitigates data bias, minimizing the number of modifications. Our tool can identify, through the mined dependencies, the attributes of the database that encompass discrimination (e.g., gender, ethnicity, or religion); then, based on these dependencies, it determines the smallest amount of data that must be added and/or removed to mitigate such bias. We evaluate our proposal both through theoretical considerations and experiments on two real-world datasets.

E-FAIR-DB: Functional Dependencies to Discover Data Bias and Enhance Data Equity

Azzalini F.;Criscuolo C.;Tanca L.
2022-01-01

Abstract

Decisions based on algorithms and systems generated from data have become essential tools that pervade all aspects of our daily lives; for these advances to be reliable, the results should be accurate but should also respect all the facets of data equity [11]. In this context, the concepts of Fairness and Diversity have become relevant topics of discussion within the field of Data Science Ethics and, in general, in Data Science. Although data equity is desirable, reconciling this property with accurate decision-making is a critical tradeoff, because applying a repair procedure to restore equity might modify the original data in such a way that the final decision is inaccurate w.r.t. the ultimate objective of the analysis. In this work, we propose E-FAIR-DB, a novel solution that, exploiting the notion of Functional Dependency - a type of data constraint - aims at restoring data equity by discovering and solving discrimination in datasets. The proposed solution is implemented as a pipeline that, first, mines functional dependencies to detect and evaluate fairness and diversity in the input dataset, and then, based on these understandings and on the objective of the data analysis, mitigates data bias, minimizing the number of modifications. Our tool can identify, through the mined dependencies, the attributes of the database that encompass discrimination (e.g., gender, ethnicity, or religion); then, based on these dependencies, it determines the smallest amount of data that must be added and/or removed to mitigate such bias. We evaluate our proposal both through theoretical considerations and experiments on two real-world datasets.
2022
data bias
data equity
data repair
diversity
fairness
Functional dependencies
File in questo prodotto:
File Dimensione Formato  
e-fair-db.pdf

Accesso riservato

: Publisher’s version
Dimensione 1.08 MB
Formato Adobe PDF
1.08 MB Adobe PDF   Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11311/1231833
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 6
  • ???jsp.display-item.citation.isi??? 1
social impact