
Ethical dimensions for data quality

Tanca L.; Torlone R.
2020-01-01

Abstract

Data quality is a typical ethical requirement: we could never trust a piece of information that lacked the usual data quality properties. The converse also holds: data should conform to a high ethical standard to be considered of good quality. Hence, satisfying the ethical requirements is necessary to assert the quality of a dataset, and in this paper we propose to introduce the most common ethical requirements as dimensions of quality, grouped in an Ethics Cluster. The Internet and the worldwide use of IT and computers have generated a plethora of datasets in all kinds of application areas; this data corresponds to useful information only if it is of good quality and, we emphasize, it can benefit society only if its usage conforms to ethical principles. Taking a somewhat more constructive and dynamic viewpoint, we discuss the dimensions of ethics in connection with the phases of what we call the information extraction process [20], that is, the process of (i) identifying the data sources containing the information of interest, (ii) collecting the corresponding data and integrating them to produce a single dataset, and (iii) applying the appropriate information extraction methods (from a simple query up to a complex statistical, machine learning, or data mining analysis). We thus advocate extending the well-established data quality framework in [5] to incorporate ethics explicitly.
Keywords: Data Science Pipeline; Data Quality; Ethics; Fairness
Files in this item:
File: Ethics_JDIQ20-1.pdf
Access: open access
Description: Publisher’s version
Size: 1.13 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1213935
Citations
  • PMC: not available
  • Scopus 21
  • Web of Science (ISI) 5