RE.PUBLIC@POLIMI Research publications of the Politecnico di Milano

Ethical dimensions for data quality

Firmani D.; Tanca L.; Torlone R.

2020-01-01

Abstract

Data quality is a typical ethical requirement: we could never trust a piece of information that lacks the usual data quality properties. Yet the converse also holds: data should conform to a high ethical standard in order to be considered of good quality. Hence, satisfying the ethical requirements is necessary to assert the quality of a dataset, and in this paper we propose to introduce the most common ethical requirements as dimensions of quality, grouped in an Ethics Cluster. The Internet and the worldwide use of IT and computers have generated a plethora of datasets in all kinds of application areas; this data corresponds to useful information only if it is of good quality, and, we emphasize, it can be profitable to society only if its usage conforms to ethical principles. Taking a somewhat more constructive and dynamic viewpoint, we discuss the dimensions of ethics in connection with the various phases of what we call the information extraction process [20], that is, the process of (i) identifying the data sources containing the information of interest, (ii) collecting the corresponding data and integrating them to produce a single dataset, and (iii) applying the appropriate information extraction methods (from a simple query up to a complex statistical, machine learning, or data mining analysis). We thus advocate extending the well-established data quality framework in [5] to incorporate ethics explicitly.

Publication year: 2020
Journal title: ACM Journal of Data and Information Quality
Keywords: Data Science Pipeline; Data Quality; Ethics; Fairness
Appears in types: 01.1 Journal Article

Files in this item:

File	Size	Format
Ethics_JDIQ20-1.pdf (open access, publisher's version)	1.13 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1213935
