RE.PUBLIC@POLIMI: research publications of the Politecnico di Milano


Crowdsourcing for top-K query processing over uncertain data

Ciceri, Eleonora; Fraternali, Piero; Martinenghi, Davide; Tagliasacchi, Marco

2016-01-01

Abstract

Both social media and sensing infrastructures are producing an unprecedented mass of data characterized by their uncertain nature, due to either the noise inherent in sensors or the imprecision of human contributions. Therefore, query processing over uncertain data has become an active research field. In the well-known class of applications commonly referred to as top-K queries, the objective is to find the best K objects matching the user's information need, formulated as a scoring function over the objects' attribute values. If both the data and the scoring function are deterministic, the best K objects can be uniquely determined and totally ordered so as to produce a single ranked result set (as long as ties are broken by some deterministic rule). However, in application scenarios involving uncertain data and fuzzy information needs, this does not hold: when either the attribute values or the scoring function is nondeterministic, there may be no consensus on a single ordering, but rather a space of possible orderings. To determine the correct ordering, one needs to acquire additional information that reduces the uncertainty associated with the queried data and, consequently, the number of orderings in that space. An emerging trend in data processing is crowdsourcing, defined as the systematic engagement of humans in the resolution of tasks through online distributed work. Our approach combines human and automatic computation to solve complex problems: when data ambiguity can be resolved by human judgment, crowdsourcing becomes a viable tool for converging towards a unique, or at least less uncertain, query result. The goal of this paper is to define and compare task selection policies for uncertainty reduction via crowdsourcing, with emphasis on the case of top-K queries.
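The abstract does not detail the task selection policies themselves. As a minimal sketch of the underlying idea, assuming independent discrete score distributions per object, a noise-free crowd answering pairwise questions of the form "does object a rank above object b?", and Shannon entropy over the possible top-K orderings as the uncertainty measure, the following Python code picks the question with the greatest expected uncertainty reduction. All function names and the toy data are hypothetical, and this greedy information-gain rule is a standard heuristic, not necessarily one of the policies defined in the paper.

```python
import itertools
import math
from collections import defaultdict


def possible_worlds(objects):
    """Yield (assignment, probability) for every joint choice of scores.

    `objects` maps an object name to a discrete score distribution
    {score: probability}; objects are assumed to be independent.
    """
    names = sorted(objects)
    choices = [list(objects[n].items()) for n in names]
    for combo in itertools.product(*choices):
        prob = math.prod(p for _, p in combo)
        yield dict(zip(names, (s for s, _ in combo))), prob


def full_ranking(assignment):
    # Higher score first; ties broken deterministically by object name.
    return sorted(assignment, key=lambda n: (-assignment[n], n))


def ordering_distribution(worlds, k):
    """Probability of each distinct top-K ordering across all worlds."""
    dist = defaultdict(float)
    for assignment, prob in worlds:
        dist[tuple(full_ranking(assignment)[:k])] += prob
    return dist


def entropy(dist):
    """Shannon entropy (bits) of a possibly unnormalized distribution."""
    total = sum(dist.values())
    return -sum((p / total) * math.log2(p / total) for p in dist.values() if p > 0)


def expected_residual_entropy(worlds, k, a, b):
    """Expected entropy of the top-K orderings once the crowd answers
    "does a rank above b?" (answers assumed noise-free)."""
    by_answer = {True: defaultdict(float), False: defaultdict(float)}
    for assignment, prob in worlds:
        ranking = full_ranking(assignment)
        answer = ranking.index(a) < ranking.index(b)
        by_answer[answer][tuple(ranking[:k])] += prob
    total = sum(sum(d.values()) for d in by_answer.values())
    return sum(sum(d.values()) / total * entropy(d) for d in by_answer.values() if d)


def select_question(objects, k):
    """Greedy policy: ask the pairwise question with the lowest expected residual entropy."""
    worlds = list(possible_worlds(objects))
    pairs = itertools.combinations(sorted(objects), 2)
    return min(pairs, key=lambda pair: expected_residual_entropy(worlds, k, *pair))


if __name__ == "__main__":
    # Hypothetical toy data: two objects with uncertain scores, two certain ones.
    objects = {
        "a": {0.9: 0.5, 0.4: 0.5},
        "b": {0.7: 1.0},
        "c": {0.8: 0.5, 0.2: 0.5},
        "d": {0.1: 1.0},
    }
    worlds = list(possible_worlds(objects))
    print(entropy(ordering_distribution(worlds, k=2)))  # 2.0
    print(select_question(objects, k=2))  # ('a', 'b')
```

In the toy data there are four equally likely top-2 orderings (2 bits of uncertainty). Asking whether `a` ranks above `b` splits them into two equally likely pairs, so the expected residual entropy drops to 1 bit; asking about `b` versus `c` would do equally well, and `min` keeps the first candidate. Enumerating all possible worlds is exponential in the number of uncertain objects, so this brute-force version only suits small inputs.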


Year of publication: 2016
ISBN (International Standard Book Number): 9781509020195
Keywords: Artificial Intelligence; Computational Theory and Mathematics; Computer Graphics and Computer-Aided Design; Computer Networks and Communications; Information Systems; Information Systems and Management
Appears in type: 04.3 Poster

Files in this item:

File	Size	Format
ICDE2016-CiceriFraternaliMartinenghiTagliasacchi.pdf (open access; Post-Print: DRAFT or Author's Accepted Manuscript, AAM)	204.7 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1004358
