The problem of aggregating scores, so as to provide a ranking of objects in a dataset according to different evaluation criteria, is central to many modern data-intensive applications. Although efficient (instance optimal) algorithms exist to this purpose (such as the Threshold Algorithm TA and its variants) none of them is able to deal with scenarios in which the function used to aggregate scores is only partially specified. This is the typical case when the function is a weighted sum, and the user is unable to provide precise values for the weights. In this paper, we consider the problem of processing multi-source top-k queries, when only constraints, rather than precise values, are available for the weights. After observing that the so-called Fagin's Algorithm (FA) can be adapted to solve the problem, yet only when no constraints at all are present (a case in which our queries will return the k-skyband of the dataset), we introduce the novel FSA algorithm, which we prove to be instance optimal for any set of constraints on the weights. We also propose several optimizations to the basic FSA logic so as to improve execution times. Experimental analysis on both real and synthetic datasets shows that our optimizations are indeed highly effective and that the increased flexibility provided by FSA introduces little overhead with respect to the case of classical top-k queries.

FA + TA < FSA: Flexible score aggregation

Martinenghi, Davide
2018

Abstract

The problem of aggregating scores, so as to provide a ranking of objects in a dataset according to different evaluation criteria, is central to many modern data-intensive applications. Although efficient (instance optimal) algorithms exist to this purpose (such as the Threshold Algorithm TA and its variants) none of them is able to deal with scenarios in which the function used to aggregate scores is only partially specified. This is the typical case when the function is a weighted sum, and the user is unable to provide precise values for the weights. In this paper, we consider the problem of processing multi-source top-k queries, when only constraints, rather than precise values, are available for the weights. After observing that the so-called Fagin's Algorithm (FA) can be adapted to solve the problem, yet only when no constraints at all are present (a case in which our queries will return the k-skyband of the dataset), we introduce the novel FSA algorithm, which we prove to be instance optimal for any set of constraints on the weights. We also propose several optimizations to the basic FSA logic so as to improve execution times. Experimental analysis on both real and synthetic datasets shows that our optimizations are indeed highly effective and that the increased flexibility provided by FSA introduces little overhead with respect to the case of classical top-k queries.
International Conference on Information and Knowledge Management, Proceedings
9781450360142
Business, Management and Accounting (all); Decision Sciences (all)
File in questo prodotto:
File Dimensione Formato  
CIKM2018-CiacciaMartinenghi.pdf

Accesso riservato

: Publisher’s version
Dimensione 1.73 MB
Formato Adobe PDF
1.73 MB Adobe PDF   Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: http://hdl.handle.net/11311/1082852
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 4
  • ???jsp.display-item.citation.isi??? 3
social impact