We address the problem of joining ranked results produced by two or more services on the Web. We consider services endowed with two kinds of access that are often available: i) sorted access, which returns tuples sorted by score; ii) random access, which returns tuples matching a given join attribute value. Rank join operators combine objects of multiple relations and output the K combinations with the highest aggregate score. While the past literature has studied suitable bounding schemes for this setting, we focus on the definition of a pulling strategy, which determines the order of invocation of the joined services. We propose CARS (Cost-Aware with Random and Sorted access), a pulling strategy is derived at compile-time that is oblivious of the query-dependent score distributions. We cast CARS as the solution of an optimization problem based on a few parameters characterizing the joined services. We validate the proposed strategy with experiments on both real and synthetic data sets. We show that CARS outperforms prior proposals and that its overall access cost is always within a very short margin from that of an oracle-based optimal strategy. CARS is also shown to be robust w.r.t. the uncertainty that may characterize the estimated parameters.

Cost-Aware Rank Join with Random and Sorted Access

MARTINENGHI, DAVIDE;TAGLIASACCHI, MARCO
2012

Abstract

We address the problem of joining ranked results produced by two or more services on the Web. We consider services endowed with two kinds of access that are often available: i) sorted access, which returns tuples sorted by score; ii) random access, which returns tuples matching a given join attribute value. Rank join operators combine objects of multiple relations and output the K combinations with the highest aggregate score. While the past literature has studied suitable bounding schemes for this setting, we focus on the definition of a pulling strategy, which determines the order of invocation of the joined services. We propose CARS (Cost-Aware with Random and Sorted access), a pulling strategy is derived at compile-time that is oblivious of the query-dependent score distributions. We cast CARS as the solution of an optimization problem based on a few parameters characterizing the joined services. We validate the proposed strategy with experiments on both real and synthetic data sets. We show that CARS outperforms prior proposals and that its overall access cost is always within a very short margin from that of an oracle-based optimal strategy. CARS is also shown to be robust w.r.t. the uncertainty that may characterize the estimated parameters.
File in questo prodotto:
File Dimensione Formato  
05963672.pdf

Accesso riservato

: Pre-Print (o Pre-Refereeing)
Dimensione 391.12 kB
Formato Adobe PDF
391.12 kB Adobe PDF   Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: http://hdl.handle.net/11311/666317
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 11
  • ???jsp.display-item.citation.isi??? 4
social impact