We address the problem of joining ranked results produced by two or more services on the Web. We consider services endowed with two kinds of access that are often available: i) sorted access, which returns tuples sorted by score; ii) random access, which returns tuples matching a given join attribute value. Rank join operators combine objects of multiple relations and output the K combinations with the highest aggregate score. While the past literature has studied suitable bounding schemes for this setting, we focus on the definition of a pulling strategy, which determines the order of invocation of the joined services. We propose CARS (Cost-Aware with Random and Sorted access), a pulling strategy is derived at compile-time that is oblivious of the query-dependent score distributions. We cast CARS as the solution of an optimization problem based on a few parameters characterizing the joined services. We validate the proposed strategy with experiments on both real and synthetic data sets. We show that CARS outperforms prior proposals and that its overall access cost is always within a very short margin from that of an oracle-based optimal strategy. CARS is also shown to be robust w.r.t. the uncertainty that may characterize the estimated parameters.
Cost-Aware Rank Join with Random and Sorted Access
MARTINENGHI, DAVIDE;TAGLIASACCHI, MARCO
2012-01-01
Abstract
We address the problem of joining ranked results produced by two or more services on the Web. We consider services endowed with two kinds of access that are often available: i) sorted access, which returns tuples sorted by score; ii) random access, which returns tuples matching a given join attribute value. Rank join operators combine objects of multiple relations and output the K combinations with the highest aggregate score. While the past literature has studied suitable bounding schemes for this setting, we focus on the definition of a pulling strategy, which determines the order of invocation of the joined services. We propose CARS (Cost-Aware with Random and Sorted access), a pulling strategy is derived at compile-time that is oblivious of the query-dependent score distributions. We cast CARS as the solution of an optimization problem based on a few parameters characterizing the joined services. We validate the proposed strategy with experiments on both real and synthetic data sets. We show that CARS outperforms prior proposals and that its overall access cost is always within a very short margin from that of an oracle-based optimal strategy. CARS is also shown to be robust w.r.t. the uncertainty that may characterize the estimated parameters.File | Dimensione | Formato | |
---|---|---|---|
05963672.pdf
Accesso riservato
:
Pre-Print (o Pre-Refereeing)
Dimensione
391.12 kB
Formato
Adobe PDF
|
391.12 kB | Adobe PDF | Visualizza/Apri |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.