Finding the most relevant facts among dynamic and hetero- geneous data published on theWeb of Data is getting a growing attention in recent years. RDF Stream Processing (RSP) engines offer a baseline solution to integrate and process streaming data with data distributed on the Web. Unfortunately, the time to access and fetch the distributed data can be so high to put the RSP engine at risk of losing reactiveness, especially when the distributed data is slowly evolving. State of the art work addressed this problem by proposing an architectural solution that keeps a local replica of the distributed data and a baseline maintenance policy to refresh it over time. This doctoral thesis is investigating advance policies that let RSP engines continuously answer top-k queries, which require to join data streams with slowly evolving datasets published on the Web of Data, without violating the reactiveness constrains imposed by the users. In particular, it proposes policies that focus on freshing only the data in the replica that contributes to the correctness of the top-k results.
Retrieval of the most relevant facts from data streams joined with slowly evolving dataset published on the web of data
Zahmatkesh, Shima
2017-01-01
Abstract
Finding the most relevant facts among dynamic and hetero- geneous data published on theWeb of Data is getting a growing attention in recent years. RDF Stream Processing (RSP) engines offer a baseline solution to integrate and process streaming data with data distributed on the Web. Unfortunately, the time to access and fetch the distributed data can be so high to put the RSP engine at risk of losing reactiveness, especially when the distributed data is slowly evolving. State of the art work addressed this problem by proposing an architectural solution that keeps a local replica of the distributed data and a baseline maintenance policy to refresh it over time. This doctoral thesis is investigating advance policies that let RSP engines continuously answer top-k queries, which require to join data streams with slowly evolving datasets published on the Web of Data, without violating the reactiveness constrains imposed by the users. In particular, it proposes policies that focus on freshing only the data in the replica that contributes to the correctness of the top-k results.File | Dimensione | Formato | |
---|---|---|---|
paper_14.pdf
accesso aperto
:
Publisher’s version
Dimensione
222.05 kB
Formato
Adobe PDF
|
222.05 kB | Adobe PDF | Visualizza/Apri |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.