Database challenges for exploratory computing

Buoncristiano, Marcello; Mecca, Giansalvatore; Quintarelli, Elisa; Roveri, Manuel; Santoro, Donatello; Tanca, Letizia

doi:10.1145/2814710.2814714

Helping users to make sense of very big datasets is nowadays considered an important research topic. However, the tools that are available for data analysis purposes typically address professional data scientists, who, besides a deep knowledge of the domain of interest, master one or more of the following disciplines: mathematics, statistics, computer science, computer engineering, and programming. On the contrary, in our vision it is vital to support also different kinds of users who, for various reasons, may want to analyze the data and obtain new insight from them. Examples of these data enthusiasts [4, 9] are journalists, investors, or politicians: non-technical users who can draw great advantage from exploring the data, achieving new and essential knowledge, instead of reading query results with tons of records. The term data exploration generally refers to a data user being able to find her way through large amounts of data in order to gather the necessary information. A more technical definition comes from the field of statistics, introduced by Tukey [12]: with exploratory data analysis the researcher explores the data in many possible ways, including the use of graphical tools like boxplots or histograms, gaining knowledge from the way data are displayed. Despite the emphasis on visualization, exploratory data analysis still assumes that the user understands at least the basics of statistics, while in this paper we propose a paradigm for database exploration which is in turn inspired by the exploratory computing vision [2]. We may describe exploratory computing as the step-by-step “conversation” of a user and a system that “help each other” to refine the data exploration process, ultimately gathering new knowledge that concretely fullfils the user needs. The process is seen as a conversation since the system provides active support: it not only answers user’s requests, but also suggests one or more possible actions that may help the user to focus the exploratory session. This activity may entail the use of a wide range of different techniques, including the use of statistics and data analysis, query suggestion, advanced visualization tools, etc. The closest analogy [2] is that of a human-tohuman dialogue, in which two people talk, and continuously make reference to their lives, priorities, knowledge and beliefs, leveraging them in order to provide the best possible contribution to the dialogue. In essence, through the conversation they are exploring themselves as well as the information that is conveyed through their words. This exploration process therefore means investigation, exploration-seeking, comparison-making, and learning altogether. It is most appropriate for big collections of semantically rich data, which typically hide precious knowledge behind their complexity. In this broad and innovative context, this paper intends to make a significant step further: it proposes a model to concretely perform this kind of exploration over a database. The model is general enough to encompass most data models and query languages that have been proposed for data management in the last few years. At the same time, it is precise enough to provide a first formalization of the problem and reason about the research challenges posed to database researchers by this new paradigm of interaction.

RE.PUBLIC@POLIMI pubblicazioni di ricerca del Politecnico di Milano