
Publication

Scalable multi-query optimization over federated scientific databases

Book - Dissertation

We will not focus on the actual integration problem. Instead, we focus on how to evaluate distributed queries efficiently. One of the key characteristics of the databases discussed in Section 1.2 is that sources are frequently updated. Updates lead to interesting challenges even for simple queries: answers may vary over time as more data becomes available. However, it is cumbersome to repeat all queries over time, especially if they combine information from several sources. We therefore propose a monitoring approach. Users register their queries once, and these queries are then executed periodically in batch mode. Users are notified as soon as new answers to their queries arrive. As these queries are evaluated repeatedly, it is natural to look at multi-query optimization (MQO) in this setting. An important characteristic of monitoring systems is that they typically support multiple users, and therefore we must consider a large number of queries. We have chosen to focus on optimizing the communication cost, one of the main bottlenecks in our setting with large amounts of distributed data. In the development of our systems, we ensured that users need no special expertise in a query language to formulate their queries. Being non-experts in computer science, the scientists are faced with two major challenges: (i) how to express such distributed queries. Expressing distributed queries is a non-trivial task, even if we assume that scientists are familiar with query languages like SQL, and such queries can get arbitrarily complex as more sources are considered; (ii) how to efficiently evaluate such distributed queries. An efficient evaluation must account for batches of hundreds (or even thousands) of submitted queries and must optimize all of them as a whole.
Publication year: 2009
Accessibility: Open