In the context of the Semantic Web, a large number of huge linked RDF datasets have become available, and this number keeps growing. Simultaneously, scalable RDF engines that follow the traditional optimize-then-execute paradigm have been developed to access RDF data locally, and SPARQL endpoints have been implemented for remote query processing. Although queries against locally stored data can be executed efficiently, remote query executions may frequently fail. First, the most efficient RDF engines base their query processing algorithms on physical access and storage structures that are stored locally; however, because of the size of existing linked datasets, loading the data and their links is not always feasible. Second, remote linked data query processing can be extremely costly because of the lack of query planning; moreover, current techniques do not adapt to unpredictable data transfers or data availability, so executions can fail. In this talk, I will describe both optimize-then-execute techniques and adaptive query processing strategies that have been developed to access RDF data; linked RDF datasets will be used to illustrate the performance of the proposed approaches. First, I will describe query optimization and execution techniques to access locally stored RDF data. These techniques rewrite complex queries into queries composed of small star-shaped sub-queries; the optimized queries not only reduce execution time but can also benefit from caching data during query execution. The resulting plans can speed up execution by up to three orders of magnitude, whereas the original queries may exhibit poor performance. Second, I will describe ANAPSID, an adaptive query engine for SPARQL endpoints that adapts query execution schedulers to data availability and run-time conditions when data is accessed remotely.
ANAPSID provides physical SPARQL operators that detect when a source becomes blocked or data traffic is bursty; the operators opportunistically produce results as soon as data arrives from the endpoints. ANAPSID's performance will be compared with that of RDF stores and endpoints; experimental results will show that ANAPSID can, in some cases, speed up execution by more than one order of magnitude.
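To make the idea of operators that produce results as data arrives more concrete, the following is a minimal sketch of a generic symmetric hash join, a standard non-blocking join technique. It is an illustration only and is not ANAPSID's actual operator implementation; the function name `symmetric_hash_join`, the `(side, binding)` stream format, and the sample bindings are all assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

# A solution mapping: variable name -> value (hypothetical representation).
Binding = Dict[str, str]


def symmetric_hash_join(
    arrivals: Iterable[Tuple[str, Binding]], join_var: str
) -> Iterator[Binding]:
    """Join two interleaved streams without waiting for either to finish.

    `arrivals` yields (side, binding) pairs in arrival order, where side is
    "left" or "right". Each new binding is stored in its own side's hash
    table and immediately probed against the other side's table, so a
    result is emitted as soon as both matching bindings have arrived.
    """
    tables = {"left": defaultdict(list), "right": defaultdict(list)}
    for side, binding in arrivals:
        other = "right" if side == "left" else "left"
        key = binding[join_var]
        tables[side][key].append(binding)
        # Probe with .get so that misses do not create empty entries.
        for match in tables[other].get(key, []):
            yield {**match, **binding}


if __name__ == "__main__":
    # Hypothetical bindings arriving from two endpoints in mixed order.
    stream = [
        ("left", {"drug": "d1", "name": "Aspirin"}),
        ("right", {"drug": "d2", "target": "t9"}),
        ("right", {"drug": "d1", "target": "t1"}),
        ("left", {"drug": "d2", "name": "Ibuprofen"}),
    ]
    for row in symmetric_hash_join(stream, "drug"):
        print(row)
```

Because neither input has to be consumed completely before the first result is emitted, a slow or bursty source delays only the results that depend on its missing data.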
Maria-Esther Vidal is a Full Professor in the Computer Science Department at Universidad Simón Bolívar, Caracas, Venezuela. She is part of the DataBase group and conducts research on query rewriting, optimization, and evaluation in emerging technologies. She has been an assistant researcher at the Institute for Advanced Computer Studies at the University of Maryland (UMIACS) and a Visiting Professor at Universidad Politécnica de Cataluña and Universidad de La Laguna.