Parallel Database Systems
Previous research projects on PDBS:
- Early work on Shared Disk PDBS
- Parallel query processing / Dynamic load balancing (1992-96)
- Goal-oriented performance control
- Extended storage hierarchies
Project Overview
Parallel database systems (PDBS) support various types of parallelism both between and within queries (inter- and intra-query parallelism). Together, these are intended to provide short response times for single, complex queries over large amounts of data as well as high throughput for shorter transactions. Exploiting parallelism efficiently requires sophisticated load balancing techniques that distribute the workload across the resources (CPUs, disks, main memory, and network) of a parallel system. Load balancing must be supported by adequate data allocation methods that spread the base data across a system's disks so as to enable efficient, parallel I/O. Solutions to these problems have to consider the system architecture (Shared Disk, Shared Nothing, Shared Everything, hybrids) as well as workload and database characteristics (multi-user processing, skew effects). Detailed performance studies are conducted to evaluate newly developed approaches.
The project is supported by the Deutsche Forschungsgemeinschaft (DFG), Jan. 1997 - Oct. 2001.
Load Balancing
In load balancing, we concentrate on dynamic strategies that partition and distribute workload at runtime based on the current system state. For Shared-Disk systems, we have proposed so-called on-demand techniques, which allocate load units to processors 'on the fly' as execution progresses. So far, we have applied this paradigm to scan and join operators. Our results indicate that this approach outperforms predictive schemes, which attempt to plan query execution in advance but suffer from inaccurate load estimates, especially in multi-user mode.
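The difference between the two paradigms can be illustrated with a toy makespan calculation. This is only a sketch under simplifying assumptions, not the project's actual scheduler: the function names, the cost figures, and the greedy planner standing in for a "predictive scheme" are all hypothetical. The planner balances load units using estimated costs before execution, while the on-demand variant hands the next unit to whichever processor becomes idle first.

```python
import heapq

def planned_makespan(estimated, actual, n_workers):
    """Predictive scheme (illustrative): assign every load unit in advance to
    the worker with the smallest *estimated* load, then see how long the plan
    really takes given the *actual* costs."""
    loads = [(0.0, w) for w in range(n_workers)]  # (estimated load, worker id)
    heapq.heapify(loads)
    finish = [0.0] * n_workers                    # actual busy time per worker
    for est, act in zip(estimated, actual):
        load, w = heapq.heappop(loads)
        finish[w] += act
        heapq.heappush(loads, (load + est, w))
    return max(finish)

def on_demand_makespan(actual, n_workers):
    """On-demand scheme (illustrative): the next load unit goes to whichever
    worker becomes idle first, so no cost estimate is needed."""
    idle_at = [0.0] * n_workers                   # time at which each worker is free
    heapq.heapify(idle_at)
    for act in actual:
        t = heapq.heappop(idle_at)
        heapq.heappush(idle_at, t + act)
    return max(idle_at)

# Hypothetical workload: the planner believes all units cost 1,
# but every fourth unit is actually eight times as expensive (skew).
estimated = [1] * 12
actual = [8, 1, 1, 1, 8, 1, 1, 1, 8, 1, 1, 1]

print(planned_makespan(estimated, actual, 4))  # 24.0
print(on_demand_makespan(actual, 4))           # 10.0
```

In this made-up workload the advance plan places all three expensive units on the same processor, whereas on-demand allocation spreads them out simply because busy processors do not request new work.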
Data Allocation
Our work on data allocation comprises both base data and access structures (indices), as well as intermediate results of large queries. We also consider data warehousing structures such as star schemata, bitmap indices, and object-relational data. Since data allocation predefines the units of load balancing (a data fragment is normally processed by a single CPU only), it must correspond to the expected workload profile to enable effective parallelization later on.
We developed a GUI-equipped data allocation tool, Warlock (Warehouse allocation to disk), that facilitates and automates the complex data allocation task for Data Warehouse environments. It is based on a multi-dimensional fragmentation and allocation approach for relational star schemas. Using an analytical cost model, Warlock proposes an allocation scheme that optimizes both the response time and the throughput of parallel star queries. It is easy to parametrize (e.g. via a graphical schema and query generator and a small number of database, hardware, and tuning parameters) and provides detailed fragmentation and query performance statistics as well as a visualization of the proposed allocation scheme. We demonstrated Warlock at the VLDB Conference 2001 in Rome, Italy (poster, demo).
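To make the idea of multi-dimensional fragmentation more concrete, the following sketch fragments a fact table by two dimension attributes and places the fragments round-robin on disks. It is an assumption-laden illustration, not Warlock's algorithm: the attribute names (`month`, `group`), the round-robin placement, and all helper functions are hypothetical.

```python
from itertools import product

def fragment_key(row, dims):
    """A fact row belongs to the fragment identified by its values in the
    chosen fragmentation dimensions."""
    return tuple(row[d] for d in dims)

def allocate_round_robin(fragment_keys, n_disks):
    """Place fragments on disks in round-robin order (illustrative placement)."""
    return {key: i % n_disks for i, key in enumerate(sorted(fragment_keys))}

def disks_for_query(alloc, predicate):
    """Disks that hold at least one fragment matching the query predicate."""
    return sorted({disk for key, disk in alloc.items() if predicate(key)})

# Hypothetical star schema: fragment the fact table by month and product group.
months = range(1, 5)
groups = ["A", "B", "C"]
alloc = allocate_round_robin(product(months, groups), n_disks=4)

row = {"month": 2, "group": "B", "revenue": 10}
print(fragment_key(row, ("month", "group")))            # (2, 'B')
print(disks_for_query(alloc, lambda k: k[0] == 1))      # [0, 1, 2]
print(disks_for_query(alloc, lambda k: k[1] == "A"))    # [0, 1, 2, 3]
```

A query restricted to one month touches three fragments on three different disks, and a query restricted to one product group touches four fragments on all four disks, so both kinds of star query can read their data in parallel.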
Parallel object-relational DBS
In a recent study, we have started investigating data allocation and parallel reference-based join processing in object-relational DBS. On the one hand, references are defined explicitly by the user to model arbitrary relationships. On the other hand, they occur implicitly in decomposed storage structures, where they represent class hierarchies or link to detached (often set-valued or just very large) attributes. Either way, reference traversal is a frequent operation in the object-relational data model, and we have developed an allocation scheme that supports the resulting access pattern very well. Based on this, we have compared relational join methods (hash and sort-merge) to object-oriented pointer chasing techniques to find the appropriate strategies for different types of queries.
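Why the appropriate strategy depends on the query can be seen in a deliberately simplified page-access count. The model below is an assumption for illustration only and is not taken from the study: referenced objects are stored in OID order on fixed-size pages, pointer chasing keeps a single page buffered, and a scan-based (e.g. hash) join reads the referenced relation once in full.

```python
import math

def pointer_chasing_reads(refs, page_size):
    """Follow references in the given order; count a page read whenever the
    target object is not on the single page currently buffered."""
    reads, buffered = 0, None
    for oid in refs:
        page = oid // page_size
        if page != buffered:
            reads += 1
            buffered = page
    return reads

def scan_reads(n_objects, page_size):
    """A scan-based join reads every page of the referenced relation once."""
    return math.ceil(n_objects / page_size)

# Hypothetical data: 8 referenced objects, 2 per page.
print(pointer_chasing_reads([0, 7, 1, 6, 2, 5], page_size=2))  # 6
print(pointer_chasing_reads([0, 1], page_size=2))              # 1
print(scan_reads(8, page_size=2))                              # 4
```

Under these assumptions, a selective query that follows few, clustered references favours pointer chasing (1 page read versus 4), while a query that follows many scattered references is better served by scanning (4 page reads versus 6).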
Performance Evaluation
Our two main tools of research are analytical models and simulation systems. Especially for data allocations, we use analytical means to estimate, for instance, the amount of I/O for a given query mix and a given allocation scheme. Both allocation and load balancing techniques are evaluated within a comprehensive simulation environment named SimPaD that we have designed, implemented, and gradually expanded over the past few years. In a modular design, it reflects all relevant hardware components (CPUs, disks, main memory, network), data structures (relations and indices, partitioned and allocated across disk devices), query operators (scan, join, aggregate), and subordinate services (lock and buffer management). It can model different system architectures as well as numerous data allocation and load balancing methods.
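An analytical estimate of the kind mentioned above can be sketched in a few lines. The model here is a hypothetical simplification, not the cost model used in the project: each query in the mix has a relative weight and reads a set of fragments, each fragment has a size in pages and resides on one disk, total I/O is the sum of pages read, and the most heavily loaded disk serves as a crude proxy for response time under parallel I/O.

```python
def per_disk_io(query_fragments, frag_pages, frag_disk, n_disks):
    """Pages each disk has to deliver for one query."""
    io = [0] * n_disks
    for f in query_fragments:
        io[frag_disk[f]] += frag_pages[f]
    return io

def mix_estimate(mix, frag_pages, frag_disk, n_disks):
    """Return (expected total pages, expected bottleneck pages) over a query
    mix given as (weight, fragments) pairs whose weights sum to 1."""
    total = bottleneck = 0.0
    for weight, frags in mix:
        io = per_disk_io(frags, frag_pages, frag_disk, n_disks)
        total += weight * sum(io)        # overall I/O volume
        bottleneck += weight * max(io)   # busiest disk bounds parallel I/O time
    return total, bottleneck

# Hypothetical allocation: four fragments on two disks.
frag_pages = {"f0": 100, "f1": 100, "f2": 50, "f3": 50}
frag_disk = {"f0": 0, "f1": 1, "f2": 0, "f3": 1}
mix = [(0.5, ["f0", "f1"]), (0.5, ["f0", "f2"])]

print(mix_estimate(mix, frag_pages, frag_disk, n_disks=2))  # (175.0, 125.0)
```

The second query reads less data than the first (150 versus 200 pages) but has the larger bottleneck, because both of its fragments sit on disk 0. Comparing such figures across candidate allocation schemes is the basic use of an analytical model of this kind.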
Project Members
- Dipl.-Inform. Holger Märtens (sponsored by the DFG)
- Dipl.-Inform. Thomas Stöhr (doctoral student)
Selected Publications
Märtens, H.:
Beiträge zur dynamischen Lastbalancierung in parallelen Datenbanksystemen.
Dissertation, Univ. Leipzig, 2008.
Märtens, H.; Rahm, E.; Stöhr, T.:
Dynamic Query Scheduling in Parallel Data Warehouses.
Concurrency and Computation: Practice and Experience. Volume 15, Issue 11-12, Sep. 2003, Pages 1169 - 1190
Märtens, H.; Rahm, E.; Stöhr, T.:
Dynamic Query Scheduling in Parallel Data Warehouses.
Proc. EURO-PAR 2002, Springer-Verlag, LNCS, Paderborn, Aug. 2002
Spruth, W.; Rahm, E.:
Sysplex-Cluster-Technologien für Hochleistungs-Datenbanken.
Datenbank-Spektrum 2(3), Mai 2002
Stöhr, T.; Rahm, E.:
Warlock: A Data Allocation Tool for Parallel Warehouses.
Proc. 27th Intl. Conference on Very Large Databases (VLDB), Rome, Italy, Sep. 2001 (software demonstration)
Märtens, H.:
A Classification of Skew Effects in Parallel Database Systems.
Proc. 7th Intl. Euro-Par Conference (Euro-Par 2001),
LNCS, Springer-Verlag, Manchester, August 2001.
Märtens, H.; Rahm, E.:
On Parallel Join Processing in Object-Relational Database Systems.
Proc. of BTW01 (Datenbanksysteme für Büro, Technik und Wissenschaft),
Oldenburg, March 2001. Springer-Verlag
Stöhr, T.:
Analytische Bestimmung einer Datenallokation für Parallele Data Warehouses.
Proc. of BTW01 (Datenbanksysteme für Büro, Technik und Wissenschaft),
Oldenburg, March 2001. Springer-Verlag
Stöhr, T.; Märtens, H.; Rahm, E.:
Multi-Dimensional Database Allocation for Parallel Data Warehouses
Proc. 26th Intl. Conference on Very Large Databases (VLDB 2000), Cairo, September 2000.
Rahm, E.; Märtens, H.; Stöhr, T.:
On Flexible Allocation of Index and Temporary Data in Parallel Database Systems
Proc. 8th Intl. Workshop on High Performance Transaction Systems (HPTS'99), Asilomar, September 1999.
Märtens, H.:
On Disk Allocation of Intermediate Query Results in Parallel Database Systems
Proc. 5th Intl. Euro-Par Conference (Euro-Par'99), LNCS, Springer-Verlag, Toulouse, August/September 1999.
Märtens, H.:
Skew-Insensitive Join Processing in Shared-Disk Database Systems
Proc. Issues and Applications of Database Technology (IADT '98), Berlin, July 1998.
Rahm, E.:
Dynamic Load Balancing in Parallel Database Systems
Proc. 2nd Intl. Euro-Par Conference (Euro-Par'96), LNCS, Springer-Verlag, Lyon, August 1996.
Rahm, E.; Marek, R.:
Dynamic Multi-Resource Load Balancing in Parallel Database Systems
Proc. 21st Intl. Conference on Very Large Databases (VLDB '95), Zürich, September 1995.
Rahm, E.; Stöhr, T.:
Analysis of Parallel Scan Processing in Shared Disk Database Systems
Proc. Intl. Euro-Par Conference (Euro-Par'95), LNCS, Springer-Verlag, Stockholm, August 1995.
Rahm, E.:
Mehrrechner-Datenbanksysteme. Grundlagen der verteilten und parallelen Datenbankverarbeitung
(Principles of Distributed and Parallel Database Systems, in German)
443 pages, Addison-Wesley, 1994.
Rahm, E.; Marek, R.:
Analysis of Dynamic Load Balancing Strategies for Parallel Shared Nothing Database Systems
Proc. 19th Intl. Conference on Very Large Databases (VLDB '93), Dublin, August 1993.
Master Thesis:
Lew Bessonow: Simulation objektrelationaler Join-Verfahren in parallelen Datenbanksystemen. August 2000.
Last updated: Aug. 2001

