Rost, C.

Scalable Management and Analysis of Temporal Property Graphs

Universität Leipzig

2024 / 05

Dissertation

Futher information: https://nbn-resolving.org/urn:nbn:de:bsz:15-qucosa2-915249

Abstract

Graphs, as simple yet powerful data structures, play a pivotal role in modeling and analyzing relationships among real-world entities. In the data representation and analysis landscape, graph data structures have established themselves as a fundamental paradigm for modeling and understanding complex relationships in various domains. The intrinsic domain independence, expressiveness, and the wide variety of analysis options based on graph theory have gained significant attention in both research and industry. In recent years, companies have increasingly leveraged graph technology to represent, store, query, and analyze graph-shaped data. This has been notably impactful in uncovering hidden patterns and predicting relationships within diverse domains such as social networks, Internet of Things (IoT), biological systems, and medical networks. However, the dynamic nature of most real-world graphs is often neglected in existing approaches, which might lead to inaccurate analytical results or an incomplete understanding of evolving patterns within the graph over time. Temporal graphs, in contrast, are a particular type of graphs that maintain changing structures and properties over time. They have gained significant attention in various domains, from financial networks over micromobility networks to supply chains and biological networks. A majority of these real-world networks are not static but rather exhibit high dynamics, which are rarely considered in data models, query languages, and analyses, although analytical questions often require an evaluation of the network's evolution. This doctoral thesis addresses this critical gap by presenting a comprehensive study on analyzing and exploring temporal property graphs. It focuses on scalability and proposes novel methodologies to enhance accuracy and comprehensiveness in analyzing evolving graph patterns over time. It also offers insights into real-time querying, addressing various challenges that emerge when the time dimension is treated as an integral part of the graph.

This thesis introduces the Temporal Property Graph Model (TPGM), a sophisticated data model designed for bitemporal modeling of vertices and edges, as well as logical abstractions of subgraphs and graph collections. The reference implementation of this model, namely Gradoop, is a graph dataflow system explicitly designed for scalable and distributed analysis of static and temporal property graphs. Gradoop empowers analysts to construct comprehensive and flexible temporal graph processing workflows through a declarative analytical language. The system supports various analytical temporal graph operators, such as snapshot retrieval, temporal graph pattern matching, time-dependent grouping, and temporal metrics such as degree evolution. The thesis provides an in-depth analysis of the data model, system architecture, and implementation details of Gradoop and its operators.  Comprehensive performance evaluations have been conducted on large real-world and synthetic temporal graphs, providing valuable insights into the system's capabilities and efficiency.

Furthermore, this thesis demonstrates the flexibility of the temporal graph model and its operators through a practical use case based on a call center network. In this scenario, a TPGM operator pipeline is developed to answer a complex and time-dependent analytical question. We also showcase the Temporal Graph Explorer (TGE), a web-based user interface designed to explore temporal graphs, leveraging Gradoop as a backend. The TGE empowers users to delve into temporal graph dynamics by enabling the retrieval of snapshots from the graph's past states, computing differences between these snapshots, and providing temporal summaries of graphs. This functionality allows for a comprehensive understanding of graph evolution through diverse visualizations. Real-world temporal graph data from bicycle rentals highlight the system's flexibility and configurability of the selected temporal operators.

The impact of graph changes on its characteristics can also be explored by examining centrality measures over time. Centrality measures, encompassing both node and graph metrics, quantify the characteristics of individual nodes or the entire graph. In the dynamic context of temporal graphs, where the graph changes over time, node and graph metrics also undergo implicit changes. This thesis tackles the challenge of adapting static node and graph metrics to temporal graphs. It proposes temporal extensions for selected degree-dependent metrics and aggregations, emphasizing the importance of including the time dimension in the metrics. This thesis demonstrates that a metric conventionally representing a scalar value for static graphs results in a time series when applied to temporal graphs. It introduces a baseline algorithm for calculating the degree evolution of vertices within a temporal graph, and its practical implementation in Gradoop is presented. The scalability of this algorithm is evaluated using both real-world and synthetic datasets, providing valuable insights into its performance across diverse scenarios.

Such time series data can also be captured from the application scenario as properties of nodes and edges, such as sensor readings in the IoT domain. In light of this, we showcase significant advancements, including an extended version of the TPGM that supports time series data in temporal graphs. Additionally, we introduce a temporal graph query language based on Oracle's language PGQL to facilitate seamless querying of time-oriented graph structures. Furthermore, we present a novel continuous graph-based event detection approach to support scenarios involving more time-sensitive use cases.

The increasing frequency of graph changes and the need to react quickly to emerging patterns leads to the need for a unified declarative graph query language that can evaluate queries on graph streams. To address the increasing importance of real-time data analysis and management, the thesis introduces the syntax and semantics of Seraph, a Cypher-based language that supports native streaming features within property graph query languages. The semantics of Seraph combine stream processing with property graphs and time-varying relations, treating time as a first-class citizen. Real-world industrial use cases demonstrate the usage of Seraph for graph-based continuous queries.

After evaluating lessons learned from the development and research on Gradoop, a dissertation summary and an outlook on future work are given in a final section. This doctoral thesis comprehensively examines scalable analysis and exploration techniques for temporal property graphs, focusing on Gradoop and its system architecture, data model, operators, and performance evaluations. It also explores the evolution of node and graph metrics and the theoretical foundation of a real-time query language, contributing to the advancement of temporal graph analysis in various domains.