Analyzing Extended Property Graphs with Apache Flink
Proc. Int. SIGMOD workshop on Network Data Analytics (NDA)
Graphs are an intuitive way to model complex relationships between real-world data objects. Thus, graph analytics plays an important role in research and industry. Because graphs are often heterogeneous in terms of the domain data they reflect, representing them requires an expressive data model that includes the abstraction of graph collections, for example, to analyze communities inside a social network. Furthermore, answering complex analytical questions about such graphs entails combining multiple analytical operations. To satisfy these requirements, we developed the Extended Property Graph Model. Our model is semantically rich, schema-free, and supports multiple distinct graphs. Based on this representation, it provides declarative and combinable operators to analyze both single graphs and graph collections. Our current implementation is based on the distributed dataflow framework Apache Flink. We present the results of a first experimental study showing the scalability of our implementation on social network data with up to 11 billion edges.