
GRADOOP: Scalable Graph Data Management and Analytics with Hadoop

Processing highly connected data as graphs is becoming increasingly important in many domains. Prominent examples include social networks such as Facebook and Twitter, information networks such as the World Wide Web, and biological networks. Data from these domains share an inherent graph structure, which makes them well suited to analysis with graph algorithms. They also share two further properties: the datasets are huge, which makes it hard or even impossible to process them on a single machine, and they grow over time, which makes them dynamic graphs. To analyze such large-scale, dynamic datasets, we started developing a framework called “Gradoop” (Graph Analytics on Hadoop®) with three main objectives:

  1. developing a graph data model, including operators for defining analytical pipelines,
  2. integrating data from heterogeneous source systems into a single integrated graph, and
  3. distributing and replicating data efficiently to optimize the execution of distributed graph operators.

Our prototype is built on top of the distributed dataflow framework Apache Flink™ and the NoSQL database Apache HBase™. The data model has been designed and the operators have been implemented. A first use case is the BIIIG project for graph analytics in business information networks. In our ongoing work, we are investigating methods for tuning operators depending on the underlying dataflow system.

People

Research

Students

  • Kevin Gomez
  • Niklas Teichmann
  • Stephan Kemper

Awards

Source code

GitHub

Cooperation


Competence Center for Scalable Data Services and Solutions (ScaDS)

Talks

Date | Talk | Event | Language
Feb 2017 | Skalierbare Graph-basierte Analyse und Business Intelligence | bitkom Big Data Summit 2017 | de
Feb 2017 | Distributed Graph Analytics with GRADOOP | LDBC TUC Meeting | en
Feb 2017 | Distributed Graph Flows: Cypher on Flink and GRADOOP | openCypher Implementers Meeting | en
Feb 2017 | (Cypher)-[:ON]->(ApacheFlink)<-[:USING]-(Gradoop) | FOSDEM 2017 Graph Devroom | en
Feb 2017 | From Shopping Baskets to Structural Patterns | FOSDEM 2017 Graph Devroom | en
Nov 2016 | Scalable Graph Data Analytics with GRADOOP | BBDC Symposium | en
Oct 2016 | Gut vernetzt: Skalierbares Graph Mining für Business Intelligence | data2day 2016 | de
Jul 2016 | Distributed Graph Analytics with GRADOOP | Let’s talk about Graph Databases | en
Mar 2016 | GRADOOP - Scalable Graph Analytics with Apache Flink | Graph Fun with Apache Flink & Neo4j | en
Feb 2016 | GRADOOP - Scalable Graph Analytics with Apache Flink | FOSDEM 2016 Graph Processing Devroom | en
Dec 2015 | GRADOOP - Scalable Graph Analytics with Apache Flink | Meetup Big Data User Group Dresden | en
Oct 2015 | GRADOOP - Scalable Graph Analytics with Apache Flink | FlinkForward 2015 | en
Jul 2015 | Scalable Graph Analytics with GRADOOP and BIIIG | Graph Sync Meeting @ScaDS Dresden | en
Jun 2015 | Scalable Graph Analytics with GRADOOP and BIIIG | Microsoft Meeting @ScaDS Dresden | en
Jun 2015 | Scalable Graph Data Management and Analytics with GRADOOP | Project Meeting @ScaDS Leipzig | en
May 2015 | Scalable Graph Analytics with GRADOOP | Keynote GvDB-Workshop | en

Publications

Petermann, A.; Micale, G.; Bergami, G.; Junghanns, M.; Pulvirenti, A.; Rahm, E.
Mining and Ranking of Generalized Multi-Dimensional Frequent Subgraphs
Proc. International Conference on Digital Information Management (ICDIM) 2017
2017-09

Saeedi, A.; Peukert, E.; Rahm, E.
Comparative Evaluation of Distributed Clustering Schemes for Multi-source Entity Resolution
Proc. ADBIS, LNCS
2017-09

Junghanns, M.; Kießling, M.; Averbuch, A.; Petermann, A.; Rahm, E.
Cypher-based Graph Pattern Matching in Gradoop
Proc. ACM SIGMOD workshop on Graph Data Management Experiences and Systems (GRADES)
2017-05

Junghanns, M.; Petermann, A.; Rahm, E.
Distributed Grouping of Property Graphs with GRADOOP
Proc. Datenbanksysteme für Business, Technologie und Web (BTW) 2017
2017-03

Junghanns, M.; Petermann, A.; Teichmann, N.; Rahm, E.
The Big Picture: Understanding large-scale graphs using Graph Grouping with GRADOOP
Proc. Datenbanksysteme für Business, Technologie und Web (BTW) 2017 (Demo paper)
2017-03

Kemper, S.; Petermann, A.; Junghanns, M.
Distributed FoodBroker: Skalierbare Generierung graphbasierter Geschäftsprozessdaten
Proc. Datenbanksysteme für Business, Technologie und Web (BTW) 2017 (Workshops)
2017-03

Petermann, A.; Junghanns, M.; Rahm, E.
DIMSpan - Transactional Frequent Subgraph Mining with Distributed In-Memory Dataflow Systems
arXiv
2017-03

Petermann, A.; Junghanns, M.; Kemper, S.; Gomez, K.; Teichmann, N.; Rahm, E.
Graph Mining for Complex Data Analytics
Proc. ICDM 2016 (Demo paper)
2016-12

Junghanns, M.; Petermann, A.
Verteilte Graphanalyse mit Gradoop
JavaSPEKTRUM 05/2016
2016-10

Petermann, A.; Junghanns, M.
Scalable Business Intelligence with Graph Collections
it - Information Technology, Special Issue: Big Data Analytics, Vol. 58 (4), 2016, pp. 166–175
2016-08

Junghanns, M.; Petermann, A.; Teichmann, N.; Gomez, K.; Rahm, E.
Analyzing Extended Property Graphs with Apache Flink
Proc. Int. SIGMOD workshop on Network Data Analytics (NDA)
2016-07

Junghanns, M.; Petermann, A.; Gomez, K.; Rahm, E.
GRADOOP: Scalable Graph Data Management and Analytics with Hadoop
Techn. Report, Univ. of Leipzig, arXiv:1506.00548, June 2015
2015-06

Rahm, E.
Scalable graph analytics with GRADOOP
Proc. GI-Workshop Grundlagen von Datenbanksystemen (GvDB), Gommern, May 2015 (Invited Talk)
2015-05

Petermann, A.; Junghanns, M.; Müller, R.; Rahm, E.
Graph-based Data Integration and Business Intelligence with BIIIG
Proc. VLDB Conf., 2014 (Demo paper)
2014-09

Petermann, A.; Junghanns, M.; Müller, R.; Rahm, E.
FoodBroker - Generating Synthetic Datasets for Graph-Based Business Analytics
5th Workshop on Big Data Benchmarking (WBDB 2014), LNCS 8991, 2015
2014-08

Petermann, A.; Junghanns, M.; Müller, R.; Rahm, E.
BIIIG: Enabling Business Intelligence with Integrated Instance Graphs
5th International Workshop on Graph Data Management (GDM 2014)
2014-03

Disclaimer

Apache®, Hadoop®, Apache Flink™ and Apache HBase™ are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries.