

Gradoop: Distributed Graph Analytics on Hadoop

Gradoop is an open-source (GPLv3) research framework for scalable graph analytics built on top of Apache Flink™ and Apache HBase™. Its graph data model extends the widely used property graph model with the concept of logical graphs, and it provides operators that can be applied to single logical graphs and to collections of logical graphs. Combining these operators allows graph analytical workflows to be defined flexibly and declaratively. Gradoop can be easily integrated into a workflow that already uses Flink™ operators and Flink™ libraries (e.g. Gelly, ML and Table).
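The idea of operators on logical graphs can be illustrated with a minimal, single-machine sketch in plain Java. It does not use the Gradoop or Flink API: the class names `LogicalGraphSketch`, `LogicalGraph` and `Edge` are hypothetical, labels and properties are left out, and the semantics shown for combination, overlap and exclusion are assumed to be the plain set-based reading of those operator names.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Illustrative sketch only; not the Gradoop API.
public class LogicalGraphSketch {

    /** Hypothetical edge type: an id plus source and target vertex ids. */
    static final class Edge {
        final String id;
        final String source;
        final String target;

        Edge(String id, String source, String target) {
            this.id = id;
            this.source = source;
            this.target = target;
        }

        // Edges are compared by id only, so an edge shared by two graphs is recognized.
        @Override
        public boolean equals(Object o) {
            return o instanceof Edge && ((Edge) o).id.equals(id);
        }

        @Override
        public int hashCode() {
            return Objects.hash(id);
        }
    }

    /** Hypothetical logical graph: a set of vertex ids and a set of edges. */
    static final class LogicalGraph {
        final Set<String> vertices;
        final Set<Edge> edges;

        LogicalGraph(Set<String> vertices, Set<Edge> edges) {
            this.vertices = vertices;
            this.edges = edges;
        }
    }

    /** Assumed semantics: union of the vertex sets and of the edge sets. */
    static LogicalGraph combination(LogicalGraph a, LogicalGraph b) {
        Set<String> v = new HashSet<>(a.vertices);
        v.addAll(b.vertices);
        Set<Edge> e = new HashSet<>(a.edges);
        e.addAll(b.edges);
        return new LogicalGraph(v, e);
    }

    /** Assumed semantics: intersection of the vertex sets and of the edge sets. */
    static LogicalGraph overlap(LogicalGraph a, LogicalGraph b) {
        Set<String> v = new HashSet<>(a.vertices);
        v.retainAll(b.vertices);
        Set<Edge> e = new HashSet<>(a.edges);
        e.retainAll(b.edges);
        return new LogicalGraph(v, e);
    }

    /** Assumed semantics: vertices only in a, plus edges of a between those vertices. */
    static LogicalGraph exclusion(LogicalGraph a, LogicalGraph b) {
        Set<String> v = new HashSet<>(a.vertices);
        v.removeAll(b.vertices);
        Set<Edge> e = new HashSet<>();
        for (Edge edge : a.edges) {
            if (v.contains(edge.source) && v.contains(edge.target)) {
                e.add(edge);
            }
        }
        return new LogicalGraph(v, e);
    }

    public static void main(String[] args) {
        LogicalGraph g1 = new LogicalGraph(
            new HashSet<>(Arrays.asList("a", "b", "c")),
            new HashSet<>(Arrays.asList(new Edge("e1", "a", "b"), new Edge("e2", "b", "c"))));
        LogicalGraph g2 = new LogicalGraph(
            new HashSet<>(Arrays.asList("b", "c", "d")),
            new HashSet<>(Arrays.asList(new Edge("e2", "b", "c"), new Edge("e3", "c", "d"))));

        System.out.println(combination(g1, g2).vertices.size()); // 4 vertices: a, b, c, d
        System.out.println(overlap(g1, g2).edges.size());        // 1 edge: e2
        System.out.println(exclusion(g1, g2).vertices.size());   // 1 vertex: a
    }
}
```

In this sketch every operator takes logical graphs and returns a new logical graph, so results can be fed directly into further operators. That closure property is what makes it possible to chain operators into a workflow.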



  • Added operator implementation for graph pattern matching (single graph setting, query specification using GDL)
  • Added plugin algorithm for Frequent Subgraph Mining (transactional setting)
  • Added DataSource/DataSink abstraction and various implementations
  • Added multiple examples for analytical programs


  • Major refactoring of internal EPGM representation (e.g. ID and property handling)
  • Added graph grouping operator
  • Added equality operators
  • Added GDL-based unit testing


  • Adopted Apache Flink as execution layer
  • Added basic operators (overlap, exclusion, combination, subgraph, transformation, …)


  • Added support for Apache HBase as distributed graph storage


  • First prototype using Hadoop MapReduce and Apache Giraph for operator processing