Many Big Data applications in business and science require the management and analysis of huge amounts of graph data. Previous approaches to graph analytics, such as graph databases and parallel graph processing systems (e.g., Pregel), lack either sufficient scalability or sufficient flexibility and expressiveness. We are therefore developing a new end-to-end approach for graph data management and analysis at the Big Data center of excellence ScaDS Dresden/Leipzig. The system is called Gradoop (Graph analytics on Hadoop). Gradoop is designed around the so-called Extended Property Graph Data Model (EPGM), which supports semantically rich, schema-free graph data within many distinct graphs. A set of high-level operators is provided for analyzing both single graphs and sets of graphs. The operators can be used within a domain-specific language to define and run data integration workflows (for integrating heterogeneous source data into the Gradoop graph store) as well as analysis workflows. The Gradoop data store currently uses HBase for distributed storage of graph data in Hadoop clusters. An initial version of Gradoop is operational and has been used to analyze graph data for business intelligence and social network analysis.
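To make the data model more concrete, here is a minimal in-memory sketch of an EPGM-style graph store in Python. It is purely illustrative and assumes a simplified reading of the model. The class and function names (`GraphStore`, `aggregate_vertex_count`, `select`, `overlap_vertex_ids`) are hypothetical and do not reflect Gradoop's actual API. Nothing here is distributed or backed by HBase. The sketch shows two ideas from the abstract. First, vertices, edges and logical graphs carry a label plus arbitrary properties with no fixed schema. Second, a vertex or edge can belong to several logical graphs, so operators can work on one graph or on a set of graphs.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Set


@dataclass
class Element:
    """Shared part of every element: identifier, type label, schema-free properties."""
    id: int
    label: str
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class GraphHead(Element):
    """Represents one logical graph; it has its own label and properties."""


@dataclass
class Vertex(Element):
    graph_ids: Set[int] = field(default_factory=set)  # may belong to several graphs


@dataclass
class Edge(Element):
    source: int = 0
    target: int = 0
    graph_ids: Set[int] = field(default_factory=set)  # may belong to several graphs


@dataclass
class GraphStore:
    """Hypothetical in-memory store; a real system would partition this data."""
    heads: Dict[int, GraphHead] = field(default_factory=dict)
    vertices: Dict[int, Vertex] = field(default_factory=dict)
    edges: Dict[int, Edge] = field(default_factory=dict)

    def vertices_of(self, gid: int) -> List[Vertex]:
        return [v for v in self.vertices.values() if gid in v.graph_ids]

    def edges_of(self, gid: int) -> List[Edge]:
        return [e for e in self.edges.values() if gid in e.graph_ids]


# Single-graph operator (hypothetical): aggregate a value onto the graph head.
def aggregate_vertex_count(store: GraphStore, gid: int) -> int:
    count = len(store.vertices_of(gid))
    store.heads[gid].properties["vertexCount"] = count
    return count


# Operator on a set of graphs (hypothetical): keep graphs whose head matches.
def select(store: GraphStore, gids: List[int],
           predicate: Callable[[GraphHead], bool]) -> List[int]:
    return [gid for gid in gids if predicate(store.heads[gid])]


# Binary graph operator (hypothetical): vertex ids contained in both graphs.
def overlap_vertex_ids(store: GraphStore, g1: int, g2: int) -> Set[int]:
    return {v.id for v in store.vertices_of(g1)} & {v.id for v in store.vertices_of(g2)}


# Usage: two communities in a small, made-up social network.
store = GraphStore()
store.heads[1] = GraphHead(1, "Community", {"topic": "databases"})
store.heads[2] = GraphHead(2, "Community", {"topic": "hadoop"})
store.vertices[10] = Vertex(10, "Person", {"name": "Alice"}, {1, 2})
store.vertices[11] = Vertex(11, "Person", {"name": "Bob", "city": "Leipzig"}, {1})
store.vertices[12] = Vertex(12, "Person", {"name": "Carol"}, {2})
store.vertices[13] = Vertex(13, "Person", {"name": "Dave"}, {1})
store.edges[100] = Edge(100, "knows", {}, 10, 11, {1})

for gid in store.heads:
    aggregate_vertex_count(store, gid)

big = select(store, list(store.heads), lambda h: h.properties["vertexCount"] >= 3)
shared = overlap_vertex_ids(store, 1, 2)
print(big)     # [1]
print(shared)  # {10}
```

Because "Alice" is a member of both logical graphs, the same vertex is shared rather than copied. The aggregation result is stored as an ordinary property on the graph head, so later operators such as `select` can filter on it.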
<p>
<a href="file/GvDB2015-slides-Gradoop.pdf">Keynote slides</a>