<p style="text-align:justify;">
The use of the MapReduce framework for iterative graph algorithms is challenging. To achieve high performance it is critical to limit the amount of intermediate results as well as the number of necessary iterations. We address these issues for the important problem of finding connected components in large graphs. We analyze an existing MapReduce algorithm, CCMR, and present techniques to improve its performance including a memory-based connection of subgraphs in the map phase. Our evaluation with several large graph datasets shows that the improvements can substantially reduce the amount of generated data by up to a factor of 8.8 and runtime by up to factor of 3.5.
</p>
<h2>Keywords</h2>
<ul>
<li>MapReduce</li>
<li>Hadoop</li>
<li>Connected Graph Components</li>
<li>Transitive Closure</li>
</ul>
<h2 id="bibtex_heading">BibTex</h2>
<pre id="bibtex_listing">
@article{DBLP:journals/dbsk/KolbSR14,
author = {Lars Kolb and Ziad Sehili and Erhard Rahm},
title = {{Iterative Computation of Connected Graph Components with MapReduce}},
journal = {Datenbank-Spektrum},
volume = {14},
number = {2},
year = {2014},
pages = {107-117},
ee = {http://dx.doi.org/10.1007/s13222-014-0154-1},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
</pre>