Don't Match Twice: Redundancy-free Similarity Computation with MapReduce

Proc. 2nd Intl. Workshop on Data Analytics in the Cloud (DanaC), 2013

2013 / 06

Paper

Abstract

<p style="text-align:justify;">
To improve the effectiveness of pair-wise similarity computation, state-of-the-art approaches assign objects to multiple overlapping clusters. This introduces redundant pair comparisons when similar objects share more than one cluster. We propose an approach that eliminates such redundant comparisons and that can be easily integrated into existing MapReduce implementations. We evaluate the approach on a real cloud infrastructure and show its effectiveness for all degrees of redundancy.
</p>
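The redundancy described in the abstract can be illustrated with a small simulation. The sketch below is not taken from the paper. It assumes one common way to avoid repeated comparisons: each object carries the full list of its cluster keys, and a pair is compared only in the smallest cluster the two objects share. The function names (`map_phase`, `reduce_phase`, `compare_pairs`) and the sample data are hypothetical, and the MapReduce shuffle is simulated in memory rather than run on Hadoop.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(objects):
    """Emit (cluster_key, (object_id, all_keys)) once per cluster of each object."""
    for obj_id, keys in objects.items():
        annotated = tuple(sorted(keys))  # every copy carries the object's full key list
        for key in annotated:
            yield key, (obj_id, annotated)


def reduce_phase(key, members):
    """Yield a pair only if `key` is the smallest cluster both objects share."""
    for (a, keys_a), (b, keys_b) in combinations(sorted(members), 2):
        # Both objects are in this cluster, so the intersection is never empty.
        if min(set(keys_a) & set(keys_b)) == key:
            yield (a, b)


def compare_pairs(objects):
    """Simulate the shuffle step in memory and collect the pairs to compare."""
    groups = defaultdict(list)
    for key, value in map_phase(objects):
        groups[key].append(value)
    pairs = []
    for key in sorted(groups):
        pairs.extend(reduce_phase(key, groups[key]))
    return pairs


# Hypothetical input: objects "a" and "b" share two clusters.
objects = {
    "a": {"c1", "c2"},
    "b": {"c1", "c2"},
    "c": {"c2"},
}
pairs = compare_pairs(objects)
print(pairs)
```

Without the check in `reduce_phase`, the pair (a, b) would be compared in both `c1` and `c2`. With it, the pair is handled only in `c1`, and no coordination between reduce tasks is needed because each task decides locally from the annotated key lists.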

<h2>Keywords</h2>
<ul>
<li>MapReduce</li>
<li>Hadoop</li>
<li>Pairwise similarity computation</li>
<li>Redundancy</li>
<li>Overlapping clustering</li>
</ul>

<h2 id="bibtex_heading">BibTeX</h2>
<pre id="bibtex_listing">
@inproceedings{Kolb:2013:DMT:2486767.2486768,
author = {Kolb, Lars and Thor, Andreas and Rahm, Erhard},
title = {{Don't Match Twice: Redundancy-free Similarity Computation with MapReduce}},
booktitle = {Proceedings of the Second Workshop on Data Analytics in the Cloud},
series = {DanaC '13},
year = {2013},
pages = {1--5},
url = {http://doi.acm.org/10.1145/2486767.2486768}
}
</pre>

Database Group Leipzig

within the department of computer science
