Interactive Visualization of Large Similarity Graphs and Entity Resolution Clusters
Proc. EDBT 2018
Entity Resolution (ER) identifies semantically equivalent entities, e.g., records describing the same product or customer. It is a crucial and challenging step when integrating heterogeneous (big) data sources. ER approaches typically compute a similarity graph in which vertices represent entities and edges (links) connect similar entities. Different clustering algorithms can then be applied to such similarity graphs to determine groups of matching entities. In this demonstration paper, we introduce a new interactive tool for visualizing, and thus helping to analyze, large similarity graphs and large sets of ER clusters. Users can intuitively investigate the link and cluster structure to identify potential problems such as overly large clusters, cluster overlaps, or singletons, which might indicate that the ER result needs repair. To support large graphs, computation-intensive tasks such as layout computation and sampling are executed on the server side as parallel or serial processes. The demo walks through different matching and clustering tasks and allows users to interactively explore the results.
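
To make the terminology concrete, the following minimal Python sketch builds clusters from a small similarity graph and flags two of the problem cases mentioned above, singletons and overly large clusters. It is an illustration under stated assumptions, not the method used by the tool: connected components stands in for the unspecified clustering algorithms (it cannot produce overlapping clusters, so that case is not covered), and the function names, the similarity threshold of 0.8, the size limit, and the sample data are all hypothetical.

```python
from collections import defaultdict


def connected_components(vertices, edges, threshold=0.8):
    """Cluster entities by linking pairs whose similarity is >= threshold.

    Assumption: connected components is used only as the simplest possible
    clustering; the paper does not prescribe a specific algorithm here.
    """
    parent = {v: v for v in vertices}

    def find(v):
        # Union-find lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for a, b, sim in edges:
        if sim >= threshold:  # keep only sufficiently similar links
            parent[find(a)] = find(b)

    clusters = defaultdict(set)
    for v in vertices:
        clusters[find(v)].add(v)
    return list(clusters.values())


def flag_suspicious(clusters, max_size):
    """Return (singletons, oversized) clusters that may need inspection."""
    singletons = [c for c in clusters if len(c) == 1]
    oversized = [c for c in clusters if len(c) > max_size]
    return singletons, oversized


# Hypothetical sample data: entity ids and (id, id, similarity) links.
vertices = ["a1", "a2", "b1", "b2", "b3", "c1"]
edges = [
    ("a1", "a2", 0.95),
    ("b1", "b2", 0.90),
    ("b2", "b3", 0.85),
    ("a2", "b1", 0.40),  # below the threshold, so it does not merge clusters
]

clusters = connected_components(vertices, edges, threshold=0.8)
singletons, oversized = flag_suspicious(clusters, max_size=2)
```

With this sample input, the entity `c1` has no links and ends up as a singleton, while the chain `b1`, `b2`, `b3` exceeds the assumed size limit of two; these are the kinds of clusters a user would want to inspect visually.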