Database Group Leipzig

within the department of computer science
Interactive Visualization of Large Similarity Graphs and Entity Resolution Clusters

Rostami, M. ; Saeedi, A. ; Peukert, E. ; Rahm, E.

Proc. EDBT 2018

2018 / 03

Other

Abstract

Entity Resolution (ER) identifies semantically equivalent entities, e.g., records describing the same product or customer. It is a crucial and challenging step when integrating heterogeneous (big) data sources. ER approaches typically compute a similarity graph in which vertices represent entities and edges (links) connect similar entities. Different clustering algorithms can be applied to such similarity graphs to determine the final groups of matching entities. In this demonstration paper, we introduce a new interactive tool to visualize, and thus help analyze, large similarity graphs and large sets of ER clusters. Users can intuitively investigate the link and cluster structure to identify potential problems, such as overly large clusters, cluster overlaps, or singletons, that might indicate the need for repair activities on the ER result. To support large graphs, computation-intensive tasks like layout computation and sampling are executed on the server side as parallel or serial processes. The demo walks through different matching and clustering tasks and allows users to interactively explore the results.
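
To make the terms in the abstract concrete, the following is a minimal sketch of a similarity graph and of the kind of cluster checks the abstract mentions (singletons and overly large clusters). It is not the tool's implementation: the function names, the similarity threshold, and the `max_size` limit are all hypothetical, and connected components is used here only as the simplest possible stand-in for the clustering algorithms the paper refers to. Because connected components never produce overlapping clusters, cluster overlaps are not covered by this sketch.

```python
from collections import defaultdict


def filter_links(links, threshold):
    """Keep only links whose similarity reaches the (assumed) threshold."""
    return [(u, v, sim) for u, v, sim in links if sim >= threshold]


def connected_components(vertices, edges):
    """Group entities into clusters by following similarity links.

    Stand-in clustering: every connected component becomes one cluster.
    """
    adjacency = defaultdict(set)
    for u, v, _sim in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    seen = set()
    clusters = []
    for start in vertices:
        if start in seen:
            continue
        # Depth-first traversal collecting everything reachable from start.
        stack = [start]
        seen.add(start)
        component = set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        clusters.append(component)
    return clusters


def flag_clusters(clusters, max_size):
    """Return (singletons, oversized) clusters as candidates for inspection."""
    singletons = [c for c in clusters if len(c) == 1]
    oversized = [c for c in clusters if len(c) > max_size]
    return singletons, oversized


if __name__ == "__main__":
    # Hypothetical entities and pairwise similarities.
    vertices = ["a1", "a2", "a3", "b1", "b2", "c1"]
    links = [
        ("a1", "a2", 0.9),
        ("a2", "a3", 0.8),
        ("b1", "b2", 0.95),
        ("a3", "b1", 0.3),  # weak link, dropped by the threshold
    ]
    edges = filter_links(links, threshold=0.5)
    clusters = connected_components(vertices, edges)
    singletons, oversized = flag_clusters(clusters, max_size=2)
    print("clusters:", clusters)
    print("singletons:", singletons)
    print("oversized:", oversized)
```

In this toy input, the weak link between `a3` and `b1` is filtered out, so the `a` and `b` entities stay in separate clusters, while `c1` has no links at all and ends up as a singleton. A visual tool such as the one described would let a user spot the same situations by inspection rather than by fixed limits.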