Database Group Leipzig

within the department of computer science

Incremental Clustering on Linked Data

Nentwig, M. ; Rahm, E.

Proc. IEEE International Conference on Data Mining Workshop, ICDMW 2018, Singapore

2018 / 11

Abstract

Data integration in the Web of Data is not limited to the pairwise linking of entities but often requires clustering entities from different sources, e.g., within knowledge graphs. Such entity clustering should not only scale to large data volumes and many sources but also be dynamic, so that it can deal with continuously changing sources and incorporate new ones. Previous entity clustering approaches are mostly static, focusing on the one-time linking and clustering of entities from a few sources. In this paper, we propose and evaluate new scalable approaches for incremental entity clustering that support the continuous addition of new entities and data sources. The implementation is based on the distributed processing framework Apache Flink. A detailed performance evaluation with real and synthetically customized datasets shows the effectiveness and scalability of the incremental clustering approaches.
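
To make the general idea of incremental entity clustering concrete, the following is a minimal single-machine sketch in Python. It is not the approach proposed in the paper and does not use Apache Flink. The `Entity` and `Cluster` types, the trigram-based `similarity` function, the `add_entity` helper, and the threshold of 0.5 are all hypothetical choices made for illustration only. The sketch shows one simple way a new entity could be added to existing clusters without reclustering everything: it joins the most similar cluster if the similarity is high enough, and otherwise starts a new cluster.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    source: str  # hypothetical: name of the data source the entity comes from
    label: str   # hypothetical: label used by the toy similarity below


@dataclass
class Cluster:
    members: list = field(default_factory=list)


def similarity(a: str, b: str) -> float:
    """Toy Jaccard similarity over lowercase character trigrams (assumption)."""
    def trigrams(s: str) -> set:
        s = s.lower()
        return {s[i:i + 3] for i in range(max(len(s) - 2, 1))}

    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)


def add_entity(clusters: list, entity: Entity, threshold: float = 0.5) -> list:
    """Add one new entity: join the best matching cluster or open a new one."""
    best, best_score = None, 0.0
    for cluster in clusters:
        # Score a cluster by its most similar member (an illustrative choice).
        score = max(similarity(entity.label, m.label) for m in cluster.members)
        if score > best_score:
            best, best_score = cluster, score
    if best is not None and best_score >= threshold:
        best.members.append(entity)
    else:
        clusters.append(Cluster(members=[entity]))
    return clusters


# Entities arrive one at a time, possibly from a source not seen before.
clusters = []
for e in [Entity("A", "Leipzig"), Entity("A", "Berlin"), Entity("B", "leipzig")]:
    add_entity(clusters, e)
```

In this toy run, the entity from the new source `B` joins the existing cluster of its counterpart from source `A`, while the existing clusters are otherwise left untouched. A realistic system would additionally need blocking, richer similarity measures, and distributed execution, none of which are shown here.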
