Evaluating Instance-based Matching of Web Directories

Maßmann, S. ; Rahm, E.

11th International Workshop on the Web and Databases (WebDB 2008)

2008 / 06

Abstract

Web directories such as Yahoo or Google Directory semantically
categorize many websites and are heavily used to find relevant
websites in a particular domain of interest. Mappings between
different web directories can be useful to integrate the information
of different directories and to improve query and search results.
The creation of such mappings is a challenging match task due to
the large size and heterogeneity of web directories. Our study
evaluates to what degree current match technology can be used to
automatically determine directory mappings. We further propose
specific instance-based match techniques utilizing the URL, name
and description of the categorized websites. We evaluate the
instance-based approaches for different similarity measures and
study their combination with metadata-based approaches.
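To make the idea of instance-based matching concrete, here is a minimal sketch that compares two categories by the websites they contain. It is not the authors' implementation: the function names, the URL normalization rules, the choice of the Dice coefficient, and the threshold of 0.5 are all assumptions made for illustration. Only the URL of each website is used here; names and descriptions could be compared in a similar way with a token-based measure.

```python
from urllib.parse import urlsplit


def normalize_url(url):
    """Reduce a URL to host + path (illustrative normalization, an assumption)."""
    # urlsplit only fills netloc when a scheme is present, so add one if missing.
    parts = urlsplit(url if "://" in url else "http://" + url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def dice(a, b):
    """Dice coefficient of two sets; 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def category_similarity(sites_a, sites_b):
    """Instance-based similarity: overlap of the normalized URLs in two categories."""
    urls_a = {normalize_url(site["url"]) for site in sites_a}
    urls_b = {normalize_url(site["url"]) for site in sites_b}
    return dice(urls_a, urls_b)


def match_directories(dir_a, dir_b, threshold=0.5):
    """Return (category_a, category_b, similarity) for pairs at or above the threshold."""
    matches = []
    for cat_a, sites_a in dir_a.items():
        for cat_b, sites_b in dir_b.items():
            sim = category_similarity(sites_a, sites_b)
            if sim >= threshold:
                matches.append((cat_a, cat_b, sim))
    # Best matches first.
    return sorted(matches, key=lambda m: -m[2])


# Hypothetical toy directories: category name -> list of categorized websites.
dir_a = {
    "Databases": [
        {"url": "http://www.postgresql.org/"},
        {"url": "https://sqlite.org"},
    ],
}
dir_b = {
    "DBMS": [
        {"url": "postgresql.org"},
        {"url": "http://sqlite.org/"},
        {"url": "http://mysql.com"},
    ],
    "Cooking": [{"url": "http://recipes.example"}],
}

result = match_directories(dir_a, dir_b)
print(result)
```

Comparing every category pair is quadratic in the number of categories, so for directories of realistic size a sketch like this would need some form of candidate pruning, for example an inverted index from URL to category.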
