Evaluating Instance-based Matching of Web Directories


Massmann, S.; Rahm, E.
11th International Workshop on the Web and Databases (WebDB 2008)


Web directories such as Yahoo or Google Directory semantically
categorize many websites and are heavily used to find relevant
websites in a particular domain of interest. Mappings between
different web directories can be useful to integrate the information
of different directories and to improve query and search results.
The creation of such mappings is a challenging match task due to
the large size and heterogeneity of web directories. Our study
evaluates to what degree current match technology can be used to
automatically determine directory mappings. We further propose
specific instance-based match techniques utilizing the URL, name
and description of the categorized websites. We evaluate the
instance-based approaches for different similarity measures and
study their combination with metadata-based approaches.
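
As a rough illustration of the instance-based idea, the sketch below scores two categories by the overlap of the website URLs filed under them. The abstract does not say which similarity measures the study uses, so the Dice coefficient, the URL normalization, the 0.5 threshold, and the names normalize_url, dice_similarity and match_categories are all assumptions made for this example, not the authors' method.

```python
def normalize_url(url: str) -> str:
    """Crude normalization (an assumption): drop scheme, 'www.' and trailing slash."""
    u = url.strip().lower()
    for prefix in ("http://", "https://"):
        if u.startswith(prefix):
            u = u[len(prefix):]
            break
    if u.startswith("www."):
        u = u[4:]
    return u.rstrip("/")


def dice_similarity(urls_a, urls_b) -> float:
    """Dice coefficient over the normalized URL sets of two categories."""
    a = {normalize_url(u) for u in urls_a}
    b = {normalize_url(u) for u in urls_b}
    if not a or not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def match_categories(dir_a, dir_b, threshold=0.5):
    """Compare every category pair; keep those with similarity >= threshold.

    dir_a and dir_b map a category name to the list of URLs filed under it.
    The result is a list of (category_a, category_b, similarity) tuples,
    sorted by descending similarity.
    """
    result = []
    for cat_a, urls_a in dir_a.items():
        for cat_b, urls_b in dir_b.items():
            sim = dice_similarity(urls_a, urls_b)
            if sim >= threshold:
                result.append((cat_a, cat_b, sim))
    return sorted(result, key=lambda t: -t[2])
```

Comparing all category pairs is quadratic in the number of categories, so a directory of realistic size would need some form of candidate pruning first. The same skeleton could be reused with a string similarity over website names or descriptions in place of the URL overlap.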

