Köpcke, H. ; Thor, A. ; Rahm, E.

Learning-based approaches for matching web data entities

IEEE Internet Computing 14(4), 2010

2010 / 07

Paper

Further information: http://doi.ieeecomputersociety.org/10.1109/MIC.2010.58

Abstract

Entity matching is a key task for data integration and especially challenging for web data. Effective entity matching typically requires the combination of several match techniques and finding suitable configuration parameters such as similarity thresholds. We investigate to which degree the use of machine learning helps to semi-automatically determine suitable match strategies with a limited amount of manual effort for training. We use a new framework, FEVER, to evaluate several learning-based approaches for matching different sets of web data entities. In particular, we study different approaches to select training data and study how much training is needed to find effective combined match strategies and their configuration.
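
To make the idea of a learned, combined match strategy concrete, the following is a minimal sketch and not the FEVER framework or any learner evaluated in the paper. It assumes two simple matchers (character-level similarity and token Jaccard), a handful of hypothetical labeled product pairs, and a plain grid search that picks the matcher weight and similarity threshold with the best F1 on the training pairs. All function names, parameters, and data are illustrative assumptions.

```python
from difflib import SequenceMatcher
from itertools import product

# Hypothetical labeled training pairs: (offer_a, offer_b, is_same_entity).
TRAINING_PAIRS = [
    ("Canon EOS 450D Digital Camera", "canon eos 450d camera", True),
    ("Apple iPod nano 8GB", "Apple iPod nano 8 GB silver", True),
    ("Canon EOS 450D Digital Camera", "Nikon D60 Digital Camera", False),
    ("Apple iPod nano 8GB", "Sony Walkman NWZ", False),
]


def edit_similarity(a, b):
    """Character-level similarity in [0, 1], computed with difflib."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def token_jaccard(a, b):
    """Jaccard overlap of the lowercase whitespace-separated tokens."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def combined_score(a, b, weight):
    """Weighted combination of the two matchers (weight applies to edit similarity)."""
    return weight * edit_similarity(a, b) + (1.0 - weight) * token_jaccard(a, b)


def f1_score(predicted, actual):
    """F1 of boolean match predictions against boolean labels."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def learn_strategy(pairs, steps=10):
    """Grid-search the weight and threshold that maximize F1 on the training pairs.

    Returns (best_f1, weight, threshold). This stands in for a real learner
    purely for illustration.
    """
    grid = [i / steps for i in range(steps + 1)]
    labels = [label for _, _, label in pairs]
    best = (-1.0, None, None)
    for weight, threshold in product(grid, grid):
        predicted = [combined_score(a, b, weight) >= threshold for a, b, _ in pairs]
        score = f1_score(predicted, labels)
        if score > best[0]:
            best = (score, weight, threshold)
    return best


def is_match(a, b, weight, threshold):
    """Apply a learned strategy to a new pair of entity descriptions."""
    return combined_score(a, b, weight) >= threshold
```

Calling learn_strategy(TRAINING_PAIRS) yields a weight and threshold that can then be passed to is_match for unseen pairs. With so few training pairs the learned configuration only fits this toy data; how much training data is needed in practice is exactly the question the paper studies.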