Köpcke, H. ; Thor, A. ; Thomas, S. ; Rahm, E.

Tailoring entity resolution for matching product offers

Proc. 15th Intl. Conf. on Extending Database Technology (EDBT), 2012, pp. 545-550

2012 / 03

Paper

Abstract

Product matching is a challenging variation of entity resolution to identify representations and offers referring to the same product. Product matching is highly difficult due to the broad spectrum of products, many similar but different products, frequently missing or wrong values, and the textual nature of product titles and descriptions. We propose the use of tailored approaches for product matching based on a preprocessing of product offers to extract and clean new attributes usable for matching. In particular, we propose a new approach to extract and use product codes to identify products and distinguish them from similar product variations. We evaluate the effectiveness of the proposed approaches with challenging real-life datasets with product offers from online shops. We also show that the UPC information in product offers is often error-prone and can lead to insufficient match decisions.
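
Illustration (not part of the paper): the idea of extracting product codes from offer titles and using them to tell similar variants apart can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' extraction method. The heuristic (a token of at least four characters that mixes letters and digits), the function names `candidate_codes` and `same_product_by_code`, and the sample titles are all hypothetical.

```python
import re

# Assumption: tokens are runs of letters/digits that may contain inner
# hyphens or slashes, e.g. "DSC-W570". This is an illustrative rule only.
TOKEN = re.compile(r"[A-Za-z0-9][A-Za-z0-9\-/]*[A-Za-z0-9]")


def candidate_codes(title: str) -> list[str]:
    """Return tokens that look like product codes (hypothetical heuristic)."""
    codes = []
    for tok in TOKEN.findall(title):
        has_digit = any(c.isdigit() for c in tok)
        has_alpha = any(c.isalpha() for c in tok)
        # Keep tokens of length >= 4 that mix letters and digits.
        if len(tok) >= 4 and has_digit and has_alpha:
            codes.append(tok.upper())
    return codes


def same_product_by_code(title_a: str, title_b: str) -> bool:
    """Treat two offers as a match if they share a candidate code."""
    return bool(set(candidate_codes(title_a)) & set(candidate_codes(title_b)))


if __name__ == "__main__":
    a = "Sony Cyber-shot DSC-W570 (silver)"
    b = "Sony DSC-W570 black"
    c = "Sony DSC-W580 black"
    print(candidate_codes(a))
    print(same_product_by_code(a, b), same_product_by_code(a, c))
```

A rule this simple also picks up specification tokens such as "18MP", so a practical pipeline would need additional filtering or verification of the candidates.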