Do, H. ; Rahm, E.

Matching Large Schemas: Approaches and Evaluation

Information Systems, Volume 32, Issue 6, September 2007, Pages 857-885

2007

Paper

Abstract

Current schema matching approaches still have to improve for large and complex schemas. The large search space increases the likelihood of false matches as well as execution times. Further difficulties for schema matching are posed by the high expressive power and versatility of modern schema languages, in particular user-defined types and classes, component reuse capabilities, and support for distributed schemas and namespaces. To better assist the user in matching complex schemas, we have developed a new generic schema matching tool, COMA++, providing a library of individual matchers and a flexible infrastructure to combine the matchers and refine their results. Different match strategies can be applied, including a new scalable approach to identify context-dependent correspondences between schemas with shared elements and a fragment-based match approach which decomposes a large match task into smaller tasks. We conducted a comprehensive evaluation of the match strategies using large e-Business standard schemas. Besides providing helpful insights for future match implementations, the evaluation demonstrated the practicability of our system for matching large schemas.

Keywords: Schema matching; Schema matching evaluation; Data integration

http://dx.doi.org/10.1016/j.is.2006.09.002