Melnik, S. ; Garcia-Molina, H. ; Rahm, E.

Similarity Flooding: A Versatile Graph Matching Algorithm and its Application to Schema Matching

Proc. 18th International Conference on Data Engineering (ICDE), San Jose, 2002




Matching elements of two data schemas or two data instances
plays a key role in data warehousing, e-business,
or even biochemical applications. In this paper we present
a matching algorithm based on a fixpoint computation that
is usable across different scenarios. The algorithm takes
two graphs (schemas, catalogs, or other data structures)as
input, and produces as output a mapping between corresponding
nodes of the graphs. Depending on the matching
goal, a subset of the mapping is chosen using filters. After
our algorithm runs, we expect a human to check and if necessary
adjust the results. As a matter of fact, we evaluate
the ‘accuracy’ of the algorithm by counting the number of
needed adjustments. We conducted a user study, in which
our accuracy metric was used to estimate the labor savings
that the users could obtain by utilizing our algorithm
to obtain an initial matching. Finally, we illustrate how our
matching algorithm is deployed as one of several high-level
operators in an implemented testbed for managing information
models and mappings.