COMA++
What others say about COMA/COMA++
“COMA++ is a generic, composite matcher with very effective match results.” [Duchateau et al., OTM 2008]
“COMA++ is one of the best available schema matchers that enjoys from combining several available methods for schema matching” [Nezhad et al., WWW 2007]
“The best recall and the best F-measure were achieved by COMA++.” [Kappel et al., BTW workshop 2007]
“…the COMA system … was the first to clearly articulate and embody the multi-component architecture…” [Lee et al., VLDB Journal 2007]
“The most complete tool”. [Manakanatas et al., DISWEB 2006]
“COMA is the first work to address engineering issues of a schema matching system.” [Bernstein et al., Sigmod Record 2004]
“COMA with the NamePath+Leaves matcher combination is the fastest prototype in our evaluation.” [Yatskevich, Technical Report 2003]
Schema and Ontology Matching with COMA++
This page gives an overview of the schema matching systems COMA++ and COMA, developed at the University of Leipzig. Please consult the relevant papers (below) for a more detailed discussion of our approach. Download the COMA++ prototype or try out the COMA++ Web Edition.
Figure 1. User Interface of COMA++ (Version 2009)
Introduction
Schema and ontology matching aim at identifying semantic correspondences between metadata structures or models such as database schemas, XML message formats, and ontologies. Solving such match problems is of key importance to service interoperability and data integration in numerous application domains. The goal of automatic matching is to keep the manual effort low.
COMA++ is a schema and ontology matching tool. It extends our previous prototype COMA, which uses a composite approach to combine different match algorithms. Furthermore, it offers a comprehensive infrastructure to solve large real-world match problems. The graphical interface offers a variety of interactions, allowing the user to influence the match process in many ways. COMA++ functionality is used within the new QuickMig prototype, which focuses on the generation of executable mappings for data migration.
Architecture
The GUI provides access to the five main parts of COMA++: the Repository to persistently store all match-related data, the Model and Mapping Pools to manage schemas, ontologies and mappings in memory, the Match Customizer to configure matchers and match strategies, and the Execution Engine to perform match operations. Automatic match processing is performed in the Execution Engine in the form of match iterations, each of which consists of the same three steps: component identification to determine the relevant schema components for matching, matcher execution to apply multiple matchers and compute component similarities, and similarity combination to combine the matcher-specific similarities and derive the correspondences between the components. The obtained mapping can be used as input to the next iteration for further refinement. Each iteration can be configured individually using the alternatives supported by the Match Customizer, i.e. the types of components to be considered, the matchers for similarity computation, and the strategies for similarity combination.

Figure 2. System Architecture of COMA++

Figure 3. Match Processing in COMA++
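The three steps of a match iteration described above can be illustrated with a minimal Python sketch. It is not the COMA++ implementation: the function names, the two toy matchers (a name matcher based on `difflib` and a data type matcher), the averaging, and the threshold of 0.75 are assumptions chosen only for illustration.

```python
from difflib import SequenceMatcher
from statistics import mean


def name_similarity(a, b):
    """Toy name matcher: string similarity of two element names in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def type_similarity(a_type, b_type):
    """Toy data type matcher: 1.0 for identical types, otherwise 0.0."""
    return 1.0 if a_type == b_type else 0.0


def match_iteration(source, target, threshold=0.75):
    """Run one match iteration over two schemas given as {name: type}."""
    # Step 1: component identification (here: every element is a component).
    src = list(source.items())
    tgt = list(target.items())

    # Step 2: matcher execution (each matcher yields one similarity per pair).
    sims = {}
    for s_name, s_type in src:
        for t_name, t_type in tgt:
            sims[(s_name, t_name)] = [
                name_similarity(s_name, t_name),
                type_similarity(s_type, t_type),
            ]

    # Step 3: similarity combination (assumed: average the matcher results,
    # then keep the best target per source element if it passes the threshold).
    mapping = {}
    for s_name, _ in src:
        best_name = max(target, key=lambda t: mean(sims[(s_name, t)]))
        score = mean(sims[(s_name, best_name)])
        if score >= threshold:
            mapping[s_name] = (best_name, score)
    return mapping


source = {"CustomerName": "string", "ZipCode": "string", "Foo": "date"}
target = {"customer_name": "string", "PostalCode": "string", "Quantity": "int"}
mapping = match_iteration(source, target)
```

In this sketch the returned mapping could be fed into a further iteration, for example to restrict the candidate pairs, which mirrors the refinement idea described above.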
Model Support
Using a generic data representation, COMA++ uniformly supports schemas and ontologies, e.g. the powerful standard languages W3C XML Schema (XSD) and Web Ontology Language (OWL). Further formats supported by COMA++ include XML Data Reduced (XDR) and relational schemas.
- XSD Support: COMA++ supports very large schemas that are distributed over a multitude of XSD documents and that span various namespaces.
- OWL Support: COMA++ currently supports matching between ontologies written in W3C OWL-Lite. OWL class hierarchies and relationship types are read in via the OWL API and mapped to the generic model representation based on directed acyclic graphs.
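To illustrate what a graph-based model representation implies, the following sketch stores a small schema as a directed acyclic graph and enumerates all paths from the root. The dictionary layout, the function name `enumerate_paths`, and the example schema are hypothetical and do not reflect COMA++ internals. The point is that a node with several parents, such as `Address` below, is stored once but reachable along several paths.

```python
def enumerate_paths(graph, root):
    """Return every root-to-node path of a DAG given as {parent: [children]}."""
    paths = []

    def walk(node, prefix):
        path = prefix + (node,)
        paths.append(path)
        for child in graph.get(node, []):
            walk(child, path)

    walk(root, ())
    return paths


# Hypothetical schema: Address is shared by ShipTo and BillTo.
schema = {
    "PurchaseOrder": ["ShipTo", "BillTo"],
    "ShipTo": ["Address"],
    "BillTo": ["Address"],
    "Address": ["Street", "City"],
}

paths = enumerate_paths(schema, "PurchaseOrder")
address_paths = [p for p in paths if p[-1] == "Address"]
```

Here `address_paths` contains one path through `ShipTo` and one through `BillTo`, although the graph holds only a single `Address` node.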
Matchers and Match Strategies
COMA++ supports a comprehensive and extensible library of individual matchers, which can be selected to perform a match operation. Using the GUI, it is easy to construct new, more powerful matchers by combining existing ones. Moreover, it is possible to specify match strategies as workflows of multiple match steps, which makes it possible to divide complex match tasks and solve them successively in multiple stages. Due to the flexibility to configure matchers and match strategies, COMA++ can be used not only to solve match problems but also to comparatively evaluate the effectiveness of different match algorithms.
Using the flexible infrastructure for combining and refining matcher results, match processing is supported as a workflow of several match steps. We implemented specific workflows (i.e. strategies) for context-dependent, fragment-based, and reuse-oriented matching:
- Context-dependent Matching. We address the problem of context-dependent matching, which is necessary for schemas with shared elements. Although required by many applications, such as transformation of XML messages, identifying context-dependent correspondences is mostly ignored by previous work. COMA++ supports several strategies, which are also scalable for large schemas, to obtain context-dependent match results.
- Fragment-based Matching. To cope with large schemas, COMA++ implements a fragment-based match processing approach. Following the divide-and-conquer idea, it decomposes a large match problem into smaller subproblems by matching at the level of schema fragments. With the reduced problem size, we aim not only at better execution time but also at better match quality compared to schema-level matching.
- Reuse-oriented Matching. We pursue the reuse of previously determined match results. The main mechanism for our approach is a MatchCompose operation, which performs a join-like operation on a mapping path consisting of two or more mappings, such as A-B, B-C, and C-D, successively sharing a common schema, to derive a new mapping between A and D.
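The join-like behaviour of MatchCompose can be sketched as follows. This is a simplified illustration, not the COMA++ implementation: a mapping is assumed to be a list of `(source, target, similarity)` triples, the similarity of a derived correspondence is assumed to be the average of the two composed similarities, and the function names and example element names are hypothetical.

```python
from functools import reduce


def match_compose(ab, bc):
    """Compose mappings A-B and B-C into A-C by joining on the shared B elements."""
    # Index the second mapping by its source element (the join attribute).
    by_b = {}
    for b, c, sim_bc in bc:
        by_b.setdefault(b, []).append((c, sim_bc))

    composed = {}
    for a, b, sim_ab in ab:
        for c, sim_bc in by_b.get(b, []):
            sim = (sim_ab + sim_bc) / 2  # assumption: average of both similarities
            # If several join partners lead to the same pair, keep the best one.
            composed[(a, c)] = max(composed.get((a, c), 0.0), sim)
    return sorted((a, c, sim) for (a, c), sim in composed.items())


def compose_path(mappings):
    """Compose a mapping path such as [A-B, B-C, C-D] into a single A-D mapping."""
    return reduce(match_compose, mappings)


ab = [("a.name", "b.fullName", 0.75), ("a.zip", "b.postcode", 1.0)]
bc = [("b.fullName", "c.customer", 0.5), ("b.postcode", "c.zipCode", 1.0)]
cd = [("c.customer", "d.client", 1.0)]

a_to_c = match_compose(ab, bc)
a_to_d = compose_path([ab, bc, cd])
```

Elements without a join partner in the next mapping drop out of the result, as `a.zip` does in `a_to_d`. This is one reason why a composed mapping may need to be complemented by regular matchers.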
Benchmark
In order to compare schema and ontology matchers with COMA++, a couple of mapping scenarios can be downloaded here.
Publications
Downloads
Contact/Project Members
- Prof. Dr. Erhard Rahm
- Patrick Arnold
- Hong-Hai Do
- David Aumüller


















