COMA++
What others say about COMA/COMA++:
“COMA++ is one of the best available schema matchers that enjoys from combining several available methods for schema matching” [Nezhad et al., WWW 2007]
“…the COMA system … was the first to clearly articulate and embody the multi-component architecture…” [Lee et al., VLDB Journal 2007]
“The most complete tool”. [Manakanatas et al., DISWEB 2006]
“COMA with the NamePath+Leaves matcher combination is the fastest prototype in our evaluation.” [Yatskevich, Technical Report 2003]
Schema and Ontology Matching with COMA++
This page gives an overview of the schema matching systems COMA++ and COMA, developed at the University of Leipzig. Please consult the relevant papers (below) for a more detailed discussion of our approach. Download the COMA++ prototype or try out the COMA++ Web Edition (here or here).
Figure 1. User Interface of COMA++
Introduction
Schema and ontology matching aim at identifying semantic correspondences between metadata structures or models such as database schemas, XML message formats, and ontologies. Solving such match problems is of key importance to service interoperability and data integration in numerous application domains. The goal is to keep manual effort low.
COMA++ is a schema and ontology matching tool. It extends our previous prototype COMA, which uses a composite approach to combine different match algorithms. Furthermore, it offers a comprehensive infrastructure to solve large real-world match problems. The graphical interface offers a variety of interactions, allowing the user to influence the match process in many ways. COMA++ functionality is used within the new QuickMig prototype, which focuses on the generation of executable mappings for data migration.
Architecture
The GUI provides access to the five main parts of COMA++: the Repository, which persistently stores all match-related data; the Model and Mapping Pools, which manage schemas, ontologies and mappings in memory; the Match Customizer, which is used to configure matchers and match strategies; and the Execution Engine, which performs match operations.
Automatic match processing is performed in the Execution Engine in the form of match iterations, each of which takes place in three steps: component identification, which determines the relevant schema components for matching; matcher execution, which applies multiple matchers to compute component similarities; and similarity combination, which combines the matcher-specific similarities and derives the correspondences between the components. The obtained mapping can be used as input to the next iteration for further refinement. Each iteration can be individually configured using the alternatives supported by the Match Customizer, i.e. the types of components to be considered, the matchers for similarity computation, and the strategies for similarity combination.

Figure 2. System Architecture of COMA++

Figure 3. Match Processing in COMA++
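The three steps of a match iteration can be illustrated with a minimal Python sketch. It is not the COMA++ implementation: the two matchers, the averaging step, the threshold of 0.5 and all function names are assumptions chosen only to show how matcher-specific similarities might be computed and combined.

```python
from difflib import SequenceMatcher
from statistics import mean


def name_matcher(a, b):
    """Hypothetical matcher: edit-based similarity of two element names."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def trigram_matcher(a, b):
    """Hypothetical matcher: Dice coefficient over character trigrams."""
    def grams(s):
        return {s[i:i + 3] for i in range(len(s) - 2)}

    ga, gb = grams(a.lower()), grams(b.lower())
    if not ga or not gb:
        return 0.0
    return 2 * len(ga & gb) / (len(ga) + len(gb))


def match_iteration(source, target, matchers, threshold=0.5):
    # Step 1: component identification (here simply every pair of element names).
    pairs = [(s, t) for s in source for t in target]
    # Step 2: matcher execution, giving one similarity per matcher and pair.
    sims = {pair: [m(*pair) for m in matchers] for pair in pairs}
    # Step 3: similarity combination (assumed: average) and selection (assumed: threshold).
    combined = {pair: mean(values) for pair, values in sims.items()}
    return {pair: sim for pair, sim in combined.items() if sim >= threshold}


mapping = match_iteration(
    ["CustomerName", "ShipAddress"],
    ["customer_name", "ship_addr", "price"],
    [name_matcher, trigram_matcher],
)
for (src, tgt), sim in sorted(mapping.items()):
    print(f"{src} <-> {tgt}: {sim:.2f}")
```

Because the result is itself a mapping from element pairs to similarities, it could be passed into a further iteration with different matchers or a different combination strategy, mirroring the refinement loop described above.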
Model Support
Using a generic data representation, COMA++ uniformly supports schemas and ontologies, e.g. the powerful standard languages W3C XML Schema (XSD) and Web Ontology Language (OWL). Further formats supported by COMA++ include XML Data Reduced (XDR) and relational schemas.
- XSD Support: COMA++ supports very large schemas that are distributed over a multitude of XSD documents and that span various namespaces.
- OWL Support: COMA++ currently supports matching between ontologies written in W3C OWL-Lite. OWL class hierarchies and relationship types are read in via the OWL API and mapped to the generic model representation based on directed acyclic graphs.
Matchers and Match Strategies
COMA++ supports a comprehensive and extensible library of individual matchers, which can be selected to perform a match operation. Using the GUI, it is easy to construct new, more powerful matchers by combining existing ones. Moreover, it is possible to specify match strategies as workflows of multiple match steps, so that complex match tasks can be divided and solved successively in multiple stages. Due to the flexibility in configuring matchers and match strategies, COMA++ can be used not only to solve match problems but also to comparatively evaluate the effectiveness of different match algorithms.
COMA++ supports new approaches for ontology matching, in particular the utilization of shared taxonomies, by means of a so-called Taxonomy Matcher. To illustrate the taxonomy matcher, consider two beer ontologies to be matched. Suppose the first model contains an element called Weizen, the second an element called Kölsch. Both of these elements represent types of (German) beer, but they do not share any lexical similarity. The taxonomy matcher draws on the given beer taxonomy to deduce whether two elements are related semantically. In the current example, both Weizen and Kölsch are hyponyms of top fermented beer. That is, they share the same hypernym, and the matcher assigns a similarity value that depends on the distance between the two terms within the taxonomy.

Figure 4. Taxonomy Based Matching
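The beer example can be made concrete with a small Python sketch. The toy taxonomy, the function names and the formula 1 / (1 + distance) are assumptions for illustration only; the text above states just that the similarity depends on the distance between the two terms in the taxonomy.

```python
# Hypothetical toy taxonomy: each term points to its hypernym (parent).
TAXONOMY = {
    "Weizen": "top fermented beer",
    "Kölsch": "top fermented beer",
    "Pils": "bottom fermented beer",
    "top fermented beer": "beer",
    "bottom fermented beer": "beer",
}


def path_to_root(term):
    """Return the term followed by all of its ancestors up to the root."""
    path = [term]
    while path[-1] in TAXONOMY:
        path.append(TAXONOMY[path[-1]])
    return path


def taxonomy_distance(a, b):
    """Number of edges between a and b via their closest common ancestor, or None."""
    depth_in_b = {term: depth for depth, term in enumerate(path_to_root(b))}
    for depth_in_a, term in enumerate(path_to_root(a)):
        if term in depth_in_b:
            return depth_in_a + depth_in_b[term]
    return None


def taxonomy_similarity(a, b):
    """Assumed scoring: similarity decreases as the taxonomy distance grows."""
    distance = taxonomy_distance(a, b)
    return 0.0 if distance is None else 1.0 / (1.0 + distance)


print(taxonomy_similarity("Weizen", "Kölsch"))  # share the direct hypernym
print(taxonomy_similarity("Weizen", "Pils"))    # related only via "beer"
```

In this sketch Weizen and Kölsch are two edges apart, while Weizen and Pils are four edges apart, so the first pair receives the higher similarity even though neither pair is lexically similar.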
Using the flexible infrastructure for combining and refining matcher results, match processing is supported as a workflow of several match steps. We implemented specific workflows (i.e. strategies) for context-dependent, fragment-based, and reuse-oriented matching, respectively:
- Context-dependent Matching. We address the problem of context-dependent matching, which is necessary for schemas with shared elements. Although required by many applications, such as the transformation of XML messages, the identification of context-dependent correspondences has mostly been ignored in previous work. COMA++ supports several strategies for obtaining context-dependent match results, which also scale to large schemas.
- Fragment-based Matching. To cope with large schemas, COMA++ implements a fragment-based match processing approach. Following the divide-and-conquer idea, it decomposes a large match problem into smaller subproblems by matching at the level of schema fragments. With the reduced problem size, we aim not only at better execution time but also at better match quality compared to schema-level matching.
- Reuse-oriented Matching. We pursue the reuse of previously determined match results. The main mechanism for our approach is a MatchCompose operation, which performs a join-like operation on a mapping path consisting of two or more mappings, such as A-B, B-C, and C-D, successively sharing a common schema, to derive a new mapping between A and D.
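The join-like behaviour of MatchCompose described in the last item can be sketched in Python as follows. The representation of a mapping as a dictionary from element pairs to similarities, the averaging of similarities along a path, and the function names are assumptions made for illustration, not the COMA++ API.

```python
def match_compose(m1, m2):
    """Join two mappings on their shared schema.

    Each mapping is a dict {(source_element, target_element): similarity}.
    """
    composed = {}
    for (a, b1), s1 in m1.items():
        for (b2, c), s2 in m2.items():
            if b1 == b2:  # join condition: same element of the shared schema
                sim = (s1 + s2) / 2  # assumption: average the two similarities
                # If several paths lead from a to c, keep the best one.
                composed[(a, c)] = max(sim, composed.get((a, c), 0.0))
    return composed


def compose_path(*mappings):
    """Compose a mapping path such as A-B, B-C, C-D into a single A-D mapping."""
    result = mappings[0]
    for mapping in mappings[1:]:
        result = match_compose(result, mapping)
    return result


# Hypothetical mappings along the path A-B, B-C, C-D.
a_b = {("A.name", "B.title"): 1.0}
b_c = {("B.title", "C.label"): 0.5}
c_d = {("C.label", "D.caption"): 1.0}

print(compose_path(a_b, b_c, c_d))
```

Elements of A that have no counterpart in the intermediate schemas drop out of the composed mapping, which is why a reused result may need to be complemented by regular matchers.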
Publications
Downloads
Contact/Project Members
- Prof. Dr. Erhard Rahm
- Hong-Hai Do
- David Aumueller
- Sabine Massmann











