
COMA++


What others say about COMA/COMA++:

“COMA++ is one of the best available schema matchers that enjoys from combining several available methods for schema matching” [Nezhad et al., WWW 2007]

“…the COMA system … was the first to clearly articulate and embody the multi-component architecture…” [Lee et al., VLDB Journal 2007]

“The most complete tool”. [Manakanatas et al., DISWEB 2006]

“COMA with the NamePath+Leaves matcher combination is the fastest prototype in our evaluation.” [Yatskevich, Technical Report 2003]


Schema and Ontology Matching with COMA++


This page gives an overview of the schema matching systems COMA++ and COMA, developed at the University of Leipzig. Please consult the relevant papers (below) for a more detailed discussion of our approach. You can download the COMA++ prototype or try out the COMA++ Web Edition.

Figure 1. User Interface of COMA++

Introduction

Schema and ontology matching aim at identifying semantic correspondences between metadata structures or models such as database schemas, XML message formats, and ontologies. Solving such match problems is of key importance for service interoperability and data integration in numerous application domains. The goal is to keep the required manual effort low.

COMA++ is a schema and ontology matching tool. It extends our previous prototype COMA, utilizing a composite approach to combine different match algorithms. Furthermore, it offers a comprehensive infrastructure for solving large real-world match problems. The graphical interface offers a variety of interactions, allowing the user to influence the match process in many ways. COMA++ functionality is also used within the new QuickMig prototype, which focuses on the generation of executable mappings for data migration.

Architecture

The GUI provides access to the five main parts of COMA++:

  • the Repository, to persistently store all match-related data;
  • the Model and Mapping Pools, to manage schemas, ontologies and mappings in memory;
  • the Match Customizer, to configure matchers and match strategies;
  • the Execution Engine, to perform match operations.

Automatic match processing is performed in the Execution Engine in the form of match iterations, each of which uniformly takes place in three steps:

  • component identification, to determine the relevant schema components for matching;
  • matcher execution, applying multiple matchers to compute component similarities;
  • similarity combination, to combine the matcher-specific similarities and derive the correspondences between the components.

The obtained mapping can be used as input to the next iteration for further refinement. Each iteration can be configured individually using the alternatives supported by the Match Customizer, i.e. the types of components to be considered, the matchers for similarity computation, and the strategies for similarity combination.
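To make the three steps of a match iteration more concrete, here is a minimal Python sketch. It is not the COMA++ implementation: the toy schemas, the two matchers (`name_matcher` and `path_matcher`, both built on Python's `difflib.SequenceMatcher`), the averaging of matcher results and the `threshold` of 0.5 are hypothetical choices made for illustration only.

```python
from difflib import SequenceMatcher
from statistics import mean

# Hypothetical toy schemas, given as dot-separated element paths.
SOURCE = ["PO", "PO.ShipTo", "PO.ShipTo.Name", "PO.ShipTo.Street", "PO.Date"]
TARGET = ["Order", "Order.DeliverTo", "Order.DeliverTo.Name",
          "Order.DeliverTo.Street", "Order.OrderDate"]


def identify_components(schema):
    """Step 1: keep only leaf paths (paths that are not a prefix of another path)."""
    return [p for p in schema if not any(q.startswith(p + ".") for q in schema)]


def name_matcher(a, b):
    """String similarity of the last path element (the element name)."""
    return SequenceMatcher(None, a.split(".")[-1].lower(), b.split(".")[-1].lower()).ratio()


def path_matcher(a, b):
    """String similarity of the complete paths."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_iteration(source, target, matchers, threshold=0.5):
    # Step 1: component identification
    s_comps, t_comps = identify_components(source), identify_components(target)
    # Step 2: matcher execution; one similarity value per matcher and component pair
    sims = {(s, t): [m(s, t) for m in matchers] for s in s_comps for t in t_comps}
    # Step 3: similarity combination; average the matcher results, then keep
    # the best target for each source component if it reaches the threshold
    mapping = {}
    for s in s_comps:
        t, sim = max(((t, mean(sims[s, t])) for t in t_comps), key=lambda x: x[1])
        if sim >= threshold:
            mapping[s] = (t, sim)
    return mapping


result = match_iteration(SOURCE, TARGET, [name_matcher, path_matcher])
for src, (tgt, sim) in result.items():
    print(f"{src} -> {tgt} ({sim:.2f})")
```

Calling `match_iteration` again on the components of `result`, with a different matcher set or component type, would correspond to the iterative refinement of a mapping.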

Figure 2. System Architecture of COMA++

Figure 3. Match Processing in COMA++

Model Support

Using a generic data representation, COMA++ uniformly supports schemas and ontologies, e.g. the powerful standard languages W3C XML Schema (XSD) and Web Ontology Language (OWL). Further formats supported by COMA++ include XML Data Reduced (XDR) and relational schemas.

  • XSD Support: COMA++ supports very large schemas that are distributed over a multitude of XSD documents and that span various namespaces.
  • OWL Support: COMA++ currently supports matching between ontologies written in W3C OWL-Lite. OWL class hierarchies and relationship types are read in via the OWL API and mapped to the generic model representation based on directed acyclic graphs.
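As a rough illustration of a graph-based generic model, the sketch below represents schema elements as nodes of a directed acyclic graph. The class `SchemaGraph` and its methods are hypothetical and not part of COMA++; the sketch assumes that element names serve as unique node identifiers and that the graph contains no cycles. A shared element such as `Address` is stored only once but can be reached via several paths, so enumerating the paths yields one path per context.

```python
from collections import defaultdict


class SchemaGraph:
    """Hypothetical generic model: schema elements as nodes of a directed acyclic graph."""

    def __init__(self):
        self.children = defaultdict(list)
        self.nodes = set()
        self.has_parent = set()

    def add_edge(self, parent, child):
        self.children[parent].append(child)
        self.nodes.update((parent, child))
        self.has_parent.add(child)

    def roots(self):
        return sorted(self.nodes - self.has_parent)

    def paths(self):
        """All root-to-node paths; a shared node yields one path per context."""
        result = []

        def walk(node, prefix):  # assumes the graph is acyclic
            path = prefix + [node]
            result.append(".".join(path))
            for child in self.children.get(node, []):
                walk(child, path)

        for root in self.roots():
            walk(root, [])
        return result


g = SchemaGraph()
for parent, child in [("PurchaseOrder", "ShipTo"), ("PurchaseOrder", "BillTo"),
                      ("ShipTo", "Address"), ("BillTo", "Address"), ("Address", "City")]:
    g.add_edge(parent, child)
print(g.paths())
```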

Matchers and Match Strategies

COMA++ supports a comprehensive and extensible library of individual matchers, which can be selected to perform a match operation. Using the GUI, it is easy to construct new, more powerful matchers by combining existing ones. Moreover, match strategies can be specified as workflows of multiple match steps, so that complex match tasks can be divided and solved successively in multiple stages. Due to this flexibility in configuring matchers and match strategies, COMA++ can be used not only to solve match problems but also to comparatively evaluate the effectiveness of different match algorithms.
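A minimal sketch of building a new matcher from existing ones, assuming that a matcher is simply a function mapping two element names to a similarity in [0, 1]. The two name matchers (`trigram_matcher`, `prefix_matcher`) and the aggregation options (maximum, minimum, average, weighted average) are illustrative assumptions, not the COMA++ matcher library.

```python
from statistics import mean
from typing import Callable, List, Optional

Matcher = Callable[[str, str], float]


def trigram_matcher(a: str, b: str) -> float:
    """Dice coefficient over the character trigrams of both names."""
    a, b = a.lower(), b.lower()
    ga = {a[i:i + 3] for i in range(len(a) - 2)}
    gb = {b[i:i + 3] for i in range(len(b) - 2)}
    if not ga or not gb:  # names shorter than three characters
        return 1.0 if a == b else 0.0
    return 2 * len(ga & gb) / (len(ga) + len(gb))


def prefix_matcher(a: str, b: str) -> float:
    """Length of the common prefix relative to the longer name."""
    a, b = a.lower(), b.lower()
    if not a or not b:
        return 0.0
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n / max(len(a), len(b))


AGGREGATIONS = {"max": max, "min": min, "average": mean}


def combine(matchers: List[Matcher], aggregation: str = "average",
            weights: Optional[List[float]] = None) -> Matcher:
    """Build a new matcher from existing ones by aggregating their similarities."""
    def composite(a: str, b: str) -> float:
        scores = [m(a, b) for m in matchers]
        if weights is not None:  # weighted average of the matcher results
            return sum(w * s for w, s in zip(weights, scores)) / sum(weights)
        return AGGREGATIONS[aggregation](scores)
    return composite


name_avg = combine([trigram_matcher, prefix_matcher])
name_max = combine([trigram_matcher, prefix_matcher], aggregation="max")
print(name_avg("PONumber", "PONo"), name_max("PONumber", "PONo"))  # 0.3125 0.375
```

Because a combined matcher has the same signature as its parts, it can itself be combined again, which mirrors the idea of constructing more powerful matchers from existing ones.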

COMA++ supports new approaches for ontology matching, in particular the utilization of shared taxonomies, by means of a so-called Taxonomy Matcher. To illustrate the Taxonomy Matcher, consider two beer ontologies to be matched. Suppose the first model contains an element called Weizen and the second an element Kölsch. Both of these elements represent types of (German) beer, but they do not share any lexical similarity. The Taxonomy Matcher draws on the given beer taxonomy to deduce whether two elements are semantically related. In the current example, both Weizen and Kölsch are hyponyms of top fermented beer. That is, they share the same hypernym, and the matcher assigns a similarity value that depends on the distance between the two terms within the taxonomy.
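The distance-based similarity can be sketched as follows. The small beer taxonomy adds hypothetical entries (`Pils`, `bottom fermented beer`) to the example, and the formula `1 / (1 + d)`, where `d` is the number of edges between the two terms via their closest common hypernym, is only one assumed way of making the similarity depend on the distance; the actual COMA++ formula may differ.

```python
# Hypothetical taxonomy: each term points to its hypernym.
PARENT = {
    "top fermented beer": "beer",
    "bottom fermented beer": "beer",
    "Weizen": "top fermented beer",
    "Kölsch": "top fermented beer",
    "Pils": "bottom fermented beer",
}


def ancestors(term):
    """The term itself followed by all of its hypernyms up to the root."""
    chain = [term]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def taxonomy_distance(a, b):
    """Number of edges between a and b via their closest common hypernym."""
    up_a, up_b = ancestors(a), ancestors(b)
    for i, node in enumerate(up_a):
        if node in up_b:
            return i + up_b.index(node)
    return None  # no common hypernym, e.g. a term outside the taxonomy


def taxonomy_similarity(a, b):
    d = taxonomy_distance(a, b)
    return 0.0 if d is None else 1.0 / (1 + d)


print(taxonomy_similarity("Weizen", "Kölsch"))  # siblings, distance 2
print(taxonomy_similarity("Weizen", "Pils"))    # related only via "beer", distance 4
```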

Figure 4. Taxonomy Based Matching

Using the flexible infrastructure for combining and refining matcher results, match processing is supported as a workflow of several match steps. We implemented specific workflows (i.e. strategies) for context-dependent, fragment-based, and reuse-oriented matching, respectively:

  • Context-dependent Matching. We address the problem of context-dependent matching, which is necessary for schemas with shared elements. Although it is required by many applications, such as the transformation of XML messages, the identification of context-dependent correspondences has mostly been ignored in previous work. COMA++ supports several strategies, which also scale to large schemas, to obtain context-dependent match results.
  • Fragment-based Matching. To cope with large schemas, COMA++ implements a fragment-based match processing approach. Following the divide-and-conquer idea, it decomposes a large match problem into smaller subproblems by matching at the level of schema fragments. With the reduced problem size, we aim not only at better execution time but also at better match quality compared to schema-level matching.
  • Reuse-oriented Matching. We pursue the reuse of previously determined match results. The main mechanism for our approach is a MatchCompose operation, which performs a join-like operation on a mapping path consisting of two or more mappings, such as A-B, B-C, and C-D, successively sharing a common schema, to derive a new mapping between A and D.
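The MatchCompose idea can be sketched as a join on the shared schema. In this sketch, mappings are Python dicts from (source element, target element) pairs to similarity values. Combining the two similarities by their average, and keeping the maximum when several intermediate elements connect the same pair, are assumptions made here for illustration, not necessarily what COMA++ does.

```python
from collections import defaultdict
from functools import reduce
from typing import Dict, Tuple

# A mapping assigns a similarity value to each (source element, target element) pair.
Mapping = Dict[Tuple[str, str], float]


def match_compose(m1: Mapping, m2: Mapping) -> Mapping:
    """Join A-B and B-C on their shared B elements to derive an A-C mapping."""
    by_source = defaultdict(list)  # index m2 by its B element
    for (b, c), s2 in m2.items():
        by_source[b].append((c, s2))
    result: Mapping = {}
    for (a, b), s1 in m1.items():
        for c, s2 in by_source.get(b, []):
            sim = (s1 + s2) / 2  # assumed combination: average of both similarities
            # several B elements may link the same pair; keep the best value
            result[(a, c)] = max(sim, result.get((a, c), 0.0))
    return result


def compose_path(*mappings: Mapping) -> Mapping:
    """Fold MatchCompose over a mapping path such as A-B, B-C, C-D."""
    return reduce(match_compose, mappings)


ab = {("A.custName", "B.name"): 0.8, ("A.zip", "B.postcode"): 0.9}
bc = {("B.name", "C.fullName"): 0.9}
cd = {("C.fullName", "D.customer"): 0.7}
print(compose_path(ab, bc, cd))
```

Correspondences without a partner in the next mapping (here `A.zip`) drop out, just as unmatched rows do in a relational join.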

Publications

Peukert, E.; Berthold, H.; Rahm, E.
Rewrite Techniques for Performance Optimization of Schema Matching Processes
13th International Conference on Extending Database Technology (EDBT 2010)
2010-01

Massmann, S.; Rahm, E.
Evaluating Instance-based Matching of Web Directories
11th International Workshop on the Web and Databases (WebDB 2008)
2008-06

Drumm, C.; Schmitt, M.; Do, H.H.; Rahm, E.
QuickMig - Automatic Schema Matching for Data Migration Projects
Proc. ACM CIKM, Lisbon, Nov. 2007
2007-11 [20 citations]

Engmann, D.; Massmann, S.
Instance Matching with COMA++
BTW 2007 Workshop: Model Management und Metadaten-Verwaltung
2007-03 [14 citations]

Do, H.H.; Rahm, E.
Matching Large Schemas: Approaches and Evaluation
Information Systems, Volume 32, Issue 6, September 2007, Pages 857-885
2007 [65 citations]

Massmann, S.; Engmann, D.; Rahm, E.
COMA++: Results for the Ontology Alignment Contest OAEI 2006
International Workshop on Ontology Matching, co-located with the 5th ISWC 2006, Athens, Georgia, USA
2006-11 [13 citations]

Do, H.H.
Schema Matching and Mapping-based Data Integration
Dissertation. Published by Verlag Dr. Müller (VDM), ISBN 3-86550-997-5
2006

Aumueller, D.; Do, H.H.; Massmann, S.; Rahm, E.
Schema and Ontology Matching with COMA++
SIGMOD Conference
2005-06 [205 citations]

Rahm, E.; Do, H.H.; Massmann, S.
Matching Large XML Schemas
SIGMOD Record 33(4)
2004-12 [71 citations]

Do, H.H.; Melnik, S.; Rahm, E.
Comparison of Schema Matching Evaluations
Proc. Workshop Web and Databases, LNCS 2593, 2003
2003 [254 citations]

Do, H.H.; Rahm, E.
COMA - A System for Flexible Combination of Schema Matching Approaches
Proc. 28th Intl. Conference on Very Large Data Bases (VLDB), Hong Kong, Aug. 2002
2002 [589 citations]
