
Data integration with iFuice

iFuice — information Fusion utilizing instance correspondences and peer mappings

iFuice is a new approach to information fusion of web data that has been developed in our group since 2005. It is instance-driven and can utilize peer mappings (e.g., instance correspondences) between independent data sources. Such correspondences are already available between many sources, e.g., in the form of web links, and thus support high-quality data fusion. Arbitrary sources can be incorporated into iFuice by merely specifying the available data object types and defining a mapping from such a type to a type of another data source. By interconnecting mappings between object types from various data sources, the information space can be accessed and queried via a script language or adaptively explored in an interactive session. Generic, declarative operators are available to execute mappings and manipulate their results, e.g., for result fusion (aggregation). Mappings and operators are executable on sets of objects and highly composable, thereby supporting a powerful aggregation of information over several sources. Script programs implement data integration workflows, which are more flexible than the mere use of queries as in query mediators or federated databases. An extension of iFuice is being developed for mashup-like data integration.
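
The set-oriented, composable nature of mappings can be illustrated with a small Python sketch. It is not written in the iFuice script language and does not reflect its actual API: the names `Mapping`, `execute` and `compose`, the dictionary representation of correspondences, and the toy object ids are all assumptions made for illustration.

```python
from typing import Dict, Set

# A mapping is modelled here as a table of instance correspondences:
# source object id -> set of target object ids (a hypothetical representation).
Mapping = Dict[str, Set[str]]


def execute(mapping: Mapping, inputs: Set[str]) -> Set[str]:
    """Apply a mapping to a whole set of input objects.

    Returns the set of all target objects that correspond to any input.
    """
    result: Set[str] = set()
    for obj in inputs:
        result |= mapping.get(obj, set())
    return result


def compose(first: Mapping, second: Mapping) -> Mapping:
    """Build a new mapping that follows `first` and then `second`."""
    return {src: execute(second, targets) for src, targets in first.items()}


# Toy correspondences (invented placeholders, not real data).
conf_to_pubs: Mapping = {"conf1": {"pub1", "pub2"}, "conf2": {"pub3"}}
pub_to_authors: Mapping = {"pub1": {"a1"}, "pub2": {"a1", "a2"}, "pub3": {"a3"}}

# Composition yields a mapping that is again executable on sets of objects.
conf_to_authors = compose(conf_to_pubs, pub_to_authors)
authors = execute(conf_to_authors, {"conf1"})
print(sorted(authors))
```

Because the result of `compose` has the same shape as its inputs, composed mappings can be chained further or passed to the same operators as any other mapping.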

Source and mapping semantics are reflected in a domain model, which is at a higher (ontological) abstraction level than a global schema and easier to construct. The domain model consists of object types (e.g., author, publication) and mapping types (e.g., AuthorOfPublication). The available sources and mappings of the various domain model types are reflected in a source mapping model. While physical data sources (PDS) refer to real-world sources, e.g., DBLP, each logical data source (LDS) refers to a particular object type of the domain model. So-called same-mappings interconnect LDS of the same type and associate corresponding instances, which may thus be aggregated to fuse their information. The mappings of the source mapping graph are executable, e.g., implemented by a query or web service. iFuice allows for explorative data fusion by browsing along these mappings. The execution of several mappings and the manipulation of their results can be specified within scripts to allow repeated execution for different input objects.

Figure: Source mapping model (left) and domain model (right)
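
The relationship between the two models can be sketched as plain data structures. The following Python fragment is only an illustration under stated assumptions: the class and field names, the direction of each mapping type, the mapping type names `PublicationOfConference` and `SamePublication`, and the mapping name `DBLPPubSameGS` are hypothetical, and treating an LDS as an object type within one physical source is an assumption as well. Only `AuthorOfPublication`, `DBLPConfPubs`, DBLP and Google Scholar appear on this page.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Domain model: object types and mapping types.
# Each mapping type is given as (input object type, output object type);
# the directions are assumptions made for this sketch.
OBJECT_TYPES = {"Conference", "Publication", "Author"}
MAPPING_TYPES: Dict[str, Tuple[str, str]] = {
    "PublicationOfConference": ("Conference", "Publication"),  # hypothetical
    "AuthorOfPublication": ("Publication", "Author"),
    "SamePublication": ("Publication", "Publication"),         # hypothetical
}


@dataclass(frozen=True)
class LDS:
    """Logical data source: one object type within one physical source (assumed)."""
    pds: str
    object_type: str


@dataclass(frozen=True)
class Mapping:
    """An edge of the source mapping model, typed by the domain model."""
    name: str
    mapping_type: str
    source: LDS
    target: LDS
    is_same: bool = False  # True for same-mappings

    def conforms_to_domain_model(self) -> bool:
        expected = MAPPING_TYPES.get(self.mapping_type)
        if expected is None or not set(expected) <= OBJECT_TYPES:
            return False
        return expected == (self.source.object_type, self.target.object_type)


def same_mappings_from(lds: LDS, mappings: List[Mapping]) -> List[Mapping]:
    """Return the same-mappings that start at the given LDS."""
    return [m for m in mappings if m.is_same and m.source == lds]


# Source mapping model for the two sources used in the use case.
dblp_conf = LDS("DBLP", "Conference")
dblp_pub = LDS("DBLP", "Publication")
gs_pub = LDS("GoogleScholar", "Publication")

SOURCE_MAPPING_MODEL = [
    Mapping("DBLPConfPubs", "PublicationOfConference", dblp_conf, dblp_pub),
    Mapping("DBLPPubSameGS", "SamePublication", dblp_pub, gs_pub, is_same=True),
]

print([m.name for m in same_mappings_from(dblp_pub, SOURCE_MAPPING_MODEL)])
```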

Use case

iFuice has been used in different domains, in particular in bioinformatics (BioFuice) and for citation analysis. For example, we may want a script that determines the most frequently referenced papers of a given conference X, say, to identify candidates for a 10-year best paper award. An iFuice representation of such a script could be as follows.

$SIGMODPubs      := queryTraverse (LDS=DBLP.Conf, {Name="SIGMOD 1995"}, DBLPConfPubs)
$CombinedConfPub := aggregateSame ($SIGMODPubs, GoogleScholar)
$CleanedPubs     := fuseAttributes ($CombinedConfPub)
$Result          := sort ($CleanedPubs, "NoOfCitings")

Informally, the script locates conference X in DBLP, executes the DBLPConfPubs mapping to get all publications of that conference, uses the same-mapping to Google Scholar to get the corresponding publications together with an attribute indicating the number of citations, sorts the publications by the number of citations, and returns the top-most publications. The example shows that mappings need to be executable on a set of input objects and return a set of output objects.
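
The four steps can be mimicked in plain Python over in-memory toy data. This is a rough sketch, not the iFuice implementation: the function names only echo the script operators, the fusion rule (attributes of the first source win on conflict) and the descending sort order are assumptions, and every id, title and citation count below is an invented placeholder.

```python
from typing import Dict, List, Optional, Tuple

# Toy stand-ins for the two sources (all values are invented placeholders).
DBLP_CONF_PUBS: Dict[str, List[dict]] = {
    "SIGMOD 1995": [
        {"id": "dblp:p1", "title": "Paper A"},
        {"id": "dblp:p2", "title": "Paper B"},
        {"id": "dblp:p3", "title": "Paper C"},
    ]
}
# Same-mapping: DBLP publication id -> Google Scholar publication id.
SAME_DBLP_GS: Dict[str, str] = {"dblp:p1": "gs:x1", "dblp:p2": "gs:x2"}
GOOGLE_SCHOLAR: Dict[str, dict] = {
    "gs:x1": {"id": "gs:x1", "title": "Paper A", "NoOfCitings": 12},
    "gs:x2": {"id": "gs:x2", "title": "Paper B", "NoOfCitings": 40},
}


def query_traverse(conf_name: str) -> List[dict]:
    """Locate the conference and follow the conference-to-publications mapping."""
    return list(DBLP_CONF_PUBS.get(conf_name, []))


def aggregate_same(pubs: List[dict]) -> List[Tuple[dict, Optional[dict]]]:
    """Pair each publication with its corresponding instance, if one exists."""
    pairs = []
    for pub in pubs:
        gs_id = SAME_DBLP_GS.get(pub["id"])
        pairs.append((pub, GOOGLE_SCHOLAR.get(gs_id) if gs_id else None))
    return pairs


def fuse_attributes(pairs: List[Tuple[dict, Optional[dict]]]) -> List[dict]:
    """Merge both instances into one record; the first source wins on conflict."""
    fused = []
    for pub, match in pairs:
        record = dict(match or {})
        record.update(pub)
        fused.append(record)
    return fused


def sort_desc(objects: List[dict], attribute: str) -> List[dict]:
    """Sort by an attribute, highest value first; missing values count as 0."""
    return sorted(objects, key=lambda o: o.get(attribute, 0), reverse=True)


sigmod_pubs = query_traverse("SIGMOD 1995")
combined = aggregate_same(sigmod_pubs)
cleaned = fuse_attributes(combined)
result = sort_desc(cleaned, "NoOfCitings")
print([r["id"] for r in result])
```

Each step consumes and produces a collection of objects, which is what allows the operators to be chained in a script.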

Publications

Thor, A.; Rahm, E.
CloudFuice: A flexible Cloud-based Data Integration System
Proc. of 11th Intl. Conference on Web Engineering (ICWE), 2011
2011-06

Rahm, E.; Thor, A.; Aumueller, D.
Dynamic Fusion of Web Data
Proc. 5th Intl. XML Database Symposium (XSym), 2007
2007-09

Thor, A.; Aumueller, D.; Rahm, E.
Data Integration Support for Mashups
Proc. 6th Intl. Workshop on Information Integration on the Web (IIWeb), 2007
2007-07

Kirsten, T.; Thor, A.; Rahm, E.
Instance-based matching of large life science ontologies
Proc. of 4th Intl. Workshop on Data Integration in the Life Sciences (DILS), 2007
2007-06

Thor, A.; Kirsten, T.; Rahm, E.
Instance-based matching of hierarchical ontologies
Proc. of 12. GI-Fachtagung für Datenbanksysteme in Business, Technologie und Web (BTW), 2007
2007-03

Köpcke, H.; Rahm, E.
Analyse von Zitierungshäufigkeiten für die Datenbankkonferenz BTW
Datenbank-Spektrum, Vol. 7, Issue 20
2007-02

Thor, A.; Rahm, E.
MOMA - A Mapping-based Object Matching System
Proc. 3rd Conference on Innovative Data Systems Research (CIDR), 2007
2007-01

Kirsten, T.; Rahm, E.
BioFuice: Mapping-based data integration in bioinformatics
Proc. of 3rd Int. Workshop on Data Integration in the Life Sciences (DILS), Springer LNCS 4075, 2006
2006-07

Rahm, E.; Thor, A.
Citation analysis of database publications
ACM SIGMOD Record 34(4), 2005
2005-12

Rahm, E.; Thor, A.; Aumueller, D.; Do, H.H.; Golovin, N.; Kirsten, T.
iFuice - Information Fusion utilizing Instance Correspondences and Peer Mappings
Proc. 8th Intl. Workshop on the Web and Databases (WebDB), 2005
2005-06

Kirsten, T.; Rahm, E.
BioFuice: A decentralized Approach to integrate molecular-biological Data
Proc. 4th Research Festival for Life Sciences, Leipzig, Dec. 2005
2005