
Integration of molecular-biological Data

Molecular-biological annotation data is continuously being collected, curated and made accessible in numerous public data sources. Integration of this data is a major challenge in bioinformatics. We developed the GenMapper system that physically integrates heterogeneous annotation data in a flexible way and supports large-scale analysis on the integrated data. It uses a generic data model to uniformly represent different kinds of annotations originating from different data sources. Existing associations between objects, which represent valuable biological knowledge, are explicitly utilized to drive data integration and combine annotation knowledge from different sources. To serve specific analysis needs, powerful operators are provided to derive tailored annotation views from the generic data representation. GenMapper is operational and has been successfully used for large-scale functional profiling of genes.
The current version of GenMapper is available here.
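The idea of a generic data model with view-deriving operators can be illustrated with a minimal sketch. It is not GenMapper's actual schema or API: the class AnnotationStore, its methods add, mapping, compose and view, and the source and object names (ProbeSet, Gene, Function, p1, g1, f1, f2) are all hypothetical placeholders chosen for illustration. The sketch assumes that every annotation can be stored as an association between two objects from two sources.

```python
from collections import defaultdict


class AnnotationStore:
    """Hypothetical generic store: every annotation is an association
    between an object of one source and an object of another source."""

    def __init__(self):
        # (source_a, source_b) -> set of (object_a, object_b) pairs
        self._assoc = defaultdict(set)

    def add(self, src_a, obj_a, src_b, obj_b):
        # Store both directions so a mapping can be read from either side.
        self._assoc[(src_a, src_b)].add((obj_a, obj_b))
        self._assoc[(src_b, src_a)].add((obj_b, obj_a))

    def mapping(self, src_a, src_b):
        # All known associations from src_a objects to src_b objects.
        return set(self._assoc.get((src_a, src_b), ()))

    def compose(self, src_a, via, src_b):
        """Derive src_a -> src_b associations through an intermediate source."""
        onward = defaultdict(set)
        for mid, target in self.mapping(via, src_b):
            onward[mid].add(target)
        return {
            (a, t)
            for a, mid in self.mapping(src_a, via)
            for t in onward.get(mid, ())
        }

    def view(self, src, objects, targets):
        """Tailored view: one row per object, one column per target source."""
        return {
            obj: {
                t: sorted(b for a, b in self.mapping(src, t) if a == obj)
                for t in targets
            }
            for obj in objects
        }


# Illustrative data only; the identifiers are made up.
store = AnnotationStore()
store.add("ProbeSet", "p1", "Gene", "g1")
store.add("Gene", "g1", "Function", "f1")
store.add("Gene", "g1", "Function", "f2")

probe_functions = store.compose("ProbeSet", "Gene", "Function")
gene_view = store.view("Gene", ["g1"], ["ProbeSet", "Function"])
```

In this sketch, existing associations drive the integration: the probe set p1 obtains its functional annotations only by composing the ProbeSet-Gene and Gene-Function associations, without a direct link between the two sources.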

We also developed a hybrid approach to integrate annotation data for the expression analysis of genes and proteins. Expression data is materialized in a data warehouse, while annotation data is integrated virtually according to analysis needs. To facilitate access to many sources, we use the commercial product SRS (Sequence Retrieval System) from LION bioscience.
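The split between materialized and virtual integration can be sketched as follows. This is only an illustration under stated assumptions: an in-memory SQLite table stands in for the data warehouse, and the callable fetch_annotation is a hypothetical stub for an on-demand query against an external annotation source. It does not reproduce the SRS interface, and the function names, table layout and sample values are invented for the example.

```python
import sqlite3


def build_warehouse(rows):
    """Materialize expression measurements in a local (in-memory) database."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE expression (gene TEXT, sample TEXT, value REAL)")
    con.executemany("INSERT INTO expression VALUES (?, ?, ?)", rows)
    return con


def annotated_expression(con, min_value, fetch_annotation):
    """Select expression rows locally, then attach annotations on demand."""
    cur = con.execute(
        "SELECT gene, sample, value FROM expression "
        "WHERE value >= ? ORDER BY gene, sample",
        (min_value,),
    )
    cache = {}  # query each gene at most once per analysis
    result = []
    for gene, sample, value in cur:
        if gene not in cache:
            cache[gene] = fetch_annotation(gene)
        result.append((gene, sample, value, cache[gene]))
    return result


# Illustrative data only; genes, samples and annotations are made up.
con = build_warehouse([("g1", "s1", 2.5), ("g1", "s2", 0.4), ("g2", "s1", 3.1)])
remote_annotations = {"g1": ["f1"], "g2": ["f2", "f3"]}
calls = []


def fetch_annotation(gene):
    # Hypothetical stand-in for a query against an external source.
    calls.append(gene)
    return remote_annotations.get(gene, [])


result = annotated_expression(con, 1.0, fetch_annotation)
```

Only the genes that pass the local filter trigger an annotation lookup, which is the point of integrating annotation data virtually rather than copying every source into the warehouse.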

BioFuice is a novel approach for integrating data from different private and public data sources and ontologies. BioFuice follows a peer-to-peer-like data integration approach based on bidirectional mappings. Sources and mappings are associated with a domain model to support semantically meaningful interoperability. BioFuice extends the generic iFuice integration platform, which uses specific operators for data fusion and workflow-like script programs. BioFuice supports explorative data analysis as well as query and search capabilities. We have applied BioFuice in different research projects, such as the integration of protein interactions, the detection of non-coding RNA, and gene annotation based on expression experiments.
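A mapping-based integration of this kind can be pictured with a small sketch. It is not the BioFuice or iFuice operator set: the Mapping class, the traverse function, and the source names, semantic types and object identifiers are hypothetical and chosen only for illustration. The sketch assumes that a mapping is a set of object correspondences between two sources, labeled with a semantic type from a domain model, and usable in both directions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    """Hypothetical mapping: correspondences between objects of two sources."""
    source: str
    target: str
    semantic_type: str  # label taken from an assumed domain model
    pairs: frozenset    # set of (source_object, target_object) pairs

    def inverse(self):
        # Mappings are bidirectional: swap the direction of every pair.
        return Mapping(
            self.target,
            self.source,
            self.semantic_type,
            frozenset((b, a) for a, b in self.pairs),
        )


def traverse(objects, mapping):
    """Follow a mapping from a set of source objects to their target objects."""
    wanted = set(objects)
    return {b for a, b in mapping.pairs if a in wanted}


# Illustrative data only; sources and identifiers are made up.
gene_to_protein = Mapping(
    "GeneSource", "ProteinSource", "Gene-Protein",
    frozenset({("g1", "p1"), ("g2", "p2")}),
)
interactions = Mapping(
    "ProteinSource", "ProteinSource", "Protein-Protein",
    frozenset({("p1", "p2")}),
)

# A workflow-like script: genes -> proteins -> interaction partners -> genes.
proteins = traverse({"g1"}, gene_to_protein)
partners = traverse(proteins, interactions)
partner_genes = traverse(partners, gene_to_protein.inverse())
```

Chaining traverse calls in this way mimics a script program over peer mappings: no global schema is needed, only mappings between pairs of sources.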

Current Project Members:

Previous Project Members:

Further and Related Information:

Master Thesis:

Selected Publications:

Rahm, Erhard
Discovering product counterfeits in online shops: a big data integration challenge
ACM Journal of Data and Information Quality (accepted for publication)
2014-08

Kirsten, T.; Thor, A.; Rahm, E.
Instance-based matching of large life science ontologies
Proc. of 4th Intl. Workshop on Data Integration in the Life Sciences (DILS), 2007
2007-06

Rahm, Erhard; Kirsten, Toralf; Lange, Jörg
The GeWare data warehouse platform for the analysis of molecular-biological and clinical data
Journal of Integrative Bioinformatics, 4(1):47, 2007
2007-01-20

Kirsten, Toralf; Rahm, Erhard
BioFuice: Mapping-based data integration in bioinformatics
Proc. of 3rd Intl. Workshop on Data Integration in the Life Sciences (DILS), Springer LNCS 4075, 2006
2006-07

Kirsten, T.; Körner, C.; Do, H.H.; Rahm, E.
Hybrid Integration of Molecular-biological Annotation Data
Proc. 2nd Intl. Workshop on Data Integration in the Life Sciences (DILS), Springer LNCS 3615, 2005
2005-07

Rahm, E.; Thor, A.; Aumueller, D.; Do, H.H.; Golovin, N.; Kirsten, T.
iFuice - Information Fusion utilizing Instance Correspondences and Peer Mappings
Proc. 8th Intl. Workshop on the Web and Databases (WebDB), 2005
2005-06

Körner, C.; Kirsten, T.; Do, H.H.; Rahm, E.
Hybride Integration von molekularbiologischen Annotationsdaten
11th Conf. of Database Systems for Business, Technology and Web (BTW)
2005-03

Do, H.H.; Rahm, E.
Flexible Integration of Molecular-biological Annotation Data: The GenMapper Approach
Proc. EDBT 2004, Heraklion, Greece, Springer LNCS, March 2004
2004-03

Posters and Talks:

Kirsten, T.; Rahm, E.
BioFuice: A decentralized Approach to integrate molecular-biological Data
Proc. 4th Research Festival for Life Sciences, Leipzig, Dec. 2005
2005

Mützel, B.; Do, H.H.; Khaitovich, P.; Weiß, G.; Rahm, E.; Pääbo, S.
Functional Profiling of Genes Differently Expressed in the Brains of Humans and Chimpanzees
Proc. 2nd Biotech Day, Univ. Leipzig, May 2003, pp. 264-265
2003-05

Rahm, E.; Do, H.H.; Kirsten, T.
Data Integration for Analyzing Gene Expression Data - Selected Projects
Data Integration for Analyzing Gene Expression Data. Bonn, May 2003
2003-05

Kirsten, T.; Do, H.H.; Sosna, D.; Rahm, E.; Krohn, K.; Eszlinger, M.; Paschke, R.
Gene Expression Warehousing in Leipzig
Poster for the Workshop on Databases and Data Integration in Genome Research, Berlin, February 2002
2002-02

Kirsten, T.; Do, H.H.; Rahm, E.; Krohn, K.
Gene Expression Warehousing in Leipzig
Poster/Abstract, Proc. Biotech Day, Univ. Leipzig, May 2002, pp. 152-153
2002