Do, H. ; Rahm, E. ; Krohn, K. ; Paschke, R.

DBMS-based EST Clustering and Profiling for Gene Expression Analysis

Proc. First Workshop Computational Biology in Saxony: Problems and Perspectives. Dresden, November 2001

2001 / 11


Further information:


Recently, several computer-based methods (in silico differential display) have been developed to exploit the large and steadily growing number of EST sequences available in public databases (e.g., dbEST) for analysis purposes such as the discovery of new genes and the comparative or functional analysis of known genes. However, implementations of these methods usually rely on simple, file-based data management, which leads to several essential limitations: a focus on small data sets or on one or a few tissues or organs, fixed and inconvenient query interfaces, and manual evaluation of query results. We propose a novel and comprehensive database solution that is capable of overcoming the limitations of current approaches. At the heart of the solution, a DBMS-based data store has to be constructed that centrally manages all kinds of data, in particular sequence and annotation data. The data should be extracted from external sequence databases, transformed and cleaned as required, and updated on a regular basis to reflect changes in the corresponding data sources. EST clusters, alignments and profiles form the basis for all analysis tasks; powerful tools therefore have to be developed to perform EST clustering and profiling in advance. Based on the results of this pre-computation, it should be possible to carry out the discovery of new genes and other related analysis tasks in a largely automated way.
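
The idea of a central data store with pre-computed clusters can be illustrated with a minimal sketch. It is not the schema or tooling of the paper: the table names (`library`, `est`), the column names, the sample rows and the use of SQLite are all assumptions made purely for illustration. The sketch assumes that cluster assignments have already been computed and stored, and shows how a per-tissue expression profile could then be obtained with a declarative query instead of a file scan.

```python
import sqlite3


def build_store(conn):
    """Create a toy schema (hypothetical): cDNA libraries and their ESTs."""
    conn.executescript(
        """
        CREATE TABLE library (
            library_id INTEGER PRIMARY KEY,
            tissue     TEXT NOT NULL
        );
        CREATE TABLE est (
            est_id     TEXT PRIMARY KEY,
            library_id INTEGER NOT NULL REFERENCES library(library_id),
            cluster_id INTEGER  -- NULL until the EST has been clustered
        );
        """
    )


def expression_profile(conn):
    """Count clustered ESTs per (cluster, tissue) pair."""
    return conn.execute(
        """
        SELECT e.cluster_id, l.tissue, COUNT(*)
        FROM est AS e
        JOIN library AS l ON l.library_id = e.library_id
        WHERE e.cluster_id IS NOT NULL
        GROUP BY e.cluster_id, l.tissue
        ORDER BY e.cluster_id, l.tissue
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
build_store(conn)

# Made-up sample data; real content would be imported from a source such as dbEST.
conn.executemany(
    "INSERT INTO library VALUES (?, ?)",
    [(1, "thyroid"), (2, "liver")],
)
conn.executemany(
    "INSERT INTO est VALUES (?, ?, ?)",
    [
        ("EST0001", 1, 10),
        ("EST0002", 1, 10),
        ("EST0003", 2, 10),
        ("EST0004", 2, 20),
        ("EST0005", 1, None),  # not yet assigned to a cluster
    ],
)

profile = expression_profile(conn)
for cluster_id, tissue, count in profile:
    print(cluster_id, tissue, count)
```

Because the counts come from a query, the same store could answer other questions (a different set of tissues, a minimum count threshold) without changing any program code, which is the kind of flexibility the file-based approaches described above lack.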