Evolution-based analysis of functional protein annotation
Poster at the 7th Leipzig Research Festival for Life Sciences, 2008
Ontologies are heavily used in the life sciences to semantically describe the functional annotation of molecular-biological objects such as proteins. Both ontologies and functional annotations change continuously to incorporate new research results, e.g., when new experimental findings are added or existing knowledge is revised. However, frequent changes to ontologies and functional annotations can heavily affect dependent data and software systems, resulting in outdated (wrong) annotations and incompatible use of ontologies. We address this problem with a first quantitative analysis of the evolution of life science ontologies, protein data and their functional annotations. In particular, we evaluated the Molecular Function, Biological Process and Cellular Component ontologies of the well-known Gene Ontology (GO), which show strong growth both in the number of concepts (terms) and in the number of relationships between concepts within each ontology. Moreover, we comparatively analyzed evolutionary changes of protein annotations in Ensembl and Swissprot, taking the Evidence Code (EC) taxonomy into account. ECs describe the reliability of protein annotations made with the GO sub-ontologies, e.g., to differentiate between automatically generated and manually curated annotations. The results show a significant difference in the evolution of annotations generated by different methods: whereas the number of curator-assigned annotations grew steadily, the number of automatically assigned annotations increased dramatically, with strong fluctuations. This implies that users and applications need to take the reliability of annotations into account.
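The abstract does not describe the analysis pipeline itself. Purely as an illustration, the kind of release-to-release comparison it reports could be sketched in Python as below, assuming each release is available as a set of GO terms, a set of (child, relation, parent) relationships, and a set of (protein, GO term, evidence code) annotations. All names and the toy data are hypothetical, and treating only the GO evidence code IEA (Inferred from Electronic Annotation) as "automatic" and every other code as "manual" is a simplifying assumption, not the authors' classification.

```python
from __future__ import annotations

from collections import Counter
from typing import NamedTuple


class Annotation(NamedTuple):
    protein: str   # hypothetical protein accession
    go_term: str   # GO term identifier, e.g. "GO:0005524"
    evidence: str  # GO evidence code, e.g. "IEA" or "IDA"


# Assumption: only IEA counts as automatically assigned; all other codes are
# treated as curator-assigned. A real analysis may group codes differently.
AUTOMATIC_CODES = {"IEA"}


def category(ann: Annotation) -> str:
    """Classify an annotation as 'automatic' or 'manual' by its evidence code."""
    return "automatic" if ann.evidence in AUTOMATIC_CODES else "manual"


def ontology_diff(old_terms: set[str], new_terms: set[str],
                  old_rels: set[tuple[str, str, str]],
                  new_rels: set[tuple[str, str, str]]) -> dict[str, int]:
    """Count added and deleted terms and relationships between two releases."""
    return {
        "terms_added": len(new_terms - old_terms),
        "terms_deleted": len(old_terms - new_terms),
        "rels_added": len(new_rels - old_rels),
        "rels_deleted": len(old_rels - new_rels),
    }


def annotation_diff(old: set[Annotation],
                    new: set[Annotation]) -> dict[str, Counter]:
    """Count added and deleted annotations per evidence category."""
    return {
        "added": Counter(category(a) for a in new - old),
        "deleted": Counter(category(a) for a in old - new),
    }


def growth_series(releases: list[set[Annotation]]) -> list[Counter]:
    """Number of annotations per evidence category in each release."""
    return [Counter(category(a) for a in rel) for rel in releases]


# Toy data for two hypothetical releases.
v1 = {
    Annotation("P1", "GO:1", "IDA"),
    Annotation("P1", "GO:2", "IEA"),
    Annotation("P2", "GO:3", "IEA"),
}
v2 = {
    Annotation("P1", "GO:1", "IDA"),
    Annotation("P2", "GO:3", "IEA"),
    Annotation("P2", "GO:4", "IEA"),
    Annotation("P3", "GO:5", "TAS"),
}

onto = ontology_diff({"GO:1", "GO:2"}, {"GO:1", "GO:2", "GO:3"},
                     {("GO:2", "is_a", "GO:1")},
                     {("GO:2", "is_a", "GO:1"), ("GO:3", "part_of", "GO:1")})
changes = annotation_diff(v1, v2)
series = growth_series([v1, v2])
print(onto)
print(changes)
print(series)
```

Because annotations are compared as whole tuples, an annotation whose evidence code changes between releases (for example, from IEA to a curated code) is counted as one deletion plus one addition; a finer analysis might track such reclassifications separately.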