Exploiting Semantics from Ontologies and Shared Annotations to Partition Linked Data
Proc. 10th Intl. Conference on Data Integration in the Life Sciences (DILS), Lisbon, July 2014
Linked Open Data initiatives have made available diverse collections that domain experts have annotated with controlled vocabulary terms from ontologies. The challenge is to explore these rich and complex annotated datasets, together with the domain semantics captured in ontologies, to discover patterns of annotations across multiple concepts that may lead to new discoveries. We identify annotation signatures of links that associate semantically similar concepts, where similarity is measured in terms of shared annotations and ontological relatedness. Formally, an annotation signature is a partitioning, or clustering, of the links that represent relationships between shared annotations. We propose a clustering algorithm, AnnSigClustering, to generate annotation signatures. Evaluation results over drug, disease, and gene datasets demonstrate the effectiveness of annotation signatures in finding patterns among the links that belong to the same part of a signature.
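The idea of partitioning links by shared annotations and ontological relatedness can be illustrated with a small sketch. This is not AnnSigClustering. It is a swapped-in technique: a threshold-based single-linkage grouping built on union-find, chosen only for illustration. It rests on several assumptions. Each link is represented by the set of ontology terms its two endpoint concepts share. Ontological relatedness is approximated by closing each set under a toy is-a hierarchy before comparing sets with Jaccard similarity. The names `PARENT`, `LINKS`, `partition_links`, and the term and link identifiers are all hypothetical.

```python
from itertools import combinations

# Hypothetical toy is-a hierarchy: term -> parent term (None marks the root).
PARENT = {
    "heart_failure": "cardiovascular_disease",
    "arrhythmia": "cardiovascular_disease",
    "asthma": "respiratory_disease",
    "cardiovascular_disease": "disease",
    "respiratory_disease": "disease",
    "disease": None,
}

# Hypothetical links, each mapped to the annotation terms shared by its endpoints.
LINKS = {
    "drugA-geneX": {"heart_failure"},
    "drugB-geneY": {"arrhythmia"},
    "drugC-geneZ": {"asthma"},
}


def ancestors(term, parent):
    """Return the term together with all of its is-a ancestors."""
    result = set()
    while term is not None:
        result.add(term)
        term = parent.get(term)
    return result


def expand(terms, parent):
    """Close a set of annotation terms under the is-a relation."""
    closed = set()
    for t in terms:
        closed |= ancestors(t, parent)
    return closed


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def partition_links(links, parent, threshold):
    """Single-linkage grouping: links whose expanded annotation sets reach
    the similarity threshold end up in the same part of the partition."""
    expanded = {lid: expand(terms, parent) for lid, terms in links.items()}
    root = {lid: lid for lid in links}

    def find(x):
        # Union-find lookup with path halving.
        while root[x] != x:
            root[x] = root[root[x]]
            x = root[x]
        return x

    for u, v in combinations(links, 2):
        if jaccard(expanded[u], expanded[v]) >= threshold:
            root[find(u)] = find(v)

    groups = {}
    for lid in links:
        groups.setdefault(find(lid), set()).add(lid)
    return [frozenset(g) for g in groups.values()]


if __name__ == "__main__":
    for part in partition_links(LINKS, PARENT, threshold=0.4):
        print(sorted(part))
```

With these toy inputs, the two cardiovascular links share two of four expanded terms (similarity 0.5) and fall into one part, while the asthma link shares only the root term `disease` with each (similarity 0.2) and stays in its own part. Single linkage is used here only because it is short to write. A real signature would depend on the similarity measure and clustering criteria the paper defines.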