
GORank: Semantic Similarity Search for Gene Products using Gene Ontology  

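Kim, Ki-Sung (School of Electrical Engineering and Computer Science, Seoul National University)
Yoo, Sang-Won (School of Electrical Engineering and Computer Science, Seoul National University)
Kim, Hyoung-Joo (School of Electrical Engineering and Computer Science, Seoul National University)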
Abstract
Searching for gene products that have similar biological functions is crucial in bioinformatics. Modern biological databases describe the functions of gene products using the Gene Ontology (GO). In this paper, we propose a technique for semantic similarity search over gene products using GO annotation information. To this end, we define an information-theoretic measure of semantic similarity between gene products and propose a search algorithm based on this measure. We adapt Fagin's Threshold Algorithm to process semantic similarity queries as follows. First, we redefine the threshold for our measure, because our similarity function is not monotonic. Then, we propose cluster-skipping and an access ordering for the inverted index lists to reduce the number of disk accesses. Experiments with real GO and annotation data show that GORank is efficient and scalable.
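For readers unfamiliar with the Threshold Algorithm that the abstract builds on, the following is a minimal sketch of the classic version only, using a plain sum as a monotone aggregate. It is not GORank's algorithm: the abstract states that GORank's similarity function is not monotonic and that the threshold is redefined for it, and that redefinition is not reproduced here. The function name `threshold_algorithm`, the dictionary-based lists, and the sample identifiers `g1` to `g3` are assumptions made for illustration.

```python
import heapq


def threshold_algorithm(lists, k):
    """Classic Threshold Algorithm for top-k with sum as the aggregate.

    lists: one dict per inverted list, mapping object id -> score.
           An object missing from a list is assumed to score 0.0 there.
    Returns (top_k, depth): top_k is a list of (total_score, object_id)
    in descending score order; depth is the number of sorted-access rounds.
    """
    # Sorted access reads each list in descending score order.
    sorted_lists = [sorted(l.items(), key=lambda kv: -kv[1]) for l in lists]
    seen = set()
    top = []  # min-heap holding the best k (total, object_id) pairs so far
    depth = 0
    max_depth = max((len(s) for s in sorted_lists), default=0)

    while depth < max_depth:
        last_scores = []
        for s in sorted_lists:
            if depth >= len(s):
                # List exhausted: unseen objects can only score 0.0 here.
                last_scores.append(0.0)
                continue
            obj, score = s[depth]
            last_scores.append(score)
            if obj not in seen:
                seen.add(obj)
                # Random access: fetch the object's score from every list.
                total = sum(l.get(obj, 0.0) for l in lists)
                heapq.heappush(top, (total, obj))
                if len(top) > k:
                    heapq.heappop(top)
        depth += 1
        # No unseen object can beat the sum of the last scores read.
        threshold = sum(last_scores)
        if len(top) == k and top[0][0] >= threshold:
            break

    return sorted(top, key=lambda t: (-t[0], t[1])), depth


# Hypothetical scores for three gene products in two inverted lists.
list_a = {"g1": 0.9, "g2": 0.8, "g3": 0.1}
list_b = {"g2": 0.7, "g3": 0.6, "g1": 0.2}
best, rounds = threshold_algorithm([list_a, list_b], k=1)
print(best, rounds)
```

The early stop is sound only because a sum is monotone: once the k-th best total reaches the threshold, no unread entry can do better. With a non-monotonic similarity function that guarantee fails, which is presumably why the authors redefine the threshold.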
Keywords
Gene Ontology; Semantic similarity search
References
1 Guntzer, U., W.-T. Balke, and W. Kießling, Optimizing Multi-Feature Queries for Image Databases, in VLDB. 2000: Egypt
2 Azuaje, F., H. Wang, and O. Bodenreider, Ontology-driven Similarity Approaches to Supporting Gene Functional Assessment, in ISMB Sig meeting on Bio-ontology. 2005
3 Cover, T. and J. Thomas, Elements of Information Theory. 1991: Wiley-Interscience
4 Hjaltason, G.R. and H. Samet, Indexing-Driven Similarity Search in Metric Space. ACM Transactions on Database Systems, 2003. 28(4): p. 517-580
5 Resnik, P., Semantic Similarity in a Taxonomy: An Information-based Measure and its Application to Problems of Ambiguity in Natural Language. Journal of Artificial Intelligence Research, 1999. 11: p. 95-130
6 Bohm, C., S. Berchtold, and D.A. Keim, Searching in High-Dimensional Spaces: Index structures for Improving the Performance of Multimedia Databases. ACM Computing Surveys, 2001. 33(3): p. 322-373
7 Chavez, E., et al., Searching in Metric Spaces. ACM Computing Surveys, 2001. 33(3): p. 273-321
8 Lord, P.W., et al., Semantic Similarity Measures As Tools For Exploring the Gene Ontology. in Pacific Symposium on Biocomputing. 2003
9 Maguitman, A.G. and F. Menczer, Algorithmic Detection of Semantic Similarity. in WWW. 2005. Chiba, Japan
10 Jiang, J.J. and D.W. Conrath, Semantic Similarity Based on Corpus Statistics and Lexical Taxonomy, in International Conference Research on Computational Linguistics. 1997: Taiwan
11 Lee, J.H., M.H. Kim, and Y.J. Lee, Information Retrieval based on Conceptual Distance in is-a Hierarchies. Journal of Documentation, 1993. 49(2): p. 188-207
12 Fagin, R., A. Lotem, and M. Naor, Optimal Aggregation Algorithms for Middleware. Journal of Computer and System Sciences, 2003. 66(4): p. 614-656
13 Rada, R., et al., Development and Application of a Metric on Semantic Nets. IEEE Transactions on Systems, Man and Cybernetics, 1989. 19(1): p. 17-30
14 The Gene Ontology Consortium, Creating the Gene Ontology Resource: Design and Implementation. Genome Res, 2001. 11(8): p. 1425-33
15 Lin, D. An Information-theoretic Definition of Similarity. in 15th International Conf. on Machine Learning. 1998. San Francisco, CA
16 Aslam, J.A. and M. Frost. An Information-theoretic Measure for Document Similarity. in SIGIR. 2003. Toronto, Canada