
Semantic Similarity Search using the Signature Tree  

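Kim, Ki-Sung (School of Computer Science and Engineering, Seoul National University)
Im, Dong-Hyuk (School of Computer Science and Engineering, Seoul National University)
Kim, Cheol-Han (School of Computer Science and Engineering, Seoul National University)
Kim, Hyoung-Joo (School of Computer Science and Engineering, Seoul National University)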
Abstract
As ontologies become widely used, interest in semantic similarity search is also growing. In this paper, we propose a query evaluation scheme for the k-nearest neighbor query, which retrieves the k objects most similar to a query object. We use the best match method to calculate the semantic similarity between objects, and we use the signature tree to index the annotation information of the objects in the database. The signature tree is typically used for set similarity search. To use it for similarity search, we must predict an upper bound on the similarity for each node, that is, the highest similarity value that can be found by traversing into that node. We therefore propose a prediction function for the best match similarity function and prove that the prediction is correct. We also modify the original signature tree structure so that identical signatures are not stored redundantly. This improved structure not only reduces the size of the signature tree but also increases the efficiency of query evaluation. For our experiments we use the Gene Ontology (GO), which provides a large ontology and a large amount of annotation data. Using GO, we show that the proposed method improves query efficiency, and we present experimental results for varying page sizes and several node-splitting methods.
Keywords
Semantic similarity search; Signature tree
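The upper-bound idea in the abstract can be illustrated with a minimal sketch. It is not the paper's prediction function or index layout. It assumes a simplified, one-directional best match similarity (the mean, over query terms, of the best term-level similarity found in the other set), represents each node signature as a plain set of terms rather than a bit signature, and uses a made-up term similarity `toy_sim`. The names `best_match`, `Node`, and `knn` are hypothetical. Under these assumptions, a node's signature is a superset of every term set stored beneath it, so the similarity computed against the signature can never be lower than the similarity of any object in the subtree, and a best-first traversal can return the k nearest objects without visiting every node.

```python
import heapq
from itertools import count


def best_match(query, terms, term_sim):
    """Assumed one-directional best match: mean of each query term's best similarity in `terms`."""
    if not query or not terms:
        return 0.0
    return sum(max(term_sim(q, t) for t in terms) for q in query) / len(query)


class Node:
    """Tree node; `signature` is the union of all term sets stored below it (a set, not a bit string)."""

    def __init__(self, children=None, objects=None):
        self.children = children or []
        self.objects = objects or {}  # leaf entries: object id -> frozenset of terms
        sig = set()
        for child in self.children:
            sig |= child.signature
        for terms in self.objects.values():
            sig |= terms
        self.signature = frozenset(sig)


def knn(root, query, k, term_sim):
    """Best-first k-NN: always expand the entry with the highest bound or exact similarity."""
    tie = count()  # tie-breaker so the heap never compares a Node with an object id
    heap = [(-best_match(query, root.signature, term_sim), next(tie), root)]
    results = []
    while heap and len(results) < k:
        neg_sim, _, item = heapq.heappop(heap)
        if isinstance(item, Node):
            # Child entries carry an upper bound computed from the child's signature.
            for child in item.children:
                bound = best_match(query, child.signature, term_sim)
                heapq.heappush(heap, (-bound, next(tie), child))
            # Object entries carry their exact similarity.
            for oid, terms in item.objects.items():
                sim = best_match(query, terms, term_sim)
                heapq.heappush(heap, (-sim, next(tie), oid))
        else:
            # An object at the top of the heap beats every remaining bound.
            results.append((item, -neg_sim))
    return results


def toy_sim(a, b):
    """Made-up term similarity: 1.0 if equal, 0.5 if same first letter, else 0.0."""
    if a == b:
        return 1.0
    return 0.5 if a[0] == b[0] else 0.0


leaf1 = Node(objects={"o1": frozenset({"a1", "b1"}), "o2": frozenset({"a2"})})
leaf2 = Node(objects={"o3": frozenset({"c1"}), "o4": frozenset({"a1", "c1"})})
root = Node(children=[leaf1, leaf2])
query = frozenset({"a1", "b1"})
top2 = knn(root, query, 2, toy_sim)  # [("o1", 1.0), ("o4", 0.5)]
```

In this toy run the search stops as soon as two objects have been popped from the queue, so `o2` and `o3` are never reported. With a symmetric best match definition the signature-based bound above would no longer be valid as written, which is why a dedicated prediction function with a correctness proof is needed.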