
A Fast Algorithm for the k-Keyword Ordered Proximity Problem  

Kim, Jin-Wook (School of Computer and Information Engineering, Inha University)
Abstract
In web search engines, proximity is used to compute the relevance of a document to a given query. Various results exist for both the proximity problems and the ordered proximity problems. In this paper, we present O(n)-time algorithms for the k-keyword ordered proximity problems, where n is the total number of occurrences of the k keywords in a document. Experimental results show that the proposed algorithms are about 1.2 times faster than previous algorithms when k=2 and more than 3 times faster when k=5.
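To make the problem statement concrete, the following is a minimal sketch of one common formulation of ordered proximity: given the positions of k keywords in a document, find the shortest interval in which the keywords appear in query order. This is not the algorithm proposed in the paper, whose details are not given in this abstract. The function name `min_ordered_window`, the input layout (position/keyword-index pairs already sorted by position), and the assumption that the k keywords are pairwise distinct are all illustrative choices.

```python
from typing import List, Optional, Tuple


def min_ordered_window(
    occurrences: List[Tuple[int, int]], k: int
) -> Optional[Tuple[int, int]]:
    """Return (start, end) of a shortest window containing keywords 0..k-1 in order.

    Assumptions (not from the paper): `occurrences` holds (position, keyword_index)
    pairs sorted by strictly increasing position, and the k keywords are distinct.
    """
    # latest_start[i] = largest start position of an ordered chain of
    # keywords 0..i that ends at or before the occurrence being scanned.
    latest_start: List[Optional[int]] = [None] * k
    best: Optional[Tuple[int, int]] = None

    for pos, kw in occurrences:
        if kw == 0:
            # A new chain can always begin at the first keyword.
            latest_start[0] = pos
        elif latest_start[kw - 1] is not None:
            # Extend the most recent chain that ends with keyword kw-1.
            latest_start[kw] = latest_start[kw - 1]

        if kw == k - 1 and latest_start[kw] is not None:
            start = latest_start[kw]
            # Keep the first shortest window found.
            if best is None or pos - start < best[1] - best[0]:
                best = (start, pos)

    return best


if __name__ == "__main__":
    # Keywords: 0 = "A", 1 = "B", 2 = "C"; positions are word offsets.
    doc = [(1, 0), (3, 2), (4, 1), (6, 0), (7, 1), (9, 2), (10, 1)]
    print(min_ordered_window(doc, 3))
```

Each occurrence is handled in constant time, so the scan is linear in n once the occurrences are in position order. If the input is instead k separately sorted position lists, they would first need to be merged, for example with Python's `heapq.merge`, which adds a log k factor under this sketch.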
Keywords
proximity; order preserving; string algorithms
  • Reference
1 K. Sadakane, H. Imai, Fast algorithms for k-word proximity search, IEICE Trans. Fundamentals, E84-A(9), pp.312-319, 2001.
2 S.-R. Kim, I. Lee, K. Park, A Fast Algorithm for the Generalized k-keyword Proximity Problem Given Keyword Offsets, Information Processing Letters, 91(3), pp.115-120, 2004.
3 I. Lee, S.-R. Kim, An Algorithm for the Generalized k-Keyword Proximity Problem and Finding Longest Repetitive Substring in a Set of Strings, Proc. of the 6th International Conference on Computational Science, LNCS, 3994, pp.289-292, 2006.
4 C. Gupta, Efficient k-Word Proximity Search, MS Thesis, CWRU, EECS Department, 2008.
5 C. Gupta, G. Ozsoyoglu, Z.M. Ozsoyoglu. Efficient k-Word Proximity Search. Proc. of the 24th International Symposium on Computer and Information Sciences, pp.123-128, 2009.
6 R. Baeza-Yates, W. Cunto, The ADT proximity and text proximity problems, Proc. IEEE String Processing and Information Retrieval Symposium, pp.24-30, 1999.
7 U. Manber, R. Baeza-Yates, An algorithm for string matching with a sequence of don't cares, Information Processing Letters, 37, pp.133-136, 1991.
8 S. Brin, L. Page, The Anatomy of a Large-Scale Hypertextual Web Search Engine, Computer Networks and ISDN Systems, 30(1-7), pp.107-117, 1998.
9 Daum, http://www.daum.net.
10 Yahoo, http://www.yahoo.com.
11 J. Kleinberg, Authoritative Sources in a Hyperlinked Environment, Proc. of the 9th Annual ACM-SIAM Symposium on Discrete Algorithms, pp.668-677, 1998.
12 G.H. Gonnet, R. Baeza-Yates, T. Snider, New indices for text: PAT trees and PAT arrays, in Information Retrieval: Algorithms and Data Structures, ed. W. Frakes and R. Baeza-Yates, pp. 66-82. Prentice-Hall, 1992.
13 Naver, http://www.naver.com.
14 Google, http://www.google.com.