A Fast Algorithm for the k-Keyword Ordered Proximity Problem

  • 김진욱 (School of Computer and Information Engineering, Inha University)
  • Received: 2009.12.02
  • Accepted: 2010.01.01
  • Published: 2010.03.15

Abstract

Web search engines use proximity as one way to compute the relevance of a document to a given query. Both the proximity problem, which ignores keyword order, and the ordered proximity problem, which takes it into account, have been studied. In this paper, we present O(n)-time algorithms for the k-keyword ordered proximity problem, in which the order of all k keywords is taken into account, where n is the total number of occurrences of the k keywords in a document. Experimental results show that the proposed algorithms are about 1.2 times faster than previous algorithms when k = 2, and over 3 times faster when k = 5.
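
The abstract states the time bound but not the exact problem definition or the algorithm. As an illustration only, the Python sketch below assumes one common formulation of the ordered problem: find the shortest span of the document containing keywords w_1, ..., w_k at positions p_1 < p_2 < ... < p_k, minimizing p_k - p_1. It further assumes that the occurrences arrive as a single list of (position, keyword index) pairs already sorted by position, with at most one keyword per position, and that the k keywords are distinct. The names `occurrences_from_tokens` and `shortest_ordered_window` are hypothetical, and this one-pass method is a generic technique, not necessarily the algorithm proposed in the paper.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def occurrences_from_tokens(
    tokens: Sequence[str], keywords: Sequence[str]
) -> List[Tuple[int, int]]:
    """Hypothetical helper: one left-to-right scan yields position-sorted occurrences."""
    index: Dict[str, int] = {w: i for i, w in enumerate(keywords)}  # assumes distinct keywords
    return [(pos, index[t]) for pos, t in enumerate(tokens) if t in index]


def shortest_ordered_window(
    occurrences: Sequence[Tuple[int, int]], k: int
) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the shortest window holding keywords 0..k-1 in order.

    `occurrences` must be sorted by position with one keyword per position.
    Returns None when the keywords never occur in the required order.
    """
    # start[j] = largest p_0 such that keywords 0..j occur at p_0 < ... < p_j
    # among the occurrences scanned so far (None if no such chain exists yet).
    start: List[Optional[int]] = [None] * k
    best: Optional[Tuple[int, int]] = None
    for pos, j in occurrences:
        if j == 0:
            start[0] = pos  # a fresh chain can always begin here
        elif start[j - 1] is not None:
            # Extend the most recent chain for keywords 0..j-1; start values never
            # decrease, so the latest one is also the tightest.
            start[j] = start[j - 1]
        else:
            continue  # keyword j cannot be placed yet
        if j == k - 1:
            s = start[j]
            assert s is not None
            if best is None or pos - s < best[1] - best[0]:
                best = (s, pos)  # keep the earliest among equally short windows
    return best


if __name__ == "__main__":
    occ = occurrences_from_tokens("b a x b c a b c".split(), ["a", "b", "c"])
    print(shortest_ordered_window(occ, 3))  # (5, 7)
```

For the token sequence `b a x b c a b c` and keywords `a, b, c`, the window from position 1 to 4 already contains a, b, c in order, but the window from 5 to 7 is shorter, so the sketch returns `(5, 7)`. Each occurrence is handled in O(1) time, so the O(n) bound here relies on the pre-sorted occurrence list; merging k separately sorted occurrence lists with a heap would instead cost O(n log k). A query that repeats a keyword would need extra handling (for example, matching each occurrence against the repeated indices in decreasing order), which this sketch omits.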
