
Conjunctive Boolean Query Optimization based on Join Sequence Separability in Information Retrieval Systems  

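Byung-Kwon Park (Dong-A University, Division of Management Information Science)
Wook-Shin Han (Kyungpook National University, Computer Engineering)
Kyu-Young Whang (KAIST, Department of Electrical Engineering and Computer Science)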
Abstract
A conjunctive Boolean text query is a query that searches for text documents containing all of the specified keywords; it is the most frequently used query form in information retrieval systems. Typically, the query specifies a long list of keywords for better precision, and in that case the order in which the keywords are processed has a significant impact on query speed. Currently known approaches to this ordering are based on heuristics and therefore cannot guarantee an optimal ordering. A systematic alternative is to apply a database query processing algorithm such as dynamic programming, but this is not suitable for a text query with a typically long list of keywords because of the algorithm's exponential run time ($O(n2^{n-1})$) for $n$ keywords. Considering these problems, we propose a new approach based on a property called join sequence separability. This property states that the optimal join sequence is separable into two subsequences of different join methods under a certain condition on the joined relations, and it enables us to find a globally optimal join sequence in $O(n \log n)$. In this paper we describe the property formally, present an optimization algorithm based on the property, prove that the algorithm finds an optimal join sequence, and validate our approach through simulation using an analytic cost model. Comparison with the heuristic text query optimization approaches shows up to 100 times faster query processing, and comparison with the dynamic programming approach shows exponentially faster query optimization (e.g., 600 times for a 10-keyword query).
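To make the setting concrete, the following is a minimal Python sketch of a conjunctive Boolean query over a toy inverted index. It is not the algorithm proposed in the paper: it uses a common shortest-posting-list-first ordering, which is one of the heuristic orderings the abstract contrasts against. The names `conjunctive_query` and `dp_cost_bound` and the toy index are hypothetical, introduced only for illustration.

```python
from typing import Dict, List, Set


def conjunctive_query(index: Dict[str, Set[int]], keywords: List[str]) -> Set[int]:
    """Return the ids of documents that contain every keyword.

    Assumption: `index` maps a keyword to the set of document ids
    containing it (a simplified posting list).
    """
    if not keywords:
        return set()
    # Heuristic ordering (not guaranteed optimal): process the
    # shortest posting list first to keep intermediate results small.
    postings = sorted((index.get(k, set()) for k in keywords), key=len)
    result = set(postings[0])
    for posting in postings[1:]:
        if not result:
            break  # no document can satisfy the remaining keywords
        result &= posting
    return result


def dp_cost_bound(n: int) -> int:
    """Evaluate n * 2^(n-1), the growth term quoted for dynamic programming."""
    return n * 2 ** (n - 1)


# Toy index: keyword -> document ids (illustrative data only).
index = {
    "query": {1, 2, 3, 5, 8},
    "join": {2, 3, 8},
    "optimization": {3, 8, 9},
}
matches = conjunctive_query(index, ["query", "join", "optimization"])
bound_for_10 = dp_cost_bound(10)
```

Evaluating `dp_cost_bound` for a 10-keyword query gives 10 × 2^9 = 5120, which suggests why an exponential optimizer becomes impractical as keyword lists grow.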
Keywords
Information Retrieval (IR); Conjunctive Boolean Query; Query Optimization; Join Sequence Separability