PIX: Partitioned Index for Keyword Search over XML Documents  

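Lee Hongrae (School of Computer Science and Engineering, Seoul National University)
Lee Hyungdong (School of Computer Science and Engineering, Seoul National University)
Yoo Sangwon (School of Computer Science and Engineering, Seoul National University)
Kim Hyoung-Joo (School of Computer Science and Engineering, Seoul National University)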
Abstract
Because XML documents carry much richer information than plain text, they allow elaborate, fine-grained search that was previously difficult. However, element-level search at this finer granularity is expensive, and its processing overhead has become a new challenge. We propose an inverted index structure called PIX, which reduces the number of elements processed by partitioning them according to their potential to match. We choose a base level and partition elements according to whether they can have a common ancestor higher than that level. We also propose a partition merging technique that yields the same results as the unpartitioned case. Our experimental results show that this index partitioning strategy can considerably reduce processing time.
Keywords
XML; keyword search; partitioned index; inverted index
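
The partitioning idea in the abstract can be illustrated with a small sketch. It is not the PIX implementation, which the abstract does not detail. It assumes elements are identified by Dewey-style paths (tuples of child positions, so an ID of length k sits at depth k), that a partition key is the ID prefix of length `base_level`, and that a match is a common ancestor at depth `base_level` or deeper. All function names and the sample postings are hypothetical.

```python
from collections import defaultdict
from itertools import product


def build_index(postings, base_level):
    """Map keyword -> partition key -> sorted element IDs.

    Assumption: the partition key is the Dewey prefix of length base_level,
    so two elements share a key only if they share an ancestor at that depth.
    """
    index = defaultdict(lambda: defaultdict(list))
    for keyword, dewey in postings:
        index[keyword][dewey[:base_level]].append(dewey)
    for partitions in index.values():
        for ids in partitions.values():
            ids.sort()
    return index


def common_prefix(a, b):
    """Dewey ID of the lowest common ancestor of a and b (() is the root)."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return a[:n]


def match(ids1, ids2, base_level):
    """Common ancestors at depth >= base_level over all pairs (naive join)."""
    found = set()
    for a, b in product(ids1, ids2):
        lca = common_prefix(a, b)
        if len(lca) >= base_level:
            found.add(lca)
    return found


def merge_partitions(index, keyword):
    """Partition merging: rebuild the full posting list of one keyword."""
    partitions = index.get(keyword, {})
    return sorted(d for ids in partitions.values() for d in ids)


def search_unpartitioned(index, kw1, kw2, base_level):
    ids1, ids2 = merge_partitions(index, kw1), merge_partitions(index, kw2)
    return match(ids1, ids2, base_level), len(ids1) + len(ids2)


def search_partitioned(index, kw1, kw2, base_level):
    parts1, parts2 = index.get(kw1, {}), index.get(kw2, {})
    results, processed = set(), 0
    # Only partitions present for both keywords can contain a match.
    for key in parts1.keys() & parts2.keys():
        processed += len(parts1[key]) + len(parts2[key])
        results |= match(parts1[key], parts2[key], base_level)
    return results, processed


# Hypothetical postings: (keyword, Dewey ID of the element containing it).
postings = [
    ("xml", (0, 0, 1)), ("xml", (0, 1, 0)), ("xml", (1, 0, 0)),
    ("index", (0, 0, 2)), ("index", (1, 1, 0)), ("index", (2, 0, 0)),
]
index = build_index(postings, base_level=2)
full_results, full_cost = search_unpartitioned(index, "xml", "index", 2)
part_results, part_cost = search_partitioned(index, "xml", "index", 2)
print(full_results == part_results, full_cost, part_cost)
```

In this toy data only one partition is shared by both keywords, so the partitioned search reads two elements where the unpartitioned search reads six, while both return the same result set. Merging all partitions of a keyword recovers its full posting list, which is how the sketch models falling back to the unpartitioned case.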