[KSCI] Korea Science Citation Index Service

PIX: Partitioned Index for Keyword Search over XML Documents

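Lee Hongrae (School of Computer Science and Engineering, Seoul National University)
Lee Hyungdong (School of Computer Science and Engineering, Seoul National University)
Yoo Sangwon (School of Computer Science and Engineering, Seoul National University)
Kim Hyoung-Joo (School of Computer Science and Engineering, Seoul National University)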

Publication Information

Journal of KIISE: Databases / v.31, no.6, 2004, pp. 710-720

Abstract

Because XML documents carry much richer information than plain text, they allow elaborate, fine-grained searches that were previously difficult. However, element-level search at this finer granularity is expensive, and its processing overhead has become a new challenge. We propose PIX, an inverted index structure that reduces the number of elements processed by partitioning elements according to their potential to produce a match. We choose a base level and partition elements according to whether they can have a common ancestor above that level. We also propose a partition merging technique that yields the same results as an unpartitioned index. Our experimental results show that the index partitioning strategy can reduce processing time considerably.
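The abstract does not spell out the index layout, so the following is only a minimal sketch of one possible reading, not the authors' implementation. It assumes that element IDs are Dewey paths (tuples of ints, with level = tuple length), that the inverted index maps each keyword to the elements that directly contain it, and that a query answer is the set of smallest elements whose subtree holds every keyword (SLCA-style semantics, an assumption here). The partition key (each posting's ancestor at the base level), the class `PartitionedIndex`, and the helpers `prefixes`, `slca`, and `search_unpartitioned` are all hypothetical names introduced for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical element ID: a Dewey path such as (1, 2, 1); level = len(id), root level = 1.
Dewey = Tuple[int, ...]


def prefixes(d: Dewey, lo: int, hi: int) -> Set[Dewey]:
    """Ancestors-or-self of d whose level lies in [lo, hi]."""
    return {d[:i] for i in range(max(lo, 1), min(hi, len(d)) + 1)}


def slca(cands: Set[Dewey]) -> Set[Dewey]:
    """Keep only candidates that have no proper descendant among the candidates."""
    return {c for c in cands
            if not any(len(o) > len(c) and o[:len(c)] == c for o in cands)}


def search_unpartitioned(index: Dict[str, List[Dewey]], keywords: List[str]) -> Set[Dewey]:
    """Baseline: every element whose subtree holds all keywords, reduced to the smallest ones."""
    if not keywords:
        return set()
    per_kw = [set().union(*(prefixes(d, 1, len(d)) for d in index.get(k, [])))
              for k in keywords]
    return slca(set.intersection(*per_kw))


class PartitionedIndex:
    """Assumed layout: per keyword, postings at level >= base are grouped by their
    level-`base` ancestor; postings shallower than `base` are kept in a separate list."""

    def __init__(self, index: Dict[str, List[Dewey]], base: int):
        self.base = base
        self.parts: Dict[str, Dict[Dewey, List[Dewey]]] = {}
        self.shallow: Dict[str, List[Dewey]] = {}
        for kw, postings in index.items():
            groups: Dict[Dewey, List[Dewey]] = defaultdict(list)
            low: List[Dewey] = []
            for d in postings:
                if len(d) >= base:
                    groups[d[:base]].append(d)
                else:
                    low.append(d)
            self.parts[kw] = dict(groups)
            self.shallow[kw] = low

    def search(self, keywords: List[str]) -> Set[Dewey]:
        if not keywords:
            return set()
        b = self.base
        parts = [self.parts.get(k, {}) for k in keywords]
        cands: Set[Dewey] = set()
        # Matches at or below the base level: scan only partitions every keyword occurs in.
        for p in set.intersection(*(set(pt) for pt in parts)):
            per_kw = [set().union(*(prefixes(d, b, len(d)) for d in pt[p])) for pt in parts]
            cands |= set.intersection(*per_kw)
        # Merge step for ancestors above the base level: uses partition keys and
        # shallow postings only, never the postings stored inside the partitions.
        upper = []
        for k, pt in zip(keywords, parts):
            s = set().union(*(prefixes(p, 1, b - 1) for p in pt))
            s |= set().union(*(prefixes(d, 1, len(d)) for d in self.shallow.get(k, [])))
            upper.append(s)
        cands |= set.intersection(*upper)
        return slca(cands)


index = {
    "romeo": [(1, 1, 1), (1, 2, 1)],
    "juliet": [(1, 1, 1), (1, 1, 2)],
    "balcony": [(1, 1, 2)],
}
pix = PartitionedIndex(index, base=2)
print(pix.search(["romeo", "balcony"]))                   # {(1, 1)}
print(search_unpartitioned(index, ["romeo", "balcony"]))  # {(1, 1)}
```

In this sketch, only partitions in which every query keyword occurs are scanned posting by posting. Candidate ancestors above the base level are recovered from the partition keys alone, which is why the merged answer matches the unpartitioned baseline here.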

Keywords

XML; keyword search; partitioned index; inverted index

Citations & Related Records

Times Cited By KSCI: 1

Korean title (translated): PIX: An Index Partitioning Technique for XML Document Search

PIX: Partitioned Index for Keyword Search over XML Documents