
Partitioning and Merging an Index for Efficient XML Keyword Search  

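Kim, Sung-Jin (School of Electrical Engineering and Computer Science, Seoul National University)
Lee, Hyung-Dong (School of Electrical Engineering and Computer Science, Seoul National University)
Kim, Hyoung-Joo (School of Electrical Engineering and Computer Science, Seoul National University)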
Abstract
In XML keyword search, a search result is defined as the set of smallest elements (i.e., least common ancestors) that contain all query keywords, and the unit of indexing is an XML element rather than a document. Under the conventional index structure, every least common ancestor produced by combining elements, each of which contains a query keyword, is considered part of the search result. In this paper, to avoid unnecessary least-common-ancestor computations and reduce query processing time, we describe how to construct a partitioned index composed of several partitions and how to produce a search result by merging those partitions when necessary. When the search result is restricted to least common ancestors whose depths are greater than a given minimum depth, a search system using the proposed partitioned index can reduce query processing time by considering only combinations of elements that belong to the same partition. Even when the minimum depth is not given or is unknown, a search system can still obtain the search result from the partitioned index, in the same query processing time as with a non-partitioned index. Our experiments used XML documents provided by the DBLP site and INEX2003; the partitioned index substantially reduced query processing time when a minimum depth was given.
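The idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation; it rests on several assumptions that the abstract does not state: elements are identified by Dewey labels (tuples of child positions, so the depth of an element is the length of its label), each partition groups the elements that share the same ancestor at a fixed depth `part_depth`, and the minimum depth is treated as an inclusive bound. All function and variable names are hypothetical.

```python
from collections import defaultdict
from itertools import product


def lca(labels):
    """Return the longest common prefix of the Dewey labels,
    which is the label of their common ancestor."""
    prefix = []
    for parts in zip(*labels):
        if len(set(parts)) != 1:
            break
        prefix.append(parts[0])
    return tuple(prefix)


def search(postings, keywords):
    """Non-partitioned baseline: one LCA per combination of elements,
    taking one element from each keyword's posting list."""
    lists = [postings.get(k, []) for k in keywords]
    return {lca(combo) for combo in product(*lists)}


def build_partitioned_index(postings, part_depth):
    """Group postings by the ancestor at depth part_depth (assumed scheme).
    Elements shallower than part_depth are keyed by their own label."""
    index = defaultdict(lambda: defaultdict(list))
    for keyword, labels in postings.items():
        for label in labels:
            index[label[:part_depth]][keyword].append(label)
    return index


def search_partitioned(index, keywords, part_depth, min_depth=None):
    """Search per partition when min_depth allows it, else merge partitions."""
    if min_depth is None or min_depth < part_depth:
        # Minimum depth unknown or too small: merge all partitions and
        # fall back to the non-partitioned computation.
        merged = defaultdict(list)
        for part in index.values():
            for keyword, labels in part.items():
                merged[keyword].extend(labels)
        results = search(merged, keywords)
        if min_depth is None:
            return results
        return {r for r in results if len(r) >= min_depth}

    results = set()
    for key, part in index.items():
        if len(key) < part_depth:
            continue  # elements this shallow cannot yield a deep enough LCA
        # Elements in different partitions differ within the first
        # part_depth positions, so their LCA is shallower than min_depth.
        results |= {r for r in search(part, keywords) if len(r) >= min_depth}
    return results


# Hypothetical posting lists: keyword -> Dewey labels of matching elements.
postings = {
    "xml": [(0, 0, 1), (0, 1, 0), (1, 0)],
    "search": [(0, 0, 2), (1, 0, 3)],
}
index = build_partitioned_index(postings, part_depth=2)
deep_results = search_partitioned(index, ["xml", "search"], 2, min_depth=2)
all_results = search_partitioned(index, ["xml", "search"], 2)
```

With a minimum depth, only combinations inside a single partition are enumerated, which is where the saving comes from; without one, the merged partitions reproduce the non-partitioned posting lists, so the work is the same as in the baseline.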
Keywords
XML (eXtensible Markup Language); XML Keyword Search; Partitioned Index