Accelerating Keyword Search Processing over XML Documents using Document-level Ranking  

Lee, Hyung-Dong (Department of Computer Engineering, Seoul National University)
Kim, Hyoung-Joo (Department of Computer Engineering, Seoul National University)
Abstract
XML keyword search lets users retrieve information without knowing the structure of the documents, and it returns specific, useful document fragments rather than whole documents. Element-level query processing makes this possible, but its computational cost grows significantly as the number of documents increases. In this paper, we present a document-level ranking scheme over XML documents that predicts the results of element-level processing in order to reduce processing cost. To this end, we propose the notion of 'keyword proximity' for the document ranking process: the correlation of keywords in a document that affects the results of element-level query processing, computed from the path information of the keyword occurrence nodes and their resemblance. With this document-centric view, processing time can be reduced either by using a ranked document list or by filtering out low-scoring documents. Our experimental evaluation shows that document-level processing with a ranked document list is effective and improves performance through early termination of top-k queries.
Keywords
XML; keyword search; proximity; document-centric view; filtering
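
The early termination for top-k queries mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes that documents arrive already sorted by a document-level score, and that this score is an upper bound on the score of any element inside the document. The names `top_k_elements`, `ranked_docs`, and `score_elements` are hypothetical, as are the toy scores.

```python
import heapq
from typing import Callable, Iterable, List, Tuple

ScoredElement = Tuple[float, str]  # (element score, element id)


def top_k_elements(
    ranked_docs: List[Tuple[str, float]],
    score_elements: Callable[[str], Iterable[Tuple[str, float]]],
    k: int,
) -> Tuple[List[ScoredElement], int]:
    """Return the k best elements and the number of documents processed.

    ranked_docs: (doc_id, doc_score) pairs sorted by doc_score, descending.
    score_elements: hypothetical element-level scorer for one document.
    Assumption: no element scores higher than its document's doc_score.
    """
    heap: List[ScoredElement] = []  # min-heap holding the current top k
    processed = 0
    for doc_id, doc_score in ranked_docs:
        # Once the k-th best element already matches or beats the bound of
        # the next document, no later document can change the answer.
        if len(heap) == k and heap[0][0] >= doc_score:
            break
        processed += 1
        for element_id, score in score_elements(doc_id):
            if len(heap) < k:
                heapq.heappush(heap, (score, element_id))
            elif score > heap[0][0]:
                heapq.heapreplace(heap, (score, element_id))
    return sorted(heap, reverse=True), processed


# Toy data (invented for illustration only).
toy_elements = {
    "d1": [("d1/a", 0.9), ("d1/b", 0.8)],
    "d2": [("d2/a", 0.6)],
    "d3": [("d3/a", 0.3)],
}
toy_ranking = [("d1", 0.95), ("d2", 0.7), ("d3", 0.4)]

best, docs_processed = top_k_elements(toy_ranking, lambda d: toy_elements[d], k=2)
print(best, docs_processed)
```

In this toy run only the first document needs element-level processing, because its two elements already score at least as high as the bound of the next document. Filtering low-scored documents, the other option named in the abstract, would instead drop entries from `ranked_docs` below some threshold before the loop starts.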