
A Keyword-based Filtering Technique of Document-centric XML using NFA Representation  

Lee, Kyoung-Han (Samsung Electronics, Wireless Business Division)
Park, Seog (Sogang University, Department of Computer Science)
Abstract
In this paper, we propose an extended XPath specification that includes the special matching character '%' used in the LIKE operator of SQL. The extension addresses the difficulty of writing queries that filter element contents with the existing XPath specification. We also present Pfilter, a novel technique for filtering a collection of document-centric XML documents that exploits the extended XPath specification. Because Pfilter shares the common prefix characters of the operands in value-based predicates, it improves the performance of processing those predicates. We report several performance studies that compare Pfilter with Yfilter in terms of efficiency and scalability, using multi-query processing time (MQPT) as the measure, and we report results for inserting, deleting, and processing value-based predicates. In conclusion, our approach provides a core algorithm for evaluating the contains() function of XPath queries in existing XML filtering research, and a foundation for building XML-based distributed information systems.
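The prefix-sharing idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' Pfilter implementation; it only assumes that each value-based predicate has the form '%keyword%' (a substring match) and stores the keyword operands in a character trie, so that operands with a common prefix share nodes. The class `PrefixTrie`, its method names, and the query ids are hypothetical names introduced for illustration.

```python
from collections import defaultdict


class PrefixTrie:
    """Trie over predicate operands; operands with a common prefix share nodes."""

    def __init__(self):
        # children[n] maps a character to the id of the child node of node n.
        self.children = [{}]
        # queries[n] holds the ids of queries whose keyword ends at node n.
        self.queries = defaultdict(set)

    def insert(self, keyword, query_id):
        """Register the operand of a hypothetical '%keyword%' predicate."""
        node = 0
        for ch in keyword:
            nxt = self.children[node].get(ch)
            if nxt is None:
                nxt = len(self.children)
                self.children.append({})
                self.children[node][ch] = nxt
            node = nxt
        self.queries[node].add(query_id)

    def node_count(self):
        """Number of trie nodes, including the root."""
        return len(self.children)

    def match(self, text):
        """Return the ids of all queries whose keyword occurs anywhere in text."""
        matched = set()
        for start in range(len(text)):
            node = 0
            for ch in text[start:]:
                node = self.children[node].get(ch)
                if node is None:
                    break
                if node in self.queries:
                    matched |= self.queries[node]
        return matched


trie = PrefixTrie()
trie.insert("filter", "q1")
trie.insert("filtering", "q2")
trie.insert("file", "q3")
result = trie.match("XML filtering of streams")
```

Here "filter", "filtering", and "file" share the nodes for "fil", so the element content is compared against that prefix once per start position rather than once per query. The sketch restarts from the trie root at every text position; an automaton with failure links, as in Aho-Corasick string matching, would avoid those restarts.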
Keywords
XML; filtering; value-based predicate