A Keyword-based Filtering Technique of Document-centric XML using NFA Representation

  • 이경한 (Samsung Electronics, Mobile Communications Division)
  • 박석 (Sogang University, Department of Computer Science)
  • Published: 2006.10.15

Abstract

Writing queries that filter XML element content is difficult with the existing XPath specification. To address this, we propose an extended XPath specification that permits the special matching character '%' used by the LIKE operator of SQL, together with Pfilter, a filtering technique for document-centric XML that uses the extended specification as its standard query language. Pfilter shares the common prefix characters of the operands in value-based predicates, which improves the performance of processing those predicates. We compare Pfilter with Yfilter, a representative filtering technique for data-centric XML, in terms of the scalability and efficiency of value-based predicate processing, using multi-query processing time (MQPT) as the metric, and we report results for inserting, deleting, and processing value-based predicates in Pfilter. Pfilter can serve as a core algorithm for evaluating the contains() function of XPath in XML filtering systems, and as a foundation for building XML-based distributed information systems.
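The prefix-sharing idea described in the abstract can be illustrated with a small sketch. This is not the Pfilter implementation: the class name `PrefixSharedMatcher`, its methods, and the sample patterns are hypothetical, and the sketch assumes that '%' matches zero or more characters (as in SQL LIKE) and that a pattern must match the whole element content. Patterns are stored in a trie so that operands with a common prefix share states, and the trie is run as an NFA over the text.

```python
class Node:
    """One NFA state. A wildcard state loops on every input character."""
    __slots__ = ("children", "accepts", "is_wild")

    def __init__(self, is_wild=False):
        self.children = {}    # character -> Node
        self.accepts = set()  # ids of queries whose pattern ends here
        self.is_wild = is_wild


class PrefixSharedMatcher:
    """Hypothetical sketch: LIKE-style patterns stored in a shared-prefix trie."""
    WILDCARD = "%"

    def __init__(self):
        self.root = Node()
        self.node_count = 1

    def insert(self, query_id, pattern):
        node = self.root
        for ch in pattern:
            nxt = node.children.get(ch)
            if nxt is None:  # only the unshared suffix creates new states
                nxt = Node(is_wild=(ch == self.WILDCARD))
                node.children[ch] = nxt
                self.node_count += 1
            node = nxt
        node.accepts.add(query_id)

    def delete(self, query_id, pattern):
        path = [self.root]
        for ch in pattern:
            nxt = path[-1].children.get(ch)
            if nxt is None:
                return False
            path.append(nxt)
        if query_id not in path[-1].accepts:
            return False
        path[-1].accepts.discard(query_id)
        # Prune states that no remaining pattern uses, from the leaf upward.
        for i in range(len(pattern) - 1, -1, -1):
            child = path[i + 1]
            if child.children or child.accepts:
                break
            del path[i].children[pattern[i]]
            self.node_count -= 1
        return True

    def _closure(self, states):
        # '%' may match zero characters, so a wildcard child is entered for free.
        result = set(states)
        stack = list(states)
        while stack:
            wild = stack.pop().children.get(self.WILDCARD)
            if wild is not None and wild not in result:
                result.add(wild)
                stack.append(wild)
        return result

    def match(self, text):
        """Return the ids of all queries whose pattern matches the whole text."""
        active = self._closure({self.root})
        for ch in text:
            nxt = set()
            for node in active:
                if node.is_wild:
                    nxt.add(node)  # self-loop: '%' consumes this character
                # A literal '%' in the text never follows the wildcard edge.
                child = node.children.get(ch) if ch != self.WILDCARD else None
                if child is not None:
                    nxt.add(child)
            active = self._closure(nxt)
        matched = set()
        for node in active:
            matched |= node.accepts
        return matched


matcher = PrefixSharedMatcher()
matcher.insert("q1", "%XML%")
matcher.insert("q2", "%XML filter%")  # reuses the states of the prefix "%XML"
matcher.insert("q3", "XPath%")
print(sorted(matcher.match("An XML filtering engine")))
```

Because "q1" and "q2" share the prefix "%XML", the second insertion adds states only for " filter%". Deleting a pattern removes just the states that no other pattern still needs.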
