
XML Document Filtering based on Segments  

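Kwon, Joon-Ho (School of Electrical Engineering and Computer Science, Seoul National University)
Rao, Praveen (University of Missouri-Kansas City)
Moon, Bong-Ki (Department of Computer Science, University of Arizona)
Lee, Suk-Ho (School of Electrical Engineering and Computer Science, Seoul National University)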
Abstract
In recent years, publish-subscribe (pub-sub) systems based on XML document filtering have received much attention. In a typical pub-sub system, subscribers specify their interests as profiles expressed in the XPath language, and each incoming document is matched against these profiles so that it is delivered only to interested subscribers. Because the number of subscribers and their profiles can grow very large, scalability is critical to the success of pub-sub services. In this paper, we propose SFiST, a fast and scalable XML filtering system that extends the FiST system. In SFiST, sharable segments are extracted from twig patterns and stored in a hash-based Segment Table. Segments are used to represent user profiles as Terse Sequences, which are stored in the Compact Segment Index for use during filtering. Our experimental study shows that SFiST outperforms FiST in both filtering time and memory usage.
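The abstract does not spell out how a twig pattern becomes a sequence. As a rough illustration only, the Python sketch below assumes the labeled Prüfer sequence construction suggested by the "Prüfer sequence" keyword: nodes are numbered in postorder, and the sequence records the parent's label for each non-root node in that order. The names `Node`, `labeled_prufer_sequence`, and `is_subsequence` are hypothetical, and the Segment Table, Terse Sequences, and Compact Segment Index of SFiST are not modeled here.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A labeled tree node; used for both documents and twig patterns."""
    label: str
    children: list["Node"] = field(default_factory=list)


def labeled_prufer_sequence(root: Node) -> list[str]:
    """Return the parent's label for every non-root node, in postorder.

    With postorder numbering, repeatedly deleting the lowest-numbered leaf
    removes nodes in postorder, so this equals the labeled Prüfer sequence
    (assumed construction, one entry per non-root node).
    """
    seq: list[str] = []

    def visit(node: Node) -> None:
        for child in node.children:
            visit(child)            # finish the child's subtree first
            seq.append(node.label)  # then record the child's parent label

    visit(root)
    return seq


def is_subsequence(needle: list[str], haystack: list[str]) -> bool:
    """True if needle appears in haystack in order, not necessarily contiguously."""
    it = iter(haystack)
    return all(label in it for label in needle)


# Document tree: a(b(c, d), e); twig pattern: b(c, d), i.e. //b[c]/d
document = Node("a", [Node("b", [Node("c"), Node("d")]), Node("e")])
pattern = Node("b", [Node("c"), Node("d")])

doc_seq = labeled_prufer_sequence(document)
pat_seq = labeled_prufer_sequence(pattern)
candidate = is_subsequence(pat_seq, doc_seq)
```

In this sketch a subsequence match only marks the document as a candidate; it is a necessary condition, and a real filter would still need further checks (for example on leaf labels and tree structure) before reporting a match.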
Keywords
XML filtering; segment; twig pattern; Prüfer sequence