
A Queriable XML Compression using Inferred Data Types  

Chung Chin-Wan (Department of Electrical Engineering and Computer Science, Korea Advanced Institute of Science and Technology)
Abstract
HTML is mostly stored in native file systems rather than in specialized repositories such as databases. Like HTML, XML, the standard for exchanging and representing data on the Internet, mostly resides in native file systems. However, since XML data is irregular and verbose, it wastes disk space and network bandwidth compared with regularly structured data. To overcome this inefficiency, research on the compression of XML data has been conducted. Among recently proposed XML compression techniques, some do not support querying compressed data, while others that do support it encode data values with predefined encoding methods regardless of the types of those values, which necessitates partial decompression to process range queries. As a result, query performance on compressed XML data is degraded. This research therefore proposes an XML compression technique that supports direct and efficient evaluation of queries on compressed XML data. The technique uses dictionary encoding to encode each tag of the XML data and applies an appropriate encoding method to data values according to their inferred types. The implementation and performance evaluation of the proposed technique show that the implemented XML compressor efficiently compresses real-life XML data sets and achieves significant improvements in query performance on compressed XML data.
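The idea in the abstract can be illustrated with a minimal sketch. It is not the compressor described in the paper: the function names (`build_tag_dictionary`, `infer_type`, `compress`, `range_query`), the two-type inference (integer or string), and the fixed-width offset encoding are all assumptions made for illustration. The sketch dictionary-encodes tags, infers a type for the values under each leaf tag, and stores integers in an order-preserving byte form so that a range predicate can be checked on the encoded bytes without decoding them.

```python
import xml.etree.ElementTree as ET


def build_tag_dictionary(root):
    """Dictionary encoding: map each distinct tag to a small integer code."""
    codes = {}
    for elem in root.iter():
        codes.setdefault(elem.tag, len(codes))
    return codes


def infer_type(values):
    """Return 'int' if every value parses as an integer, otherwise 'str'."""
    try:
        for v in values:
            int(v)
        return "int"
    except ValueError:
        return "str"


def encode_int(value, lo, width):
    """Order-preserving code: offset from the minimum, fixed-width big-endian."""
    return (value - lo).to_bytes(width, "big")


def compress(xml_text):
    """Return (tag dictionary, encoded value columns keyed by tag code)."""
    root = ET.fromstring(xml_text)
    tag_codes = build_tag_dictionary(root)

    # Group the text of leaf elements by tag.
    by_tag = {}
    for elem in root.iter():
        if len(elem) == 0 and elem.text and elem.text.strip():
            by_tag.setdefault(elem.tag, []).append(elem.text.strip())

    columns = {}
    for tag, values in by_tag.items():
        if infer_type(values) == "int":
            nums = [int(v) for v in values]
            lo, hi = min(nums), max(nums)
            width = max(1, ((hi - lo).bit_length() + 7) // 8)
            encoded = [encode_int(n, lo, width) for n in nums]
            columns[tag_codes[tag]] = ("int", lo, width, encoded)
        else:
            encoded = [v.encode("utf-8") for v in values]
            columns[tag_codes[tag]] = ("str", None, None, encoded)
    return tag_codes, columns


def range_query(columns, tag_code, low, high):
    """Positions whose value lies in [low, high], compared on encoded bytes."""
    kind, lo, width, encoded = columns[tag_code]
    if kind != "int":
        raise TypeError("range queries in this sketch need an integer column")
    max_val = lo + (1 << (8 * width)) - 1
    if high < lo or low > max_val:
        return []
    # Encode the (clamped) bounds once; values are never decoded.
    lo_key = encode_int(max(low, lo), lo, width)
    hi_key = encode_int(min(high, max_val), lo, width)
    return [i for i, e in enumerate(encoded) if lo_key <= e <= hi_key]


doc = (
    "<players>"
    "<player><name>Ann</name><hits>120</hits></player>"
    "<player><name>Bob</name><hits>95</hits></player>"
    "<player><name>Cy</name><hits>300</hits></player>"
    "</players>"
)
tag_codes, columns = compress(doc)
matches = range_query(columns, tag_codes["hits"], 100, 200)
```

Because equal-length big-endian byte strings compare in the same order as the integers they encode, the predicate `100 <= hits <= 200` is evaluated by encoding the two bounds and comparing bytes. A type-blind encoding of the same values would not preserve numeric order in general, which is why it would force partial decompression for such queries.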
Keywords
XML; XML compression; Semi-Adaptive Compression; Homomorphism