http://dx.doi.org/10.3745/KIPSTA.2002.9A.3.311

Hyper-Text Compression Method Based on LZW Dictionary Entry Management  

Sin, Gwang-Cheol (Dept. of Computer Engineering, Graduate School of Chungang University)
Han, Sang-Yong (Dept. of Computer Engineering, Chungang University)
Abstract
LZW is a popular variant of LZ78 for compressing text documents. It yields a high compression rate and is widely used in commercial programs. Its core idea is to assign dictionary entries to the character groups that are most likely to recur. When a character group that is already in the dictionary appears in the input stream, it is replaced by the index of its dictionary entry. In this paper, we propose a new, efficient method that uses counters to find the least-used entries in the dictionary. We also achieve a higher compression rate by preassigning dictionary entries to the tags most widely used in hyper-text documents. Experimental results show that the proposed method is more effective than V.42bis and the Unix compression method: it compresses 3 to 8% better on the standard Calgary Corpus and 23 to 24% better on HTML documents.
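The abstract does not give the details of the counter scheme or the list of preassigned tags, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a flat dictionary, a hypothetical tag list `HTML_TAGS`, integer codes without bit packing, and a simple policy of evicting the least-used learned entry when the dictionary is full. All names (`CountedDict`, `compress`, `decompress`) are illustrative.

```python
# Hypothetical tag list: the tags the paper actually preassigns are not given here.
HTML_TAGS = ("<html>", "</html>", "<head>", "</head>", "<body>", "</body>",
             "<p>", "</p>", '<a href="', "</a>", "<div>", "</div>")


class CountedDict:
    """Bounded LZW dictionary with one use counter per entry (illustrative)."""

    def __init__(self, max_size=4096, tags=HTML_TAGS):
        # Protected entries: every single byte, then the preassigned tags.
        self.strings = [bytes([i]) for i in range(256)]
        self.strings += [t.encode("ascii") for t in tags]
        self.protected = len(self.strings)
        # Always leave room for at least one learned entry.
        self.max_size = max(max_size, self.protected + 1)
        self.codes = {s: i for i, s in enumerate(self.strings)}
        self.counts = [0] * len(self.strings)
        self.max_len = max(len(s) for s in self.strings)

    def next_slot(self):
        """Code the next learned entry will get: a free slot or a victim."""
        if len(self.strings) < self.max_size:
            return len(self.strings)
        # Dictionary full: evict the least-used learned entry (ties: lowest code).
        return min(range(self.protected, self.max_size),
                   key=lambda i: (self.counts[i], i))

    def add(self, s):
        slot = self.next_slot()
        if slot == len(self.strings):
            self.strings.append(s)
            self.counts.append(0)
        else:
            del self.codes[self.strings[slot]]
            self.strings[slot] = s
            self.counts[slot] = 0
        self.codes[s] = slot
        self.max_len = max(self.max_len, len(s))

    def longest_match(self, data, pos):
        """Return (code, length) of the longest entry matching data at pos."""
        for n in range(min(self.max_len, len(data) - pos), 0, -1):
            code = self.codes.get(data[pos:pos + n])
            if code is not None:
                return code, n


def compress(data, max_size=4096, tags=HTML_TAGS):
    d, out, pos = CountedDict(max_size, tags), [], 0
    while pos < len(data):
        code, n = d.longest_match(data, pos)
        out.append(code)
        d.counts[code] += 1              # count each use of an entry
        pos += n
        if pos < len(data):
            d.add(data[pos - n:pos + 1])  # matched string plus the next byte
    return out


def decompress(codes, max_size=4096, tags=HTML_TAGS):
    d, out, prev = CountedDict(max_size, tags), [], None
    for k in codes:
        if prev is None:
            cur = d.strings[k]
        else:
            # The decoder lags one entry behind the encoder, so a code equal
            # to the slot about to be filled refers to the entry being added.
            cur = prev + prev[:1] if k == d.next_slot() else d.strings[k]
            d.add(prev + cur[:1])
        d.counts[k] += 1                 # mirror the encoder's counters
        out.append(cur)
        prev = cur
    return b"".join(out)
```

Under these assumptions, `compress(b"<html></html>")` returns `[256, 257]`, because both tags already have entries before any input is read. The encoder and decoder update the counters in the same order, which is what lets the decoder pick the same victim without any extra signalling. A practical version would also need to keep freshly added entries from being evicted immediately, which this sketch does not attempt.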
Keywords
LZW; Data Compression; Dictionary-based Compression; Hyper-text; Expanded Initial Dictionary;
References
1 Thomborson, C., 'The V.42bis Standard for Data-Compressing Modems,' IEEE Micro, pp.41-53, October, 1992
2 Welch, T., 'A Technique for High Performance Data Compression,' IEEE Computer, Vol.17, No.6, pp.8-19, 1984
3 Phillips, D., 'LZW Data Compression,' The Computer Application Journal, Circuit Cellar Inc., 27, pp.36-48, June/July
4 Horspool, R. N., 'Improving LZW,' in Proceedings of the 1991 Data Compression Conference, J. Storer Ed., Los Alamitos, CA, IEEE Computer Society Press, pp.332-341, 1991
5 Available at ftp.cpsc.ucalgary.ca/pub/projects/text.compression.corups
6 Salomon, D., 'Data Compression: The Complete Reference,' Springer, 1997
7 Baeza-Yates, R. and Ribeiro-Neto, B., 'Modern Information Retrieval,' Addison Wesley, 1999
8 Ziv, J. and Lempel, A., 'A Universal Algorithm for Sequential Data Compression,' IEEE Transactions on Information Theory, IT-23(3), pp.337-343, 1977
9 Ziv, J. and Lempel, A., 'Compression of Individual Sequences via Variable-Rate Coding,' IEEE Transactions on Information Theory, IT-24(5), pp.530-536, 1978