• Title/Summary/Keyword: XML compression


A Queriable XML Compression using Inferred Data Types (추론한 데이타 타입을 이용한 질의 가능 XML 압축)

  • ;;Chung Chin-Wan
    • Journal of KIISE:Databases / v.32 no.4 / pp.441-451 / 2005
  • HTML is mostly stored in native file systems rather than in specialized repositories such as databases. Like HTML, XML, the standard for exchanging and representing data on the Internet, mostly resides in native file systems. However, since XML data is irregular and verbose, it wastes disk space and network bandwidth compared with regularly structured data. To overcome this inefficiency, research on XML compression has been conducted. Among recently proposed XML compression techniques, some do not support querying compressed data, while others that do support it blindly encode data values with predefined encoding methods without considering the types of the values, which necessitates partial decompression to process range queries. As a result, query performance on compressed XML data is degraded. This research therefore proposes an XML compression technique that supports direct and efficient evaluation of queries on compressed XML data. The technique uses dictionary encoding to encode each tag of the XML data and applies an encoding method suited to the inferred type of each data value. An implementation and performance evaluation show that the XML compressor efficiently compresses real-life XML data sets and achieves significant improvements in query performance on compressed XML data.
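
The idea in the abstract above (dictionary-encoded tags plus type-aware value encoding, so that range queries run on the encoded bytes) can be illustrated with a minimal Python sketch. This is not the paper's implementation: the function names, the two-type inference (`int` vs. `str`), and the 4-byte big-endian integer encoding are all assumptions chosen for illustration.

```python
import struct
import xml.etree.ElementTree as ET


def is_uint32(text):
    """True if text is a plain ASCII decimal that fits in 32 unsigned bits."""
    return text.isascii() and text.isdigit() and int(text) < 2**32


def build_tag_dictionary(root):
    """Dictionary encoding: map each distinct tag name to a small integer."""
    codes = {}
    for elem in root.iter():
        codes.setdefault(elem.tag, len(codes))
    return codes


def compress(xml_text):
    """Return (tag dictionary, inferred type per tag, encoded records)."""
    root = ET.fromstring(xml_text)
    tag_codes = build_tag_dictionary(root)

    # Collect the text values seen under each tag.
    values_by_tag = {}
    for elem in root.iter():
        text = (elem.text or "").strip()
        if text:
            values_by_tag.setdefault(elem.tag, []).append(text)

    # Hypothetical type inference: 'int' only if every value is a uint32.
    types = {
        tag: "int" if all(is_uint32(v) for v in vals) else "str"
        for tag, vals in values_by_tag.items()
    }

    records = []
    for elem in root.iter():
        text = (elem.text or "").strip()
        if not text:
            continue
        if types[elem.tag] == "int":
            # Fixed-width big-endian bytes sort in the same order as the numbers.
            encoded = struct.pack(">I", int(text))
        else:
            encoded = text.encode("utf-8")
        records.append((tag_codes[elem.tag], encoded))
    return tag_codes, types, records


def range_query(records, tag_code, low, high):
    """Range query on an 'int' tag, comparing encoded bytes directly."""
    lo, hi = struct.pack(">I", low), struct.pack(">I", high)
    return [val for code, val in records if code == tag_code and lo <= val <= hi]
```

Because the integer encoding is order-preserving, `range_query` never decodes a value; a type-blind encoding would have to decompress each candidate before comparing it.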

Design and Implementation of a XML Compression Algorithm Supporting Query Processing for Compressed Documents (압축된 문서에 대한 질의 처리를 지원하는 XML 압축 알고리즘의 설계 및 구현)

  • 이석재;강영준;유재수;조기형
    • The Journal of the Korea Contents Association / v.4 no.1 / pp.90-99 / 2004
  • With the spread of the Internet, digitalization and knowledge informatization are progressing rapidly. In particular, numerous users create various works and use services on the web, and most of these works make use of XML. XML makes documents easy to reuse because it separates content from style. It also lets developers redefine the logical structure of a document to suit their requirements. However, an XML document is much larger than an ordinary text document because it handles the document type and adds numerous tags to represent the document's structure. To make use of the limited storage of palmtops, PDAs, and similar devices, documents must be compressed and handled efficiently. Compression techniques for efficiently handling XML documents have recently been studied to solve this problem, but most existing work does not support query processing on compressed XML documents. In this paper, we design and implement an XML compression algorithm that compresses XML documents and processes queries on the compressed documents faster and more efficiently than previous techniques.


The XML Compression Algorithm Supporting Query Processing For Compressed Documents (압축된 문서에 대해 질의 처리를 지원하는 XML 압축 알고리즘)

  • 강영준;이석재;유재수
    • Proceedings of the Korea Contents Association Conference / 2003.11a / pp.195-203 / 2003
  • With the spread of the Internet, digitalization and knowledge-based informatization are in progress. In particular, numerous users create various works and use services on the web, and most of these works make use of XML. XML makes documents easy to reuse because it separates content from style. It also lets developers redefine the logical structure of a document to suit their requirements. However, an XML document is much larger than an ordinary text document because it handles the document type and adds numerous tags to represent the document's structure. To make use of the limited storage of palmtops, PDAs, and similar devices, documents must be compressed and handled efficiently. Compression techniques for efficiently handling XML documents have recently been studied to solve this problem, but existing research does not support query processing on the compressed documents. In this paper, we design and implement an XML compression algorithm that compresses XML documents and processes queries on the compressed documents faster and more efficiently than previous techniques.


A Queriable XML Compression Through An Extraction of Type Information (타입 정보 추출을 통한 질의 가능 XML 압축)

  • 박명제;민준기;정진완
    • Proceedings of the Korean Information Science Society Conference / 2003.04a / pp.554-556 / 2003
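  • HTML, widely used on the Internet, is currently stored mostly in conventional file systems rather than in repositories such as database systems. Likewise, XML, which has recently emerged as the standard for data exchange and representation on the Internet, is also often stored in file systems. However, because of the irregular structure and verbosity of XML documents, disk space and network bandwidth are used inefficiently compared with regularly structured data. To address this, research on compressing XML documents has been conducted. Recently studied XML compression techniques, however, either do not support queries on compressed XML documents or, even when they do, compress the documents simply with existing compression methods without considering the characteristics of the data values. This study therefore proposes an XML compression technique that efficiently supports queries on compressed XML documents. Tags are compressed with dictionary compression, the type of the data values is extracted for each tag, and the data values are compressed with a method appropriate to the extracted type. Through an implementation and performance evaluation of the proposed technique, we show that the implemented system efficiently compresses real-world XML documents and provides improved query performance.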


Implementation of Encoder and Decoder for MPEG-7 BiM (MPEG-7 BiM 부호화기 및 복호화기의 구현)

  • Yeom, Ji-Hyeon;Kim, Min-Je;Lee, Han-Kyu;Kim, Hyeok-Man
    • Journal of Broadcast Engineering / v.12 no.2 / pp.159-176 / 2007
  • In this paper, we implement a software system that encodes XML instance documents conforming to a schema document using the MPEG-7 BiM compression method and decodes the encoded documents back into XML. We designed the software structures of the BiM encoder and decoder as class hierarchies and then implemented them. The implemented BiM encoder achieves an average compression ratio of 9.44%. The BiM encoder is a general-purpose XML compressor that can encode any instance document conforming to a schema document described in the XML Schema language, including the MPEG-7 schema. It can therefore be used in many application fields where XML instance documents need to be encoded, including digital broadcasting environments.

Compression/Decompression of XML Instance Documents Conforming to a Schema (스키마를 이용한 XML 문서의 압축과 복원)

  • Yum, Ji-Hyun;Kim, Hyeok-Man
    • Proceedings of the Korean Information Science Society Conference / 2006.10d / pp.157-160 / 2006
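  • This paper concerns the implementation of a system that, in accordance with the MPEG-7 BiM specification, compresses XML documents into binary form and restores them based on XML schema definitions. We describe the detailed modules and functions of the MPEG-7 BiM compressor and decompressor and propose methods for their design and implementation. The implemented MPEG-7 BiM compressor and decompressor can be used as core modules for metadata transmission in the broadcasting field, where bandwidth is severely constrained.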


Keyword Analysis Based Document Compression System

  • Cao, Kerang;Lee, Jongwon;Jung, Hoekyung
    • Journal of information and communication convergence engineering / v.16 no.1 / pp.48-51 / 2018
  • Traditional document analysis was word-centered and implemented using a morpheme analyzer. Such systems can classify the words used in a document but cannot help users understand or analyze it. To solve this problem, a system needs to extract the most valuable paragraphs, which can help users understand the document. In this paper, we propose a system that extracts paragraphs from a normalized XML document. The user enters the filename of the XML document to be analyzed. The system then searches the document for keywords and shows the keywords it found. When the user selects and enters a desired keyword, the system extracts the paragraphs that include it. After extraction, the system preserves the paragraph order and checks for duplicates; if duplicates exist, it deletes the duplicated paragraphs. Finally, the system reports to the user the frequency and weight of each keyword along with the sorted paragraphs.

A Study on Multiple Sensorial Media Application Format (다중 감각 미디어 응용 포맷의 구성 방법 연구)

  • Jung, Yup Oh;Kim, Sang-Kyun
    • Journal of Broadcast Engineering / v.21 no.3 / pp.330-340 / 2016
  • This paper explains the structure of the multiple sensorial media application format (ISO/IEC 23000-17), which was newly standardized as an MPEG-A project. This format facilitates effective storage, playback, and management of media with multiple sensorial effects. The ISO base media file format from MPEG-4 Part 12 and the sensory effect metadata (SEM) from MPEG-V Part 3 are used to compose the multiple sensorial media application format. In this paper, a fragmentation method that breaks a SEM XML document into valid SEM samples is presented. Several binarization methods for compressing the SEM samples are also compared and evaluated. The compression ratio and processing time of the MPEG-V binary representation and the Binary MPEG format for XML (BiM) are superior to those of gzip compression.

A Space Compression of Three-Dimensional Bitmap Indexing using Linked List (연결 리스트를 이용한 3차원 비트맵 인덱싱의 공간 축약)

  • Lee, Jae-Min;Hwang, Byung-Yeon
    • Proceedings of the Korea Information Processing Society Conference / 2003.05c / pp.1519-1522 / 2003
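  • As a way to overcome the expressive limitations of existing web documents and content, various studies on metadata have been conducted, and XML is the most representative outcome. Because XML can describe not only the content of a document but also its structure, it is expected to play a key role in future information exchange, and accordingly various studies on efficiently storing and retrieving XML documents are in progress. BitCube is an indexing technique that uses three-dimensional bitmap indexing, which allows bit-wise operations, to cluster XML documents by structural similarity and to process user queries, and its fast performance has been demonstrated. However, because BitCube's clustering focuses on the paths of XML documents, there is no correlation between the clusters and the actual words contained in the paths, so every plane of the three-dimensional bitmap index except one becomes a sparse matrix with very high space usage. In this paper, to prevent the performance degradation caused by the growing volume of documents and to maintain stable performance, we design a linked list that minimizes space without degrading the performance of existing operations, and we present a method for reconstructing the three-dimensional bitmap index as linked lists.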
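
To illustrate why a linked list can save space on a sparse bitmap plane while still supporting bit-wise style operations, here is a minimal Python sketch. It is not the design from the paper, whose node layout the abstract does not specify: the `Node` class and the functions `row_to_list`, `list_and`, and `to_cols` are hypothetical, and the sketch handles a single bitmap row rather than the full three-dimensional index.

```python
class Node:
    """One set bit in a sparse bitmap row, identified by its column index."""
    __slots__ = ("col", "next")

    def __init__(self, col, next=None):
        self.col = col
        self.next = next


def row_to_list(bits):
    """Convert a dense 0/1 row into a linked list of set-bit columns, ascending."""
    head = None
    for col in range(len(bits) - 1, -1, -1):
        if bits[col]:
            head = Node(col, head)
    return head


def list_and(a, b):
    """Equivalent of a bit-wise AND: keep columns present in both sorted lists."""
    dummy = tail = Node(-1)
    while a is not None and b is not None:
        if a.col == b.col:
            tail.next = Node(a.col)
            tail = tail.next
            a, b = a.next, b.next
        elif a.col < b.col:
            a = a.next
        else:
            b = b.next
    return dummy.next


def to_cols(head):
    """Collect the column indices stored in a linked list."""
    cols = []
    while head is not None:
        cols.append(head.col)
        head = head.next
    return cols
```

Storage in this sketch grows with the number of set bits rather than with the row width, and the AND runs in time proportional to the combined length of the two lists.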


An Efficient Scheme of Encapsulation Method to Avoid Fragmentation Degradation During TVA Metadata Delivery (TVA 메타데이터 전송과정에서 단편화에 의한 성능 감소를 회피하기 위한 효율적인 캡슐화 방식)

  • Oh, Bong-Jin;Park, Jong-Youl;Kim, Sang-Hyung;Yoo, Kwan-Jong
    • The Journal of Korean Institute of Communications and Information Sciences / v.37 no.7C / pp.627-636 / 2012
  • Recently, XML has been used to describe the details of services and content in various fields such as IPTV and digital broadcast services because of its high readability and extensibility. In particular, TV-Anytime's schema and delivery protocol have been adopted as basic standards for these services and extended to include their own private functions. However, XML describes documents in a text-based form, which produces larger documents than traditional methods. Many encoding algorithms, such as EXI, BiM, GZIP, and Fast Infoset, have therefore been proposed to reduce the size of XML documents. Although these algorithms compress XML documents efficiently, they cannot avoid fragmentation degradation during the encapsulation step. This paper proposes an efficient encapsulation scheme for TV-Anytime that uses common string tables to avoid the degradation of the encoding effect caused by fragmentation.