• Title/Summary/Keyword: 병렬 압축 알고리즘 (parallel compression algorithm)

Search Result 54, Processing Time 0.024 seconds

Implementation of IQ/IDCT in H.264/AVC Decoder Using GPGPU (GPGPU를 이용한 H.264/AVC 디코더)

  • Kim, Dong-Han;Lee, Kwang-Yeob
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2010.05a / pp.162-164 / 2010
  • H.264/AVC (Advanced Video Coding) is a video compression standard that provides good video quality at substantially lower bit rates than previous standards. In this paper, we propose an efficient architecture for an H.264/AVC decoder using a GPGPU. A GPGPU can process many operations in parallel, and within the H.264/AVC decoding algorithm, IQ/IDCT (inverse quantization and inverse discrete cosine transform) is amenable to parallel processing.

  • PDF
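The abstract above does not describe the kernel itself, so the following is only a minimal sketch of why the inverse transform parallelizes well: each 4x4 block of coefficients is transformed independently of every other block. The butterfly is the 4x4 inverse integer transform as commonly stated for H.264/AVC (inverse quantization is omitted). The function names and the use of a thread pool are assumptions for illustration; a GPGPU implementation would typically assign one thread or thread group per block instead.

```python
from concurrent.futures import ThreadPoolExecutor


def _butterfly(d0, d1, d2, d3):
    """One 1-D pass of the 4-point inverse integer transform (adds and shifts only)."""
    e0 = d0 + d2
    e1 = d0 - d2
    e2 = (d1 >> 1) - d3
    e3 = d1 + (d3 >> 1)
    return [e0 + e3, e1 + e2, e1 - e2, e0 - e3]


def inverse_transform_4x4(block):
    """Apply the row pass, then the column pass, then round with (x + 32) >> 6."""
    rows = [_butterfly(*row) for row in block]
    # cols[c][r] holds the result for column c, row r.
    cols = [_butterfly(*(rows[r][c] for r in range(4))) for c in range(4)]
    return [[(cols[c][r] + 32) >> 6 for c in range(4)] for r in range(4)]


def decode_blocks(blocks, workers=4):
    """Hypothetical helper: blocks share no data, so they can be mapped in any order.

    Python threads only illustrate the independence here; they are not a
    substitute for the GPU threads used in the paper.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(inverse_transform_4x4, blocks))


if __name__ == "__main__":
    dc_only = [[64, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
    print(decode_blocks([dc_only, dc_only])[0])
```

A block containing only a DC coefficient of 64 reconstructs to a flat block of ones, which is a quick sanity check on the rounding step.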

A Parallel Pipeline Execution Algorithm for H.264/AVC Intra Prediction (H.264/AVC의 인트라 예측 병렬 파이프라인 실행 알고리즘)

  • Xu, Jia-Yue;Cho, Hyo-Moon;Cho, Sang-Bock
    • Journal of the Institute of Electronics Engineers of Korea SP / v.45 no.5 / pp.79-86 / 2008
  • H.264/AVC is the newest international video coding standard, developed jointly by the ITU-T and ISO/IEC standards organizations. It offers much higher coding efficiency than H.261, H.263, and MPEG-4, but it suffers from high computational complexity and wasteful use of hardware resources. This paper describes a two-unit parallel pipeline structure. Compared with the standard model, the new structure reduces computational complexity by 67% and hardware resource waste by 3%.

Parallel Algorithms for Finding δ-approximate Periods and γ-approximate Periods of Strings over Integer Alphabets (정수문자열의 δ-근사주기와 γ-근사주기를 찾는 병렬알고리즘)

  • Kim, Youngho;Sim, Jeong Seop
    • Journal of KIISE / v.44 no.8 / pp.760-766 / 2017
  • Repetitive strings have been studied in diverse fields such as data compression and bioinformatics. Recently, two problems concerning approximate periods of strings over integer alphabets were introduced: finding minimum $\delta$-approximate periods and finding minimum $\gamma$-approximate periods. Both problems can be solved in $O(n^2)$ time, where $n$ is the length of the string. In this paper, we present two parallel algorithms, one for each problem, each running in $O(n)$ time using $O(n^2)$ threads. Experimental results show that our parallel algorithms for finding minimum $\delta$-approximate (resp. $\gamma$-approximate) periods run approximately 19.7 (resp. 40.08) times faster than the sequential algorithms when $n = 10{,}000$.

A Co-design Method for JPEG2000 Video Compression System in Telemetry using DSP and FPGA (DSP와 FPGA의 Co-design을 이용한 원격측정용 임베디드 JPEG2000 시스템구현)

  • Yu, Jae-Taeg;Hyun, Myung-Han;Nam, Ju-Hun
    • Journal of the Korean Society for Aeronautical & Space Sciences / v.39 no.9 / pp.896-903 / 2011
  • In this paper, we present a co-design method for a JPEG2000 video compression system using a DSP and an FPGA. Profiling the JPEG2000 algorithm shows that the MQ-coder is its most computationally complex part. We therefore implement the MQ-coder on the FPGA in VHDL so that it can be processed in parallel. The JBIG2 standard test vectors and images are used to verify the performance of the MQ-coder. Experimental results show that the proposed MQ-coder is approximately 3 times faster than the previous software MQ-coder.

A Massively Parallel Algorithm for Fuzzy Vector Quantization (퍼지 벡터 양자화를 위한 대규모 병렬 알고리즘)

  • Huynh, Luong Van;Kim, Cheol-Hong;Kim, Jong-Myon
    • The KIPS Transactions:PartA / v.16A no.6 / pp.411-418 / 2009
  • Vector quantization algorithms based on fuzzy clustering have been widely used in data compression, because applying fuzzy clustering analysis in the early stages of vector quantization makes the process less sensitive to its initialization. However, fuzzy clustering is computationally very intensive because of its complex framework for quantitatively formulating the uncertainty in the training vector space. To overcome this computational burden, this paper introduces an array architecture for implementing fuzzy vector quantization (FVQ). The array architecture, which consists of 4,096 processing elements (PEs), provides a computationally efficient solution by employing an effective vector assignment strategy during the clustering process. Experimental results indicate that the proposed parallel implementation provides significantly greater performance and efficiency than appropriately scaled alternative array systems. In addition, it provides 1000x greater performance and 100x higher energy efficiency than other implementations using today's ARM and TI DSP processors in the same 130nm technology. These results demonstrate the proposed implementation's potential for improved performance and energy efficiency.

Zero-tree packetization without additional memory using DFS (DFS를 이용한 추가 메모리를 요구하지 않는 제로트리 압축기법)

  • Kim, Chung-Kil;Lee, Joo-Kyong;Chung, Ki-Dong
    • The KIPS Transactions:PartB / v.10B no.5 / pp.575-578 / 2003
  • The SPIHT algorithm is a fast and effective wavelet-based technique for image compression. It uses a list structure to store the status information generated during set partitioning of the zero-tree, which usually requires a large amount of additional memory that grows with the bit rate. In this paper, we therefore propose a new technique called MZP-DFS, which needs no additional memory when running the SPIHT algorithm. It traverses each spatial tree in DFS order and eliminates the additional memory by using test functions for encoding and the LSBs of coefficients for decoding. The method yields nearly the same performance as SPIHT, and because it requires no additional memory it may be desirable for hardware implementation. Moreover, because each spatial tree can be processed in parallel, the method is well suited to real-time image compression.

Analysis of Turbomachinery Internal Flow Using Parallel Computing (병렬컴퓨팅을 이용한 터보기계 내부 유동장 해석)

  • Yee, Jang-Jun;Kim, Yu-Shin;Lee, Dong-Ho
    • Proceedings of the KSME Conference / 2000.04b / pp.586-592 / 2000
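  • We parallelized a code that numerically simulates the stator-rotor interaction flow inside turbomachinery. A multi-block grid system was adopted to build the computational domain in a form convenient for analyzing the stator-rotor interaction, and the sliding interface created by the rotor's motion was handled with a patched algorithm. Results from a simulation that simplifies the stator-to-rotor count to 1:1 were compared with results from a simulation using a 3:4 ratio, which is closer to the actual configuration. The computation time saved through parallel computing was also compared with computation times reported in other studies. The two-dimensional unsteady compressible Navier-Stokes equations were used, with the $k$-$\omega$ SST model for turbulence modeling. Fluxes were computed with Roe's FDS scheme, and the MUSCL scheme was applied to obtain third-order spatial accuracy. Time integration used the DP-SGS scheme of Lee Bo-Sung. The results were analyzed using time-averaged pressure distribution and pressure amplitude distribution data.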

  • PDF

Memory-Efficient High Performance Parallelization of Aho-Corasick Algorithm on Intel Xeon Phi (Intel Xeon Phi 에서의 Aho-Corasick 알고리즘을 위한 메모리 친화적인 고성능 병렬화)

  • Tran, Nhat-Phuong;Jeong, Yosang;Lee, Myungho
    • Proceedings of the Korea Information Processing Society Conference / 2014.04a / pp.87-89 / 2014
  • The Aho-Corasick (AC) algorithm is a multi-pattern string matching algorithm commonly used in applications with real-time performance requirements. In this paper, we parallelize the AC algorithm on Intel's Many Integrated Core (MIC) architecture, the Xeon Phi coprocessor. We propose a new technique for compressing the Deterministic Finite Automaton (DFA) structure that represents the set of pattern strings against which the input data is inspected for possible matches. The new technique reduces cache misses and leads to significantly improved performance on the Xeon Phi.
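For readers unfamiliar with the automaton being compressed, here is a minimal sketch of the standard sequential Aho-Corasick construction and search. It is not the paper's compressed DFA or its Xeon Phi parallelization, neither of which the abstract specifies; the function names and the dictionary-based transition tables are assumptions chosen for readability.

```python
from collections import deque


def build_automaton(patterns):
    """Build goto, failure, and output tables for a set of pattern strings."""
    goto = [{}]   # goto[state][char] -> next state (trie edges)
    fail = [0]    # failure link per state
    out = [[]]    # patterns recognized on entering each state
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pat)

    # Breadth-first pass: depth-1 states keep failure link 0.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit matches that end at the failure target.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def search(text, automaton):
    """Return (start_index, pattern) for every occurrence in text."""
    goto, fail, out = automaton
    state = 0
    hits = []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            hits.append((i - len(pat) + 1, pat))
    return hits


if __name__ == "__main__":
    automaton = build_automaton(["he", "she", "his", "hers"])
    print(search("ushers", automaton))
```

The transition tables are what dominate memory traffic during matching, which is presumably why a more compact DFA representation helps cache behavior on a many-core device.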

Performance Evaluation and Verification of MMX-type Instructions on an Embedded Parallel Processor (임베디드 병렬 프로세서 상에서 MMX타입 명령어의 성능평가 및 검증)

  • Jung, Yong-Bum;Kim, Yong-Min;Kim, Cheol-Hong;Kim, Jong-Myon
    • Journal of the Korea Society of Computer and Information / v.16 no.10 / pp.11-21 / 2011
  • This paper introduces a SIMD (Single Instruction Multiple Data) based parallel processor that efficiently processes the massive data inherent in multimedia. It also implements MMX (MultiMedia eXtension)-type instructions on this data-parallel processor and evaluates and analyzes their performance. The reference data-parallel processor consists of 16 processors, each with a 32-bit datapath. Experimental results for a JPEG compression application with a 1280x1024 pixel image indicate that MMX-type instructions achieve a 50% performance improvement over the baseline instructions on the same data-parallel architecture. In addition, MMX-type instructions achieve 100% and 51% improvements over the baseline instructions in energy efficiency and area efficiency, respectively. These results demonstrate that multimedia-specific instructions, including MMX-type instructions, have potential for widely used many-core GPUs (Graphics Processing Units) and other types of parallel processors.

The Design of DWT Processor for RealTime Image Compression (실시간 영상압축을 위한 DWT 프로세서 설계)

  • Gu, Dae Seong;Kim, Jong Bin
    • The Journal of Korean Institute of Communications and Information Sciences / v.29 no.5C / pp.654-654 / 2004
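  • In this paper, we implement in hardware an image compression processor based on the discrete wavelet transform (DWT). The wavelet transform uses a filter bank and the pyramid algorithm, and each filter is implemented as an FIR filter. The parallel architecture performs high-pass and low-pass filtering in the same clock cycle, which improves speed. In addition, exploiting the QMF property halves the number of multipliers required for the DWT computation, which reduces hardware size and improves utilization. The memory controller needed for multiresolution decomposition is implemented in hardware and drives the DWT computation, so the architecture lets the user obtain an effective compression ratio simply by entering parameters. To predict the performance of the real-time image compression processor, we simulated it in MATLAB and designed each module in VHDL. The designed image compressor was synthesized with Leonardo Spectrum and ported to an ALTERA FLEX10KE (EPF10K100 EFC256) FPGA to verify its operation in hardware. The designed encoder achieves a PSNR of 33 dB on the 512×512 Woman image, and the processor operates correctly at 35 MHz when implemented on the FPGA.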
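The multiplier saving described above can be illustrated in software. The sketch below assumes one common QMF convention, in which the high-pass taps are the low-pass taps with alternating signs, so each product of a tap and a sample is computed once and reused for both outputs. The abstract does not give the actual filter coefficients or boundary handling; the periodic extension, the averaging filter in the example, and all function names are assumptions.

```python
def dwt_level(signal, lowpass):
    """One analysis level: low-pass and high-pass outputs from shared products.

    Assumes high-pass tap k equals (-1)**k times low-pass tap k, and treats
    the signal as periodic at the boundary.
    """
    n = len(signal)
    approx, detail = [], []
    for start in range(0, n, 2):          # downsample by 2
        low = 0.0
        high = 0.0
        for k, tap in enumerate(lowpass):
            product = tap * signal[(start + k) % n]   # single multiply
            low += product
            high += product if k % 2 == 0 else -product
        approx.append(low)
        detail.append(high)
    return approx, detail


def pyramid(signal, lowpass, levels):
    """Pyramid algorithm: re-apply the filter bank to the approximation band."""
    details = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = dwt_level(approx, lowpass)
        details.append(detail)
    return approx, details


if __name__ == "__main__":
    # A two-tap averaging filter keeps the arithmetic exact for the example.
    print(pyramid([4, 4, 2, 0], [0.5, 0.5], levels=2))
```

In hardware, the two accumulations in the inner loop correspond to the low-pass and high-pass paths running in the same clock cycle while sharing one set of multipliers.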