• 제목/요약/키워드: Parallel Encoding

검색결과 101건 처리시간 0.022초

7.7 Gbps Encoder Design for IEEE 802.11ac QC-LDPC Codes

  • Jung, Yong-Min;Chung, Chul-Ho;Jung, Yun-Ho;Kim, Jae-Seok
    • JSTS:Journal of Semiconductor Technology and Science
    • /
    • 제14권4호
    • /
    • pp.419-426
    • /
    • 2014
  • This paper proposes a high-throughput encoding process and encoder architecture for quasi-cyclic low-density parity-check codes in IEEE 802.11ac standard. In order to achieve the high throughput with low complexity, a partially parallel processing based encoding process and encoder architecture are proposed. Forward and backward accumulations are performed in one clock cycle to increase the encoding throughput. A low complexity cyclic shifter is also proposed to minimize the hardware overhead of combinational logic in the encoder architecture. In IEEE 802.11ac systems, the proposed encoder is rate compatible to support various code rates and codeword block lengths. The proposed encoder is implemented with 130-nm CMOS technology. For (1944, 1620) irregular code, 7.7 Gbps throughput is achieved at 100 MHz clock frequency. The gate count of the proposed encoder core is about 96 K.

GPU 컴퓨팅에 의한 고속 Double Random Phase Encoding (Fast Double Random Phase Encoding by Using Graphics Processing Unit)

  • 사이플라흐;문인규
    • 한국멀티미디어학회:학술대회논문집
    • /
    • 한국멀티미디어학회 2012년도 춘계학술발표대회논문집
    • /
    • pp.343-344
    • /
    • 2012
  • With the increase of sensitive data and their secure transmission and storage, the use of encryption techniques has become widespread. The performance of encoding majorly depends on the computational time, so a system with less computational time suits more appropriate as compared to its contrary part. Double Random Phase Encoding (DRPE) is an algorithm with many sub functions which consumes more time when executed serially; the computation time can be significantly reduced by implementing important functions in a parallel fashion on Graphics Processing Unit (GPU). Computing convolution using Fast Fourier transform in DRPE is the most important part of the algorithm and it is shown in the paper that by performing this portion in GPU reduced the execution time of the process by substantial amount and can be compared with MATALB for performance analysis. NVIDIA graphic card GeForce 310 is used with CUDA C as a programming language.

  • PDF

부분곱의 재정렬과 4:2 변환기법을 이용한 VLSI 고속 병렬 곱셈기의 새로운 구현 방법 (A new scheme for VLSI implementation of fast parallel multiplier using 2x2 submultipliers and ture 4:2 compressors with no carry propagation)

  • 이상구;전영숙
    • 전자공학회논문지C
    • /
    • 제34C권10호
    • /
    • pp.27-35
    • /
    • 1997
  • In this paper, we propose a new scheme for the generation of partial products for VLSI fast parallel multiplier. It adopts a new encoding method which halves the number of partial products using 2x2 submultipliers and rearrangement of primitive partial products. The true 4-input CSA can be achieved with appropriate rearrangement of primitive partial products out of 2x2 submultipliers using the newly proposed theorem on binary number system. A 16bit x 16bit multiplier has been desinged using the proposed method and simulated to prove that the method has comparable speed and area compared to booth's encoding method. Much smaller and faster multiplier could be obtained with far optimization. The proposed scheme can be easily extended to multipliers with inputs of higher resolutions.

  • PDF

JPEG 인코더를 위한 고성능 병렬 프로세서 하드웨어 설계 및 검증 (Design and Verification of High-Performance Parallel Processor Hardware for JPEG Encoder)

  • 김용민;김종면
    • 대한임베디드공학회논문지
    • /
    • 제6권2호
    • /
    • pp.100-107
    • /
    • 2011
  • As the use of mobile multimedia devices is increasing in the recent year, the needs for high-performance multimedia processors are increasing. In this regard, we propose a SIMD (Single Instruction Multiple Data) based parallel processor that supports high-performance multimedia applications with low energy consumption. The proposed parallel processor consists of 16 processing elements(PEs) and operates on a 3-stage pipelining. Experimental results for the JPEG encoding algorithm indicate that the proposed parallel processor outperforms conventional parallel processors in terms of performance and energy efficiency. In addition, the proposed parallel processor architecture was developed and verified with verilog HDL and a FPGA prototype system.

부분병렬 알고리즘 기반의 LDPC 부호 구현 방안 (Design Methodology of LDPC Codes based on Partial Parallel Algorithm)

  • 정지원
    • 한국정보전자통신기술학회논문지
    • /
    • 제4권4호
    • /
    • pp.278-285
    • /
    • 2011
  • 본 논문에서는 DVB-S2 표준안에서 권고되고 있는 irregular LDPC 부호의 다양한 부호화율에서 부호화 방식 및 복호화 방식에 대해 살펴보고 이에 대한 성능분석을 하였다. 또한 이의 구현에 있어서 효율적인 메모리 할당 및 이에 따른 구현 방법에 대해 연구하였다. LDPC 복호기를 구현하는 방안에는 직렬, 부분병렬, 완전병렬 방식이 있으며, 부분병렬방식이 하드웨어 복잡도와 복호속도를 절충하는 방안이다. 따라서 본 논문에서는 부분병렬 구조를 기반으로 하는 LDPC 복호기의 메모리 설계에서 효율적인 체크노드, 비트노드, LLR 메모리의 구조를 제안하고저 한다.

Multi-Sever based Distributed Coding based on HEVC/H.265 for Studio Quality Video Editing

  • Kim, Jongho;Lim, Sung-Chang;Jeong, Se-Yoon;Kim, Hui-Yong
    • Journal of Multimedia Information System
    • /
    • 제5권3호
    • /
    • pp.201-208
    • /
    • 2018
  • High Efficiency Video Coding range extensions (HEVC RExt) is a kind of extension model of HEVC. HEVC RExt was specially designed for dealing the high quality images. HEVC RExt is very essential for studio editing which handle the very high quality and various type of images. There are some problems to dealing these massive data in studio editing. One of the most important procedure is re-encoding and decoding procedure during the editing. Various codecs are widely used for studio data editing. But most of the codecs have common problems to dealing the massive data in studio editing. First, the re-encoding and decoding processes are frequently occurred during the studio data editing and it brings enormous time-consuming and video quality loss. This paper, we suggest new video coding structure for the efficient studio video editing. The coding structure which is called "ultra-low delay (ULD)". It has the very simple and low-delayed referencing structure. To simplify the referencing structure, we can minimize the number of the frames which need decoding and re-encoding process. It also prevents the quality degradation caused by the frequent re-encoding. Various fast coding algorithms are also proposed for efficient editing such as tool-level optimization, multi-serve based distributed coding and SIMD (Single instruction, multiple data) based parallel processing. It can reduce the enormous computational complexity during the editing procedure. The proposed method shows 9500 times faster coding speed with negligible loss of quality. The proposed method also shows better coding gain compare to "intra only" structure. We can confirm that the proposed method can solve the existing problems of the studio video editing efficiently.

바디 제스처 인식을 위한 기초적 신체 모델 인코딩과 선택적 / 비동시적 입력을 갖는 병렬 상태 기계 (Primitive Body Model Encoding and Selective / Asynchronous Input-Parallel State Machine for Body Gesture Recognition)

  • 김주창;박정우;김우현;이원형;정명진
    • 로봇학회논문지
    • /
    • 제8권1호
    • /
    • pp.1-7
    • /
    • 2013
  • Body gesture Recognition has been one of the interested research field for Human-Robot Interaction(HRI). Most of the conventional body gesture recognition algorithms used Hidden Markov Model(HMM) for modeling gestures which have spatio-temporal variabilities. However, HMM-based algorithms have difficulties excluding meaningless gestures. Besides, it is necessary for conventional body gesture recognition algorithms to perform gesture segmentation first, then sends the extracted gesture to the HMM for gesture recognition. This separated system causes time delay between two continuing gestures to be recognized, and it makes the system inappropriate for continuous gesture recognition. To overcome these two limitations, this paper suggests primitive body model encoding, which performs spatio/temporal quantization of motions from human body model and encodes them into predefined primitive codes for each link of a body model, and Selective/Asynchronous Input-Parallel State machine(SAI-PSM) for multiple-simultaneous gesture recognition. The experimental results showed that the proposed gesture recognition system using primitive body model encoding and SAI-PSM can exclude meaningless gestures well from the continuous body model data, while performing multiple-simultaneous gesture recognition without losing recognition rates compared to the previous HMM-based work.

병렬계산의 스케쥴링에 있어서 유전자알고리즘에 관한 연구 (A study on the genetic algorithms for the scheduling of parallel computation)

  • 성기석;박지혁
    • 한국경영과학회:학술대회논문집
    • /
    • 한국경영과학회 1997년도 추계학술대회발표논문집; 홍익대학교, 서울; 1 Nov. 1997
    • /
    • pp.166-169
    • /
    • 1997
  • For parallel processing, the compiler partitions a loaded program into a set of tasks and makes a schedule for the tasks that will minimize parallel processing time for the loaded program. Building an optimal schedule for a given set of partitioned tasks of a program has known to be NP-complete. In this paper we introduce a GA(Genetic Algorithm)-based scheduling method in which a chromosome consists of two parts of a string which decide the number and order of tasks on each processor. An additional computation is used for feasibility constraint in the chromosome. By granularity theory, a partitioned program is categorized into coarse-grain or fine-grain types. There exist good heuristic algorithms for coarse-grain type partitioning. We suggested another GA adaptive to the coarse-grain type partitioning. The infeasibility of chromosome is overcome by the encoding and operators. The number of processors are decided while the GA find the minimum parallel processing time.

  • PDF

Optimization of a Systolic Array BCH encoder with Tree-Type Structure

  • Lim, Duk-Gyu;Shakya, Sharad;Lee, Je-Hoon
    • International Journal of Contents
    • /
    • 제9권1호
    • /
    • pp.33-37
    • /
    • 2013
  • BCH code is one of the most widely used error correcting code for the detection and correction of random errors in the modern digital communication systems. The conventional BCH encoder that is operated in bit-serial manner cannot adequate with the recent high speed appliances. Therefore, parallel encoding algorithms are always a necessity. In this paper, we introduced a new systolic array type BCH parallel encoder. To study the area and speed, several parallel factors of the systolic array encoder is compared. Furthermore, to prove the efficiency of the proposed algorithm using tree-type structure, the throughput and the area overhead was compared with its counterparts also. The proposed BCH encoder has a great flexibility in parallelization and the speed was increased by 40% than the original one. The results were implemented on synthesis and simulation on FPGA using VHDL.

Integer-Pel Motion Estimation for HEVC on Compute Unified Device Architecture (CUDA)

  • Lee, Dongkyu;Sim, Donggyu;Oh, Seoung-Jun
    • IEIE Transactions on Smart Processing and Computing
    • /
    • 제3권6호
    • /
    • pp.397-403
    • /
    • 2014
  • A new video compression standard called High Efficiency Video Coding (HEVC) has recently been released onto the market. HEVC provides higher coding performance compared to previous standards, but at the cost of a significant increase in encoding complexity, particularly in motion estimation (ME). At the same time, the computing capabilities of Graphics Processing Units (GPUs) have become more powerful. This paper proposes a parallel integer-pel ME (IME) algorithm for HEVC on GPU using the Compute Unified Device Architecture (CUDA). In the proposed IME, concurrent parallel reduction (CPR) is introduced. CPR performs several parallel reduction (PR) operations concurrently to solve two problems in conventional PR; low thread utilization and high thread synchronization latency. The proposed encoder reduces the portion of IME in the encoder to almost zero with a 2.3% increase in bitrate. In terms of IME, the proposed IME is up to 172.6 times faster than the IME in the HEVC reference model.