• Title/Summary/Keyword: Parallel Decoding

Search Result 152, Processing Time 0.024 seconds

Multi-mode Layered LDPC Decoder for IEEE 802.11n (IEEE 802.11n용 다중모드 layered LDPC 복호기)

  • Na, Young-Heon;Shin, Kyung-Wook
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.48 no.11
    • /
    • pp.18-26
    • /
    • 2011
  • This paper describes a multi-mode LDPC decoder which supports three block lengths(648, 1296, 1944) and four code rates(1/2, 2/3, 3/4, 5/6) of IEEE 802.11n wireless LAN standard. To minimize hardware complexity, it adopts a block-serial (partially parallel) architecture based on the layered decoding scheme. A novel memory reduction technique devised using the min-sum decoding algorithm reduces the size of check-node memory by 47% as compared to conventional method. From fixed-point modeling and Matlab simulations for various bit-widths, decoding performance and optimal hardware parameters such as fixed-point bit-width are analyzed. The designed LDPC decoder is verified by FPGA implementation, and synthesized with a 0.18-${\mu}m$ CMOS cell library. It has 219,100 gates and 45,036 bits RAM, and the estimated throughput is about 164~212 Mbps at 50 MHz@2.5v.

A Design of Multi-Standard LDPC Decoder for WiMAX/WLAN (WiMAX/WLAN용 다중표준 LDPC 복호기 설계)

  • Seo, Jin-Ho;Park, Hae-Won;Shin, Kyung-Wook
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.17 no.2
    • /
    • pp.363-371
    • /
    • 2013
  • This paper describes a multi-standard LDPC decoder which supports 19 block lengths(576~2304) and 6 code rates(1/2, 2/3A, 2/3B, 3/4A, 3/4B, 5/6) of IEEE 802.16e mobile WiMAX standard and 3 block lengths(648, 1296, 1944) and 4 code rates(1/2, 2/3, 3/4, 5/6) of IEEE 802.11n WLAN standard. To minimize hardware complexity, it adopts a block-serial (partially parallel) architecture based on the layered decoding scheme. A DFU(decoding function unit) based on sign-magnitude arithmetic is used for hardware reduction. The designed LDPC decoder is verified by FPGA implementation, and synthesized with a 0.13-${\mu}m$ CMOS cell library. It has 312,000 gates and 70,000 bits RAM. The estimated throughput is about 79~210 Mbps at 100 MHz@1.8v.

Performance analysis and hardware design of LDPC Decoder for WiMAX using INMS algorithm (INMS 복호 알고리듬을 적용한 WiMAX용 LDPC 복호기의 성능분석 및 하드웨어 설계)

  • Seo, Jin-Ho;Shin, Kyung-Wook
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2012.10a
    • /
    • pp.229-232
    • /
    • 2012
  • This paper describes performance evaluation using fixed-point Matlab modeling and simulation, and hardware design of LDPC decoder which is based on Improved Normalized Min-Sum(INMS) decoding algorithm. The designed LDPC decoder supports 19 block lengths(576~2304) and 6 code rates(1/2, 2/3A, 2/3B, 3/4A, 3/4B, 5/6) of IEEE 802.16e mobile WiMAX standard. Considering hardware complexity, it is designed using a block-serial(partially parallel) architecture which is based on layered decoding scheme. A DFU based on sign-magnitude arithmetic is adopted to minimize hardware area. Hardware design is optimized by using INMS decoding algorithm whose performance is better than min-sum algorithm.

  • PDF

Acceleration of ECC Computation for Robust Massive Data Reception under GPU-based Embedded Systems (GPU 기반 임베디드 시스템에서 대용량 데이터의 안정적 수신을 위한 ECC 연산의 가속화)

  • Kwon, Jisu;Park, Daejin
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.24 no.7
    • /
    • pp.956-962
    • /
    • 2020
  • Recently, as the size of data used in an embedded system increases, the need for an ECC decoding operation to robustly receive a massive data is emphasized. In this paper, we propose a method to accelerate the execution of computations that derive syndrome vectors when ECC decoding is performed using Hamming code in an embedded system with a built-in GPU. The proposed acceleration method uses the matrix-vector multiplication of the decoding operation using the CSR format, one of the data structures representing sparse matrix, and is performed in parallel in the CUDA kernel of the GPU. We evaluated the proposed method using a target embedded board with a GPU, and the result shows that the execution time is reduced when ECC decoding operation accelerated based on the GPU than used only CPU.

Implementation of Optical Paralle Adder using Polarization Coding (실시간 편광부호화에 의한 광병렬 가산기 구현)

  • 조웅호;배장근;노덕수;김수중
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.17 no.12
    • /
    • pp.1484-1493
    • /
    • 1992
  • In this paper, we propose the polarization coding of optical logic gates using filters and LCTV's, and represent the real-time system of an optical parallel adder to improve a carry propagation delay time. We fabricated a polarization filter for the polarization coding of a cell and an electrical system instead of an optical flip-flop which was necessary to an optical parallel adder. We used an optical fiber to play a part of decoding mask and interconnections in an optical parallel adder. The experimental results show that the polarization coding of a cell can represent 16 optical logic functions and that the implemented optical parallel adder can operate in real-time.

  • PDF

Radix-4 Trellis Parallel Architecture and Trace Back Viterbi Decoder with Backward State Transition Control (Radix-4 트렐리스 병렬구조 및 역방향 상태천이의 제어에 의한 역추적 비터비 디코더)

  • 정차근
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.40 no.5
    • /
    • pp.397-409
    • /
    • 2003
  • This paper describes an implementation of radix-4 trellis parallel architecture and backward state transition control trace back Viterbi decoder, and presents the application results to high speed wireless LAN. The radix-4 parallelized architecture Vietrbi decoder can not only improve the throughput with simple structure, but also have small processing delay time and overhead circuit compared to M-step trellis architecture one. Based on these features, this paper addresses a novel Viterbi decoder which is composed of branch metric computation, architecture of ACS and trace back decoding by sequential control of backward state transition for the implementation of radix-4 trellis parallelized structure. With the proposed architecture, the decoding of variable code rate due to puncturing the base code can easily be implemented by the unified Viterbi decoder. Moreover, any additional circuit and/or peripheral control logic are not required in the proposed decoder architecture. The trace back decoding scheme with backward state transition control can carry out the sequential decoding according to ACS cycle clock without additional circuit for survivor memory control. In order to evaluate the usefulness, the proposed method is applied to channel CODEC of the IEEE 802.11a high speed wireless LAN, and HDL coding simulation results are presented.

Implementation of filterbank for MPEG-2 AAC decoder with VHDL (VHDL을 이용한 MPEG-2 AAC 복호화기 필터뱅크의 구현)

  • 우광희;차형태
    • Proceedings of the IEEK Conference
    • /
    • 2000.06d
    • /
    • pp.178-181
    • /
    • 2000
  • In this paper, we present the implementation of filterbank for MPEG-2 Advanced Audio Coding (AAC) decoder with VHDL. The filterbank of AAC employs a technique called time-domain aliasing cancellation (TDAC). In order to make the algorithm more efficiently, we decompose and reorganize the filterbank algorithm lot the high speed decoding process and lower computational cost. And we make this filterbank algorithm to be used with other modules of AAC decoder in parallel processing.

  • PDF

Parallel Processing Algorithm of JPEG2000 Using GPU (GPU를 이용한 JPEG2000 병렬 알고리즘)

  • Lee, Dong-Ha;Cho, Shi-Won;Lee, Dong-Wook
    • The Transactions of The Korean Institute of Electrical Engineers
    • /
    • v.57 no.6
    • /
    • pp.1075-1080
    • /
    • 2008
  • Most modem computers or game consoles are well equipped with powerful graphics processing units(GPUs) to accelerate graphics operations. However, since the graphics engines in these GPUs are specially designed for graphics operations, we could not take advantage of their computing power for more general nongraphic operations. In this paper, we studied the GPUs graphics engine in order to accelerate the image processing capability. Specifically, we implemented a JPEC2000 decoding/encoding framework that involves both OpenMP and GPU. Initial experimental results show that significant speed-up can be achieved by utilizing the GPU power.

A high speed huffman decoder using new ternary CAM (새로운 Ternary CAM을 이용한 고속 허프만 디코더 설계)

  • 이광진;김상훈;이주석;박노경;차균현
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.21 no.7
    • /
    • pp.1716-1725
    • /
    • 1996
  • In this paper, the huffman decoder which is a part of the decoder in JPEG standard format is designed by using a new Ternary CAM. First, the 256 word * 16 bit-size new bit-word all parallel Ternary CAM system is designed and verified using SPICE and CADENCE Verilog-XL, and then the verified novel Ternary CAM is applied to the new huffman decoder architecture of JPEG. So the performnce of the designed CAM cell and it's block is verified. The new Ternary CAM has various applications because it has search data mask and storing data mask function, which enable bit-wise search and don't care state storing. When the CAM is used for huffman look-up table in huffman decoder, the CAM is partitioned according to the decoding symbol frequency. The scheme of partitioning CAM for huffman table overcomes the drawbacks of all-parallel CAM with much power and load. So operation speed and power consumption are improved.

  • PDF

Fully-Parallel Architecture for 1.4 Gbps Non-Binary LDPC Codes Decoder (1.4 Gbps 비이진 LDPC 코드 복호기를 위한 Fully-Parallel 아키텍처)

  • Choi, Injun;Kim, Ji-Hoon
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.53 no.4
    • /
    • pp.48-58
    • /
    • 2016
  • This paper presents the high-throughput fully-parallel architecture for GF(64) (160,80) regular (2,4) non-binary LDPC (NB-LDPC) codes decoder based on the extended min sum algorithm. We exploit the NB-LDPC code that features a very low check node and variable node degree to reduce the complexity of decoder. This paper designs the fully-parallel architecture and allows the interleaving check node and variable node to increase the throughput of the decoder. We further improve the throughput by the proposed early sorting to reduce the latency of the check node operation. The proposed decoder has the latency of 37 cycles in the one decoding iteration and achieves a high throughput of 1402Mbps at 625MHz.