• Title/Summary/Keyword: Partially parallel architecture

Search Result 12, Processing Time 0.031 seconds

Area-Efficient Semi-Parallel Encoding Structure for Long Polar Codes (긴 극 부호를 위한 저 면적 부분 병렬 극 부호 부호기 설계)

  • Shin, Yerin;Choi, Soyeon;Yoo, Hoyoung
    • Journal of IKEEE
    • /
    • v.23 no.4
    • /
    • pp.1288-1294
    • /
    • 2019
  • The channel-achieving property made the polar code show to advantage as an error-correcting code. However, sufficient error-correction performance shows the asymptotic property that is achieved when the length of the code is long. Therefore, efficient architecture is needed to realize the implementation of very-large-scale integration for the case of long input data. Although the most basic fully parallel encoder is intuitive and easy to implement, it is not suitable for long polar codes because of the high hardware complexity. Complementing this, a partially parallel encoder was proposed which has an excellent result in terms of hardware area. Nevertheless, this method has not been completely generalized and has the disadvantage that different architectures appear depending on the hardware designer. In this paper, we propose a hardware design scheme that applies the proposed systematic approach which is optimized for bit-dimension permutations. By applying this solution, it is possible to design a generalized partially parallel encoder for long polar codes with the same intuitive architecture as a fully parallel encoder.

7.7 Gbps Encoder Design for IEEE 802.11ac QC-LDPC Codes

  • Jung, Yong-Min;Chung, Chul-Ho;Jung, Yun-Ho;Kim, Jae-Seok
    • JSTS:Journal of Semiconductor Technology and Science
    • /
    • v.14 no.4
    • /
    • pp.419-426
    • /
    • 2014
  • This paper proposes a high-throughput encoding process and encoder architecture for quasi-cyclic low-density parity-check codes in IEEE 802.11ac standard. In order to achieve the high throughput with low complexity, a partially parallel processing based encoding process and encoder architecture are proposed. Forward and backward accumulations are performed in one clock cycle to increase the encoding throughput. A low complexity cyclic shifter is also proposed to minimize the hardware overhead of combinational logic in the encoder architecture. In IEEE 802.11ac systems, the proposed encoder is rate compatible to support various code rates and codeword block lengths. The proposed encoder is implemented with 130-nm CMOS technology. For (1944, 1620) irregular code, 7.7 Gbps throughput is achieved at 100 MHz clock frequency. The gate count of the proposed encoder core is about 96 K.

Exploiting Thread-Level Parallelism in Lockstep Execution by Partially Duplicating a Single Pipeline

  • Oh, Jaeg-Eun;Hwang, Seok-Joong;Nguyen, Huong Giang;Kim, A-Reum;Kim, Seon-Wook;Kim, Chul-Woo;Kim, Jong-Kook
    • ETRI Journal
    • /
    • v.30 no.4
    • /
    • pp.576-586
    • /
    • 2008
  • In most parallel loops of embedded applications, every iteration executes the exact same sequence of instructions while manipulating different data. This fact motivates a new compiler-hardware orchestrated execution framework in which all parallel threads share one fetch unit and one decode unit but have their own execution, memory, and write-back units. This resource sharing enables parallel threads to execute in lockstep with minimal hardware extension and compiler support. Our proposed architecture, called multithreaded lockstep execution processor (MLEP), is a compromise between the single-instruction multiple-data (SIMD) and symmetric multithreading/chip multiprocessor (SMT/CMP) solutions. The proposed approach is more favorable than a typical SIMD execution in terms of degree of parallelism, range of applicability, and code generation, and can save more power and chip area than the SMT/CMP approach without significant performance degradation. For the architecture verification, we extend a commercial 32-bit embedded core AE32000C and synthesize it on Xilinx FPGA. Compared to the original architecture, our approach is 13.5% faster with a 2-way MLEP and 33.7% faster with a 4-way MLEP in EEMBC benchmarks which are automatically parallelized by the Intel compiler.

  • PDF

An Optimized Hybrid Radix MAC Design (최적화된 4진18진 혼합 MAC 설계)

  • 정진우;김승철;이용주;이용석
    • Proceedings of the IEEK Conference
    • /
    • 2002.06b
    • /
    • pp.173-176
    • /
    • 2002
  • This paper is about a high-speed MAC (multiplier and accumulator) design applying radix-4 and radix-8 Booth's algorithm at the same time. The optimized hybrid radix design for high speed MAC has taken advantage of both a radix-4 and a radix-8 architectures. A radix-4 architecture meets high-speed, but it takes much more power and chip area than a radix-8 architecture. A radix-8 architecture needs less power and chip area than the other, but it has a bottleneck of generating three times the multiplicand problem. An optimized hybrid architecture performs the radix-4 multiplication partially in parallel with the generation of three times the multiplicand for use of the radix-8 multiplication. It reduces the concerned bit width of multiplier in radix-8 multiplication.

  • PDF

An Optimized Hybrid Radix MAC Design (최적화된 4진/8진 혼합 MAC 설계)

  • 정진우;김승철;이용주;이용석
    • Proceedings of the IEEK Conference
    • /
    • 2002.06a
    • /
    • pp.125-128
    • /
    • 2002
  • This paper is about a high-speed MAC (multiplier and accumulator) design applying radix-4 and radix-8 Booth's algorithm at the same time. The optimized hybrid radix design for high speed MAC has taken advantage of both a radix-4 and a radix-8 architectures. A radix-4 architecture meets high-speed, but it takes much more power and chip area than a radix-8 architecture. A radix-8 architecture needs less power and chip area than the other, but it has a bottleneck of generating three times the multiplicand problem. An optimized hybrid architecture performs tile radix-4 multiplication partially in parallel with the generation of three times the multiplicand for use of tile radix-8 multiplication. It reduces the concerned bit width of multiplier in radix-8 multiplication.

  • PDF

A LDPC Decoder for DVB-S2 Standard Supporting Multiple Code Rates (DVB-S2 기반에서 다양한 부호화 율을 지원하는 LCPC 복호기)

  • Ryu, Hye-Jin;Lee, Jong-Yeol
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.45 no.2
    • /
    • pp.118-124
    • /
    • 2008
  • For forward error correction, DVB-S2, which is the digital video broadcasting forward error coding and modulation standard for satellite television, uses a system based the concatenation of BCH with LDPC inner coding. In DVB-S2 the LDPC codes are defined for 11 different code rates, which means that a DVB-S2 LDPC decoder should support multiple code rates. Seven of the 11 code rates, 3/5, 2/3, 3/4, 4/5, 5/6, 8/9, and 9/10, are regular and the rest four code rates, 1/4, 1/3, 2/5, and 1/2, are irregular. In this paper we propose a flexible decoder for the regular LDPC codes. We combined the partially parallel decoding architecture that has the advantages in the chip size, the memory efficiency, and the processing rate with Benes network to implement a DVB-S2 LDPC decoder that can support multiple code rates with a block size of 64,800 and can configure the interconnection between the variable nodes and the check nodes according to the parity-check matrix. The proposed decoder runs correctly at the frequency of 200MHz enabling 193.2Mbps decoding throughput. The area of the proposed decoder is $16.261m^2$ and the power dissipation is 198mW at a power supply voltage of 1.5V.

Design of Low Complexity and High Throughput Encoder for Structured LDPC Codes (구조적 LDPC 부호의 저복잡도 및 고속 부호화기 설계)

  • Jung, Yong-Min;Jung, Yun-Ho;Kim, Jae-Seok
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.46 no.10
    • /
    • pp.61-69
    • /
    • 2009
  • This paper presents the design results of a low complexity and high throughput LDPC encoder structure. In order to solve the high complexity problem of the LDPC encoder, a simplified matrix-vector multiplier is proposed instead of the conventional complex matrix-vector multiplier. The proposed encoder also adopts a partially parallel structure and performs column-wise operations in matrix-vector multiplication to achieve high throughput. Implementation results show that the proposed architecture reduces the number of logic gates and memory elements by 37.4% and 56.7%, compared with existing five-stage pipelined architecture. The proposed encoder also supports 800Mbps throughput at 40MHz clock frequency which is improved about three times more than the existing architecture.

Code Rate 1/2, 2304-b LDPC Decoder for IEEE 802.16e WiMAX (IEEE 802.16e WiMAX용 부호율 1/2, 2304-비트 LDPC 복호기)

  • Kim, Hae-Ju;Shin, Kyung-Wook
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.36 no.4A
    • /
    • pp.414-422
    • /
    • 2011
  • This paper describes a design of low-density parity-check(LDPC) decoder supporting block length 2,304-bit and code rate 1/2 of IEEE 802.16e mobile WiMAX standard. The designed LDPC decoder employs the min-sum algorithm and partially parallel layered-decoding architecture which processes a sub-matrix of $96{\times}96$ in parallel. By exploiting the properties of the min-sum algorithm, a new memory reduction technique is proposed, which reduces check node memory by 46% compared to conventional method. Functional verification results show that it has average bit-error-rate(BER) of $4.34{\times}10^{-5}$ for AWGN channel with Fb/No=2.1dB. Our LDPC decoder synthesized with a $0.18{\mu}m$ CMOS cell library has 174,181 gates and 52,992 bits memory, and the estimated throughput is about 417 Mbps at 100-MHz@l.8-V.

Multi-mode Layered LDPC Decoder for IEEE 802.11n (IEEE 802.11n용 다중모드 layered LDPC 복호기)

  • Na, Young-Heon;Shin, Kyung-Wook
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.48 no.11
    • /
    • pp.18-26
    • /
    • 2011
  • This paper describes a multi-mode LDPC decoder which supports three block lengths(648, 1296, 1944) and four code rates(1/2, 2/3, 3/4, 5/6) of IEEE 802.11n wireless LAN standard. To minimize hardware complexity, it adopts a block-serial (partially parallel) architecture based on the layered decoding scheme. A novel memory reduction technique devised using the min-sum decoding algorithm reduces the size of check-node memory by 47% as compared to conventional method. From fixed-point modeling and Matlab simulations for various bit-widths, decoding performance and optimal hardware parameters such as fixed-point bit-width are analyzed. The designed LDPC decoder is verified by FPGA implementation, and synthesized with a 0.18-${\mu}m$ CMOS cell library. It has 219,100 gates and 45,036 bits RAM, and the estimated throughput is about 164~212 Mbps at 50 MHz@2.5v.

A Design of Multi-Standard LDPC Decoder for WiMAX/WLAN (WiMAX/WLAN용 다중표준 LDPC 복호기 설계)

  • Seo, Jin-Ho;Park, Hae-Won;Shin, Kyung-Wook
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.17 no.2
    • /
    • pp.363-371
    • /
    • 2013
  • This paper describes a multi-standard LDPC decoder which supports 19 block lengths(576~2304) and 6 code rates(1/2, 2/3A, 2/3B, 3/4A, 3/4B, 5/6) of IEEE 802.16e mobile WiMAX standard and 3 block lengths(648, 1296, 1944) and 4 code rates(1/2, 2/3, 3/4, 5/6) of IEEE 802.11n WLAN standard. To minimize hardware complexity, it adopts a block-serial (partially parallel) architecture based on the layered decoding scheme. A DFU(decoding function unit) based on sign-magnitude arithmetic is used for hardware reduction. The designed LDPC decoder is verified by FPGA implementation, and synthesized with a 0.13-${\mu}m$ CMOS cell library. It has 312,000 gates and 70,000 bits RAM. The estimated throughput is about 79~210 Mbps at 100 MHz@1.8v.