Search | Korea Science

Design of a Dispatch Unit & Operand Selection Unit for Improving the SIMT Based GP-GPU Instruction Performance (SIMT구조 GP-GPU의 명령어 처리 성능 향상을 위한 Dispatch Unit과 Operand Selection Unit설계)

Kwak, Jae Chang
- Journal of IKEEE / v.19 no.3 / pp.455-459 / 2015
This paper proposes a dispatch unit for a GP-GPU with SIMT architecture that supports the acceleration of general-purpose operations as well as graphics processing. If all of the information in the instructions issued from the warp scheduler is decoded, unnecessary operand loads occur, which increases the load on the registers. To resolve this problem, this paper proposes a pre-decoding method that decodes only the operand information, reducing both the operand loads and the load on the registers. The operand information from the dispatch unit is passed to the operand selection unit in a way that prevents register bank collisions. Thus the overall performance is improved. In the simulation test, the total number of clock cycles required to process 10,000 arbitrary instructions issued from the warp scheduler was measured using ModelSim SE 10.0b. The results show that the dispatch unit equipped with the proposed pre-decoding function improves processing performance by about 12% compared to the conventional method.
https://doi.org/10.7471/ikeee.2015.19.3.455
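
As a rough illustration of the idea in the abstract above, the following sketch pre-decodes only the operand fields of an instruction word and then counts how many register-bank read cycles those operands would need. The 32-bit instruction layout, the four-bank register file, and the names `predecode_operands` and `bank_read_cycles` are assumptions made for this example; none of them is taken from the paper.

```python
NUM_BANKS = 4  # assumed number of register banks (hypothetical)


def predecode_operands(instr):
    """Extract only the three source-register fields from a 32-bit word.

    Assumed (hypothetical) layout: bits 0-7 = src0, 8-15 = src1, 16-23 = src2.
    The opcode and all other fields are left undecoded at this stage.
    """
    return [(instr >> shift) & 0xFF for shift in (0, 8, 16)]


def bank_read_cycles(registers, num_banks=NUM_BANKS):
    """Return the cycles needed to read the operands, one read per bank per cycle.

    Registers are mapped to banks by index modulo num_banks; reading the same
    register twice counts as a single read.
    """
    per_bank = [0] * num_banks
    for reg in set(registers):
        per_bank[reg % num_banks] += 1
    return max(per_bank)


instr = (9 << 16) | (5 << 8) | 1  # src2 = r9, src1 = r5, src0 = r1
operands = predecode_operands(instr)
cycles = bank_read_cycles(operands)
```

In this example r1, r5, and r9 all map to bank 1, so the three reads serialize. This is the kind of collision an operand selection unit would try to avoid.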

Development of Decoding Program Module for PLC Output Expansion (PLC출력 확장 디코딩 프로그램 모듈 개발)

You, Jeong-Bong
- Proceedings of the KIEE Conference / 2005.05a / pp.131-133 / 2005
In this paper, we propose a program module that expands the number of output points when the number of output devices increases in the design of a PLC-based process control system. Increasing the number of output points requires decoding, which can be done in hardware or in software. We propose a decoding program module that performs software decoding and confirm its feasibility through simulation.
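
The abstract does not describe the module itself, so the following is only a minimal sketch of the general idea of software decoding: an n-bit code held on a few physical output points selects one of 2^n decoded outputs. The function name `decode_outputs` and the one-active-output-at-a-time behaviour are assumptions for illustration.

```python
def decode_outputs(code, n_bits):
    """Decode an n-bit binary code into 2**n one-hot output states."""
    if not 0 <= code < (1 << n_bits):
        raise ValueError("code does not fit in n_bits")
    return [i == code for i in range(1 << n_bits)]


# Three physical output points select one of eight decoded outputs.
outputs = decode_outputs(0b101, 3)
```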

Fast Thumbnail Extraction Algorithm with Partial Decoding for HEVC (HEVC에서 부분복호화를 통한 썸네일 추출 알고리듬)

Lee, Wonjin;Jeong, Jechang
- Journal of Broadcast Engineering / v.23 no.3 / pp.431-436 / 2018
This paper proposes a simple but effective algorithm that reduces the computational complexity of thumbnail generation and improves image quality without aliasing artifacts. For high-speed decoding, the proposed algorithm performs partial decoding on each $4 \times 4$ boundary in a TU (Transform Unit) and on each TU boundary in a PU (Prediction Unit). The proposed method defines weights based on the intra prediction directions and estimates the thumbnail pixels using those weights. This method maintains the thumbnail extraction time and improves thumbnail image quality compared with conventional algorithms.
https://doi.org/10.5909/JBE.2018.23.3.431

Efficient Parallel Block-layered Nonbinary Quasi-cyclic Low-density Parity-check Decoding on a GPU

Thi, Huyen Pham;Lee, Hanho
- IEIE Transactions on Smart Processing and Computing / v.6 no.3 / pp.210-219 / 2017
This paper proposes a modified min-max algorithm (MMMA) for nonbinary quasi-cyclic low-density parity-check (NB-QC-LDPC) codes and an efficient parallel block-layered decoder architecture corresponding to the algorithm on a graphics processing unit (GPU) platform. The algorithm removes multiplications over the Galois field (GF) in the merger step to reduce decoding latency without any performance loss. The decoding implementation on a GPU for NB-QC-LDPC codes achieves improvements in both flexibility and scalability. To perform the decoding on the GPU, data and memory structures suitable for parallel computing are designed. The implementation results for NB-QC-LDPC codes over GF(32) and GF(64) demonstrate that the parallel block-layered decoding on a GPU accelerates the decoding process to provide a faster decoding runtime, and obtains a higher coding gain under a low $10^{-10}$ bit error rate and low $10^{-7}$ frame error rate, compared to existing methods.
https://doi.org/10.5573/IEIESPC.2017.6.3.210

Iterative Symbol Decoding of Variable-Length Codes with Convolutional Codes

Wu, Hung-Tsai;Wu, Chun-Feng;Chang, Wen-Whei
- Journal of Communications and Networks / v.18 no.1 / pp.40-49 / 2016
In this paper, we present a symbol-level iterative source-channel decoding (ISCD) algorithm for reliable transmission of variable-length codes (VLCs). First, an improved source a posteriori probability (APP) decoding approach is proposed for packetized variable-length encoded Markov sources. Also proposed is a recursive implementation based on a three-dimensional joint trellis for symbol decoding of binary convolutional codes. APP channel decoding on this joint trellis is realized by modifying the Bahl-Cocke-Jelinek-Raviv algorithm and adapting it to the non-stationary VLC trellis. Simulation results indicate that the proposed ISCD scheme allows its constituent decoders to exchange symbol-level extrinsic information and achieves high robustness against channel noise.
https://doi.org/10.1109/JCN.2016.000007

Accelerating Soft-Decision Reed-Muller Decoding Using a Graphics Processing Unit

Uddin, Md. Sharif;Kim, Cheol Hong;Kim, Jong-Myon
- Asia-pacific Journal of Multimedia Services Convergent with Art, Humanities, and Sociology / v.4 no.2 / pp.369-378 / 2014
The Reed-Muller code is an efficient code for multiple-bit error correction; however, the high computational requirement of its decoding process prohibits its use in practical applications. To solve this problem, this paper proposes a graphics processing unit (GPU)-based parallel error control approach using Reed-Muller R(r, m) coding for real-time wireless communication systems. The GPU offers a high-throughput parallel computing platform that can achieve the desired high-performance decoding by exploiting the massive parallelism inherent in the algorithm. In addition, we compare the performance of the GPU-based approach with the equivalent sequential approach running on a traditional CPU. The experimental results indicate that the proposed GPU-based approach substantially outperforms the sequential approach in terms of execution time, yielding over 70× speedup.
https://doi.org/10.14257/AJMAHS.2014.12.10
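
To make the decoding workload concrete, here is a minimal CPU-side sketch of soft-decision decoding for the first-order code R(1, m) only, using a fast Hadamard transform. It is not the paper's GPU implementation and does not cover general R(r, m); the function names and the BPSK mapping (bit 0 to +1, bit 1 to -1) are assumptions for this example. The butterflies within each transform stage are independent of one another, which is the kind of data parallelism that maps naturally onto GPU threads.

```python
def fht(values):
    """Fast Hadamard transform (natural order) of a length-2**m sequence."""
    out = list(values)
    h = 1
    while h < len(out):
        for start in range(0, len(out), 2 * h):
            for i in range(start, start + h):
                x, y = out[i], out[i + h]
                out[i], out[i + h] = x + y, x - y
        h *= 2
    return out


def rm1_encode(a0, a, m):
    """Encode with R(1, m): bit x of the codeword is a0 XOR parity(a AND x)."""
    return [(a0 + bin(a & x).count("1")) % 2 for x in range(1 << m)]


def rm1_decode_soft(received):
    """Pick the codeword with the largest correlation against the soft inputs."""
    spectrum = fht(received)
    a = max(range(len(spectrum)), key=lambda u: abs(spectrum[u]))
    a0 = 0 if spectrum[a] >= 0 else 1
    return a0, a


m, a0, a = 3, 1, 0b101
symbols = [1.0 - 2.0 * bit for bit in rm1_encode(a0, a, m)]  # BPSK mapping
symbols[2] *= -0.2  # one weak, wrong-signed symbol standing in for channel noise
decoded = rm1_decode_soft(symbols)
```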

Parallel LDPC Decoding on a Heterogeneous Platform using OpenCL

Hong, Jung-Hyun;Park, Joo-Yul;Chung, Ki-Seok
- KSII Transactions on Internet and Information Systems (TIIS) / v.10 no.6 / pp.2648-2668 / 2016
Modern mobile devices are equipped with various accelerated processing units to handle computationally intensive applications; therefore, Open Computing Language (OpenCL) has been proposed to fully take advantage of the computational power in heterogeneous systems. This article introduces a parallel software decoder of Low Density Parity Check (LDPC) codes on an embedded heterogeneous platform using an OpenCL framework. The LDPC code is one of the most popular and strongest error correcting codes for mobile communication systems. Each step of LDPC decoding has different parallelization characteristics. In the proposed LDPC decoder, steps suitable for task-level parallelization are executed on the multi-core central processing unit (CPU), and steps suitable for data-level parallelization are processed by the graphics processing unit (GPU). To improve the performance of OpenCL kernels for LDPC decoding operations, explicit thread scheduling, vectorization, and effective data transfer techniques are applied. The proposed LDPC decoder achieves high performance and high power efficiency by using heterogeneous multi-core processors on a unified computing framework.
https://doi.org/10.3837/tiis.2016.06.011

A design of sign-magnitude based DFU block for LDPC decoder (LDPC 복호기를 위한 sign-magnitude 수체계 기반의 DFU 블록 설계)

Seo, Jin-Ho;Park, Hae-Won;Shin, Kyung-Wook
- Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2011.10a / pp.415-418 / 2011
This paper describes a circuit-level optimization of a DFU (decoding function unit) for LDPC decoders used in wireless communication systems such as WiMAX and WLAN. The conventional DFU, which is based on the min-sum decoding algorithm, needs conversions between two's complement values and sign-magnitude values, resulting in complex hardware. In this paper, a new DFU design based on sign-magnitude arithmetic is proposed to achieve a simplified circuit and high-speed operation.
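
The following software sketch of a min-sum check-node update suggests why sign-magnitude values are convenient: the sign and the magnitude of each outgoing message are computed separately, with no two's complement conversion in between. It is an illustration of the general min-sum step, not the paper's circuit; the function name and the (sign, magnitude) tuple format are assumptions.

```python
def check_node_update(messages):
    """Min-sum check-node update on (sign, magnitude) pairs.

    sign is 0 for non-negative and 1 for negative. Each outgoing message takes
    the XOR of the other edges' signs and the minimum of their magnitudes.
    """
    if len(messages) < 2:
        raise ValueError("a check node needs at least two incoming messages")
    total_sign = 0
    min1 = min2 = float("inf")  # smallest and second-smallest magnitudes
    min_index = -1
    for i, (sign, mag) in enumerate(messages):
        total_sign ^= sign
        if mag < min1:
            min1, min2, min_index = mag, min1, i
        elif mag < min2:
            min2 = mag
    # The edge that supplied the minimum receives the second minimum instead.
    return [
        (total_sign ^ sign, min2 if i == min_index else min1)
        for i, (sign, _) in enumerate(messages)
    ]


updated = check_node_update([(0, 3), (1, 1), (0, 5)])
```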

Design of an Area-Efficient Survivor Path Unit for Viterbi Decoder Supporting Punctured Codes (천공 부호를 지원하는 Viterbi 복호기의 면적 효율적인 생존자 경로 계산기 설계)

Kim, Sik;Hwang, Sun-Young
- The Journal of Korean Institute of Communications and Information Sciences / v.29 no.3A / pp.337-346 / 2004
Punctured convolutional codes increase transmission efficiency without increasing hardware complexity. However, a Viterbi decoder supporting punctured codes requires a long decoding length and a large survivor memory to achieve a sufficiently low bit error rate (BER) compared to the Viterbi decoder for a rate 1/2 convolutional code. This paper presents a novel architecture that adopts a pipelined trace-forward unit to reduce the survivor memory requirements of the Viterbi decoder. The proposed survivor path architecture reduces the memory requirements by removing the initial decoding delay needed to perform the trace-back operation and by accelerating the trace-forward process that identifies the survivor path in the Viterbi decoder. Experimental results show that the area of the survivor path unit is reduced by 16% compared to that of a conventional hybrid survivor path unit.
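
The survivor path unit itself is not reproduced here. As background for what supporting punctured codes involves, the sketch below shows only the depuncturing step that typically precedes Viterbi decoding: erasures are re-inserted at the positions the transmitter dropped. The puncturing pattern, the erasure value of 0.0, and the function name are assumptions for illustration.

```python
def depuncture(received, pattern, erasure=0.0):
    """Re-insert erasures where the transmitter punctured coded symbols.

    pattern is a repeating list of 1 (sent) and 0 (punctured) flags.
    """
    if not any(pattern):
        raise ValueError("pattern must transmit at least one symbol")
    out, i, k = [], 0, 0
    while i < len(received):
        if pattern[k % len(pattern)]:
            out.append(received[i])
            i += 1
        else:
            out.append(erasure)
        k += 1
    # Pad punctured positions that close out the final pattern period.
    while k % len(pattern) and not pattern[k % len(pattern)]:
        out.append(erasure)
        k += 1
    return out


# Hypothetical pattern: every fourth rate-1/2 symbol is dropped (rate 2/3).
soft = depuncture([0.9, -1.1, 0.8, -0.7, 1.2, 0.6], [1, 1, 1, 0])
```

Because the erased positions carry no channel information, the decoder has less evidence per trellis step, which is consistent with the abstract's point that punctured codes need a longer decoding length.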

Low-latency SAO Architecture and its SIMD Optimization for HEVC Decoder

Kim, Yong-Hwan;Kim, Dong-Hyeok;Yi, Joo-Young;Kim, Je-Woo
- IEIE Transactions on Smart Processing and Computing / v.3 no.1 / pp.1-9 / 2014
This paper proposes a low-latency Sample Adaptive Offset (SAO) filter architecture and its Single Instruction Multiple Data (SIMD) optimization scheme to achieve fast High Efficiency Video Coding (HEVC) decoding in a multi-core environment. According to the HEVC standard and its Test Model (HM), the SAO operation is performed only at the picture level. Most real-time decoders, however, execute their sub-modules on a Coding Tree Unit (CTU) basis to reduce latency and memory bandwidth. The proposed low-latency SAO architecture has the following advantages over picture-based SAO: 1) significantly lower memory requirements, and 2) a low-latency property that enables efficient pipelined multi-core decoding. In addition, SIMD optimization of SAO filtering can reduce the SAO filtering time significantly. The simulation results showed that the proposed low-latency SAO architecture, with significantly less memory usage, produces a decoding time similar to that of picture-based SAO in single-core decoding. Furthermore, the SIMD optimization scheme reduces the SAO filtering time by approximately 509% and increases the total decoding speed by approximately 7% compared to the existing look-up table approach of HM.
https://doi.org/10.5573/IEIESPC.2014.3.1.1
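
For readers unfamiliar with SAO, the sketch below applies the edge-offset mode to a single row of samples using the horizontal neighbour pair. It is a scalar illustration, not the paper's architecture or its SIMD code; the offset values and function names are assumptions. Each output sample depends only on the original, unfiltered neighbours, so the loop iterations are independent, which is the property a SIMD implementation can exploit.

```python
def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def sao_edge_offset_row(row, offsets, bit_depth=8):
    """Apply horizontal SAO edge offset to one row of samples.

    offsets maps the edge category (1..4) to its offset; category 0 is unchanged.
    The first and last samples lack a left or right neighbour and are left as-is.
    """
    # Sum of the two neighbour comparisons -> edge category.
    categories = {-2: 1, -1: 2, 1: 3, 2: 4}
    max_val = (1 << bit_depth) - 1
    out = list(row)
    for x in range(1, len(row) - 1):
        s = sign(row[x] - row[x - 1]) + sign(row[x] - row[x + 1])
        cat = categories.get(s, 0)
        if cat:
            out[x] = min(max_val, max(0, row[x] + offsets[cat]))
    return out


# Hypothetical offsets: raise local minima, lower local maxima.
filtered = sao_edge_offset_row([10, 8, 10, 10, 12, 10], {1: 2, 2: 1, 3: -1, 4: -2})
```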

