Integrated Search | Korea Science

Design of a Dispatch Unit & Operand Selection Unit for Improving the SIMT Based GP-GPU Instruction Performance

곽재창
- 전기전자학회논문지 / Vol. 19, No. 3 / pp.455-459 / 2015
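This paper proposes a Dispatch Unit and an Operand Selection Unit for a SIMT-architecture GP-GPU that supports acceleration of general-purpose computation as well as graphics processing. If all of the operand information used by an instruction issued from the Warp Scheduler is decoded, unnecessary operand loads occur and place a load on the registers. To solve this problem, we propose a pre-decoding method that decodes only the operand information first, which reduces operand loads and the load on the registers. The operand information produced by the proposed Dispatch Unit is passed to an Operand Selection Unit that applies a method for preventing register bank conflicts, improving overall processing performance. Using Modelsim 10.0b, we measured the total number of clock cycles required to process 10,000 random instructions issued from the Warp Scheduler. With the proposed Dispatch Unit with pre-decoding and the proposed Operand Selection Unit, processing efficiency increased by about 11% and 24%, respectively, compared with existing methods.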
https://doi.org/10.7471/ikeee.2015.19.3.455

Development of Decoding Program Module for PLC Output Expansion

유정봉
- 대한전기학회:학술대회논문집 / 대한전기학회 2005년도 심포지엄 논문집 정보 및 제어부문 / pp.131-133 / 2005
In this paper, we propose a program module that expands the number of output points when the number of output devices increases in the design of a PLC-based process control system. Increasing the number of output points requires decoding, which can be done in hardware or in software. We propose a decoding program module that performs software decoding and confirm its feasibility through simulation.
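The abstract above does not describe the internals of the module. As a generic illustration of what a decoding step does when expanding outputs, the following minimal sketch maps an n-bit select code to one of 2^n one-hot outputs. The function name `decode_outputs` and the MSB-first bit order are assumptions made for this example and are not taken from the paper.

```python
def decode_outputs(select_bits):
    """Decode select bits (MSB first) into a one-hot list of 2**n outputs.

    Hypothetical helper for illustration only; not the paper's module.
    """
    # Assemble the integer index encoded by the select bits.
    index = 0
    for bit in select_bits:
        index = (index << 1) | (1 if bit else 0)

    # n select bits address 2**n expanded outputs; exactly one is active.
    n_outputs = 1 << len(select_bits)
    return [i == index for i in range(n_outputs)]


if __name__ == "__main__":
    # Three select bits 1, 0, 1 encode index 5 out of 8 outputs.
    outputs = decode_outputs([1, 0, 1])
    print(outputs.index(True), len(outputs))
```

In this sketch, three physical signals are enough to address eight logical outputs, which is the general motivation for decoding when output points run short.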

Fast Thumbnail Extraction Algorithm with Partial Decoding for HEVC

이원진;정제창
- 방송공학회논문지 / Vol. 23, No. 3 / pp.431-436 / 2018
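This paper proposes an algorithm that maintains image quality without aliasing artifacts while reducing the computational complexity required for thumbnail generation. To decode quickly, the proposed algorithm partially decodes only the boundary of each $4{\times}4$ block in a TU (Transform Unit) and partially decodes only the TU boundary in a PU (Prediction Unit). It then computes weights according to the direction of the intra prediction mode and uses them to predict the actual thumbnail pixels. The proposed method improves thumbnail quality while keeping the thumbnail extraction time similar to that of existing methods.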
https://doi.org/10.5909/JBE.2018.23.3.431

Efficient Parallel Block-layered Nonbinary Quasi-cyclic Low-density Parity-check Decoding on a GPU

Thi, Huyen Pham;Lee, Hanho
- IEIE Transactions on Smart Processing and Computing / Vol. 6, No. 3 / pp.210-219 / 2017
This paper proposes a modified min-max algorithm (MMMA) for nonbinary quasi-cyclic low-density parity-check (NB-QC-LDPC) codes and an efficient parallel block-layered decoder architecture corresponding to the algorithm on a graphics processing unit (GPU) platform. The algorithm removes multiplications over the Galois field (GF) in the merger step to reduce decoding latency without any performance loss. The decoding implementation on a GPU for NB-QC-LDPC codes achieves improvements in both flexibility and scalability. To perform the decoding on the GPU, data and memory structures suitable for parallel computing are designed. The implementation results for NB-QC-LDPC codes over GF(32) and GF(64) demonstrate that the parallel block-layered decoding on a GPU accelerates the decoding process to provide a faster decoding runtime, and obtains a higher coding gain under a low $10^{-10}$ bit error rate and low $10^{-7}$ frame error rate, compared to existing methods.
https://doi.org/10.5573/IEIESPC.2017.6.3.210

Iterative Symbol Decoding of Variable-Length Codes with Convolutional Codes

Wu, Hung-Tsai;Wu, Chun-Feng;Chang, Wen-Whei
- Journal of Communications and Networks / Vol. 18, No. 1 / pp.40-49 / 2016
In this paper, we present a symbol-level iterative source-channel decoding (ISCD) algorithm for reliable transmission of variable-length codes (VLCs). First, an improved source a posteriori probability (APP) decoding approach is proposed for packetized variable-length encoded Markov sources. Also proposed is a recursive implementation based on a three-dimensional joint trellis for symbol decoding of binary convolutional codes. APP channel decoding on this joint trellis is realized by modifying the Bahl-Cocke-Jelinek-Raviv algorithm and adapting it to the non-stationary VLC trellis. Simulation results indicate that the proposed ISCD scheme allows its constituent decoders to exchange symbol-level extrinsic information and achieves high robustness against channel noise.
https://doi.org/10.1109/JCN.2016.000007

Accelerating Soft-Decision Reed-Muller Decoding Using a Graphics Processing Unit

Uddin, Md. Sharif;Kim, Cheol Hong;Kim, Jong-Myon
- 예술인문사회 융합 멀티미디어 논문지 / Vol. 4, No. 2 / pp.369-378 / 2014
The Reed-Muller code is an efficient code for multiple-bit error correction; however, the high computational requirement inherent in its decoding process prohibits its use in practical applications. To solve this problem, this paper proposes a graphics processing unit (GPU)-based parallel error control approach using Reed-Muller R(r, m) coding for real-time wireless communication systems. A GPU offers a high-throughput parallel computing platform that can achieve the desired high-performance decoding by exploiting the massive parallelism inherent in the algorithm. In addition, we compare the performance of the GPU-based approach with that of the equivalent sequential approach running on a traditional CPU. The experimental results indicate that the proposed GPU-based approach substantially outperforms the sequential approach in terms of execution time, yielding over 70× speedup.
https://doi.org/10.14257/AJMAHS.2014.12.10

Parallel LDPC Decoding on a Heterogeneous Platform using OpenCL

Hong, Jung-Hyun;Park, Joo-Yul;Chung, Ki-Seok
- KSII Transactions on Internet and Information Systems (TIIS) / Vol. 10, No. 6 / pp.2648-2668 / 2016
Modern mobile devices are equipped with various accelerated processing units to handle computationally intensive applications; therefore, Open Computing Language (OpenCL) has been proposed to fully take advantage of the computational power in heterogeneous systems. This article introduces a parallel software decoder of Low Density Parity Check (LDPC) codes on an embedded heterogeneous platform using an OpenCL framework. The LDPC code is one of the most popular and strongest error correcting codes for mobile communication systems. Each step of LDPC decoding has different parallelization characteristics. In the proposed LDPC decoder, steps suitable for task-level parallelization are executed on the multi-core central processing unit (CPU), and steps suitable for data-level parallelization are processed by the graphics processing unit (GPU). To improve the performance of OpenCL kernels for LDPC decoding operations, explicit thread scheduling, vectorization, and effective data transfer techniques are applied. The proposed LDPC decoder achieves high performance and high power efficiency by using heterogeneous multi-core processors on a unified computing framework.
https://doi.org/10.3837/tiis.2016.06.011

A design of sign-magnitude based DFU block for LDPC decoder

서진호;박해원;신경욱
- 한국정보통신학회:학술대회논문집 / 한국해양정보통신학회 2011년도 추계학술대회 / pp.415-418 / 2011
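We propose a circuit optimization of the DFU (decoding function unit), the core functional block of the LDPC (low-density parity check) decoders used in wireless communication systems such as WiMAX and WLAN. A DFU based on the min-sum decoding algorithm requires conversion between two's complement and sign-magnitude values, which complicates the circuit. In this paper, a DFU based on sign-magnitude arithmetic is designed to eliminate the number-system conversion, which simplifies the circuit and improves operating speed.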
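To illustrate why sign-magnitude values suit min-sum decoding, here is a minimal sketch of a standard min-sum check-node update in which signs and magnitudes are kept separate. It is a textbook formulation, not the circuit from the paper, and the function name `check_node_update` and its argument layout are assumptions made for this example.

```python
def check_node_update(signs, mags):
    """Min-sum check-node update on sign-magnitude messages.

    signs: list of sign bits (0 = positive, 1 = negative)
    mags:  list of non-negative magnitudes, same length (at least 2)
    Returns (out_signs, out_mags); each output excludes its own input.
    Hypothetical helper for illustration only.
    """
    # The sign of each output is the XOR of all other input signs,
    # which equals the total XOR with the edge's own sign removed.
    total_sign = 0
    for s in signs:
        total_sign ^= s

    # Track the two smallest magnitudes and where the smallest occurs.
    min1 = min2 = float("inf")
    min1_idx = -1
    for i, m in enumerate(mags):
        if m < min1:
            min2, min1, min1_idx = min1, m, i
        elif m < min2:
            min2 = m

    out_signs = [total_sign ^ s for s in signs]
    # The edge holding the minimum receives the second minimum instead.
    out_mags = [min2 if i == min1_idx else min1 for i in range(len(mags))]
    return out_signs, out_mags


if __name__ == "__main__":
    # Inputs +3, -1, +2 in sign-magnitude form.
    print(check_node_update([0, 1, 0], [3, 1, 2]))
```

The update needs only XORs on the sign bits and comparisons on the magnitudes, so a datapath that stays in sign-magnitude form has no need to convert to and from two's complement at this step.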

Design of an Area-Efficient Survivor Path Unit for Viterbi Decoder Supporting Punctured Codes

김식;황선영
- 한국통신학회논문지 / Vol. 29, No. 3A / pp.337-346 / 2004
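A Viterbi decoder that supports punctured codes can raise the code rate efficiently while maintaining hardware complexity, but obtaining sufficient BER performance lengthens the decoding delay and enlarges the survivor memory. This paper proposes a survivor path unit that includes a pipelined trace-forward unit to reduce the memory requirement of the Viterbi decoder. The proposed survivor path unit eliminates the initial decoding delay needed for trace-back and accelerates the trace-forward process used for path computation, which reduces survivor memory usage. Experimental results confirm that the survivor path unit of the proposed Viterbi decoder reduces area by about 16% compared with the conventional hybrid survivor path unit.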

Low-latency SAO Architecture and its SIMD Optimization for HEVC Decoder

Kim, Yong-Hwan;Kim, Dong-Hyeok;Yi, Joo-Young;Kim, Je-Woo
- IEIE Transactions on Smart Processing and Computing / Vol. 3, No. 1 / pp.1-9 / 2014
This paper proposes a low-latency Sample Adaptive Offset (SAO) filter architecture and its Single Instruction Multiple Data (SIMD) optimization scheme to achieve fast High Efficiency Video Coding (HEVC) decoding in a multi-core environment. According to the HEVC standard and its Test Model (HM), SAO operation is performed only at the picture level. Most real-time decoders, however, execute their sub-modules on a Coding Tree Unit (CTU) basis to reduce latency and memory bandwidth. The proposed low-latency SAO architecture has the following advantages over picture-based SAO: 1) significantly lower memory requirements, and 2) a low-latency property that enables efficient pipelined multi-core decoding. In addition, SIMD optimization of SAO filtering can reduce the SAO filtering time significantly. The simulation results showed that the proposed low-latency SAO architecture, with significantly lower memory usage, achieves a decoding time similar to that of picture-based SAO in single-core decoding. Furthermore, the SIMD optimization scheme reduces the SAO filtering time by approximately 509% and increases the total decoding speed by approximately 7% compared to the existing look-up table approach of HM.
https://doi.org/10.5573/IEIESPC.2014.3.1.1
