Search | Korea Science

Lee, Hyung-Bong;Baek, Chung-Ho
- The KIPS Transactions:PartA
- /
- v.9A no.2
- /
- pp.249-258
- /
- 2002
One of the jai or purposes of multiprocessor system is to get a high efficiency in performance improvement. But in most cases, it is unavoidable to use some special programming languages or tools for full use of multiprocessor system. In general, loop and recursive call statements of algorithms are considered as typical parts for parallelization. Especially, recursive call statements are easy to parallelize conceptually without support of any special languages or tools. But it is difficult to control the degree of parallelism caused by high depth of recursive call leading to execution crash. This paper proposes a device to control Parallelism in the process of POSIX thread bated parallelization of recursive algorithms. For this, we define the concept of thread and process in UNIX system, and analyze the results of experimental application of the device to quick sorting algorithm.
https://doi.org/10.3745/KIPSTA.2002.9A.2.249 인용 PDF KSCI

Kim, Tae Hyun;Leem, Hyun Seok;Choi, Seung Won
- Journal of Korea Society of Digital Industry and Information Management
- /
- v.7 no.3
- /
- pp.55-61
- /
- 2011
This paper presents an implementation of Lattice Reduction (LR)-aided detector for Multiple-Input Multiple-Output (MIMO) system using Graphics Processing Unit (GPU). GPU is a parallel processor which has a number of Arithmetic Logic Units (ALUs), thus, it can minimize the operation time of LR algorithm through the parallelization using multiple threads in the GPU. Through the implemented LR-aided detector, we verify that the LR-aided detector operates a lot faster than Maximum Likelihood (ML) detector. The implemented LR-aided detector has been applied to WiMAX system to show the feasibility of its real-time processing. In addition, we demonstrate that the processing time can be reduced at the cost of 3dB SNR loss by limiting the repeating loop in Lenstra-Lenstra-Lovasz (LLL) algorithm which is frequently used in LR-aided detector.
https://doi.org/10.17662/ksdim.2011.7.3.055 인용 PDF KSCI

Ryu, Eun-Kyung;Jo, Hyun-Ho;Seo, Jung-Han;Sim, Dong-Gyu;Kim, Doo-Hyun;Song, Joon-Ho
- Journal of Broadcast Engineering
- /
- v.17 no.3
- /
- pp.503-518
- /
- 2012
In this paper, we propose a complexity-based parallelization method of the sample adaptive offset (SAO) algorithm which is one of HEVC in-loop filters. The SAO algorithm can be regarded as region-based process and the regions are obtained and represented with a quad-tree scheme. A offset to minimize a reconstruction error is sent for each partitioned region. The SAO of the HEVC can be parallelized in data-level. However, because the sizes and complexities of the SAO regions are not regular, workload imbalance occurs with multi-core platform. In this paper, we propose a LCU-based SAO algorithm and a complexity prediction algorithm for each LCU. With the proposed complexity-based LCU processing, we found that the proposed algorithm is faster than the sequential implementation by a factor of 2.38 times. In addition, the proposed algorithm is faster than regular parallel implementation SAO by 21%.
https://doi.org/10.5909/JBE.2012.17.3.503 인용 PDF KSCI

Jo, Hyun-Ho;Ryu, Eun-Kyung;Nam, Jung-Hak;Sim, Dong-Gyu;Kim, Doo-Hyun;Song, Joon-Ho
- Journal of Broadcast Engineering
- /
- v.17 no.5
- /
- pp.742-755
- /
- 2012
In this paper, we propose a parallel deblocking algorithm to resolve workload imbalance when the deblocking filter of high efficiency video coding (HEVC) decoder is parallelized. In HEVC, the deblocking filter which is one of the in-loop filters conducts two-step filtering on vertical edges first and horizontal edges later. The deblocking filtering can be conducted with high-speed through data-level parallelism because there is no dependency between adjacent edges for deblocking filtering processes. However, workloads would be imbalanced among regions even though the same amount of data for each region is allocated, which causes performance loss of decoder parallelization. In this paper, we solve the problem for workload imbalance by predicting the complexity of deblocking filtering with coding unit (CU) depth information at a coding tree block (CTB) and by allocating the same amount of workload to each core. Experimental results show that the proposed method achieves average time saving (ATS) by 64.3%, compared to single core-based deblocking filtering and also achieves ATS by 6.7% on average and 13.5% on maximum, compared to the conventional uniform data-level parallelism.
https://doi.org/10.5909/JBE.2012.17.5.742 인용 PDF KSCI