Thread Distribution Method of GP-GPU for Accelerating Parallel Algorithms

Lee, Kwan-Ho; Kim, Chi-Yong

doi:10.7471/ikeee.2017.21.1.92

Journal of IKEEE (전기전자학회논문지)

Volume 21, Issue 1 / Pages 92-95 / 2017 / 1226-7244 (pISSN) / 2288-243X (eISSN)

Institute of Korean Electrical and Electronics Engineers (한국전기전자학회)

병렬 알고리즘의 가속화를 위한 GP-GPU의 Thread할당 기법

Lee, Kwan-Ho (NEXT CHIP Inc.) ;
Kim, Chi-Yong (Dept. of Computer Science, Seokyeong University)

이관호 ;
김치용

Received : 2017.03.24
Accepted : 2017.03.29
Published : 2017.03.31

https://doi.org/10.7471/ikeee.2017.21.1.92

Abstract

In this paper, we propose a method for improving the performance of a small-scale GP-GPU with limited chip area. Rather than adopting a superscalar design, which increases scheduling complexity, we increase the number of simple cores to maximize performance, and we simplify the structure of the Stream Processors that make up the GP-GPU. We also improve performance by developing a thread-assignment method in the Warp Scheduler that is suited to the target application. To verify the performance, we propose a thread-assignment scheme for deep learning, a branch of neural networks. For the neural network algorithm, we observed performance improvements of 90% compared with an Intel CPU and 98% compared with a quad-core ARM Cortex-A15.
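The abstract does not spell out the thread-assignment scheme itself, so the following is only an illustrative sketch of one common way to distribute neural network work across warps: one thread per output neuron of a fully connected layer, with threads grouped into fixed-size warps. The warp width of 32, the sigmoid activation, and the names `assign_threads` and `layer_forward` are assumptions made for this example, not details taken from the paper. The code simulates the mapping sequentially in Python rather than running on a GPU.

```python
import math

WARP_SIZE = 32  # assumed warp width; the paper's value is not given in the abstract


def assign_threads(num_neurons, warp_size=WARP_SIZE):
    """Map output neuron j to a (warp, lane) pair: one thread per neuron."""
    return [(j // warp_size, j % warp_size) for j in range(num_neurons)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def layer_forward(weights, inputs, warp_size=WARP_SIZE):
    """Evaluate a fully connected layer warp by warp (sequential simulation).

    weights[j] is the weight row of output neuron j; inputs is the input vector.
    """
    num_neurons = len(weights)
    outputs = [0.0] * num_neurons
    num_warps = (num_neurons + warp_size - 1) // warp_size  # ceiling division
    for warp in range(num_warps):
        # Every lane of a warp runs the same instructions on its own neuron.
        for lane in range(warp_size):
            j = warp * warp_size + lane
            if j >= num_neurons:
                break  # remaining lanes of the last warp stay idle
            acc = sum(w * x for w, x in zip(weights[j], inputs))
            outputs[j] = sigmoid(acc)
    return outputs
```

Because every lane executes the same dot product followed by the same activation, a mapping of this kind avoids divergent branches inside a warp, which is one reason simple in-order cores can be sufficient for such workloads.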
