통합 검색 | Korea Science

Algorithmic GPGPU Memory Optimization

Jang, Byunghyun;Choi, Minsu;Kim, Kyung Ki
- JSTS:Journal of Semiconductor Technology and Science
- /
- 제14권4호
- /
- pp.391-406
- /
- 2014
The performance of General-Purpose computation on Graphics Processing Units (GPGPU) is heavily dependent on the memory access behavior. This sensitivity is due to a combination of the underlying Massively Parallel Processing (MPP) execution model present on GPUs and the lack of architectural support to handle irregular memory access patterns. Application performance can be significantly improved by applying memory-access-pattern-aware optimizations that can exploit knowledge of the characteristics of each access pattern. In this paper, we present an algorithmic methodology to semi-automatically find the best mapping of memory accesses present in serial loop nest to underlying data-parallel architectures based on a comprehensive static memory access pattern analysis. To that end we present a simple, yet powerful, mathematical model that captures all memory access pattern information present in serial data-parallel loop nests. We then show how this model is used in practice to select the most appropriate memory space for data and to search for an appropriate thread mapping and work group size from a large design space. To evaluate the effectiveness of our methodology, we report on execution speedup using selected benchmark kernels that cover a wide range of memory access patterns commonly found in GPGPU workloads. Our experimental results are reported using the industry standard heterogeneous programming language, OpenCL, targeting the NVIDIA GT200 architecture.
https://doi.org/10.5573/JSTS.2014.14.4.391 인용 PDF KSCI

GPGPU의 멀티 쓰레드를 활용한 고성능 병렬 LU 분해 프로그램의 구현 (Implementation of high performance parallel LU factorization program for multi-threads on GPGPUs)

신봉희;김영태
- 인터넷정보학회논문지
- /
- 제12권3호
- /
- pp.131-137
- /
- 2011
GPGPU는 원래 그래픽 계산을 위한 프로세서인 GPU를 일반 계산에 활용하여 저전력으로 고성능의 효율을 보이는 신개념의 계산 장치이다. 본 논문에서는 GPGPU에서 계산을 하기 위한 병렬 LU 분해법의 알고리즘을 제안하였다. Nvidia GPGPU에서 프로그램을 실행하기 위한 CUDA 계산 환경에서는 계산하고자 하는 데이터 도메인을 블록으로 나누고 각 블록을 쓰레드들이 동시에 계산을 하는데, 이 때 블록들의 계산 순서는 무작위로 진행이 되기 때문에 블록간의 데이터 의존성을 가지는 LU 분해 프로그램에서는 결과가 정확하지 않게 된다. 본 논문에서는 병렬 LU 분해법에서 블록간의 계산 순서를 인위적으로 정하는 구현 방식을 제안하며 아울러 LU 분해법의 부분 피벗팅을 계산하기 위한 병렬 reduction 알고리즘도 제안한다. 또한 구현된 병렬프로그램의 성능 분석을 통하여 GPGPU의 멀티 쓰레드 기반으로 고성능으로 계산할 수 있는 병렬프로그램의 효율성을 보인다.
PDF KSCI

검색결과 12건 처리시간 0.02초

Algorithmic GPGPU Memory Optimization

GPGPU의 멀티 쓰레드를 활용한 고성능 병렬 LU 분해 프로그램의 구현 (Implementation of high performance parallel LU factorization program for multi-threads on GPGPUs)

이메일무단수집거부

이용약관

제 1 장 총칙

제 2 장 이용계약의 체결

제 3 장 계약 당사자의 의무

제 4 장 서비스의 이용

제 5 장 계약 해지 및 이용 제한

제 6 장 손해배상 및 기타사항

자세히 찾기

이미지 검색 (β)