통합 검색 | Korea Science

내장형 GPU 환경에서 CPU-GPU 간의 공유 캐시에서의 캐시 분할 방식의 필요성 (The Need of Cache Partitioning on Shared Cache of Integrated Graphics Processor between CPU and GPU)

성한울;엄현상;염헌영
- 정보과학회 컴퓨팅의 실제 논문지
- /
- 제20권9호
- /
- pp.507-512
- /
- 2014
최근 전력의 한계 때문에 많은 트랜지스터를 모두 이용할 수 없는 '다크실리콘' 문제가 발생했다. 이 문제를 효율적으로 해결하기 위하여 CPU(Central processing unit)와 GPU(Graphic processing unit)를 함께 사용하여 분산처리하기 시작했다. 최근에는 CPU(Central processing unit)와 GPU(Graphic processing unit)가 메모리와 Last Level Cache를 공유하는 내장형 GPU 프로세서(Integrated graphic processing unit processor)가 등장했다. 하지만 CPU 프로세스와 GPU 프로세스가 LLC(Last level cache)로 접근하기 위한 어떠한 규칙이 없기 때문에, 동시에 CPU 프로세스와 GPU 프로세스 수행될 때 LLC(Last level cache)를 차지하기 위한 경쟁이 일어나 성능 저하가 발생한다. 본 논문에서는 캐시 접근 빈도가 큰 여러 개의 프로세스들이 수행됨에 따라 캐시 오염이 발생한 상황에서 GPU 프로세스의 성능 보장을 위하여 GPU 프로세스만을 위한 고정된 Last Level Cache 공간을 주는 캐시 분할방식이 필요함을 증명하고 캐시를 분할하기 위한 페이지 컬러링 기법을 소개하고 디자인한다.
https://doi.org/10.5626/KTCP.2014.20.9.507 인용

GPU를 활용한 공간 가상 시뮬레이션 표현에 관한 연구 (A Study On The Virtual Space Simulation Expression Using Graphic Process Unit)

김종현;김석태
- 한국실내디자인학회:학술대회논문집
- /
- 한국실내디자인학회 2004년도 추계학술발표대회 논문집
- /
- pp.80-83
- /
- 2004
It is impossible to do real verification on design spaces before their completions due to the characteristics of building and interior space designs. So, in designing spaces, designers should reflect their real experiences in their lives into their design works. 3D games where GPU and other kinds of advanced technologies have bee applied first show their leads in technologies have bee applied first show their leads in technologies about 5 years than VRML. Those games which are produced reflecting real environment as it is could be regarded as the most excellent tool in their completeness level of physical environment due to their characteristics. This means that if 3D game engines employing GPU are used effectively they could be used as a presentation tool for virtual spaces. This study studies the expressions of virtual constructions through 3D game engines employing GPU, not in VRML-based virtual spaces on Webs but in immersion-type virtual spaces.
PDF

Accelerating the Sweep3D for a Graphic Processor Unit

Gong, Chunye;Liu, Jie;Chen, Haitao;Xie, Jing;Gong, Zhenghu
- Journal of Information Processing Systems
- /
- 제7권1호
- /
- pp.63-74
- /
- 2011
As a powerful and flexible processor, the Graphic Processing Unit (GPU) can offer a great faculty in solving many high-performance computing applications. Sweep3D, which simulates a single group time-independent discrete ordinates (Sn) neutron transport deterministically on 3D Cartesian geometry space, represents the key part of a real ASCI application. The wavefront process for parallel computation in Sweep3D limits the concurrent threads on the GPU. In this paper, we present multi-dimensional optimization methods for Sweep3D, which can be efficiently implemented on the finegrained parallel architecture of the GPU. Our results show that the overall performance of Sweep3D on the CPU-GPU hybrid platform can be improved up to 4.38 times as compared to the CPU-based implementation.
https://doi.org/10.3745/JIPS.2011.7.1.063 인용 PDF KSCI

실시간 3차원 레이저 레이더 영상 생성을 위한 CUDA 기반 병렬처리 소프트웨어 설계 (The Design of Parallel Processing S/W Using CUDA for Realtime 3D Laser Ladar Imaging System)

조용일;하중림;양지현;김재협
- 한국컴퓨터정보학회논문지
- /
- 제18권1호
- /
- pp.1-10
- /
- 2013
본 논문은3차원레이저레이더(LADAR, Laser Ladar) 영상 생성 시스템 개발을 수행함에 있어, 요구되는 실시간 처리를 구현하기 위해 CPU(Central Processing Unit) 및 GPU(Graphic Processing Unit)의 병렬처리 구조를 설계하는 CUDA(Common Unified Device Architecture) 기반 소프트웨어(SW, Software) 구현 기법에 대하여 설명한다. LADAR 시스템은 레이저 거리정보를 기반으로 3차원 영상을 생성하는 복잡도 높은 시스템으로써, 각 단계별로 많은 량의 처리 자원이 필요하다. 따라서, 한정된 시스템 자원 내에서 이를 실시간으로 처리하기 위해서는 반드시 병렬처리 구조를 설계 및 적용해야 한다. 본 논문에서는, 처리 알고리즘의 단계적 분석을 통해 분할 가능한 작업에 대하여 CUDA GPU로 할당 및 처리를 수행함으로써, 시스템에서 요구하는 실시간 처리를 달성하였으며, 처리 속도 분석을 통해 최대 46%의 처리 속도 향상을 확인할 수 있었다.
https://doi.org/10.9708/jksci.2013.18.1.001 인용 PDF KSCI

GPU 아키텍처의 AES 암호화 성능 예측 분석 모델 (An Analytical Model for Performance Prediction of AES on GPU Architecture)

김규운;김현우;김희정;허태영;정상혁;송용호
- 전자공학회논문지
- /
- 제50권4호
- /
- pp.89-96
- /
- 2013
컴퓨터의 그래픽 연산장치인 GPU는 그래픽 데이터의 연산뿐만 아니라 일반시스템 데이터를 처리할 수 있도록 발전되었으며, 3D 그래픽 관련 알고리즘이나 병렬 실행이 가능한 코드에 대해서는 CPU 보다 우수한 성능을 보여주고 있다. CPU 기반으로 제작된 일반적인 알고리즘을 GPU에서 실행하기 위해서는, GPU 시스템의 아키텍처를 이해하고 병렬처리 능력과 새로운 메모리 구조를 고려하여 코드를 재작성하여야 한다. 이를 위해서는 알고리즘을 성능 예측 모델에 적용하여 GPU 시스템에서 예상되는 성능 예측이 필수적이다. 이를 통해 GPU 기반 어플리케이션 개발에서 발생할 수 있는 문제점들을 사전에 예측하고, 성능에 대한 평가 지표를 구성할 수 있다. 본 논문에서는 AES 암호화 알고리즘에 성능예측 모델을 적용하여 작업량이 많은 조건하에서 높은 정확도로 성능 예측을 수행하였다.
https://doi.org/10.5573/ieek.2013.50.4.089 인용 PDF KSCI

GPU 컴퓨팅에 의한 고속 Double Random Phase Encoding (Fast Double Random Phase Encoding by Using Graphics Processing Unit)

사이플라흐;문인규
- 한국멀티미디어학회:학술대회논문집
- /
- 한국멀티미디어학회 2012년도 춘계학술발표대회논문집
- /
- pp.343-344
- /
- 2012
With the increase of sensitive data and their secure transmission and storage, the use of encryption techniques has become widespread. The performance of encoding majorly depends on the computational time, so a system with less computational time suits more appropriate as compared to its contrary part. Double Random Phase Encoding (DRPE) is an algorithm with many sub functions which consumes more time when executed serially; the computation time can be significantly reduced by implementing important functions in a parallel fashion on Graphics Processing Unit (GPU). Computing convolution using Fast Fourier transform in DRPE is the most important part of the algorithm and it is shown in the paper that by performing this portion in GPU reduced the execution time of the process by substantial amount and can be compared with MATALB for performance analysis. NVIDIA graphic card GeForce 310 is used with CUDA C as a programming language.
PDF

Morphology Operations on CUDA To Remove Skull on MRI Images

요니 셉티안;최흥국
- 한국멀티미디어학회:학술대회논문집
- /
- 한국멀티미디어학회 2012년도 춘계학술발표대회논문집
- /
- pp.205-208
- /
- 2012
Nowadays GPU (Graphic Process Unit) is not only used to show and render some images, but also for another computation. In this paper, we tried to use GPU to do some morphology operations to remove skull from axial MRI images. This skull removing process is an important step in brain segmentation because we would like to work with the brain only, without any skull on it. The result shows that simple morphology operations to remove skull has been successfully applied on MRI images, but there are still many parts that can be develop to get better images.
PDF

GPU-Accelerated Password Cracking of PDF Files

Kim, Keon-Woo;Lee, Sang-Su;Hong, Do-Won;Ryou, Jae-Cheol
- KSII Transactions on Internet and Information Systems (TIIS)
- /
- 제5권11호
- /
- pp.2235-2253
- /
- 2011
Digital document file such as Adobe Acrobat or MS-Office is encrypted by its own ciphering algorithm with a user password. When this password is not known to a user or a forensic inspector, it is necessary to recover the password to open the encrypted file. Password cracking by brute-force search is a perfect approach to discover the password but a time consuming process. This paper presents a new method of speeding up password recovery on Graphic Processing Unit (GPU) using a Compute Unified Device Architecture (CUDA). PDF files are chosen as a password cracking target, and the Abode Acrobat password recovery algorithm is examined. Experimental results show that the proposed method gives high performance at low cost, with a cluster of GPU nodes significantly speeding up the password recovery by exploiting a number of computing nodes. Password cracking performance is increased linearly in proportion to the number of computing nodes and GPUs.
https://doi.org/10.3837/tiis.2011.11.021 인용 PDF KSCI

CUDA와 OPenMP를 이용한 빠르고 효율적인 신경망 구현 (Fast and Efficient Implementation of Neural Networks using CUDA and OpenMP)

박안진;장홍훈;정기철
- 한국정보과학회논문지:소프트웨어및응용
- /
- 제36권4호
- /
- pp.253-260
- /
- 2009
컴퓨터 비전이나 패턴 인식 분야에서 이용되고 있는 많은 알고리즘들이 최근 빠른 수행시간을 위해 GPU에서 구현되고 있지만, GPU를 이용하여 알고리즘을 구현할 경우 크게 두 가지 문제점을 고려해야 한다. 첫째, 컴퓨터 그래픽스 분야의 지식이 필요한 쉐이딩(shading) 언어를 알아야 한다. 둘째, GPU를 효율적으로 활용하기 위해 CPU와 GPU간의 데이터 교환을 최소화해야 한다. 이를 위해 CPU는 GPU에서 처리할 수 있는 최대 용량의 데이터를 생성하여 GPU에 전송해야 하기 때문에 CPU에서 많은 처리시간을 소모하며, 이로 인해 CPU와 GPU 사이에 많은 오버헤드가 발생한다. 본 논문에서는 그래픽 하드웨어와 멀티코어(multi-core) CPU를 이용한 빠르고 효율적인 신경망 구현 방법을 제안한다. 기존 GPU의 첫 번째 문제점을 해결하기 위해 제안된 방법은 복잡한 쉐이팅 언어 대신 그래픽스적인 기본지식 없이도 GPU를 이용하여 응용프로그램 개발이 가능한 CUDA를 이용하였다. 두 번째 문제점을 해결하기 위해 멀티코어 CPU에서 공유 메모리 환경의 병렬화를 수행할 수 있는 OpenMP를 이용하였으며, 이의 처리시간을 줄여 CPU와 GPU 환경에서 오버 헤드를 최소화할 수 있다. 실험에서 제안된 CUDA와 OpenMP기반의 구현 방법을 신경망을 이용한 문자영역 검출 알고리즘에 적용하였으며, CPU에서의 수행시간과 비교하여 약 15배, GPU만을 이용한 수행시간과 비교하여 약 4배정도 빠른 수행시간을 보였다.
PDF KSCI

GPU를 이용한 실시간 양안식 영상 생성 방법 (Real-time Stereo Video Generation using Graphics Processing Unit)

신인용;호요성
- 방송공학회논문지
- /
- 제16권4호
- /
- pp.596-601
- /
- 2011
양안식 3차원 방송의 경우 좌우 두 시점에 해당하는 영상을 동시에 전송해야 하기 때문에 전송 대역폭의 부담이 매우 크다. 이러한 부담을 줄이기 위해 좌우 시점의 두 영상을 전송하는 대신에 좌영상과 이에 해당하는 깊이맵을 부호화하여 전송하는 방법이 있다. 이러한 3차원 방송 시스템의 수신단에서는 좌영상과 깊이맵을 복호한 뒤에 우영상을 만들어 좌우 영상을 실시간으로 출력한다. 본 논문에서는 좌영상과 깊이맵을 이용하여 가상시점 영상을 생성할 때 생기는 빈 공간을 효율적으로 채우는 기법을 제안하고, 전 과정의 실시간 처리를 위해 이를 GPU상에서 병렬로 처리되도록 구현했다. 그 결과 효과적으로 홀 채움을 수행하면서 CPU 대비 15배 이상 빠르게 양안식 영상을 생성할 수 있었다.
https://doi.org/10.5909/JEB.2011.16.4.596 인용 PDF KSCI

검색결과 16건 처리시간 0.025초

이메일무단수집거부

이용약관

제 1 장 총칙

제 2 장 이용계약의 체결

제 3 장 계약 당사자의 의무

제 4 장 서비스의 이용

제 5 장 계약 해지 및 이용 제한

제 6 장 손해배상 및 기타사항

자세히 찾기

이미지 검색 (β)