Search | Korea Science

The Need of Cache Partitioning on Shared Cache of Integrated Graphics Processor between CPU and GPU (내장형 GPU 환경에서 CPU-GPU 간의 공유 캐시에서의 캐시 분할 방식의 필요성)

Sung, Hanul;Eom, Hyeonsang;Yeom, HeonYoung
KIISE Transactions on Computing Practices / v.20 no.9 / pp.507-512 / 2014
Recently, distributed computing has begun to use both the CPU (central processing unit) and the GPU (graphics processing unit) to improve performance and to overcome the dark silicon problem, in which not all transistors can be used at once because of power limitations. In an integrated graphics processor, the CPU and GPU share memory and the last-level cache (LLC). However, there are no rules governing LLC access between the CPU and GPU, so when CPU and GPU processes run at the same time, the performance of both degrades because of contention on the LLC. This paper presents evidence for the need for cache partitioning and describes a cache partitioning design that uses page coloring to reserve L3 cache space exclusively for the GPU process in order to guarantee its performance.
https://doi.org/10.5626/KTCP.2014.20.9.507
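
As a rough illustration of the page coloring idea mentioned in the abstract above, the sketch below computes which "color" (group of cache sets) a physical page maps to in an idealized physically indexed cache. This is not the paper's design: the function names and the 8 MiB, 16-way, 4 KiB-page parameters are assumptions chosen for the example, and real processors may hash addresses across cache slices, which this simple model ignores.

```python
def num_page_colors(cache_bytes, ways, page_bytes):
    """Number of distinct page colors in an idealized set-associative cache.

    One "way" of the cache covers cache_bytes / ways bytes of address space;
    dividing that by the page size gives how many pages fit side by side
    without sharing cache sets.
    """
    return cache_bytes // (ways * page_bytes)


def page_color(phys_addr, cache_bytes, ways, page_bytes):
    """Color of the physical page containing phys_addr (idealized model)."""
    page_number = phys_addr // page_bytes
    return page_number % num_page_colors(cache_bytes, ways, page_bytes)


# Hypothetical cache geometry, chosen only for illustration.
CACHE = 8 * 1024 * 1024   # 8 MiB last-level cache
WAYS = 16                 # 16-way set associative
PAGE = 4096               # 4 KiB pages

colors = num_page_colors(CACHE, WAYS, PAGE)
print(colors)                                      # 128
print(page_color(0x12345000, CACHE, WAYS, PAGE))   # 69
```

Under this model, an allocator that hands one process only pages of certain colors keeps that process's data in a fixed portion of the cache, which is the basis of partitioning by page coloring.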

Software-based Real-time GNSS Signal Generation and Processing Using a Graphic Processing Unit (GPU)

Im, Sung-Hyuck;Jee, Gyu-In
Journal of Positioning, Navigation, and Timing / v.3 no.3 / pp.99-105 / 2014
A graphics processing unit (GPU) can perform the same calculation on multiple data items (SIMD: single instruction, multiple data) using hundreds to thousands of special-purpose processors designed for graphics processing. High efficiency is therefore expected when a GPU is used for the generation and correlation of satellite navigation signals, which apply the same calculation procedure to tens of millions of discrete signal samples per second. In this study, the structure of a GPU-based GNSS simulator for the generation and processing of satellite navigation signals was designed, developed, and verified. To verify the developed satellite navigation signal generator, generated signals were applied to the OEM-V3 receiver of Novatel Inc., and the measured values were examined. To verify the satellite navigation signal processor, its performance was examined by collecting and processing actual GNSS intermediate frequency signals. The verification results indicated that satellite navigation signals could be generated and processed in real time using two GPUs.
https://doi.org/10.11003/JPNT.2014.3.3.099

Accelerating Depth Image-Based Rendering Using GPU (GPU를 이용한 깊이 영상기반 렌더링의 가속)

Lee, Man-Hee;Park, In-Kyu
Journal of KIISE:Computer Systems and Theory / v.33 no.11 / pp.853-858 / 2006
In this paper, we propose a practical method for hardware-accelerated rendering of the depth image-based representation (DIBR) of 3D graphic objects using a graphics processing unit (GPU). The proposed method overcomes the drawbacks of conventional rendering, namely that it is slow because it receives little assistance from graphics hardware and that surface lighting is static. Utilizing the new features of modern GPUs and programmable shader support, we develop an efficient hardware-accelerated rendering algorithm for depth image-based 3D objects. Surface rendering in response to varying illumination is performed inside the vertex shader, while adaptive point splatting is performed inside the fragment shader. Experimental results show that the rendering speed increases considerably compared with software-based rendering and the conventional OpenGL-based rendering method.

3D Holographic Image Recognition by Using Graphic Processing Unit

Lee, Jeong-A;Moon, In-Kyu;Liu, Hailing;Yi, Faliu
Journal of the Optical Society of Korea / v.15 no.3 / pp.264-271 / 2011
In this paper we examine and compare the computational speeds of three-dimensional (3D) object recognition by digital holography based on central processing unit (CPU) and graphics processing unit (GPU) computing. The holographic fringe pattern of a 3D object is obtained using an in-line interferometry setup. Fourier matched filters are applied to the complex image reconstructed from the holographic fringe pattern using a GPU chip for real-time 3D object recognition. It is shown that 3D object recognition using GPU computing is significantly faster than with CPU computing. To the best of our knowledge, this is the first report comparing the calculation time of 3D object recognition based on digital holography with CPU versus GPU computing.
https://doi.org/10.3807/JOSK.2011.15.3.264

Accelerating the Sweep3D for a Graphic Processor Unit

Gong, Chunye;Liu, Jie;Chen, Haitao;Xie, Jing;Gong, Zhenghu
Journal of Information Processing Systems / v.7 no.1 / pp.63-74 / 2011
As a powerful and flexible processor, the graphics processing unit (GPU) offers great capability for many high-performance computing applications. Sweep3D, which deterministically simulates single-group, time-independent discrete ordinates (Sn) neutron transport on a 3D Cartesian geometry, represents the key part of a real ASCI application. The wavefront process for parallel computation in Sweep3D limits the number of concurrent threads on the GPU. In this paper, we present multi-dimensional optimization methods for Sweep3D that can be efficiently implemented on the fine-grained parallel architecture of the GPU. Our results show that the overall performance of Sweep3D on the CPU-GPU hybrid platform can be improved by up to 4.38 times compared with the CPU-based implementation.
https://doi.org/10.3745/JIPS.2011.7.1.063

Design of a SIMT architecture GP-GPU Using Tile based on Graphic Pipeline Structure (타일 기반 그래픽 파이프라인 구조를 사용한 SIMT 구조 GP-GPU 설계)

Kim, Do-Hyun;Kim, Chi-Yong
Journal of IKEEE / v.20 no.1 / pp.75-81 / 2016
This paper proposes a tile-based graphics pipeline design to improve graphics application performance on a SIMT-based GP-GPU. The proposed tile-based graphics pipeline avoids unnecessary graphics processing operations and processes the rasterization step in parallel. Massively parallel data processing through the SIMT architecture improves computational performance, thereby improving 3D graphics pipeline performance. The more vertex data a 3D model has, the greater the performance gain. The proposed structure was confirmed to improve processing performance by about 1.18 to 3 times compared with 'RAMP' and previous studies.
https://doi.org/10.7471/ikeee.2016.20.1.075

Fast Generation of Digital Hologram Based on Multi-GPU (Multi-GPU 기반의 고속 디지털 홀로그램 생성)

Song, Joong-Seok;Park, Jung-Sik;Seo, Young-Ho;Park, Jong-Il
Journal of Broadcast Engineering / v.16 no.6 / pp.1009-1017 / 2011
Fast generation of digital holograms is important for real-time holography broadcasting. In this paper, we propose a method that parallelizes the Computer-Generated Holography (CGH) algorithm for digital hologram generation and accelerates it using multiple graphics processing units (Multi-GPU) with the help of the Compute Unified Device Architecture (CUDA) and Open Multi-Processing (OpenMP). In addition, we propose optimization methods such as fixation variable, vectorization, and loop unrolling to make the CGH algorithm much faster. Experimental results show that our method is about 9,700 times faster than a CPU-based one.
https://doi.org/10.5909/JEB.2011.16.6.1009

A Realization of CNN-based FPGA Chip for AI (Artificial Intelligence) Applications (합성곱 신경망 기반의 인공지능 FPGA 칩 구현)

Young Yun
Proceedings of the Korean Institute of Navigation and Port Research Conference / 2022.11a / pp.388-389 / 2022
Recently, AI (artificial intelligence) has been applied to various technologies such as automated driving, robots, and smart communication. Currently, AI systems are developed with a software-based method using TensorFlow, and a GPU (graphics processing unit) is employed as the processing unit. However, when a software-based method employing a GPU is used for AI applications, the internal circuit of the processing unit cannot be changed. With this method, if the AI system must handle high-level jobs, a high-performance GPU is needed, so the GPU or graphics card has to be replaced to perform them. In this work, we developed a CNN-based FPGA (field-programmable gate array) chip to solve this problem.

A study on application of GPU-accelerated kinematic wave rainfall-runoff model (GPU 가속 운동파 강우유출모형의 적용 연구)

Kim, Boram;Yun, Gwan Seon;Kim, Hyeong-Jun;Yoon, Kwang Seok
Proceedings of the Korea Water Resources Association Conference / 2020.06a / pp.323-323 / 2020
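A graphics processing unit (GPU) consists of many arithmetic logic units (ALUs) specialized for graphics processing, so it can perform more operations at once than a central processing unit (CPU). This study applied a GPU-accelerated kinematic wave model to a real watershed and examined the accuracy of its rainfall-runoff results and the efficiency of its computation time. The GPU-accelerated kinematic wave model was developed using CUDA Fortran to shorten the numerical simulation time of a distributed rainfall-runoff model. The governing equations of the distributed model consist of the kinematic wave model and the Green-Ampt model, and the kinematic wave model was discretized using the finite volume method. The GPU-accelerated kinematic wave model was used to simulate rainfall-runoff in the Mihocheon watershed of the Geum River and was compared with a CPU-based rainfall-runoff model that uses the same finite volume method. The results of the GPU-accelerated model were similar to observations at the downstream end of the Mihocheon watershed. In addition, the computation time was shorter than that of the CPU-based rainfall-runoff model, by up to about 100 times on the equipment used in this study.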

Accelerating Numerical Analysis of Reynolds Equation Using Graphic Processing Units (그래픽처리장치를 이용한 레이놀즈 방정식의 수치 해석 가속화)

Myung, Hun-Joo;Kang, Ji-Hoon;Oh, Kwang-Jin
Tribology and Lubricants / v.28 no.4 / pp.160-166 / 2012
This paper presents a Reynolds equation solver for hydrostatic gas bearings, implemented to run on graphics processing units (GPUs). The original analysis code for the central processing unit (CPU) was modified for the GPU by using the compute unified device architecture (CUDA). The red-black Gauss-Seidel (RBGS) algorithm was employed instead of the original Gauss-Seidel algorithm for the iterative pressure solver, because the latter has data dependency between neighboring nodes. The implemented GPU program was tested on the nVidia GTX580 system and compared to the original CPU program on the AMD Llano system. In the iterative pressure calculation, the implemented GPU program showed 20-100 times faster performance than the original CPU codes. Comparison of the wall-clock times including all of pre/post processing codes showed that the GPU codes still delivered 4-12 times faster performance than the CPU code for our target problem.
https://doi.org/10.9725/kstle.2012.28.4.160
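
To illustrate why the red-black ordering mentioned in the abstract above removes the neighbor dependency, here is a minimal sketch of red-black Gauss-Seidel on a model problem. It is not the authors' solver: it relaxes a simple Poisson-type equation on a square grid rather than the Reynolds equation, runs sequentially in plain Python, and the function name and grid setup are assumptions made for the example.

```python
def red_black_gauss_seidel(u, f, h, sweeps):
    """Relax -laplace(u) = f on an n x n grid with fixed boundary values.

    u and f are lists of lists; boundary entries of u are never modified.
    """
    n = len(u)
    for _ in range(sweeps):
        # Update colour 0 ("red") nodes, then colour 1 ("black") nodes.
        # The four neighbours of a node always have the opposite colour,
        # so all nodes of one colour are independent of each other and
        # could be updated in parallel, for example one GPU thread each.
        for colour in (0, 1):
            for i in range(1, n - 1):
                for j in range(1, n - 1):
                    if (i + j) % 2 == colour:
                        u[i][j] = 0.25 * (
                            u[i - 1][j] + u[i + 1][j]
                            + u[i][j - 1] + u[i][j + 1]
                            + h * h * f[i][j]
                        )
    return u


# Model problem (assumed for illustration): boundary held at 1, zero source.
n = 5
grid = [[1.0 if i in (0, n - 1) or j in (0, n - 1) else 0.0
         for j in range(n)] for i in range(n)]
source = [[0.0] * n for _ in range(n)]
red_black_gauss_seidel(grid, source, h=1.0, sweeps=200)
```

With a zero source term and a boundary of 1, the interior values converge toward 1. In the standard Gauss-Seidel ordering, each node would instead read values updated earlier in the same sweep, which forces the updates to run one after another.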

