• Title/Summary/Keyword: general-purpose graphics processor (범용그래픽프로세서)

Search Result 21, Processing Time 0.024 seconds

Analysis on Memory Characteristics of Graphics Processing Units for Designing Memory System of General-Purpose Computing on Graphics Processing Units (범용 그래픽 처리 장치의 메모리 설계를 위한 그래픽 처리 장치의 메모리 특성 분석)

  • Choi, Hongjun;Kim, Cheolhong
    • Smart Media Journal / v.3 no.1 / pp.33-38 / 2014
  • Although microprocessor performance continues to improve, raising the performance of whole computing systems has become difficult owing to drawbacks such as increased power consumption. To address this problem, attention has turned to general-purpose computing on graphics processing units (GPGPU), which executes general-purpose applications on graphics processing units (GPUs), devices specialized for parallel processing. However, the characteristics of graphics applications differ substantially from those of general-purpose applications. As a result, various constraints prevent GPUs from fully exploiting their considerable computational resources when they execute general-purpose applications. When designing GPUs for GPGPU, the memory system is important for exploiting the GPU effectively, since general-purpose applications typically require more memory accesses than graphics applications. In particular, external memory accesses, which have long latency, impose a large overhead on GPU performance. GPU performance should therefore improve if a hierarchical memory architecture that reduces the number of external memory accesses is applied. For this reason, we analyze GPU performance under different hierarchical cache architectures while executing various benchmarks.

A Fully Programmable Shader Processor for Low Power Mobile Devices (저전력 모바일 장치를 위한 완전 프로그램 가능형 쉐이더 프로세서)

  • Jeong, Hyung-Ki;Lee, Joo-Sock;Park, Tae-Ryong;Lee, Kwang-Yeob
    • Journal of IKEEE / v.13 no.2 / pp.253-259 / 2009
  • In this paper, we propose a novel architecture for a general graphics shader processor without dedicated hardware. Mobile devices now require graphics processors that are high-performance as well as small and low-power. The proposed shader processor is a GP-GPU (General-Purpose computing on Graphics Processing Units) that executes the whole OpenGL ES 2.0 graphics pipeline using shader instructions. Because it is fully programmable, it does not require separate dedicated hardware for stages such as rasterization. The fully programmable 3D graphics shader processor can therefore eliminate much of the graphics hardware. The chip size of the designed shader processor is 60% smaller than that of previous processors.

Supercomputer Architecture: Current Technology and Development Trends (슈프컴퓨터 아키텍쳐 -기술현황및 발전추세-)

  • 김성천
    • 전기의세계 / v.38 no.7 / pp.11-18 / 1989
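  • The character of supercomputers has clearly changed over the last few years. Most notably, high-resolution real-time image processing, once possible only on very large and enormously expensive supercomputers, is now possible on desktop graphics supercomputers as well. Graphics processing of the kind known as "visualization" is becoming commonplace. It is no exaggeration to say that this is owed to the development of very fast, inexpensive dedicated graphics processors and to the extreme parallelism obtained by applying vector-processing architectures. The technological innovation brought by parallel architectures optimized for one limited application is thus about to open a brilliant page in the history of human civilization. Fundamental problems that resist solution do remain, of course, and the question arises whether it will be possible to build a "super-supercomputer" that achieves ultimate parallelism, performing parallel processing in any general-purpose field rather than being confined to a given special field.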

Parallel Computation of FDTD algorithm using CUDA (CUDA를 이용한 FDTD 알고리즘의 병렬처리)

  • Lee, Ho-Young;Park, Jong-Hyun;Kim, Jun-Seong
    • Journal of the Institute of Electronics Engineers of Korea CI / v.47 no.4 / pp.82-87 / 2010
  • Modern GPUs (Graphics Processing Units) provide higher computing capability than general-purpose CPUs (Central Processing Units). With support for a programmable graphics pipeline, GP-GPU (General-Purpose computation on GPUs) has gained much attention and expanded its application area. This paper compares sequential and massively parallel implementations of the FDTD (Finite-Difference Time-Domain) algorithm using CUDA (Compute Unified Device Architecture). Experimental results show a speedup of up to 45X over conventional CPU execution.
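
As a rough illustration of the sequential side of the comparison described in this abstract, the following is a minimal one-dimensional FDTD sketch in normalized units. It is not the authors' implementation: the abstract does not give the problem dimension, grid size, or source, so the function name `fdtd_1d`, its parameters, and the Gaussian source are all assumptions made for illustration. Within each half-step, every cell update is independent of the others, which is the property a parallel version would typically exploit.

```python
import math


def fdtd_1d(n_cells=200, n_steps=50, courant=0.5, source_cell=100):
    """Sequential 1D FDTD (Yee scheme) in normalized units.

    All parameters are illustrative assumptions, not values from the paper.
    Returns the electric field after n_steps time steps.
    """
    ez = [0.0] * n_cells  # electric field samples
    hy = [0.0] * n_cells  # magnetic field samples (staggered half a cell)

    for t in range(n_steps):
        # Half-step 1: update H from the spatial difference of E.
        # Each k is independent, so this loop could run one cell per thread.
        for k in range(n_cells - 1):
            hy[k] += courant * (ez[k + 1] - ez[k])

        # Half-step 2: update E from the spatial difference of H.
        for k in range(1, n_cells):
            ez[k] += courant * (hy[k] - hy[k - 1])

        # Additive (soft) Gaussian source; shape chosen only for the demo.
        ez[source_cell] += math.exp(-((t - 30.0) / 10.0) ** 2)

    return ez


if __name__ == "__main__":
    field = fdtd_1d()
    print(max(abs(v) for v in field))
```

The pulse spreads at most one cell per time step, so cells far from the source remain exactly zero until the wavefront reaches them.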

Second-Generation Workstation: The RISC System/6000 (제2세대 웍스테이션 "RISC"시스템 6000)

  • 김은현
    • Computational Structural Engineering / v.3 no.3 / pp.62-65 / 1990
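  • The RISC System/6000 adopts AIX, a UNIX system, as its operating system. It is IBM's new family of high-performance systems, which makes innovative advances on existing RISC technology, greatly raising the price/performance ratio while highly optimizing system function. The system is designed around POWER (Performance Optimization With Enhanced RISC), a new RISC system architecture, together with second-generation superscalar techniques and the Micro Channel Architecture. In particular, its superscalar capability, designed to process four or more instructions in parallel in a single cycle, makes it excel at complex graphics and image processing and at advanced numerical analysis. The RISC System/6000 is a general-purpose computer that performs well both for scientific and technical computing and for multi-user general business use. With a choice of graphics processor it can be configured as a dedicated CAD/CAM or graphics/animation system, and it can be configured as a multi-user system supporting up to 512 users. The user interface, multi-user support, and multitasking, which were major weaknesses of earlier UNIX systems, have been greatly strengthened. Networking with existing IBM systems and with other vendors' machines is straightforward, and several hundred scientific and technical applications are available.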

Performance Study of Multicore Digital Signal Processor Architectures (멀티코어 디지털 신호처리 프로세서의 성능 연구)

  • Lee, Jongbok
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.13 no.4 / pp.171-177 / 2013
  • Owing to the demand for high-speed 3D graphics rendering, video file format conversion, compression, encryption, and decryption technologies, the importance of digital signal processor systems is growing rapidly. Satisfying real-time constraints requires high-performance digital signal processors. Therefore, as in general-purpose computer systems, digital signal processors should also be designed as multicore architectures. Using UTDSP benchmarks as input, trace-driven simulation was performed and analyzed extensively for 2- to 16-core digital signal processor architectures, with cores ranging from simple RISC to in-order and out-of-order superscalar processors and with various window sizes.

A Study on the Underwater Channel Model based on a High-Order Finite Difference Method using GPUs (그래픽 프로세서를 이용한 고차 유한 차분식 기반 수중채널모델 연구)

  • Bae, Ho Seuk;Kim, Won-Ki;Son, Su-Uk;Ha, Wansoo
    • Journal of the Korea Society for Simulation / v.30 no.1 / pp.11-20 / 2021
  • As unmanned underwater systems have recently emerged, high-speed underwater channel modeling, one of the most important techniques in such systems, has received a lot of attention. In this paper, we propose a high-speed sound propagation model and verify its applicability through quantitative performance analyses. We use a high-order finite difference method (FDM) to model wave propagation in water, and adopt a domain decomposition method across multiple general-purpose graphics processing units (GPUs) to increase computational efficiency. We compare the results of the proposed model with the analytic solution in a half-infinite medium and with the results of the Virtual Timeseries Experiment (VirTEX) model, which is based on the ray method. Finally, we analyze the performance of the model quantitatively using numerical examples. These analyses confirm that computational speed increases linearly with the number of GPUs. Computation time increases twofold when the size of the computational domain is doubled and eightfold when the maximum frequency is doubled. We expect the proposed high-speed underwater channel modeling technique to contribute to national defense as an underwater communication channel model and as an analysis tool for developing underwater communication techniques for unmanned underwater systems.

A Study on Design Schemes of Extracting Control Signals for a CD-G System (디지틀 오디오용 그래픽 시스템의 실시간 제어신호 추출을 위한 설계방식 연구)

  • 이용석;정화자;김용득
    • The Journal of Korean Institute of Communications and Information Sciences / v.17 no.10 / pp.1063-1073 / 1992
  • This paper deals with a method for extracting picture signals from CD graphics with a conventional CD player, schemes for designing circuits for the effective extraction of control signals, and the implementation of such circuits using commercially available logic components, thereby achieving cost-effectiveness. This paper also presents an implementation and evaluation of the CD-G system, which requires extracting picture signals, deinterleaving the extracted signals and analyzing control commands and displaying them on a screen. The CD-G system implemented using the extraction circuit presented herein has been observed to operate well in real time.

Optimizing Skyline Query Processing Algorithms on CUDA Framework (CUDA 프레임워크 상에서 스카이라인 질의처리 알고리즘 최적화)

  • Min, Jun;Han, Hwan-Soo;Lee, Sang-Won
    • Journal of KIISE:Databases / v.37 no.5 / pp.275-284 / 2010
  • GPUs are multi-core stream processors that can process large data sets at high speed with high memory bandwidth. Furthermore, GPUs are less expensive than multi-core CPUs. Recently, the use of GPUs in general-purpose computing has become widespread. The CUDA architecture from Nvidia is one of the efforts to help developers use GPUs in their application domains. In this paper, we propose techniques to parallelize a skyline algorithm that uses a simple nested-loop structure. To employ the CUDA programming model, we apply optimization techniques that make our skyline algorithm fit within the performance restrictions of the CUDA architecture. According to our experimental results, our optimization techniques improve on the original skyline algorithm by 80%.
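
For readers unfamiliar with the baseline mentioned in this abstract, a skyline query returns the points that no other point dominates. The sketch below shows a plain nested-loop version in sequential form. It is an assumption-laden illustration rather than the authors' CUDA code: the names `dominates` and `skyline_nested_loop` are hypothetical, and it assumes smaller values are preferred in every dimension.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a is no worse than b everywhere and strictly better somewhere.

    Assumption for this sketch: smaller values are better in every dimension.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def skyline_nested_loop(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep every point that no other point dominates (O(n^2) comparisons)."""
    result = []
    for i, p in enumerate(points):
        # The check for each p reads the input only, so iterations of this
        # outer loop do not depend on one another.
        dominated = any(dominates(q, p) for j, q in enumerate(points) if j != i)
        if not dominated:
            result.append(p)
    return result


if __name__ == "__main__":
    data = [(1, 9), (3, 3), (5, 1), (4, 4), (6, 6)]
    print(skyline_nested_loop(data))  # [(1, 9), (3, 3), (5, 1)]
```

Because each outer-loop iteration is independent, one natural (assumed) mapping to a GPU is one thread per candidate point.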

The Design of VGE(Vector Geometric Engine) for 3D Graphics Geometry Processing (3차원 그래픽 지오메트리 연산을 위한 벡터 지오메트리 엔진의 설계)

  • 김원석;정철호;이길환;박우찬;한탁돈;이문기
    • Proceedings of the Korean Information Science Society Conference / 2001.10c / pp.52-54 / 2001
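  • A 3D graphics accelerator consists of geometry processing and a rasterizer. This paper proposes a vector-style processing architecture (VGE) that can perform geometry processing at high speed. In particular, geometry operations are accelerated by adding four FADD and FMUL units and 128 vector registers to an existing architecture capable of floating-point computation. Compared with Hitachi's SH4, which has a hardware cost similar to that of VGE, the design shows an average performance improvement of 4.7 times. For performance evaluation, a simulator was built by modifying Simplescalar, a general-purpose processor simulator, and the Viewperf benchmark was used as input.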
