• Title/Summary/Keyword: graphics processing unit (그래픽 처리 장치)

Efficient Parallel Bilateral Filter using GPGPU (GPGPU 를 이용한 양 방향성 필터의 병렬 구현 및 성능 평가)

  • Chang, Ki Joon;Ro, Won Woo
    • Proceedings of the Korea Information Processing Society Conference / 2011.11a / pp.369-372 / 2011
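  • The bilateral filter performs well at image surface smoothing and noise removal, but its computational complexity makes it slow. In this paper, we therefore implement an efficient bilateral filter, modified to suit the highly parallel graphics processing unit (GPU), using NVIDIA's CUDA on a GTX 285 GPU. The parallelization is based on three techniques: approximating the filter over an adjacent, contiguous region instead of referencing the entire image; a shared-memory buffer with low memory usage, fast access, and minimal conflicts; and coalesced memory access that takes warps into account. As a result, we obtained a bilateral filter with a speedup of at least about 34 times and at most about 76 times over a sequential implementation of the same method, and with a PSNR of around 30 dB.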
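The windowed approximation described in the abstract above (weighting only nearby pixels rather than the whole image) can be sketched on the CPU as follows. This is a minimal pure-Python illustration, not the paper's CUDA implementation: the names `bilateral_filter` and `psnr`, the parameters `radius`, `sigma_s`, and `sigma_r`, the Gaussian weights, and the clamped border handling are all assumptions made for the example.

```python
import math

def bilateral_filter(img, radius=2, sigma_s=2.0, sigma_r=25.0):
    """Windowed bilateral filter on a 2D list of grayscale values.

    Only pixels within `radius` of the center are visited, which is the
    local-window approximation; all parameter names are illustrative.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            center = img[y][x]
            num = 0.0
            den = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    # Clamp coordinates at the image border (assumed policy).
                    ny = min(max(y + dy, 0), h - 1)
                    nx = min(max(x + dx, 0), w - 1)
                    v = img[ny][nx]
                    # Spatial weight: falls off with distance from the center.
                    ws = math.exp(-(dx * dx + dy * dy) / (2.0 * sigma_s ** 2))
                    # Range weight: falls off with intensity difference.
                    wr = math.exp(-((v - center) ** 2) / (2.0 * sigma_r ** 2))
                    num += ws * wr * v
                    den += ws * wr
            # den >= 1 because the center pixel always has weight 1.
            out[y][x] = num / den
    return out

def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    diffs = [(a - b) ** 2 for r, t in zip(ref, test) for a, b in zip(r, t)]
    mse = sum(diffs) / len(diffs)
    return math.inf if mse == 0 else 10.0 * math.log10(peak * peak / mse)
```

Each output pixel depends only on its own window, so the two outer loops are the part that would map naturally onto one GPU thread per pixel.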

Analysis on the Performance Impact of Partitioned LLC for Heterogeneous Multicore Processors (이종 멀티코어 프로세서에서 분할된 공유 LLC가 성능에 미치는 영향 분석)

  • Moon, Min Goo;Kim, Cheol Hong
    • The Journal of Korean Institute of Next Generation Computing / v.15 no.2 / pp.39-49 / 2019
  • Recently, CPU-GPU integrated heterogeneous multicore processors have been widely used to improve the performance of computing systems. Heterogeneous multicore processors integrate CPUs and GPUs on a single chip, where the CPUs and GPUs share the LLC (Last Level Cache). This causes serious cache contention inside the processor, resulting in significant performance degradation. In this paper, we propose a partitioned LLC architecture to solve the cache contention problem in heterogeneous multicore processors. We analyze the performance impact of varying the LLC size assigned to the CPUs and to the GPUs. According to our simulation results, CPU performance improves by up to 21% as the LLC size assigned to the CPU increases. However, the GPU shows a negligible performance difference when its assigned LLC size increases. In other words, the GPU is less likely to lose performance when its LLC size decreases. Because the performance degradation caused by reducing the GPU's LLC size is much smaller than the performance improvement gained by increasing the CPU's LLC size, the overall performance of heterogeneous multicore processors is expected to improve when a partitioned LLC is applied to CPUs and GPUs. In addition, if a memory management technique that maximizes the performance of each core is developed in the future, the performance of heterogeneous multicore processors can be improved greatly.

A Study on Design Schemes of Extracting Control Signals for a CD-G System (디지틀 오디오용 그래픽 시스템의 실시간 제어신호 추출을 위한 설계방식 연구)

  • 이용석;정화자;김용득
    • The Journal of Korean Institute of Communications and Information Sciences / v.17 no.10 / pp.1063-1073 / 1992
  • This paper deals with a method for extracting picture signals from CD graphics with a conventional CD player, schemes for designing circuits for the effective extraction of control signals, and the implementation of such circuits using commercially available logic components, thereby achieving cost-effectiveness. This paper also presents an implementation and evaluation of the CD-G system, which requires extracting picture signals, deinterleaving the extracted signals and analyzing control commands and displaying them on a screen. The CD-G system implemented using the extraction circuit presented herein has been observed to operate well in real time.

Parallel Processing of Satellite Images using CUDA Library: Focused on NDVI Calculation (CUDA 라이브러리를 이용한 위성영상 병렬처리 : NDVI 연산을 중심으로)

  • LEE, Kang-Hun;JO, Myung-Hee;LEE, Won-Hee
    • Journal of the Korean Association of Geographic Information Studies / v.19 no.3 / pp.29-42 / 2016
  • Remote sensing allows information to be acquired across a large area without contacting objects, and it has developed rapidly through application to many fields. Along with this development, the image resolution of satellites has advanced rapidly, and satellite remote sensing has been applied to research across many areas of the world. However, while remote sensing research is carried out in various fields, research on data processing remains insufficient; that is, as satellite resources are developed further, data processing continues to lag behind. Accordingly, this paper discusses ways to maximize the performance of satellite image processing by using NVIDIA's CUDA (Compute Unified Device Architecture) Library, a parallel processing technique. The discussion proceeds as follows. First, standard KOMPSAT (Korea Multi-Purpose Satellite) images of various sizes are subdivided into five types, and NDVI (Normalized Difference Vegetation Index) is applied to the subdivided images. Next, ArcMap and two techniques, one CPU-based and one GPU-based, are used to compute NDVI. The histograms of each image are then compared after each implementation, and the processing speeds of the CPU and GPU are analyzed. The results indicate that both the CPU-version and GPU-version images match the ArcMap images, and the histogram comparison confirms that the NDVI code was implemented correctly. In terms of processing speed, the GPU version was 5 times faster than the CPU version. Accordingly, this research shows that a parallel processing technique using the CUDA Library can increase the data processing speed for satellite images, and it suggests that advanced remote sensing techniques would benefit from this processing more than a simple pixel computation like NDVI.
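For reference, NDVI is the standard per-pixel ratio (NIR - Red) / (NIR + Red), where NIR and Red are the near-infrared and red band values. The following pure-Python sketch shows that computation; the function names `ndvi_pixel` and `ndvi_image`, the nested-list image layout, and the choice to return 0.0 when both bands are zero are assumptions for illustration and are not taken from the paper.

```python
def ndvi_pixel(nir, red):
    """NDVI for one pixel: (NIR - Red) / (NIR + Red)."""
    total = nir + red
    # Assumed policy: report 0.0 where both bands are zero to avoid dividing by zero.
    return 0.0 if total == 0 else (nir - red) / total

def ndvi_image(nir_band, red_band):
    """Apply ndvi_pixel to two equally sized 2D lists of band values."""
    return [
        [ndvi_pixel(n, r) for n, r in zip(nir_row, red_row)]
        for nir_row, red_row in zip(nir_band, red_band)
    ]
```

Because every output pixel depends only on the two input values at the same position, the computation has no dependencies between pixels, which is what makes it a natural candidate for one GPU thread per pixel.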

Development of the Virtual Mouse on a Projector Screen using a Laser Pointer (프로젝터 화면상에서 레이저 포인터를 이용한 마우스 기능 구현에 관한 연구)

  • Kim, Ju-Kuk;Kim, Sang-Jun;Yee, Ki-Won;Huh, Heon;Yee, Yang-Hee;Chang, Hong-Soon
    • Proceedings of the KIEE Conference / 2011.07a / pp.123-124 / 2011
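  • Thanks to rapid advances in semiconductor technology, today's personal computers carry high-performance CPUs and have moved from the text-based operating systems of the 1990s to graphics-based operating systems that offer a variety of multimedia functions. Input devices for these systems include not only the keyboard, which was the main device for text-based operating systems, but also mice, cameras, touch screens, and other devices. In presentations with a beam projector, however, presenting with a laser pointer is still the norm, and because there is no interaction between the presenter and the PC driving the projector, the multimedia functions that can be used are limited. In this paper, we aim to develop a system that captures the projector screen with a USB web camera and, using the OpenCV image processing library, detects the position and motion of the laser pointer so that the user can reproduce mouse actions with the laser pointer even from a distance. With this system, a presenter can interact with the PC using a laser pointer and no separate input device, which enables a variety of multimedia-based presentations.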

Implementation of Neural Networks using GPU (GPU를 이용한 신경망 구현)

  • Oh Kyoung-su;Jung Keechul
    • The KIPS Transactions:PartB / v.11B no.6 / pp.735-742 / 2004
  • We present a new use of common graphics hardware to speed up artificial neural networks, and we examine how using a GPU improves the running time of an image processing system based on a neural network. When multiple input sets are computed in parallel, the vector-matrix products become matrix-matrix multiplications, so we can fully utilize the parallelism of the GPU. The sigmoid operation and bias term addition are also implemented using a pixel shader on the GPU. Our preliminary results show a speedup of about thirty times on an ATI RADEON 9800 XT board.
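The batching idea in the abstract above can be illustrated on the CPU: stacking several input vectors as rows turns the per-input vector-matrix products into a single matrix-matrix multiplication, followed by bias addition and an element-wise sigmoid. This is a minimal pure-Python sketch, not the paper's pixel-shader implementation; the names `matmul` and `layer_forward` and the row-major list layout are assumptions for the example.

```python
import math

def sigmoid(z):
    """Logistic sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-z))

def matmul(a, b):
    """Multiply an (m x k) matrix by a (k x n) matrix, both as nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

def layer_forward(inputs, weights, bias):
    """One fully connected layer applied to a whole batch.

    inputs:  batch_size x n_in (one input vector per row)
    weights: n_in x n_out
    bias:    list of n_out values
    """
    z = matmul(inputs, weights)  # one matrix-matrix product for the batch
    return [[sigmoid(v + b) for v, b in zip(row, bias)] for row in z]
```

Calling `layer_forward` once on a two-row batch gives the same rows as calling it twice on single-row inputs; the batched form simply exposes more independent work at once.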

A Study on Integrated Processing System for Finite Element Structural Analysis (유한요소 구조해석을 위한 전후처리 통합운영 시스템에 관한 연구)

  • 서진국;송준엽;신영식
    • Computational Structural Engineering / v.8 no.1 / pp.161-172 / 1995
  • An integrated processing system for finite element structural analysis has been studied. It is designed to control the preprocessing, execution, and postprocessing of a finite element structural analysis program on Windows in an integrated manner. It provides a better graphical user interface (GUI) that presents various inputs and outputs concurrently through dialogs on multiple windows, using multitasking and object linking and embedding (OLE). Data can be entered easily through menus, dialog boxes, and automatic stepwise inputs on the multiple windows, and output results can then be viewed together with the input data on the same screen. The efficiency and validity of the system were examined by solving several numerical examples.

Implementation and Performance Evaluation of a Video-Equipped Real-Time Fire Detection Method at Different Resolutions using a GPU (GPU를 이용한 다양한 해상도의 비디오기반 실시간 화재감지 방법 구현 및 성능평가)

  • Shon, Dong-Koo;Kim, Cheol-Hong;Kim, Jong-Myon
    • Journal of the Korea Society of Computer and Information / v.20 no.1 / pp.1-10 / 2015
  • In this paper, we propose an efficient parallel implementation of a widely used, complex four-stage fire detection algorithm on a graphics processing unit (GPU) to improve the performance of the algorithm, and we analyze the performance of the parallel implementation. We use videos at seven different resolutions (QVGA, VGA, SVGA, XGA, SXGA+, UXGA, QXGA) as inputs to the four-stage fire detection algorithm, and we compare the performance of the GPU-based approach with that of the CPU implementation at each resolution. Experimental results using five different fire videos at seven different resolutions indicate that the proposed GPU implementation outperforms the CPU implementation in execution time and takes 25.11 ms per frame for the UXGA resolution video, satisfying real-time processing (30 frames per second, 30 fps) of the fire detection algorithm.
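The real-time claim above comes down to simple arithmetic: at 30 fps each frame has a budget of 1000 / 30 ≈ 33.33 ms, and the reported 25.11 ms per frame fits inside it. The short Python sketch below makes that check explicit; the function names are hypothetical and chosen only for this illustration.

```python
def frame_budget_ms(fps):
    """Time available per frame, in milliseconds, at a target frame rate."""
    return 1000.0 / fps

def meets_real_time(frame_time_ms, fps=30.0):
    """True if the per-frame processing time fits within the frame budget."""
    return frame_time_ms <= frame_budget_ms(fps)

def achievable_fps(frame_time_ms):
    """Frame rate sustained if every frame takes frame_time_ms to process."""
    return 1000.0 / frame_time_ms
```

With the figure from the abstract, `meets_real_time(25.11)` is true, and `achievable_fps(25.11)` comes to roughly 39.8 frames per second.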

Analyzing Fine-Grained Resource Utilization for Efficient GPU Workload Allocation (GPU 작업 배치의 효율화를 위한 자원 이용률 상세 분석)

  • Park, Yunjoo;Shin, Donghee;Cho, Kyungwoon;Bahn, Hyokyung
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.19 no.1 / pp.111-116 / 2019
  • Recently, GPUs have expanded their application domains from graphics processing to various kinds of parallel workloads. However, current GPU systems focus on maximizing each workload's parallelism through simplified control rather than considering the characteristics of different workloads. This paper classifies the resource usage characteristics of GPU workloads as computing-bound, memory-bound, or dependency-latency-bound, and quantifies the fine-grained bottleneck for efficient workload allocation. For example, we identify the exact bottleneck resource, such as the single function unit, double function unit, or special function unit, even among workloads that are all computing-bound. Our analysis implies that workloads can be allocated together if their fine-grained bottleneck resources differ, even when they are all computing-bound, which can eventually contribute to efficient workload allocation on GPUs.
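The co-allocation rule stated above (two workloads can share a GPU when their fine-grained bottleneck resources differ) can be sketched as follows. This is a hypothetical illustration only: the dictionary layout, the utilization numbers, and the function names `bottleneck` and `can_colocate` are made up for the example and do not come from the paper's measurements or method.

```python
def bottleneck(utilization):
    """Return the resource with the highest utilization for one workload.

    `utilization` maps a resource name to a value in [0, 1]; the values used
    below are invented purely for illustration.
    """
    return max(utilization, key=utilization.get)

def can_colocate(workload_a, workload_b):
    """Apply the rule from the abstract: co-allocate if bottlenecks differ."""
    return bottleneck(workload_a) != bottleneck(workload_b)

# Hypothetical utilization profiles of three computing-bound workloads.
kernel_a = {"single function unit": 0.9, "double function unit": 0.1,
            "special function unit": 0.2, "memory": 0.3}
kernel_b = {"single function unit": 0.2, "double function unit": 0.1,
            "special function unit": 0.8, "memory": 0.3}
kernel_c = {"single function unit": 0.7, "double function unit": 0.2,
            "special function unit": 0.1, "memory": 0.4}
```

Under these invented profiles, `kernel_a` and `kernel_b` stress different function units and would be candidates for sharing, while `kernel_a` and `kernel_c` both saturate the single function unit and would not.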

Introduction to general purpose GPU computing (GPU를 이용한 범용 계산의 소개)

  • Yu, Donghyeon;Lim, Johan
    • Journal of the Korean Data and Information Science Society / v.24 no.5 / pp.1043-1061 / 2013
  • Recent advances in computer technology have produced massive data sets, and their analysis has become important. High performance computing is one of the most essential parts of the analysis of massive data. In this paper, we review general-purpose computing on the graphics processing unit and its application to parallel computing, which has been of great interest in the statistics community.