• Title/Summary/Keyword: General purpose computing

Optimizing Instruction Prefetching to Improve Worst-Case Performance for Real-Time Applications

  • Ding, Yiqiang;Yan, Jun;Zhang, Wei
    • Journal of Computing Science and Engineering / v.3 no.1 / pp.59-71 / 2009
  • While average-case performance is important for general-purpose applications, worst-case performance is crucial for real-time systems to ensure schedulability and reliability. Recent work has shown that simple prefetching techniques such as Next-N-Line prefetching can benefit both average-case and worst-case performance; however, the improvement in worst-case execution time (WCET) is rather limited and inefficient. This paper presents two instruction prefetching approaches specifically designed to enhance worst-case performance: loop-based prefetching and WCET-oriented prefetching. Our experiments indicate that both techniques achieve better worst-case execution cycles than Next-N-Line prefetching, while having varying impacts on average-case performance.

Using GPUs for Image Processing on Smartphones (스마트폰에서의 영상처리를 위한 GPU 활용)

  • Park, In-Gyu;Choe, Ho-Yeol
    • Information and Communications Magazine / v.29 no.4 / pp.46-51 / 2012
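  • This article introduces GPGPU (General-Purpose Computing on GPU) technology, which uses an embedded GPU (Graphics Processing Unit) for high-speed parallel processing of the diverse multimedia applications now demanded on smartphones, along with application examples in image processing. Unlike the typical desktop computing environment, GPGPU application technology in the heavily constrained embedded environment is still at an early stage. However, with rapidly advancing embedded GPU IP and the emergence of APIs such as OpenCL, high-speed parallel processing on embedded GPUs is expected to become commonplace within a few years. To assess that possibility, this article surveys recent trends in the hardware and software environments for image processing on embedded GPUs. It also summarizes image processing examples that use GPGPU technology on recent smartphones and the main points to consider when implementing image processing algorithms as GPGPU algorithms.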

Accelerating Fingerprint Enhancement Algorithm on GPGPU using OpenCL (OpenCL을 이용한 GPGPU 기반 지문개선 알고리즘 가속화)

  • Kim, Daehee;Park, Neungsoo
    • The Transactions of The Korean Institute of Electrical Engineers / v.65 no.4 / pp.666-672 / 2016
  • Fingerprints have recently become widely used as a biometric to improve the security of financial mobile applications, because of their user convenience and high recognition rate. However, to apply fingerprint algorithms to finance and security applications, their recognition rate and processing speed have to be improved further. In this paper, we propose a parallel fingerprint enhancement algorithm for general-purpose computing on graphics processing units (GPGPU) using OpenCL. We analyze the parallelism in the fingerprint algorithm and explore the optimization parameters of the parallel algorithm to improve performance. The experimental results show that the parallel fingerprint enhancement algorithm on GPGPUs ran 29.4 to 69.2 times faster than the original algorithm on the host CPUs.

Comparing machine fault diagnosis performances on current, vibration and flux based smart sensors (전류, 진동 및 자속센서기반 스마트센서를 이용한 기계결함진단 성능비교)

  • Son, Jong-Duk;Tae, Sung-Do;Yang, Bo-Suk;Hwang, Don-Ha;Kang, Dong-Sik
    • Proceedings of the Korean Society for Noise and Vibration Engineering Conference / 2008.04a / pp.809-816 / 2008
  • With increasing demand to reduce maintenance costs by detecting machine faults automatically, low-cost sensors with intelligent functionality are required. Rapid developments in semiconductors, computing, and communication have led to a new generation of sensors, called "smart" sensors, that combine functionality and intelligence. The purpose of this research is to compare machine fault classification between general analyzer signals and smart sensor signals. Three types of sensors (vibration, current, and flux) are used for fault diagnosis of induction motors. The classification results are satisfactory.

Cache Coherence Protocols in NUMA Multiprocessors (NUMA 다중 프로세서에서의 캐쉬 일관성 프로토콜)

  • Moh, Sang-Man;Hahn, Woo-Jong;Yoon, Suk-Han
    • Electronics and Telecommunications Trends / v.13 no.5 s.53 / pp.11-22 / 1998
  • Scalable multiprocessor systems based on the distributed shared memory (DSM) architecture have recently been actively developed for general-purpose computing, with the aim of boosting both programmability and scalability. In this paper, we survey and analyze cache coherence protocols in non-uniform memory access (NUMA) multiprocessor systems. In particular, the analysis indicates a strong need for specialized hardware suited to NUMA multiprocessor systems built from commodity symmetric multiprocessors (SMPs). A cache coherence protocol combined with specialized hardware can significantly improve the performance and scalability of NUMA multiprocessor systems while providing better programmability.

An Image Processing Speed Enhancement in a Multi-Frame Super Resolution Algorithm by a CUDA Method (CUDA를 이용한 초해상도 기법의 영상처리 속도개선 방법)

  • Kim, Mi-Jeong
    • Journal of the Korea Institute of Military Science and Technology / v.14 no.4 / pp.663-668 / 2011
  • Although the multi-frame super resolution algorithm has many merits, it demands too much calculation time. Research has shown that image processing time can be reduced using CUDA (Compute Unified Device Architecture), one of the GPGPU (general-purpose computing on graphics processing units) models. In this paper, we show that the processing time of the multi-frame super resolution algorithm can be reduced by employing CUDA. CUDA was applied not to the whole program but only to its most time-consuming parts. The simulation results show that using CUDA can reduce operation time dramatically. It may therefore be possible to implement the multi-frame super resolution algorithm in real time by using libraries of image processing algorithms written in CUDA.

Development of a Fatigue Analysis Software System (피로해석시스템 개발)

  • Choi, B.I.;Lee, H.J.;Han, S.W.;Kim, J.Y.;Hwang, K.H.;Kang, J.Y.
    • Proceedings of the KSME Conference / 2001.06a / pp.120-125 / 2001
  • General-purpose fatigue analysis software was developed to predict the fatigue lives of mechanical components and structures. Its characteristic features include functions that search for weak regions on the free surface in order to reduce computing time significantly, a database of fatigue properties for various materials, and an expert system that helps users obtain more appropriate results. The software can be used in an environment consisting of commercial finite element packages. Using the developed software, fatigue analyses were carried out for an SAE keyhole specimen and an automobile knuckle. The results agreed well with those from commercial packages.

The effects of online learning situation and learners' learning style on satisfaction in Blended Learning (온라인 학습상황과 학습자의 학습스타일이 블랜디드 러닝 만족도에 미치는 영향)

  • Lee, Sung-Ju;Kwon, Jae-Hwan
    • Journal of Internet Computing and Services / v.12 no.6 / pp.95-103 / 2011
  • This study was conducted to support the planning and implementation of blended learning by investigating how learner satisfaction differs according to the blended learning situation and learner traits. For this purpose, the online learning situation was divided into three types to examine its influence on satisfaction, and participants were divided by learning style to examine the influence of that trait on satisfaction. Blended learning satisfaction was classified into four areas: web environment, content, face-to-face sessions, and general view of the blended learning implementation.

Study of Cache Performance on GPGPU

  • Choi, Kyu Hyun;Kim, Seon Wook
    • IEIE Transactions on Smart Processing and Computing / v.4 no.2 / pp.78-82 / 2015
  • General-purpose graphics processing units (GPGPUs) provide tremendous computational and processing power. Despite the latency hiding mechanism, a GPU architecture requires high memory bandwidth and lower latency between computational units and the memory system. For this reason, the current GPU architecture has private L1 caches in each core and a shared L2 cache to increase performance by reducing memory latency. But in some cases, this CPU-like cache design is not suitable for GPGPUs. In this paper, we analyze detailed cache performance related to GPGPU application characteristics, and suggest technical alternatives for the GPGPU architecture as future work.

Accelerating Molecular Dynamics Simulation Using Graphics Processing Unit

  • Myung, Hun-Joo;Sakamaki, Ryuji;Oh, Kwang-Jin;Narumi, Tetsu;Yasuoka, Kenji;Lee, Sik
    • Bulletin of the Korean Chemical Society / v.31 no.12 / pp.3639-3643 / 2010
  • We have developed a CUDA-enabled version of a general-purpose molecular dynamics simulation code for GPUs. Implementation details, including the parallelization scheme and performance optimization, are described. We focused on the non-bonded force calculation because it is the most time-consuming part of a molecular dynamics simulation. Timing results for the CUDA-enabled and CPU versions were obtained and compared for a biomolecular system containing 23558 atoms. The CUDA-enabled versions were found to be faster than the CPU version, which suggests that GPUs could be useful hardware for molecular dynamics simulation.