• Title/Abstract/Keywords: General purpose computing

Optimizing Instruction Prefetching to Improve Worst-Case Performance for Real-Time Applications

  • Ding, Yiqiang;Yan, Jun;Zhang, Wei
    • Journal of Computing Science and Engineering
    • /
    • Vol. 3, No. 1
    • /
    • pp.59-71
    • /
    • 2009
  • While average-case performance is important for general-purpose applications, worst-case performance is crucial for real-time systems to ensure schedulability and reliability. Recent work has shown that simple prefetching techniques such as Next-N-Line prefetching can benefit both average-case and worst-case performance; however, the improvement in worst-case execution time (WCET) is rather limited and inefficient. This paper presents two instruction prefetching approaches specifically designed to enhance worst-case performance: loop-based prefetching and WCET-oriented prefetching. Our experiments indicate that both instruction prefetching techniques achieve better worst-case execution cycles than Next-N-Line prefetching, while having varying impacts on average-case performance.

스마트폰에서의 영상처리를 위한 GPU 활용 (Using the GPU for Image Processing on Smartphones)

  • 박인규;최호열
    • 정보와 통신
    • /
    • Vol. 29, No. 4
    • /
    • pp.46-51
    • /
    • 2012
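  • This article introduces GPGPU (General-Purpose Computing on GPU) technology for high-speed parallel processing of the various multimedia applications now demanded on smartphones using an embedded GPU (Graphics Processing Unit), along with application examples in image processing. Unlike the typical desktop computing environment, GPGPU application technology in the heavily constrained embedded environment is still at an early stage. However, with rapidly advancing embedded GPU IP and the emergence of APIs such as OpenCL, high-speed parallel processing on embedded GPUs is expected to become commonplace within a few years. To examine this potential, the article introduces recent trends in the hardware and software environments for image processing on embedded GPUs. It also summarizes image processing examples that use GPGPU technology on recent smartphones and the main points to consider when implementing image processing algorithms as GPGPU algorithms.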

OpenCL을 이용한 GPGPU 기반 지문개선 알고리즘 가속화 (Accelerating Fingerprint Enhancement Algorithm on GPGPU using OpenCL)

  • 김대희;박능수
    • 전기학회논문지
    • /
    • Vol. 65, No. 4
    • /
    • pp.666-672
    • /
    • 2016
  • Fingerprints have recently come into wide use as a biometric for improving the security of financial mobile applications because of their user convenience and high recognition rate. However, to apply fingerprint algorithms to finance and security applications, their recognition rate and processing speed must be improved further. In this paper, we propose a parallel fingerprint enhancement algorithm for general-purpose computing on graphics processing units (GPGPU) using OpenCL. We analyze the parallelism in the fingerprint algorithm and explore optimization parameters of the parallel fingerprint algorithm to improve performance. The experimental results show that the parallel fingerprint enhancement algorithm on GPGPUs ran 29.4 to 69.2 times faster than the original algorithm on the host CPUs.

전류, 진동 및 자속센서기반 스마트센서를 이용한 기계결함진단 성능비교 (Comparing machine fault diagnosis performances on current, vibration and flux based smart sensors)

  • 손종덕;태성도;양보석;황돈하;강동식
    • 한국소음진동공학회:학술대회논문집
    • /
    • 한국소음진동공학회 2008년도 춘계학술대회논문집
    • /
    • pp.809-816
    • /
    • 2008
  • With increasing demand to reduce maintenance costs by detecting machine faults automatically, low-cost sensors with intelligent functionality are required. Rapid developments in semiconductors, computing, and communication have led to a new generation of sensors, called "smart" sensors, with built-in functionality and intelligence. The purpose of this research is to compare machine fault classification using general analyzer signals with classification using smart sensor signals. Three types of sensors are used for induction motor fault diagnosis: vibration, current, and flux. The classification results are satisfactory.

NUMA 다중 프로세서에서의 캐쉬 일관성 프로토콜 (Cache Coherence Protocols in NUMA Multiprocessors)

  • 모상만;한우종;윤석한
    • 전자통신동향분석
    • /
    • Vol. 13, No. 5 (Serial No. 53)
    • /
    • pp.11-22
    • /
    • 1998
  • Scalable multiprocessor systems based on the distributed shared memory (DSM) architecture have recently been actively developed for general-purpose computing to improve both programmability and scalability. In this paper, we survey and analyze cache coherence protocols in non-uniform memory access (NUMA) multiprocessor systems. In particular, the analysis indicates that specialized hardware suited to NUMA multiprocessor systems built from commodity symmetric multiprocessors (SMPs) is strongly needed. A cache coherence protocol combined with such specialized hardware can significantly improve the performance and scalability of NUMA multiprocessor systems while providing better programmability.

CUDA를 이용한 초해상도 기법의 영상처리 속도개선 방법 (An Image Processing Speed Enhancement in a Multi-Frame Super Resolution Algorithm by a CUDA Method)

  • 김미정
    • 한국군사과학기술학회지
    • /
    • Vol. 14, No. 4
    • /
    • pp.663-668
    • /
    • 2011
  • Although the multi-frame super resolution algorithm has many merits, it demands too much calculation time. Research has shown that image processing time can be reduced using CUDA (Compute Unified Device Architecture), one of the GPGPU (general-purpose computing on graphics processing units) models. In this paper, we show that the processing time of the multi-frame super resolution algorithm can be reduced by employing CUDA. CUDA was applied not to the whole program but only to its most time-consuming parts. The simulation results show that using CUDA can reduce the operation time dramatically. The multi-frame super resolution algorithm could therefore be implemented in real time by using libraries of image processing algorithms built with CUDA.

피로해석시스템 개발 (Development of a Fatigue Analysis Software System)

  • 최병익;이학주;한승우;김정엽;황기현;강재윤
    • 대한기계학회:학술대회논문집
    • /
    • 대한기계학회 2001년도 춘계학술대회논문집A
    • /
    • pp.120-125
    • /
    • 2001
  • A general-purpose fatigue analysis software package was developed to predict the fatigue lives of mechanical components and structures. Its characteristic features include functions for searching for weak regions on the free surface in order to reduce computing time significantly, a database of fatigue properties for various materials, and an expert system that helps users obtain more appropriate results. The software can be used in an environment consisting of commercial finite element packages. Using the developed software, fatigue analyses were carried out for an SAE keyhole specimen and an automobile knuckle. The results agreed well with those from commercial packages.

온라인 학습상황과 학습자의 학습스타일이 블랜디드 러닝 만족도에 미치는 영향 (The effects of online learning situation and learners' learning style on satisfaction in Blended Learning)

  • 이성주;권재환
    • 인터넷정보학회논문지
    • /
    • Vol. 12, No. 6
    • /
    • pp.95-103
    • /
    • 2011
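  • This study was conducted to support blended learning practice by examining differences in learner satisfaction according to the blended learning situation and learner characteristics. To this end, the online learning situations of blended learning were classified into three types, and their effects on blended learning satisfaction were examined. The learning styles of blended learning participants were also classified by type, and the effect of these characteristics on participant satisfaction was examined. Blended learning satisfaction was divided into four broad categories: web environment satisfaction, content satisfaction, face-to-face class satisfaction, and general satisfaction.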

Study of Cache Performance on GPGPU

  • Choi, Kyu Hyun;Kim, Seon Wook
    • IEIE Transactions on Smart Processing and Computing
    • /
    • Vol. 4, No. 2
    • /
    • pp.78-82
    • /
    • 2015
  • General-purpose graphics processing units (GPGPUs) provide tremendous computational and processing power. Despite the latency hiding mechanism, a GPU architecture requires high memory bandwidth and lower latency between computational units and the memory system. For this reason, the current GPU architecture has private L1 caches in each core and a shared L2 cache to increase performance by reducing memory latency. But in some cases, this CPU-like cache design is not suitable for GPGPUs. In this paper, we analyze detailed cache performance related to GPGPU application characteristics, and suggest technical alternatives for the GPGPU architecture as future work.

Accelerating Molecular Dynamics Simulation Using Graphics Processing Unit

  • Myung, Hun-Joo;Sakamaki, Ryuji;Oh, Kwang-Jin;Narumi, Tetsu;Yasuoka, Kenji;Lee, Sik
    • Bulletin of the Korean Chemical Society
    • /
    • Vol. 31, No. 12
    • /
    • pp.3639-3643
    • /
    • 2010
  • We have developed a CUDA-enabled version of a general-purpose molecular dynamics simulation code for GPUs. Implementation details, including the parallelization scheme and performance optimization, are described. We focused on the non-bonded force calculation because it is the most time-consuming part of a molecular dynamics simulation. Timing results from the CUDA-enabled and CPU versions were obtained and compared for a biomolecular system containing 23558 atoms. The CUDA-enabled versions were found to be faster than the CPU version, which suggests that GPUs could be useful hardware for molecular dynamics simulation.