• Title/Summary/Keyword: Computer CPU

Accelerating 2D DCT in Multi-core and Many-core Environments (멀티코어와 매니코어 환경에서의 2 차원 DCT 가속)

  • Hong, Jin-Gun;Jung, Sung-Wook;Kim, Cheong-Ghil;Burgstaller, Bernd
    • Proceedings of the Korea Information Processing Society Conference / 2011.04a / pp.250-253 / 2011
  • Chip manufacturers have recently turned their attention from accelerating uniprocessors to integrating multiple cores on a single chip. Moreover, desktop graphics hardware now supports general-purpose computation, so desktop users can employ multi-core CPUs and GPUs as high-performance computing resources. However, exploiting these parallel resources remains challenging because of the lack of higher-level abstractions for parallel programming. The two-dimensional discrete cosine transform (2D DCT) is the most computationally intensive part of JPEG encoding, and many fast 2D DCT algorithms have been studied. We implemented several of these algorithms and measured their runtimes in multi-core CPU and GPU environments. Experiments show that data parallelism can be fully exploited on both CPU and GPU architectures. We expect parallelized DCT to bring performance benefits to applications such as JPEG and MPEG.
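
A minimal sketch (not the authors' code) of the data parallelism this abstract describes: the separable 2D DCT, C X Cᵀ, is applied to independent 8x8 blocks of an image, and the blocks are distributed across CPU cores with a process pool. Block size, worker count, and the test image are illustrative assumptions.

```python
import numpy as np
from concurrent.futures import ProcessPoolExecutor

N = 8  # JPEG block size

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II transform matrix."""
    k = np.arange(n).reshape(-1, 1)
    i = np.arange(n).reshape(1, -1)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c

C = dct_matrix(N)

def dct_block(block: np.ndarray) -> np.ndarray:
    """2D DCT of one 8x8 block via two matrix multiplications (separability)."""
    return C @ block @ C.T

def dct_image(image: np.ndarray, workers: int = 4) -> np.ndarray:
    """Apply the block DCT to every 8x8 tile of a grayscale image in parallel."""
    h, w = image.shape
    blocks = [image[y:y + N, x:x + N]
              for y in range(0, h, N) for x in range(0, w, N)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        coeffs = list(pool.map(dct_block, blocks, chunksize=256))
    out = np.empty_like(image, dtype=np.float64)
    idx = 0
    for y in range(0, h, N):
        for x in range(0, w, N):
            out[y:y + N, x:x + N] = coeffs[idx]
            idx += 1
    return out

if __name__ == "__main__":
    img = np.random.rand(512, 512)
    print(dct_image(img).shape)  # (512, 512)
```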

Comparative Analysis of Computation Times Based on the Number of Containers for CPU-Intensive Tasks in the Kubeflow Environment (Kubeflow 환경에서 CPU 집약적인 작업을 위한 컨테이너 수에 따른 연산 시간 비교 및 분석)

  • HyunSeung Jung;Taeshin Kang;Heonchang Yu;Jihun Kang
    • Proceedings of the Korea Information Processing Society Conference / 2023.11a / pp.93-96 / 2023
  • As the demand for machine learning has grown, so has the demand for deploying machine learning workflows. Kubeflow makes machine learning deployment convenient, and Kubeflow Pipelines allows a single task to be distributed across multiple containers for computation. However, increasing the number of containers does not always improve performance. To analyze what limits the performance gain, this study distributed a CPU-intensive task across multiple containers in Kubeflow. Comparing the completion times for different container counts, we found that computation speeds up as the number of containers grows but that, beyond a certain point, the gain tapers off. We attribute this mainly to resource limits that prevent all containers from being scheduled at the same time.
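
A minimal local analogy (not the paper's Kubeflow setup) of the effect described above: one fixed CPU-bound job is split across an increasing number of workers. Once the worker count exceeds the CPUs that can actually run concurrently, the speed-up flattens, mirroring containers that cannot all be scheduled at once. The workload and worker counts are illustrative assumptions.

```python
import os
import time
from concurrent.futures import ProcessPoolExecutor

TOTAL_ITERATIONS = 8_000_000  # fixed amount of work, split among workers

def burn(iterations: int) -> int:
    """A purely CPU-bound loop standing in for one container's share of the task."""
    acc = 0
    for i in range(iterations):
        acc += i * i
    return acc

def run_with_workers(n_workers: int) -> float:
    """Time the whole job when it is divided evenly among n_workers processes."""
    share = TOTAL_ITERATIONS // n_workers
    start = time.perf_counter()
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        list(pool.map(burn, [share] * n_workers))
    return time.perf_counter() - start

if __name__ == "__main__":
    print(f"available CPUs: {os.cpu_count()}")
    for n in (1, 2, 4, 8, 16, 32):
        print(f"{n:2d} workers: {run_with_workers(n):6.2f} s")
```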

Analysis of TCP/IP Protocol for Implementing a High-Performance Hybrid TCP/IP Offload Engine (고성능 Hybrid TCP/IP Offload Engine 구현을 위한 TCP/IP 프로토콜 분석)

  • Jang Hankook;Oh Soo-Cheol;Chung Sang-Hwa;Kim Dong Kyue
    • Journal of KIISE: Computer Systems and Theory / v.32 no.6 / pp.296-305 / 2005
  • TCP/IP, the most widely used communication protocol, is traditionally processed on the host CPU, which imposes an enormous load on it. Recently, TCP/IP Offload Engine (TOE) technology, which processes TCP/IP on a network adapter instead of the host CPU, has become an important way to solve this problem. In this paper we analyze the structure of the TCP/IP protocol stack in the Linux operating system and identify the factors that place heavy loads on the host CPU by measuring the time spent processing each function in the protocol stack. Based on these analyses, we propose a Hybrid TOE architecture in which the functions that impose heavy loads on the host CPU are implemented in hardware and the remaining functions are implemented in software.
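
A rough, hedged illustration (not the authors' kernel-level instrumentation) of the host-CPU load that a TOE is meant to offload: the sketch streams data over a loopback TCP connection and compares elapsed wall-clock time with the CPU time the process spends on protocol processing. The transfer size and buffer sizes are arbitrary assumptions.

```python
import socket
import threading
import time

PAYLOAD = b"x" * 65536
TOTAL_BYTES = 512 * 1024 * 1024  # 512 MiB over loopback

def sink(server: socket.socket) -> None:
    """Accept one connection and drain it until the peer closes."""
    conn, _ = server.accept()
    while conn.recv(1 << 20):
        pass
    conn.close()

def main() -> None:
    server = socket.socket()
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    threading.Thread(target=sink, args=(server,), daemon=True).start()

    client = socket.socket()
    client.connect(server.getsockname())

    wall0, cpu0 = time.perf_counter(), time.process_time()
    sent = 0
    while sent < TOTAL_BYTES:
        client.sendall(PAYLOAD)
        sent += len(PAYLOAD)
    client.close()
    wall1, cpu1 = time.perf_counter(), time.process_time()

    # process_time() covers both loopback endpoints, since the sink runs
    # as a thread in the same process.
    print(f"wall time: {wall1 - wall0:.2f} s, "
          f"CPU time (both endpoints): {cpu1 - cpu0:.2f} s")

if __name__ == "__main__":
    main()
```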

A Primary Study on the Enhancement of Efficiency in the Computer Cooling System using Entrance Tube of Outer Air (외부공기 유입관을 이용한 컴퓨터 냉각시스템의 효율향상에 관한 연구)

  • Kim, S.H.;Kim, M.H.
    • Journal of Power System Engineering / v.13 no.4 / pp.56-61 / 2009
  • In recent years, the continuing increase in personal-computer capability, such as higher performance, quality, and image resolution, has meant that a computer system's components produce large amounts of heat during operation. This study analyzes and investigates the capability and efficiency of a cooling system inside a computer built around the central processing unit (CPU) and power-supply cooling fans. The efficiency of the cooling system was enhanced by a structure that produces different air pressures in an air-inflow tube. When the temperatures of the CPU and of the interior of the case were compared with those of a standard personal computer, the temperatures of the tested CPU, the case interior, and the heat sink were lower by 5°C, 2.5°C, and 7°C, respectively. In addition, the fan speed was as low as 250 RPM after one hour of operation. This research explored the possibility of more effective cooling of high-performance computer systems.

Analysis of Implementing Mobile Heterogeneous Computing for Image Sequence Processing

  • BAEK, Aram;LEE, Kangwoon;KIM, Jae-Gon;CHOI, Haechul
    • KSII Transactions on Internet and Information Systems (TIIS) / v.11 no.10 / pp.4948-4967 / 2017
  • On mobile devices, image sequences are widely used for multimedia applications such as computer vision, video enhancement, and augmented reality. However, real-time processing on mobile devices is still a challenge because of resource constraints and the demand for higher-resolution images. Recently, heterogeneous computing methods that utilize both a central processing unit (CPU) and a graphics processing unit (GPU) have been researched to accelerate image-sequence processing. This paper deals with various optimization techniques such as parallel processing by the CPU and GPU, distributed processing on the CPU, frame buffer objects, and double buffering for parallel and/or distributed tasks. Using these techniques both individually and in combination, several heterogeneous computing structures were implemented and their effectiveness was analyzed. The experimental results show that heterogeneous computing enables execution up to 3.5 times faster than CPU-only processing.
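
A simplified sketch of the double-buffering idea mentioned above, under stated assumptions: frames are NumPy arrays and the accelerator stage is simulated by a CPU function. One buffer slot is filled by the CPU stage while the other is consumed by the next stage, so the two stages of the image-sequence pipeline overlap instead of running serially.

```python
import queue
import threading
import numpy as np

NUM_FRAMES = 20
FRAME_SHAPE = (480, 640)

# Two slots only: this is the "double buffer"; the producer blocks when both are full.
buffers: "queue.Queue" = queue.Queue(maxsize=2)

def cpu_stage() -> None:
    """Producer: simulate per-frame CPU preprocessing and hand frames over."""
    for i in range(NUM_FRAMES):
        frame = np.full(FRAME_SHAPE, i, dtype=np.float32)
        frame = frame * 0.5 + 1.0          # stand-in for real preprocessing
        buffers.put(frame)
    buffers.put(None)                      # end-of-stream marker

def accel_stage() -> None:
    """Consumer: simulate the accelerator stage draining the other buffer slot."""
    while True:
        frame = buffers.get()
        if frame is None:
            break
        result = np.sqrt(frame).mean()     # stand-in for GPU kernel work
        print(f"processed frame, mean={result:.3f}")

producer = threading.Thread(target=cpu_stage)
consumer = threading.Thread(target=accel_stage)
producer.start(); consumer.start()
producer.join(); consumer.join()
```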

Development of Stand-alone Image Processing Module on ARM CPU Employing Linux OS. (리눅스 OS를 이용한 ARM CPU 기반 독립형 영상처리모듈 개발)

  • Lee, Seok;Moon, Seung-Bin
    • Journal of the Institute of Electronics Engineers of Korea CI / v.40 no.2 / pp.38-44 / 2003
  • This paper describes the development of stand-alone image processing module on Strong Arm CPU employing an embedded Linux. Stand-alone image Processing module performs various functions such as thresholding, edge detection, and image enhancement of a raw image data in real time. The comparison of execution time between similar PC and developed module shows the satisfactory results. This Paper provides the possibility of applying embedded Linux successfully in industrial devices.

Power Management Mechanism for Interactive Applications in Wireless Network Systems (무선 시스템 환경에서 대화형 응용을 위한 전력제어기법)

  • Min, Jung-Hi;Cha, Ho-Jung
    • Proceedings of the Korean Information Science Society Conference / 2006.10a / pp.185-188 / 2006
  • To extend the operating time of mobile wireless systems, this paper presents an integrated power-management mechanism that efficiently reduces system energy when running interactive applications, typified by the increasingly popular web applications. To save energy, existing approaches establish separate policies for the CPU and the WNIC under the assumption that the two do not affect each other. The proposed mechanism instead uses information obtained from the WNIC while interactive applications are being processed to adjust the CPU voltage and frequency, reducing system-level energy consumption efficiently. Experimental results show that, compared with controlling the CPU and WNIC modes independently, the proposed mechanism reduces energy consumption by 46% on average and by up to 62%.
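
A hypothetical, simplified sketch of the idea described above: the CPU DVFS level is chosen from information observed on the wireless interface (here, the gap between a request and its response in an interactive session) rather than from CPU-only heuristics. The frequency levels, thresholds, deadline, and traffic trace are all illustrative assumptions, not the paper's values.

```python
from dataclasses import dataclass

FREQ_LEVELS_MHZ = [200, 400, 600, 800]   # assumed DVFS levels

@dataclass
class WnicEvent:
    timestamp: float     # seconds
    kind: str            # "request_sent" or "response_received"

def pick_frequency(response_gap_s: float, deadline_s: float = 0.5) -> int:
    """Scale the CPU up only when WNIC timing shows the response is nearly due."""
    slack = deadline_s - response_gap_s
    if slack > 0.3:
        return FREQ_LEVELS_MHZ[0]        # plenty of slack: run slow, save energy
    if slack > 0.15:
        return FREQ_LEVELS_MHZ[1]
    if slack > 0.05:
        return FREQ_LEVELS_MHZ[2]
    return FREQ_LEVELS_MHZ[3]            # response imminent: full speed

if __name__ == "__main__":
    trace = [WnicEvent(0.00, "request_sent"),
             WnicEvent(0.12, "response_received"),
             WnicEvent(0.40, "request_sent"),
             WnicEvent(0.83, "response_received")]
    pending = None
    for ev in trace:
        if ev.kind == "request_sent":
            pending = ev.timestamp
        elif pending is not None:
            gap = ev.timestamp - pending
            print(f"gap {gap:.2f}s -> CPU at {pick_frequency(gap)} MHz")
```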

How to optimize WebUI installation and CPU utilization on low-end AMD graphics cards (저사양 AMD 그래픽 카드 환경 하 WebUI 설치 및 CPU 활용 정상 작동 최적화 방법)

  • Kang-Sub Kim;Kang-Hee Lee
    • Proceedings of the Korean Society of Computer Information Conference / 2023.07a / pp.43-44 / 2023
  • For generating images such as photographs and illustrations with image-generation AI models, WebUI provides installation files and usage instructions for low-end AMD graphics cards. This paper describes how to run WebUI using the CPU on computers where the CUDA toolkit does not work. We believe this offers a good opportunity for students and individual researchers. Although the installation process can be complicated, the setup is useful for testing the various image models that run on WebUI.
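
A hedged sketch of the core idea: when a working CUDA runtime is unavailable (for example on a low-end AMD card), fall back to running inference on the CPU. Only `torch.cuda.is_available()` is the standard check; how a specific WebUI build wires this up is not shown here.

```python
import torch

def pick_device() -> torch.device:
    """Use the GPU only when a working CUDA runtime is present."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    return torch.device("cpu")          # CPU fallback, as in the setup described above

if __name__ == "__main__":
    device = pick_device()
    print(f"running inference on: {device}")
    # A model loaded here would then be moved with `.to(device)`; on the CPU,
    # half-precision weights are typically converted to float32 first.
```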

A CPU and GPU Heterogeneous Computing Techniques for Fast Representation of Thin Features in Liquid Simulations (액체 시뮬레이션의 얇은 특징을 빠르게 표현하기 위한 CPU와 GPU 이기종 컴퓨팅 기술)

  • Kim, Jong-Hyun
    • Journal of the Korea Computer Graphics Society / v.24 no.2 / pp.11-20 / 2018
  • We propose a new particle-based method that explicitly preserves thin liquid sheets for animating liquids on a CPU-GPU heterogeneous computing framework. Our primary contribution is a particle-based framework that splits particles at thin points and collapses them at dense points to prevent the breakup of liquid on the GPU. In contrast to existing surface-tracking methods, our method does not suffer from numerical diffusion or tangles, and it robustly handles topology changes on the CPU-GPU framework. Thin features are detected by examining the stretch of the distribution of neighboring particles using principal component analysis (PCA), which is then used to reconstruct thin surfaces with anisotropic kernels. The efficiency of the candidate-position extraction process used to compute fluid-particle positions was greatly improved by the CPU-GPU heterogeneous computing techniques. The proposed algorithm is intuitive to implement, easy to parallelize, and capable of quickly producing detailed thin-liquid animations.
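
A small sketch (the threshold and neighborhood handling are assumptions, not the paper's exact formulation) of detecting thin features with PCA as described above: the covariance of a particle's neighbor positions is eigen-decomposed, and a strongly flattened distribution, with one eigenvalue much smaller than the others, marks a sheet-like region.

```python
import numpy as np

def is_thin_feature(neighbor_positions: np.ndarray, flatness_ratio: float = 0.1) -> bool:
    """Return True when the neighbor distribution is sheet-like (one thin axis)."""
    centered = neighbor_positions - neighbor_positions.mean(axis=0)
    cov = centered.T @ centered / max(len(neighbor_positions) - 1, 1)
    eigvals = np.sort(np.linalg.eigvalsh(cov))          # ascending
    return eigvals[0] < flatness_ratio * eigvals[-1]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Neighbors spread over a plane with little thickness: should be flagged as thin.
    sheet = rng.normal(size=(64, 3)) * np.array([1.0, 1.0, 0.02])
    # Neighbors spread roughly isotropically: should not be flagged.
    blob = rng.normal(size=(64, 3))
    print(is_thin_feature(sheet), is_thin_feature(blob))  # True False
```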

Worst Case Timing Analysis for DMA I/O Requests in Real-time Systems (실시간 시스템의 DMA I/O 요구를 위한 최악 시간 분석)

  • Hahn Joosun;Ha Rhan;Min Sang Lyul
    • Journal of KIISE: Computer Systems and Theory / v.32 no.4 / pp.148-159 / 2005
  • We propose a technique for finding the worst-case response time (WCRT) of a DMA request, which is needed in the schedulability analysis of a whole real-time system. The technique consists of three steps. In the first step, we find the worst-case bus-usage pattern of each CPU task. In the second step, we combine the worst-case bus-usage patterns of the CPU tasks to construct the worst-case bus-usage pattern of the CPU; this step considers not only the bus requests made by the CPU tasks individually but also those caused by preemptions among the CPU tasks. Finally, in the third step, we use the worst-case bus-usage pattern of the CPU to derive the WCRT of DMA requests, assuming a fixed-priority bus arbitration protocol. Experimental results show that the overestimation of the DMA response time by the proposed technique is within 20% for most DMA request sizes and that the percentage overestimation decreases as the DMA request size increases.
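
A toy, hedged illustration (not the paper's analysis) of the third step described above: given a worst-case bus-usage pattern for the CPU and a fixed-priority bus on which the CPU always wins arbitration, the response time of a DMA request is found by walking the timeline and letting the transfer make progress only in the CPU's idle cycles. The pattern, cycle granularity, and request sizes are illustrative assumptions.

```python
def dma_response_time(cpu_busy: list, request_cycles: int) -> int:
    """Cycles until a DMA request of `request_cycles` bus cycles completes.

    cpu_busy[t] is True when the higher-priority CPU holds the bus in cycle t;
    the pattern is assumed to repeat if the request outlives it.
    """
    remaining = request_cycles
    t = 0
    while remaining > 0:
        if not cpu_busy[t % len(cpu_busy)]:   # bus free: DMA transfers one cycle
            remaining -= 1
        t += 1
    return t

if __name__ == "__main__":
    # Worst-case pattern: the CPU occupies the bus for 3 out of every 5 cycles.
    pattern = [True, True, True, False, False]
    for size in (4, 16, 64):
        print(f"DMA request of {size:3d} cycles -> WCRT {dma_response_time(pattern, size)} cycles")
```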