• Title/Summary/Keyword: floating-point arithmetic


Time-optimized Color Conversion based on Multi-mode Chrominance Reconstruction and Operation Rearrangement for JPEG Image Decoding (JPEG 영상 복원을 위한 다중 모드 채도 복원과 연산 재배열 기반의 시간 최적화된 컬러 변환)

  • Kim, Young-Ju
    • Journal of the Korea Society of Computer and Information / v.14 no.1 / pp.135-143 / 2009
  • The growing need to encode and decode high-resolution images on mobile devices calls for efficient implementations of image codecs. This paper proposes a time-optimized color conversion method for the JPEG decoder. First, because the IDCT and the color conversion matrices are linear, the arithmetic operations can be rearranged to reduce the number of calculations in the color conversion. Second, an integer mapping replaces floating-point operations with optimal fixed-point shift and addition operations, lowering the time complexity of the color conversion itself and, in turn, of the JPEG decoder as a whole. The proposed method then uses multi-mode chrominance reconstruction to compensate for the loss of image quality caused by the quantization error of the operation rearrangement and the integer mapping. A performance evaluation on an embedded-systems development platform showed that, compared with previous color conversion methods, the proposed method greatly reduces image decoding time while minimizing the distortion of decoded images.
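
To make the integer-mapping idea concrete, here is a minimal Python sketch (not the author's method) of YCbCr-to-RGB conversion in which each floating-point multiply is replaced by a Q16 fixed-point constant applied with shifts and additions. It assumes the standard JFIF conversion coefficients and a 16-bit fraction; the names `FRAC_BITS`, `mul_shift_add`, and `ycbcr_to_rgb_fixed` are hypothetical, and the paper's IDCT-based operation rearrangement and multi-mode chrominance reconstruction are not modeled.

```python
# Sketch only: standard JFIF YCbCr -> RGB coefficients are assumed; the paper's
# exact integer mapping and multi-mode chroma reconstruction are not reproduced.
FRAC_BITS = 16                       # assumed fixed-point precision (Q16)
HALF = 1 << (FRAC_BITS - 1)          # rounding bias: 0.5 in Q16


def to_fixed(coef):
    """Convert a real coefficient to a Q16 integer constant."""
    return int(round(coef * (1 << FRAC_BITS)))


K_R_CR = to_fixed(1.402)
K_G_CB = to_fixed(0.344136)
K_G_CR = to_fixed(0.714136)
K_B_CB = to_fixed(1.772)


def mul_shift_add(v, k):
    """Multiply v by a non-negative constant k using only shifts and additions."""
    acc, bit = 0, 0
    while k:
        if k & 1:
            acc += v << bit          # add v * 2**bit for every set bit of k
        k >>= 1
        bit += 1
    return acc


def clamp(v):
    """Saturate to the 8-bit range [0, 255]."""
    return 0 if v < 0 else 255 if v > 255 else v


def ycbcr_to_rgb_fixed(y, cb, cr):
    """Integer-only conversion: constant multiplies become shift-add chains."""
    cb -= 128
    cr -= 128
    r = y + ((mul_shift_add(cr, K_R_CR) + HALF) >> FRAC_BITS)
    g = y - ((mul_shift_add(cb, K_G_CB) + mul_shift_add(cr, K_G_CR) + HALF) >> FRAC_BITS)
    b = y + ((mul_shift_add(cb, K_B_CB) + HALF) >> FRAC_BITS)
    return clamp(r), clamp(g), clamp(b)


def ycbcr_to_rgb_float(y, cb, cr):
    """Floating-point reference for comparison."""
    cb -= 128
    cr -= 128
    r = y + 1.402 * cr
    g = y - 0.344136 * cb - 0.714136 * cr
    b = y + 1.772 * cb
    return tuple(clamp(int(round(v))) for v in (r, g, b))
```

Each constant costs one addition per set bit, so an embedded version would unroll these loops with the constants fixed at compile time; the fixed-point result stays within one intensity level of the floating-point reference.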

Comparison of Parallel Preconditioners for Solving Large Sparse Linear Systems on a Massively Parallel Machine (대형이산 행렬 시스템의 초대형병렬컴퓨터에서의 해법을 위한 병렬준비 행렬의 비교)

  • Ma, Sang-Baek
    • The Transactions of the Korea Information Processing Society / v.2 no.4 / pp.535-542 / 1995
  • In this paper we present two preconditioners for solving large sparse linear systems arising from elliptic partial differential equations on massively parallel machines such as the CM-5. Most massively parallel machines rely heavily on message passing for interprocessor communication, but with current manufacturing technology the cost of communication is very high compared with that of floating-point arithmetic. We therefore need algorithms that minimize the amount of interprocessor communication on massively parallel machines. Through experiments on the CM-5, we show that the Block SOR (Successive Over-Relaxation) method coupled with the multi-coloring technique is one such preconditioner. We also implemented on the CM-5 the ADI (Alternating Direction Implicit) method, which has conventionally been one of the most powerful parallel preconditioners. Our experiments show that Block SOR with multi-coloring can yield a speedup at 50% parallel efficiency for 16 to 512 processors on a matrix of dimension 512×512, whereas the ADI method performs very poorly.
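
The multi-coloring idea can be illustrated with a minimal Python sketch of point red-black SOR for the 5-point discretization of the Poisson equation on the unit square. This is an assumed model problem, used as a standalone solver rather than as a preconditioner, and it is a simpler point variant than the Block SOR studied on the CM-5. The key property still shows: points of one color depend only on neighbors of the other color, so each half-sweep can update all of its points in parallel, with neighbor values exchanged only between half-sweeps. The function name `red_black_sor` is hypothetical.

```python
import math


def red_black_sor(f, n, omega, iters):
    """Solve -(u_xx + u_yy) = f on the unit square with u = 0 on the boundary.

    Uses the 5-point stencil on an n x n interior grid and red-black
    (two-color) ordering: every point of one color depends only on points
    of the other color, so each half-sweep is fully parallel.
    """
    h = 1.0 / (n + 1)
    u = [[0.0] * (n + 2) for _ in range(n + 2)]      # includes the zero boundary
    rhs = [[h * h * f(i * h, j * h) for j in range(n + 2)] for i in range(n + 2)]
    for _ in range(iters):
        for color in (0, 1):                          # 0 = red, 1 = black
            for i in range(1, n + 1):
                for j in range(1, n + 1):
                    if (i + j) % 2 != color:
                        continue
                    gauss_seidel = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                           + u[i][j - 1] + u[i][j + 1] + rhs[i][j])
                    u[i][j] += omega * (gauss_seidel - u[i][j])   # over-relax
    return u
```

For this model problem $\omega = 2/(1+\sin(\pi h))$ is the classical optimal relaxation factor; with `n = 16` and $f = 2\pi^2 \sin(\pi x)\sin(\pi y)$, a few hundred sweeps bring the result close to the exact solution $\sin(\pi x)\sin(\pi y)$, limited only by the $O(h^2)$ discretization error.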


Hardware Design of Arccosine Function for Mobile Vector Graphics Processor (모바일 벡터 그래픽 프로세서용 역코사인 함수의 하드웨어 설계)

  • Choi, Byeong-Yoon; Lee, Jong-Hyoung
    • Journal of the Korea Institute of Information and Communication Engineering / v.13 no.4 / pp.727-736 / 2009
  • In this paper, an arithmetic unit for the arccosine function ($\arccos$, i.e. $\cos^{-1}$) is designed for a mobile graphics accelerator. Mobile vector graphics applications impose tighter area, execution-time, power-dissipation, and accuracy constraints than desktop PC applications. To meet the speed and accuracy requirements, the designed processor adopts a 2nd-order polynomial approximation scheme based on the IEEE floating-point data format, and it reduces area through a hardware-sharing structure. The arccosine processor consists of 15,280 gates, and its estimated operating frequency is about 125 MHz in $0.35\,\mu\mathrm{m}$ CMOS technology. Because it can compute the arccosine function within 7 clock cycles, it achieves an execution rate of about 17 MOPS (million arccos operations per second) and is applicable to mobile OpenVG processors. Owing to its flexible architecture, it can also be adapted to various transcendental functions, such as exponential, trigonometric, and logarithmic functions, by replacing the ROM and making minor hardware modifications.
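
A software model can show how a 2nd-order piecewise polynomial with a coefficient ROM evaluates $\arccos$. The Python sketch below is an illustration under stated assumptions, not the authors' datapath: it uses 16 segments (a hypothetical ROM size), fits the smooth helper $g(x)=\arccos(x)/\sqrt{1-x}$ rather than $\arccos$ itself because $\arccos$ has an infinite slope at $x = 1$, and uses the identity $\arccos(-x) = \pi - \arccos(x)$ for negative inputs. The extra square root is a design choice of this sketch and would cost hardware; the names `ROM`, `_build_rom`, and `arccos_approx` are hypothetical.

```python
import math

SEGMENTS = 16  # hypothetical ROM size: 16 equal segments over [0, 1]


def _g(x):
    """Smooth helper g(x) = arccos(x) / sqrt(1 - x); its limit at x = 1 is sqrt(2)."""
    if x >= 1.0:
        return math.sqrt(2.0)
    return math.acos(x) / math.sqrt(1.0 - x)


def _build_rom(n):
    """Per segment, store c0, c1, c2 of the quadratic through the two ends and the midpoint."""
    rom = []
    for i in range(n):
        x0, x1 = i / n, (i + 1) / n
        f0, fm, f1 = _g(x0), _g((x0 + x1) / 2), _g(x1)
        c1 = 4 * fm - 3 * f0 - f1          # p(t) = c0 + c1*t + c2*t^2, t in [0, 1]
        c2 = 2 * f0 + 2 * f1 - 4 * fm
        rom.append((f0, c1, c2))
    return rom


ROM = _build_rom(SEGMENTS)


def arccos_approx(x):
    """Piecewise 2nd-order approximation of arccos(x) for x in [-1, 1]."""
    if not -1.0 <= x <= 1.0:
        raise ValueError("arccos domain is [-1, 1]")
    if x < 0:
        return math.pi - arccos_approx(-x)        # symmetry: arccos(-x) = pi - arccos(x)
    i = min(int(x * SEGMENTS), SEGMENTS - 1)      # segment index (leading bits of x)
    t = x * SEGMENTS - i                          # local coordinate in [0, 1]
    c0, c1, c2 = ROM[i]
    g = c0 + t * (c1 + t * c2)                    # quadratic in Horner form
    return math.sqrt(1.0 - x) * g
```

In this sketch only the table contents depend on the target function; the segment lookup and the Horner evaluation would be shared, which mirrors the ROM-replacement flexibility the abstract describes.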

A Study on High Speed Image Rotation Algorithm using CUDA (CUDA를 이용한 고속 영상 회전 알고리즘에 관한 연구)

  • Kwon, Hee-Choul; Cho, Hyung-Jin; Kwon, Hee-Yong
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.16 no.5 / pp.1-6 / 2016
  • Image rotation is one of the main pre-processing steps in image processing and image pattern recognition. It is implemented by multiplication with a rotation matrix, which requires many floating-point arithmetic operations and trigonometric function calculations and therefore takes a long time to execute. We propose a new high-speed image rotation algorithm that avoids these two time-consuming operations. It uses only two shear translation operations, so it is very fast. In addition, we apply parallel computing with CUDA, a massively parallel computing architecture that uses the GPUs now in widespread use. Because the GPU is a dedicated graphics processor, it is excellent for processing pixels in parallel. We compare the proposed algorithm with the conventional rotation algorithm on images of various sizes, and the experimental results show that the proposed algorithm outperforms the conventional ones.
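
The abstract does not give its two-shear formulation, so as a stand-in the Python sketch below uses the classic three-shear decomposition (Paeth's method), $R(\theta) = S_x(-\tan\tfrac{\theta}{2})\,S_y(\sin\theta)\,S_x(-\tan\tfrac{\theta}{2})$, where $S_x(a) = \begin{pmatrix}1 & a\\ 0 & 1\end{pmatrix}$ and $S_y(b) = \begin{pmatrix}1 & 0\\ b & 1\end{pmatrix}$. It illustrates why shear-based rotation is cheap: the two trigonometric values are computed once per image, and each shear only shifts a whole row or column by a rounded offset, which suits one GPU thread per row or pixel. The function `shear_rotate` is hypothetical and maps coordinates only; it does no resampling.

```python
import math


def shear_rotate(x, y, theta, snap=True):
    """Rotate (x, y) counter-clockwise by theta using three shears (Paeth).

    With snap=True every shear offset is rounded to a whole pixel, as an
    image-domain implementation does when shifting rows and columns.
    """
    alpha = -math.tan(theta / 2)   # horizontal shear factor (shears 1 and 3)
    beta = math.sin(theta)         # vertical shear factor (shear 2)
    step = round if snap else float
    x = x + step(alpha * y)        # shear 1: shift each row horizontally
    y = y + step(beta * x)         # shear 2: shift each column vertically
    x = x + step(alpha * y)        # shear 3: shift each row horizontally again
    return x, y
```

With rounding enabled each shear is a bijection on the integer pixel grid, so no two source pixels land on the same target, and for moderate angles the snapped result stays within about one pixel of the exact rotation.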

Development of a Powertrain for 20kW Experimental Electric Vehicle Using Surface Mounted Permanent Magnet Synchronous Motor (표면 부착형 영구자석 동기 전동기를 이용한 20kW급 실험용 전기자동차 파워트레인 개발)

  • Park, Sung-Hwan; Lee, Jeong-Ju; Son, Jong-Yull; Lee, Young-Il
    • The Transactions of the Korean Institute of Power Electronics / v.22 no.3 / pp.240-248 / 2017
  • This paper describes the development of a powertrain for a 20 kW experimental electric vehicle using surface-mounted permanent magnet synchronous motors (SPMSMs) and its application to a test vehicle. Two 10 kW SPMSMs are used in the powertrain, and two-level IGBT inverters are developed to drive these motors. The SPMSMs are controlled by a control board based on a TMS320F28335 DSP module, which provides fast arithmetic and a hardware floating-point unit. We develop a 100 V/40 A battery pack consisting of $32\times4$ LiFePO4 battery cells managed by a commercial BMS. A commercial on-board charger with a 220 V (AC) input and a 100 V (DC), 18 A output is used to charge the battery pack. The performance of the developed vehicle, including acceleration capability, maximum speed, and maximum power, is estimated from vehicle dynamics and verified through experiments.
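
As a rough consistency check on the quoted ratings, assume that $32\times4$ means 32 cells in series and 4 strings in parallel, and that each LiFePO4 cell has a nominal voltage of 3.2 V; neither assumption is stated in the abstract.

$$
\begin{aligned}
V_{\text{pack}} &\approx 32 \times 3.2\,\text{V} = 102.4\,\text{V} \approx 100\,\text{V},\\
P_{\text{motors}} &= 2 \times 10\,\text{kW} = 20\,\text{kW},\\
P_{\text{charger}} &\le 100\,\text{V} \times 18\,\text{A} = 1.8\,\text{kW}.
\end{aligned}
$$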

A Parallel Processing Technique for Large Spatial Data (대용량 공간 데이터를 위한 병렬 처리 기법)

  • Park, Seunghyun; Oh, Byoung-Woo
    • Spatial Information Research / v.23 no.2 / pp.1-9 / 2015
  • A graphics processing unit (GPU) contains many arithmetic logic units (ALUs), and because these ALUs can be exploited for parallel processing, the GPU processes data efficiently. Spatial data require many geographic coordinates to represent the shapes of features on a map. These coordinates are usually stored as geodetic longitude and latitude, so to display a map in a 2-dimensional Cartesian coordinate system they must be converted to the Universal Transverse Mercator (UTM) coordinate system. Both this conversion and the rendering process that draws the converted coordinates on screen involve complex floating-point computations. In this paper, we propose a parallel processing technique that performs the conversion and the rendering on the GPU to improve performance. Because large spatial data are stored on disk in multiple files, we also propose a technique that merges the spatial data files into one large file and accesses it as a memory-mapped file. We implemented the proposed technique and ran experiments on 747,302,971 points of TIGER/Line spatial data. The results show that coordinate conversion on the GPU is 30.16 times faster than the CPU-only method, and rendering is 80.40 times faster than on the CPU.
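
The per-point work that the paper offloads to the GPU is a projection from geodetic coordinates to UTM. Below is a minimal scalar Python sketch of the widely used series form of the transverse Mercator projection. It assumes the WGS84 ellipsoid and standard 6-degree zones and ignores the Norway and Svalbard zone exceptions, none of which the abstract specifies. Because every point is converted independently, a GPU version would run this body once per thread; the function name `latlon_to_utm` is hypothetical.

```python
import math

# Assumed datum: WGS84 (the abstract does not name one).
A_WGS84 = 6378137.0                 # semi-major axis in meters
F_WGS84 = 1 / 298.257223563         # flattening
K0 = 0.9996                         # UTM central-meridian scale factor
E2 = F_WGS84 * (2 - F_WGS84)        # first eccentricity squared
EP2 = E2 / (1 - E2)                 # second eccentricity squared


def latlon_to_utm(lat_deg, lon_deg):
    """Return (zone, easting, northing) in meters for a geodetic point."""
    zone = min(int((lon_deg + 180) // 6) + 1, 60)
    lon0 = math.radians((zone - 1) * 6 - 180 + 3)   # central meridian of the zone
    phi = math.radians(lat_deg)
    lam = math.radians(lon_deg)

    sin_phi, cos_phi, tan_phi = math.sin(phi), math.cos(phi), math.tan(phi)
    n = A_WGS84 / math.sqrt(1 - E2 * sin_phi ** 2)   # prime-vertical radius
    t = tan_phi ** 2
    c = EP2 * cos_phi ** 2
    a = cos_phi * (lam - lon0)

    e4, e6 = E2 ** 2, E2 ** 3
    m = A_WGS84 * (                                  # meridian arc from the equator
        (1 - E2 / 4 - 3 * e4 / 64 - 5 * e6 / 256) * phi
        - (3 * E2 / 8 + 3 * e4 / 32 + 45 * e6 / 1024) * math.sin(2 * phi)
        + (15 * e4 / 256 + 45 * e6 / 1024) * math.sin(4 * phi)
        - (35 * e6 / 3072) * math.sin(6 * phi)
    )

    easting = K0 * n * (
        a + (1 - t + c) * a ** 3 / 6
        + (5 - 18 * t + t ** 2 + 72 * c - 58 * EP2) * a ** 5 / 120
    ) + 500000.0                                     # false easting
    northing = K0 * (
        m + n * tan_phi * (
            a ** 2 / 2
            + (5 - t + 9 * c + 4 * c ** 2) * a ** 4 / 24
            + (61 - 58 * t + t ** 2 + 600 * c - 330 * EP2) * a ** 6 / 720
        )
    )
    if lat_deg < 0:
        northing += 10000000.0                       # southern-hemisphere false northing
    return zone, easting, northing
```

Every step is a short chain of floating-point multiplies and transcendental calls with no dependence between points, which is why this workload maps well onto many GPU ALUs.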