• Title/Summary/Keyword: Parallel computer architecture

Search Result 231, Processing Time 0.027 seconds

Accelerating 2D DCT in Multi-core and Many-core Environments (멀티코어와 매니코어 환경에서의 2 차원 DCT 가속)

  • Hong, Jin-Gun;Jung, Sung-Wook;Kim, Cheong-Ghil;Burgstaller, Bernd
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2011.04a
    • /
    • pp.250-253
    • /
    • 2011
  • Chip manufacture nowadays turned their attention from accelerating uniprocessors to integrating multiple cores on a chip. Moreover desktop graphic hardware is now starting to support general purpose computation. Desktop users are able to use multi-core CPU and GPU as a high performance computing resources these days. However exploiting parallel computing resources are still challenging because of lack of higher programming abstraction for parallel programming. The 2-dimensional discrete cosine transform (2D-DCT) algorithms are most computational intensive part of JPEG encoding. There are many fast 2D-DCT algorithms already studied. We implemented several algorithms and estimated its runtime on multi-core CPU and GPU environments. Experiments show that data parallelism can be fully exploited on CPU and GPU architecture. We expect parallelized DCT bring performance benefit towards its applications such as JPEG and MPEG.

A Pipelined Parallel Optimized Design for Convolution-based Non-Cascaded Architecture of JPEG2000 DWT (JPEG2000 이산웨이블릿변환의 컨볼루션기반 non-cascaded 아키텍처를 위한 pipelined parallel 최적화 설계)

  • Lee, Seung-Kwon;Kong, Jin-Hyeung
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.46 no.7
    • /
    • pp.29-38
    • /
    • 2009
  • In this paper, a high performance pipelined computing design of parallel multiplier-temporal buffer-parallel accumulator is present for the convolution-based non-cascaded architecture aiming at the real time Discrete Wavelet Transform(DWT) processing. The convolved multiplication of DWT would be reduced upto 1/4 by utilizing the filter coefficients symmetry and the up/down sampling; and it could be dealt with 3-5 times faster computation by LUT-based DA multiplication of multiple filter coefficients parallelized for product terms with an image data. Further, the reutilization of computed product terms could be achieved by storing in the temporal buffer, which yields the saving of computation as well as dynamic power by 50%. The convolved product terms of image data and filter coefficients are realigned and stored in the temporal buffer for the accumulated addition. Then, the buffer management of parallel aligned storage is carried out for the high speed sequential retrieval of parallel accumulations. The convolved computation is pipelined with parallel multiplier-temporal buffer-parallel accumulation in which the parallelization of temporal buffer and accumulator is optimize, with respect to the performance of parallel DA multiplier, to improve the pipelining performance. The proposed architecture is back-end designed with 0.18um library, which verifies the 30fps throughput of SVGA(800$\times$600) images at 90MHz.

Inverse Kinematic and Dynamic Analyses of 6-DOF PUS Type parallel Manipulators

  • Kim, Jong-Phil;Jeha Ryu
    • Journal of Mechanical Science and Technology
    • /
    • v.16 no.1
    • /
    • pp.13-23
    • /
    • 2002
  • This paper presents inverse kinematic and dynamic analyses of HexaSlide type six degree-of-freedom parallel manipulators. The HexaSlide type parallel manipulators (HSM) can be characterized as an architecture with constant link lengths that are attached to moving sliders on the ground and to a mobile platform. In the inverse kinematic analyses, the slider and link motion (position, velocity, and acceleration) is computed given the desired mobile platform motion. Based on the inverse kinematic analysis, in order to compute the required actuator forces given the desired platform motion, inverse dynamic equations of motion of a parallel manipulator is derived by the Newton-Euler approach. In this derivation, the joint friction as well as all link inertia are included. Relative importance of the link inertia and joint frictions on the computed torque is investigated by computer simulations. It is expected that the inverse kinematic and dynamic equations can be used in the computed torque control and model-based adaptive control strategies.

Performance Evaluation and Verification of MMX-type Instructions on an Embedded Parallel Processor (임베디드 병렬 프로세서 상에서 MMX타입 명령어의 성능평가 및 검증)

  • Jung, Yong-Bum;Kim, Yong-Min;Kim, Cheol-Hong;Kim, Jong-Myon
    • Journal of the Korea Society of Computer and Information
    • /
    • v.16 no.10
    • /
    • pp.11-21
    • /
    • 2011
  • This paper introduces an SIMD(Single Instruction Multiple Data) based parallel processor that efficiently processes massive data inherent in multimedia. In addition, this paper implements MMX(MultiMedia eXtension)-type instructions on the data parallel processor and evaluates and analyzes the performance of the MMX-type instructions. The reference data parallel processor consists of 16 processors each of which has a 32-bit datapath. Experimental results for a JPEG compression application with a 1280x1024 pixel image indicate that MMX-type instructions achieves a 50% performance improvement over the baseline instructions on the same data parallel architecture. In addition, MMX-type instructions achieves 100% and 51% improvements over the baseline instructions in energy efficiency and area efficiency, respectively. These results demonstrate that multimedia specific instructions including MMX-type have potentials for widely used many-core GPU(Graphics Processing Unit) and any types of parallel processors.

A Computer Architecture Education Framework in IT Convergence Services Era (IT융합 서비스 환경을 위한 컴퓨터 아키텍쳐 교육 프레임워크)

  • Choi, Chang Yeol;Choi, Hwang Kyu
    • Journal of Information Technology and Architecture
    • /
    • v.10 no.1
    • /
    • pp.23-31
    • /
    • 2013
  • A rapid growth of IT convergence into different application areas draws a lot of interest in high performance platform and embedded system. Industry needs well educated computer professionals with the practical understanding on the emerging technologies and core issues of contemporary popular services. In this paper, we present an education framework for computer system architecture based on rigorous analyses of the characteristics of IT convergence services and information technology trends. The proposed framework puts emphasis on real-world and hands-on subjects related to multicore architecture, embedded system and parallel processing. We believe effective use in the development and management of computer system architecture courses encouraging both industries and students.

A Message Transfer Scheme for Efficient Message Passing in the Highly Parallel Computer SPAX (고속병렬컴퓨터(SPAX)에서의 효율적인 메시지 전달을 위한 메시지 전송 기법)

  • 모상만;신상석;윤석한;임기욱
    • Journal of the Korean Institute of Telematics and Electronics B
    • /
    • v.32B no.9
    • /
    • pp.1162-1170
    • /
    • 1995
  • In this paper, we present a message transfer scheme for efficient message passing in the hierarchically structured multiprocessor computer SPAX(Scalable Parallel Architecture computer based on X-bar network). The message transfer scheme provides interface not only with operating system but also with end users. In order to transfer two types of control message and data message efficiently, it supports both of memory-mapped transfer and DMA-based transfer. Dual-port RAMs are used as message buffers, and control and status registers provide efficient programming interface. Interlaced parity scheme is adopted for error control. If any error is detected at receiving node, errored packet is resent by sender according to retry mechanism. In conjunction with retry mechanism, watchdog timers are used to protect infinite waiting and repeated retry. The proposed message transfer scheme can be applied to input/output nodes and communication connection nodes as well as processing nodes in the SPAX.

  • PDF

The Design of Parallel Processing S/W Using CUDA for Realtime 3D Laser Ladar Imaging System (실시간 3차원 레이저 레이더 영상 생성을 위한 CUDA 기반 병렬처리 소프트웨어 설계)

  • Cho, Yong Il;Ha, Choong Lim;Yang, Ji Hyeon;Kim, Jae Hyup
    • Journal of the Korea Society of Computer and Information
    • /
    • v.18 no.1
    • /
    • pp.1-10
    • /
    • 2013
  • In this paper, we propose a CUDA(Common Unified Device Architecture) based SW(software) design method for CPU(Central Processing Unit) and GPU(Graphic Processing Unit) parallel structure to implement real-time process in 3D Laser ladar(LADAR) imaging system. LADAR is a complex system to generate 3-dimensional image based on the laser ranging information, and requires massive process resources in each phase. Therefore, designing and implementing parallel structure are crucial to realize a real-time process within limited system resource. As a conclusion, we can meet the speed of required real-time process allocating separable work load to CUDA GPU by analyzing process algorithm in each phase and confirm the process speed increase by 46%.

A Public Key knapsack Crytosystem Algorithm for Security in Computer Communication (컴퓨터 통신의 안전을 위한 공개키 배낭 암호계 앨고리듬)

  • 이영노;신인철
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.16 no.9
    • /
    • pp.893-900
    • /
    • 1991
  • And this system is compared with past knapsack system by implementation of low density attack in Brickell and Lagarias, Odlyzko’s method. Also the VLSI architecture for parallel implementation of this linearly shift knapsack system is presented

  • PDF

Performance Analysis of the XMESH Topology for the Massively Parallel Computer Architecture (대규모 병렬컴퓨터를 위한 교차메쉬구조 및 그의 성능해석)

  • 김종진;최흥문
    • Journal of the Korean Institute of Telematics and Electronics B
    • /
    • v.32B no.5
    • /
    • pp.720-729
    • /
    • 1995
  • We proposed a XMESH(crossed-mesh) topology as a suitable interconnection for the massively parallel computer architectures, and presented performance analysis of the proposed interconnection topology. Horizontally, the XMESH has the same links as those of the toroidal mesh(TMESH) or toroid, but vertically, it has diagonal cross links instead of the vertical links. It reveals desirable interconnection characteristics for the massively parallel computers as the number of nodes increases, while retaining the same structural advantages of the TMESH such as the symmetric structure, periodic placement of subsystems, and constant degree, which are highly recommended features for VLSI/WSI implementations. Furthermore, n*k XMESH can be easily expanded without increasing the diameter as long as n.leq.k.leq.n+4. Analytical performance evaluations show that the XMESH has a shorter diameter, a shorter mean internode distance, and a higher message completion rate than the TMESH or the diagonal mesh(DMESH). To confirm these results, an optimal self-routing algorithm for the proposed topology is developed and is used to simulate the average delay, the maximum delay, and the throughput in the presence of contention. In all cases, the XMESH is shown to outperform the TMESH and the DMESH regardless of the communication load conditions or the number of nodes of the networks, and can provide an attractive alternative to those networks in implementing massively parallel computers.

  • PDF

An Iterative Algorithm for the Bottom Up Computation of the Data Cube using MapReduce (맵리듀스를 이용한 데이터 큐브의 상향식 계산을 위한 반복적 알고리즘)

  • Lee, Suan;Jo, Sunhwa;Kim, Jinho
    • Journal of Information Technology and Architecture
    • /
    • v.9 no.4
    • /
    • pp.455-464
    • /
    • 2012
  • Due to the recent data explosion, methods which can meet the requirement of large data analysis has been studying. This paper proposes MRIterativeBUC algorithm which enables efficient computation of large data cube by distributed parallel processing with MapReduce framework. MRIterativeBUC algorithm is developed for efficient iterative operation of the BUC method with MapReduce, and overcomes the limitations about the storage size and processing ability caused by large data cube computation. It employs the idea from the iceberg cube which computes only the interesting aspect of analysts and the distributed parallel process of cube computation by partitioning and sorting. Thus, it reduces data emission so that it can reduce network overload, processing amount on each node, and eventually the cube computation cost. The bottom-up cube computation and iterative algorithm using MapReduce, proposed in this paper, can be expanded in various way, and will make full use of many applications.