• Title/Summary/Keyword: Parallel processor

Search Result 485, Processing Time 0.03 seconds

An Implementation and Verification of Performance Monitor for Parallel Signal Processing System (병렬신호처리시스템을 위한 성능 모니터의 구현 및 검증)

  • Lee Won-Joo;Kim Hyo-Nam
    • Journal of the Korea Society of Computer and Information
    • /
    • v.10 no.5 s.37
    • /
    • pp.313-322
    • /
    • 2005
  • In this paper, we implement and verify performance monitor for parallel signal processing system, using DSP Starter Kit(DSK) of which the basic Processor is TMS302C6711 chip. The key ideas of this performance monitor is, using Real Time Data Exchange(RTDX) for the Purpose of real-time data transfer and function of DSP/BIOS, the ability to measure the Performance measure like DSP workload, memory usage, and bridge traffic. In the simulation, FFT, 2D FFT, Matrix Multiplication, and Fir Filter, which are widely used DSP algorithms, have been employed. Using performance monitor and Code Composer Studio from Texas Instrument(Tl) , the result has been recorded according to different frequencies, data sizes, and buffer sizes for a single wave file. The accuracy of our performance monitor has been verified by comparing those recorded results.

  • PDF

A New Parallelizing Algorithm and Cell-based Hardware Architecture for High-speed Generation of Digital Hologram (디지털 홀로그램의 고속 생성을 위한 병렬화 알고리즘 및 셀 기반의 하드웨어 구조)

  • Seo, Young-Ho;Choi, Hyun-Jun;Yoo, Ji-Sang;Kim, Dong-Wook
    • Journal of Broadcast Engineering
    • /
    • v.16 no.1
    • /
    • pp.54-63
    • /
    • 2011
  • This paper proposes a new equation to calculate computer-generated hologram (CGH) in a high speed and its cell-based VLSI (veri large scale integrated circuit) architecture. After finding the calculational regularity in the horizontal or vertical direction from the basic CGH equation, we induce the new equation to calculate the horizontal or vertical hologram pixel values in parallel. We also propose the architecture of the CGH cell consisting of a initial parameter calculator and update-phase calculator(s) on the basis of the equation and implement them in hardware. Also we show a hardware architecture to parallelize the calculation in the horizontal direction by extending CGH. In the experiments we analyze the used hardware resources. These analyses makes it possible to select the amount of hardware to the precision of the results. Here, for the CGH kernel and the structure of the processor, we used the platform from our previous works.

Efficient 3D Modeling of CSEM Data (인공송신원 전자탐사 자료의 효율적인 3차원 모델링)

  • Jeong, Yong-Hyeon;Son, Jeong-Sul;Lee, Tae-Jong
    • 한국지구물리탐사학회:학술대회논문집
    • /
    • 2009.10a
    • /
    • pp.75-80
    • /
    • 2009
  • Despite its flexibility to complex geometry, three-dimensional (3D) electromagnetic(EM) modeling schemes using finite element method (FEM) have been faced to practical limitation due to the resulting large system of equations to be solved. An efficient 3D FEM modeling scheme has been developed, which can adopt either direct or iterative solver depending on the problems. The direct solver PARDISO can reduce the computing time remarkably by incorporating parallel computing on multi-core processor systems, which is appropriate for single frequency multi-source configurations. When limited memory, the iterative solver BiCGSTAB(1) can provide fast and stable convergence. Efficient 3D simulations can be performed by choosing an optimum solver depending on the computing environment and the problems to be solved. This modeling includes various types of controlled-sources and can be exploited as an efficient engine for 3D inversion.

  • PDF

Design and Performance Evaluation of MIN for Nonuniform Traffic (비균등 트래픽을 위한 MIN의 설계 및 성능 평가)

  • Choe, Chang-Hun;Kim, Seong-Cheon
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.37 no.6
    • /
    • pp.1-9
    • /
    • 2000
  • This paper presents a Cluster Oriented Multistage Interconnection Network called COMR. COMR can be constructed suitable for the parallel application with localized communication by providing the shortcut path inside the processor-memory cluster which has frequent data communication. We evaluate the performance of COMR with respect to probability of acceptance, bandwidth, cost-effectiveness and average distance under varying degrees of localized communication. According to the result of analysis for performance evaluation, COMR shows higher performance than the regular MINs of the same network size in the highly localized communication. In the worst case, the diameter of an N$\times$N COMR is only n+1 which has only one stage more as compared the MIN with the same network size. Therefore COMR can be used as an attractive interconnection network for parallel applications with not only the localized communication distribution but also the uniform distribution in shared-memory multiprocessor system.

  • PDF

Implementation of Viterbi Decoder on Massively Parallel GPU for DVB-T Receiver (DVB-T 수신기를 위한 대규모 병렬처리 GPU 기반의 비터비 복호기 구현)

  • Lee, KyuHyung;Lee, Ho-Kyoung;Heo, Seo Weon
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.50 no.9
    • /
    • pp.3-11
    • /
    • 2013
  • Recently, a plenty of researches have been conducted using the massively parallel processing of GPU for the implementation of communication system. In this paper, we tried to reduce software simulation time applying GPU with sliding block method to Viterbi decoder in DVB-T system which is one of European DTV standards. First of all, we implement DVB-T system by CPU and estimate cost time whereby the system processes one OFDM symbol. Secondly, we implement Viterbi decoder by software using NVIDIA's massive GPU processor. In our work, stream process method is applied to reduce the overhead for data transfer between CPU and GPU, as well as coalescing method to lower the global memory access time. In addition, data structure design method is used to maximize the shared memory usage. Consequently, our proposed method is approximately 11 times faster in 2K mode and 60 times faster in 8K mode for the process in Viterbi decoder.

OPENMP PARALLEL PERFORMANCE OF A CFD CODE ON MULTI-CORE SYSTEMS (멀티코어 시스템에서 쓰레드 수에 따른 CFD 코드의 OpenMP 병렬 성능)

  • Kim, J.K.;Jang, K.J.;Kim, T.Y.;Cho, D.R.;Kim, S.D.;Choi, J.Y.
    • Journal of computational fluids engineering
    • /
    • v.18 no.1
    • /
    • pp.83-90
    • /
    • 2013
  • OpenMP is becoming more and more useful as a simple parallel processing paradigm on SMP (Shared Memory Multi-Processors) computing environment with the development of multi-core processors. However, very few data is available publically regarding the OpenMP performance in CFD (Computational Fluid Dynamics). In the present study a CFD test suite is prepared for the performance evaluation of OpenMP on various multi-core systems. The test suite is composed of two-dimensional numerical simulations for inviscid/viscous and reacting/non-reacting flows using three different levels of grid systems. One to five test runs were carried out on various systems from dual-core dual threads to 16-core 32-threads systems by changing the number of threads engaged for each test up to 80. The results exhibit some interesting results and the lessons learned from the tests would be quite helpful for the further use of OpenMP for CFD studies using multi-core processor systems.

Implementation of Optimizing Compiler for Bus-based VLIW Processors (버스기반의 VLIW형 프로세서를 위한 최적화 컴파일러 구현)

  • Hong, Seung-Pyo;Moon, Soo-Mook
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.27 no.4
    • /
    • pp.401-407
    • /
    • 2000
  • Modern microprocessors exploit instruction-level parallel processing to increase the performance. Especially VLIW processors supported by the parallelizing compiler are used more and more in specific applications such as high-end DSP and graphic processing. Bus-based VLIW architecture was proposed for these specific applications and it was designed to reduce the overhead of forwarding unit and the instruction width. In this paper, a optimizing scheduling compiler developed for the proposed bus-based VLIW processor is introduced. First, the method to model interconnections between buses and resource usage patterns is described. Then, on the basis of the modeling, machine-dependent optimization techniques such as bus-to-register promotion, copy coalescing and operand substitution were implemented. Optimization techniques for general-purpose VLIW microprocessors such as selective scheduling and enhanced pipelining scheduling(EPS) were also implemented. The experiment result shows about 20% performance gain for multimedia application benchmarks.

  • PDF

The PALM system : Architecture and Network Performance (PALM시스템의 구조와 네트웍 성능)

  • Kim, Suk-Il
    • The Transactions of the Korea Information Processing Society
    • /
    • v.1 no.1
    • /
    • pp.105-113
    • /
    • 1994
  • This paper introduces the Parallel Advanced Loosely coupled Multiprocessor (PALM) architecture, which is based on HCH(m,p), where m is number of links per a communication processor (CP) and p is the number of application processors (APs) connected to the CP. communication links between a pair of CPs and/or between a CP and an AP, are made of dual-Port RAMs, which provide fast and reliable word-parallel communication between processors. Among the wide spectrum of HCH networks, HCH(m,2) is also known to be a cost optimal topology, such that HCH(m,2) consists of the largest number of APs retaining the minimal number of CPs and communication links. We also implement a testbed based on HCH(2,2). The experiment result shows that the small communication/computation ratio of the PALM system would realize fine-grain parallelism on message-passing MIMD systems.

  • PDF

Implementation of a Client Display Interface for Mobile Devices via Serial Transfer (모바일 직렬 전송방식의 클라이언트 디스플레이 인터페이스 구현)

  • Park Sang-Woo;Lee Yong-Hwan
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2006.05a
    • /
    • pp.522-525
    • /
    • 2006
  • Recently, mobile devices support multi-functions such as 3D game, wireless internet, moving pictures, DMB, GPS, and PMP. Bigger size of display device is indispensable to support these functions and higher speed of the interface is needed. However, conventional parallel interfaces between processor and display nodule are not competent enough for that high speed transfers. High-speed serial interface is beginning to appear as an alternative for parallel interface. The advantages of the serial interface are high bandwidth, small number of interconnections, low-power consumption, and good quality of electro-magnetic interference. In this paper, we implement serial interface and use it for a display module. LVDS is used for PHY layer and a defined packet is used for link layer. The feature of the implemented serial interface is the reduced number of interconnections with enough bandwidth.

  • PDF

Quality of Coverage Analysis on Distributed Stochastic Steady-State Simulations (분산 시뮬레이션에서의 Coverage 분석에 관한 연구)

  • Lee, Jong-Suk-R.;Park, Hyoung-Woo;Jeong, Hae-Duck-J.
    • The KIPS Transactions:PartA
    • /
    • v.9A no.4
    • /
    • pp.519-524
    • /
    • 2002
  • In this paper we study the qualify of sequential coverage analysis under a scenario of distributed stochastic simulation known as MRIP(Multiple Replications In Parallel) in terms of the confidence intervals of coverage and the speedup. The estimator based in the F-distribution was applied to the sequential coverage analysis of steady-state means. in simulations of the $M/M/1/{\infty},\;M/D/I/{\infty}\;and\;M/H_{2}/1/{\infty}$ queueing systems on a single processor and multiple processors. By using multiple processors under the MRIP scenario, the time for collecting many replications needed in sequential coverage analysis is reduced. One can also easily collect more replications by executing it in distributed computers or clusters linked by a local area network.