• Title/Summary/Keyword: Parallel processor

Design and Analysis of MPEG-2 MP@HL Decoder in Multi-Processor Environments

  • Yoo, Seung-Hwan;Lee, Hyun-Seung;Lee, Sang-Jo;Park, Rae-Hong;Kim, Do-Hyung
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2009.01a / pp.211-216 / 2009
  • As demand for high-definition television (HDTV) increases, real-time decoding of high-definition (HD) video becomes an important issue. HD video data is so large that real-time processing is difficult to implement, especially in software. To implement a fast Moving Picture Experts Group-2 (MPEG-2) decoder for HDTV, we compose five scenarios that use parallel processing techniques such as data decomposition, task decomposition, and pipelining. Assuming a multi-DSP (digital signal processor) environment, we analyze each scenario in three respects: decoding speed, L1 memory size, and bandwidth. By comparing the scenarios, we identify the most suitable one for each situation. We simulate the scenarios in dual-core and dual-CPU environments using OpenMP and analyze the simulation results.

Improving Performance of Large Sparse Linear System Solvers On Distributed Memory Systems By Asynchronous Algorithms

  • Park, Pil-Seong;Sin, Sun-Cheol
    • The KIPS Transactions: Part A / v.8A no.4 / pp.439-446 / 2001
  • The mainstream of parallel programming today uses synchronous algorithms, for which processor synchronization (for correct computation) and workload balance are essential. If the workload is not well balanced or heterogeneous clusters are used, the overall performance of the system depends on the slowest processor. Asynchronous iteration is a way to mitigate such problems, but most work so far has targeted shared memory systems. In this paper, we propose and implement a parallel solver for large sparse linear systems that improves performance on distributed memory systems such as clusters by using asynchronous iterations to reduce processor idle time as much as possible.
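
The idea of asynchronous iteration in the abstract above can be illustrated with a small serial simulation. This is only a sketch under stated assumptions, not the authors' solver: the test matrix, the block layout, the `periods` parameter (how often each simulated worker updates), and all function names are hypothetical, and no real message passing or distributed memory is involved. Each simulated worker relaxes its own block of rows using whatever values are currently in the shared vector, and no worker waits for a slower one.

```python
def make_system(n):
    """Build a small strictly diagonally dominant tridiagonal system A x = b."""
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        A[i][i] = 4.0
        if i > 0:
            A[i][i - 1] = -1.0
        if i < n - 1:
            A[i][i + 1] = -1.0
    x_true = [float(i + 1) for i in range(n)]
    b = [sum(A[i][j] * x_true[j] for j in range(n)) for i in range(n)]
    return A, b, x_true


def relax_rows(A, b, x_read, rows):
    """Jacobi update for the given rows, reading only from x_read."""
    n = len(b)
    return {
        i: (b[i] - sum(A[i][j] * x_read[j] for j in range(n) if j != i)) / A[i][i]
        for i in rows
    }


def async_solve(A, b, blocks, periods, ticks):
    """Simulate asynchronous iteration.

    Worker w updates its block only every periods[w] ticks and reads
    whatever is in the shared vector at that moment, so a slow worker
    supplies stale values instead of stalling the others.
    """
    x = [0.0] * len(b)
    for t in range(ticks):
        for w, rows in enumerate(blocks):
            if t % periods[w] == 0:
                for i, value in relax_rows(A, b, x, rows).items():
                    x[i] = value
    return x


A, b, x_true = make_system(8)
blocks = [range(0, 3), range(3, 6), range(6, 8)]
# The third worker is three times slower than the other two.
x = async_solve(A, b, blocks, periods=[1, 1, 3], ticks=200)
max_err = max(abs(xi - ti) for xi, ti in zip(x, x_true))
print("max error:", max_err)
```

The matrix is chosen to be strictly diagonally dominant because that is a standard sufficient condition under which this kind of iteration still converges when some values are stale; for general matrices convergence is not guaranteed.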

Transputer integrated environment and its application

  • 이효종;임훈철
    • Journal of the Korean Institute of Telematics and Electronics B / v.33B no.6 / pp.34-44 / 1996
  • A transputer is a powerful microprocessor that communicates with other processors over its own communication links and can be built into a powerful parallel computer system easily and at low cost. However, its use has been limited to a few people because of its dedicated programming language and complicated user interfaces. This paper presents programming tools that make a transputer system easy to use. They include a transputer integrated environment, which provides many development tools, and a transputer manager, which monitors and manages the system and the programs developed on it. We also implemented a graphics application program to test the feasibility and functionality of the tools; it gives programmers easy access to transputer networks and visualizes the performance status of each processor so that they can write efficient parallel code.

Scheduler for parallel processing with finely grained tasks

  • Hosoi, Takafumi;Kondoh, Hitoshi;Hara, Shinji
    • Institute of Control, Robotics and Systems: Conference Proceedings / 1991.10b / pp.1817-1822 / 1991
  • A method of reducing the overhead caused by processor synchronization and common memory accesses in finely grained tasks is described. We propose a scheduler that considers the preparation time during searching in order to minimize redundant accesses to shared memory. Since the suggested hardware (synchronizer) determines the access order of processors and performs bus arbitration simultaneously by folding the synchronization process into the bus arbitration process, the synchronization time vanishes. This synchronizer therefore has no overhead caused by processor synchronization [1]. The proposed scheduling algorithm itself runs in parallel. The processes share the upper bound found by each search, and the lower-bound function accounts for the preparation time so as to eliminate as many searches as possible. An application of the proposed method to a multi-DSP system that calculates inverse dynamics for robot arms showed that the sampling time can be reduced to half that of the conventional method.

Thermal Imager Implementation Using Infrared Sensor

  • Yu, W.K.;Yoon, E.S.;Kim, C.W.;Song, I.S.;Hong, S.M.
    • Proceedings of the KIEE Conference / 1992.07b / pp.1250-1254 / 1992
  • This paper describes the design and fabrication of a thermal imaging system based on the SPRITE (Signal PRocessing In The Element) detector, operating in the 3-12 micron band. The system consists of an afocal telescope, a scan unit containing the SPRITE detector, an electronic processor unit, and a cooler. The optical scan system, which uses a rotating polygon and an oscillating mirror, is a two-dimensional serial/parallel scan type using five elements of the detector. The electronic processor unit performs digital scan conversion to convert the parallel data stream into serial analog data compatible with conventional RS-170 video. The scan field of view is 40 × 26.7 and the MRTD (Minimum Resolvable Temperature Difference) is 0.6 K at 7.5 cycles/mm. The acquired thermal images indicate that the system performs satisfactorily.

Performance Comparison of Two Parallel LU Decomposition Algorithms on MasPar Machines

  • Kim, Yong-Tae
    • Journal of IKEEE / v.2 no.2 s.3 / pp.247-254 / 1998
  • This paper presents a performance study of two LU decomposition algorithms on two massively parallel SIMD machines: the 16K-processor MasPar MP-1 and the 4K-processor MasPar MP-2. It reports experimental results and analyzes the algorithms to explain them. While the blocked and nonblocked algorithms for LU decomposition have each been studied individually by others, we compare the two and identify the tradeoffs between them. Our analysis of the blocked algorithm shows how the block size affects the interprocessor communication cost and the memory read/write overhead. The analysis is useful for determining an optimum block size for the blocked algorithm.
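
To make the blocked versus nonblocked distinction in the abstract above concrete, here is a minimal serial sketch in plain Python. It is an illustration only and makes several assumptions not taken from the paper: there is no pivoting (so it is only safe for matrices such as diagonally dominant ones), the function names and the test matrix are hypothetical, and nothing here models the MasPar SIMD mapping or its communication costs. Both routines return L and U packed in one matrix, with the unit diagonal of L left implicit.

```python
def lu_unblocked(A):
    """Nonblocked (element-wise, right-looking) LU without pivoting."""
    n = len(A)
    A = [row[:] for row in A]
    for k in range(n):
        for i in range(k + 1, n):
            A[i][k] /= A[k][k]
            for j in range(k + 1, n):
                A[i][j] -= A[i][k] * A[k][j]
    return A


def lu_blocked(A, bs):
    """Blocked right-looking LU without pivoting, block size bs."""
    n = len(A)
    A = [row[:] for row in A]
    for k in range(0, n, bs):
        e = min(k + bs, n)
        # 1. Factor the panel (columns k..e-1, rows k..n-1).
        for p in range(k, e):
            for i in range(p + 1, n):
                A[i][p] /= A[p][p]
                for j in range(p + 1, e):
                    A[i][j] -= A[i][p] * A[p][j]
        # 2. Row block of U: forward substitution with the unit-lower L11.
        for p in range(k, e):
            for i in range(p + 1, e):
                for j in range(e, n):
                    A[i][j] -= A[i][p] * A[p][j]
        # 3. Trailing update A22 -= L21 * U12 (a matrix-matrix product).
        for i in range(e, n):
            for j in range(e, n):
                A[i][j] -= sum(A[i][p] * A[p][j] for p in range(k, e))
    return A


def reconstruct(LU):
    """Multiply the packed factors back together to check L * U."""
    n = len(LU)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for p in range(min(i, j) + 1):
                l_ip = 1.0 if p == i else LU[i][p]
                out[i][j] += l_ip * LU[p][j]
    return out


n = 6
# Hypothetical diagonally dominant test matrix.
A = [[1.0 / (1 + abs(i - j)) + (n if i == j else 0.0) for j in range(n)]
     for i in range(n)]
LU_plain = lu_unblocked(A)
LU_block = lu_blocked(A, bs=4)
diff = max(abs(LU_plain[i][j] - LU_block[i][j])
           for i in range(n) for j in range(n))
print("max difference between the two factorizations:", diff)
```

In the blocked variant most of the arithmetic sits in step 3, a matrix-matrix product over a whole block, so the block size `bs` controls how the work is split between the panel factorization and the bulk update. That is the kind of tradeoff the abstract refers to.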

A Study on the 32 bit RISC/DSP Microprocessor Appropriate for Embedded Systems

  • 유동열;문병인;홍종욱;이태영;이용석
    • Proceedings of the IEEK Conference / 1999.06a / pp.257-260 / 1999
  • We have designed a 32-bit RISC microprocessor with 16/32-bit fixed-point DSP functionality. This processor, called YRD-5, combines both general-purpose microprocessor and digital signal processor (DSP) functionality using the reduced instruction set computer (RISC) design principles. It has functional units for arithmetic operation, digital signal processing (DSP) and memory access. They operate in parallel in order to remove stall cycles after DSP and load/store instructions with one or more issue latency cycles. High performance was achieved with these parallel functional units while adopting a sophisticated 5-stage pipeline structure and an improved DSP unit.

Implementation of 2×2 MIMO LTE Base Station using GPU for SDR System

  • Lee, Seung Hak;Kim, Kyung Hoon;Ahn, Chi Young;Choi, Seung Won
    • Journal of Korea Society of Digital Industry and Information Management / v.8 no.4 / pp.91-98 / 2012
  • This paper implements a 2×2 MIMO Long Term Evolution (LTE) base station using software-defined radio (SDR) technology. The implemented software-based base station processes baseband signals on a graphics processing unit (GPU) and uses a USRP2 as its RF transceiver. The GPU is a high-speed parallel processor with the important advantage of a powerful C-based programming environment, the Compute Unified Device Architecture (CUDA). To guarantee real-time processing of LTE baseband signals, we implemented well-known signal processing algorithms, such as frame synchronization and ML detection, to run in parallel on the GPU.

A Study on Real Time Monitoring of Tool Breakage in Milling Operation Using a DSP

  • Baek, Dae-Kyun;Ko, Tae-Jo;Kim, Hee-Sool
    • Journal of the Korean Society for Precision Engineering / v.13 no.6 / pp.168-176 / 1996
  • A diagnosis system that monitors tool breakage and chipping in real time during face milling was developed using a DSP (digital signal processor) board. Autoregressive (AR) modelling and the band energy method were used to extract features of the tool state from cutting force signals. An artificial neural network embedded on the DSP board discriminates between patterns in the features obtained after signal processing. The features extracted by AR modelling indicate process malfunction more accurately than those from the band energy method, although AR modelling is slower to compute. From the processed features, we can construct a real-time diagnosis system that monitors malfunction using a DSP board with parallel processing capability.

Accelerating the Sweep3D for a Graphic Processor Unit

  • Gong, Chunye;Liu, Jie;Chen, Haitao;Xie, Jing;Gong, Zhenghu
    • Journal of Information Processing Systems / v.7 no.1 / pp.63-74 / 2011
  • As a powerful and flexible processor, the graphics processing unit (GPU) offers great capability for many high-performance computing applications. Sweep3D, which deterministically simulates single-group, time-independent discrete ordinates (Sn) neutron transport on a 3D Cartesian geometry, represents the key part of a real ASCI application. The wavefront process used for parallel computation in Sweep3D limits the number of concurrent threads on the GPU. In this paper, we present multi-dimensional optimization methods for Sweep3D that can be implemented efficiently on the fine-grained parallel architecture of the GPU. Our results show that the overall performance of Sweep3D on the CPU-GPU hybrid platform can be improved by up to 4.38 times compared with the CPU-based implementation.
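
The remark above that the wavefront process limits concurrency can be seen in a toy model. The sketch below is not Sweep3D and does not use a GPU. It assumes a hypothetical dependency pattern in which cell (i, j, k) needs its three upstream neighbours, and it replaces the transport computation with a simple stand-in recurrence. The function names are illustrative. Cells with the same diagonal index i + j + k are independent of each other, so each plane could in principle be processed by concurrent threads, but the planes themselves must run in order.

```python
def wavefronts(nx, ny, nz):
    """Group grid cells into planes by diagonal index d = i + j + k."""
    planes = [[] for _ in range(nx + ny + nz - 2)]
    for i in range(nx):
        for j in range(ny):
            for k in range(nz):
                planes[i + j + k].append((i, j, k))
    return planes


def sweep(nx, ny, nz):
    """Toy sweep: each cell's value is the sum of its upstream neighbours.

    The corner cell (0, 0, 0) is seeded with 1. This stand-in recurrence
    only mimics the data dependencies of a sweep, not the physics.
    """
    phi = {}
    for plane in wavefronts(nx, ny, nz):
        # Every cell in this plane depends only on earlier planes, so the
        # loop body could run concurrently for all cells of the plane.
        new_values = {}
        for (i, j, k) in plane:
            if (i, j, k) == (0, 0, 0):
                new_values[(i, j, k)] = 1
            else:
                new_values[(i, j, k)] = (phi.get((i - 1, j, k), 0)
                                         + phi.get((i, j - 1, k), 0)
                                         + phi.get((i, j, k - 1), 0))
        phi.update(new_values)
    return phi


sizes = [len(plane) for plane in wavefronts(3, 3, 3)]
print(sizes)  # [1, 3, 6, 7, 6, 3, 1]
phi = sweep(3, 3, 3)
print(phi[(2, 2, 2)])  # 90
```

For the 3 × 3 × 3 grid, only 1 to 7 of the 27 cells are available at any step, which illustrates why the available parallelism is small at the start and end of a sweep.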