• Title/Summary/Keyword: Parallel processor

Search Result 482, Processing Time 0.032 seconds

Design and Verification of High-Performance Parallel Processor Hardware for JPEG Encoder (JPEG 인코더를 위한 고성능 병렬 프로세서 하드웨어 설계 및 검증)

  • Kim, Yong-Min;Kim, Jong-Myon
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.6 no.2
    • /
    • pp.100-107
    • /
    • 2011
  • As the use of mobile multimedia devices is increasing in the recent year, the needs for high-performance multimedia processors are increasing. In this regard, we propose a SIMD (Single Instruction Multiple Data) based parallel processor that supports high-performance multimedia applications with low energy consumption. The proposed parallel processor consists of 16 processing elements(PEs) and operates on a 3-stage pipelining. Experimental results for the JPEG encoding algorithm indicate that the proposed parallel processor outperforms conventional parallel processors in terms of performance and energy efficiency. In addition, the proposed parallel processor architecture was developed and verified with verilog HDL and a FPGA prototype system.

A Parallel Processor System for Cultural Assets Image Retrieval (문화재 검색을 위한 병렬처리기 구조)

  • Yoon, Hee-Jun;Lee, Hyung;Han, Ki-Sun;Partk, Jong-Won
    • Journal of Korea Multimedia Society
    • /
    • v.1 no.2
    • /
    • pp.154-161
    • /
    • 1998
  • This paper proposes a parallel processor system which processes cultural assets image recognition and retrieval algorithm in real time. A serial algorithm which is developed for the parallel processor system is parallellized. The parallel processor system consists of a control unit, 100 PE(Processing Elements), and 10 Park's multi-access memory systems which has 11 memory modules per each one. The parallel processor system is simulated by CADENCE Verilog-XL which is a package for the hardware simulation. With the same simulated results as that of the serial algorithm, the speed ratio of the parallel algorithm to the serial one is 81. The parallel processor system we proposed is quite effective for cultural assets image processing.

  • PDF

Hardware Design and Implementation of a Parallel Processor for High-Performance Multimedia Processing (고성능 멀티미디어 처리용 병렬프로세서 하드웨어 설계 및 구현)

  • Kim, Yong-Min;Hwang, Chul-Hee;Kim, Cheol-Hong;Kim, Jong-Myon
    • Journal of the Korea Society of Computer and Information
    • /
    • v.16 no.5
    • /
    • pp.1-11
    • /
    • 2011
  • As the use of mobile multimedia devices is increasing in the recent year, the needs for high-performance multimedia processors are increasing. In this regard, we propose a SIMD (Single Instruction Multiple Data) based parallel processor that supports high-performance multimedia applications with low energy consumption. The proposed parallel processor consists of 16 processing elements (PEs) and operates on a 3-stage pipelining. Experimental results indicated that the proposed parallel processor outperforms conventional parallel processors in terms of performance. In addition, our proposed parallel processor outperforms commercial high-performance TI C6416 DSP in terms of performance (1.4-31.4x better) and energy efficiency (5.9-8.1x better) with same 130nm technology and 720 clock frequency. The proposed parallel processor was developed with verilog HDL and verified with a FPGA prototype system.

Realization of a Parallel Network System for Image Processing Techniques (영상 처리 기법을 위한 병렬화 네트워크 시스템의 구성)

  • 서원찬;조강현;김우열
    • Journal of Institute of Control, Robotics and Systems
    • /
    • v.6 no.6
    • /
    • pp.492-499
    • /
    • 2000
  • In this paper, realization techniques of the parallel processing and the parallel network system for image processing are described. The parallel image processing system is constructed by the characterization of image processing and processor. Several problems are solved to achieve effective parallel processing and processor networking with the particular properties of image processing, which are reduction of communication quantity, equalization of load and delay depreciation on communication. A parallel image input device is developed for the flexible networking of parallel image processing. An abnormal region detection algorithm which is the basic function in machine vision is applied to evaluate the constructed parallel image processing system. The performance and effectiveness of the system are confirmed by experiments.

  • PDF

Acceleration for Removing Sea-fog using Graphic Processors and Parallel Processing (그래픽 프로세서를 이용한 병렬연산 기반 해무 제거 고속화)

  • Kim, Young-doo;Kwak, Jae-min;Seo, Young-ho;Choi, Hyun-jun
    • Journal of Advanced Navigation Technology
    • /
    • v.21 no.5
    • /
    • pp.485-490
    • /
    • 2017
  • In this paper, we propose a technique for high speed removal of sea-fog using a graphic processor. This technique uses a host processor(CPU) and several graphics processors(GPU) capable of parallel processing to remove sea-fog from the input image. In the process of removing sea-fog, the dark channel extraction, the maximum brightness channel extraction, and the calculation of the transmission are performed by the host processor, and the process of refining the transmission by applying the bidirectional filter is performed in parallel through the graphic processor. To verify the proposed parallel processing method, three NVIDIA GTX 1070 GPUs were used to construct the verification environment. As a result, it takes about 140ms when implemented with one graphics processor, and 26ms when implemented using OpenMP and multiple GPGPUs. The proposed a parallel processing algorithm based on the graphics processor unit can be used for safe navigation, port control and monitoring system.

Design of a Dingle-chip Multiprocessor with On-chip Learning for Large Scale Neural Network Simulation (대규모 신경망 시뮬레이션을 위한 칩상 학습가능한 단일칩 다중 프로세서의 구현)

  • 김종문;송윤선;김명원
    • Journal of the Korean Institute of Telematics and Electronics B
    • /
    • v.33B no.2
    • /
    • pp.149-158
    • /
    • 1996
  • In this paper we describe designing and implementing a digital neural chip and a parallel neural machine for simulating large scale neural netsorks. The chip is a single-chip multiprocessor which has four digiral neural processors (DNP-II) of the same architecture. Each DNP-II has program memory and data memory, and the chip operates in MIMD (multi-instruction, multi-data) parallel processor. The DNP-II has the instruction set tailored to neural computation. Which can be sed to effectively simulate various neural network models including on-chip learning. The DNP-II facilitates four-way data-driven communication supporting the extensibility of parallel systems. The parallel neural machine consists of a host computer, processor boards, a buffer board and an interface board. Each processor board consists of 8*8 array of DNP-II(equivalently 2*2 neural chips). Each processor board acn be built including linear array, 2-D mesh and 2-D torus. This flexibility supports efficiency of mapping from neural network models into parallel strucgure. The neural system accomplishes the performance of maximum 40 GCPS(giga connection per second) with 16 processor boards.

  • PDF

New Parallel MDC FFT Processor for Low Computation Complexity (연산복잡도 감소를 위한 새로운 8-병렬 MDC FFT 프로세서)

  • Kim, Moon Gi;Sunwoo, Myung Hoon
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.52 no.3
    • /
    • pp.75-81
    • /
    • 2015
  • This paper proposed the new eight-parallel MDC FFT processor using the eight-parallel MDC architecture and the efficient scheduling scheme. The proposed FFT processor supports the 256-point FFT based on the modified radix-$2^6$ FFT algorithm. The proposed scheduling scheme can reduce the number of complex multipliers from eight to six without increasing delay buffers and computation cycles. Moreover, the proposed FFT processor can be used in OFDM systems required high throughput and low hardware complexity. The proposed FFT processor has been designed and implemented with a 90nm CMOS technology. The experimental result shows that the area of the proposed FFT processor is $0.27mm^2$. Furthermore, the proposed eight-parallel MDC FFT processor can achieve the throughput rate up to 2.7 GSample/s at 388MHz.

A Parallel-Architecture Processor Design for the Fast Multiplication of Homogeneous Transformation Matrices (Homogeneous Transformation Matrix의 곱셈을 위한 병렬구조 프로세서의 설계)

  • Kwon Do-All;Chung Tae-Sang
    • The Transactions of the Korean Institute of Electrical Engineers D
    • /
    • v.54 no.12
    • /
    • pp.723-731
    • /
    • 2005
  • The $4{\times}4$ homogeneous transformation matrix is a compact representation of orientation and position of an object in robotics and computer graphics. A coordinate transformation is accomplished through the successive multiplications of homogeneous matrices, each of which represents the orientation and position of each corresponding link. Thus, for real time control applications in robotics or animation in computer graphics, the fast multiplication of homogeneous matrices is quite demanding. In this paper, a parallel-architecture vector processor is designed for this purpose. The processor has several key features. For the accuracy of computation for real application, the operands of the processors are floating point numbers based on the IEEE Standard 754. For the parallelism and reduction of hardware redundancy, the processor takes column vectors of homogeneous matrices as multiplication unit. To further improve the throughput, the processor structure and its control is based on a pipe-lined structure. Since the designed processor can be used as a special purpose coprocessor in robotics and computer graphics, additionally to special matrix/matrix or matrix/vector multiplication, several other useful instructions for various transformation algorithms are included for wide application of the new design. The suggested instruction set will serve as standard in future processor design for Robotics and Computer Graphics. The design is verified using FPGA implementation. Also a comparative performance improvement of the proposed design is studied compared to a uni-processor approach for possibilities of its real time application.

Electronic Processor Design for Thermal Imager with Serial/Parallel Scan type (직병렬 주사방식 일정장비의 신호처리기 설계 연구)

  • 송인섭;유위경;윤은석;홍영철;홍석민
    • Journal of the Korean Institute of Telematics and Electronics B
    • /
    • v.31B no.1
    • /
    • pp.49-56
    • /
    • 1994
  • This paper describes the design principles and methods of electronic processor for thermal imager with the SPRITE detector, operating in the 8-12 micron band. The thermal imager consists of a optical scanner containing the detector and an electrical signal processor. The optical scanner utilizing rotating polygon and oscillating mirror, is 2-dimensional serial/parallel scan type using 5 elements of the detector. And the electronic processor has pre-processing of 5 chnanel's thermal signal from the detector, and performs digital scan conversion to reform the parallel data stream into serial analog data compatible with conventional RS-170 video. Through the designed electronic processor, we have acquired a satisfactory thermal image. And the MRTD (Minimum Resolvable Temperature Difference) is 0.5$^{\circ}$K at 7.5 cycles/mm.

  • PDF

A High-Speed 2-Parallel Radix-$2^4$ FFT Processor for MB-OFDM UWB Systems (MB-OFDM UWB 통신 시스템을 위한 고속 2-Parallel Radix-$2^4$ FFT 프로세서의 설계)

  • Lee, Jee-Sung;Lee, Han-Ho
    • Proceedings of the IEEK Conference
    • /
    • 2006.06a
    • /
    • pp.533-534
    • /
    • 2006
  • This paper presents the architecture design of a high-speed, low-complexity 128-point radix-$2^4$ FFT processor for ultra-wideband (UWB) systems. The proposed high-speed, low-complexity FFT architecture can provide a higher throughput rate and low hardware complexity by using 2-parallel data-path scheme and single-path delay-feedback (SDF) structure. This paper presents the key ideas applied to the design of high-speed, low-complexity FFT processor, especially that for achieving high throughput rate and reducing hardware complexity. The proposed FFT processor has been designed and implemented with the 0.18-m CMOS technology in a supply voltage of 1.8 V. The throughput rate of proposed FFT processor is up to 1 Gsample/s while it requires much smaller hardware complexity.

  • PDF