• Title/Summary/Keyword: Parallel processor

Search Result 482, Processing Time 0.023 seconds

A Reconfigurable Parallel Processor for Efficient Processing of Mobile Multimedia (모바일 멀티미디어의 효율적 처리를 위한 재구성형 병렬 프로세서의 구조)

  • Yoo, Se-Hoon;Kim, Ki-Chul;Yang, Yil-Suk;Roh, Tae-Moon
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.44 no.10
    • /
    • pp.23-32
    • /
    • 2007
  • This paper proposes a reconfigurable parallel processor architecture which can efficiently implement various multimedia applications, such as 3D graphics, H.264/H.263/MPEG-4, JPEG/JPEG2000, and MP3. The proposed architecture directly connects memories and processors so that memory access time and power consumption are reduced. It supports floating-point operations needed in the geometry stage of 3D graphics. It adopts partitioned SIMD to reduce hardware costs. Conditional execution of instructions is used for easy development of parallel algorithms.

Implementation of Ray Tracing Processor for the Parallel Processing (병렬처리를 위한 고속 Ray Tracing 프로세서의 설계)

  • Choe, Gyu-Yeol;Jeong, Deok-Jin
    • The Transactions of the Korean Institute of Electrical Engineers A
    • /
    • v.48 no.5
    • /
    • pp.636-642
    • /
    • 1999
  • The synthesis of the 3D images is the most important part of the virtual reality. The ray tracing is the best method for reality in the 3D graphics. But the ray tracing requires long computation time for the synthesis of the 3D images. So, we implement the ray tracing with software and hardware. Specially we design the hit-test unit with FPGA tool for the ray tracing. Hit-test unit is a very important part of ray tracing to improve the speed. In this paper, we proposed a new hit-test algorithm and apply the parallel architecture for hit-test unit to improve the speed. We optimized the arithmetic unit because the critical path of hit-test unit is in the multiplication part. We used the booth algorithm and the baugh-wooley algorithm to reduce the partial product and adapted the CSA and CLA to improve the efficiency of the partial product addition. Our new Ray tracing processor can produce the image about 512ms/F and can be adapted to real-time application with only 10 parallel processors.

  • PDF

Implementation of Embedded Micro Web Server for Web based Remote Hardware Control and Monitor (웹 기반 하드웨어 원격감시 및 제어를 위한 초소형 내장형 웹 서버 시스템의 구현)

  • Han, Kyong-Ho
    • Journal of the Korean Institute of Illuminating and Electrical Installation Engineers
    • /
    • v.20 no.6
    • /
    • pp.104-110
    • /
    • 2006
  • In this paper, we proposed the micro web-server implementation on Strong ARM processor with embedded Linux. The parallel port connecting parallel I/O is controlled via HTTP protocol and web browser program HTTP protocol with Linux, the micro web server program and port control program are installed on-board memory using CGI to be accessed by web browser. The processor parallel input port is monitored and parallel output port is controlled from remote hosts via HTTP protocol. The result of the proposed embedded micro-web server can be used in remote automation systems, distributed control via internet using web browser.

The Design and implementation of parallel processing system using the $Nios^{(R)}$ II embedded processor ($Nios^{(R)}$ II 임베디드 프로세서를 사용한 병렬처리 시스템의 설계 및 구현)

  • Lee, Si-Hyun
    • Journal of the Korea Society of Computer and Information
    • /
    • v.14 no.11
    • /
    • pp.97-103
    • /
    • 2009
  • In this thesis, we discuss the implementation of parallel processing system which is able to get a high degree of efficiency(size, cost, performance and flexibility) by using $Nios^{(R)}$ II(32bit RISC(Reduced Instruction Set Computer) processor) embedded processor in DE2-$70^{(R)}$ reference board. The designed Parallel processing system is master-slave, shared memory and MIMD(Mu1tiple Instruction-Multiple Data stream) architecture with 4-processor. For performance test of system, N-point FFT is used. The result is represented speed-up as follow; in the case of using 2-processor(core), speed-up is shown as average 1.8 times as 1-processor's. When 4-processor, the speed-up is shown as average 2.4 times as it's.

Analysis of Barrier Waiting Times in Data Parallel Programs (데이터 병렬 프로그램에서 배리어 대기시간의 분석)

  • Jung, In-Bum
    • Journal of Industrial Technology
    • /
    • v.21 no.A
    • /
    • pp.73-80
    • /
    • 2001
  • Barrier is widely used for synchronization in parallel programs. Since the process arrived earlier than others should wait at the barrier, the total processor utilization decreases. In this paper, to find the sources of the barrier waiting time, parallel programs are executed on the various grain sizes through execution-driven simulations. In simulation studies, we found that even if approximately equal amounts of work are distributed to each processor, all processes may not arrive at a barrier at the same time. The reasons are that the different numbers of cache misses and instructions within partitioned grains result in the difference in arrival time of processors at the barrier.

  • PDF

A Study on the Design of Format Converter for Pixel-Parallel Image Processing (픽셀-병렬 영상처리에 있어서 포맷 컨버터 설계에 관한 연구)

  • 김현기;김현호;하기종;최영규;류기환;이천희
    • Proceedings of the IEEK Conference
    • /
    • 2001.06b
    • /
    • pp.269-272
    • /
    • 2001
  • In this paper we proposed the format converter design and implementation for real time image processing. This design method is based on realized the large processor-per-pixel array by integrated circuit technology in which this two types of integrated structure is can be classify associative parallel processor and parallel process with DRAM cell. Layout pitch of one-bit-wide logic is identical memory cell pitch to array high density PEs in integrate structure. This format converter design has control path implementation efficiently, and can be utilized the high technology without complicated controller hardware. Sequence of array instruction are generated by host computer before process start, and instructions are saved on unit controller. Host computer is executed the pixel-parallel operation starting at saved instructions after processing start

  • PDF

Debugging of Parallel Programs using Distributed Cooperating Components

  • Mrayyan, Reema Mohammad;Al Rababah, Ahmad AbdulQadir
    • International Journal of Computer Science & Network Security
    • /
    • v.21 no.12spc
    • /
    • pp.570-578
    • /
    • 2021
  • Recently, in the field of engineering and scientific and technical calculations, problems of mathematical modeling, real-time problems, there has been a tendency towards rejection of sequential solutions for single-processor computers. Almost all modern application packages created in the above areas are focused on a parallel or distributed computing environment. This is primarily due to the ever-increasing requirements for the reliability of the results obtained and the accuracy of calculations, and hence the multiply increasing volumes of processed data [2,17,41]. In addition, new methods and algorithms for solving problems appear, the implementation of which on single-processor systems would be simply impossible due to increased requirements for the performance of the computing system. The ubiquity of various types of parallel systems also plays a positive role in this process. Simultaneously with the growing demand for parallel programs and the proliferation of multiprocessor, multicore and cluster technologies, the development of parallel programs is becoming more and more urgent, since program users want to make the most of the capabilities of their modern computing equipment[14,39]. The high complexity of the development of parallel programs, which often does not allow the efficient use of the capabilities of high-performance computers, is a generally accepted fact[23,31].

The Synthesizing Implementation of Iterative Algorithms on Processor Arrays (순환 알고리즘의 Processor Array에로의 합성 및 구현)

  • 이덕수;신동석
    • Journal of the Korean Institute of Navigation
    • /
    • v.14 no.4
    • /
    • pp.31-39
    • /
    • 1990
  • A systematic methodology for efficient implementation of processor arrays from regular iterative algorithms is proposed. One of the modern parallel processing array architectures is the Systolic arrays and we use it for processor arrays on this paper. On designing the systolic arrays, there are plenty of mapping functions which satisfy necessary conditions for its implementation to the time-space domain. In this paper, we sue a few conditions to reduce the total number of computable mapping functions efficiently. As a results of applying this methodology, efficient designs of systolic arrays could be done with considerable saving on design time and efforts.

  • PDF

Parallel Implementation and Performance Evaluation of the SIFT Algorithm Using a Many-Core Processor (매니코어 프로세서를 이용한 SIFT 알고리즘 병렬구현 및 성능분석)

  • Kim, Jae-Young;Son, Dong-Koo;Kim, Jong-Myon;Jun, Heesung
    • Journal of the Korea Society of Computer and Information
    • /
    • v.18 no.9
    • /
    • pp.1-10
    • /
    • 2013
  • In this paper, we implement the SIFT(Scale-Invariant Feature Transform) algorithm for feature point extraction using a many-core processor, and analyze the performance, area efficiency, and system area efficiency of the many-core processor. In addition, we demonstrate the potential of the proposed many-core processor by comparing the performance of the many-core processor with that of high-performance CPU and GPU(Graphics Processing Unit). Experimental results indicate that the accuracy result of the SIFT algorithm using the many-core processor was same as that of OpenCV. In addition, the many-core processor outperforms CPU and GPU in terms of execution time. Moreover, this paper proposed an optimal model of the SIFT algorithm on the many-core processor by analyzing energy efficiency and area efficiency for different octave sizes.

AI Processor Technology Trends (인공지능 프로세서 기술 동향)

  • Kwon, Youngsu
    • Electronics and Telecommunications Trends
    • /
    • v.33 no.5
    • /
    • pp.121-134
    • /
    • 2018
  • The Von Neumann based architecture of the modern computer has dominated the computing industry for the past 50 years, sparking the digital revolution and propelling us into today's information age. Recent research focus and market trends have shown significant effort toward the advancement and application of artificial intelligence technologies. Although artificial intelligence has been studied for decades since the Turing machine was first introduced, the field has recently emerged into the spotlight thanks to remarkable milestones such as AlexNet-CNN and Alpha-Go, whose neural-network based deep learning methods have achieved a ground-breaking performance superior to existing recognition, classification, and decision algorithms. Unprecedented results in a wide variety of applications (drones, autonomous driving, robots, stock markets, computer vision, voice, and so on) have signaled the beginning of a golden age for artificial intelligence after 40 years of relative dormancy. Algorithmic research continues to progress at a breath-taking pace as evidenced by the rate of new neural networks being announced. However, traditional Von Neumann based architectures have proven to be inadequate in terms of computation power, and inherently inefficient in their processing of vastly parallel computations, which is a characteristic of deep neural networks. Consequently, global conglomerates such as Intel, Huawei, and Google, as well as large domestic corporations and fabless companies are developing dedicated semiconductor chips customized for artificial intelligence computations. The AI Processor Research Laboratory at ETRI is focusing on the research and development of super low-power AI processor chips. In this article, we present the current trends in computation platform, parallel processing, AI processor, and super-threaded AI processor research being conducted at ETRI.