• Title/Summary/Keyword: Parallel processor


A Study on VLSI-Oriented 2-D Systolic Array Processor Design for APP (Algebraic Path Problem) (VLSI 지향적인 APP용 2-D SYSTOLIC ARRAY PROCESSOR 설계에 관한 연구)

  • 이현수;방정희
    • Journal of the Korean Institute of Telematics and Electronics B / v.30B no.7 / pp.1-13 / 1993
  • This paper investigates the problems of conventional special-purpose array processors, such as their lack of flexibility. It then proposes a modified methodology and applies it to obtain a common solution for three typical APP algorithms, SP (Shortest Path), TC (Transitive Closure), and MST (Minimum Spanning Tree), which are solved by similar methods. The newly proposed APP parallel algorithm allows real-time processing without structural enhancement or functional restriction. In addition, we design a 2-dimensional bit-parallel lower-triangular systolic array processor and a single PE (1-PE) in detail. For evaluation, we consider its computational complexity according to the bit-processing method and describe the relationship between total chip size and execution time. For large real-time data inputs, the proposed processor achieves an execution time of $3n-4$, which is optimal $O(n)$ time complexity, $O(n^{2})$ space complexity measured as the total number of gates, and a pipeline period of one.

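The "common solution" for SP, TC, and MST mentioned in the abstract above can be illustrated in software: all three fit one triple-loop recurrence that differs only in the pair of operations used. The following is a minimal sequential sketch, not the paper's systolic array design. The names `Semiring`, `algebraic_paths`, `SHORTEST_PATH`, `TRANSITIVE_CLOSURE`, and `MINIMAX` are hypothetical, and the sketch assumes idempotent operations and, for shortest paths, no negative cycles.

```python
from dataclasses import dataclass
from typing import Callable, Generic, List, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Semiring(Generic[T]):
    """Hypothetical container for the two operations of a path problem."""
    add: Callable[[T, T], T]  # chooses between alternative paths
    mul: Callable[[T, T], T]  # joins two consecutive paths
    zero: T                   # value meaning "no path"
    one: T                    # value of the empty path


def algebraic_paths(w: List[List[T]], s: Semiring) -> List[List[T]]:
    """Floyd-Warshall-style recurrence over an arbitrary idempotent semiring."""
    n = len(w)
    d = [row[:] for row in w]  # work on a copy of the weight matrix
    for i in range(n):
        d[i][i] = s.add(d[i][i], s.one)  # every vertex reaches itself
    for k in range(n):
        for i in range(n):
            for j in range(n):
                # Either keep the current i->j value or route through k.
                d[i][j] = s.add(d[i][j], s.mul(d[i][k], d[k][j]))
    return d


INF = float("inf")

# Shortest path: (min, +).
SHORTEST_PATH = Semiring(add=min, mul=lambda a, b: a + b, zero=INF, one=0.0)
# Transitive closure: (or, and).
TRANSITIVE_CLOSURE = Semiring(add=lambda a, b: a or b,
                              mul=lambda a, b: a and b,
                              zero=False, one=True)
# Minimax (bottleneck) paths: (min, max). With distinct edge weights, an edge
# belongs to the minimum spanning tree when its weight equals this value.
MINIMAX = Semiring(add=min, mul=max, zero=INF, one=-INF)
```

Passing a different `Semiring` to the same function switches between the three problems, which is the property a single array design can exploit.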

3D graphics processor architecture based on multistreaming (다중스트리밍을 이용한 3차원 그래픽 프로세서 구조)

  • 박용진;이동호
    • Journal of the Korean Institute of Telematics and Electronics C / v.34C no.9 / pp.10-21 / 1997
  • In this paper, we propose multistreaming with multiple instruction issue as a processor architecture for 3D graphics processors. Multistreaming can eliminate interference among concurrently executing instructions in the pipelined processor, allowing enough parallelism for parallel processing. Through a cycle-level simulation study, we show that the proposed architecture outperforms a conventional RISC processor, the MIPS R3000, by a factor of three with reasonable resource overhead. A multistreaming processor with multiple instruction issue will be a good architecture for an instruction processor when a large number of threads is guaranteed.


Design of an Image Processing ASIC Architecture using Parallel Approach with Zero or Little (통신부담을 감소시킨 영상처리를 위한 병렬처리 방식 ASIC구조 설계)

  • 안병덕;정지원;선우명훈
    • The Journal of Korean Institute of Communications and Information Sciences / v.19 no.10 / pp.2043-2052 / 1994
  • This paper proposes a new parallel ASIC architecture for real-time image processing, called the Sliding Memory Plane (SliM) Image Processor, that reduces inter-processing element (inter-PE) communication overhead. The SliM Image Processor consists of $3\times3$ processing elements (PEs) connected in a mesh topology. Because the topology scales easily, a set of SliM Image Processors can form a mesh-connected SIMD parallel architecture called the SliM Array Processor. The idea of sliding means that all pixels are slid into all neighboring PEs without interrupting the PEs and without a coprocessor or a DMA controller. Since inter-PE communication and computation occur simultaneously, the inter-PE communication overhead, a significant disadvantage of existing machines, greatly diminishes. Two I/O planes provide a buffering capability and reduce the data I/O overhead. In addition, a by-passing path provides eight-way connectivity even with four links. With these salient features, SliM shows a significant performance improvement. This paper presents the architectures of a PE and the SliM Image Processor and describes the design of an instruction set.


Novel Parallel Approach for SIFT Algorithm Implementation

  • Le, Tran Su;Lee, Jong-Soo
    • Journal of information and communication convergence engineering / v.11 no.4 / pp.298-306 / 2013
  • The scale invariant feature transform (SIFT) is an effective algorithm used in object recognition, panorama stitching, and image matching. However, due to its complexity, real-time processing is difficult to achieve with current software approaches. The increasing availability of parallel computers makes parallelizing these tasks an attractive approach. This paper proposes a novel parallel approach for SIFT algorithm implementation using a block filtering technique in a Gaussian convolution process on the SIMD Pixel Processor. This implementation fully exposes the available parallelism of the SIFT algorithm process and exploits the processing and input/output capabilities of the processor, which results in a system that can perform real-time image and video compression. We apply this implementation to images and measure the effectiveness of such an approach. Experimental simulation results indicate that the proposed method is capable of real-time applications, and the result of our parallel approach is outstanding in terms of the processing performance.

Multicore Processor based Parallel SVM for Video Surveillance System (비디오 감시 시스템을 위한 멀티코어 프로세서 기반의 병렬 SVM)

  • Kim, Hee-Gon;Lee, Sung-Ju;Chung, Yong-Wha;Park, Dai-Hee;Lee, Han-Sung
    • Journal of the Korea Institute of Information Security & Cryptology / v.21 no.6 / pp.161-169 / 2011
  • Recent intelligent video surveillance systems call for more advanced technology for the analysis and recognition of video data. In particular, machine learning algorithms such as the Support Vector Machine (SVM) are used to recognize objects in video. Because SVM training demands a massive amount of computation, parallel processing techniques are necessary to reduce the execution time effectively. In this paper, we propose a parallel processing method for SVM training on a multi-core processor. The results of parallel SVM on a 4-core processor show that our proposed method can reduce the execution time of sequential training by a factor of 2.5.
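
To put the reported figure in context, the usual definitions of speedup and parallel efficiency can be applied to it. The symbols $S$, $E$, $T_1$, $T_p$, and $p$ are notation assumed here, not taken from the abstract: $T_1$ is the sequential training time, $T_p$ the time on $p$ cores.

$$S = \frac{T_1}{T_p} = 2.5, \qquad E = \frac{S}{p} = \frac{2.5}{4} = 0.625$$

Under these definitions, the reported result corresponds to a parallel efficiency of about 62.5% on the 4-core processor.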

Parallel Performance of Preconditioned Navier-Stokes Code on Myrinet Environment (Myrinet 환경에서 예조건화 Navier-Stokes 코드의 병렬처리 성능)

  • Kim M.-H.;Lee G. S.;Choi J.-Y.;Kim K. S.;Kim S.-L.;Jeung I.-S.
    • Proceedings of the Korean Society of Computational Fluids Engineering Conference (한국전산유체공학회:학술대회논문집) / 2001.05a / pp.149-154 / 2001
  • The parallel performance of a Myrinet-based PC cluster was tested and compared with that of a conventional Fast-Ethernet system. A preconditioned Navier-Stokes code was parallelized with a domain decomposition technique and used for the parallel performance test. The speed-up ratio was examined as the major performance parameter, as a function of the number of processors and the network topology. As expected, the Myrinet system shows superior parallel performance to the Fast-Ethernet system, even with a single network adapter for a dual-processor SMP machine. A test of the dependency on problem size also shows that network communication speed is a crucial factor for parallelized computational fluid dynamics analysis and that the Myrinet system is a plausible candidate for a high-performance parallel computing system.


Parallel Implementation Strategy for Content Based Video Copy Detection Using a Multi-core Processor

  • Liao, Kaiyang;Zhao, Fan;Zhang, Mingzhu
    • KSII Transactions on Internet and Information Systems (TIIS) / v.8 no.10 / pp.3520-3537 / 2014
  • Video copy detection methods have emerged in recent years for a variety of applications. However, the inefficiency of typical retrieval systems restricts their use. In this paper, we propose a parallel implementation strategy for content-based video copy detection (CBCD) using a multi-core processor. This strategy supports video copy detection effectively, and the processing time tends to decrease linearly as the number of processors increases. Experiments show that our approach succeeds in speeding up computation while maintaining detection performance.

An Efficient Central Queue Management Algorithm for High-speed Parallel Packet Filtering (고속 병렬 패킷 여과를 위한 효율적인 단일버퍼 관리 방안)

  • 임강빈;박준구;최경희;정기현
    • Journal of the Institute of Electronics Engineers of Korea TC / v.41 no.7 / pp.63-73 / 2004
  • This paper proposes an efficient centralized single-buffer management algorithm that arbitrates access contention among processors in a multi-processor system for high-speed packet filtering, and demonstrates that the algorithm provides reasonable performance by implementing it and applying it to a real multi-processor system. The multi-processor system for parallel packet filtering is modeled on a network processor and distributes the packet filtering rules across the processors to speed up filtering. We varied the number of processors and the processing time of the filtering rules and measured the packet transfer rates to investigate the performance of the proposed algorithm.

Shortest Path Calculation Using Parallel Processor System (병력구조 전산기를 이용한 최단 경로 계산)

  • 서창진;이장규
    • The Transactions of the Korean Institute of Electrical Engineers / v.34 no.6 / pp.230-237 / 1985
  • Shortest path calculations for a large-scale network have to be performed using a decomposition technique, since the calculations require a memory size that grows with the square of the number of vertices in the network. The calculation time also grows with the cube of the number of vertices. In the decomposition technique, the network is broken into a number of smaller subnetworks, and shortest paths are computed for each of them. A union of the solutions provides the solution for the original network. In all of the decomposition algorithms developed up to now, the boundary vertices that divide all the subnetworks have to be included when computing shortest paths for each subnetwork. In this paper, an improved algorithm is developed to reduce the number of boundary vertices involved: only those boundary vertices that are directly connected to the subnetwork are included. The algorithm is suitable for real-time computation on a parallel processor system consisting of a number of micro-computers or processors. The algorithm has been applied to a 39-vertex network and a 232-vertex network. The results show that it is efficient and performs better than any other algorithm. A parallel processor system has been built using an MZ-80 micro-computer and two Z-80 microprocessor kits. The former is used as the master processor and the latter as slave processors. The algorithm is embedded in the system and proven effective for real-time shortest path computations.


A Scheduling Method on Parallel Computation Models with Limited Number of Processors Using Genetic Algorithms (프로세서의 수가 한정되어있는 병렬계산모델에서 유전알고리즘을 이용한 스케쥴링해법)

  • 성기석;박지혁
    • Journal of the Korean Operations Research and Management Science Society / v.23 no.2 / pp.15-27 / 1998
  • In parallel processing systems, a compiler partitions a loaded program into tasks, allocates the tasks to multiple processors, and schedules the tasks on each allocated processor. In this paper, we suggest a Genetic Algorithm (GA) based scheduling method to find an optimal allocation and sequence of tasks on each processor. The suggested method uses a chromosome consisting of a task sequence and a binary string, which represent the order and the number of tasks on each processor, respectively. Two correction algorithms are used to maintain the precedence constraints of the tasks in the chromosome. This scheduling method determines the optimal number of processors within the given limit and then finds the optimal schedule for each processor. A result from a computational experiment with the suggested method is given.

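
The chromosome described in the abstract above, a task sequence paired with a binary string, can be made concrete with a small decoding sketch. The abstract does not specify the exact encoding, so this is one plausible reading and not the authors' method: it assumes the binary string has one bit per gap between consecutive tasks, and that a 1 starts a new processor. The names `decode_chromosome` and `max_processor_load` are hypothetical, and precedence constraints and the correction algorithms are deliberately left out.

```python
from typing import Dict, List, Sequence


def decode_chromosome(task_order: Sequence[int],
                      cuts: Sequence[int]) -> List[List[int]]:
    """Split an ordered task sequence into per-processor task lists.

    Assumed encoding: cuts[i] == 1 means a new processor starts
    between task_order[i] and task_order[i + 1].
    """
    if not task_order:
        return []
    if len(cuts) != len(task_order) - 1:
        raise ValueError("cuts must have one bit per gap between tasks")
    schedule: List[List[int]] = [[task_order[0]]]
    for task, cut in zip(task_order[1:], cuts):
        if cut:
            schedule.append([task])    # open the next processor
        else:
            schedule[-1].append(task)  # continue on the current processor
    return schedule


def max_processor_load(schedule: List[List[int]],
                       duration: Dict[int, float]) -> float:
    """Largest total task time on any processor (ignores precedence waits)."""
    return max(sum(duration[t] for t in tasks) for tasks in schedule)


# Example: four tasks, one cut after the second task, so two processors.
example = decode_chromosome([2, 0, 3, 1], [0, 1, 0])
```

With this encoding, the number of 1 bits plus one gives the number of processors used, so a GA can search over processor counts and task orders at the same time.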