• Title/Summary/Keyword: Parallel processor


Real-Time Implementation of the Relative Position Estimation Algorithm Using the Aerial Image Sequence (항공영상에서 상대 위치 추정 알고리듬의 실시간 구현)

  • Park, Jae-Hong;Kim, Gwan-Seok;Kim, In-Cheol;Park, Rae-Hong;Lee, Sang-Uk
    • Journal of the Institute of Electronics Engineers of Korea SP / v.39 no.3 / pp.66-77 / 2002
  • This paper deals with an implementation of the navigation parameter extraction technique using the TMS320C80 multimedia video processor (MVP). In particular, it focuses on the relative position estimation algorithm, which plays an important role in the real-time operation of the overall system. Based on the relative position estimation algorithm using images obtained at two locations, we develop a fast algorithm that substantially reduces computation time and is suited to fixed-point processors. The algorithm is then reconfigured for parallel processing on the four parallel processors in the MVP. As a result, we demonstrate that the navigation parameter extraction system employing the MVP can operate at the full frame rate, satisfying the real-time requirement of the overall system.

Design and Performance Analysis of a Parallel Optimal Branch-and-Bound Algorithm for MIN-based Multiprocessors (MIN-based 다중 처리 시스템을 위한 효율적인 병렬 Branch-and-Bound 알고리즘 설계 및 성능 분석)

  • Yang, Myung-Kook
    • Journal of IKEEE / v.1 no.1 s.1 / pp.31-46 / 1997
  • In this paper, a parallel Optimal Best-First search Branch-and-Bound (B&B) algorithm (pobs) is designed and evaluated for MIN-based multiprocessor systems. The proposed algorithm decomposes a problem into G subproblems, where each subproblem is processed on a group of P processors. Each processor group uses the sub-Global Best-First search technique to find a local solution. The local solutions are broadcast through the network to compute the global solution. This broadcast provides not only the comparison of the G local solutions but also load balancing among the processor groups. A performance analysis is then conducted to estimate the speed-up of the proposed parallel B&B algorithm. The analytical model is developed based on the probabilistic properties of the B&B algorithm. It considers both computation time and communication overhead to evaluate the realistic performance of the algorithm in a parallel processing environment. To validate the proposed evaluation model, a simulation of the parallel B&B algorithm on a MIN-based system is also carried out. The results from analysis and simulation match closely. It is also shown that the proposed Optimal Best-First search B&B algorithm performs better than other reported schemes owing to several advantageous features: fewer subproblem evaluations, proper load balancing, and a limited scope of remote communication.


Color Media Instructions for Embedded Parallel Processors (임베디드 병렬 프로세서를 위한 칼라미디어 명령어 구현)

  • Kim, Cheol-Hong;Kim, Jong-Myon
    • Journal of KIISE:Computer Systems and Theory / v.35 no.7 / pp.305-317 / 2008
  • As the mobile computing environment changes rapidly, increasing user demand for multimedia-over-wireless capabilities on embedded processors places constraints on performance, power, and size. In this regard, this paper proposes color media instructions (CMI) for single instruction, multiple data (SIMD) parallel processors to meet the computational requirements and cost goals. While existing multimedia extensions store and process four 8-bit pixels in a 32-bit register, CMI, which takes into account that color components are perceptually less significant, supports parallel operations on two packed, compressed 16-bit YCbCr values (6-bit Y and 5-bit Cb and Cr) in a 32-bit datapath processor. This provides greater concurrency and efficiency for YCbCr data processing. Moreover, the smaller data format reduces system cost, and the reduction in data bandwidth simplifies system design. Experimental results on a representative SIMD parallel processor architecture show that CMI achieves an average speedup of 6.3x over the baseline SIMD parallel processor performance. In contrast, MMX (a representative Intel multimedia extension) achieves an average speedup of only 3.7x over the same baseline SIMD architecture. CMI also outperforms MMX in both area efficiency (a 52% increase versus a 13% increase) and energy efficiency (a 50% increase versus an 11% increase). CMI improves performance and efficiency with a mere 3% increase in system area and a 5% increase in system power, while MMX requires a 14% increase in system area and a 16% increase in system power.
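
The packed data format described in this abstract can be illustrated with a minimal Python sketch. The bit layout (Y in the top 6 bits, then Cb, then Cr) and all function names are assumptions made for illustration; the abstract does not specify them, and the sketch shows only the storage format, not the CMI instructions themselves.

```python
def pack_ycbcr16(y, cb, cr):
    """Quantize 8-bit Y, Cb, Cr to 6/5/5 bits and pack them into 16 bits.

    Assumed layout (not given in the abstract): [Y:6 | Cb:5 | Cr:5].
    """
    return ((y >> 2) << 10) | ((cb >> 3) << 5) | (cr >> 3)


def unpack_ycbcr16(pixel):
    """Return the quantized (Y, Cb, Cr) fields of a 16-bit packed pixel."""
    return (pixel >> 10) & 0x3F, (pixel >> 5) & 0x1F, pixel & 0x1F


def pack_pair(pixel0, pixel1):
    """Place two 16-bit pixels side by side in one 32-bit word."""
    return ((pixel1 & 0xFFFF) << 16) | (pixel0 & 0xFFFF)


def unpack_pair(word):
    """Split a 32-bit word back into its two 16-bit pixels."""
    return word & 0xFFFF, (word >> 16) & 0xFFFF


if __name__ == "__main__":
    p0 = pack_ycbcr16(128, 64, 200)
    p1 = pack_ycbcr16(255, 255, 255)
    word = pack_pair(p0, p1)
    print(hex(word), [unpack_ycbcr16(p) for p in unpack_pair(word)])
```

Giving Y one more bit than each chroma component reflects the abstract's point that color components are perceptually less significant.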

A PARALLEL FINITE ELEMENT ALGORITHM FOR SIMULATION OF THE GENERALIZED STOKES PROBLEM

  • Shang, Yueqiang
    • Bulletin of the Korean Mathematical Society / v.53 no.3 / pp.853-874 / 2016
  • Based on a particular overlapping domain decomposition technique, a parallel finite element discretization algorithm for the generalized Stokes equations is proposed and investigated. In this algorithm, each processor computes a local approximate solution in its own subdomain by solving a global problem on a mesh that is fine around its own subdomain and coarse elsewhere, and hence avoids communication with other processors in the process of computations. This algorithm has low communication complexity. It only requires the application of an existing sequential solver on the global meshes associated with each subdomain, and hence can reuse existing sequential software. Numerical results are given to demonstrate the effectiveness of the parallel algorithm.

A Cooperative Parallel Tabu Search and Its Experimental Evaluation

  • Matsumura, Takashi;Nakamura, Morikazu;Tamaki, Shiro;Onaga, Kenji
    • Proceedings of the IEEK Conference / 2000.07a / pp.245-248 / 2000
  • This paper proposes a cooperative parallel tabu search in which processors exchange historical information in addition to carrying out their own searches. We investigate the effect of the proposed cooperative parallel tabu search by comparing it with a serial tabu search. We also propose two extensions of the cooperative parallel tabu search: the cooperative construction of tabu memory and the selection of cooperative partners. Through computational experiments, we observe that the proposed method improves the solutions.


A Study on the Pixel-Parallel Usage Processing Using the Format Converter (포맷 변환기를 이용한 화소-병렬 화상처리에 관한 연구)

  • Kim, Hyeon-Gi;Lee, Cheon-Hui
    • The KIPS Transactions:PartA / v.9A no.2 / pp.259-266 / 2002
  • In this paper, we implement various image processing filters using a format converter. The design method is based on realizing a large processor-per-pixel array with integrated circuit technology. The integrated structures can be classified into two types: associative parallel processors and parallel-processing DRAM (or SRAM) cells. The layout pitch of the one-bit-wide logic is identical to the memory cell pitch, so that PEs can be arrayed at high density in the integrated structure. The format converter design implements the control path efficiently and can take advantage of advanced technology without complicated controller hardware. The sequence of array instructions is generated by the host computer before processing starts, and the instructions are stored in the unit controller. After processing starts, the pixel-parallel operations are executed from the stored instructions. We obtained three results: 1) simple smoothing suppresses higher spatial frequencies, reducing noise but also blurring edges; 2) a smoothing and segmentation process reduces noise while preserving sharp edges; and 3) median filtering may be applied to reduce image noise. Median filtering eliminates spikes while maintaining sharp edges and preserving monotonic variations in pixel values.
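
The claim that median filtering removes spikes while keeping sharp edges can be checked with a small software sketch. This is a plain sequential 3x3 median filter in Python, not the pixel-parallel hardware described in the abstract; the function name and the edge-replication border handling are assumptions made for illustration.

```python
from statistics import median


def median_filter3x3(img):
    """Apply a 3x3 median filter to a 2D list of pixel values.

    Border pixels are handled by replicating the nearest edge pixel
    (an assumption; the abstract does not describe border handling).
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            window = [
                img[min(max(r + dr, 0), h - 1)][min(max(c + dc, 0), w - 1)]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
            ]
            out[r][c] = median(window)  # middle value of the 9 samples
    return out


if __name__ == "__main__":
    spike = [[10, 10, 10], [10, 255, 10], [10, 10, 10]]
    edge = [[0, 0, 100, 100] for _ in range(4)]
    print(median_filter3x3(spike))  # the isolated spike is removed
    print(median_filter3x3(edge))   # the vertical step edge is unchanged
```

An isolated spike never reaches the middle of the sorted window, so it is discarded, whereas a step edge always leaves a majority of samples on the correct side and passes through unchanged.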

Four Consistency Levels in Trigger Processing (트리거 처리 4 단계 일관성 레벨)

  • ;Eric Hanson
    • Journal of KIISE:Databases / v.29 no.6 / pp.492-501 / 2002
  • An asynchronous trigger processor (ATP) is a software system that processes triggers after update transactions to databases are complete. In an ATP, discrimination networks are used to check the trigger conditions efficiently. Discrimination networks store their internal states in memory nodes. TriggerMan is an ATP and uses a Gator network as the discrimination network. Changes in databases are delivered to TriggerMan in the form of tokens. Processing tokens against a Gator network updates the memory nodes of the network and checks the condition of the trigger for which the network is built. Parallel token processing is one of the methods that can improve system performance. However, uncontrolled parallel processing breaks the semantic consistency of trigger processing. In this paper, we propose four trigger processing consistency levels that allow parallel token processing with minimal anomalies. For each consistency level, a parallel token processing technique is developed. The techniques are proven to be valid and are also applicable to materialized view maintenance.

Tile Partitioning-based HEVC Parallel Decoding Optimization for Asymmetric Multicore Processor (비대칭 멀티코어 시스템 상의 HEVC 병렬 디코딩 최적화를 위한 타일 분할 기법)

  • Ryu, Yeongil;Roh, Hyun-Joon;Ryu, Eun-Seok
    • Journal of KIISE / v.43 no.9 / pp.1060-1065 / 2016
  • Recently, the need for parallel UHD video processing has been growing, and computing systems with asymmetric processors such as ARM big.LITTLE are increasingly used. Thus, a new parallel UHD video processing method optimized for asymmetric multicore systems is needed. This paper proposes a novel HEVC tile partitioning method for parallel processing based on an analysis of the computational power of asymmetric multicores. The proposed method analyzes (1) the computing power of the asymmetric multicores and (2) a regression model of computational complexity per video resolution. Finally, the method (3) determines the optimal HEVC tile resolution for each core and partitions and allocates the tiles to suitable cores. The proposed method minimizes the gap in decoding time between the fastest CPU core and the slowest CPU core. Experimental results with the official 4K UHD test sequences show an average 20% improvement in decoding speedup on the ARM asymmetric multicore system.
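
The idea of sizing tiles to match core speed can be sketched as follows. This is a simplified illustration, not the paper's method: it assumes decoding cost is proportional to tile width (the paper uses a regression model of complexity per resolution), splits the frame into vertical column tiles only, and aligns tile boundaries to an assumed 64-pixel CTU. The function name and the relative speed values are hypothetical.

```python
def partition_ctu_columns(frame_width, core_speeds, ctu=64):
    """Split a frame into column tiles sized in proportion to core speed.

    Returns the number of CTU columns given to each core. Uses integer
    arithmetic with largest-remainder rounding so the counts always sum
    to the total number of CTU columns.
    """
    total_ctus = -(-frame_width // ctu)  # ceiling division
    total_speed = sum(core_speeds)
    cols = [s * total_ctus // total_speed for s in core_speeds]
    remainders = [s * total_ctus % total_speed for s in core_speeds]
    leftover = total_ctus - sum(cols)
    # Hand the remaining columns to the cores with the largest remainders.
    order = sorted(range(len(cols)), key=lambda i: remainders[i], reverse=True)
    for i in order[:leftover]:
        cols[i] += 1
    return cols


if __name__ == "__main__":
    # Hypothetical speeds: two "big" cores twice as fast as two "little" cores.
    speeds = [2, 2, 1, 1]
    cols = partition_ctu_columns(3840, speeds)
    times = [c / s for c, s in zip(cols, speeds)]  # relative decode time per core
    print(cols, times)
```

With proportional tile sizes, the estimated per-core decode times come out equal, which is the gap between the fastest and slowest core that the abstract says the method minimizes.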

Real-time Parallel Processing Simulator for Modeling Portable Missile System and Performance Analysis (휴대용 유도탄 체계의 모델링과 성능분석을 위한 실시간 병렬처리 시뮬레이터)

  • Kim Byeong-Moon;Jung Soon-Key
    • Journal of the Korea Society of Computer and Information / v.11 no.4 s.42 / pp.35-45 / 2006
  • In this paper, we describe a real-time parallel processing simulator developed for the performance analysis of rolling missiles. The simulator consists of a seeker emulator that generates infrared image signals of an aircraft, a real-time computer, a host computer, a system unit, and actual equipment such as the autopilot processor and seeker processor. Software was developed according to the design requirements for the mathematical model, the 6-degree-of-freedom module, and the aerodynamic module, which reside in the real-time computer, and for the graphical user interface program, which resides in the host computer. The real-time computer consists of six TI C-40 processors connected in parallel. The seeker emulator is designed using analog circuits coupled with mechanical equipment. The system unit provides interface functions to match impedance between the components and processes very small electrical signals. The real launch unit of the missile is also interfaced to the simulator through the system unit. To use the simulator as performance analysis equipment for rolling missiles, we verify it against experimental results obtained in the field.


Study of Parallel Network Processor using Global Cache (글로벌 캐시를 이용한 네트워크 병렬 프로세서 구조 연구)

  • Park, Jae-Won;Chung, Won-Young;Kim, Hyun-Pil;Lee, Jung-Hee;Lee, Yong-Surk
    • The Journal of Korean Institute of Communications and Information Sciences / v.36 no.1B / pp.80-85 / 2011
  • The amount of network traffic from the Internet is increasing because of the use of Broadband Convergence Networks (BcN). Network traffic is also increasing because of the development of applications, especially multimedia traffic from IPTV, VOD, and online games. This multimedia traffic not only has a huge payload but also should be considered a threat in real time. For this reason, this study examines how routers distribute bandwidth in accordance with traffic properties. To classify the properties of the traffic, it is essential to analyze the application layer. However, the general network processor architecture processes the L2-4 and L7 layers serially. We propose a novel parallel network processor architecture with a global cache that processes L2-4 and L7 in parallel. To verify the proposed architecture, we simulated both architectures with SystemC. EEMBC and SNORT were used to measure the L2-4 and L7 processing times. When multimedia traffic entered the network processor in the same flow, the proposed architecture showed about 85% higher performance than the general architecture.