• Title/Summary/Keyword: multi-core processing

Search Result 218, Processing Time 0.023 seconds

Lane Detection using Embedded Multi-core Platform (임베디드 멀티코어 플랫폼을 이용한 차선검출)

  • Lee, Kwang-Yeob;Kim, Dong-Han;Park, Tae-Ryoung
    • Journal of IKEEE
    • /
    • v.15 no.3
    • /
    • pp.255-260
    • /
    • 2011
  • In this paper, we propose a parallelization technique in lane detection by using Hough transform. Hough transform has a weakness that it has a lot computation quantity, because it has to compute ${\rho}$ value in all candidate ${\Theta}$ to be detected in an image. We propose an architecture of parallel processing for this transform in a multi-core environment. The parallel processing has application to Hough transform as well as noise reduction and edge detection. This proposed architecture has 5.17 times improvement in performance compare to the existing algorithm.

Efficient Process Network Implementation of Ray-Tracing Application on Heterogeneous Multi-Core Systems

  • Jung, Hyeonseok;Yang, Hoeseok
    • IEIE Transactions on Smart Processing and Computing
    • /
    • v.5 no.4
    • /
    • pp.289-293
    • /
    • 2016
  • As more mobile devices are equipped with multi-core CPUs and are required to execute many compute-intensive multimedia applications, it is important to optimize the systems, considering the underlying parallel hardware architecture. In this paper, we implement and optimize ray-tracing application tailored to a given mobile computing platform with multiple heterogeneous processing elements. In this paper, a lightweight ray-tracing application is specified and implemented in Kahn process network (KPN) model-of-computation, which is known to be suitable for the description of real-time applications. We take an open-source C/C++ implementation of ray-tracing and adapt it to KPN description in the Distributed Application Layer framework. Then, several possible configurations are evaluated in the target mobile computing platform (Exynos 5422), where eight heterogeneous ARM cores are integrated. We derive the optimal degree of parallelism and a suitable distribution of the replicated tasks tailored to the target architecture.

Improved Disparity Map Computation on Stereoscopic Streaming Video with Multi-core Parallel Implementation

  • Kim, Cheong Ghil;Choi, Yong Soo
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.9 no.2
    • /
    • pp.728-741
    • /
    • 2015
  • Stereo vision has become an important technical issue in the field of 3D imaging, machine vision, robotics, image analysis, and so on. The depth map extraction from stereo video is a key technology of stereoscopic 3D video requiring stereo correspondence algorithms. This is the matching process of the similarity measure for each disparity value, followed by an aggregation and optimization step. Since it requires a lot of computational power, there are significant speed-performance advantages when exploiting parallel processing available on processors. In this situation, multi-core CPU may allow many parallel programming technologies to be realized in users computing devices. This paper proposes parallel implementations for calculating disparity map using a shared memory programming and exploiting the streaming SIMD extension technology. By doing so, we can take advantage both of the hardware and software features of multi-core processor. For the performance evaluation, we implemented a parallel SAD algorithm with OpenMP and SSE2. Their processing speeds are compared with non parallel version on stereoscopic streaming video. The experimental results show that both technologies have a significant effect on the performance and achieve great improvements on processing speed.

Face Detection using Skin Color Information and Parallel Processing Method on Multi-Core (멀티코어에서 피부색상 정보와 병렬처리 방법을 이용한 얼굴 검출)

  • Kim, Hong-Hee;Lee, Jae-Heung
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2012.11a
    • /
    • pp.219-222
    • /
    • 2012
  • 최근 얼굴검출에 관한 연구는 FPGA를 통한 H/W설계부터 DSP, GPU, ARM Core에 효율적인 S/W 설계까지 다양하게 연구되고 있다. 본 연구에서는 Multi-Core에 효과적인 얼굴검출 방법을 제안한다. 피부색을 통한 얼굴 후보를 추출하고 그 외의 배경 이미지는 삭제하여 연산처리를 빠르게 하였다. Viola-Jones가 제안한 얼굴검출 알고리즘을 POSIX Thread를 사용하여 병렬 처리하였고 그 성능을 단일 코어와 멀티코어에서 측정하였다. 단일 코어에서는 성능의 향상이 없었으나 멀티코어에서는 약 1.8배 속도가 향상되었고 검출 성공률은 기존과 동일하였다.

Accelerating Group Fusion for Ligand-Based Virtual Screening on Multi-core and Many-core Platforms

  • Mohd-Hilmi, Mohd-Norhadri;Al-Laila, Marwah Haitham;Hassain Malim, Nurul Hashimah Ahamed
    • Journal of Information Processing Systems
    • /
    • v.12 no.4
    • /
    • pp.724-740
    • /
    • 2016
  • The performance issues of screening large database compounds and multiple query compounds in virtual screening highlight a common concern in Chemoinformatics applications. This study investigates these problems by choosing group fusion as a pilot model and presents efficient parallel solutions in parallel platforms, specifically, the multi-core architecture of CPU and many-core architecture of graphical processing unit (GPU). A study of sequential group fusion and a proposed design of parallel CUDA group fusion are presented in this paper. The design involves solving two important stages of group fusion, namely, similarity search and fusion (MAX rule), while addressing embarrassingly parallel and parallel reduction models. The sequential, optimized sequential and parallel OpenMP of group fusion were implemented and evaluated. The outcome of the analysis from these three different design approaches influenced the design of parallel CUDA version in order to optimize and achieve high computation intensity. The proposed parallel CUDA performed better than sequential and parallel OpenMP in terms of both execution time and speedup. The parallel CUDA was 5-10x faster than sequential and parallel OpenMP as both similarity search and fusion MAX stages had been CUDA-optimized.

Design Technique and Application for Distributed Recovery Block Using the Partitioning Operating System Based on Multi-Core System (멀티코어 기반 파티셔닝 운영체제를 이용한 분산 복구 블록 설계 기법 및 응용)

  • Park, Hansol
    • Journal of IKEEE
    • /
    • v.19 no.3
    • /
    • pp.357-365
    • /
    • 2015
  • Recently, embedded systems such as aircraft and automobilie, are developed as modular architecture instead of federated architecture because of SWaP(Size, Weight and Power) issues. In addition, partition operating system that support multiple logical node based on partition concept were recently appeared. Distributed recovery block is fault tolerance design scheme that applicable to mission critical real-time system to support real-time take over via real-time synchronization between participated nodes. Because of real-time synchronization, single-core based computer is not suitable for partition based distributed recovery block design scheme. Multi-core and AMP(Asymmetric Multi-Processing) based partition architecture is required to apply distributed recovery block design scheme. In this paper, we proposed design scheme of distributed recovery block on the multi-core based supervised-AMP architecture partition operating system. This paper implements flight control simulator for avionics to check feasibility of our design scheme.

Efficient Hardware Transactional Memory Scheme for Processing Transactions in Multi-core In-Memory Environment (멀티코어 인메모리 환경에서 트랜잭션을 처리하기 위한 효율적인 HTM 기법)

  • Jang, Yeonwoo;Kang, Moonhwan;Yoon, Min;Chang, Jaewoo
    • KIISE Transactions on Computing Practices
    • /
    • v.23 no.8
    • /
    • pp.466-472
    • /
    • 2017
  • Hardware Transactional Memory (HTM) has greatly changed the parallel programming paradigm for transaction processing. Since Intel has recently proposed Transactional Synchronization Extension (TSX), a number of studies based on HTM have been conducted. However, the existing studies support conflict prediction for a single cause of the transaction processing and provide a standardized TSX environment for all workloads. To solve the problems, we propose an efficient hardware transactional memory scheme for processing transactions in multi-core in-memory environment. First, the proposed scheme determines whether to use Software Transactional Memory (STM) or the serial execution as a fallback path of HTM by using a prediction matrix to collect the information of previously executed transactions. Second, the proposed scheme performs efficient transaction processing according to the characteristic of a given workload by providing a retry policy based on machine learning algorithms. Finally, through the experimental performance evaluation using Stanford transactional applications for multi-processing (STAMP), the proposed scheme shows 10~20% better performance than the existing schemes.

An Efficient Block Cipher Implementation on Many-Core Graphics Processing Units

  • Lee, Sang-Pil;Kim, Deok-Ho;Yi, Jae-Young;Ro, Won-Woo
    • Journal of Information Processing Systems
    • /
    • v.8 no.1
    • /
    • pp.159-174
    • /
    • 2012
  • This paper presents a study on a high-performance design for a block cipher algorithm implemented on modern many-core graphics processing units (GPUs). The recent emergence of VLSI technology makes it feasible to fabricate multiple processing cores on a single chip and enables general-purpose computation on a GPU (GPGPU). The GPU strategy offers significant performance improvements for all-purpose computation and can be used to support a broad variety of applications, including cryptography. We have proposed an efficient implementation of the encryption/decryption operations of a block cipher algorithm, SEED, on off-the-shelf NVIDIA many-core graphics processors. In a thorough experiment, we achieved high performance that is capable of supporting a high network speed of up to 9.5 Gbps on an NVIDIA GTX285 system (which has 240 processing cores). Our implementation provides up to 4.75 times higher performance in terms of encoding and decoding throughput as compared to the Intel 8-core system.

Heterogeneous multi-core simulator based on SMP for the efficient application development at the heterogenous multi-core environment (효과적인 이기종 다중코어 응용 개발을 위한 SMP기반 이기종 다중코어 시뮬레이터)

  • SaKong, June;Shin, Dongha
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.18 no.3
    • /
    • pp.111-117
    • /
    • 2018
  • Heterogeneous multi-core environment integrated with different functional cores is the powerful tool for the embedded system that became more complex and diverse. Specialized application requires one chip solution with different operating system over different cores. But this heterogeneity causes difficult configuration of the development environment, makes hard to develop and test software. We show the environment of heterogeneous multi-core processing can be mapped to symmetric multi-core environment. We construct Linux based RPMsg for the data exchange between processes similar with the heterogeneous multi-core RPMsg and experiment that the proposed environment can be used to reduce the steps of the heterogeneous multi-core application development. With this simplification, we suggest simulation method for easy development and debugging the heterogeneous multicore environment that makes complex steps to simple.

OPENMP PARALLEL PERFORMANCE OF A CFD CODE ON MULTI-CORE SYSTEMS (멀티코어 시스템에서 쓰레드 수에 따른 CFD 코드의 OpenMP 병렬 성능)

  • Kim, J.K.;Jang, K.J.;Kim, T.Y.;Cho, D.R.;Kim, S.D.;Choi, J.Y.
    • Journal of computational fluids engineering
    • /
    • v.18 no.1
    • /
    • pp.83-90
    • /
    • 2013
  • OpenMP is becoming more and more useful as a simple parallel processing paradigm on SMP (Shared Memory Multi-Processors) computing environment with the development of multi-core processors. However, very few data is available publically regarding the OpenMP performance in CFD (Computational Fluid Dynamics). In the present study a CFD test suite is prepared for the performance evaluation of OpenMP on various multi-core systems. The test suite is composed of two-dimensional numerical simulations for inviscid/viscous and reacting/non-reacting flows using three different levels of grid systems. One to five test runs were carried out on various systems from dual-core dual threads to 16-core 32-threads systems by changing the number of threads engaged for each test up to 80. The results exhibit some interesting results and the lessons learned from the tests would be quite helpful for the further use of OpenMP for CFD studies using multi-core processor systems.