• Title/Summary/Keyword: multi-core processing


A SoC based on the Gaussian Pyramid (GP) for Embedded Image Applications (임베디드 영상 응용을 위한 GP_SoC)

  • Lee, Bong-Kyu
    • The Transactions of The Korean Institute of Electrical Engineers
    • /
    • v.59 no.3
    • /
    • pp.664-668
    • /
    • 2010
  • This paper presents a System-on-a-Chip (SoC) for embedded image processing and pattern recognition applications that require a Gaussian pyramid structure. The system is fully implemented on a Field-Programmable Gate Array (FPGA)-based prototyping platform. The SoC consists of an embedded processor core and a hardware accelerator for Gaussian pyramid construction. The performance of the implementation is benchmarked against software implementations on different platforms.
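For readers unfamiliar with the structure the accelerator builds, the following is a minimal software sketch of Gaussian pyramid construction. It is not the paper's hardware design: the 5-tap binomial kernel, the edge-replication border handling, and the names `blur` and `gaussian_pyramid` are all assumptions made for illustration.

```python
from typing import List

Image = List[List[float]]

# 5-tap binomial approximation of a Gaussian (assumed kernel, not from the paper).
KERNEL = [1 / 16, 4 / 16, 6 / 16, 4 / 16, 1 / 16]


def _blur_1d(row: List[float]) -> List[float]:
    """Convolve one row with KERNEL, replicating edge pixels at the borders."""
    n = len(row)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(KERNEL):
            j = min(max(i + k - 2, 0), n - 1)  # clamp index to the valid range
            acc += w * row[j]
        out.append(acc)
    return out


def _transpose(img: Image) -> Image:
    return [list(col) for col in zip(*img)]


def blur(img: Image) -> Image:
    """Separable blur: filter the rows first, then the columns."""
    rows = [_blur_1d(r) for r in img]
    cols = [_blur_1d(c) for c in _transpose(rows)]
    return _transpose(cols)


def gaussian_pyramid(img: Image, levels: int) -> List[Image]:
    """Return [level 0 (the input), level 1, ...], halving each dimension per level."""
    pyramid = [img]
    for _ in range(levels - 1):
        smoothed = blur(pyramid[-1])
        reduced = [row[::2] for row in smoothed[::2]]  # keep every second pixel
        pyramid.append(reduced)
    return pyramid
```

Each level is a blurred, half-resolution copy of the previous one, so the blur-and-subsample loop is the part a hardware accelerator would typically take over from the processor core.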

Design and Implementation of Image-Pyramid

  • Lee, Bongkyu
    • Journal of Korea Multimedia Society
    • /
    • v.19 no.7
    • /
    • pp.1154-1158
    • /
    • 2016
  • This paper presents a System-on-a-Chip (SoC) for embedded image processing applications that require a Gaussian pyramid structure. The system is fully implemented on a Field-Programmable Gate Array (FPGA)-based prototyping platform. The SoC consists of an embedded processor core and a hardware accelerator for Gaussian pyramid construction. The performance of the implementation is benchmarked against software implementations on different platforms.

A Multithreaded Processor Architecture for SDR

  • Glossner, John;Raja, Tanuj;Hokenek, Erdem;Moudgill, Mayan
    • Information and Communications Magazine
    • /
    • v.19 no.11
    • /
    • pp.70-84
    • /
    • 2002
  • In this paper, we discuss a multi-threaded baseband processor capable of executing all physical layer processing of high-data-rate communications systems completely in software. We discuss the enabling technology for a software-defined radio approach and present results for GPRS, 802.11b, and 2 Mbps WCDMA. All of these protocols can be executed in real time on the SB9600 chip using the Sandblaster core.

An Efficient Parallelized Algorithm of SEED Block Cipher on Cell BE (CELL 프로세서를 이용한 SEED 블록 암호화 알고리즘의 효율적인 병렬화 기법)

  • Kim, Deok-Ho;Yi, Jae-Young;Ro, Won-Woo
    • The KIPS Transactions:PartA
    • /
    • v.17A no.6
    • /
    • pp.275-280
    • /
    • 2010
  • In this paper, we propose an efficiently parallelized block cipher algorithm for the CELL BE processor. Considering the heterogeneous nature of the CELL BE architecture, we apply different encoding/decoding methods to the PPE and the SPE to improve throughput. Our implementation was fully tested, and the execution results show high throughput, capable of supporting network speeds as high as 2.59 Gbps. Compared to various parallel implementations on multi-core systems, our approach provides a speedup of 1.34 in encoding/decoding speed.

Parallel Processing of Multi-Core Processor and GPUs in Projection Step for Efficient Fluid Simulation (효율적인 유체 시뮬레이션을 위한 투영 단계에서의 멀티 코어 프로세서와 그래픽 프로세서의 병렬처리)

  • Kim, Sun-Tae;Jung, Hwi-Ryong;Hong, Jeong-Mo
    • The Journal of the Korea Contents Association
    • /
    • v.13 no.6
    • /
    • pp.48-54
    • /
    • 2013
  • State-of-the-art fluid simulation techniques in computer graphics employ heterogeneous parallelization across the CPU and GPU. In this paper, we present a novel CPU-GPU parallel algorithm that solves the projection step of fluid simulation more efficiently than existing sequential CPU-GPU processing. Fluid simulations that require substantial computational resources can be carried out efficiently with the proposed method.

Full Search Equivalent Motion Estimation Algorithm for General-Purpose Multi-Core Architectures

  • Park, Chun-Su
    • Journal of the Semiconductor & Display Technology
    • /
    • v.12 no.3
    • /
    • pp.13-18
    • /
    • 2013
  • Motion estimation is a key technique of modern video processing that significantly improves coding efficiency by exploiting the temporal redundancy between successive frames. Thread-level parallelism is a promising way to accelerate motion estimation on multithreaded general-purpose processors. In this paper, we propose a parallel motion estimation algorithm that parallelizes the motion search process of the current H.264/AVC encoder. The proposed algorithm is implemented using the OpenMP application programming interface (API) and can be easily integrated into the current encoder. The experimental results show that the proposed parallel algorithm can reduce the processing time of motion estimation by up to 65.08% without any penalty in rate-distortion (RD) performance.
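The idea of thread-level parallel motion search can be sketched as follows. This is an illustration only, not the paper's algorithm or its OpenMP code: it uses Python's `concurrent.futures` thread pool in place of OpenMP, a plain sum-of-absolute-differences (SAD) cost, and hypothetical names (`search_block`, `motion_field`, block size `n`, search range `r`). In CPython the interpreter lock limits the speedup of pure-Python work, so the sketch shows the task decomposition rather than the reported timing gain.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Frame = List[List[int]]


def sad(cur: Frame, ref: Frame, by: int, bx: int, dy: int, dx: int, n: int) -> int:
    """Sum of absolute differences between an n x n block and its displaced match."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def search_block(cur: Frame, ref: Frame, by: int, bx: int, n: int, r: int) -> Tuple[int, int]:
    """Full search: test every displacement in [-r, r]^2 that stays inside the frame."""
    h, w = len(ref), len(ref[0])
    best_cost = None
    best_mv = (0, 0)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            inside = (0 <= by + dy and by + dy + n <= h
                      and 0 <= bx + dx and bx + dx + n <= w)
            if not inside:
                continue
            cost = sad(cur, ref, by, bx, dy, dx, n)
            if best_cost is None or cost < best_cost:
                best_cost, best_mv = cost, (dy, dx)
    return best_mv


def motion_field(cur: Frame, ref: Frame, n: int = 4, r: int = 2,
                 workers: int = 4) -> Dict[Tuple[int, int], Tuple[int, int]]:
    """Run one independent full-search task per block on a thread pool."""
    origins = [(by, bx)
               for by in range(0, len(cur) - n + 1, n)
               for bx in range(0, len(cur[0]) - n + 1, n)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        vectors = list(pool.map(
            lambda o: search_block(cur, ref, o[0], o[1], n, r), origins))
    return dict(zip(origins, vectors))
```

Because each block's search reads the frames but writes only its own result, the tasks need no locking, and the parallel output is identical to a single-threaded run. That is the property that lets such a scheme leave rate-distortion performance untouched.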

A study of workload consolidation considering NUMA affinity (NUMA affinity를 고려한 Workload Consolidation 연구)

  • Seo, Dongyou;Kim, Shin-gye;Choi, Chanho;Eom, Hyeonsang;Yeom, Heon Y.
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2012.11a
    • /
    • pp.204-206
    • /
    • 2012
  • SMP (Symmetric Multi-Processing) has limited scalability because it relies on a shared memory bus. NUMA (Non-Uniform Memory Access) was proposed to overcome this limitation. In a NUMA system, each CPU has its own local memory bus, so accesses to its own memory region have lower latency than accesses to other regions. Local memory regions improve scalability, but in a server virtualization environment where VMs are scheduled dynamically, a VM's pages may not reside in the local memory of the core on which it runs, and the resulting remote accesses perform worse than local accesses. This paper studies, in a server virtualization environment on the recent AMD Bulldozer architecture, the performance degradation that occurs when NUMA affinity is violated and the conditions under which violating NUMA affinity causes no degradation.

Low Cost Hardware Engine of Atomic Pipeline Broadcast Based on Processing Node Status (프로세서 노드 상황을 고려하는 저비용 파이프라인 브로드캐스트 하드웨어 엔진)

  • Park, Jongsu
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.24 no.8
    • /
    • pp.1109-1112
    • /
    • 2020
  • This paper presents a low-cost hardware message passing engine for an enhanced atomic pipelined broadcast based on processing node status. The algorithm modifies the previous atomic pipelined broadcast algorithm to reduce the waiting time until the next broadcast communication. To this end, the processor changes the transmission order of the processing nodes based on the nodes' communication channels. In addition, the hardware message passing engine architecture for the proposed algorithm is modified so that it can be adopted in multi-core processors. The synthesized logic area of the proposed hardware message passing engine is about 16% smaller than that of the existing hardware message passing engine.

Comparison study of CPU processing load by I/O processing method through use case analysis (유즈케이스를 통해 분석해 본 I/O 처리방식에 따르는 CPU처리 부하 비교연구)

  • Kim, JaeYoung
    • Journal of Aerospace System Engineering
    • /
    • v.13 no.5
    • /
    • pp.57-64
    • /
    • 2019
  • Avionics systems are increasingly developed as integrated modular architectures, which apply modular integration design to functional units to reduce maintenance costs and increase operating performance. In addition, partitioning operating systems based on virtualization technology are used to process various mission control functions. In virtualization technology, the distribution of CPU processing load is a key consideration. In particular, uncertainty in I/O processing time is a risk factor in the design of reliable avionics systems. In this paper, we examine the influence of the I/O processing method by comparing and analyzing the CPU processing load of each method through use case analysis and applying the results to an example of spatial-temporal partitioning.

Performance Evaluation and Verification of MMX-type Instructions on an Embedded Parallel Processor (임베디드 병렬 프로세서 상에서 MMX타입 명령어의 성능평가 및 검증)

  • Jung, Yong-Bum;Kim, Yong-Min;Kim, Cheol-Hong;Kim, Jong-Myon
    • Journal of the Korea Society of Computer and Information
    • /
    • v.16 no.10
    • /
    • pp.11-21
    • /
    • 2011
  • This paper introduces a SIMD (Single Instruction, Multiple Data) based parallel processor that efficiently processes the massive data inherent in multimedia. In addition, this paper implements MMX (MultiMedia eXtension)-type instructions on the data parallel processor and evaluates and analyzes their performance. The reference data parallel processor consists of 16 processors, each of which has a 32-bit datapath. Experimental results for a JPEG compression application with a 1280x1024 pixel image indicate that MMX-type instructions achieve a 50% performance improvement over the baseline instructions on the same data parallel architecture. In addition, MMX-type instructions achieve 100% and 51% improvements over the baseline instructions in energy efficiency and area efficiency, respectively. These results demonstrate that multimedia-specific instructions, including MMX-type instructions, have potential for widely used many-core GPUs (Graphics Processing Units) and other types of parallel processors.
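The abstract does not list the instructions that were implemented, so the following only illustrates the general idea behind MMX-type subword operations: one instruction processes several narrow lanes packed into a single word. The sketch emulates an unsigned saturating byte add (in the spirit of the MMX `PADDUSB` instruction) on four 8-bit lanes of a 32-bit word. The lane layout and the names `padd_usat8` and `brighten` are assumptions chosen for illustration.

```python
from typing import List


def padd_usat8(a: int, b: int) -> int:
    """Add four unsigned 8-bit lanes packed in 32-bit words, saturating each lane at 255."""
    result = 0
    for lane in range(4):
        shift = 8 * lane
        x = (a >> shift) & 0xFF
        y = (b >> shift) & 0xFF
        result |= min(x + y, 0xFF) << shift  # clamp instead of wrapping around
    return result


def brighten(pixels: List[int], delta: int) -> List[int]:
    """Brighten 8-bit pixels four at a time.

    Assumes len(pixels) is a multiple of 4 and 0 <= delta <= 255.
    """
    packed_delta = delta * 0x01010101  # replicate delta into all four lanes
    out: List[int] = []
    for i in range(0, len(pixels), 4):
        word = int.from_bytes(bytes(pixels[i:i + 4]), "little")
        out.extend(padd_usat8(word, packed_delta).to_bytes(4, "little"))
    return out
```

For example, `brighten([250, 10, 0, 255], 10)` yields `[255, 20, 10, 255]`: the bright pixels clamp at 255 rather than wrapping, which is the behavior image kernels usually want and which would otherwise need a compare and select per pixel.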