• Title/Summary/Keyword: Application-specific processor

Performance Improvement of the programmable processor designed for H.264 on-chip encoder (H.264 on-chip encoder를 위한 programmable processor 성능 향상)

  • Lee, Jinyong;Kim, Kyungwon;Heo, Ingoo;Park, Sanghyun;Kim, Yongjoo;Paek, Yunheung
    • Proceedings of the Korea Information Processing Society Conference / 2009.11a / pp.19-20 / 2009
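  • Two approaches to on-chip implementation of an H.264 encoder have been studied: an ASIC (application specific integrated circuit) based approach that focuses on performance, and an ASIP (application specific instruction set architecture) based design approach that performs worse than an ASIC but focuses on generality and flexibility. We have previously proposed and implemented an ISA and micro architecture that greatly improve ASIP performance, which had been a known weakness, without losing generality or flexibility within the scope of video compression applications. The key contribution of this paper is a further performance improvement of this ASIP.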

An Efficient Architecture Exploration Method for Optimal ASIP Design (Application에 최적의 ASIP 설계를 위한 효율적인 Architecture Exploration 방법)

  • Lee, Sung-Rae;Hwang, Sun-Young
    • The Journal of Korean Institute of Communications and Information Sciences / v.32 no.9C / pp.913-921 / 2007
  • A retargetable compiler that generates executable code for a target processor and a performance profiler are required to design a processor optimized for a specific application. This paper presents an architecture exploration methodology based on an ADL (Architecture Description Language). We synthesized an instruction set and optimized the processor structure using information extracted from the application program. Information on frequently executed operation sequences and register usage is used for processor optimization. Architecture exploration was performed for a JPEG encoder to show the effectiveness of the system. The ASIP designed using the proposed method shows 1.97 times better performance.

Techniques for special instruction generation for DSP ASIP (DSP용 ASIP을 위한 특수 명령어 생성 기법)

  • 김홍철;황승호
    • Journal of the Korean Institute of Telematics and Electronics C / v.35C no.7 / pp.1-10 / 1998
  • The first step in designing an application-specific instruction set processor is obtaining an instruction set that closely matches the hardware characteristics. This instruction set design problem becomes more complicated when combined with the problem of selecting an implementation method for each instruction. Our processor model supports two kinds of instructions: primitive and special instructions. Primitive instructions are implemented using common multifunctional hardware such as an ALU. Special instructions require a set of dedicated hardware, which actually functions as a coprocessor to the main processor. In this case, special instructions and primitive instructions can be executed independently. In this paper, we present a novel algorithm for generating special instructions for a given application. Parallelism between special instructions and primitive instructions is also considered during the performance estimation stage of generated special instructions.

Efficient Loop Accelerator for Motion Estimation Specific Instruction-set Processor (움직임 추정 전용 프로세서를 위한 효율적인 루프 가속기)

  • Ha, Jae Myung;Jung, Ho Sun;Sunwoo, Myung Hoon
    • Journal of the Institute of Electronics and Information Engineers / v.50 no.7 / pp.159-166 / 2013
  • This paper proposes an efficient loop accelerator for a motion estimation (ME) specific instruction-set processor. ME algorithms by nature contain complex and multiple loop operations. To support efficient hardware (HW) loop operations, this paper introduces four loop instructions and their specific HW architecture. The simulation results show that the proposed loop accelerator can reduce average instruction cycles by about 29% for ME early-termination schemes compared with a typical implementation that combines compare and conditional jump instructions. The proposed loop accelerator can significantly reduce the number of program memory accesses and greatly reduce power consumption. Hence, it is well suited to low-power and flexible ME implementation.
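
As a rough software illustration of the early-termination pattern mentioned in the abstract above, the following Python sketch abandons a candidate block as soon as its partial sum of absolute differences (SAD) reaches the best SAD found so far. The function names, the row-by-row check, and the use of SAD as the matching cost are assumptions made for illustration; the abstract does not specify the paper's ME schemes or its four loop instructions. The compare-and-exit inside the loop is the kind of control flow that a typical implementation expresses with compare and conditional jump instructions.

```python
def sad_with_early_exit(cur, ref, best_so_far):
    """Return the SAD of two equally sized blocks, or None if it cannot beat best_so_far.

    Hypothetical helper: blocks are lists of rows of integer pixel values.
    """
    partial = 0
    for cur_row, ref_row in zip(cur, ref):
        for c, r in zip(cur_row, ref_row):
            partial += abs(c - r)
        # Compare and conditional exit after each row: this candidate
        # can no longer improve on the best match seen so far.
        if partial >= best_so_far:
            return None
    return partial


def best_match(cur, candidates):
    """Return (index, SAD) of the candidate block with the smallest SAD."""
    best_idx, best_sad = None, float("inf")
    for idx, ref in enumerate(candidates):
        sad = sad_with_early_exit(cur, ref, best_sad)
        if sad is not None:
            best_idx, best_sad = idx, sad
    return best_idx, best_sad


if __name__ == "__main__":
    cur = [[1, 2], [3, 4]]
    candidates = [
        [[0, 0], [0, 0]],
        [[1, 2], [3, 5]],
        [[9, 9], [9, 9]],
    ]
    print(best_match(cur, candidates))
```

In this toy run the third candidate is rejected after its first row, because its partial SAD already exceeds the best SAD found for the second candidate.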

Application of Multi Parallel GAP to Rotation-Invariant Pattern Recognition (Multi Parallel GAP(Genetic Algorithm Processor)를 이용한 회전 불변 패턴 인식에의 응용)

  • 조민석;허인수;이주환;정덕진
    • Proceedings of the IEEK Conference / 2001.06c / pp.29-32 / 2001
  • In this paper, we applied the high-performance PGAP (Parallel Genetic Algorithm Processor) to recognizing rotated patterns. To perform this research efficiently, we used a Multi-PGAP system consisting of four PGAPs. In addition, we used mental rotation, based on the human mechanism for recognizing rotated patterns, to reduce the number of operations. We also experimented with distinguishing a specific pattern from similar coin patterns and determining the rotation angle between patterns. The results showed that developing future artificial recognition systems is feasible by employing high-performance PGAPs.

Code Generation and Optimization for the Flow-based Network Processor based on LLVM

  • Lee, SangHee;Lee, Hokyoon;Kim, Seon Wook;Heo, Hwanjo;Park, Jongdae
    • Proceedings of the Korea Information Processing Society Conference / 2012.11a / pp.42-45 / 2012
  • A network processor (NP) is an application-specific instruction-set processor for fast and efficient packet processing. The NP's hardware constraints and special hardware support raise many issues in the compiler's code generation and optimization. In this paper, we describe in detail how to resolve these issues. Our compiler was developed on LLVM 3.0, and the NP target was our in-house network processor, which consists of 32 64-bit RISC processors and supports multi-context with special hardware structures. Our compiler incurs only 9.36% code size overhead over hand-written code while satisfying QoS, and the generated code was tested on real packet processing hardware, called S20, for code verification and performance evaluation.

Design and Optimization of Multi-codec Video Decoder using ASIP (ASIP를 이용한 다중 비디오 복호화기 설계 및 최적화)

  • Ahn, Yong-Jo;Kang, Dae-Beom;Jo, Hyun-Ho;Ji, Bong-Il;Sim, Dong-Gyu;Eum, Nak-Woong
    • Journal of the Institute of Electronics Engineers of Korea CI / v.48 no.1 / pp.116-126 / 2011
  • In this paper, we present a multimedia processor that can decode multiple video standard formats. The designed processor is evaluated with optimized MPEG-2, MPEG-4, and AVS (Audio Video Standard) decoders. There are two approaches to developing real-time video decoders. A hardware-based system is far superior to a processor-based one in execution time; however, hardware systems take a long time to implement and modify. In contrast, software-based video codecs are flexible and easy to implement, but their performance is not good enough for real-time applications. To exploit the benefits of both approaches, we designed an ASIP (application-specific instruction-set processor) for video decoding. We extracted eight common modules from various video decoders and added several multimedia instructions to the processor. The developed processor is evaluated with the Synopsys platform simulator and an FPGA board. In our experiment, we achieve a saving of about 37% in total decoding time.

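Modeling and Performance Evaluation of a Multiprocessor Operating System Using the DEVS Formalism (DEVS 형식론을 이용한 다중프로세서 운영체제의 모델링 및 성능평가)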

  • 홍준성
    • Proceedings of the Korea Society for Simulation Conference / 1994.10a / pp.32-32 / 1994
  • In this example, a message-passing-based multicomputer system with a general interconnection network is considered. Now that multicomputer systems are developed with wormhole routing networks, the topology of the interconnection network is not a major consideration for process management and resource sharing. There is an independent operating system kernel on each node, which communicates with other kernels using a message-passing mechanism. Based on this architecture, the problem is how much performance degradation will occur when processors are shared on multicomputer systems. Processor sharing between application programs is a very important decision for system performance. In most cases, application programs running on massively parallel computer systems are not very user-interactive; thus, the main performance index is system throughput. Each application program has various communication patterns, and processor sharing causes serious performance degradation in the worst case, where one processor is shared by two processes and other processes are waiting for messages from those processes. As a result, considering this problem is important because it provides the basis for deciding whether the system should allow processor sharing. The input data in this simulation has many parameters. It contains the number of threads per task, communication patterns between threads, and data generation, as well as defects in random input data. Many parallel application programs have their own specific communication patterns, with computation and communication phases; this phase information cannot be obtained from random input data. If we obtain trace data from real applications, we can simulate the problem more realistically. On the other hand, simulation results will be wasteful unless sufficient trace data with various communication patterns is gathered. In this project, random input data are used for simulation. The only controllable data are the number of threads of each task and the mapping strategy. First, each task runs independently. After that, each task shares one or more processors with other tasks. As more processors are shared, there will be performance degradation. From this degradation rate, we can determine the overhead of processor sharing. The process scheduling policy can affect the simulation results. For process scheduling, a priority queue and a FIFO queue are implemented to support round-robin scheduling and priority scheduling.

Multi-Core Processor for Real-Time Sound Synthesis of Gayageum (가야금의 실시간 음 합성을 위한 멀티코어 프로세서 구현)

  • Choi, Ji-Won;Cho, Sang-Jin;Kim, Cheol-Hong;Kim, Jong-Myon;Chong, Ui-Pil
    • The KIPS Transactions:PartA / v.18A no.1 / pp.1-10 / 2011
  • Physical modeling has been widely used for sound synthesis since it synthesizes high-quality sound similar to the real sound of musical instruments. However, physical modeling requires many parameters to synthesize a large number of sounds simultaneously for a musical instrument, which prevents real-time processing. To solve this problem, this paper proposes a single instruction, multiple data (SIMD) based multi-core processor that supports real-time sound synthesis of the gayageum, a representative Korean traditional musical instrument. The proposed SIMD-based multi-core processor consists of 12 processing elements (PEs) to control the 12 strings of the gayageum, with each PE modeling the corresponding string. The processor can generate synthesized sounds of the 12 strings simultaneously after receiving the excitation signals and parameters of each string as input. Experimental results using a sampling rate of 44.1 kHz and 16-bit quantization show that the sound synthesized using the proposed multi-core processor was very similar to the original sound. In addition, the proposed multi-core processor outperforms commercial processors (TI's TMS320C6416, ARM926EJ-S, ARM1020E) in terms of execution time ($5.6{\sim}11.4{\times}$ better) and energy efficiency (about $553{\sim}1,424{\times}$ better).

Implementation of an ASIP for accelerating SAD operation (SAD 연산의 가속을 위한 멀티미디어 코프로세서 구현)

  • Jo, Jung-Hyun;Jeong, Ha-Young
    • Proceedings of the IEEK Conference / 2006.06a / pp.809-810 / 2006
  • The H.264 algorithm is commonly used for video compression applications. This algorithm requires a large number of data computations, for example, the sum of absolute differences (SAD) operation. We analyzed H.264 reference encoding workloads; SAD operations make up 8.78% of the H.264 encoding program. The SAD operation sums up 16 difference values in an H.264 $4{\times}4$ sub-block. To accelerate SAD operations, we implemented an application specific instruction-set processor (ASIP) that can execute SAD and data transfer instructions. The proposed coprocessor has an absolute value generator and a carry save adder (CSA) unit to sum up 8 difference values per clock cycle, so a SAD operation completes in 2 clock cycles. Experimental results show that total execution time is improved by 34%.
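
The SAD computation described above can be sketched in Python as follows. This is a minimal software illustration, not the coprocessor's implementation: the function name `sad_4x4` and the row-major pixel ordering are assumptions, and the two passes of eight differences only mirror the abstract's description of summing 8 difference values per clock cycle over 2 cycles.

```python
def sad_4x4(cur, ref):
    """Sum of absolute differences between two 4x4 blocks.

    Hypothetical helper: each block is a list of 4 rows of 4 integer pixels.
    """
    # Flatten both blocks into 16 pixel values (row-major order, assumed).
    cur_flat = [p for row in cur for p in row]
    ref_flat = [p for row in ref for p in row]
    if len(cur_flat) != 16 or len(ref_flat) != 16:
        raise ValueError("expected two 4x4 blocks")

    total = 0
    # Two passes of 8 absolute differences each, loosely mirroring the
    # "8 difference values per clock cycle, 2 cycles" description.
    for start in (0, 8):
        chunk = zip(cur_flat[start:start + 8], ref_flat[start:start + 8])
        total += sum(abs(c - r) for c, r in chunk)
    return total


if __name__ == "__main__":
    cur = [[10, 20, 30, 40]] * 4
    ref = [[12, 18, 30, 45]] * 4
    print(sad_4x4(cur, ref))
```

Each row in the usage example contributes 2 + 2 + 0 + 5 = 9, so the four rows sum to 36.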
