• Title/Summary/Keyword: application-specific processor

Search Result 75, Processing Time 0.025 seconds

Construction of a Compiled-code Simulator Generation System for Efficient Design Exploration in Embedded Core Design (임베디드 코어 설계시 효율적인 설계 공간 탐색을 위한 컴파일드 코드 방식 시뮬레이터 생성 시스템 구축)

  • Kim, Sang-Woo;Hwang, Sun-Young
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.36 no.1B
    • /
    • pp.71-79
    • /
    • 2011
  • This paper proposes a compiled-code simulator generation system based-on machine description language for efficient design space exploration in designing an embedded system optimized for a specific application. The proposed system generates a compiled-code simulator which maintains the functional accuracy of an event-driven simulator by determining instruction fetch and decoding processes statically. Generated simulator takes instruction-level and cycle-level simulation for estimating performances in embedded core. To show the efficiency of the constructed compiled-code simulator generator, architecture exploration had been performed for the JPEG encoder application. Starting with MIPS R3000 processor for one embedded core, the proposed system can produce the core showing optimized execution time for the application programming. In this process, a huge amount of simulation time has been used. Cycle-level compiled-code simulator has the functional accuracy and shows performance improvement by 21.7% in terms of simulation speed on the average when compared with an event-driven simulator.

Design of High-speed H.264/AVC Parallel Decoder Using ASIP Approach (ASIP 기술을 활용한 H.264/AVC 고속 병렬 복호화기 설계)

  • Ji, Bong-Il;Sim, Dong-Gyu;Kim, Kyung-Su;Park, Seong-Mo
    • Proceedings of the Korean Society of Broadcast Engineers Conference
    • /
    • 2009.11a
    • /
    • pp.251-254
    • /
    • 2009
  • 본 논문에서는 고해상도 동영상의 실시간 복호화를 위하여 Application Specific Instruction-set Processor (ASIP)기술을 이용하여 H.264/AVC 고속 병렬 복호화기를 설계하였다. 우선, 하드웨어에 최적화된 구조로 복호화기를 설계하고 LISA로 기술한 멀티미디어 전용 명령어를 명령어 집합에 추가하였다. 이렇게 설계한 고속 H.264/AVC 복호화기는 사이클 기반 시뮬레이터에서 성능을 측정한 결과 기존 대비 약 35%의 복호화 사이클 감소를 보였다. 추가적인 성능 향상을 위해, 앞서 설계한 고속복호화기를 여러 개 사용하여 병렬 H.264/AVC 복호화기를 설계하였다. 병렬 복호화기는 여러 매크로블록을 동시에 복호화 처리함으로써 복호화기의 성능을 대폭 향상시켰다. 병렬 복호화기는 고속 복호화기 대비 약 75%의 복호화 사이클이 감소하였다. 이에 고해상도 동영상의 실시간 복호화를 위한 H.264/AVC 고속 병렬 복호화기의 설계 방법을 제시하고자 한다.

  • PDF

Construction of a Retargetable Compiler Generation System from Machine Behavioral Description (머쉰 행위기술로부터 Retargetable 컴파일러 생성시스템 구축)

  • Lee, Sung-Rae;Hwang, Sun-Young
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.32 no.5B
    • /
    • pp.286-294
    • /
    • 2007
  • In ASIP design, compiler is required for performance evaluation of processors being designed. The design of machine specific compiler is time consuming. This paper presents the system which generates C compiler from MDL descriptions. Compiler generation using MDL can support user retargetability and concurrency between compiler design and processor design. However, it must overcome semantics gap between compiler and machine. To handle this problem, the proposed system maps behavioral descriptions to library which contains abstract behavior for each tree pattern. Using mapped instructions and information on register file usage, the proposed system generates back-end interface function of the compiler. Generated compilers, for MIPS R3000, ARM9 cores, have been proved by application programs written in C code.

Data Compression Algorithm for Efficient Data Transmission in Digital Optical Repeaters

  • Kim, Jae Wan;Eom, Doo Seop
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.49 no.12
    • /
    • pp.142-146
    • /
    • 2012
  • Today, the demand for high-speed data communication and mobile communication has exploded. Thus, there is a growing need for optical communication systems that convert large volumes of data to optical signals and that accommodate and transmit the signals across long distances. Digital optical communication with these characteristics consists of a master unit (MU) and a slave unit (SU). However, the digital optical units that are currently commercialized or being developed transmit data without compression. Thus, digital optical communication using these units is restricted by the quantity of optical frames when adding diversity or operating with various combinations of CDMA, WCDMA, WiBro, GSM, LTE, and other mobile communication technologies. This paper suggests the application of a data compression algorithm to a digital signal processor (DSP) chip as a field programmable gate array (FPGA) and a complex programmable logic device (CPLD) of a digital optical unit to add separate optical waves or to transmit complex data without specific changes in design of the optical frame.

Design and Implementation of Java Bytecode Translator usin Pattern Matching Technique (패턴 매칭 기법을 이용한 자바 바이트코드 변환기의 설계 및 구현)

  • Ko, Kwang-Man
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.39 no.4
    • /
    • pp.1-9
    • /
    • 2002
  • The various researches are investigated for translating Bytecode into native code which can be implemented in the specific processor using classical compiling methods to improve the execution speed of the Java application programs. The code generation techniques using pattern matching can generate more high-quality machine code than code expansion techniques. We provide, in this research, the standardized pattern describing methods and pattern matching techniques that can be used to generate the register-based intermediate code which is for the effective native code generation from Bytecode. And we designed and realized the intermediate code translator with which we can generate the high-quality register-based intermediate code using standardized pattern described formerly.

Method of data processing through polling and interrupt driven I/O on device data (디바이스 데이터 입출력에 있어서 폴링 방식과 인터럽트 구동 방식의 데이터 처리 방법)

  • Koo, Cheol-Hea
    • Journal of the Korean Society for Aeronautical & Space Sciences
    • /
    • v.33 no.9
    • /
    • pp.113-119
    • /
    • 2005
  • The methods that are used for receiving data from attached devices under real-time preemptive multi-task operating system (OS) by general processors can be categorized as polling and interrupt driven. The technical approach to these methods may be different due to the application specific scheduling policy of the OS and the programming architecture of the flight software. It is one of the most important requirements on the development of the flight software to process the data received from satellite subsystems or components with the exact timeliness and accuracy. This paper presents the analysis of the I/O method of device related scheduling mechanism and the reliable data I/O methods between processor and devices.

Design of Chip Set for CDMA Mobile Station

  • Yeon, Kwang-Il;Yoo, Ha-Young;Kim, Kyung-Soo
    • ETRI Journal
    • /
    • v.19 no.3
    • /
    • pp.228-241
    • /
    • 1997
  • In this paper, we present a design of modem and vocoder digital signal processor (DSP) chips for CDMA mobile station. The modem chip integrates CDMA reverse link modulator, CDMA forward link demodulator and Viterbi decoder. This chip contains 89,000 gates and 29 kbit RAMs, and the chip size is $10 mm{\times}10.1 mm$ which is fabricated using a $0.8{\mu}m$ 2 metal CMOs technology. To carry out the system-level simulation, models of the base station modulator, the fading channel, the automatic gain control loop, and the microcontroller were developed and interfaced with a gate-level description of the modem application specific integrated circuit (ASIC). The Modem chip is now successfully working in the real CDMA mobile station on its first fab-out. A new DSP architecture was designed to implement the Qualcomm code exited linear prediction (QCELP) vocoder algorithm in an efficient way. The 16 bit vocoder DSP chip has an architecture which supports direct and immediate addressing modes in one instruction cycle, combined with a RISC-type instruction set. This turns out to be effective for the implementation of vocoder algorithm in terms of performance and power consumption. The implementation of QCELP algorithm in our DSP requires only 28 million instruction per second (MIPS) of computation and 290 mW of power consumption. The DSP chip contains 32,000 gates, 32K ($2k{\times}16\;bit$) RAM, and 240k ($10k{\times}24\;bit$) ROM. The die size is $8.7\;mm{\times}8.3\;mm$ and chip is fabricated using $0.8\;{\mu}m$ CMOS technology.

  • PDF

A Study on Development of Program for an Automated Thixoforming Process Design (Thixoforming 공정설계 자동화를 위한 프로그램 개발에 관한 연구)

  • Kim, Nam-Seok;Jeong, Hong-Gyu;Gang, Chung-Gil
    • Journal of the Korean Society for Precision Engineering
    • /
    • v.18 no.1
    • /
    • pp.44-55
    • /
    • 2001
  • The flow behavior of semi-solid materials (SSM) is required to assist the industrial application of thixoforming technology. Particularly, to reduce many lead times, many numerical analysis packages have been developed to simulate required metal forming processes. The objectives of the development of SEMI-FORM for thixoforming process design are to predict the effect of various process variables such as pressing force, die temperature, and die velocity. However, there have not been any reports which adapt these packages to the specific characteristics of SSM. SO, this paper presents an overview of the development of thixoforming simulator of SEMI-FORM. The solver and post-processor of SEMI-FORM S/W for an automated thixoforming process design with arbitrarily shaped die are composed of FORTRAN Power Station 4.0 and Visual Basic 5.0, respectively. This developing SEMI-FORM S/W would be very useful for thixoforming practitioners and engineers to select the optimal process conditions to produce automotive parts with high quality.

  • PDF

Performance Analysis of Implementation on Image Processing Algorithm for Multi-Access Memory System Including 16 Processing Elements (16개의 처리기를 가진 다중접근기억장치를 위한 영상처리 알고리즘의 구현에 대한 성능평가)

  • Lee, You-Jin;Kim, Jea-Hee;Park, Jong-Won
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.49 no.3
    • /
    • pp.8-14
    • /
    • 2012
  • Improving the speed of image processing is in great demand according to spread of high quality visual media or massive image applications such as 3D TV or movies, AR(Augmented reality). SIMD computer attached to a host computer can accelerate various image processing and massive data operations. MAMS is a multi-access memory system which is, along with multiple processing elements(PEs), adequate for establishing a high performance pipelined SIMD machine. MAMS supports simultaneous access to pq data elements within a horizontal, a vertical, or a block subarray with a constant interval in an arbitrary position in an $M{\times}N$ array of data elements, where the number of memory modules(MMs), m, is a prime number greater than pq. MAMS-PP4 is the first realization of the MAMS architecture, which consists of four PEs in a single chip and five MMs. This paper presents implementation of image processing algorithms and performance analysis for MAMS-PP16 which consists of 16 PEs with 17 MMs in an extension or the prior work, MAMS-PP4. The newly designed MAMS-PP16 has a 64 bit instruction format and application specific instruction set. The author develops a simulator of the MAMS-PP16 system, which implemented algorithms can be executed on. Performance analysis has done with this simulator executing implemented algorithms of processing images. The result of performance analysis verifies consistent response of MAMS-PP16 through the pyramid operation in image processing algorithms comparing with a Pentium-based serial processor. Executing the pyramid operation in MAMS-PP16 results in consistent response of processing time while randomly response time in a serial processor.

Implementation of Optimizing Compiler for Bus-based VLIW Processors (버스기반의 VLIW형 프로세서를 위한 최적화 컴파일러 구현)

  • Hong, Seung-Pyo;Moon, Soo-Mook
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.27 no.4
    • /
    • pp.401-407
    • /
    • 2000
  • Modern microprocessors exploit instruction-level parallel processing to increase the performance. Especially VLIW processors supported by the parallelizing compiler are used more and more in specific applications such as high-end DSP and graphic processing. Bus-based VLIW architecture was proposed for these specific applications and it was designed to reduce the overhead of forwarding unit and the instruction width. In this paper, a optimizing scheduling compiler developed for the proposed bus-based VLIW processor is introduced. First, the method to model interconnections between buses and resource usage patterns is described. Then, on the basis of the modeling, machine-dependent optimization techniques such as bus-to-register promotion, copy coalescing and operand substitution were implemented. Optimization techniques for general-purpose VLIW microprocessors such as selective scheduling and enhanced pipelining scheduling(EPS) were also implemented. The experiment result shows about 20% performance gain for multimedia application benchmarks.

  • PDF