• Title/Summary/Keyword: application-specific instruction

Search Result 63, Processing Time 0.032 seconds

Design of Chip Set for CDMA Mobile Station

  • Yeon, Kwang-Il;Yoo, Ha-Young;Kim, Kyung-Soo
    • ETRI Journal
    • /
    • v.19 no.3
    • /
    • pp.228-241
    • /
    • 1997
  • In this paper, we present a design of modem and vocoder digital signal processor (DSP) chips for CDMA mobile station. The modem chip integrates CDMA reverse link modulator, CDMA forward link demodulator and Viterbi decoder. This chip contains 89,000 gates and 29 kbit RAMs, and the chip size is $10 mm{\times}10.1 mm$ which is fabricated using a $0.8{\mu}m$ 2 metal CMOs technology. To carry out the system-level simulation, models of the base station modulator, the fading channel, the automatic gain control loop, and the microcontroller were developed and interfaced with a gate-level description of the modem application specific integrated circuit (ASIC). The Modem chip is now successfully working in the real CDMA mobile station on its first fab-out. A new DSP architecture was designed to implement the Qualcomm code exited linear prediction (QCELP) vocoder algorithm in an efficient way. The 16 bit vocoder DSP chip has an architecture which supports direct and immediate addressing modes in one instruction cycle, combined with a RISC-type instruction set. This turns out to be effective for the implementation of vocoder algorithm in terms of performance and power consumption. The implementation of QCELP algorithm in our DSP requires only 28 million instruction per second (MIPS) of computation and 290 mW of power consumption. The DSP chip contains 32,000 gates, 32K ($2k{\times}16\;bit$) RAM, and 240k ($10k{\times}24\;bit$) ROM. The die size is $8.7\;mm{\times}8.3\;mm$ and chip is fabricated using $0.8\;{\mu}m$ CMOS technology.

  • PDF

Instruction Fine-tuning and LoRA Combined Approach for Optimizing Large Language Models (대규모 언어 모델의 최적화를 위한 지시형 미세 조정과 LoRA 결합 접근법)

  • Sang-Gook Kim;Kyungran Noh;Hyuk Hahn;Boong Kee Choi
    • Journal of Korean Society of Industrial and Systems Engineering
    • /
    • v.47 no.2
    • /
    • pp.134-146
    • /
    • 2024
  • This study introduces and experimentally validates a novel approach that combines Instruction fine-tuning and Low-Rank Adaptation (LoRA) fine-tuning to optimize the performance of Large Language Models (LLMs). These models have become revolutionary tools in natural language processing, showing remarkable performance across diverse application areas. However, optimizing their performance for specific domains necessitates fine-tuning of the base models (FMs), which is often limited by challenges such as data complexity and resource costs. The proposed approach aims to overcome these limitations by enhancing the performance of LLMs, particularly in the analysis precision and efficiency of national Research and Development (R&D) data. The study provides theoretical foundations and technical implementations of Instruction fine-tuning and LoRA fine-tuning. Through rigorous experimental validation, it is demonstrated that the proposed method significantly improves the precision and efficiency of data analysis, outperforming traditional fine-tuning methods. This enhancement is not only beneficial for national R&D data but also suggests potential applicability in various other data-centric domains, such as medical data analysis, financial forecasting, and educational assessments. The findings highlight the method's broad utility and significant contribution to advancing data analysis techniques in specialized knowledge domains, offering new possibilities for leveraging LLMs in complex and resource-intensive tasks. This research underscores the transformative potential of combining Instruction fine-tuning with LoRA fine-tuning to achieve superior performance in diverse applications, paving the way for more efficient and effective utilization of LLMs in both academic and industrial settings.

Motion Estimation Specific Instructions and Their Hardware Architecture for ASIP (ASIP을 위한 움직임 추정 전용 연산기 구조 및 명령어 설계)

  • Hwang, Sung-Jo;SunWoo, Myung-Hoon
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.48 no.3
    • /
    • pp.106-111
    • /
    • 2011
  • This paper presents an ASIP (Application-specific Instruction Processor) for motion estimation that employs specific IME instructions and its programmable and reconfigurable hardware architecture for various video codecs, such as H.264/AVC, MPEG4, etc. With the proposed specific instructions and hardware accelerator, it can handle the real-time processing requirement of High Definition (HD) video. With the parallel operations and SAD unit control using pattern information, the proposed IME instruction supports not only full search algorithm but also other fast search algorithms. The hardware size is 77K gates for each Processing Element Group (PEG) which has 256 SAD PEs. The proposed ASIP runs at 160MHz with sixteen PEGs and it can handle 1080p@30 frame in real time.

A 16 bit FPGA Microprocessor for Embedded Applications (실장제어 16 비트 FPGA 마이크로프로세서)

  • 차영호;조경연;최혁환
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.5 no.7
    • /
    • pp.1332-1339
    • /
    • 2001
  • SoC(System on Chip) technology is widely used in the field of embedded systems by providing high flexibility for a specific application domain. An important aspect of development any new embedded system is verification which usually requires lengthy software and hardware co-design. To reduce development cost of design effort, the instruction set of microprocessor must be suitable for a high level language compiler. And FPGA prototype system could be derived and tested for design verification. In this paper, we propose a 16 bit FPGA microprocessor, which is tentatively-named EISC16, based on an EISC(Extendable Instruction Set Computer) architecture for embedded applications. The proposed EISC16 has a 16 bit fixed length instruction set which has the short length offset and small immediate operand. A 16 bit offset and immediate operand could be extended using by an extension register and an extension flag. We developed a cross C/C++ compiler and development software of the EISC16 by porting GNU on an IBM-PC and SUN workstation and compared the object code size created after compiling a C/C. standard library, concluding that EISC16 exhibits a higher code density than existing 16 microprocessors. The proposed EISC16 requires approximately 6,000 gates when designed and synthesized with RTL level VHDL at Xilinix's Virtex XCV300 FPGA. And we design a test board which consists of EISC16 ROM, RAM, LED/LCD panel, periodic timer, input key pad and RS-232C controller. 11 works normally at 7MHz Clock.

  • PDF

A Power Estimation Method for ASIPs Considering Data Types of Variables in Application Programs

  • Kim, Tsutomu ura;Shibahara, Shin-ichi;Yoshinori Takeuchi;Masaharu Imai;Akira Kitajima;Michiaki Muraoka
    • Proceedings of the IEEK Conference
    • /
    • 2000.07a
    • /
    • pp.387-390
    • /
    • 2000
  • This paper proposes an efficient and accurate power estimation method for Application Specific Instruction set Processors (ASIPs). Proposed method takes advantage of the data types of variables in application program to be executed on the ASIP. According to the experimental results, the efficiency of proposed method was more than 1000 times as high as that of conventional RTL based power estimation method, and the estimation error was within 10% compared to a conventional gate-level accurate power estimation method

  • PDF

Development and Application of the CPS Instructional Program for the Astronomy Section of Highschool Science (고등학교 과학 천문분야의 CPS수업프로그램 개발 및 적용)

  • Shin, Seon-Young;Kim, Soon-Shik;Choi, Gwang-Sun;Choi, Sung-Bong
    • Journal of the Korean Society of Earth Science Education
    • /
    • v.4 no.2
    • /
    • pp.108-115
    • /
    • 2011
  • The purpose of this study was to develop and apply a creative problem-solving(CPS) program of instruction for earth science. After the earth science sections of high school science textbooks were analyzed, a theme of instruction was selected from the first-year unit 'the origin and evolution of the universe', and a CPS model of instruction. 32 high school sophomores and juniors who were the members of an astronomy club in the city of Gimhae, South Gyeongsang Province, participated in the program, and they took a test in scientific creative problem-solving skills before and after the experiment to grasp the effect of the program on their creative problem-solving skills. Besides, a survey was conducted to find out their awareness of the program. As a result of implementing the CPS program based on the CPS model of instructional for the unit 'the origin and evolution of the universe', the program turned out effective at boosting the scientific creative problem-solving skills of the students. To be specific, they made a significant progress in validity and scientificity, but that's not the case for elaboration and originality. When their awareness of the CPS program was checked, they expected the program to spark their interest in astronomy and be beneficial to the improvement of their creative problem-solving skills, but they didn't rate group activities high on the ground that the group activities weren't performed smoothly. The findings of the study suggest that the CPS instructional program for the unit 'the origin and evolution of the universe' based on the CPS model of instruction had a good effect on the improvement of the scientific creative problem-solving skills of the students.

Core-A: A 32-bit Synthesizable Processor Core

  • Kim, Ji-Hoon;Lee, Jong-Yeol;Ki, Ando
    • IEIE Transactions on Smart Processing and Computing
    • /
    • v.4 no.2
    • /
    • pp.83-88
    • /
    • 2015
  • Core-A is 32-bit synthesizable processor core with a unique instruction set architecture (ISA). In this paper, the Core-A ISA is introduced with discussion of useful features and the development environment, including the software tool chain and hardware on-chip debugger. Core-A is described using Verilog-HDL and can be customized for a given application and synthesized for an application-specific integrated circuit or field-programmable gate array target. Also, the GNU Compiler Collection has been ported to support Core-A, and various predesigned platforms are well equipped with the established design flow to speed up the hardware/software co-design for a Core-A-based system.

Construction of a Compiled-code Simulator Generation System for Efficient Design Exploration in Embedded Core Design (임베디드 코어 설계시 효율적인 설계 공간 탐색을 위한 컴파일드 코드 방식 시뮬레이터 생성 시스템 구축)

  • Kim, Sang-Woo;Hwang, Sun-Young
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.36 no.1B
    • /
    • pp.71-79
    • /
    • 2011
  • This paper proposes a compiled-code simulator generation system based-on machine description language for efficient design space exploration in designing an embedded system optimized for a specific application. The proposed system generates a compiled-code simulator which maintains the functional accuracy of an event-driven simulator by determining instruction fetch and decoding processes statically. Generated simulator takes instruction-level and cycle-level simulation for estimating performances in embedded core. To show the efficiency of the constructed compiled-code simulator generator, architecture exploration had been performed for the JPEG encoder application. Starting with MIPS R3000 processor for one embedded core, the proposed system can produce the core showing optimized execution time for the application programming. In this process, a huge amount of simulation time has been used. Cycle-level compiled-code simulator has the functional accuracy and shows performance improvement by 21.7% in terms of simulation speed on the average when compared with an event-driven simulator.

Dual-mode Pseudorandom Number Generator Extension for Embedded System (임베디드 시스템에 적합한 듀얼 모드 의사 난수 생성 확장 모듈의 설계)

  • Lee, Suk-Han;Hur, Won;Lee, Yong-Surk
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.46 no.8
    • /
    • pp.95-101
    • /
    • 2009
  • Random numbers are used in many sorts of applications. Some applications, like simple software simulation tests, communication protocol verifications, cryptography verification and so forth, need various levels of randomness with various process speeds. In this paper, we propose a fast pseudorandom generator module for embedded systems. The generator module is implemented in hardware which can run in two modes, one of which can generate random numbers with higher randomness but which requires six cycles, the other providing its result within one cycle but with less randomness. An ASIP (Application Specific Instruction set Processor) was designed to implement the proposed pseudorandom generator instruction sets. We designed a processor based on the MIPS architecture,, by using LISA, and have run statistical tests passing the sequence of the Diehard test suite. The HDL models of the processor were generated using CoWare's Processor Designer and synthesized into the Dong-bu 0.18um CMOS cell library using the Synopsys Design Compiler. With the proposed pseudorandom generator module, random number generation performance was 239% faster than software model, but the area increased only 2.0% of the proposed ASIP.

Performance Evaluation and Verification of MMX-type Instructions on an Embedded Parallel Processor (임베디드 병렬 프로세서 상에서 MMX타입 명령어의 성능평가 및 검증)

  • Jung, Yong-Bum;Kim, Yong-Min;Kim, Cheol-Hong;Kim, Jong-Myon
    • Journal of the Korea Society of Computer and Information
    • /
    • v.16 no.10
    • /
    • pp.11-21
    • /
    • 2011
  • This paper introduces an SIMD(Single Instruction Multiple Data) based parallel processor that efficiently processes massive data inherent in multimedia. In addition, this paper implements MMX(MultiMedia eXtension)-type instructions on the data parallel processor and evaluates and analyzes the performance of the MMX-type instructions. The reference data parallel processor consists of 16 processors each of which has a 32-bit datapath. Experimental results for a JPEG compression application with a 1280x1024 pixel image indicate that MMX-type instructions achieves a 50% performance improvement over the baseline instructions on the same data parallel architecture. In addition, MMX-type instructions achieves 100% and 51% improvements over the baseline instructions in energy efficiency and area efficiency, respectively. These results demonstrate that multimedia specific instructions including MMX-type have potentials for widely used many-core GPU(Graphics Processing Unit) and any types of parallel processors.