• Title/Summary/Keyword: Instruction-set Architecture

Search Result 87, Processing Time 0.02 seconds

On-Demand Remote Software Code Execution Unit Using On-Chip Flash Memory Cloudification for IoT Environment Acceleration

  • Lee, Dongkyu;Seok, Moon Gi;Park, Daejin
    • Journal of Information Processing Systems
    • /
    • v.17 no.1
    • /
    • pp.191-202
    • /
    • 2021
  • In an Internet of Things (IoT)-configured system, each device executes on-chip software. Recent IoT devices require fast execution time of complex services, such as analyzing a large amount of data, while maintaining low-power computation. As service complexity increases, the service requires high-performance computing and more space for embedded space. However, the low performance of IoT edge devices and their small memory size can hinder the complex and diverse operations of IoT services. In this paper, we propose a remote on-demand software code execution unit using the cloudification of on-chip code memory to accelerate the program execution of an IoT edge device with a low-performance processor. We propose a simulation approach to distribute remote code executed on the server side and on the edge side according to the program's computational and communicational needs. Our on-demand remote code execution unit simulation platform, which includes an instruction set simulator based on 16-bit ARM Thumb instruction set architecture, successfully emulates the architectural behavior of on-chip flash memory, enabling embedded devices to accelerate and execute software using remote execution code in the IoT environment.

An Efficient Architecture Exploration Method for Optimal ASIP Design (Application에 최적의 ASIP 설계를 위한 효율적인 Architecture Exploration 방법)

  • Lee, Sung-Rae;Hwang, Sun-Young
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.32 no.9C
    • /
    • pp.913-921
    • /
    • 2007
  • Retargetable compiler which generates executable code for a target processor and performance profiler are required to design a processor optimized for a specific application. This paper presents an architecture exploration methodology based on ADL (Architecture Description Language). We synthesized instruction set and optimized processor structure using information extracted from application program. The information of operation sequences executed frequently and register usage are used for processor optimization. Architecture exploration has been performed for JPEG encoder to show the effectiveness of the system. The ASIP designed using the proposed method shows 1.97 times better performance.

An Efficient Bit Stream Instruction-set for Network Packet Processing Applications (네트워크 패킷 처리를 위한 효율적인 비트 스트림 명령어 세트)

  • Yoon, Yeo-Phil;Lee, Yong-Surk;Lee, Jung-Hee
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.45 no.10
    • /
    • pp.53-58
    • /
    • 2008
  • This paper proposes a new set of instructions to improve the packet processing capacity of a network processor. The proposed set of instructions is able to achieve more efficient packet processing by accelerating integration of packet headers. Furthermore, a hardware configuration dedicated to processing overlay instructions was designed to reduce additional hardware cost. For this purpose, the basic architecture for the network processor was designed using LISA and the overlay block was optimized based on the barrel shifter. The block was synthesized to compare the area and the operation delay, and allocated to a C-level macro function using the compiler known function (CKF). The improvement in performance was confirmed by comparing the execution cycle and the execution time of an application program. Experiments were conducted using the processor designer and the compiler designer from Coware. The result of synthesis with the TSMC ($0.25{\mu}m$) from Synopsys indicated a reduction in operation delay by 20.7% and an improvement in performance of 30.8% with the proposed set of instructions for the entire execution cycle.

Performance Analysis of Multicore Processor Architectures Based On Cache Size Effects (캐쉬 용량 효과에 대한 멀티코어 프로세서의 성능 연구)

  • Lee, Jongbok
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.12 no.6
    • /
    • pp.175-180
    • /
    • 2012
  • In order to overcome the complexity and performance limit problems of superscalar processors, the multicore architecture has been prevalent recently. The configuration and the size of instruction and data caches greatly gives effect on the performance of multicore processors. Using SPEC 2000 benchmarks as input, the trace-driven simulation has been performed for the 2-core to 16-core architectures with different sizes of caches extensively. As a result, the 2-way set associative instruction and data cache with the size of 64KB brought the best cost-effective performance.

Retargetable Instruction-Set Simulator for Energy Consumption Monitoring (에너지 소비 모니터링을 위한 재목적 인스트럭션-셋 시뮬레이터)

  • Ko, Kwang-Man
    • Journal of Korea Multimedia Society
    • /
    • v.14 no.3
    • /
    • pp.462-470
    • /
    • 2011
  • Retargetability is typically achieved by providing target machine information, ADL, as input. The ADL are used to specify processor and memory architectures and generate software toolkit including compiler, simulator, etc. Simulator are critical components of the exploration and software design toolkit for the system designer. They can be used to perform diverse tasks such as verifying the functionality and/or timing behavior of the system, and generating quantitative measurements(e.g., power energy consumption) which can be used to aid the design process. In this paper, we generate the energy consumption estimation simulator through ADL. For this goal, firstly, we describes the energy consumption estimation and monitoring informations on the ADL based on EXPRESSION. Secondly, we generate the energy estimation and monitoring simulation library and then constructs the simulator, RenergySim. Lastly, we represent the energy estimations results for MIPS R4000 ADL description. From this subjects, we contribute to the efficient architecture developments and prompt SDK generation through programmable experiments in the field of mobile software development.

Pair Register Allocation Algorithm for 16-bit Instruction Set Architecture (ISA) Processor (16비트 명령어 기반 프로세서를 위한 페어 레지스터 할당 알고리즘)

  • Lee, Ho-Kyoon;Kim, Seon-Wook;Han, Young-Sun
    • The KIPS Transactions:PartA
    • /
    • v.18A no.6
    • /
    • pp.265-270
    • /
    • 2011
  • Even though 32-bit ISA based microprocessors are widely used more and more, 16-bit ISA based processors are still being frequently employed for embedded systems. Intel 8086, 80286, Motorola 68000, and ADChips AE32000 are the representatives of the 16-bit ISA based processors. However, due to less expressiveness of the 16-bit ISA from its narrow bit width, we need to execute more 16-bit instructions for the same implementation compared to 32-bit instructions. Because the number of executed instructions is a very important factor in performance, we have to resolve the problem by improving the expressiveness of the 16-bit ISA. In this paper, we propose a new pair register allocation algorithm to enhance an original graph-coloring based register allocation algorithm. Also, we explain about both the performance result and further research directions.

A New Asynchronous Pipeline Architecture for CISC type Embedded Micro-Controller, A8051 (CISC 임베디드 컨트롤러를 위한 새로운 비동기 파이프라인 아키텍쳐, A8051)

  • 이제훈;조경록
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.40 no.4
    • /
    • pp.85-94
    • /
    • 2003
  • The asynchronous design methods proved to have the higher performance in power consumption and execution speed than synchronous ones because it just needs to activate the required module without feeding clock in the system. Despite the advantage of CISC machine providing the variable addressing modes and instructions, its execution scheme is hardly suited for a synchronous Pipeline architecture and incurs a lot of overhead. This paper proposes a novel asynchronous pipeline architecture, A80sl, whose instruction set is fully compatible with that of Intel 80C51, an embedded micro controller. We classify the instructions into the group keeping the same execution scheme for the asynchronous pipeline and optimize it eliminating the bubble stage that comes from the overhead of the multi-cycle execution. The new methodologies for branch and various instruction lengths are suggested to minimize the number of states required for instructions execution and to increase its parallelism. The proposed A80C51 architecture is synthesized with 0.35${\mu}{\textrm}{m}$ CMOS standard cell library. The simulation results show higher speed than that of Intel 80C51 with 36 MHz and other asynchronous counterparts by 24 times.

A 16 bit FPGA Microprocessor for Embedded Applications (실장제어 16 비트 FPGA 마이크로프로세서)

  • 차영호;조경연;최혁환
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.5 no.7
    • /
    • pp.1332-1339
    • /
    • 2001
  • SoC(System on Chip) technology is widely used in the field of embedded systems by providing high flexibility for a specific application domain. An important aspect of development any new embedded system is verification which usually requires lengthy software and hardware co-design. To reduce development cost of design effort, the instruction set of microprocessor must be suitable for a high level language compiler. And FPGA prototype system could be derived and tested for design verification. In this paper, we propose a 16 bit FPGA microprocessor, which is tentatively-named EISC16, based on an EISC(Extendable Instruction Set Computer) architecture for embedded applications. The proposed EISC16 has a 16 bit fixed length instruction set which has the short length offset and small immediate operand. A 16 bit offset and immediate operand could be extended using by an extension register and an extension flag. We developed a cross C/C++ compiler and development software of the EISC16 by porting GNU on an IBM-PC and SUN workstation and compared the object code size created after compiling a C/C. standard library, concluding that EISC16 exhibits a higher code density than existing 16 microprocessors. The proposed EISC16 requires approximately 6,000 gates when designed and synthesized with RTL level VHDL at Xilinix's Virtex XCV300 FPGA. And we design a test board which consists of EISC16 ROM, RAM, LED/LCD panel, periodic timer, input key pad and RS-232C controller. 11 works normally at 7MHz Clock.

  • PDF

Performance Improvement of the programmable processor designed for H.264 on-chip encoder (H.264 on-chip encoder를 위한 programmable processor 성능 향상)

  • Lee, Jinyong;Kim, Kyungwon;Heo, Ingoo;Park, Sanghyun;Kim, Yongjoo;Paek, Yunheung
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2009.11a
    • /
    • pp.19-20
    • /
    • 2009
  • H.264 부호기의 on-chip 상의 구현방법으로는 성능에 중점을 둔 ASIC (application specific integrated circuit) 기반의 접근 방식과 ASIC 보다 성능은 떨어지나 일반성과 유연성에 중점을 둔 ASIP (application specific instruction set architecture) 기반의 설계 방식이 연구되어 왔다. 우리는 영상 압축 응용 범위 내에서는 일반성 및 유연성을 잃지 않으면서도 기존에 문제시 되던 ASIP의 성능은 대폭 개선할 수 있는 ISA와 micro architecture를 제안하고 구현한 바 있다. 본 논문의 핵심적인 기여는 이 ASIP의 추가적인 성능 개선이다.

Design of an Asynchronous Instruction Cache based on a Mixed Delay Model (혼합 지연 모델에 기반한 비동기 명령어 캐시 설계)

  • Jeon, Kwang-Bae;Kim, Seok-Man;Lee, Je-Hoon;Oh, Myeong-Hoon;Cho, Kyoung-Rok
    • The Journal of the Korea Contents Association
    • /
    • v.10 no.3
    • /
    • pp.64-71
    • /
    • 2010
  • Recently, to achieve high performance of the processor, the cache is splits physically into two parts, one for instruction and one for data. This paper proposes an architecture of asynchronous instruction cache based on mixed-delay model that are DI(delay-insensitive) model for cache hit and Bundled delay model for cache miss. We synthesized the instruction cache at gate-level and constructed a test platform with 32-bit embedded processor EISC to evaluate performance. The cache communicates with the main memory and CPU using 4-phase hand-shake protocol. It has a 8-KB, 4-way set associative memory that employs Pseudo-LRU replacement algorithm. As the results, the designed cache shows 99% cache hit ratio and reduced latency to 68% tested on the platform with MI bench mark programs.