• Title/Summary/Keyword: Embedded Processors

Search Result 162, Processing Time 0.024 seconds

Optimization of ARIA Block-Cipher Algorithm for Embedded Systems with 16-bits Processors

  • Lee, Wan Yeon;Choi, Yun-Seok
    • International Journal of Internet, Broadcasting and Communication
    • /
    • v.8 no.1
    • /
    • pp.42-52
    • /
    • 2016
  • In this paper, we propose the 16-bits optimization design of the ARIA block-cipher algorithm for embedded systems with 16-bits processors. The proposed design adopts 16-bits XOR operations and rotated shift operations as many as possible. Also, the proposed design extends 8-bits array variables into 16-bits array variables for faster chained matrix multiplication. In evaluation experiments, our design is compared to the previous 32-bits optimized design and 8-bits optimized design. Our 16-bits optimized design yields about 20% faster execution speed and about 28% smaller footprint than 32-bits optimized code. Also, our design yields about 91% faster execution speed with larger footprint than 8-bits optimized code.

Dynamic Object Detection Architecture for LiDAR Embedded Processors (라이다 임베디드 프로세서를 위한 동적 객체인식 아키텍처 구현)

  • Jung, Minwoo;Lee, Sanghoon;Kim, Dae-Young
    • Journal of Platform Technology
    • /
    • v.8 no.4
    • /
    • pp.11-19
    • /
    • 2020
  • In an autonomous driving environment, dynamic recognition of objects is essential as the situation changes in real time. In addition, as the number of sensors and control modules built into an autonomous vehicle increases, the amount of data the central control unit has to process also rapidly increases. By minimizing the output data from the sensor, the load on the central control unit can be reduced. This study proposes a dynamic object recognition algorithm solely using the embedded processor on a LiDAR sensor. While there are open source algorithms to process the point cloud output from LiDAR sensors, most require a separate high-performance processor. Since the embedded processors installed in LiDAR sensors often have resource constraints, it is essential to optimize the algorithm for efficiency. In this study, an embedded processor based object recognition algorithm was developed for autonomous vehicles, and the correlation between the size of the point clouds and processing time was analyzed. The proposed object recognition algorithm evaluated that the processing time directly increased with the size of the point cloud, with the processor stalling at a specific point if the point cloud size is beyond the threshold

  • PDF

Compact implementations of Curve Ed448 on low-end IoT platforms

  • Seo, Hwajeong
    • ETRI Journal
    • /
    • v.41 no.6
    • /
    • pp.863-872
    • /
    • 2019
  • Elliptic curve cryptography is a relatively lightweight public-key cryptography method for key generation and digital signature verification. Some lightweight curves (eg, Curve25519 and Curve Ed448) have been adopted by upcoming Transport Layer Security 1.3 (TLS 1.3) to replace the standardized NIST curves. However, the efficient implementation of Curve Ed448 on Internet of Things (IoT) devices remains underexplored. This study is focused on the optimization of the Curve Ed448 implementation on low-end IoT processors (ie, 8-bit AVR and 16-bit MSP processors). In particular, the three-level and two-level subtractive Karatsuba algorithms are adopted for multi-precision multiplication on AVR and MSP processors, respectively, and two-level Karatsuba routines are employed for multi-precision squaring. For modular reduction and finite field inversion, fast reduction and Fermat-based inversion operations are used to mitigate side-channel vulnerabilities. The scalar multiplication operation using the Montgomery ladder algorithm requires only 103 and 73 M clock cycles on AVR and MSP processors.

An Optimal Implementation of Object Tracking Algorithm for DaVinci Processor-based Smart Camera (다빈치 프로세서 기반 스마트 카메라에서의 객체 추적 알고리즘의 최적 구현)

  • Lee, Byung-Eun;Nguyen, Thanh Binh;Chung, Sun-Tae
    • Proceedings of the Korea Contents Association Conference
    • /
    • 2009.05a
    • /
    • pp.17-22
    • /
    • 2009
  • DaVinci processors are popular media processors for implementing embedded multimedia applications. They support dual core architecture: ARM9 core for video I/O handling as well as system management and peripheral handling, and DSP C64+ core for effective digital signal processing. In this paper, we propose our efforts for optimal implementation of object tracking algorithm in DaVinci-based smart camera which is being designed and implemented by our laboratory. The smart camera in this paper is supposed to support object detection, object tracking, object classification and detection of intrusion into surveillance regions and sending the detection event to remote clients using IP protocol. Object tracking algorithm is computationally expensive since it needs to process several procedures such as foreground mask extraction, foreground mask correction, connected component labeling, blob region calculation, object prediction, and etc. which require large amount of computation times. Thus, if it is not implemented optimally in Davinci-based processors, one cannot expect real-time performance of the smart camera.

  • PDF

An Energy Efficient and High Performance Data Cache Structure Utilizing Tag History of Cache Addresses (캐시 주소의 태그 이력을 활용한 에너지 효율적 고성능 데이터 캐시 구조)

  • Moon, Hyun-Ju;Jee, Sung-Hyun
    • The KIPS Transactions:PartA
    • /
    • v.14A no.1 s.105
    • /
    • pp.55-62
    • /
    • 2007
  • Uptime of embedded processors for mobile devices are dependent on battery consumption. Especially the large portion of power consumption is known to be due to cache management in embedded processors. This paper proposes an energy efficient data cache structure for high performance embedded processors. High performance prefetching data cache issues prefetching instructions before issuing demand-fetch instructions based on reference predictions. These prefetching instruction bring reduction on memory delay by improving cache hit ratio, but on the other hand those increase energy consumption in proportion to the number of prefetching instructions. In this paper, we adopt tag history table on prefetching data cache for reducing energy consumption by minimizing parallel tag comparison. Experimental results show the proposed data cache improves performance on energy consumption as well as memory delay.

Implementation of A Low-Power Embedded System via Scratch-pad Memory Compression (스크래치 패드 메모리의 압축을 통한 저전력 임베디드 시스템의 구현)

  • Suh, Hyo-Joong
    • The KIPS Transactions:PartA
    • /
    • v.15A no.5
    • /
    • pp.269-274
    • /
    • 2008
  • Recently, lots of embedded processors which can run streaming multimedia with high resolution display are introduced. Among the applications running on these embedded processors, real-time audio streaming is one of the applications that suffer from the lack of energy and memory space. In this paper, we propose a novel data compression method on scratch-pad memory, which saves both useful space on the scratch-pad memory and energy. We have implemented the data compression scheme on the GDM1202 real-time audio streaming processor, and the performance results show that we obtained 13.3% energy saving while maintaining comparable application performance to that of the non-compression case.

A Design and Implementation of 32-bit RISC-V RV32IM Pipelined Processor in Embedded Systems (임베디드 환경에서의 32-bit RISC-V RV32IM 파이프라인 프로세서 설계 및 구현)

  • Subin Park;Yongwoo Kim
    • Journal of the Semiconductor & Display Technology
    • /
    • v.22 no.4
    • /
    • pp.81-86
    • /
    • 2023
  • Recently, demand for embedded systems requiring low power and high specifications has been increasing, and RISC-V processors are being widely applied. RISC-V, a RISC-based open instruction set architecture (ISA), has been developed and researched by UC Berkeley and other researchers since 2010. RV32I ISA is sufficient to support integer operations such as addition and subtraction instructions, but M-extension should be defined for multiplication and division instructions. This paper proposes an RV32I, RV32IM processor, and indicates benchmark performance scores compared to an existing processor. Additionally, A non-stalling method was proposed to support a 2-stage pipelined DSP multiplier to the 5-stage pipelined RV32IM processor. Proposed RV32I and RV32IM processors satisfied a maximum operating frequency of 50 MHz on Artix-7 FPGA. The performance of the proposed processors was verified using benchmark programs from Dhrystone and Coremark. As a result, the Coremark benchmark results of the proposed processor showed that it outperformed the existing RV32IM processor by 23.91%.

  • PDF

Design and Verification of High-Performance Parallel Processor Hardware for JPEG Encoder (JPEG 인코더를 위한 고성능 병렬 프로세서 하드웨어 설계 및 검증)

  • Kim, Yong-Min;Kim, Jong-Myon
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.6 no.2
    • /
    • pp.100-107
    • /
    • 2011
  • As the use of mobile multimedia devices is increasing in the recent year, the needs for high-performance multimedia processors are increasing. In this regard, we propose a SIMD (Single Instruction Multiple Data) based parallel processor that supports high-performance multimedia applications with low energy consumption. The proposed parallel processor consists of 16 processing elements(PEs) and operates on a 3-stage pipelining. Experimental results for the JPEG encoding algorithm indicate that the proposed parallel processor outperforms conventional parallel processors in terms of performance and energy efficiency. In addition, the proposed parallel processor architecture was developed and verified with verilog HDL and a FPGA prototype system.

Design of a cosynthesis system for pipelined application-specific instruction processors (파이프라인을 지원하는 ASIP 합성 시스템의 설계)

  • 현민호;이석근;박창욱;황선영
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.22 no.3
    • /
    • pp.444-453
    • /
    • 1997
  • This paper presents the prototype design of hardware/software cosynthesis system for pipelined application-specific instruction processors. Taking application programs in VHDL as inputs, the proposed system generates a pipelined instruction-set processor and the instruction sequences running on the generated machine. The design space of datapath and controller is defined by the architectural templates embedded in the system. Generating the intyermediate code adequate for parallelism analysis and extraction, the system converts it into assembly codes. Experimental results show the effectiveness of the proposed system.

  • PDF

Group Mutual Exclusion Algorithm Using RMS in Community Computing Environments (커퓨니티 컴퓨팅 환경에서 자원 관리 서비스를 이용한 그룹 상호 배제 알고리즘)

  • Park, Chang-Woo;Kim, Ki-Young;Jung, Hye-Dong;Kim, Seok-Yoon
    • Proceedings of the IEEK Conference
    • /
    • 2009.05a
    • /
    • pp.281-283
    • /
    • 2009
  • Forming Community is important to manage and provide the service in Ubiquitous Environments including embedded tiny computers. Community Computing is that members constitute the community and cooperate. A mutual exclusion problem occurs when many processors try to use one resource and race condition happens. In the expanded concept, a group mutual exclusion problem is that processors in the same group can share the resource but processors in different groups cannot share. As mutual exclusion problems might be in community computing environments, we propose algorithm which improves the execution speed using RMS (resource management service). In this paper describes proposed algorithm and proves its performance by experiments, comparing proposed algorithm with previous method using quorum-based algorithm.

  • PDF