• Title/Summary/Keyword: ARM-NEON

High Performance Implementation of SGCM on High-End IoT Devices

  • Seo, Hwajeong
    • Journal of information and communication convergence engineering / v.15 no.4 / pp.212-216 / 2017
  • In this paper, we introduce novel techniques to improve the performance of authenticated encryption (AE) functions on modern high-end IoT platforms (ARM-NEON), which support SIMD and cryptography instruction sets. The Sophie Germain Counter Mode of operation (SGCM) requires counter-mode encryption and prime field multiplication. We chose Montgomery multiplication for the modular multiplication and performed it in parallel by exploiting both the ARM and NEON instruction sets. Specifically, the NEON instructions performed 128-bit integer multiplication while the ARM instructions performed Montgomery reduction at the same time. This approach hides the latency of the ARM instructions within the NEON instruction stream. For high-speed counter-mode encryption for both AE functions, we introduced two-level computation: when the workload was large, we switched to the NEON instructions to execute the encryption operations; otherwise, we performed the encryption on the ARM module.
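
The Montgomery multiplication named in the abstract above can be sketched in a few lines. This is only a minimal, sequential illustration of the arithmetic, not the paper's parallel ARM/NEON implementation. The function names, the choice `K = 128`, and the modulus `N` (the Mersenne prime 2^127 - 1, used purely as an odd example modulus) are assumptions made for this sketch.

```python
def montgomery_params(n: int, k: int):
    """Precompute constants for an odd modulus n < 2**k, with R = 2**k."""
    if n % 2 == 0 or n >= (1 << k):
        raise ValueError("n must be odd and smaller than 2**k")
    r = 1 << k
    n_prime = (-pow(n, -1, r)) % r  # satisfies n * n_prime == -1 (mod R)
    return r, n_prime


def mont_mul(a_bar: int, b_bar: int, n: int, k: int, n_prime: int) -> int:
    """Return a_bar * b_bar * R^-1 mod n for inputs in [0, n)."""
    mask = (1 << k) - 1
    t = a_bar * b_bar                  # wide integer multiplication step
    m = ((t & mask) * n_prime) & mask  # Montgomery reduction factor
    u = (t + m * n) >> k               # exact shift: low k bits are zero
    return u - n if u >= n else u      # single conditional subtraction


def to_mont(a: int, n: int, k: int) -> int:
    """Map a into the Montgomery domain: a * R mod n."""
    return (a << k) % n


def from_mont(a_bar: int, n: int, k: int, n_prime: int) -> int:
    """Map back out of the Montgomery domain by multiplying with 1."""
    return mont_mul(a_bar, 1, n, k, n_prime)


# Illustrative values only (assumptions of this sketch, not from the paper).
N = (1 << 127) - 1
K = 128
R, N_PRIME = montgomery_params(N, K)


def mul_mod(a: int, b: int) -> int:
    """Modular product a * b mod N computed through the Montgomery domain."""
    prod = mont_mul(to_mont(a, N, K), to_mont(b, N, K), N, K, N_PRIME)
    return from_mont(prod, N, K, N_PRIME)
```

In the abstract's description the wide multiplication and the reduction run on different execution units at the same time; the sketch only shows that the two steps are separable pieces of arithmetic.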

Optimization Study of Toom-Cook Algorithm in NIST PQC SABER Utilizing ARM/NEON Processor (ARM/NEON 프로세서를 활용한 NIST PQC SABER에서 Toom-Cook 알고리즘 최적화 구현 연구)

  • Song, JinGyo;Kim, YoungBeom;Seo, Seog Chung
    • Journal of the Korea Institute of Information Security & Cryptology / v.31 no.3 / pp.463-471 / 2021
  • Since 2016, the National Institute of Standards and Technology (NIST) has been conducting a post-quantum cryptography standardization project in preparation for the quantum computing era. The third round is currently in progress, and most of the candidates (5 of 7) are lattice-based. Lattice-based post-quantum cryptography is considered applicable even in resource-constrained embedded environments because it offers efficient operations and reasonable key lengths. Among the candidates, SABER KEM uses an efficient modulus and the Toom-Cook algorithm to handle computation-intensive polynomial multiplication. In this paper, we present an optimized implementation of the evaluation and interpolation steps of SABER's Toom-Cook algorithm using ARM/NEON on the ARMv8-A platform. For evaluation, we propose an efficient method of interleaving ARM and NEON instructions; for interpolation, we introduce an optimized implementation methodology applicable to various embedded environments. As a result, the proposed implementation is 3.5 times faster in evaluation and 5 times faster in interpolation than the previous reference implementation.
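
The evaluation and interpolation steps mentioned in the abstract above can be illustrated with a small Toom-Cook multiplication. The sketch below uses the 3-way variant over plain integers as a compact stand-in; the abstract does not say which variant or coefficient ring the paper optimizes, so the split factor, the evaluation points (0, 1, -1, -2, infinity), and all function names are assumptions of this example.

```python
def schoolbook(a, b):
    """Reference polynomial product of two coefficient lists."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def evaluate(p0, p1, p2):
    """Evaluate p0 + p1*X + p2*X^2 at X = 0, 1, -1, -2 and infinity."""
    at_1 = [x + y + z for x, y, z in zip(p0, p1, p2)]
    at_m1 = [x - y + z for x, y, z in zip(p0, p1, p2)]
    at_m2 = [x - 2 * y + 4 * z for x, y, z in zip(p0, p1, p2)]
    return p0, at_1, at_m1, at_m2, p2


def toom3(a, b):
    """Multiply two integer polynomials whose length is a multiple of 3."""
    n = len(a)
    if n != len(b) or n == 0 or n % 3 != 0:
        raise ValueError("inputs must share a non-zero length divisible by 3")
    m = n // 3

    # Evaluation: split each input into three limbs and evaluate at 5 points.
    ea = evaluate(a[:m], a[m:2 * m], a[2 * m:])
    eb = evaluate(b[:m], b[m:2 * m], b[2 * m:])

    # Pointwise multiplication: five small products instead of nine.
    r0, r1, rm1, rm2, rinf = (schoolbook(x, y) for x, y in zip(ea, eb))

    # Interpolation: every division below is exact over the integers.
    t3 = [(x - y) // 3 for x, y in zip(rm2, r1)]
    t1 = [(x - y) // 2 for x, y in zip(r1, rm1)]
    t2 = [x - y for x, y in zip(rm1, r0)]
    c3 = [(x - y) // 2 + 2 * z for x, y, z in zip(t2, t3, rinf)]
    c2 = [x + y - z for x, y, z in zip(t2, t1, rinf)]
    c1 = [x - y for x, y in zip(t1, c3)]

    # Recombination: add limb k at offset k * m.
    out = [0] * (2 * n - 1)
    for k, limb in enumerate((r0, c1, c2, c3, rinf)):
        for i, v in enumerate(limb):
            out[k * m + i] += v
    return out
```

Evaluation and interpolation consist only of additions, subtractions, and small constant multiplications or divisions applied element by element, which is the kind of regular work that maps naturally onto SIMD lanes.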

Implementation and Analysis of Multi-Precision Multiplication for Public Key Cryptography Based on Android Platform (안드로이드 기반 공개키 암호를 위한 곱셈기 구현 및 분석)

  • Seo, Hwa-Jeong;Kim, Ho-Won
    • The Journal of Korean Institute of Communications and Information Sciences / v.37C no.10 / pp.940-948 / 2012
  • Android programs are developed with the Java SDK and executed on a virtual machine. This makes programming easier than in traditional C, but execution speed suffers. To improve performance, the NDK development tool, which provides a C and assembly language environment, was introduced. Furthermore, the NEON feature provided by ARM enables vector operations that improve performance further. In this paper, we examine the effectiveness of the NDK and then propose an improved multiplication structure that uses NEON.
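
As background for the multi-precision multiplication discussed in the abstract above, here is a minimal sketch of the classic operand-scanning (schoolbook) method on 32-bit limbs. It is not the multiplication structure proposed in the paper; the limb size, the little-endian limb order, and the helper names are assumptions of this example.

```python
LIMB_BITS = 32
LIMB_MASK = (1 << LIMB_BITS) - 1


def to_limbs(value: int, count: int):
    """Split a non-negative integer into `count` little-endian 32-bit limbs."""
    return [(value >> (LIMB_BITS * i)) & LIMB_MASK for i in range(count)]


def from_limbs(limbs) -> int:
    """Rebuild the integer from little-endian limbs."""
    return sum(limb << (LIMB_BITS * i) for i, limb in enumerate(limbs))


def mp_mul(a, b):
    """Operand-scanning product of two little-endian limb arrays."""
    out = [0] * (len(a) + len(b))
    for i, ai in enumerate(a):
        carry = 0
        for j, bj in enumerate(b):
            # Partial product plus accumulator plus carry fits in 64 bits.
            t = out[i + j] + ai * bj + carry
            out[i + j] = t & LIMB_MASK
            carry = t >> LIMB_BITS
        out[i + len(b)] = carry  # this slot has not been written before
    return out
```

The inner loop is a chain of 32x32-bit multiply-accumulate steps linked by a carry; restructuring that chain so several partial products can be computed at once is what makes vector instructions applicable.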

Acceleration Method of Inter Prediction using Advanced SIMD (Advanced SIMD를 이용한 화면 간 예측 고속화방법)

  • Kim, Wan-Su;Lee, Jae-Heung
    • Journal of IKEEE / v.16 no.4 / pp.382-388 / 2012
  • This paper presents a fast motion estimation method for H.264/AVC. NEON, an Advanced SIMD parallel processing extension, is supported on the ARM Cortex-A9 dual-core platform. NEON is applied to full search, one of several motion estimation methods, and the SAD operation count for each macroblock is reduced to 1/4. Pixel values of the corresponding macroblock are assigned to eight 16-bit NEON registers, and NEON intrinsic functions carry out 128-bit arithmetic operations at once. In this way, the exact motion vector with the minimum SAD value among the calculated SAD values can be determined. Experimental results show an average performance improvement of more than 30%, depending on the image and macroblock size.
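
The full-search motion estimation with a SAD (sum of absolute differences) cost described in the abstract above can be sketched as follows. This is a plain scalar illustration, not the paper's NEON implementation; the frame representation (lists of rows), the function names, and the tie-breaking rule (first minimum wins) are assumptions of this example.

```python
def sad(ref, cur_block, top, left):
    """Sum of absolute differences between cur_block and the same-sized
    window of ref whose top-left corner is at (top, left)."""
    return sum(
        abs(ref[top + r][left + c] - pixel)
        for r, row in enumerate(cur_block)
        for c, pixel in enumerate(row)
    )


def full_search(ref, cur_block, top, left, search_range):
    """Try every displacement within +/- search_range around (top, left).

    Returns (best_sad, dy, dx), or None if no candidate window fits in ref.
    """
    block_h, block_w = len(cur_block), len(cur_block[0])
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            # Skip candidate windows that fall outside the reference frame.
            if y < 0 or x < 0 or y + block_h > height or x + block_w > width:
                continue
            cost = sad(ref, cur_block, y, x)
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    return best
```

The absolute-difference-and-accumulate loop inside `sad` applies the same operation to every pixel, so several pixels can be processed per instruction on a SIMD unit.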

A Speed-up Method of HOG Pedestrian Detector in Advanced SIMD Architecture (Advanced SIMD 아키텍처에서의 HOG 보행자 검출기 고속화 방법)

  • Kwon, Ki-Pyo;Lee, Jae-Heung
    • Journal of IKEEE / v.18 no.1 / pp.106-113 / 2014
  • A pedestrian detector can be applied for various purposes, such as monitoring, counting the number of people in a place, or detecting people who run into the roadway. Much related research exists, but detection is slow on embedded systems because of their limited computing power. This paper presents an algorithm for a fast pedestrian detector using HOG on the ARM SIMD architecture. It quickly removes the image background and improves detection speed using NEON parallelization. When tested on the INRIA Person Dataset, the proposed pedestrian detector is 3.01 times faster than the previous one.

Calculating an inverse of a $4{\times}4$ matrix using Neon (Neon 을 사용한 $4{\times}4$ 행렬의 역행렬 연산)

  • Oh, Yu-Yeon;Lee, Chang-Gun
    • Proceedings of the Korean Information Science Society Conference / 2012.06c / pp.344-346 / 2012
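  • User demand for 3D games and applications on smartphones is growing day by day. Because 3D games and applications internally perform various coordinate transformations with $4{\times}4$ matrices, optimizing $4{\times}4$ matrix operations is essential for faster 3D graphics processing. Among the $4{\times}4$ matrix operations, we examine matrix inversion and show that it can be improved by using the Neon operations supported by ARM processors.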
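
As a reference point for the operation discussed in the abstract above, a $4{\times}4$ inverse can be computed from cofactors and the determinant. The sketch below is a straightforward scalar version using exact rational arithmetic; it is not the paper's Neon method, and the function names and the restriction to integer or rational entries are assumptions of this example.

```python
from fractions import Fraction


def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def inverse4(m):
    """Inverse of a 4x4 matrix with integer or Fraction entries."""
    cof = [[0] * 4 for _ in range(4)]
    for i in range(4):
        for j in range(4):
            # Minor: delete row i and column j, then take the determinant.
            minor = [[m[r][c] for c in range(4) if c != j]
                     for r in range(4) if r != i]
            cof[i][j] = (-1) ** (i + j) * det3(minor)
    det = sum(m[0][j] * cof[0][j] for j in range(4))
    if det == 0:
        raise ValueError("matrix is singular")
    # The inverse is the transposed cofactor matrix divided by det.
    return [[Fraction(cof[j][i], det) for j in range(4)] for i in range(4)]


def matmul4(a, b):
    """Product of two 4x4 matrices, used here to check the inverse."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]
```

The sixteen cofactors are independent of one another, so they are a natural target for computing several values per instruction.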

Fast pedestrian detector using HOG in ARM architecture (HOG를 이용한 ARM 아키텍처에서의 고속 보행자 검출기)

  • Kwon, Ki-Pyo;Lee, Jae-Heung;Kang, Byung-Ik
    • Proceedings of the Korea Information Processing Society Conference / 2013.11a / pp.161-164 / 2013
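  • A pedestrian detector can be applied for various purposes depending on the situation, such as monitoring places that require security, counting the people entering and leaving a particular place, or detecting a person who runs into the roadway in front of a moving vehicle. Much related research has been carried out, but on embedded systems detection is slow because of limited computing power. In this paper, we present a method that improves detection speed by quickly removing the background from the input image, and a method that improves detection speed by using NEON parallelization on the ARM architecture. The detector implemented with the proposed methods showed a 201.1% speed improvement over the previous one.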

Improving the speed of deep neural networks using the multi-core and single instruction multiple data technology (다중 코어 및 single instruction multiple data 기술을 이용한 심층 신경망 속도 향상)

  • Chung, Ik Joo;Kim, Seung Hi
    • The Journal of the Acoustical Society of Korea / v.36 no.6 / pp.425-435 / 2017
  • In this paper, we propose optimization methods for speeding up the feedforward network of deep neural networks using NEON SIMD (Single Instruction Multiple Data) parallel instructions and multi-core parallelization on a multi-core ARM processor. For the optimization using SIMD parallel instructions, we report the speed improvement and arithmetic precision stage by stage. Through SIMD optimization on a single core, we obtain a $2.6{\times}$ speedup over the baseline implementation compiled with a C compiler. Furthermore, by parallelizing the single-core implementation across multiple cores, we obtain a $5.7{\times}{\sim}7.7{\times}$ speedup. These results show that arithmetic-intensive deep neural network technology can be applied to applications on mobile devices.

Changes in Quality of Seasoned and Smoked Squid During Processing (조미훈연 오징어의 가공중 품질변화)

  • RYU Hong-Soo;MUN Sook-Im;LEE Kang-Hoo
    • Korean Journal of Fisheries and Aquatic Sciences / v.25 no.5 / pp.406-412 / 1992
  • Changes in proximate composition and protein quality were determined to find appropriate processing conditions for seasoned and smoked squid (neon flying squid, Ommastrephes bartramii). Moisture and crude protein contents were severely reduced (p<0.05), while increases in fat and ash contents were not apparent. Seasoning and smoking contributed to enhancing the TBA value. Trypsin inhibitor (TI) content did not increase severely after those processing steps. TI content measured at each step of squid processing was not correlated with the TBA value of the squid at the same step. Improved digestibility and protein efficiency ratio (PER) were observed in all products except the steak (mechanically softened product). In vitro enzymatic digestibilities of both raw neon flying squid meats (mantle and arm) were significantly inferior (p<0.05) to those of other squid species.

Acceleration of Radial Gradient Paint Processor for Mobile Device (모바일 기기에서의 방사형 그라디언트 페인트 가속)

  • Kim, Jin-Woo;Park, Jin-Hong;Han, Tack-Don
    • Proceedings of the Korea Information Processing Society Conference / 2011.04a / pp.530-533 / 2011
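  • Radial gradient paint is a method in vector graphics that can apply a variety of effects with little information. Because it fundamentally requires complex operations such as multiplication, division, and square root, it has not been well suited to low-performance environments such as mobile devices. However, as recent mobile devices have gained performance through SIMD support and high-performance GPUs, this problem can now be solved. In this paper, we achieved a speedup of up to 2.6 times using NEON, ARM's SIMD operations, and a speedup of 4.9 times using GPU shaders.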
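
The per-pixel arithmetic that the abstract above refers to (multiplication, division, square root) can be seen in a minimal radial gradient sketch. The version below assumes the simplest case, where the focal point coincides with the center and values beyond the radius are clamped; the abstract does not state the paper's gradient formula, and the function names and two-color ramp are hypothetical.

```python
import math


def radial_t(x, y, cx, cy, radius):
    """Gradient parameter in [0, 1]: distance from the center over radius."""
    distance = math.sqrt((x - cx) ** 2 + (y - cy) ** 2)
    return min(distance / radius, 1.0)  # clamp beyond the outer circle


def lerp_color(inner, outer, t):
    """Linear blend between two RGB tuples for a parameter t in [0, 1]."""
    return tuple(round(a + (b - a) * t) for a, b in zip(inner, outer))


def paint_pixel(x, y, cx, cy, radius, inner, outer):
    """Color of one pixel under a two-color radial gradient."""
    return lerp_color(inner, outer, radial_t(x, y, cx, cy, radius))
```

Every pixel runs the same short formula independently of its neighbors, which is why both SIMD lanes and GPU shaders fit this workload.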