• Title/Summary/Keyword: 곱셈 알고리즘

Search Result 330, Processing Time 0.028 seconds

Simple Power Analysis against RSA Based on Frequency Components (주파수 분석 기반 RSA 단순 전력 분석)

  • Jung, Ji-hyuk;Yoon, Ji-Won
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.31 no.1
    • /
    • pp.1-9
    • /
    • 2021
  • This paper proposes to automate the process of predicting crypto-operations from the power signal generated in RSA decoding process by frequency analysis and K-means algorithm. RSA decoding process is divided into square and multiply operation, and if we can predict the type of operations over time, we will know the RSA key value. After converting the power signal generated in the process of decoding into two-dimensional frequency signal, this paper used K-means algorithm to classify the frequency vector according to the type of operation. these classified frequency vector were used to predict the types of operations.

Functionality-based Processing-In-Memory Accelerator for Deep Neural Networks (딥뉴럴네트워크를 위한 기능성 기반의 핌 가속기)

  • Kim, Min-Jae;Kim, Shin-Dug
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2020.11a
    • /
    • pp.8-11
    • /
    • 2020
  • 4 차 산업혁명 시대의 도래와 함께 AI, ICT 기술의 융합이 진행됨에 따라, 유저 레벨의 디바이스에서도 AI 서비스의 요청이 실현되었다. 이미지 처리와 관련된 AI 서비스는 피사체 판별, 불량품 검사, 자율주행 등에 이용되고 있으며, 특히 Deep Convolutional Neural Network (DCNN)은 이미지의 특색을 파악하는 데 뛰어난 성능을 보여준다. 하지만, 이미지의 크기가 커지고, 신경망이 깊어짐에 따라 연산 처리에 있어 낮은 데이터 지역성과 빈번한 메모리 참조를 야기했다. 이에 따라, 기존의 계층적 시스템 구조는 DCNN 을 scalable 하고 빠르게 처리하는 데 한계를 보인다. 본 연구에서는 DCNN 의 scalable 하고 빠른 처리를 위해 3 차원 메모리 구조의 Processing-In-Memory (PIM) 가속기를 제안한다. 이를 위해 기존 3 차원 메모리인 Hybrid Memory Cube (HMC)에 하드웨어 및 소프트웨어 모듈을 추가로 구성하였다. 구체적으로, Processing Element (PE)간 데이터를 공유할 수 있는 공유 캐시 및 소프트웨어 스택, 파이프라인화된 곱셈기 및 듀얼 프리페치 버퍼를 구성하였다. 이를 유명 DCNN 알고리즘 LeNet, AlexNet, ZFNet, VGGNet, GoogleNet, RestNet 에 대해 성능 평가를 진행한 결과 기존 HMC 대비 40.3%의 속도 향상을 29.4%의 대역폭 향상을 보였다.

A Fast IFFT Algorithm for IMDCT of AAC Decoder (AAC 디코더의 IMDCT를 위한 고속 IFFT 알고리즘)

  • Chi, Hua-Jun;Kim, Tae-Hoon;Park, Ju-Sung
    • The Journal of the Acoustical Society of Korea
    • /
    • v.26 no.5
    • /
    • pp.214-219
    • /
    • 2007
  • This paper proposes a new IFFT(Inverse Fast Fourier Transform) algorithm, which is proper for IMDCT(Inverse Modified Discrete Cosine Transform) of MPEG-2 AAC(Advanced Audio Coding) decoder. The $2^n$(N-point) type IMDCT is the most powerful among many IMDCT algorithms, however it includes IFFT that requires many calculation cycles. The IFFT used in $2^n$(N-point) type IMDCT employ the bit-reverse data arrangement of inputs and N/4-point complex IFFT to reduce the calculation cycles. We devised a new data arrangement method of IFFT input and $N/4^{n+1}$-type IFFT and thus we can reduce multiplication cycles, addition cycles, and ROM size.

An Intra Prediction Hardware Architecture Design for Computational Complexity Reduction of HEVC Decoder (HEVC 복호기의 연산 복잡도 감소를 위한 화면내 예측 하드웨어 구조 설계)

  • Jung, Hongkyun;Ryoo, Kwangki
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.17 no.5
    • /
    • pp.1203-1212
    • /
    • 2013
  • In this paper, an intra prediction hardware architecture is proposed to reduce computational complexity of intra prediction in HEVC decoder. The architecture uses shared operation units and common operation units and adopts a fast smoothing decision algorithm and a fast algorithm to generate coefficients of a filter. The shared operation unit shares adders processing common equations to remove the computational redundancy. The unit computes an average value in DC mode for reducing the number of execution cycles in DC mode. In order to reduce operation units, the common operation unit uses one operation unit generating predicted pixels and filtered pixels in all prediction modes. In order to reduce processing time and operators, the decision algorithm uses only bit-comparators and the fast algorithm uses LUT instead of multiplication operators. The proposed architecture using four shared operation units and eight common operation units which can reduce execution cycles of intra prediction. The architecture is synthesized using TSMC 0.13um CMOS technology. The gate count and the maximum operating frequency are 40.5k and 164MHz, respectively. As the result of measuring the performance of the proposed architecture using the extracted data from HM 7.1, the execution cycle of the architecture is about 93.7% less than the previous design.

IEEE-754 Floating-Point Divider for Embedded Processors (내장형 프로세서를 위한 IEEE-754 고성능 부동소수점 나눗셈기의 설계)

  • Jeong, Jae-Won;Hong, In-Pyo;Jeong, Woo-Kyong;Lee, Yong-Surk
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.39 no.7
    • /
    • pp.66-73
    • /
    • 2002
  • As floating-point operations become widely used in various applications such as computer graphics and high-definition DSP, the needs for fast division become increased. However, conventional floating-point dividers occupy a large hardware area, and bring bottle-becks to the entire floating-point operations. In this paper, a high-performance and small-area floating-point divider, which is suitable for embedded processors, is designed using he series expansion algorithm. The algorithm is selected to utilize two MAC(Multiply-ACcumulate) units for quadratic convergence to the correct quotient. The two MAC units for SIMD-DSP features are shared and the additional area for the division only is very small. The proposed divider supports all rounding modes defined by IEEE 754 standard, and error estimations are performed for appropriate precision.

Integer Inverse Transform Structure Based on Matrix for VP9 Decoder (VP9 디코더에 대한 행렬 기반의 정수형 역변환 구조)

  • Lee, Tea-Hee;Hwang, Tae-Ho;Kim, Byung-Soo;Kim, Dong-Sun
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.53 no.4
    • /
    • pp.106-114
    • /
    • 2016
  • In this paper, we propose an efficient integer inverse transform structure for vp9 decoder. The proposed structure is a hardware structure which is easy to control and requires less hardware resources, and shares algorithms for realizing entire DCT(Discrete Cosine Transform), ADST(Asymmetric Discrete Sine Transform) and WHT(Walsh-Hadamard Transform) in vp9. The integer inverse transform for vp9 google model has a fast structure, named butterfly structure. The integer inverse transform for google C model, unlike universal fast structure, takes a constant rounding shift operator on each stage and includes an asymmetrical sine transform structure. Thus, the proposed structure approximates matrix coefficient values for all transform mode and is used to matrix operation method. With the proposed structure, shared operations for all inverse transform algorithm modes can be possible with reduced number of multipliers compared to the butterfly structure, which in turn manages the hardware resources more efficiently.

Twiddle Factor Index Generate Method for Memory Reduction in R2SDF FFT (R2SDF FFT의 메모리 감소를 위한 회전인자 인덱스 생성방법)

  • Yang, Seung-Won;Kim, Yong-Eun;Lee, Jong-Yeol
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.46 no.5
    • /
    • pp.32-38
    • /
    • 2009
  • FTT(Fast Fourier Transform) processor is widely used in OFDM(Orthogonal Frequency Division Multiplesing) system. Because of the increased requirement of mobility and bandwidth in the OFDM system, they need large point FTT processor. Since the size of memory which stores the twiddle factor coefficients are proportional to the N of FFT size, we propose a new method by which we can reduce the size of the coefficient memory. In the proposed method, we exploit a counter and unsigned multiplier to generate the twiddle factor indices. To verify the proposed algorithm, we design TFCGs(Twiddle Factor Coefficient Generator) for 1024pint FFTs with R2SDF(Radix-2 Single-Path Delay Feedback), $R2^3SDF,\;R2^3SDF,\;R2^4SDF$ architectures. The size of ROM is reduced to 1/8N. In the case of $R2^4SDF$ architecture, the area and the power are reduced by 57.9%, 57.5% respectively.

A New Flash A/D Converter Adopting Double Base Number System (2개의 밑수를 이용한 Flash A/D 변환기)

  • Kim, Jong-Soo;Kim, Man-Ho;Jang, Eun-Hwa
    • Journal of the Institute of Convergence Signal Processing
    • /
    • v.9 no.1
    • /
    • pp.54-61
    • /
    • 2008
  • This paper presents a new TIQ based CMOS flash 6-bit ADC to process digital signal in real time. In order to improve the conversion speed of ADC by designing new logic or layout of ADC circuits, a new design method is proposed in encoding logic circuits. The proposed encoding circuits convert analog input into digitally encoded double base number system(DBNS), which uses two bases unlike the normal binary representation scheme. The DBNS adopts binary and ternary radix to enhance digital arithmetic processing capability. In the DBNS, the addition and multiplication can be processed with just shift operations only. Finding near canonical representation is the most important work in general DBNS. But the main disadvantage of DBNS representation in ADC is the fan-in problem. Thus, an equal distribution algorithm is developed to solve the fan-in problem after assignment the prime numbers first. The conversion speed of simulation result was 1.6 GSPS, at 1.8V power with the Magna $0.18{\mu}m$ CMOS process, and the maximum power consumption was 38.71mW.

  • PDF

VLSI Architecture for High Speed Implementation of Elliptic Curve Cryptographic Systems (타원곡선 암호 시스템의 고속 구현을 위한 VLSI 구조)

  • Kim, Chang-Hoon
    • The KIPS Transactions:PartC
    • /
    • v.15C no.2
    • /
    • pp.133-140
    • /
    • 2008
  • In this paper, we propose a high performance elliptic curve cryptographic processor over $GF(2^{163})$. The proposed architecture is based on a modified Lopez-Dahab elliptic curve point multiplication algorithm and uses Gaussian normal basis for $GF(2^{163})$ field arithmetic. To achieve a high throughput rates, we design two new word-level arithmetic units over $GF(2^{163})$ and derive a parallelized elliptic curve point doubling and point addition algorithm with uniform addressing based on the Lopez-Dahab method. We implement our design using Xilinx XC4VLX80 FPGA device which uses 24,263 slices and has a maximum frequency of 143MHz. Our design is roughly 4.8 times faster with 2 times increased hardware complexity compared with the previous hardware implementation proposed by Shu. et. al. Therefore, the proposed elliptic curve cryptographic processor is well suited to elliptic curve cryptosystems requiring high throughput rates such as network processors and web servers.

Deep Learning-based Real-Time Super-Resolution Architecture Design (경량화된 딥러닝 구조를 이용한 실시간 초고해상도 영상 생성 기술)

  • Ahn, Saehyun;Kang, Suk-Ju
    • Journal of Broadcast Engineering
    • /
    • v.26 no.2
    • /
    • pp.167-174
    • /
    • 2021
  • Recently, deep learning technology is widely used in various computer vision applications, such as object recognition, classification, and image generation. In particular, the deep learning-based super-resolution has been gaining significant performance improvement. Fast super-resolution convolutional neural network (FSRCNN) is a well-known model as a deep learning-based super-resolution algorithm that output image is generated by a deconvolutional layer. In this paper, we propose an FPGA-based convolutional neural networks accelerator that considers parallel computing efficiency. In addition, the proposed method proposes Optimal-FSRCNN, which is modified the structure of FSRCNN. The number of multipliers is compressed by 3.47 times compared to FSRCNN. Moreover, PSNR has similar performance to FSRCNN. We developed a real-time image processing technology that implements on FPGA.