• Title/Summary/Keyword: Optimized implementation


High-Speed Implementation to CHAM-64/128 Counter Mode with Round Key Pre-Load Technique (라운드 키 선행 로드를 통한 CHAM-64/128 카운터 모드 고속 구현)

  • Kwon, Hyeok-dong;Jang, Kyoung-bae;Park, Jae-hoon;Seo, Hwa-jeong
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.30 no.6
    • /
    • pp.1217-1223
    • /
    • 2020
  • CHAM is a lightweight block cipher for low-end processors, developed by the National Security Research Institute of Korea. A mode of operation is necessary to use a block cipher efficiently; among the modes, counter (CTR) mode is efficient because it is easy to implement and supports parallel operation. In this paper, we propose an optimized implementation of CHAM in CTR mode (CHAM-CTR). The proposed implementation skips some rounds through pre-computation, so it runs faster than the existing CHAM implementation. It also pre-loads some of the round keys into registers before entering the round function, which saves 160 cycles of round-key loading time. As a result, the proposed implementation improves performance by about 6.8% in the fixed-key scenario and 4.5% in the variable-key scenario.
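
The round-skipping idea in the abstract can be illustrated with a minimal sketch. It assumes a toy ARX cipher whose round structure only resembles CHAM (four 16-bit words, one word updated per round, then a word rotation); the rotation amounts, round count, round keys, and the choice to place a 16-bit counter in the last word are all illustrative assumptions, not the paper's implementation. Because the first two rounds read only the nonce words, their outputs can be computed once and reused for every counter value.

```python
MASK = 0xFFFF  # 16-bit words; four of them form one 64-bit block

def rol(x, r):
    """Rotate a 16-bit word left by r bits."""
    return ((x << r) | (x >> (16 - r))) & MASK

def round_step(state, i, rk):
    """One toy ARX round: update word 0 from words 0 and 1, then rotate the words."""
    x0, x1, x2, x3 = state
    a, b = (1, 8) if i % 2 == 0 else (8, 1)
    new = rol(((x0 ^ i) + (rol(x1, a) ^ rk)) & MASK, b)
    return (x1, x2, x3, new)

def encrypt(state, round_keys, rounds, start=0):
    """Run rounds start..rounds-1 on the given state."""
    for i in range(start, rounds):
        state = round_step(state, i, round_keys[i % len(round_keys)])
    return state

def precompute(nonce, round_keys):
    """Run rounds 0 and 1 once; they read only the three nonce words."""
    n0, n1, n2 = nonce
    s = encrypt((n0, n1, n2, 0), round_keys, rounds=2)
    return s[2], s[3]  # the two words produced by rounds 0 and 1

def encrypt_counter_fast(nonce, ctr, pre, round_keys, rounds):
    """Encrypt the block (nonce, ctr) starting from round 2 with cached words."""
    state = (nonce[2], ctr, pre[0], pre[1])
    return encrypt(state, round_keys, rounds, start=2)

# Hypothetical parameters, chosen only for the demonstration.
ROUNDS = 80
ROUND_KEYS = [(0x1357 * (k + 1)) & MASK for k in range(16)]
NONCE = (0x0123, 0x4567, 0x89AB)
PRE = precompute(NONCE, ROUND_KEYS)

def keystream_block(ctr):
    return encrypt_counter_fast(NONCE, ctr, PRE, ROUND_KEYS, ROUNDS)
```

In this sketch the saving is two rounds per block. How many rounds a real implementation can skip depends on the cipher's actual round function and on where the counter sits in the block.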

Optimized Implementation of Lightweight Block Cipher PIPO Using T-Table (T-table을 사용한 경량 블록 암호 PIPO의 최적화 구현)

  • Minsig Choi;Sunyeop Kim;Insung Kim;Hanbeom Shin;Seonggyeom Kim;Seokhie Hong
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.33 no.3
    • /
    • pp.391-399
    • /
    • 2023
  • In this paper, we present, for the first time, a T-table implementation of the lightweight block ciphers PIPO-64/128 and PIPO-64/256. While our proposed implementation requires 16 T-tables, we show that the two types of T-tables are circulant and obtain variant implementations that require fewer T-tables. We then discuss the trade-off between the number of required T-tables (code size) and throughput by evaluating the throughput of the variant implementations on an Intel Core i7-9700K processor. The throughput-optimized versions of PIPO-64/128 and PIPO-64/256 provide better throughput than the TLU (table look-up) reference implementation by factors of 3.11 and 2.76, respectively, and than the bit-slice reference implementation by factors of 3.11 and 2.76, respectively.
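
The T-table technique and the circulant property mentioned in the abstract can be sketched on a toy round function. The S-box, the linear layer, and the 4-byte state below are illustrative assumptions and are not PIPO's. The point is only that when the linear layer commutes with byte rotation, each table is a rotation of the first, so one stored table can replace several at the cost of extra rotations.

```python
MASK32 = 0xFFFFFFFF

# Toy 8-bit permutation (7 is odd, so x -> 7x + 3 mod 256 is a bijection). NOT the PIPO S-box.
SBOX = [(7 * x + 3) & 0xFF for x in range(256)]

def rotl32(w, r):
    """Rotate a 32-bit word left by r bits (r in 0..31)."""
    return ((w << r) | (w >> (32 - r))) & MASK32

def linear(w):
    """Toy linear layer: XOR of byte rotations, so it commutes with rotl32(w, 8)."""
    return w ^ rotl32(w, 8) ^ rotl32(w, 24)

def round_direct(state_bytes):
    """Reference round: S-box on each byte, then the linear layer."""
    w = 0
    for j, x in enumerate(state_bytes):
        w |= SBOX[x] << (8 * j)
    return linear(w)

# Full T-table variant: one 256-entry table per byte position.
T0 = [linear(SBOX[x]) for x in range(256)]
T = [[rotl32(t, 8 * j) for t in T0] for j in range(4)]

def round_ttable(state_bytes):
    """Four lookups and XORs; fastest, but stores four tables."""
    w = 0
    for j, x in enumerate(state_bytes):
        w ^= T[j][x]
    return w

def round_ttable_small(state_bytes):
    """Circulant variant: only T0 is stored, rotations are done at run time."""
    w = 0
    for j, x in enumerate(state_bytes):
        w ^= rotl32(T0[x], 8 * j)
    return w
```

The two table variants trade memory for rotations, which is the same kind of code-size versus throughput trade-off the abstract evaluates.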

Energy Efficient and Low-Cost Server Architecture for Hadoop Storage Appliance

  • Choi, Do Young;Oh, Jung Hwan;Kim, Ji Kwang;Lee, Seung Eun
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.14 no.12
    • /
    • pp.4648-4663
    • /
    • 2020
  • This paper proposes a Lempel-Ziv 4 (LZ4) compression accelerator optimized for scale-out servers in data centers. To reduce the CPU load caused by compression, we propose an accelerator solution and implement it on a Field Programmable Gate Array (FPGA) as a form of heterogeneous computing. The LZ4 compression hardware accelerator has a fully pipelined architecture and uses 16 dictionaries to increase parallelism for high-throughput compression. The accelerator is based on a 20-stage pipeline and a dictionary architecture that are highly customized to the LZ4 compression algorithm and to parallel hardware implementation. The proposed dictionary architecture achieves higher throughput than a single dictionary by comparing input sequences against multiple dictionaries simultaneously. The experimental results show high throughput from a design intensively optimized for the FPGA. Additionally, we compare our implementation with CPU implementations of LZ4 to provide insights into FPGA-based data centers. The proposed accelerator achieves a compression throughput of 639 MB/s with fine-grained parallelism and can be deployed in scale-out servers. This approach enables a low-power Intel Atom processor, together with the compression accelerator, to serve as Hadoop storage.

Optimized DES Core Implementation for Commercial FPGA Cluster System (상용 FPGA 클러스터 시스템 기반의 최적화된 DES 코어 설계)

  • Jung, Eun-Gu;Park, Il-Hwan
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.21 no.2
    • /
    • pp.131-138
    • /
    • 2011
  • Previous FPGA cluster systems for brute-force search of the DES keyspace have shown cost-efficient performance, but research on optimized implementation of the DES algorithm on a single FPGA has been insufficient. In this paper, we propose an optimized DES implementation for a single FPGA of a commercial FPGA cluster system with 77 Xilinx Virtex5-LX50 FPGAs. Design space exploration over the number of pipeline stages in a DES core, the number of DES cores, and the maximum clock frequency of a DES core leads to integrating 16 DES cores running at 333 MHz. In addition, a low-power design is applied to reduce the performance loss caused by the limited power supply of each FPGA, which results in fitting 8 DES cores running at 333 MHz. If the proposed DES implementations were used in the FPGA cluster system, it is estimated that a DES key would be found in at most 2.03 days and 4.06 days, respectively.
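
The search-time estimates in the abstract can be reproduced with a short calculation. This is a sketch under two assumptions that the abstract does not state: each fully pipelined DES core tests one key per clock cycle, and the worst case covers the full 56-bit keyspace.

```python
KEYSPACE = 2 ** 56        # number of DES keys
FPGAS = 77                # FPGAs in the cluster (from the abstract)
CLOCK_HZ = 333e6          # core clock frequency (from the abstract)
SECONDS_PER_DAY = 86_400

def worst_case_days(cores_per_fpga):
    """Days needed to try every key, assuming one key per core per clock cycle."""
    keys_per_second = FPGAS * cores_per_fpga * CLOCK_HZ
    return KEYSPACE / keys_per_second / SECONDS_PER_DAY

for cores in (16, 8):
    print(f"{cores} cores per FPGA: {worst_case_days(cores):.3f} days")
```

Under these assumptions the results come out close to the abstract's figures of 2.03 and 4.06 days for the 16-core and 8-core designs, which suggests, but does not confirm, that the paper's estimate was derived the same way.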

Optimization Study of Toom-Cook Algorithm in NIST PQC SABER Utilizing ARM/NEON Processor (ARM/NEON 프로세서를 활용한 NIST PQC SABER에서 Toom-Cook 알고리즘 최적화 구현 연구)

  • Song, JinGyo;Kim, YoungBeom;Seo, Seog Chung
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.31 no.3
    • /
    • pp.463-471
    • /
    • 2021
  • Since 2016, the National Institute of Standards and Technology (NIST) has been conducting a post-quantum cryptography standardization project in preparation for a quantum computing environment. Round 3 is currently in progress, and most of the candidates (5 of 7) are lattice-based. Lattice-based post-quantum cryptography is considered applicable even in resource-constrained embedded environments because it offers efficient operations and appropriate key lengths. Among these candidates, SABER KEM uses an efficient modulus and the Toom-Cook algorithm to handle computation-intensive polynomial multiplication. In this paper, we present an optimized implementation of the evaluation and interpolation steps of the Toom-Cook algorithm in SABER using ARM/NEON on the ARMv8-A platform. For the evaluation step, we propose an efficient interleaving method for ARM/NEON, and for the interpolation step, we introduce an optimized implementation methodology applicable to various embedded environments. As a result, the proposed implementation is 3.5 times faster in the evaluation step and 5 times faster in the interpolation step than the previous reference implementation.
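
The evaluation and interpolation steps named in the abstract can be sketched for one level of Toom-Cook 4-way multiplication. This is a plain illustration over exact integers with rational interpolation; the evaluation points, the generic matrix inversion, and all function names are assumptions for the example. SABER itself works modulo a power of two with its own point set, and the paper's NEON-specific techniques are not shown.

```python
from fractions import Fraction

POINTS = [0, 1, -1, 2, -2, 3]  # six finite points; the seventh is "infinity"

def schoolbook(a, b):
    """Reference O(n^2) polynomial multiplication on coefficient lists."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def evaluate(limbs, p):
    """Evaluation step: combine the four limbs as sum(limb_i * p**i)."""
    m = len(limbs[0])
    return [sum(limb[j] * p**i for i, limb in enumerate(limbs)) for j in range(m)]

def invert(matrix):
    """Exact Gauss-Jordan inverse over the rationals."""
    n = len(matrix)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = 1 / aug[col][col]
        aug[col] = [v * scale for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

# Inverse Vandermonde matrix for the six finite points, computed once.
INV = invert([[p**k for k in range(6)] for p in POINTS])

def toom4(a, b):
    """Multiply two polynomials whose common length is a multiple of 4."""
    n = len(a)
    assert n == len(b) and n % 4 == 0
    m = n // 4
    la = [a[i * m:(i + 1) * m] for i in range(4)]
    lb = [b[i * m:(i + 1) * m] for i in range(4)]
    # Evaluation and pointwise products: seven small multiplications.
    prods = [schoolbook(evaluate(la, p), evaluate(lb, p)) for p in POINTS]
    c6 = schoolbook(la[3], lb[3])  # value at infinity is the top limb of the product
    # Interpolation: remove the known top limb, then solve for limbs 0..5.
    rhs = [[w - c * p**6 for w, c in zip(prod, c6)] for prod, p in zip(prods, POINTS)]
    limbs = [[sum(INV[k][r] * rhs[r][j] for r in range(6)) for j in range(2 * m - 1)]
             for k in range(6)]
    limbs.append(c6)
    # Recombination: overlap-add the seven product limbs at stride m.
    out = [0] * (2 * n - 1)
    for k, limb in enumerate(limbs):
        for j, v in enumerate(limb):
            out[k * m + j] += int(v)
    return out
```

Seven limb multiplications replace the sixteen that a schoolbook split into four limbs would need; the cost moves into the evaluation and interpolation steps, which is why those steps are the optimization target described in the abstract.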

Improving the Implementation Complexity of the Latency-Optimized Fair Queuing Algorithm (최적 레이턴시 기반 공정 큐잉 알고리즘의 구현 복잡도 개선)

  • Kim, Tae-Joon;Suh, Bong-Sue
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.37 no.6B
    • /
    • pp.405-413
    • /
    • 2012
  • WFQ (Weighted Fair Queuing) is the most popular fair queuing algorithm for guaranteeing Quality of Service (QoS), but it has the inherent drawback of poor resource utilization, particularly for low-rate traffic requiring a tight delay bound. It was recently identified that the poor utilization is mainly due to the non-optimized latency of a traffic flow, and LOFQ (Latency-Optimized Fair Queuing) was introduced to overcome this drawback. The LOFQ algorithm, however, recomputes the optimal latencies of all flows whenever a new flow arrives, which results in a high implementation complexity of O($N^2$). This paper reduces the complexity to O(1). The proposed method first derives the optimal latency index function from the statistical QoS characteristics of the offered load, and then simply calculates the optimal latency index of the arriving flow using that function.

Digital Implementation of $H_\infty$ Optimal Controller ($H_\infty$ 최적제어기의 이산화 구현)

  • 김광우;오도창;박홍배
    • 제어로봇시스템학회:학술대회논문집
    • /
    • 1993.10a
    • /
    • pp.471-476
    • /
    • 1993
  • In this paper, we propose a digital implementation of an $H_\infty$-optimal controller using the lifting technique and $H_\infty$ control theory. The discrete controller is obtained through iterative adjustment of the sampling time and weighting function, which can be performed by computing the $L_2$-induced input-to-output norm of the sampled-data system with band-limited exogenous input. The resulting sampled-data system is stable, and the performance of the hybrid system, including inter-sampling behaviour, can also be optimized.


Design and Implementation of AI Recommendation Platform for Commercial Services

  • Jong-Eon Lee
    • International journal of advanced smart convergence
    • /
    • v.12 no.4
    • /
    • pp.202-207
    • /
    • 2023
  • In this paper, we discuss the design and implementation of a recommendation platform actually built in the field. We survey deep learning-based recommendation models that are effective at reflecting individual user characteristics. Recently proposed RNN-based sequential recommendation models reflect individual user characteristics well. The recommendation platform we propose has an architecture that can collect, store, and process big data from a company's commercial services. It provides service providers with intuitive tools to evaluate and apply optimized recommendation models in a timely manner. In the model evaluation we performed, RNN-based sequential recommendation models showed high scores.

Design and Implementation of Optimized Route Search Technique based on User Experience Using Open APIs (지도 오픈 API를 활용한 사용자 경험 기반 최적화 이동 경로 탐색 기법의 설계와 구현)

  • Sagong, Woon
    • Journal of Korea Multimedia Society
    • /
    • v.18 no.5
    • /
    • pp.682-690
    • /
    • 2015
  • Among location-based systems, route search is a representative and very widely used service, but it provides relatively low accuracy when finding a walking route in a real environment. In this paper, we design and implement an optimized route search technique based on user experience, utilizing open APIs as location-based services. We then develop an Android-based application to provide this feature. In our experiment, we found that our technique enhanced performance by about 14-36% compared to previous solutions, such as route path searches using map APIs. In addition, the performance of our technique can be further enhanced as the number of users who find such optimized route paths increases.

A Study on the Optimized Representation for Data and Control Flow Information (자료 및 제어 흐름 정보의 최적화 표현에 관한 연구)

  • 정성옥;고광만;이성주
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.4 no.3
    • /
    • pp.681-687
    • /
    • 2000
  • Ideograph is a representation that truly unifies data and procedural dependencies. It can assist various program optimizations, such as common subexpression elimination, code motion, and constant folding. In this paper, we design and implement an optimized abstract syntax tree using Ideograph. Ideograph holds control flow and data flow information for the source program, so we use an Ideograph to produce an optimized Ideograph that carries both control flow and data flow information.
