• Title/Summary/Keyword: Parallel computation

Search Result 594, Processing Time 0.024 seconds

Efficient implementation of AES CTR Mode for a Mobile Environment (모바일 환경을 위한 AES CTR Mode의 효율적 구현)

  • Park, Jin-Hyung;Paik, Jung-Ha;Lee, Dong-Hoon
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.21 no.5
    • /
    • pp.47-58
    • /
    • 2011
  • Recently, there are several technologies for protecting information in the lightweight device, One of them, the AES[1] algorithm and CRT mode, is used for numerous services(e,g, OMA DRM, VoIP, IPTV) as encryption technique for preserving confidentiality. Although it is possible that the AES algorithm CRT mode can parallel process transmitting data, IPTV Set-top Box or Mobile Device that uses these streaming service has limited computation-ability. So optimizing crypto algorithm and enhancing its efficiency for those environment have become an important issue. In this paper, we propose implementation method that can improve efficiency of the AES-CRT Mode by improving algorithm logics. Moreover, we prove the performance of our proposal on the mobile device which has limited capability.

Optimization Study of Toom-Cook Algorithm in NIST PQC SABER Utilizing ARM/NEON Processor (ARM/NEON 프로세서를 활용한 NIST PQC SABER에서 Toom-Cook 알고리즘 최적화 구현 연구)

  • Song, JinGyo;Kim, YoungBeom;Seo, Seog Chung
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.31 no.3
    • /
    • pp.463-471
    • /
    • 2021
  • Since 2016, National Institute of Standards and Technology (NIST) has been conducting a post quantum cryptography standardization project in preparation for a quantum computing environment. Three rounds are currently in progress, and most of the candidates (5/7) are lattice-based. Lattice-based post quantum cryptography is evaluated to be applicable even in an embedded environment where resources are limited by providing efficient operation processing and appropriate key length. Among them, SABER KEM provides the efficient modulus and Toom-Cook to process polynomial multiplication with computation-intensive tasks. In this paper, we present the optimized implementation of evaluation and interpolation in Toom-Cook algorithm of SABER utilizing ARM/NEON in ARMv8-A platform. In the evaluation process, we propose an efficient interleaving method of ARM/NEON, and in the interpolation process, we introduce an optimized implementation methodology applicable in various embedded environments. As a result, the proposed implementation achieved 3.5 times faster performance in the evaluation process and 5 times faster in the interpolation process than the previous reference implementation.

A Survey on 5G Enabled Multi-Access Edge Computing for Smart Cities: Issues and Future Prospects

  • Tufail, Ali;Namoun, Abdallah;Alrehaili, Ahmed;Ali, Arshad
    • International Journal of Computer Science & Network Security
    • /
    • v.21 no.6
    • /
    • pp.107-118
    • /
    • 2021
  • The deployment of 5G is in full swing, with a significant yearly growth in the data traffic expected to reach 26% by the year and data consumption to reach 122 EB per month by 2022 [10]. In parallel, the idea of smart cities has been implemented by various governments and private organizations. One of the main objectives of 5G deployment is to help develop and realize smart cities. 5G can support the enhanced data delivery requirements and the mass connection requirements of a smart city environment. However, for specific high-demanding applications like tactile Internet, transportation, and augmented reality, the cloud-based 5G infrastructure cannot deliver the required quality of services. We suggest using multi-access edge computing (MEC) technology for smart cities' environments to provide the necessary support. In cloud computing, the dependency on a central server for computation and storage adds extra cost in terms of higher latency. We present a few scenarios to demonstrate how the MEC, with its distributed architecture and closer proximity to the end nodes can significantly improve the quality of services by reducing the latency. This paper has surveyed the existing work in MEC for 5G and highlights various challenges and opportunities. Moreover, we propose a unique framework based on the use of MEC for 5G in a smart city environment. This framework works at multiple levels, where each level has its own defined functionalities. The proposed framework uses the MEC and introduces edge-sub levels to keep the computing infrastructure much closer to the end nodes.

High-Speed Implementation to CHAM-64/128 Counter Mode with Round Key Pre-Load Technique (라운드 키 선행 로드를 통한 CHAM-64/128 카운터 모드 고속 구현)

  • Kwon, Hyeok-dong;Jang, Kyoung-bae;Park, Jae-hoon;Seo, Hwa-jeong
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.30 no.6
    • /
    • pp.1217-1223
    • /
    • 2020
  • The Block cipher CHAM is lightweight block cipher for low-end processors, developed by National Security Research Institute from Korea. The mode of operation is necessity for efficient operation of block cipher, among them, the counter (CTR) mode has good efficiency because it is easy to implement and supporting parallel operation. In this paper, we propose the optimized implementation for block cipher CHAM-CTR. The proposed implementation can be skipped some rounds by pre-computation. Thus it has better calculating speed than existing CHAM. Also, this implementation pre-load some of round keys to registers, before entering round functions. It makes reduced 160cycles loading time for round key load. Finally, proposed implementation achieved higher performance about 6.8%, and 4.5% for fixed-key scenario, and variable-key scenario, respectively.

Acceleration of ECC Computation for Robust Massive Data Reception under GPU-based Embedded Systems (GPU 기반 임베디드 시스템에서 대용량 데이터의 안정적 수신을 위한 ECC 연산의 가속화)

  • Kwon, Jisu;Park, Daejin
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.24 no.7
    • /
    • pp.956-962
    • /
    • 2020
  • Recently, as the size of data used in an embedded system increases, the need for an ECC decoding operation to robustly receive a massive data is emphasized. In this paper, we propose a method to accelerate the execution of computations that derive syndrome vectors when ECC decoding is performed using Hamming code in an embedded system with a built-in GPU. The proposed acceleration method uses the matrix-vector multiplication of the decoding operation using the CSR format, one of the data structures representing sparse matrix, and is performed in parallel in the CUDA kernel of the GPU. We evaluated the proposed method using a target embedded board with a GPU, and the result shows that the execution time is reduced when ECC decoding operation accelerated based on the GPU than used only CPU.

Development of the Vibration Analysis Program Applying the High-Performance Numerical Analysis Library (고성능 수치해석 라이브러리를 적용한 진동해석 프로그램 개발)

  • Ko, Dou-Hyun;Boo, Seung-Hwan
    • Journal of the Korean Society of Marine Environment & Safety
    • /
    • v.27 no.1
    • /
    • pp.201-209
    • /
    • 2021
  • In order to evaluate the vibrational characteristics of huge finite element models such as ships and offshore structures, it is essential to perform eigenvalue analysis and frequency response analysis. However, these analyzes necessitate excessive equipment and computation time, which require the development of a high-performance analysis program. In particular, a considerable computational analysis time is required when calculating the inverse matrix in a linear system of equations and analyzing the eigenvalue analysis. Therefore, it can be improved by applying the latest high-performance library. In this paper, the vibration analysis program that enables fast and accurate analysis was developed by applying 'PARDISO', a parallel linear system of equation calculation library, and 'ARPACK', a high-performance eigenvalue analysis library. To verify the accuracy and efficiency of proposed method, we compare ABAQUS with proposed program using numerical examples of marine engineering.

Thread Block Scheduling for Multi-Workload Environments in GPGPU (다중 워크로드 환경을 위한 GPGPU 스레드 블록 스케줄링)

  • Park, Soyeon;Cho, Kyung-Woon;Bahn, Hyokyung
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.22 no.2
    • /
    • pp.71-76
    • /
    • 2022
  • Round-robin is widely used for the scheduling of large-scale parallel workloads in the computing units of GPGPU. Round-robin is easy to implement by sequentially allocating tasks to each computing unit, but the load balance between computing units is not well achieved in multi-workload environments like cloud. In this paper, we propose a new thread block scheduling policy to resolve this situation. The proposed policy manages thread blocks generated by various GPGPU workloads with multiple queues based on their computation loads and tries to maximize the resource utilization of each computing unit by selecting a thread block from the queue that can maximally utilize the remaining resources, thereby inducing load balance between computing units. Through simulation experiments under various load environments, we show that the proposed policy improves the GPGPU performance by 24.8% on average compared to Round-robin.

A Execution Performance Analysis of Applications using Multi-Process Service over GPU (다중 프로세스 서비스를 이용한 GPU 응용 동시 실행 성능 분석)

  • Kim, Se-Jin;Oh, Ji-Sun;Kim, Yoonhee
    • KNOM Review
    • /
    • v.22 no.1
    • /
    • pp.60-67
    • /
    • 2019
  • Graphical Processing Units(GPUs) achieve high performance undertaking from relatively uniformed computation in parallel. The technology related to General Purpose GPU(GPGPU) has been enhanced, which provides concurrent kernel execution of multi and diverse applications at the same time, but it is still limited to support resource sharing or planning. NVIDIA recently introduces Multi-Process Service(MPS), which allows kernels from different applications can be execute concurrently. However, the strength of MPS comes along with the characteristics of applications and the order of their execution. This paper shows the performance analysis of diverse scientific applications in real world. Based on the analysis, we prove that it is important to the identify characteristics of co-run applications, and to schedule multiple applications via profiling to maximize MPS functionality.

Distributed AI Learning-based Proof-of-Work Consensus Algorithm (분산 인공지능 학습 기반 작업증명 합의알고리즘)

  • Won-Boo Chae;Jong-Sou Park
    • The Journal of Bigdata
    • /
    • v.7 no.1
    • /
    • pp.1-14
    • /
    • 2022
  • The proof-of-work consensus algorithm used by most blockchains is causing a massive waste of computing resources in the form of mining. A useful proof-of-work consensus algorithm has been studied to reduce the waste of computing resources in proof-of-work, but there are still resource waste and mining centralization problems when creating blocks. In this paper, the problem of resource waste in block generation was solved by replacing the relatively inefficient computation process for block generation with distributed artificial intelligence model learning. In addition, by providing fair rewards to nodes participating in the learning process, nodes with weak computing power were motivated to participate, and performance similar to the existing centralized AI learning method was maintained. To show the validity of the proposed methodology, we implemented a blockchain network capable of distributed AI learning and experimented with reward distribution through resource verification, and compared the results of the existing centralized learning method and the blockchain distributed AI learning method. In addition, as a future study, the thesis was concluded by suggesting problems and development directions that may occur when expanding the blockchain main network and artificial intelligence model.

Efficient CHAM-Like Structures on General-Purpose Processors with Changing Order of Operations (연산 순서 변경에 따른 범용 프로세서에서 효율적인 CHAM-like 구조)

  • Myoungsu Shin;Seonkyu Kim;Hanbeom Shin;Insung Kim;Sunyeop Kim;Donggeun Kwon;Deukjo Hong;Jaechul Sung;Seokhie Hong
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.34 no.4
    • /
    • pp.629-639
    • /
    • 2024
  • CHAM is designed with an emphasis on encryption speed, considering that in the ISO/IEC standard block cipher operation mode, encryption functions are used more often than decryption functions. In the superscalar architecture of modern general-purpose processors, different ordering of operations can lead to different processing speeds, even if the computation configuration is the same. In this paper, we analyze the implementation efficiency and security of CHAM-like structures, which rearrange the order of operations in the ARX-based block cipher CHAM, for single-block and parallel implementations in a general-purpose processor environment. The proposed structures are at least 9.3% and at most 56.4%efficient in terms of encryption speed. The security analysis evaluates the resistance of the CHAM-like structures to differential and linear attacks. In terms of security margin, the difference is 3.4% for differential attacks and 6.8%for linear attacks, indicating that the security strength is similar compared to the efficiency difference. These results can be utilized in the design of ARX-based block ciphers.