• Title/Summary/Keyword: 연산 지도

Design and Implementation of High-Performance Cryptanalysis System Based on GPUDirect RDMA (GPUDirect RDMA 기반의 고성능 암호 분석 시스템 설계 및 구현)

  • Lee, Seokmin;Shin, Youngjoo
    • Journal of the Korea Institute of Information Security & Cryptology, v.32 no.6, pp.1127-1137, 2022
  • Cryptanalysis and decryption techniques that exploit GPU parallelism have been studied as a way to shorten the computation time of password analysis systems. These studies focus on optimizing code to speed up cryptanalysis operations on a single GPU, or on simply adding GPUs to increase parallelism. However, using a large number of GPUs without optimizing data transmission causes longer data transmission latency than using a single GPU and increases the overall computation time of the cryptanalysis system. In this paper, we investigate GPUDirect RDMA and related technologies used for high-performance data processing in deep learning and HPC research in GPU cluster environments. In addition, we present a method of designing a high-performance cryptanalysis system using these technologies. Furthermore, based on the suggested system topology, we present a method of implementing a cryptanalysis system using password cracking and GPU reduction. Finally, we present performance evaluation results that demonstrate the effect of applying these high-performance technologies to the implemented cryptanalysis system, and show the expected effects of the proposed system design.

Performance Improvements of SCAM Climate Model using LAPACK BLAS Library (SCAM 기상모델의 성능향상을 위한 LAPACK BLAS 라이브러리의 활용)

  • Dae-Yeong Shin;Ye-Rin Cho;Sung-Wook Chung
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology, v.16 no.1, pp.33-40, 2023
  • As supercomputing and hardware technology develop, numerical computation methods are also advancing, which makes improved weather prediction possible. In this paper, we propose applying the LAPACK (Linear Algebra PACKage) BLAS (Basic Linear Algebra Subprograms) library to the linear algebraic numerical computation parts of the source code in order to improve the performance of Unicon (A Unified Convection Scheme), the cumulus parameterization code that is included in SCAM (Single-Column Atmospheric Model, a simplified version of CESM (Community Earth System Model)) and performs atmospheric computations. To analyze this, we present an overall execution structure diagram of SCAM and run tests in the corresponding execution environment. Compared to the existing source code, the SCOPY function achieved a 0.4053% performance improvement, the DSCAL function 0.7812%, and the DDOT function 0.0469%, and applying all three together yielded a 0.8537% performance improvement. This means that the LAPACK BLAS application method proposed in this paper, which uses a library for dense linear algebra operations, can improve performance in the same CPU environment without additional hardware.
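
The three Level-1 BLAS routines named in the abstract have simple semantics. The following is a minimal sketch in plain Python of what each one computes, for illustration only: the function names `blas_copy`, `blas_scal`, and `blas_dot` are hypothetical stand-ins, only positive strides are handled, and the SCAM code itself would call the Fortran routines SCOPY, DSCAL, and DDOT instead of hand-written loops.

```python
def blas_copy(n, x, incx, y, incy):
    """What xCOPY computes: y[i*incy] = x[i*incx] for i in 0..n-1."""
    for i in range(n):
        y[i * incy] = x[i * incx]
    return y


def blas_scal(n, alpha, x, incx):
    """What xSCAL computes: x[i*incx] = alpha * x[i*incx] (in place)."""
    for i in range(n):
        x[i * incx] *= alpha
    return x


def blas_dot(n, x, incx, y, incy):
    """What xDOT computes: the sum of x[i*incx] * y[i*incy]."""
    return sum(x[i * incx] * y[i * incy] for i in range(n))


x = [1.0, 2.0, 3.0]
y = [0.0, 0.0, 0.0]
blas_copy(3, x, 1, y, 1)      # y becomes a copy of x
blas_scal(3, 2.0, y, 1)       # y is doubled in place
result = blas_dot(3, x, 1, y, 1)
```

Replacing an explicit loop with the corresponding library call leaves the result unchanged; any speedup comes from the optimized implementation behind the call.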

Optimized Implementation of Lightweight Block Cipher SIMECK and SIMON Counter Operation Mode on 32-Bit RISC-V Processors (32-bit RISC-V 프로세서 상에서의 경량 블록 암호 SIMECK, SIMON 카운터 운용 모드 최적 구현)

  • Min-Joo Sim;Hyeok-Dong Kwon;Yu-Jin Oh;Min-Ho Song;Hwa-Jeong Seo
    • Journal of the Korea Institute of Information Security & Cryptology, v.33 no.2, pp.165-173, 2023
  • In this paper, we propose an optimal implementation of lightweight block ciphers, SIMECK and SIMON counter operation mode, on a 32-bit RISC-V processor. Utilizing the characteristics of the CTR operating mode, we propose round function optimization that precomputes some values, single plaintext optimization and two plaintext parallel optimization. Since there are no previous research results on SIMECK and SIMON on RISC-V, we compared the performance of implementations with and without precomputation techniques for single plaintext optimization and two plaintext parallel optimization implementations. As a result, the implementations to which the precomputation technique was applied showed a performance improvement of 1% compared to the implementations to which precomputation was not applied.
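
The precomputation idea can be illustrated with a small sketch. In CTR mode the nonce part of the input block is the same for every block, so any round computation that depends only on the nonce can be done once. The code below is a toy under stated assumptions, not the paper's RISC-V implementation: it uses 16-bit words, places the nonce in the left word and the counter in the right word, and uses placeholder round keys (`KEYS` is hypothetical and is not the SIMON key schedule). Only the round function `f` follows the published SIMON definition.

```python
MASK = 0xFFFF  # 16-bit words, chosen only to keep the toy small


def rol(x, r):
    """Rotate a 16-bit word left by r bits."""
    return ((x << r) | (x >> (16 - r))) & MASK


def f(x):
    """SIMON round function: (x <<< 1 & x <<< 8) ^ (x <<< 2)."""
    return (rol(x, 1) & rol(x, 8)) ^ rol(x, 2)


def encrypt(x, y, keys):
    """Plain Feistel iteration: (x, y) -> (y ^ f(x) ^ k, x) per round."""
    for k in keys:
        x, y = y ^ f(x) ^ k, x
    return x, y


def precompute(nonce, keys):
    """Values that depend only on the nonce and the first two round keys."""
    c1 = f(nonce) ^ keys[0]  # all of round 1 except the counter XOR
    c2 = nonce ^ keys[1]     # the nonce-dependent part of round 2
    return c1, c2


def encrypt_ctr_block(counter, pre, keys):
    """Encrypt (nonce, counter) using the precomputed values."""
    c1, c2 = pre
    x1 = counter ^ c1        # state after round 1 is (x1, nonce)
    x2 = c2 ^ f(x1)          # state after round 2 is (x2, x1)
    return encrypt(x2, x1, keys[2:])


# Placeholder round keys for illustration only (hypothetical values).
KEYS = [(i * 0x9E37 + 0x1234) & MASK for i in range(32)]
NONCE = 0xBEEF
PRE = precompute(NONCE, KEYS)
```

Per block, the precomputed path saves one evaluation of `f` and two key XORs, which is a small fraction of the total round work.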

Improvement in Inefficient Repetition of Gauss Sieve (Gauss Sieve 반복 동작에서의 비효율성 개선)

  • Byeongho Cheon;Changwon Lee;Chanho Jeon;Seokhie Hong;Suhri Kim
    • Journal of the Korea Institute of Information Security & Cryptology, v.33 no.2, pp.223-233, 2023
  • Gauss Sieve is an algorithm for solving SVP and requires exponential time and space complexity. The termination condition of the Sieve is determined by the size of the constructed list and the number of collisions, both of which relate to space complexity. The term 'collision' refers to the case in which a sampled vector is reduced to a vector that is already in the list. If collisions occur more than a certain number of times, the algorithm terminates. When executing previous algorithms, we noticed that unnecessary operations continued even after the shortest vector was found. This means that the existing termination condition is set larger than necessary. In this paper, after identifying the point where unnecessary operations are repeated, we optimize the number of operations required. The tests are conducted by adjusting the collision threshold that serves as the termination condition and the distribution from which the sample vectors are generated. According to the experiments, the operation that occupies the largest proportion decreased by 62.6%. The space and time complexity also decreased by 4.3% and 1.6%, respectively.
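
The role of the collision threshold can be seen in a compact sketch of a Gauss Sieve loop. This is a simplified illustration under assumptions, not the paper's implementation: `sample` is a hypothetical sampler that draws small random integer combinations of the basis, a collision is counted whenever a vector reduces to zero against the list, and `max_collisions` (assumed to be at least 1) is the termination threshold.

```python
import random


def norm2(v):
    """Squared Euclidean norm."""
    return sum(c * c for c in v)


def reduce_by(v, w):
    """Subtract the nearest-integer multiple of w from v if that shortens v."""
    dot = sum(a * b for a, b in zip(v, w))
    ww = norm2(w)
    if ww and 2 * abs(dot) > ww:
        q = round(dot / ww)
        return tuple(a - q * b for a, b in zip(v, w)), True
    return v, False


def sample(basis, rng, bound=5):
    """Hypothetical sampler: a nonzero small integer combination of the basis."""
    dim = len(basis[0])
    while True:
        coeffs = [rng.randint(-bound, bound) for _ in basis]
        v = tuple(sum(c * b[i] for c, b in zip(coeffs, basis)) for i in range(dim))
        if any(v):
            return v


def gauss_sieve(basis, max_collisions, rng):
    lst, stack, collisions, iterations = [], [], 0, 0
    while collisions < max_collisions:
        iterations += 1
        p = stack.pop() if stack else sample(basis, rng)
        changed = True
        while changed:  # reduce p against shorter list vectors
            changed = False
            for v in lst:
                if norm2(v) <= norm2(p):
                    p, ch = reduce_by(p, v)
                    changed = changed or ch
        if not any(p):  # p collapsed to zero: count a collision
            collisions += 1
            continue
        for v in list(lst):  # reduce longer list vectors against p
            if norm2(v) > norm2(p):
                v2, ch = reduce_by(v, p)
                if ch:
                    lst.remove(v)
                    stack.append(v2)
        lst.append(p)
    return min(lst, key=norm2), lst, iterations


best, lst, iterations = gauss_sieve([(5, 3), (3, 2)], 10, random.Random(1))
```

In this sketch the loop keeps sampling until the threshold is reached, whether or not the shortest vector is already in the list, which is the kind of extra work the abstract describes.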

Time-optimized Color Conversion based on Multi-mode Chrominance Reconstruction and Operation Rearrangement for JPEG Image Decoding (JPEG 영상 복원을 위한 다중 모드 채도 복원과 연산 재배열 기반의 시간 최적화된 컬러 변환)

  • Kim, Young-Ju
    • Journal of the Korea Society of Computer and Information, v.14 no.1, pp.135-143, 2009
  • The growing need to encode and decode high-resolution images on mobile devices calls for an efficient implementation of the image codec. This paper proposes a time-optimized color conversion method for the JPEG decoder. The method reduces the number of calculations in the color conversion by rearranging arithmetic operations, which is possible because the IDCT and the color conversion matrices are linear, and lowers the time complexity of the color conversion itself through an integer mapping that replaces floating-point operations with optimal fixed-point shift and addition operations, ultimately reducing the time complexity of the JPEG decoder. The proposed method also compensates for the loss of image quality caused by the quantization error of the operation rearrangement and the integer mapping by using multi-mode chrominance reconstruction. A performance evaluation on an embedded-system development platform showed that, compared to previous color conversion methods, the proposed method greatly reduces image decoding time while minimizing distortion of the decoded images.
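
The integer-mapping step can be sketched as follows. This is a generic fixed-point version of the JFIF YCbCr-to-RGB conversion, assuming a 16-bit scale factor and integer multiply followed by a right shift; it does not reproduce the paper's specific shift-and-add decomposition, operation rearrangement, or multi-mode chrominance reconstruction. The constant names are hypothetical.

```python
SHIFT = 16
HALF = 1 << (SHIFT - 1)  # added before shifting so the result is rounded

# JFIF conversion coefficients scaled to integers (assumed 16-bit scale).
C_R_CR = int(round(1.402 * (1 << SHIFT)))
C_G_CB = int(round(0.344136 * (1 << SHIFT)))
C_G_CR = int(round(0.714136 * (1 << SHIFT)))
C_B_CB = int(round(1.772 * (1 << SHIFT)))


def clamp(v):
    """Clamp to the 8-bit sample range."""
    return 0 if v < 0 else 255 if v > 255 else v


def ycbcr_to_rgb_fixed(y, cb, cr):
    """Integer-only conversion: multiply by scaled constants, then shift."""
    cb -= 128
    cr -= 128
    r = y + ((C_R_CR * cr + HALF) >> SHIFT)
    g = y + ((-C_G_CB * cb - C_G_CR * cr + HALF) >> SHIFT)
    b = y + ((C_B_CB * cb + HALF) >> SHIFT)
    return clamp(r), clamp(g), clamp(b)


def ycbcr_to_rgb_float(y, cb, cr):
    """Floating-point reference used only to check the fixed-point version."""
    cb -= 128
    cr -= 128
    r = y + 1.402 * cr
    g = y - 0.344136 * cb - 0.714136 * cr
    b = y + 1.772 * cb
    return clamp(int(round(r))), clamp(int(round(g))), clamp(int(round(b)))
```

With a 16-bit scale, the fixed-point result should stay within one intensity level of the floating-point reference for 8-bit inputs.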

An Optimized Hardware Design for High Performance Residual Data Decoder (고성능 잔여 데이터 복호기를 위한 최적화된 하드웨어 설계)

  • Jung, Hong-Kyun;Ryoo, Kwang-Ki
    • Journal of the Korea Academia-Industrial cooperation Society, v.13 no.11, pp.5389-5396, 2012
  • In this paper, an optimized residual data decoder architecture is proposed to improve performance in H.264/AVC. The proposed architecture integrates a parallel inverse transform architecture and a parallel inverse quantization architecture whose common operation units apply new inverse quantization equations. Because the equations contain no division operation, they reduce the execution time and the number of operations of the inverse quantization process. The common operation unit implements the equations with a multiplier and a left shifter. The inverse quantization architecture, with four common operation units, reduces the execution cycle of inverse quantization to one cycle. The inverse transform architecture consists of eight inverse transform operation units and likewise reduces the execution cycle of the inverse transform to one cycle. Because the inverse quantization and inverse transform operations run concurrently, the inverse transform and inverse quantization of one $4 \times 4$ block take one cycle. The proposed architecture is synthesized using Magnachip 0.18 µm CMOS technology. The gate count and the critical path delay of the architecture are 21.9k and 5.5 ns, respectively. The architecture achieves a throughput of 2.89 Gpixels/sec at the maximum clock frequency of 181 MHz. Measured with data extracted from JM 9.4, the execution cycle count of the proposed architecture is about 88.5% lower than that of existing designs.
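
The abstract does not give the new equations, so the following is only a sketch of the standard H.264 rescaling of a 4×4 residual block, arranged so that each coefficient needs one multiplication and one left shift. As an assumption for illustration, `QP // 6` and `QP % 6` are read from lookup tables built once (`QP_DIV6` and `QP_MOD6`, both hypothetical names), so no division occurs on the per-coefficient path.

```python
# Standard H.264 rescaling factors, indexed by QP % 6 and position class.
V = [
    (10, 16, 13),
    (11, 18, 14),
    (13, 20, 16),
    (14, 23, 18),
    (16, 25, 20),
    (18, 29, 23),
]

# Lookup tables built once for QP in 0..51 (assumed way to avoid division).
QP_DIV6 = [qp // 6 for qp in range(52)]
QP_MOD6 = [qp % 6 for qp in range(52)]


def position_class(i, j):
    """0 for (even, even) positions, 1 for (odd, odd), 2 otherwise."""
    if (i & 1) == 0 and (j & 1) == 0:
        return 0
    if (i & 1) == 1 and (j & 1) == 1:
        return 1
    return 2


def dequantize_4x4(z, qp):
    """Rescale quantized coefficients: one multiply and one left shift each."""
    shift = QP_DIV6[qp]
    factors = V[QP_MOD6[qp]]
    return [
        [(z[i][j] * factors[position_class(i, j)]) << shift for j in range(4)]
        for i in range(4)
    ]


ones = [[1] * 4 for _ in range(4)]
rescaled = dequantize_4x4(ones, 6)  # QP 6 doubles the QP 0 factors
```

The multiply-then-shift shape matches the multiplier and left shifter that the abstract attributes to the common operation unit, although the hardware equations themselves may differ.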

MPEG-H 3D Audio Decoder Structure and Complexity Analysis (MPEG-H 3D 오디오 표준 복호화기 구조 및 연산량 분석)

  • Moon, Hyeongi;Park, Young-cheol;Lee, Yong Ju;Whang, Young-soo
    • The Journal of Korean Institute of Communications and Information Sciences, v.42 no.2, pp.432-443, 2017
  • The primary goal of the MPEG-H 3D Audio standard is to provide immersive audio environments for high-resolution broadcasting services such as UHDTV. This standard incorporates a wide range of technologies, such as encoding/decoding technology for multi-channel, object-based, and scene-based signals, rendering technology for providing 3D audio in various playback environments, and post-processing technology. The reference software decoder of this standard combines several modules and can operate in various modes. Because each module is an independent executable file and the modules are executed sequentially, real-time decoding is impossible. In this paper, we build DLL libraries of the core decoder, format converter, object renderer, and binaural renderer of the standard and integrate them to enable frame-based decoding. In addition, by measuring the computational complexity of each mode of the MPEG-H 3D Audio decoder, this paper provides a reference for selecting the appropriate decoding mode for various hardware platforms. According to the measurements, the low complexity profiles included in the Korean broadcasting standard have a computational complexity of 2.8 to 12.4 times that of the QMF synthesis operation when rendering to channel signals, and 4.1 to 15.3 times that of the QMF synthesis operation when rendering to binaural signals.

Fast Hologram Generating of 3D Object with Super Multi-Light Source using Parallel Distributed Computing (병렬 분산 컴퓨팅을 이용한 초다광원 3차원 물체의 홀로그램 고속 생성)

  • Song, Joongseok;Kim, Changseob;Park, Jong-Il
    • Journal of Broadcast Engineering, v.20 no.5, pp.706-717, 2015
  • The computer generated hologram (CGH) method is a technology that can generate a hologram using only a commonly available personal computer (PC). However, the CGH method requires a huge amount of computation time for a 3D object with a super multi-light source or for a high-definition hologram. Hence, solutions are needed that reduce the computational complexity of the CGH algorithm or increase the computing performance of the hardware. In this paper, we propose a method that generates a digital hologram of a 3D object with a super multi-light source using parallel distributed computing. Traditional methods are limited in how far they can improve CGH performance because they use a single PC. In the proposed method, a server PC efficiently uses the computing power of client PCs and can therefore quickly compute the CGH for a 3D object with a super multi-light source. In the experiments, we verified that the proposed method can generate a digital hologram with a resolution of 1,536×1,536 for a 3D object with 157,771 light sources in 121 ms. In addition, we verified that the proposed method reduces the generation time of a digital hologram in proportion to the number of client PCs.
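
Because a point-source hologram is a sum of independent contributions, the work divides naturally among workers. The sketch below is an illustration under assumptions, not the paper's server/client system: it splits the light sources into chunks, computes a partial hologram per chunk in a thread pool standing in for client PCs, and sums the partial results. The wavelength, pixel pitch, and the choice to partition by light source (and not by hologram region) are all assumed.

```python
import math
from concurrent.futures import ThreadPoolExecutor

WAVELENGTH = 633e-9  # assumed wavelength in meters
PITCH = 10e-6        # assumed hologram pixel pitch in meters


def partial_hologram(points, width, height):
    """Sum amp * cos(k * r) over the given light sources for every pixel."""
    k = 2 * math.pi / WAVELENGTH
    holo = [[0.0] * width for _ in range(height)]
    for px, py, pz, amp in points:
        for row in range(height):
            y = row * PITCH
            for col in range(width):
                x = col * PITCH
                r = math.sqrt((x - px) ** 2 + (y - py) ** 2 + pz ** 2)
                holo[row][col] += amp * math.cos(k * r)
    return holo


def distributed_hologram(points, width, height, workers):
    """Split the light sources among workers and add the partial holograms."""
    chunks = [points[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda c: partial_hologram(c, width, height), chunks))
    return [
        [sum(part[r][c] for part in parts) for c in range(width)]
        for r in range(height)
    ]


# Small hypothetical scene: (x, y, z, amplitude) per light source.
POINTS = [(i * 1e-5, (7 - i) * 1e-5, 0.1 + i * 1e-3, 1.0) for i in range(8)]
```

Each worker's cost is proportional to the number of light sources it receives, which is consistent with the abstract's claim that generation time falls in proportion to the number of client PCs.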

Selectivity Estimation for Spatio-Temporal a Overlap Join (시공간 겹침 조인 연산을 위한 선택도 추정 기법)

  • Lee, Myoung-Sul;Lee, Jong-Yun
    • Journal of KIISE:Databases, v.35 no.1, pp.54-66, 2008
  • A spatio-temporal join is an expensive operation that is commonly used in spatio-temporal database systems. To generate an efficient query plan for queries involving spatio-temporal join operations, it is crucial to estimate the selectivity of the join operations accurately. Given two datasets $S_1$ and $S_2$ of discrete data and a timestamp $t_q$, a spatio-temporal join retrieves all pairs of objects that intersect each other at $t_q$. The selectivity of the join operation equals the number of retrieved pairs divided by the cardinality of the Cartesian product $S_1 \times S_2$. In this paper, we propose a spatio-temporal histogram that estimates the selectivity of a spatio-temporal join by extending the existing geometric histogram. Using a wide spectrum of both uniform and skewed datasets, we show that the proposed method, called Spatio-Temporal Histogram, can accurately estimate the selectivity of a spatio-temporal join. Our contributions can be summarized as follows. First, this is the first attempt at selectivity estimation of spatio-temporal joins for discrete data. Second, we propose an efficient maintenance method that reconstructs histograms by compressing spatial statistical information during the lifespan of discrete data.
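
The selectivity definition above can be made concrete with a brute-force computation of the exact value, which is the quantity a histogram would estimate. This is a minimal sketch under assumptions: each object is a hypothetical tuple `(xmin, ymin, xmax, ymax, t_start, t_end)` holding a bounding rectangle and a validity interval, and an object takes part in the join only if its interval contains $t_q$.

```python
def rectangles_intersect(a, b):
    """True if two (xmin, ymin, xmax, ymax, ...) rectangles overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def alive_at(obj, t_q):
    """True if the object's validity interval contains the timestamp."""
    return obj[4] <= t_q <= obj[5]


def join_selectivity(s1, s2, t_q):
    """Exact selectivity: intersecting pairs at t_q divided by |S1 x S2|."""
    if not s1 or not s2:
        return 0.0
    live1 = [o for o in s1 if alive_at(o, t_q)]
    live2 = [o for o in s2 if alive_at(o, t_q)]
    pairs = sum(1 for a in live1 for b in live2 if rectangles_intersect(a, b))
    return pairs / (len(s1) * len(s2))


S1 = [(0, 0, 2, 2, 0, 10), (5, 5, 6, 6, 0, 10)]
S2 = [(1, 1, 3, 3, 0, 10), (5.5, 5.5, 7, 7, 20, 30)]
selectivity = join_selectivity(S1, S2, 5)  # one of four pairs qualifies: 0.25
```

The brute-force version compares every live pair, so its cost grows with the product of the dataset sizes; avoiding that cost at query-planning time is the motivation for a histogram-based estimate.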

A Real-time Motion Object Detection based on Neighbor Foreground Pixel Propagation Algorithm (주변 전경 픽셀 전파 알고리즘 기반 실시간 이동 객체 검출)

  • Nguyen, Thanh Binh;Chung, Sun-Tae
    • Journal of the Institute of Electronics Engineers of Korea SP, v.47 no.1, pp.9-16, 2010
  • Moving object detection detects foreground objects that differ from the background scene in a new incoming image frame, and it is an essential step in image processing applications such as intelligent visual surveillance, HCI, and object-based video compression. Most previous object detection algorithms are still computationally heavy, which makes it difficult to use them to build real-time multi-channel moving object detection on a workstation, or even one-channel real-time moving object detection on an embedded system. The foreground mask correction needed for more precise object detection is usually accomplished using morphological operations such as opening and closing. Morphological operations are not computationally cheap and, moreover, are difficult to run simultaneously with the subsequent connected component labeling routine, since they need a quite different type of processing from what connected component labeling does. In this paper, we first devise a fast and precise foreground mask correction algorithm, "Neighbor Foreground Pixel Propagation (NFPP)", which uses the neighbor pixel checking employed in connected component labeling. Next, we propose a novel moving object detection method based on NFPP, in which the connected component labeling routine can be executed simultaneously with the foreground mask correction. Experiments verify that the proposed moving object detection method detects objects more precisely and processes the image frames and videos used in the experiments more than 4 times faster than the previous moving object detection method using morphological operations.