• Title/Summary/Keyword: CPU 시간

Search Result 518, Processing Time 0.03 seconds

A Study on AES Extension for Large-Scale Data (대형 자료를 위한 AES 확장에 관한 연구)

  • Oh, Ju-Young;Kouh, Hoon-Joon
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.9 no.6
    • /
    • pp.63-68
    • /
    • 2009
  • In the whole information technology area, the protection of information from hacking or tapping becomes a very serious issue. Therefore, the more effective, convenient and secure methods are required to make the safe operation. Encryption algorithms are known to be computationally intensive. They consume a significant amount of computing resources such as CPU time and memory. In this paper we propose the scalable encryption scheme with four criteria, the compression of plaintext, variable size of block, selectable round and software optimization. We have tested our scheme by c++. Experimental results show that our scheme achieves the faster execution speed of encryption/decryption.

  • PDF

Parallel Computing of Large Scale FE Model based on Explicit Lagrangian FEM (외연 Lagrangian 유한요소법 기반의 대규모 유한요소 모델 병렬처리)

  • 백승훈;김승조;이민형
    • Journal of the Korean Society for Aeronautical & Space Sciences
    • /
    • v.34 no.8
    • /
    • pp.33-40
    • /
    • 2006
  • A parallel computing strategy for finite element(FE) processing is described and implemented in nonlinear explicit FE code and its parallel performances are evaluated. A self-made linux-cluster supercomputer with 520 CPUs is used as a bench mark test bed. It is observed that speed-up is increased almost idealy even up to 256 CPUs for a large scale model. A communication over head and its effect on the parallel performance is also examined. Parallel performance is compare with the commercial code and developed code shows superior performance as the number of CPUs used are increased.

Novel Motion Estimation Scheme to Integer Pixel with a Search Box based on SIMD for Fast HEVC encoding (HEVC 고속 부호화를 위한 SIMD 기반 Search Box 기법의 정수 화소 단위 움직임 추정 방법)

  • Seok, Jinwuk;Kim, Younhee;Ki, MyungSeok;Kim, Hui Yong
    • Proceedings of the Korean Society of Broadcast Engineers Conference
    • /
    • 2015.11a
    • /
    • pp.203-206
    • /
    • 2015
  • 본 논문은 4K UHD 입력 영상에 대한 HEVC 고속 부호화를 위하여 대부분의 상용 CPU 및 AP 에서 사용되고 있는 SIMD (Single Instruction Mutiple Data) 명령어를 사용한 고속의 정수 화소 단위 움직임 추정 방법에 대한 연구이다. 특히, IT 기기에서의 고속 동영상 부호화를 위해 기존의 SIMD 명령어를 개량하여 동일한 CPU 실행시간에 다수의 움직임 추정을 수행할 수 있는 SIMD 명령어를 사용하여 보다 같은 실행시간에 보다 넓은 영역에 대한 움직임 벡터 탐색을 수행할 수 있도록 Search Box 기법을 새로이 개발하고 이를 토대로 기존 HEVC 에서 사용되고 있는 움직임 추정 방법에 대하여 연산시간을 줄이는 동시에 화질 열화를 최소화 시킬 수 있는 방법에 대하여 논한다.

  • PDF

Efficient Task Distribution for Pig Monitoring Applications Using OpenCL (OpenCL을 이용한 돈사 감시 응용의 효율적인 태스크 분배)

  • Kim, Jinseong;Choi, Younchang;Kim, Jaehak;Chung, Yeonwoo;Chung, Yongwha;Park, Daihee;Kim, Hakjae
    • KIPS Transactions on Computer and Communication Systems
    • /
    • v.6 no.10
    • /
    • pp.407-414
    • /
    • 2017
  • Pig monitoring applications consisting of many tasks can take advantage of inherent data parallelism and enable parallel processing using performance accelerators. In this paper, we propose a task distribution method for pig monitoring applications into a heterogenous computing platform consisting of a multicore-CPU and a manycore-GPU. That is, a parallel program written in OpenCL is developed, and then the most suitable processor is determined based on the measured execution time of each task. The proposed method is simple but very effective, and can be applied to parallelize other applications consisting of many tasks on a heterogeneous computing platform consisting of a CPU and a GPU. Experimental results show that the performance of the proposed task distribution method on three different heterogeneous computing platforms can improve the performance of the typical GPU-only method where every tasks are executed on a deviceGPU by a factor of 1.5, 8.7 and 2.7, respectively.

Three-dimensional Wave Propagation Modeling using OpenACC and GPU (OpenACC와 GPU를 이용한 3차원 파동 전파 모델링)

  • Kim, Ahreum;Lee, Jongwoo;Ha, Wansoo
    • Geophysics and Geophysical Exploration
    • /
    • v.20 no.2
    • /
    • pp.72-77
    • /
    • 2017
  • We calculated 3D frequency- and Laplace-domain wavefields using time-domain modeling and Fourier transform or Laplace transform. We adopted OpenACC and GPU for an efficient parallel calculation. The OpenACC makes it easy to use GPU accelerators by adding directives in conventional C, C++, and Fortran programming languages. Accordingly, one doesn't have to learn new GPGPU programming languages such as CUDA or OpenCL to use GPU. An OpenACC program allocates GPU memory, transfers data between the host CPU and GPU devices and performs GPU operations automatically or following user-defined directives. We compared performance of 3D wave propagation modeling programs using OpenACC and GPU to that using single-core CPU through numerical tests. Results using a homogeneous model and the SEG/EAGE salt model show that the OpenACC programs are approximately 53 and 30 times faster than those using single-core CPU.

Design and Implementation of Real-Time Parallel Engine for Discrete Event Wargame Simulation (이산사건 워게임 시뮬레이션을 위한 실시간 병렬 엔진의 설계 및 구현)

  • Kim, Jin-Soo;Kim, Dae-Seog;Kim, Jung-Guk;Ryu, Keun-Ho
    • The KIPS Transactions:PartA
    • /
    • v.10A no.2
    • /
    • pp.111-122
    • /
    • 2003
  • Military wargame simulation models must support the HLA in order to facilitate interoperability with other simulations, and using parallel simulation engines offer efficiency in reducing system overhead generated by propelling interoperability. However, legacy military simulation model engines process events using sequential event-driven method. This is due to problems generated by parallel processing such as synchronous reference to global data domains. Additionally. using legacy simulation platforms result in insufficient utilization of multiple CPUs even if a multiple CPU system is under use. Therefore, in this paper, we propose conversing the simulation engine to an object model-based parallel simulation engine to ensure military wargame model's improved system processing capability, synchronous reference to global data domains, external simulation time processing, and the sequence of parallel-processed events during a crash recovery. The converted parallel simulation engine is designed and implemented to enable parallel execution on a multiple CPU system (SMP).

Analysis of Tensor Processing Unit and Simulation Using Python (텐서 처리부의 분석 및 파이썬을 이용한 모의실행)

  • Lee, Jongbok
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.19 no.3
    • /
    • pp.165-171
    • /
    • 2019
  • The study of the computer architecture has shown that major improvements in price-to-energy performance stems from domain-specific hardware development. This paper analyzes the tensor processing unit (TPU) ASIC which can accelerate the reasoning of the artificial neural network (NN). The core device of the TPU is a MAC matrix multiplier capable of high-speed operation and software-managed on-chip memory. The execution model of the TPU can meet the reaction time requirements of the artificial neural network better than the existing CPU and the GPU execution models, with the small area and the low power consumption even though it has many MAC and large memory. Utilizing the TPU for the tensor flow benchmark framework, it can achieve higher performance and better power efficiency than the CPU or CPU. In this paper, we analyze TPU, simulate the Python modeled OpenTPU, and synthesize the matrix multiplication unit, which is the key hardware.

Parallel Algorithm of Conjugate Gradient Solver using OpenGL Compute Shader

  • Va, Hongly;Lee, Do-keyong;Hong, Min
    • Journal of the Korea Society of Computer and Information
    • /
    • v.26 no.1
    • /
    • pp.1-9
    • /
    • 2021
  • OpenGL compute shader is a shader stage that operate differently from other shader stage and it can be used for the calculating purpose of any data in parallel. This paper proposes a GPU-based parallel algorithm for computing sparse linear systems through conjugate gradient using an iterative method, which perform calculation on OpenGL compute shader. Basically, this sparse linear solver is used to solve large linear systems such as symmetric positive definite matrix. Four well-known matrix formats (Dense, COO, ELL and CSR) have been used for matrix storage. The performance comparison from our experimental tests using eight sparse matrices shows that GPU-based linear solving system much faster than CPU-based linear solving system with the best average computing time 0.64ms in GPU-based and 15.37ms in CPU-based.

Assessment of Parallel Computing Performance of Agisoft Metashape for Orthomosaic Generation (정사모자이크 제작을 위한 Agisoft Metashape의 병렬처리 성능 평가)

  • Han, Soohee;Hong, Chang-Ki
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography
    • /
    • v.37 no.6
    • /
    • pp.427-434
    • /
    • 2019
  • In the present study, we assessed the parallel computing performance of Agisoft Metashape for orthomosaic generation, which can implement aerial triangulation, generate a three-dimensional point cloud, and make an orthomosaic based on SfM (Structure from Motion) technology. Due to the nature of SfM, most of the time is spent on Align photos, which runs as a relative orientation, and Build dense cloud, which generates a three-dimensional point cloud. Metashape can parallelize the two processes by using multi-cores of CPU (Central Processing Unit) and GPU (Graphics Processing Unit). An orthomosaic was created from large UAV (Unmanned Aerial Vehicle) images by six conditions combined by three parallel methods (CPU only, GPU only, and CPU + GPU) and two operating systems (Windows and Linux). To assess the consistency of the results of the conditions, RMSE (Root Mean Square Error) of aerial triangulation was measured using ground control points which were automatically detected on the images without human intervention. The results of orthomosaic generation from 521 UAV images of 42.2 million pixels showed that the combination of CPU and GPU showed the best performance using the present system, and Linux showed better performance than Windows in all conditions. However, the RMSE values of aerial triangulation revealed a slight difference within an error range among the combinations. Therefore, Metashape seems to leave things to be desired so that the consistency is obtained regardless of parallel methods and operating systems.

Design and Implementation of a Hardware-based Transmission/Reception Accelerator for a Hybrid TCP/IP Offload Engine (하이브리드 TCP/IP Offload Engine을 위한 하드웨어 기반 송수신 가속기의 설계 및 구현)

  • Jang, Han-Kook;Chung, Sang-Hwa;Yoo, Dae-Hyun
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.34 no.9
    • /
    • pp.459-466
    • /
    • 2007
  • TCP/IP processing imposes a heavy load on the host CPU when it is processed by the host CPU on a very high-speed network. Recently the TCP/IP Offload Engine (TOE), which processes TCP/IP on a network adapter instead of the host CPU, has become an attractive solution to reduce the load in the host CPU. There have been two approaches to implement TOE. One is the software TOE in which TCP/IP is processed by an embedded processor and the other is the hardware TOE in which TCP/IP is processed by a dedicated ASIC. The software TOE has poor performance and the hardware TOE is neither flexible nor expandable enough to add new features. In this paper we designed and implemented a hybrid TOE architecture, in which TCP/IP is processed by cooperation of hardware and software, based on an FPGA that has two embedded processor cores. The hybrid TOE can have high performance by processing time-critical operations such as making and processing data packets in hardware. The software based on the embedded Linux performs operations that are not time-critical such as connection establishment, flow control and congestions, thus the hybrid TOE can have enough flexibility and expandability. To improve the performance of the hybrid TOE, we developed a hardware-based transmission/reception accelerator that processes important operations such as creating data packets. In the experiments the hybrid TOE shows the minimum latency of about $19{\mu}s$. The CPU utilization of the hybrid TOE is below 6 % and the maximum bandwidth of the hybrid TOE is about 675 Mbps.