• Title/Summary/Keyword: multi-core CPU

Comparison of Parallel Computation Performances for 3D Wave Propagation Modeling using a Xeon Phi x200 Processor (제온 파이 x200 프로세서를 이용한 3차원 음향 파동 전파 모델링 병렬 연산 성능 비교)

  • Lee, Jongwoo;Ha, Wansoo
    • Geophysics and Geophysical Exploration
    • /
    • v.21 no.4
    • /
    • pp.213-219
    • /
    • 2018
  • In this study, we performed 3D wave propagation modeling on a Xeon Phi x200 processor and compared its parallel computation performance with that of a Xeon CPU. Unlike the first-generation Xeon Phi coprocessor, codenamed Knights Corner, the second-generation Xeon Phi x200 processor requires no additional communication between its internal memory and the main memory, since it can run an operating system directly. With its large main memory and high-bandwidth memory, the Xeon Phi x200 processor can run large-scale computations independently. To compare parallel performance, we performed the modeling using the MPI (Message Passing Interface) and OpenMP (Open Multi-Processing) libraries. Numerical examples using the SEG/EAGE salt model demonstrated that the Xeon Phi, with its large number of computational cores and high-bandwidth memory, achieves modeling 2.69 to 3.24 times faster than the 12-core CPU.

Implementation of a parallel traversal scheme for O(n!) search space exploiting cost constraint (비용 제약조건을 이용한 병렬 O(n!) 서치 스페이스 탐색 기법의 구현)

  • Lee, Junghoon
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2010.11a
    • /
    • pp.1501-1502
    • /
    • 2010
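  • With the spread of dual-core and multi-core platforms, applications with high time complexity can now run on users' computers or handheld devices and provide a variety of services. In this paper, we implement a search-space traversal algorithm based on two threads for the multi-destination visiting problem, in order to determine tour schedules efficiently. The problem is a variant of the Traveling Salesman Problem with O(n!) time complexity, and because the searches are independent, each thread can explore the space for the optimal schedule in parallel. In addition, when the cost of a partial path already exceeds the best value found so far, the search below that path is pruned, which yields a substantial performance improvement. On a platform with a 2.4 GHz Intel(R) Core Duo CPU and 3 GB of memory, the implemented service finds the optimal visiting schedule for 11 destinations in 14.196 seconds with the single-threaded version, 6.411 seconds with the dual-threaded version, and 0.14 seconds with the dual-threaded version that includes the cost constraint.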
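
The cost-constrained traversal described in this abstract can be illustrated with a minimal branch-and-bound sketch. It is not the authors' implementation: the function name `solve`, the closed-tour formulation (start and end at node 0), the non-negative distance matrix, and the way first stops are divided between threads are all assumptions made for illustration. In CPython the global interpreter lock prevents threads from running this search in parallel, so the sketch shows only the work partition and the shared bound, not the reported speedup.

```python
import threading


def solve(dist, n_threads=2):
    """Branch-and-bound over all tours that start and end at node 0.

    `dist` is a square matrix of non-negative costs (an assumption;
    the pruning rule below is only valid for non-negative costs).
    Returns (best_cost, best_tour).
    """
    n = len(dist)
    best = {"cost": float("inf"), "tour": None}
    lock = threading.Lock()

    def dfs(path, visited, cost):
        # Cost constraint: abandon this subtree once the partial path
        # is already no cheaper than the best complete tour found.
        if cost >= best["cost"]:
            return
        if len(path) == n:
            total = cost + dist[path[-1]][0]  # close the tour
            with lock:
                if total < best["cost"]:
                    best["cost"], best["tour"] = total, path[:]
            return
        for nxt in range(1, n):
            if nxt not in visited:
                visited.add(nxt)
                path.append(nxt)
                dfs(path, visited, cost + dist[path[-2]][nxt])
                path.pop()
                visited.remove(nxt)

    def worker(first_stops):
        # Each thread explores the subtrees rooted at its own first stops.
        for first in first_stops:
            dfs([0, first], {0, first}, dist[0][first])

    stops = list(range(1, n))
    threads = [
        threading.Thread(target=worker, args=(stops[i::n_threads],))
        for i in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return best["cost"], best["tour"]
```

Because both threads read the same `best` record, a good tour found by one thread immediately tightens the bound for the other, which is the effect the abstract credits for the large drop in run time.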

Empirical Performance Evaluation of Tree-based Indexes on Multi-Core Processors (멀티코어 프로세서에서의 트리 기반 인덱스 성능 실험 평가)

  • Kim, Kyung-Hwa;Shim, Jun-Ho;Lee, Ig-Hoon
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2007.06c
    • /
    • pp.134-138
    • /
    • 2007
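  • The ever-widening gap between CPU speed and memory speed has created a memory-access bottleneck, and research on cache-conscious index structures has continued in order to overcome it. Moreover, as the CPU trend has recently reached a turning point from single-core to multi-core, the importance of cache memory efficiency has been further highlighted. This paper compares and evaluates the performance of index structures for main-memory database systems on a system equipped with a recent processor, and shows that cache-conscious tree indexes perform well.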

Parallel LDPC Decoder for CMMB on CPU and GPU Using OpenCL (OpenCL을 활용한 CPU와 GPU 에서의 CMMB LDPC 복호기 병렬화)

  • Park, Joo-Yul;Hong, Jung-Hyun;Chung, Ki-Seok
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.11 no.6
    • /
    • pp.325-334
    • /
    • 2016
  • Recently, Open Computing Language (OpenCL) has been proposed to provide a framework that supports heterogeneous computing platforms. By using an OpenCL framework, digital communication systems can support various protocols in a unified computing environment to achieve both high portability and high performance. This article introduces a parallel software decoder of Low Density Parity Check (LDPC) codes for China Multimedia Mobile Broadcasting (CMMB) on a heterogeneous platform. Each step of LDPC decoding has different parallelization characteristics. In this paper, steps suitable for task-level parallelization are executed on the CPU, and steps suitable for data-level parallelization are processed by the GPU. To improve the performance of the proposed OpenCL kernels for LDPC decoding operations, explicit thread scheduling, loop-unrolling, and effective data transfer techniques are applied. The proposed LDPC decoder achieves high performance by using heterogeneous multi-core processors on a unified computing framework.

A Comparative Study on Performance of Open Source IDS/IPS Snort and Suricata (오픈소스 IDS/IPS Snort와 Suricata의 탐지 성능에 대한 비교 연구)

  • Seok, Jinug;Choi, Moonseok;Kim, Jimyung;Park, Jonsung
    • Journal of Korea Society of Digital Industry and Information Management
    • /
    • v.12 no.1
    • /
    • pp.89-95
    • /
    • 2016
  • The recent growth of hacking threats, together with developments in software and technology, has put network security at risk. In addition, intrusions, malware, and worms have increased as a variety of sophisticated hacking methods have emerged. The goal of this study is to compare the Snort Alpha version with Suricata 2.0.11, whereas previous studies focused on comparing Snort 2.x in a single-threaded environment with Suricata in a multi-threaded environment. The experimental environment was as follows: an Intel(R) Core(TM) i5-4690 3.50 GHz CPU (4 threads), 16 GB of RAM, a 3 TB Seagate HDD, and Ubuntu 14.04. According to the results, the Snort Alpha version outperforms Suricata, but Snort Alpha had some glitches when processing pcap files, which produced core dump errors. This experiment therefore analyzes which performs better: the Snort Alpha version, which supports multiple packet-processing threads, or Suricata, which supports multi-threading. Based on this experiment, better performance can be expected from the beta and official releases of Snort in the future.

Implementation of Adaptive Multi Rate (AMR) Vocoder for the Asynchronous IMT-2000 Mobile ASIC (IMT-2000 비동기식 단말기용 ASIC을 위한 적응형 다중 비트율 (AMR) 보코더의 구현)

  • 변경진;최민석;한민수;김경수
    • The Journal of the Acoustical Society of Korea
    • /
    • v.20 no.1
    • /
    • pp.56-61
    • /
    • 2001
  • This paper presents the real-time implementation of an AMR (Adaptive Multi Rate) vocoder included in the asynchronous International Mobile Telecommunication (IMT)-2000 mobile ASIC. The implemented AMR vocoder is a multi-rate coder with 8 modes operating at bit rates from 12.2 kbps down to 4.75 kbps. In addition to the encoder and decoder, which are the basic functions of the vocoder, VAD (Voice Activity Detection), SCR (Source Controlled Rate) operation, and frame structuring blocks for the system interface are also implemented. The DSP used for the AMR vocoder is a 16-bit fixed-point DSP based on the TeakLite core, and it consists of a memory block, a serial interface block, register files for the parallel interface with the CPU, and interrupt control logic. Through the implementation, we reduce the maximum operating complexity to 24 MIPS by managing the memory structure efficiently. The AMR vocoder is verified against all the test vectors provided by 3GPP, and stable operation on a real-time test board is also demonstrated.

Parallel LDPC Decoding on a Heterogeneous Platform using OpenCL

  • Hong, Jung-Hyun;Park, Joo-Yul;Chung, Ki-Seok
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.10 no.6
    • /
    • pp.2648-2668
    • /
    • 2016
  • Modern mobile devices are equipped with various accelerated processing units to handle computationally intensive applications; therefore, Open Computing Language (OpenCL) has been proposed to fully take advantage of the computational power in heterogeneous systems. This article introduces a parallel software decoder of Low Density Parity Check (LDPC) codes on an embedded heterogeneous platform using an OpenCL framework. The LDPC code is one of the most popular and strongest error correcting codes for mobile communication systems. Each step of LDPC decoding has different parallelization characteristics. In the proposed LDPC decoder, steps suitable for task-level parallelization are executed on the multi-core central processing unit (CPU), and steps suitable for data-level parallelization are processed by the graphics processing unit (GPU). To improve the performance of OpenCL kernels for LDPC decoding operations, explicit thread scheduling, vectorization, and effective data transfer techniques are applied. The proposed LDPC decoder achieves high performance and high power efficiency by using heterogeneous multi-core processors on a unified computing framework.

Implementation of OpenVG Accelerator based on Multi-Core GP-GPU (멀티코어 GP-GPU 기반의 OpenVG 가속기 구현)

  • Lee, Kwang-Yeob;Park, Jong-Il;Lee, Chan-Ho
    • Journal of IKEEE
    • /
    • v.15 no.3
    • /
    • pp.248-254
    • /
    • 2011
  • Recently, the processing burden on the CPU has been growing because of graphical user interfaces, as mobile devices gain performance and adopt various graphical effects and content with 3D graphics or Flash animation. GPUs have therefore been introduced into mobile devices to support a variety of content. In this paper, an OpenVG accelerator is implemented on a multi-core GP-GPU. The accelerator is verified using the sample image provided by the Khronos Group, and all functions are processed by the instruction set alone, without dedicated hardware. The performance when processing the Tiger image was 2 frames/sec.

CALPUFF Module Acceleration with OpenMP (OpenMP를 이용한 CALPUFF 모듈 가속화)

  • Yu, Suk-Hyun;Yang, Jin-Uk;Kim, Kyung-Ho;Youn, Hee-Young;Koo, Youn-Seo;Kwon, Hee-Yong
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2011.06c
    • /
    • pp.1-4
    • /
    • 2011
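  • The core module of the odor management modeling system used by odor-emitting facilities and local governments was improved into a real-time system based on high-performance parallel processing, using the multi-core technology recently announced by Intel together with OpenMP. For the existing meteorological model CALMET and the air-quality model CALPUFF, the modeling run time grows exponentially as the number of emission sources and the number of grid cells in the modeling domain increase. By the nature of odor, the shorter the modeling run time, the more effectively the odor modeling results can be used. Therefore, to shorten the run time, we applied multi-core technology, which processes work in parallel on several CPU cores at once, and developed the existing CALPUFF into a high-performance modeling system capable of real-time modeling. Experiments showed that as the number of cores increased, the speedup followed Amdahl's law.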
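
As a reminder of what Amdahl's law predicts for the closing sentence of this abstract, the sketch below computes the ideal speedup 1 / ((1 - p) + p / N) for a parallel fraction p on N cores. The function name `amdahl_speedup` and the value p = 0.9 in the usage loop are hypothetical; the abstract does not report the parallel fraction of CALPUFF.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Ideal speedup under Amdahl's law.

    parallel_fraction: share of the run time that can be parallelized (0..1).
    cores: number of cores used (>= 1).
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Hypothetical parallel fraction, chosen only to show the trend.
    for n in (1, 2, 4, 8):
        print(n, round(amdahl_speedup(0.9, n), 2))
```

The serial fraction caps the achievable speedup at 1 / (1 - p), so adding cores gives diminishing returns even when most of the work is parallel.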

AN EFFICIENT INCOMPRESSIBLE FREE SURFACE FLOW SIMULATION USING GPU (GPU를 이용한 효율적인 비압축성 자유표면유동 해석)

  • Hong, H.E.;Ahn, H.T.;Myung, H.J.
    • Journal of computational fluids engineering
    • /
    • v.17 no.2
    • /
    • pp.35-41
    • /
    • 2012
  • This paper presents an incompressible Navier-Stokes (INS) solution algorithm for 2D free-surface flow problems on a Cartesian mesh, implemented to run on Graphics Processing Units (GPUs). The INS solver uses the variable arrangement on the Cartesian mesh and finite volume discretization along with Constrained Interpolation Profile-Conservative Semi-Lagrangian (CIP-CSL). Solving the incompressible Navier-Stokes equations for free-surface flow takes a considerable amount of computation time and memory, even on modern multi-core computing architectures based on Central Processing Units (CPUs). With recent developments in computer architecture, the scientific computing performance of GPUs now exceeds that of CPUs. This paper focuses on using the high-performance computing capability of GPUs and presents an efficient solution algorithm for free-surface flow simulation. The performance of the GPU implementation with double-precision accuracy is compared with that of the CPU code using a representative free-surface flow problem, namely the dam-break problem.