• Title/Summary/Keyword: Multi-Core Processor

Search Result 131, Processing Time 0.024 seconds

An Efficient Cache Coherence Protocol for Multi-Core Processors with Ring Interconnects (링 연결구조 기반의 멀티코어 프로세서를 위한 캐시 일관성 유지 기법)

  • Park, Jin-Young;Choi, Lynn
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.14 no.8
    • /
    • pp.768-772
    • /
    • 2008
  • Today's microprocessor normally includes several processing cores to reduce the energy consumption without losing performance. In this paper, data transfer ordering mechanism can be efficiently used for cache coherence solution in unidirectional ring interconnect. RING-DATA ORDER combines the simplicity of GREEDY-ORDER and the performance of RING-ORDER. RING-DATA ORDER can be easily applicable to multicore processor with unidirectional ring interconnect.

The study on the Efficient methodology to apply the GPU for military information system improvement (국방정보시스템 성능향상을 위한 효율적인 GPU적용방안 연구)

  • Kauh, Janghyuk;Lee, Dongho
    • Journal of Korea Society of Digital Industry and Information Management
    • /
    • v.11 no.1
    • /
    • pp.27-35
    • /
    • 2015
  • Increasing the number of GPU (Graphic Processor Unit) cores, the studies on High Performance Computing Platform using GPU have actively been made in recent. This trend has led to the development of GPGPU (General Purpose GPU) and CUDA (Compute Unified Device Architecture) Framework. In this paper, we explain the many benefits of the GPU based system, and propose the ICIDF(Identify Compute-Intensive Data set and Function) methodology to apply GPU technology to legacy military information system for performance improvement. To demonstrate the efficiency of this methodology, we applied this method to AES CPU based program obtained from the Internet web site. Simply changing the data structure made improved the performance of AES program. As a result, the performance of AES based GPU program is improved gradually up to 10 times. Depending on the developer's ability, additional performance improvement can be expected. The problem to be solved is heat issue, but this problem has been much improved by the development of the cooling technology.

Thermal Management for Multi-core Processor and Prototyping Thermal-aware Task Scheduler (멀티 코어 프로세서의 온도관리를 위한 방안 연구 및 열-인식 태스크 스케줄링)

  • Choi, Jeong-Hwan
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.35 no.7
    • /
    • pp.354-360
    • /
    • 2008
  • Power-related issues have become important considerations in current generation microprocessor design. One of these issues is that of elevated on-chip temperatures. This has an adverse effect on cooling cost and, if not addressed suitably, on chip reliability. In this paper we investigate the general trade-offs between temporal and spatial hot spot mitigation schemes and thermal time constants, workload variations and microprocessor power distributions. By leveraging spatial and temporal heat slacks, our schemes enable lowering of on-chip unit temperatures by changing the workload in a timely manner with Operating System (OS) and existing hardware support.

A Design of Memory-efficient 2k/8k FFT/IFFT Processor using R4SDF/R4SDC Hybrid Structure (R4SDF/R4SDC Hybrid 구조를 이용한 메모리 효율적인 2k/8k FFT/IFFT 프로세서 설계)

  • 신경욱
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.8 no.2
    • /
    • pp.430-439
    • /
    • 2004
  • This paper describes a design of 8192/2048-point FFT/IFFT processor (CFFT8k2k), which performs multi-carrier modulation/demodulation in OFDM-based DVB-T receiver. Since a large size FFT requires a large buffer memory, two design techniques are considered to achieve memory-efficient implementation of 8192-point FFT/IFFT. A hybrid structure, which is composed of radix-4 single-path delay feedback (R4SDF) and radix-4 single-path delay commutator (R4SDC), reduces its memory by 20% compared to R4SDC structure. In addition, a memory reduction of about 24% is achieved by a novel two-step convergent block floating-point scaling. As a result, it requires only 57% of memory used in conventional design, reducing chip area and power consumption. The CFFT8k2k core is designed in Verilog-HDL, and has about 102,000 Bates, RAM of 292k bits, and ROM of 39k bits. Using gate-level netlist with SDF which is synthesized using a $0.25-{\um}m$ CMOS library, timing simulation show that it can safely operate with 50-MHz clock at 2.5-V supply, resulting that a 8192-point FFT/IFFT can be computed every 164-${\mu}\textrm{s}$. The functionality of the core is fully verified by FPGA implementation, and the average SQNR of 60-㏈ is achieved.

Optimizing Skyline Query Processing Algorithms on CUDA Framework (CUDA 프레임워크 상에서 스카이라인 질의처리 알고리즘 최적화)

  • Min, Jun;Han, Hwan-Soo;Lee, Sang-Won
    • Journal of KIISE:Databases
    • /
    • v.37 no.5
    • /
    • pp.275-284
    • /
    • 2010
  • GPUs are stream processors based on multi-cores, which can process large data with a high speed and a large memory bandwidth. Furthermore, GPUs are less expensive than multi-core CPUs. Recently, usage of GPUs in general purpose computing has been wide spread. The CUDA architecture from Nvidia is one of efforts to help developers use GPUs in their application domains. In this paper, we propose techniques to parallelize a skyline algorithm which uses a simple nested loop structure. In order to employ the CUDA programming model, we apply our optimization techniques to make our skyline algorithm fit into the performance restrictions of the CUDA architecture. According to our experimental results, we improve the original skyline algorithm by 80% with our optimization techniques.

Efficient 3D Modeling of CSEM Data (인공송신원 전자탐사 자료의 효율적인 3차원 모델링)

  • Jeong, Yong-Hyeon;Son, Jeong-Sul;Lee, Tae-Jong
    • 한국지구물리탐사학회:학술대회논문집
    • /
    • 2009.10a
    • /
    • pp.75-80
    • /
    • 2009
  • Despite its flexibility to complex geometry, three-dimensional (3D) electromagnetic(EM) modeling schemes using finite element method (FEM) have been faced to practical limitation due to the resulting large system of equations to be solved. An efficient 3D FEM modeling scheme has been developed, which can adopt either direct or iterative solver depending on the problems. The direct solver PARDISO can reduce the computing time remarkably by incorporating parallel computing on multi-core processor systems, which is appropriate for single frequency multi-source configurations. When limited memory, the iterative solver BiCGSTAB(1) can provide fast and stable convergence. Efficient 3D simulations can be performed by choosing an optimum solver depending on the computing environment and the problems to be solved. This modeling includes various types of controlled-sources and can be exploited as an efficient engine for 3D inversion.

  • PDF

Synchronized Sampling Structure applied HW/SW platform for LAN-based Digital Substation Protection (LAN 기반 디지털 변전소 보호를 위한 동기 샘플링 구조적용 HW/SW 플랫폼 기술)

  • Son, Kyou Jung;Nam, Kyung-Deok;An, Gi Sung;Chang, Tae Gyu
    • Journal of IKEEE
    • /
    • v.24 no.1
    • /
    • pp.178-185
    • /
    • 2020
  • This paper proposes precise time synchronization-based synchronized sampling structure applied HW/SW platform for LAN-based protection of future digital substations. The integrated software of the proposed platform includes IEC 61850 protocol, IEEE 1588 precision time protocol and synchronized sampling structure. The proposed platform expected to provide a basis of an application of future distributed sensing data-based protection and control methods by providing synchronized measurement among IEDs. The implementation of the proposed HW/SW platform technique was performed using TMDXIDK572 multi-core/multi-processor evaluation module and its time synchronization performance and synchronized sampling function were confirmed through the performance tests.

Thermal Pattern Comparison between 2D Multicore Processors and 3D Multicore Processors (2차원 구조와 3차원 구조에 따른 멀티코어 프로세서의 온도 분석)

  • Choi, Hong-Jun;Ahn, Jin-Woo;Jang, Hyung-Beom;Kim, Jong-Myon;Kim, Cheol-Hong
    • Journal of the Korea Society of Computer and Information
    • /
    • v.16 no.9
    • /
    • pp.1-10
    • /
    • 2011
  • Unfortunately, in current microprocessors, increasing the frequency causes increased power consumption and reduced reliability whereas it improves the performance. To overcome the power and thermal problems in the singlecore processors, multicore processors has been widely used. For 2D multicore processors, interconnection is regarded as one of the major constraints in performance and power efficiency. To reduce the performance degradation and the power consumption in 2D multicore processors, 3D integrated design technique has been studied by many researchers. Compared to 2D multicore processors, 3D multicore processors get the benefits of performance improvement and reduced power consumption by reducing the wire length significantly. However, 3D multicore processors have serious thermal problems due to high power density, resulting in reliability degradation. Detailed thermal analysis for multicore processors can be useful in designing thermal-aware processors. In this paper, we analyze the impact of workload distribution, distance to the heat sink, and number of stacked dies on the processor temperature. We also analyze the effects of the temperature on overall system performance. Especially, this paper presents the guideline for thermal-aware multicore processor design by analyzing the thermal problems in 2D multicore processors and 3D multicore processors.

Real-Time Kernel for Linux based on ARM Processor, RTiKA (Real-Time Implant Kernel For ARMLinux) (ARM 프로세서 기반의 리눅스를 위한 실시간 확장 커널 (RTiKA, Real-Time implant Kernel for ARMLinux))

  • Lee, Seung-Yul;Lee, Sang-Gil;Lee, Cheol-Hoon
    • The Journal of the Korea Contents Association
    • /
    • v.17 no.10
    • /
    • pp.587-597
    • /
    • 2017
  • Recently, the demand for real-time performance in mobile environment is increasing due to the improvement of hardware performance, however a GPOS(General-Purpose Operating System) such as Android and Linux do not provide real-time performance. We developed RTiK(Real-Time implant Kernel) for this problem, but it has the disadvantage of supporting only x86 Architecture. In this paper, we designed and implemented a RTiKA(Real-Time implanted Kernel for ARM) to support real-time in ARM Linux. We used MCT(Multi-Core Timer) timer which replaces Local APIC Timer for real-time support, and we measured the period of generated real-time task for performance verification and evaluation. As the recent the RTiKA can guarantee the operating of several real-time tasks based on the cycle of 1ms.

Computing Performance Comparison of CPU and GPU Parallelization for Virtual Heart Simulation (가상 심장 시뮬레이션에서 CPU와 GPU 병렬처리의 계산 성능 비교)

  • Kim, Sang Hee;Jeong, Da Un;Setianto, Febrian;Lim, Ki Moo
    • Journal of Biomedical Engineering Research
    • /
    • v.41 no.3
    • /
    • pp.128-137
    • /
    • 2020
  • Cardiac electrophysiology studies often use simulation to predict how cardiac will behave under various conditions. To observe the cardiac tissue movement, it needs to use the high--resolution heart mesh with a sophisticated and large number of nodes. The higher resolution mesh is, the more computation time is needed. To improve computation speed and performance, parallel processing using multi-core processes and network computing resources is performed. In this study, we compared the computational speeds of CPU parallelization and GPU parallelization in virtual heart simulation for efficiently calculating a series of ordinary differential equations (ODE) and partial differential equations (PDE) and determined the optimal CPU and GPU parallelization architecture. We used 2D tissue model and 3D ventricular model to compared the computation performance. Then, we measured the time required to the calculation of ODEs and PDEs, respectively. In conclusion, for the most efficient computation, using GPU parallelization rather than CPU parallelization can improve performance by 4.3 times and 2.3 times in calculations of ODEs and PDE, respectively. In CPU parallelization, it is best to use the number of processors just before the communication cost between each processor is incurred.