• Title/Summary/Keyword: multi-core CPU

Search Result 76, Processing Time 0.027 seconds

SoC including 2M-byte on-chip SRAM and analog circuits for Miniaturization and low power consumption (소형화와 저전력화를 위해 2M-byte on-chip SRAM과 아날로그 회로를 포함하는 SoC)

  • Park, Sung Hoon;Kim, Ju Eon;Baek, Joon Hyun
    • Journal of IKEEE
    • /
    • v.21 no.3
    • /
    • pp.260-263
    • /
    • 2017
  • Based on several CPU cores, an SoC including ADCs, DC-DC converter and 2M-byte SRAM is proposed in this paper. The CPU core consists of a 12-bit MENSA, a 32-bit Symmetric multi-core processor, as well as 16-bit CDSP. To eliminate the external SDRAM memory, internal 2M-byte SRAM is implemented. Because the SRAM normally occupies huge area, the parasitic components reduce the speed of SoC. In this work, the SRAM blocks are divided into small pieces to reduce the parasitic components. The proposed SoC is developed in a standard 55nm CMOS process and the speed of SoC is 200MHz.

Analysis of Job Scheduling and the Efficiency for Multi-core Mobile GPU (멀티코어형 모바일 GPU의 작업 분배 및 효율성 분석)

  • Lim, Hyojeong;Han, Donggeon;Kim, Hyungshin
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.15 no.7
    • /
    • pp.4545-4553
    • /
    • 2014
  • Mobile GPU has led to the rapid development of smart phone graphic technology. Most recent smart phones are equipped with high-performance multi-core GPU. How a multi-core mobile GPU can be utilized efficiently will be a critical issue for improving the smart phone performance. On the other hand, most current research has focused on a single-core mobile GPU; studies of multi-core mobile GPU are rare. In this paper, the job scheduling patterns and the efficiency of multi-core mobile GPU are analyzed. In the profiling result, despite the higher number of GPU cores, the total processing time required for certain graphics applications were increased. In addition, when GPU is processing for 3D games, a substantial amount of overhead is caused by communication between not only the CPU and GPU, but also within the GPUs. These results confirmed that more active research for multi-core mobile GPU should be performed to optimize the present mobile GPUs.

ETS: Efficient Task Scheduler for Per-Core DVFS Enabled Multicore Processors

  • Hong, Jeongkyu
    • Journal of information and communication convergence engineering
    • /
    • v.18 no.4
    • /
    • pp.222-229
    • /
    • 2020
  • Recent multi-core processors for smart devices use per-core dynamic voltage and frequency scaling (DVFS) that enables independent voltage and frequency control of cores. However, because the conventional task scheduler was originally designed for per-core DVFS disabled processors, it cannot effectively utilize the per-core DVFS and simply allocates tasks evenly across all cores to core utilization with the same CPU frequency. Hence, we propose a novel task scheduler to effectively utilize percore DVFS, which enables each core to have the appropriate frequency, thereby improving performance and decreasing energy consumption. The proposed scheduler classifies applications into two types, based on performance-sensitivity and allows a performance-sensitive application to have a dedicated core, which maximizes core utilization. The experimental evaluations with a real off-the-shelf smart device showed that the proposed task scheduler reduced 13.6% of CPU energy (up to 28.3%) and 3.4% of execution time (up to 24.5%) on average, as compared to the conventional task scheduler.

A Performance Evaluation of Parallel Color Conversion based on the Thread Number on Multi-core Systems (멀티코어 시스템에서 쓰레드 수에 따른 병렬 색변환 성능 검증)

  • Kim, Cheong Ghil
    • Journal of Satellite, Information and Communications
    • /
    • v.9 no.4
    • /
    • pp.73-76
    • /
    • 2014
  • With the increasing popularity of multi-core processors, they have been adopted even in embedded systems. Under this circumstance many multimedia applications can be parallelized on multi-core platforms because they usually require heavy computations and extensive memory accesses. This paper proposes an efficient thread-level parallel implementation for color space conversion on multi-core CPU. Thread-level parallelism has been becoming very useful parallel processing paradigm especially on shared memory computing systems. In this work, it is exploited by allocating different input pixels to each thread for concurrent loop executions. For the performance evaluation, this paper evaluate the performace improvements for color conversion on multi-core processors based on the processing speed comparison between its serial implementation and parallel ones. The results shows that thread-level parallel implementations show the overall similar ratios of performance improvements regardless of different multi-cores.

Designing Hybrid Sorting Algorithm for PC with GPU (GPU가 장착된 PC를 위한 혼합 정렬 알고리즘 설계)

  • Kwon, Oh-Young
    • Journal of Advanced Navigation Technology
    • /
    • v.15 no.2
    • /
    • pp.281-286
    • /
    • 2011
  • Data sorting is one of important pre-process to utilize huge data in modern society, but sorting spends a lot of time by sorting itself. In this paper, we presented hybrid sorting algorithm that splits array to sort concurrently in CPU and GPU. To do this, we decided most effective range of array based on hardware performance, then accomplished reducing whole sorting time by concurrent sorting on CPU and GPU. As shown in results of experiment, hybrid sorting improved about eight percent of sorting time in comparison with the sorting time using only GPU.

Latency Evaluation of CPU Idle Time Based Interrupt Processing on Pfair Multi-Core Scheduler (Pfair 멀티코어 스케줄러에서 CPU 유휴시간 기반의 인터럽트 처리 기법의 지연시간 평가)

  • Park, Sangsoo
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2014.04a
    • /
    • pp.31-32
    • /
    • 2014
  • 다중의 명령어를 동시에 수행할 수 있는 멀티코어 시스템의 특성으로 하나의 시스템 내에서 태스크를 수행하면서 외부 이벤트의 발생에 의한 인터럽트를 동시에 처리할 수 있다. 각 태스크가 처리되어야 하는 시간에 제약성을 갖는 실시간 시스템에서는 스케줄러에 의해 CPU 코어에서의 수행이 제어되어야한다. 본 논문에서는 최적이라고 알려진 Pfair 멀티코어 스케줄러의 각 코어별 유휴시간을 정량적으로 평가함으로써 인터럽트 처리의 지연시간을 분석한다.

Hybrid parallel programming for Heterogeneous Multi-core performance optimization (헤테로지니어스 멀티코어 성능 최적화를 위한 하이브리드 병렬 프로그래밍)

  • Lim, Ju-Ho
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2012.06a
    • /
    • pp.7-9
    • /
    • 2012
  • CPU는 싱글 코어 구조에서 클록 속도를 높여 성능을 향상 시키려는 노력을 해왔으나 한계에 도달하자 하나의 칩에 코어를 여러 개 둔 멀티코어 형태로 발전하였다. CPU의 성능 향상을 위해 이제는 3D그래픽을 연산처리하기 위해 만들어진 GPU와 결합하기에 이르렀다. CPU와 GPU의 결합은 CPU간의 결합보다 훨씬 더 좋은 성능을 보였고 전력의 사용량도 더 적었으며 비용면에서도 경제적이라는 장점을 가지고 있다. 본 논문에서는 CPU와 GPU의 Heterogeneous multicore상에서 성능을 최적화하기 위해 기존의 병렬화 모델을 조합하고 최적화를 시도하였다. CPU상에서는 성능 향상을 위해 기존의 병렬 프로그램 모델인 SIMD와 공유메모리 병렬 프로그래밍 모델 그리고 메시지 패싱 병렬 프로그래밍 모델을 조합하는 실험을 했다. GPU에서는 CUDA를 최적화 하였다. 이렇게 CPU와 GPU를 최적화하고 조합하여 고성능 연산을 요구하는 어플리케이션을 위한 Heterogeneous multicore 성능 최적화 방법을 제안한다.

An implementation of a unified ALU in multi-core GPGPU based on SIMT architecture (SIMT 구조 기반 멀티코어 GPGPU의 통합 ALU 설계)

  • Kyung, Gyu-taek;Kwak, Jae-Chang;Lee, Kwang-yeob
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2013.10a
    • /
    • pp.540-543
    • /
    • 2013
  • This paper describes an implementation of a unified ALU on multi-core GPGPU based on SIMT architecture. Our unified ALU can operate conditional branch instructions, data movement instructions, integer arithmetic instructions and floating-point arithmetic instructions. Since multi-core GPGPU contains a lot of ALU for parallel processing of various types, the main point of this paper is to design the minimum size ALU by unifying similar processing of each operations on circit level. All instrunctions were tested by making a test program. And we compare this results with results of CPU operations to verify our ALU. Our unified ALU's gate size is approximately 20,000 and the maximum operation frequency is 430MHz.

  • PDF

A Technique for Fast Process Creation Based on Creation Location

  • Kim, Byung-Jin;Ahn, Young-Ho;Chung, Ki-Seok
    • Journal of Computing Science and Engineering
    • /
    • v.5 no.4
    • /
    • pp.283-287
    • /
    • 2011
  • Due to the proliferation of software parallelization on multi-core CPUs, the number of concurrently executing processes is rapidly increasing. Unlike processes running in a server environment, those executing in a multi-core desktop or a multi-core mobile platform have various correlations. Therefore, it is crucial to consider correlations among concurrently running processes. In this paper, we exploit the property that for a given created location in the binary image of the parent process, the average running time of child processes residing in the run-queue differs. We claim that this property can be exploited to improve the overall system performance by running processes that have a relatively short running time before those with a longer running time. Experimental results verified that the running time was actually improved by 11%.

Heterogeneous Chain-mail Model for CPU-based Volume Deformation (CPU 기반의 볼륨 변형을 위한 다형질 Chainmail 모델)

  • Lee, Sein;Kye, Heewon
    • Journal of Korea Multimedia Society
    • /
    • v.22 no.7
    • /
    • pp.759-769
    • /
    • 2019
  • Since a surgery simulation should be able to represent the internal structure of the human body, it is advantageous to adopt volume based techniques rather than polygon based techniques. However, the volume based techniques induce large computation to deform heterogeneous volume datasets such as bones and muscles. In this study, we propose a new method to deform volume data using multi-core CPUs. By improving previous studies, the proposed method minimizes unnecessary propagation operations. Moreover, we propose an efficient task-partitioning method for volume deformation using multi-core CPUs. As a result, we can simulate the deformation of heterogeneous volume data at an interactive speed without special hardware.