• Title/Summary/Keyword: 멀티코어 구조

Search Result 136, Processing Time 0.026 seconds

Dynamic Core Affinity for High-Performance I/O Devices Supporting Multiple Queues (다중 큐를 지원하는 고속 I/O 장치를 위한 동적 코어 친화도)

  • Cho, Joong-Yeon;Uhm, Junyong;Jin, Hyun-Wook;Jung, Sungin
    • Journal of KIISE
    • /
    • v.43 no.7
    • /
    • pp.736-743
    • /
    • 2016
  • Several studies have reported the impact of core affinity on the network I/O performance of multi-core systems. As the network bandwidth increases significantly, it becomes more important to determine the effective core affinity. Although a framework for dynamic core affinity that considers both network and disk I/O has been suggested, the multiple queues provided by high-speed I/O devices are not properly supported. In this paper, we extend the existing framework of dynamic core affinity to efficiently support the multiple queues of high-speed I/O devices, such as 40 Gigabit Ethernet and NVM Express. Our experimental results show that the extended framework can improve the HDFS file upload throughput by up to 32%, and can provide improved scalability in terms of the number of cores. In addition, we analyze the impact of the assignment policy of multiple I/O queues across a number of cores.

Implementation of Kernel Module for Shared Memory in Dual Bus System (듀얼 버스 시스템에서의 공유 메모리 커널 모듈 구현)

  • Moon, Ji-Hoon;Oh, Jae-Chul
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.10 no.5
    • /
    • pp.539-548
    • /
    • 2015
  • In this paper, shared memory feature was developed in multi-core system with different OS for different processor-specific bus, while conducting an experiment on shared memory feature between the two processors based on embedded Linux system. For the purpose of developing shared memory in dual bus structure, memory controller was used, while managing shared memory segment through list data structure. For AMP multi-core test, Linux OS was installed in 2 processor cores. In addition, it verified the creation and use of shared memory by using kernel module implemented to test shared memory.

Group core election for efficiency of control tree in 2-layer reliable multicast (2계층 구조의 신뢰적 멀티캐스트에서 제어 트리의 효율성을 고려한 그룹 대표 결정 기법)

  • Yi, Dong-Un;Lee, Seung-Ik;Ko, Yang-Woo;Lee, Dong-Man
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2006.10d
    • /
    • pp.223-227
    • /
    • 2006
  • 인터넷의 발전에 따라 다자간 그룹 통신 환경이 주목을 받게 되었고 이를 위해 다대다 멀티캐스트를 이용한 통신 기법들이 제안되었다. 이들 중 GAM (Group Aided Multicast) 은 다대다 멀티캐스트의 신뢰적 전달을 보장하기 위해 트리 기반의 손실 복구 기법을 제안하고, 제어 트리 관리 비용과 손실 복구 효율성간의 접점을 찾기 위해 그룹 개념을 도입하고 그룹의 대표로서 코어 노드를 정의한다. 코어 노드는 네트워크 상의 특정 지점에 미리 설치된 전용 노드로서 그룹 내의 손실 복구와 그룹 간의 손실 제어를 담당한다. 그러나 이러한 전용 코어 노드의 도입은 프로토콜의 가용성을 제한하고 코어 노드의 설치 부담을 가지게 된다. 따라서 본 논문에서는 별도의 코어 노드의 설치 없이 프로토콜이 동작할 수 있도록 세션 참가자 중에서 코어 노드를 동적으로 선택하고, 로컬 그룹 노드 중에서 제어 트리의 효율성을 최대한 보장하는 노드의 위치를 결정하는 기법을 제안한다.

  • PDF

Construction of an Automatic Generation System of Embedded Processor Cores (임베디드 프로세서 코어 자동생성 시스템의 구축)

  • Cho Jae-Bum;You Yong-Ho;Hwang Sun-Young
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.30 no.6A
    • /
    • pp.526-534
    • /
    • 2005
  • This paper presents the structure and function of the system which automatically generates embedded processor cores using the SMDL. Accepting processor description in the SDML, the proposed system generates the processor core, consisting of the pipelined datapath and memory modules together with their control unit. The generated cores support muti-cycle instructions for proper handling of memory accesses, and resolve pipeline hazards encountered in the pipelined processors. Experimental results show the functional accuracy of the generated cores.

Dynamic Directory Table: On-Demand Allocation of Directory Entries for Active Shared Cache Blocks (동적 디렉터리 테이블 : 공유 캐시 블록의 디렉터리 엔트리 동적 할당)

  • Bae, Han Jun;Choi, Lynn
    • Journal of KIISE
    • /
    • v.44 no.12
    • /
    • pp.1245-1251
    • /
    • 2017
  • In this study we present a novel directory architecture that can dynamically allocate a directory entry for a cache block on demand at runtime only when the block is shared by more than one core. Thus, we do not maintain coherence for private blocks, substantially reducing the number of directory entries. Even for shared blocks, we allocate directory entry dynamically only when the block is actively shared, further reducing the number of directory entries at runtime. For this, we propose a new directory architecture called dynamic directory table (DDT), which is implemented as a cache of active directory entries. Through our detailed simulation on PARSEC benchmarks, we show that DDT can outperform the expensive full-map directory by a slight margin with only 17.84% of directory area across a variety of different workloads. This is achieved by its faster access and high hit rates in the small directory. In addition, we demonstrate that even smaller DDTs can give comparable or higher performance compared to recent directory optimization schemes such as SPACE and DGD with considerably less area.

Accelerated Large-Scale Simulation on DEVS based Hybrid System using Collaborative Computation on Multi-Cores and GPUs (멀티 코어와 GPU 결합 구조를 이용한 DEVS 기반 대규모 하이브리드 시스템 모델링 시뮬레이션의 가속화)

  • Kim, Seongseop;Cho, Jeonghun;Park, Daejin
    • Journal of the Korea Society for Simulation
    • /
    • v.27 no.3
    • /
    • pp.1-11
    • /
    • 2018
  • Discrete event system specification (DEVS) has been used in many simulations including hybrid systems featuring both discrete and continuous behavior that require a lot of time to get results. Therefore, in this study, we proposed the acceleration of a DEVS-based hybrid system simulation using multi-cores and GPUs tightly coupled computing. We analyzed the proposed heterogeneous computing of the simulation in terms of the configuration of the target device, changing simulation parameters, and power consumption for efficient simulation. The result revealed that the proposed architecture offers an advantage for high-performance simulation in terms of execution time, although more power consumption is required. With these results, we discovered that our approach is applicable in hybrid system simulation, and we demonstrated the possibility of optimized hardware distribution in terms of power consumption versus execution time via experiments in the proposed architecture.

Performance Analysis on Next-Generation Web Browser at Multicore CPU and GPU (멀티 코어와 GPU가 차세대 웹 브라우저의 성능에 미치는 영향 분석)

  • Hong, Gyeong-Hwan;Kim, Dae-Ho;Shin, Dong-Kun
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2012.06a
    • /
    • pp.355-357
    • /
    • 2012
  • 차세대 웹 브라우저는 멀티 쓰레드(multi-thread) 구조로 되어 있으며 HTML5와 WebGL을 기반으로 화려한 그래픽을 구사하기 때문에, 멀티 코어(multi-core) CPU와 GPU의 성능이 웹 브라우저의 성능에 큰 영향을 미치고 있다. 본 논문은 오픈 소스 웹 브라우저인 크로미엄(Chromium) 상에서 프로세서의 성능 변화에 따라 웹 브라우저에서 실행되는 웹 어플리케이션의 성능이 어떤 양상으로 변화하는지와 이 변화에 웹 브라우저의 각 동작이 얼마나 기여하는지를 비교 분석하였다. 그 결과 CPU 코어의 수가 렌더링 성능에 큰 영향을 주며, GPU의 성능은 WebGL의 성능을 크게 좌우함을 알 수 있었다.

Exploration of Optimal Multi-Core Processor Architecture for Physical Modeling of Plucked-String Instruments (현악기의 물리적 모델링을 위한 최적의 멀티코어 프로세서 아키텍처 탐색)

  • Kang, Myeong-Su;Choi, Ji-Won;Kim, Yong-Min;Kim, Jong-Myon
    • The Journal of the Acoustical Society of Korea
    • /
    • v.30 no.5
    • /
    • pp.281-294
    • /
    • 2011
  • Physics-based sound synthesis usually requires high computational costs and this results in a restriction of its use in real-time applications. This motivates us to implement the sound synthesis algorithm of plucked-string instruments using multi-core processor architectures and determine the optimal processing element (PE) configuration for the target instruments. To determine the optimal PE configuration, we evaluate the impacts of a sample-per-processing element (SPE) ratio that is defined as the amount of sample data directly mapped to each PE on system performance and both area and energy efficiencies using architectural and workload simulations. For the acoustic guitar, the highest area and energy efficiencies are achieved at a SPE ratio of 5,513 and 2,756, respectively, for the synthesis of musical sounds sampled at 44.1 kHz. In the case of the classical guitar, the maximum area and energy efficiencies are achieved at a SPE ratio of 22,050 and 5,513, respectively. In addition, the synthetic sounds were very similar to original sounds in their spectra. Furthermore, we conducted MUSHRA subjective listening test with ten subjects including nine graduate students and one professor from the University of Ulsan, and the evaluation of the synthetic sounds was excellent.

HD-Tree: High performance Lock-Free Nearest Neighbor Search KD-Tree (HD-Tree: 고성능 Lock-Free NNS KD-Tree)

  • Lee, Sang-gi;Jung, NaiHoon
    • Journal of Korea Game Society
    • /
    • v.20 no.5
    • /
    • pp.53-64
    • /
    • 2020
  • Supporting NNS method in KD-Tree algorithm is essential in multidimensional data applications. In this paper, we propose HD-Tree, a high-performance Lock-Free KD-Tree that supports NNS in situations where reads and writes occurs concurrently. HD-Tree reduced the number of synchronization nodes used in NNS and requires less atomic operations during Lock-Free method execution. Comparing with existing algorithms, in a multi-core system with 8 core 16 thread, HD-Tree's performance has improved up to 95% on NNS and 15% on modifying in oversubscription situation.

Design Concept and Architecture Analysis of Cell Microprocessor (Cell 마이크로프로세서 설계 개념과 아키텍쳐 분석)

  • Moon Sang-Gook
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2006.05a
    • /
    • pp.927-930
    • /
    • 2006
  • While Intel has been increasing its exclusive possession in the system IC semiconductor market, IBM, Sony, and Toshiba founded an alliance to develop the next entertainment multi-core processor, which is named CELL. Cell is designed upon the Power architecture and includes 8 SPE (Synergistic processor Element) cores for data handling, and supports SIMD architecture for optimal execution of multimedia, or game applications. Also, it includes expanded Power microarchitecture. In this paper, we analyzed and researched the Cell microprocessor, which is evaluated as the most powerful processor in this era.

  • PDF