• Title/Abstract/Keywords: Many-core architecture


A Performance Study on Many-core Processor Architectures with SPEC Benchmark Programs

  • 이종복
    • 전기학회논문지 / Vol. 62, No. 2 / pp. 252-256 / 2013
  • To overcome the complexity and performance limits of superscalar processors, multi-core architectures have recently become prevalent. Multi-core processors typically use between 2 and 16 cores. In the near future, however, more than 32 cores are likely to be used, an arrangement known as a many-core processor architecture. Using SPEC 2000 benchmarks as input, extensive trace-driven simulation was performed for many-core architectures with 32 to 1024 cores. With 1024 cores, average performance reaches 15.7 IPC, but the rate of performance improvement saturates.

New Thermal-Aware Voltage Island Formation for 3D Many-Core Processors

  • Hong, Hyejeong;Lim, Jaeil;Lim, Hyunyul;Kang, Sungho
    • ETRI Journal / Vol. 37, No. 1 / pp. 118-127 / 2015
  • The power consumption of 3D many-core processors can be reduced, and their power delivery improved, by introducing voltage island (VI) design using on-chip voltage regulators. With the dramatic growth in the number of cores integrated in a processor, however, per-core VI design is infeasible. We propose a 3D many-core processor architecture that consists of multiple voltage clusters, each containing a set of cores that share an on-chip voltage regulator. Based on this architecture, the steady-state temperature is analyzed so that the thermal characteristics of each voltage cluster are known. In the voltage scaling and task scheduling stages, the thermal characteristics and the communication between cores are considered. Considering the thermal characteristics enables the proposed VI formation to reduce the total energy consumption, peak temperature, and temperature gradients in 3D many-core processors.

Design Space Exploration of Many-Core Architecture for Sound Synthesis of Guitar on Portable Device

  • 강명수;김종면
    • 한국컴퓨터정보학회:학술대회논문집 / Proceedings of the 49th Winter Conference of 한국컴퓨터정보학회 (2014), Vol. 22, No. 1 / pp. 1-4 / 2014
  • Although physical modeling synthesis is becoming increasingly effective at producing rich, natural, high-quality sound, its high computational complexity limits its use in portable devices. This constraint has motivated research into single-instruction multiple-data many-core architectures that support the large amount of computation by exploiting the massive parallelism inherent in physical modeling synthesis. Since no general consensus has been reached on which grain sizes of many-core processors and memories provide the most efficient operation for sound synthesis, design space exploration is conducted for seven processing element (PE) configurations. To find an optimal PE configuration, each configuration is evaluated in terms of execution time, area efficiency, and energy efficiency. Experimental results show that all PE configurations satisfy the system requirements for implementation in portable devices.


Performance Evaluation and Analysis for Discrete Wavelet Transform on Many-Core Processors

  • 박용훈;김종면
    • 대한임베디드공학회논문지 / Vol. 7, No. 5 / pp. 277-284 / 2012
  • To support the use of the discrete wavelet transform (DWT) on portable devices, this paper implements a 2-level DWT on a reference many-core processor architecture and determines the optimal many-core configuration. To explore the optimal configuration, we evaluate the impact of the data-per-processing-element ratio, defined as the amount of data mapped directly to each processing element (PE), on system performance, energy efficiency, and area efficiency. Five PE configurations (PEs = 16, 64, 256, 1,024, and 4,096) were implemented in 130 nm CMOS technology with a 720 MHz clock frequency. Experimental results indicate that maximum energy and area efficiencies are achieved at PEs = 1,024. However, to implement a 2-level DWT on portable devices, the system area must be limited to 140 mm² and the power must not exceed 3 W. Under these restrictions, the most reasonable energy and area efficiencies are achieved at PEs = 256.
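
To illustrate what "2-level" means in the abstract above, here is a minimal sketch of a two-level 1D DWT. The abstract does not name the wavelet or the data layout, so the Haar wavelet in its unnormalized averaging form, the function names, and the sample signal are all assumptions for illustration only.

```python
# Minimal sketch of a 2-level 1D DWT. Assumption: Haar wavelet in its
# averaging form (pairwise mean and half-difference); the abstract does
# not specify which wavelet the paper uses.

def haar_step(signal):
    """One decomposition level: returns (approximation, detail)."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    approx = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return approx, detail

def haar_dwt_2level(signal):
    """Two levels: the second level re-applies the step to the
    level-1 approximation only."""
    a1, d1 = haar_step(signal)
    a2, d2 = haar_step(a1)
    return a2, d2, d1

# Hypothetical 8-sample input.
SAMPLE = [4, 2, 6, 8, 1, 3, 5, 7]
A2, D2, D1 = haar_dwt_2level(SAMPLE)
print(A2, D2, D1)
```

Each output pair depends only on two neighbouring inputs, which is one reason a transform of this shape maps naturally onto many PEs that each hold a slice of the data.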

Architecture Exploration of Optimal Many-Core Processors for a Vector-based Rasterization Algorithm

  • 손동구;김철홍;김종면
    • 대한임베디드공학회논문지 / Vol. 9, No. 1 / pp. 17-24 / 2014
  • In this paper, we implement a vector-based rasterization algorithm for 3D graphics on a SIMD (single instruction multiple data) many-core processor architecture and evaluate its performance. In addition, we evaluate the impact of the data-per-processing-element (DPE) ratio, defined as the amount of data directly mapped to each processing element (PE) within the many-core processor, in terms of performance, energy efficiency, and area efficiency. For the experiment, we use seven different PE configurations obtained by varying the DPE ratio (or the number of PEs), all implemented in the same 130 nm CMOS technology with a 500 MHz clock frequency. Experimental results indicate that the optimal PE configuration is achieved when the DPE ratio is in the range from 16,384 to 256 (or the number of PEs is in the range from 16 to 1,024), which meets the performance and efficiency requirements of mobile devices.
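
The relationship between the DPE ratio and the number of PEs quoted above can be checked with a short calculation. This is a sketch under stated assumptions: the total of 262,144 data elements is inferred from the abstract's endpoints (16 × 16,384 = 1,024 × 256 = 262,144, which would match a 512 × 512 frame, for example) and is not stated in the abstract; the intermediate PE counts and the names `TOTAL_ELEMENTS`, `PE_COUNTS`, and `dpe_ratio` are hypothetical.

```python
# Sketch: DPE ratio = data elements / number of PEs.
# Assumption: TOTAL_ELEMENTS is inferred from the abstract's endpoints
# (16 PEs x 16,384 = 1,024 PEs x 256); it is not given in the abstract.
TOTAL_ELEMENTS = 262_144

def dpe_ratio(total_elements: int, num_pes: int) -> int:
    """Amount of data mapped directly to each PE, assuming an even split."""
    if num_pes <= 0 or total_elements % num_pes != 0:
        raise ValueError("data must divide evenly across a positive PE count")
    return total_elements // num_pes

# Hypothetical sweep; only 16 and 1,024 appear in the abstract.
PE_COUNTS = [16, 64, 256, 1024]
DPE_BY_PES = {n: dpe_ratio(TOTAL_ELEMENTS, n) for n in PE_COUNTS}

for n, dpe in DPE_BY_PES.items():
    print(f"PEs={n:>5}  DPE={dpe:>6}")
```

For a fixed workload, quadrupling the PE count cuts the DPE ratio by a factor of four, which is why the abstract can describe the same range either by DPE ratio or by number of PEs.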

A Study on the Topological characteristics of the Korean Traditional Architecture

  • 배강원;김문덕
    • 한국실내디자인학회논문집 / Vol. 13, No. 6 / pp. 74-81 / 2004
  • Much evidence suggests that Korean traditional architecture has long reflected traditional Korean philosophy. If this is true, there is much more insight to be gained about the connection. It is important to begin with the idea that Korean culture stemmed from Confucianism, Buddhism, and Taoism. All three share similar ideas, and this study sets out to show that topology, an anti-Euclidean school of thought created at the end of the 19th century, shares many of the same core ideas. Transitively, if Korean traditional culture is reflected in Korean traditional architecture, and topology shares many of the same core ideas, it seems that topology should be accepted into the mainstream of architectural design. This study aims to interpret the spatial structures and spatial compositions of Korean traditional architecture from a topological perspective.

Optimal Many-core Processor Architecture for Different Ultrasonic Image Resolutions

  • 강성모;김종면
    • 융합신호처리학회논문지 / Vol. 13, No. 1 / pp. 50-55 / 2012
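  • This paper proposes an optimal many-core processor architecture that satisfies the low-power and high-performance requirements arising from different ultrasonic image sizes in portable ultrasound diagnostic devices. To this end, we measured performance changes and power consumption across up to seven processing element (PE) models, varying the structure of the many-core processor according to the data size. Simulation results show that energy efficiency was highest with 1,024, 64, and 256 PEs for images with resolutions of $256{\times}256$, $320{\times}240$, and $800{\times}480$, respectively. In addition, area efficiency was highest with 256 PEs for the $256{\times}256$ and $800{\times}480$ images, and with 64 PEs for the $320{\times}240$ image.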

40-TFLOPS artificial intelligence processor with function-safe programmable many-cores for ISO26262 ASIL-D

  • Han, Jinho;Choi, Minseok;Kwon, Youngsu
    • ETRI Journal / Vol. 42, No. 4 / pp. 468-479 / 2020
  • The proposed AI processor architecture delivers high throughput for accelerating neural networks and reduces the external memory bandwidth required to process them. To achieve high throughput, the proposed super thread core (STC) includes 128 × 128 nano cores operating at a clock frequency of 1.2 GHz. A function-safe architecture is proposed for fault-tolerant systems such as the electronic systems of autonomous cars. A general-purpose processor (GPP) core is integrated with the STC to control it and to process the AI algorithm. It has a self-recovering cache and a dynamic lockstep function. The function-safe design is shown to reach ASIL D among the fault tolerance levels of the ISO26262 standard. The entire AI processor is fabricated in a 28-nm CMOS process as a prototype chip. Its peak computing performance is 40 TFLOPS at 1.2 GHz with a supply voltage of 1.1 V. The measured energy efficiency is 1.3 TOPS/W. A GPP for control with a function-safe design can achieve ISO26262 ASIL-D with a single-point fault-tolerance rate of 99.64%.
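
As a rough consistency check on the figures above, the peak throughput can be estimated from the core count and clock frequency. This is a back-of-the-envelope sketch, not the paper's own derivation: the assumption that each nano core completes one multiply-accumulate (counted as 2 floating-point operations) per cycle is ours, and the function and constant names are hypothetical.

```python
# Back-of-the-envelope peak throughput estimate.
# Assumption (not stated in the abstract): each nano core completes one
# multiply-accumulate per cycle, counted as 2 floating-point operations.
FLOPS_PER_CORE_PER_CYCLE = 2

def peak_tflops(rows: int, cols: int, clock_hz: float,
                flops_per_core_per_cycle: int = FLOPS_PER_CORE_PER_CYCLE) -> float:
    """Peak TFLOPS = cores x clock x operations per core per cycle / 1e12."""
    cores = rows * cols
    return cores * clock_hz * flops_per_core_per_cycle / 1e12

# Figures from the abstract: 128 x 128 nano cores at 1.2 GHz.
PEAK = peak_tflops(128, 128, 1.2e9)
print(f"{128 * 128} cores -> {PEAK:.2f} TFLOPS")  # 16384 cores -> 39.32 TFLOPS
```

Under that assumption the estimate comes to roughly 39.3 TFLOPS, which is in line with the 40 TFLOPS peak quoted in the abstract.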

Development of Reconfigurable and Evolvable Architecture for Intelligence Implement

  • 나진희;안호석;박명수;최진영
    • 한국지능시스템학회:학술대회논문집 / Proceedings of the 2005 Fall Conference of 한국퍼지및지능시스템학회, Vol. 15, No. 2 / pp. 500-503 / 2005
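  • Most intelligent algorithms cannot always guarantee optimal performance across different environments or purposes of use. It would therefore be useful to implement a variety of algorithms and then configure a system by selecting the algorithm, or combination of algorithms, best suited to the environment or purpose. This paper proposes an intelligent architecture for system reconfiguration and evolution based on an intelligent Macro Core. The proposed architecture reduces the cost of adding new algorithms and combining them into a system, and has the advantage of offering a standardized specification. We configure a system according to the proposed Macro Core-based intelligent architecture and aim to apply it to building an actual face detection and recognition system.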


Development of Reconfigurable and Evolvable Architecture for Intelligence Implement

  • 나진희;안호석;박명수;최진영
    • 한국지능시스템학회논문지 / Vol. 15, No. 7 / pp. 823-827 / 2005
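  • Most intelligent algorithms cannot always guarantee optimal performance across different environments or purposes of use. It would therefore be useful to implement a variety of algorithms and then configure a system by selecting the algorithm, or combination of algorithms, best suited to the environment or purpose. This paper proposes an intelligent architecture for system reconfiguration and evolution based on an intelligent Macro Core. The proposed architecture reduces the cost of adding new algorithms and combining them into a system, and has the advantage of offering a standardized specification. We configure a system according to the proposed Macro Core-based intelligent architecture and aim to apply it to building an actual face detection and recognition system.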