• Title/Summary/Keyword: 칩 멀티 프로세서

Search Result 61, Processing Time 0.023 seconds

A Research about Open Source Distributed Computing System for Realtime CFD Modeling (SU2 with OpenCL and MPI) (실시간 CFD 모델링을 위한 오픈소스 분산 컴퓨팅 기술 연구)

  • Lee, Jun-Yeob;Oh, Jong-woo;Lee, DongHoon
    • Proceedings of the Korean Society for Agricultural Machinery Conference
    • /
    • 2017.04a
    • /
    • pp.171-171
    • /
    • 2017
  • 전산유체역학(CFD: Computational Fluid Dynamics)를 이용한 스마트팜 환경 내부의 정밀 제어 연구가 진행 중이다. 시계열 데이터의 난해한 동적 해석을 극복하기위해, 비선형 모델링 기법의 일종인 인공신경망을 이용하는 방안을 고려하였다. 선행 연구를 통하여 환경 데이터의 비선형 모델링을 위한 Tensorflow활용 방법이 하드웨어 가속 기능을 바탕으로 월등한 성능을 보임을 확인하였다. 그럼에도 오프라인 일괄(Offline batch)처리 방식의 한계가 있는 인공신경망 모델링 기법과 현장 보급이 불가능한 고성능 하드웨어 연산 장치에 대한 대안 마련이 필요하다고 판단되었다. CFD 해석을 위한 Solver로 SU2(http://su2.stanford.edu)를 이용하였다. 운영 체제 및 컴파일러는 1) Mac OS X Sierra 10.12.2 Apple LLVM version 8.0.0 (clang-800.0.38), 2) Windows 10 x64: Intel C++ Compiler version 16.0, update 2, 3) Linux (Ubuntu 16.04 x64): g++ 5.4.0, 4) Clustered Linux (Ubuntu 16.04 x32): MPICC 3.3.a2를 선정하였다. 4번째 개발환경인 병렬 시스템의 경우 하드웨어 가속는 OpenCL(https://www.khronos.org/opencl/) 엔진을 이용하고 저전력 ARM 프로세서의 일종인 옥타코어 Samsung Exynos5422 칩을 장착한 ODROID-XU4(Hardkernel, AnYang, Korea) SBC(Single Board Computer)를 32식 병렬 구성하였다. 분산 컴퓨팅을 위한 환경은 Gbit 로컬 네트워크 기반 NFS(Network File System)과 MPICH(http://www.mpich.org/)로 구성하였다. 공간 분해능을 계측 주기보다 작게 분할할 경우 발생하는 미지의 바운더리 정보를 정의하기 위하여 3차원 Kriging Spatial Interpolation Method를 실험적으로 적용하였다. 한편 병렬 시스템 구성이 불가능한 1,2,3번 환경의 경우 내부적으로 이미 존재하는 멀티코어를 활용하고자 OpenMP(http://www.openmp.org/) 라이브러리를 활용하였다. 64비트 병렬 8코어로 동작하는 1,2,3번 운영환경의 경우 32비트 병렬 128코어로 동작하는 환경에 비하여 근소하게 2배 내외로 연산 속도가 빨랐다. 실시간 CFD 수행을 위한 분산 컴퓨팅 기술이 프로세서의 속도 및 운영체제의 정보 분배 능력에 따라 결정된다고 판단할 수 있었다. 이를 검증하기 위하여 4번 개발환경에서 운영체제를 64비트로 개선하여 5번째 환경을 구성하여 검증하였다. 상반되는 결과로 64비트 72코어로 동작하는 분산 컴퓨팅 환경에서 단일 프로세서 기반 멀티 코어(1,2,3번) 환경보다 보다 2.5배 내외 연산속도 향상이 있었다. ARM 프로세서용 64비트 운영체제의 완성도가 낮은 시점에서 추후 성공적인 실시간 CFD 모델링을 위한 지속적인 검토가 필요하다.

  • PDF

Analysis on the Cooling Efficiency of High-Performance Multicore Processors according to Cooling Methods (기계식 쿨링 기법에 따른 고성능 멀티코어 프로세서의 냉각 효율성 분석)

  • Kang, Seung-Gu;Choi, Hong-Jun;Ahn, Jin-Woo;Park, Jae-Hyung;Kim, Jong-Myon;Kim, Cheol-Hong
    • Journal of the Korea Society of Computer and Information
    • /
    • v.16 no.7
    • /
    • pp.1-11
    • /
    • 2011
  • Many researchers have studied on the methods to improve the processor performance. However, high integrated semiconductor technology for improving the processor performance causes many problems such as battery life, high power density, hotspot, etc. Especially, as hotspot has critical impact on the reliability of chip, thermal problems should be considered together with performance and power consumption when designing high-performance processors. To alleviate the thermal problems of processors, there have been various researches. In the past, mechanical cooling methods have been used to control the temperature of processors. However, up-to-date microprocessors causes severe thermal problems, resulting in increased cooling cost. Therefore, recent studies have focused on architecture-level thermal-aware design techniques than mechanical cooling methods. Even though architecture-level thermal-aware design techniques are efficient for reducing the temperature of processors, they cause performance degradation inevitably. Therefore, if the mechanical cooling methods can manage the thermal problems of processors efficiently, the performance can be improved by reducing the performance degradation due to architecture-level thermal-aware design techniques such as dynamic thermal management. In this paper, we analyze the cooling efficiency of high-performance multicore processors according to mechanical cooling methods. According to our experiments using air cooler and liquid cooler, the liquid cooler consumes more power than the air cooler whereas it reduces the temperature more efficiently. Especially, the cost for reducing $1^{\circ}C$ is varied by the environments. Therefore, if the mechanical cooling methods can be used appropriately, the temperature of high-performance processors can be managed more efficiently.

Application-specific Traffic Generator (응용 프로그램의 특성 반영이 가능한 트래픽 생성기)

  • Yeo, Phil-Koo;Cho, Keol;Yu, Dae-Chul;Hwang, Young-Si;Chung, Ki-Seok
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.48 no.9
    • /
    • pp.40-49
    • /
    • 2011
  • Integrating massive components and low-power policies have been actively investigated for system-on-chip designs. But in recent years, finding the optimal interconnection structure among heterogeneous components has emerged as a critical system design issue. Therefore, various simulation tools to model interconnection designs are being developed and performance evaluation of simulation is reflected in the real design. But most of the simulation environments employ traffic generation based on the mathematical probability functions, and such traffic generation cannot fully cover for various situations that may be occurred in the real system. Therefore, the demand for traffic pattern generation based on real applications is increasing. However, there have been few simulators that adopt application-specific traffic generators. This paper proposes a novel traffic generation method in simulating various interconnection structures for multi-processor system-on-chip design. The proposed traffic generation method can generate traffic patterns that can reflect the actual characteristics of the application and evaluate the performance of an interconnection structure under more realistic circumstance than traffic patterns using mathematical probability functions. By comparing the differences between the proposed method and the one based on mathematical probability functions, this paper shows advantages of the proposed traffic generation method.

Collaborative Streamlined On-Chip Software Architecture on Heterogenous Multi-Cores for Low-Power Reactive Control in Automotive Embedded Processors (차량용 임베디드 프로세서에서 저전력 반응적 제어를 위한 이기종 멀티코어 협력적 스트리밍 온-칩 소프트웨어 구조)

  • Jisu, Kwon;Daejin, Park
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.17 no.6
    • /
    • pp.375-382
    • /
    • 2022
  • This paper proposes a multi-core cooperative computing structure considering the heterogeneous features of automotive embedded on-chip software. The automotive embedded software has the heterogeneous execution flow properties for various hardware drives. Software developed with a homogeneous execution flow without considering these properties will incur inefficient overhead due to core latency and load. The proposed method was evaluated on an target board on which a automotive MCU (micro-controller unit) with built-in multi-cores was mounted. We demonstrate an overhead reduction when software including common embedded system tasks, such as ADC sampling, DSP operations, and communication interfaces, are implemented in a heterogeneous execution flow. When we used the proposed method, embedded software was able to take advantage of idle states that occur between heterogeneous tasks to make efficient use of the resources on the board. As a result of the experiments, the power consumption of the board decreased by 42.11% compared to the baseline. Furthermore, the time required to process the same amount of sampling data was reduced by 27.09%. Experimental results validate the efficiency of the proposed multi-core cooperative heterogeneous embedded software execution technique.

Hardware-Software Cosynthesis of Multitask Multicore SoC with Real-Time Constraints (실시간 제약조건을 갖는 다중태스크 다중코어 SoC의 하드웨어-소프트웨어 통합합성)

  • Lee Choon-Seung;Ha Soon-Hoi
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.33 no.9
    • /
    • pp.592-607
    • /
    • 2006
  • This paper proposes a technique to select processors and hardware IPs and to map the tasks into the selected processing elements, aming to achieve high performance with minimal system cost when multitask applications with real-time constraints are run on a multicore SoC. Such technique is called to 'Hardware-Software Cosynthesis Technique'. A cosynthesis technique was already presented in our early work [1] where we divide the complex cosynthesis problem into three subproblems and conquer each subproblem separately: selection of appropriate processing components, mapping and scheduling of function blocks to the selected processing component, and schedulability analysis. Despite good features, our previous technique has a serious limitation that a task monopolizes the entire system resource to get the minimum schedule length. But in general we may obtain higher performance in multitask multicore system if independent multiple tasks are running concurrently on different processor cores. In this paper, we present two mapping techniques, task mapping avoidance technique(TMA) and task mapping pinning technique(TMP), which are applicable for general cases with diverse operating policies in a multicore environment. We could obtain significant performance improvement for a multimedia real-time application, multi-channel Digital Video Recorder system and for randomly generated multitask graphs obtained from the related works.

MPSoC Design Space Exploration Based on Static Analysis of Process Network Model (프로세스 네트워크 모델의 정적 분석에 기반을 둔 다중 프로세서 시스템 온 칩 설계 공간 탐색)

  • Ahn, Yong-Jin;Choi, Ki-Young
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.44 no.10
    • /
    • pp.7-16
    • /
    • 2007
  • In this paper, we introduce a new design environment for efficient multiprocessor system-on-chip design space exploration. The design environment takes a process network model as input system specification. The process network model has been widely used for modeling signal processing applications because of its excellent modeling power. However, it has limitation in predictability, which could cause severe problem for real time systems. This paper proposes a new approach that enables static analysis of a process network model by converting it to a hierarchical synchronous dataflow model. For efficient design space exploration in the early design step, mapping application to target architectures has been a crucial part for finding better solution. In this paper, we propose an efficient mapping algorithm. Our mapping algorithm supports both single bus architecture and multiple bus architecture. In the experiments, we show that the automatic conversion approach of the process network model for static analysis is performed successfully for several signal processing applications, and show the effectiveness of our mapping algorithm by comparing it with previous approaches.

Implementation of Location Based Services Using Satellite DMB System (위성 DMB 시스템을 이용한 위치 기반 서비스 구현)

  • Kwon, Seong-Geun;Lee, Suk-Hwan;Kim, Kang-Wook;Kwon, Ki-Ryong
    • Journal of Korea Multimedia Society
    • /
    • v.15 no.1
    • /
    • pp.32-39
    • /
    • 2012
  • In this paper, the implementation of location based services (LBS) using S-DMB (satellite-digital multimedia broadcasting) system was proposed. In S-DMB System, the frequency of transmitted signal is about 2 GHz which has a characteristics of strong straightness but weak diffraction so that there are many shade areas such as indoors and underground spaces. Therefore the signal transmitted from the satellite should be retransmitted by the earth repeaters called as gap filler. Because each gap filler has its own identification value, the gap filler ID introduces the area in which the gap filler was installed. Generally, the 51st data symbols of S-DMB pilot signal transmitted from the satellite are padded by dummy value and gap filler ID is embedded in this pilot symbol by the gap filler when S-DMB signals are retransmitted by gap fillers. So using gap filler ID of S-DMB system, LBS such as region registration, distance and time to destination, alarm of local area information could be implemented. In the experiment to prove the performance of the proposed LBS system using the gap filler ID of the S-DMB system, the firmware of S-DMB chip composing of RF and baseband parts was lightly modified so that application processor was able to manipulate the gap filler ID and the its related regional information.

A Design of Pipeline Chain Algorithm Based on Circuit Switching for MPI Broadcast Communication System (MPI 브로드캐스트 통신을 위한 서킷 스위칭 기반의 파이프라인 체인 알고리즘 설계)

  • Yun, Heejun;Chung, Wonyoung;Lee, Yong-Surk
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.37B no.9
    • /
    • pp.795-805
    • /
    • 2012
  • This paper proposes an algorithm and a hardware architecture for a broadcast communication which has the worst bottleneck among multiprocessor using distributed memory architectures. In conventional system, The pipelined broadcast algorithm is an algorithm which takes advantage of maximum bandwidth of communication bus. But unnecessary synchronization process are repeated, because the pipelined broadcast sends the data divided into many parts. In this paper, the MPI unit for pipeline chain algorithm based on circuit switching removing the redundancy of synchronization process was designed, the proposed architecture was evaluated by modeling it with systemC. Consequently, the performance of the proposed architecture was highly improved for broadcast communication up to 3.3 times that of systems using conventional pipelined broadcast algorithm, it can almost take advantage of the maximum bandwidth of transmission bus. Then, it was implemented with VerilogHDL, synthesized with TSMC 0.18um library and implemented into a chip. The area of synthesis results occupied 4,700 gates(2 input NAND gate) and utilization of total area is 2.4%. The proposed architecture achieves improvement in total performance of MPSoC occupying relatively small area.

A VLSI Architecture for the Real-Time 2-D Digital Signal Processing (실시간 2차원 디지털 신호처리를 위한 VLSI 구조)

  • 권희훈
    • Information and Communications Magazine
    • /
    • v.9 no.9
    • /
    • pp.72-85
    • /
    • 1992
  • The throughput requirement for many digital signal processing is such that multiple processing units are essential for real-time implementation. Advances in VLSI technology make it feasible to design and implement computer systems consisting of a large number of function units. The research on a very high throughput VLSI architecture for digital signal processing applications requires the development of an algorithm, decomposition scheme which can minimize data communication requirements as well as minimize computational complexity. The objectives of the research are to investigate computationally efficient algorithms for solution of the class of problems which can be modeled as DLSI systems or adaptive system, and develop VLSI architectures and associated multiprocessor systems which can be used to implement these algorithms in real-time. A new VLSI architecture for real-time 2-D digital signal processing applications is proposed in this research. This VLSI architecture extends the concept of having a single processing units in a chip. Because this VLSI architecture has the advantage that the complexity and the number of computations per input does not increase as the size of the input data in increased, it can process very large 2-D date in near real-time.

  • PDF

The Design of Hardware MPI Units for MPSoC (MPSoC를 위한 저비용 하드웨어 MPI 유닛 설계)

  • Jeong, Ha-Young;Chung, Won-Young;Lee, Yong-Surk
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.36 no.1B
    • /
    • pp.86-92
    • /
    • 2011
  • In this paper, we propose a novel hardware MPI(Message Passing Interface) unit which supports message passing in multiprocessor system which use distributed memory architecture. MPI Hardware unit processes data synchronization, transmission and completion, and it supports processor non-blocking operation so it reduces overhead according to synchronization. Additionally, MPI hardware unit combines ready entry, request entry, reserve entry which save and manage the synchronized messages and performs the multiple outstanding issue and out of order completion. According to BFM(Bus Functional Model) simulation result, the performance is increased by 25% on many to many communication. After we designed MPI unit using HDL, with synopsys design compiler we synthesized, and for synthesis library we used MagnaChip $0.18{\mu}m$. And then we making prototype chip. The proposed message transmission interface hardware shows high performance for its increase in size. Thus, as we consider low-cost design and scalability, MPI hardware unit is useful in increasing overall performance of embedded MPSoC(Multi-Processor System-on-Chip).