• Title/Summary/Keyword: MPI Unit

The Design of Hardware MPI Units for MPSoC (MPSoC를 위한 저비용 하드웨어 MPI 유닛 설계)

  • Jeong, Ha-Young;Chung, Won-Young;Lee, Yong-Surk
    • The Journal of Korean Institute of Communications and Information Sciences / v.36 no.1B / pp.86-92 / 2011
  • In this paper, we propose a novel hardware MPI (Message Passing Interface) unit that supports message passing in multiprocessor systems with a distributed memory architecture. The MPI hardware unit handles data synchronization, transmission, and completion, and it supports non-blocking operation of the processor, which reduces synchronization overhead. In addition, the unit combines a ready entry, a request entry, and a reserve entry, which store and manage synchronized messages, and it supports multiple outstanding issue and out-of-order completion. According to BFM (Bus Functional Model) simulation results, performance improves by 25% for many-to-many communication. We designed the MPI unit in HDL, synthesized it with Synopsys Design Compiler using a MagnaChip $0.18\,\mu\mathrm{m}$ library, and then built a prototype chip. The proposed MPI hardware delivers a large performance gain relative to its increase in area. Considering low-cost design and scalability, the MPI hardware unit is therefore useful for increasing the overall performance of embedded MPSoCs (Multi-Processor Systems-on-Chip).

Design 5Q MPI Hardware Unit Supporting Standard Mode (표준 모드를 지원하는 5Q MPI 하드웨어 유닛 설계)

  • Park, Jae-Won;Chung, Won-Young;Lee, Seung-Woo;Lee, Yong-Surk
    • The Journal of Korean Institute of Communications and Information Sciences / v.37 no.1B / pp.59-66 / 2012
  • The use of MPSoCs has been increasing with the spread of mobile devices and complex applications. To improve MPSoC performance, the number of processors has been increasing. Standard MPI is used to send data efficiently in distributed memory architectures, which are advantageous for multiprocessor systems. In this paper, we propose a scalable distributed memory system with a low-cost hardware message passing interface (MPI). The proposed architecture improves the transfer rate by using buffered send for small packets. Three queues (Ready Queue, Request Queue, and Reservation Queue) work as in the previous architecture, and two queues (Small Ready Queue and Small Request Queue) are added to send small packets. When the critical point is set to 8 bytes, the proposed architecture more than doubles performance for data below the critical point.
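
The size-based split described in this abstract can be illustrated with a minimal sketch. It only models the routing decision: the function names, the path labels, and the choice to treat exactly 8 bytes as not small are assumptions for illustration, not details taken from the paper.

```python
# Toy model of the size-based path selection; all names are hypothetical.
SMALL_THRESHOLD_BYTES = 8  # the "critical point" quoted in the abstract


def select_send_path(payload: bytes, threshold: int = SMALL_THRESHOLD_BYTES) -> str:
    """Return "small" for payloads below the threshold, else "standard".

    Assumption: "below the critical point" is read as strictly less than,
    so a payload of exactly `threshold` bytes takes the standard path.
    """
    if len(payload) < threshold:
        return "small"      # would use the Small Ready / Small Request queues
    return "standard"       # would use the Ready / Request / Reservation queues


def partition_messages(messages, threshold: int = SMALL_THRESHOLD_BYTES):
    """Group messages by the path they would take, preserving arrival order."""
    paths = {"small": [], "standard": []}
    for message in messages:
        paths[select_send_path(message, threshold)].append(message)
    return paths
```

For example, `partition_messages([b"ab", b"0123456789"])` places the 2-byte message on the small path and the 10-byte message on the standard path.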

The Design of MPI Hardware Unit for Enhanced Broadcast Communication (효율적인 브로드캐스트 통신을 지원하는 MPI 하드웨어 유닛 설계)

  • Yun, Hee-Jun;Chung, Won-Young;Lee, Yong-Surk
    • The Journal of Korean Institute of Communications and Information Sciences / v.36 no.11B / pp.1329-1338 / 2011
  • This paper proposes an algorithm and hardware architecture for broadcast communication, which is the worst bottleneck in multiprocessors that use distributed memory architectures. In conventional systems, the MPI library converts collective communication into point-to-point communications without considering the state of each processing node's communication port, which indicates whether the node is busy or free. If a conflicting point-to-point communication occurs during a broadcast, the broadcast slows down. This paper therefore proposes an algorithm that determines the order of the point-to-point communications in a broadcast according to the state of each processing node. The proposed algorithm reduces total broadcast time by transmitting the message first to processing nodes whose communication port is free. The proposed MPI unit for broadcast communication was evaluated by modeling it in SystemC, and it improved broadcast performance by up to 78% with 16 nodes. This result shows that the proposed algorithm is useful for improving the overall performance of MPSoCs.
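
The free-first ordering described in this abstract might be sketched as follows. This is a simplified illustration that works from a single snapshot of node states. The function name and the `"free"`/`"busy"` status strings are assumptions, and the paper's hardware may well re-check port state as the broadcast proceeds.

```python
def broadcast_order(status):
    """Order broadcast targets so that nodes with a free port come first.

    `status` maps a node id to "free" or "busy" (hypothetical encoding).
    sorted() is stable, so nodes keep their original relative order
    within the free group and within the busy group.
    """
    return sorted(status, key=lambda node: status[node] != "free")


# Snapshot of four receiving nodes at the moment the broadcast is issued.
ports = {1: "busy", 2: "free", 3: "busy", 4: "free"}
order = broadcast_order(ports)  # free nodes 2 and 4 first, then 1 and 3
```

Sending to free nodes first gives busy nodes time to finish their current transfers, which is the intuition the abstract gives for the reduced broadcast time.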

Design and Computer Control of a Sliding Mode Fuel-Injection Controller for MPI Gasoline Engines (MPI 가솔린 엔진용 슬라이딩 모드 연료분사 제어기 설계 및 컴퓨터 제어)

  • 김종식;고용서;강건용;황이철
    • Transactions of the Korean Society of Mechanical Engineers / v.15 no.3 / pp.1030-1043 / 1991
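  • In this study, a new fuel-injection controller was designed using the sliding mode control method, which is robust to uncertainties such as modeling errors and disturbances. An electronic control unit for MPI gasoline engines, built from an 8253 timer, an A/D converter, interface circuits, and other components, was applied to an actual engine to evaluate the performance of the newly designed fuel-injection control system. Engine operating states can be classified into various control modes; here, the transient response was examined when the engine speed was changed from 1500 rpm to 2000 rpm under a constant-load condition with an engine speed of 2000 rpm and a load of 20 N. The aim of this study was thus to develop a new sliding mode fuel-injection control system that maximizes the conversion efficiency of the three-way catalytic converter and thereby minimizes harmful substances in the exhaust gas.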

Processing-Node Status-based Message Scattering and Gathering for Multi-processor Systems on Chip

  • Park, Jongsu
    • Journal of information and communication convergence engineering / v.17 no.4 / pp.279-284 / 2019
  • This paper presents processing-node status-based message scattering and gathering algorithms for multi-processor systems on chip to reduce the communication time between processors. In the message-scattering part of the message-passing interface (MPI) scatter function, data transmissions are ordered according to the proposed linear algorithm, based on the processor status. The MPI hardware unit in the root processing node checks whether each processing node's status is 'free' or 'busy' when an MPI scatter message is received. Then, it first transfers the data to a 'free' processing node, thereby reducing the scattering completion time. In the message-gathering part of the MPI gather function, the data transmissions are ordered according to the proposed linear algorithm, and the gathering is performed. The root node receives data from the processing node that wants to transfer first, and reduces the completion time during the gathering. The experimental results show that the performance of the proposed algorithm increases at a greater rate as the number of processing nodes increases.

Conjugated Oligomers Combining Fluorene and Thiophene Units : Towards Supramolecular Electronics

  • Leclere, Ph.;Surin, M.;Sonar, P.;Grimsdale, A.C.;Müllen, K.;Cavallini, M.;Biscarini, F.;Lazzaroni, R.
    • Proceedings of the Polymer Society of Korea Conference / 2006.10a / pp.228-228 / 2006
  • Conjugated oligomers, used as models for fluorene-thiophene copolymers, are compared in terms of the microscopic morphology of thin deposits and the optical properties. The AFM images and the solid-state absorption and emission spectra are interpreted in line with the structural data, in terms of the assembly of the conjugated molecules. The compound with a terthiophene central unit and fluorene end-groups shows well-defined monolayer-by-monolayer assembly into micrometer-long strip-like structures, with a crystalline herringbone-type organization within the monolayers. Polarized confocal microscopy indicates a strong orientation of the crystalline domains within the stripes. In contrast, the compound with a terfluorene central unit and thiophene end groups forms no textured aggregates. The difference in behavior between the two compounds most probably originates from their different capability of forming densely-packed assemblies of $\pi$-$\pi$ interacting molecules. These assemblies are used as active elements in organic field effect transistors designed by using soft lithography technique.

Analyzing Regional Public Hospitals' Efficiency and Productivity Change (지방의료원의 효율성 및 생산성변화 분석)

  • Jeon, Jin-hwan;Kim, Jong-Ki
    • The Journal of the Korea Contents Association / v.10 no.5 / pp.303-313 / 2010
  • The purpose of this study is to evaluate the efficiency and productivity change of regional public hospitals in Korea. We use DEA (Data Envelopment Analysis) with the CCR and BCC models, and the MPI (Malmquist Productivity Index). DEA is a useful nonparametric technique for measuring the efficiency of a DMU (Decision Making Unit), and the MPI is an evaluation method for measuring a DMU's productivity change. We use time-series data from 34 regional public hospitals over the 6 years from 2003 to 2008. The results of this study are as follows. First, technical efficiency (TE) shows that approximately 3.6% inefficiency exists in the regional public hospitals, and that the technical inefficiency is caused by scale inefficiency. Second, the MPI results show that the regional public hospitals made efforts to improve total factor productivity change by raising technical efficiency. To raise efficiency, the regional public hospitals should pursue internal innovation and the government should support welfare policies.
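
For reference, the Malmquist Productivity Index mentioned in this abstract is commonly written in the following textbook form. The abstract does not state which orientation or notation the paper uses, so the output-oriented distance function $D^{t}(x, y)$ (evaluated against the period-$t$ frontier) is an assumption here.

$$
M\left(x^{t+1}, y^{t+1}, x^{t}, y^{t}\right) =
\left[
\frac{D^{t}\left(x^{t+1}, y^{t+1}\right)}{D^{t}\left(x^{t}, y^{t}\right)}
\cdot
\frac{D^{t+1}\left(x^{t+1}, y^{t+1}\right)}{D^{t+1}\left(x^{t}, y^{t}\right)}
\right]^{1/2}
$$

It factors into an efficiency-change term and a technical-change term:

$$
M =
\underbrace{\frac{D^{t+1}\left(x^{t+1}, y^{t+1}\right)}{D^{t}\left(x^{t}, y^{t}\right)}}_{\text{efficiency change}}
\times
\underbrace{\left[
\frac{D^{t}\left(x^{t+1}, y^{t+1}\right)}{D^{t+1}\left(x^{t+1}, y^{t+1}\right)}
\cdot
\frac{D^{t}\left(x^{t}, y^{t}\right)}{D^{t+1}\left(x^{t}, y^{t}\right)}
\right]^{1/2}}_{\text{technical change}}
$$

Under this convention, $M > 1$ indicates productivity growth between periods $t$ and $t+1$.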

A PRICING METHOD OF HYBRID DLS WITH GPGPU

  • YOON, YEOCHANG;KIM, YONSIK;BAE, HYEONG-OHK
    • Journal of the Korean Society for Industrial and Applied Mathematics / v.20 no.4 / pp.277-293 / 2016
  • We develop an efficient numerical method for pricing Derivative Linked Securities (DLS). The payoff structure of the hybrid DLS consists of a standard 2-Star step-down type ELS and a range accrual product, which depends on the number of days in the coupon period that the index stays within a pre-determined range. We assume a 2-dimensional Geometric Brownian Motion (GBM) as the model for the two equities and a no-arbitrage interest rate model (the one-factor Hull and White model) as the model for the interest rate. In this study, we employ the Monte Carlo simulation method with Compute Unified Device Architecture (CUDA) parallel computing, a General Purpose computing on Graphic Processing Unit (GPGPU) technology, for fast and efficient numerical valuation of DLS. Compared with the Monte Carlo method using single-CPU computation or an MPI implementation, Monte Carlo simulation with CUDA parallel computing achieves higher performance.
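
The Monte Carlo ingredient of this abstract can be illustrated with a small single-CPU sketch of a correlated 2-dimensional GBM. It is deliberately much simpler than the method in the paper: the interest rate is held constant instead of following a Hull and White model, there is no CUDA or MPI parallelism, and the payoff is a hypothetical single-coupon worst-of digital, not the hybrid DLS. All function and parameter names are assumptions.

```python
import math
import random


def simulate_terminal_prices(s0, sigma, rho, r, t, n_paths, seed=0):
    """Sample terminal prices of two correlated GBM assets under a constant rate r."""
    rng = random.Random(seed)
    paths = []
    for _ in range(n_paths):
        z1 = rng.gauss(0.0, 1.0)
        # Build a second normal draw with correlation rho to the first.
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        s1 = s0[0] * math.exp((r - 0.5 * sigma[0] ** 2) * t + sigma[0] * math.sqrt(t) * z1)
        s2 = s0[1] * math.exp((r - 0.5 * sigma[1] ** 2) * t + sigma[1] * math.sqrt(t) * z2)
        paths.append((s1, s2))
    return paths


def worst_of_digital_price(paths, s0, barrier, coupon, r, t):
    """Discounted Monte Carlo price of a coupon paid at time t if the worse
    performer of the two assets ends at or above `barrier` (hypothetical payoff)."""
    hits = sum(1 for s1, s2 in paths if min(s1 / s0[0], s2 / s0[1]) >= barrier)
    return math.exp(-r * t) * coupon * hits / len(paths)


spot = (100.0, 100.0)
sample = simulate_terminal_prices(spot, (0.2, 0.3), 0.5, 0.03, 1.0, 20000, seed=1)
price = worst_of_digital_price(sample, spot, 0.9, 10.0, 0.03, 1.0)
```

Each path is independent of the others, which is why this kind of valuation parallelizes well across CPU cores, MPI ranks, or GPU threads.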

A Design of Pipeline Chain Algorithm Based on Circuit Switching for MPI Broadcast Communication System (MPI 브로드캐스트 통신을 위한 서킷 스위칭 기반의 파이프라인 체인 알고리즘 설계)

  • Yun, Heejun;Chung, Wonyoung;Lee, Yong-Surk
    • The Journal of Korean Institute of Communications and Information Sciences / v.37B no.9 / pp.795-805 / 2012
  • This paper proposes an algorithm and a hardware architecture for broadcast communication, which is the worst bottleneck in multiprocessors that use distributed memory architectures. In conventional systems, the pipelined broadcast algorithm takes advantage of the maximum bandwidth of the communication bus. However, because pipelined broadcast sends the data divided into many parts, unnecessary synchronization processes are repeated. In this paper, an MPI unit was designed for a pipeline chain algorithm based on circuit switching that removes the redundant synchronization, and the proposed architecture was evaluated by modeling it in SystemC. As a result, the proposed architecture improved broadcast performance by up to 3.3 times over systems using the conventional pipelined broadcast algorithm, and it can use almost the maximum bandwidth of the transmission bus. It was then implemented in Verilog HDL, synthesized with a TSMC 0.18 μm library, and implemented in a chip. The synthesized design occupies 4,700 gates (2-input NAND gate equivalents), and its utilization of the total area is 2.4%. The proposed architecture improves the overall performance of an MPSoC while occupying a relatively small area.

Acceleration of Anisotropic Elastic Reverse-time Migration with GPUs (GPU를 이용한 이방성 탄성 거꿀 참반사 보정의 계산가속)

  • Choi, Hyungwook;Seol, Soon Jee;Byun, Joongmoo
    • Geophysics and Geophysical Exploration / v.18 no.2 / pp.74-84 / 2015
  • To obtain physically meaningful images through elastic reverse-time migration, wavefield separation, which extracts P- and S-waves from vector wavefields reconstructed with the elastic wave equation, is a prerequisite. To extend elastic reverse-time migration to anisotropic media, both an anisotropic modelling algorithm and anisotropic wavefield separation are essential. Anisotropic wavefield separation uses pseudo-derivative filters determined by the vertical velocities and anisotropic parameters of the elastic medium, and it differs from the Helmholtz decomposition conventionally used for isotropic wavefield separation. Because applying these pseudo-derivative filters is computationally expensive, we developed an efficient anisotropic wavefield separation algorithm that can run in parallel on GPUs (Graphic Processing Units). In addition, we developed a highly efficient anisotropic elastic reverse-time migration algorithm that uses MPI (Message-Passing Interface) and incorporates the GPU-based anisotropic wavefield separation algorithm. To verify the efficiency and validity of the developed algorithm, a VTI elastic model based on Marmousi-II was built, and a synthetic multicomponent seismic data set was created from it. The computational speed of migration was dramatically enhanced by using GPUs and MPI, and image accuracy was also improved by adopting the anisotropic wavefield separation.
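
As background for the isotropic case that this abstract contrasts against, the Helmholtz decomposition splits a displacement field $\mathbf{u}$ into a curl-free part and a divergence-free part. The symbols $\phi$ and $\boldsymbol{\psi}$ are the usual textbook potentials, not notation taken from the paper.

$$
\mathbf{u} = \nabla \phi + \nabla \times \boldsymbol{\psi}, \qquad \nabla \cdot \boldsymbol{\psi} = 0
$$

Applying the divergence and curl operators isolates the two wave modes:

$$
P = \nabla \cdot \mathbf{u} = \nabla^{2} \phi, \qquad
\mathbf{S} = \nabla \times \mathbf{u} = -\nabla^{2} \boldsymbol{\psi}
$$

In anisotropic media the P- and S-wave polarizations are generally no longer aligned with, or perpendicular to, the propagation direction, so these plain derivative operators do not separate the modes cleanly. This is consistent with the abstract's use of pseudo-derivative filters that depend on the medium parameters.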