• Title/Summary/Keyword: Hardware Optimization (하드웨어 최적화)


Performance Improvement of Cumulus Parameterization Code by Unicon Optimization Scheme (Unicon Optimization 기법을 이용한 적운모수화 코드 성능 향상)

  • Lee, Chang-Hyun;Kim, Min-gyu;Shin, Dae-Yeong;Cho, Ye-Rin;Yeom, Gi-Hun;Chung, Sung-Wook
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology
    • /
    • v.15 no.2
    • /
    • pp.124-133
    • /
    • 2022
  • With the development of hardware technology and the advancement of numerical modeling methods, more precise weather forecasts have become possible. In this paper, we propose a Unicon Optimization scheme combining Loop Vectorization, Dependency Vectorization, and Code Modernization to optimize, and improve the maintainability of, the Unicon source code contained in SCAM, a simplified version of CESM, and we present the overall SCAM structure. We tested the Unicon Optimization scheme within the SCAM structure: compared with the original source code, Loop Vectorization alone yielded a performance improvement of 3.086% and Dependency Vectorization 0.4572%, while the full Unicon Optimization, which applies all of these techniques, improved performance by 3.457%. These results show that the proposed Unicon Optimization scheme delivers a solid performance gain.
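
To make the loop-vectorization idea concrete, here is a minimal generic sketch in C (the actual UNICON source inside SCAM/CESM is Fortran, and the function and names below are hypothetical): replacing a data-dependent branch with a branch-free form and declaring the arrays non-aliasing removes two common vectorization blockers, letting the compiler emit SIMD instructions.

    /* Hypothetical sketch of the loop-vectorization idea only; not the
     * authors' UNICON code (which is Fortran inside SCAM/CESM).
     * "restrict" promises the compiler the arrays do not alias, and the
     * branch-free min maps directly onto SIMD min instructions. */
    void saturate_scaled(const double *restrict in, double *restrict out,
                         double scale, double cap, int n)
    {
        for (int i = 0; i < n; i++) {
            double v = in[i] * scale;
            out[i] = (v < cap) ? v : cap;  /* vectorizes; an if-else with
                                              side effects would not */
        }
    }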

Hierarchical IoT Edge Resource Allocation and Management Techniques based on Convolutional Neural Networks in Distributed AIoT Environments (분산 AIoT 환경에서 합성곱신경망 기반 계층적 IoT Edge 자원 할당 및 관리 기법)

  • Yoon-Su Jeong
    • Advanced Industrial Science
    • /
    • v.2 no.3
    • /
    • pp.8-14
    • /
    • 2023
  • The majority of IoT devices already employ AIoT; however, numerous issues still need to be resolved before AI applications can be deployed. To distribute IoT edge resources more effectively, this paper proposes a machine-learning-based approach to managing IoT edge resources. The proposed method continuously improves the allocation of IoT resources by identifying IoT edge resource trends through machine learning, and uses convolution-based learning to reliably sustain IoT edge resources that are constantly changing. By storing each machine-learning-based IoT edge resource as a hash value alongside the resource of the previous pattern, the approach can efficiently check whether a resource matches an attack pattern in a distributed AIoT context. The experiments evaluate energy efficiency in three different test scenarios and verify the integrity of IoT edge resources in complex environments with heterogeneous computational hardware.
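
A minimal sketch of the hash-chaining idea mentioned in the abstract, with hypothetical names throughout (the paper does not specify its hash function; a toy FNV-1a stands in here): each resource snapshot is hashed together with the hash of the previous pattern, so a snapshot that deviates from the recorded history changes every later hash and can be flagged during verification.

    #include <stddef.h>
    #include <stdint.h>

    /* Toy FNV-1a hash; a stand-in for whatever hash the paper uses. */
    static uint64_t fnv1a(const void *data, size_t len, uint64_t seed)
    {
        const unsigned char *p = data;
        uint64_t h = seed ? seed : 1469598103934665603ULL;
        for (size_t i = 0; i < len; i++) {
            h ^= p[i];
            h *= 1099511628211ULL;
        }
        return h;
    }

    /* Hypothetical IoT edge resource snapshot. */
    struct edge_resource {
        uint32_t cpu_millis;   /* allocated CPU time  */
        uint32_t mem_kb;       /* allocated memory    */
        uint32_t net_kbps;     /* allocated bandwidth */
    };

    /* Chain each snapshot's hash with the previous pattern's hash so a
     * verifier can detect deviation from the recorded allocation history. */
    uint64_t chain_resource(const struct edge_resource *r, uint64_t prev)
    {
        return fnv1a(r, sizeof *r, prev);
    }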

An Optimization Technique in Memory System Performance for Real-Time Embedded Systems (실시간 임베디드 시스템을 위한 메모리 시스템 성능 최적화 기법)

  • Yongin Kwon;Doosan Cho;Jongwon Lee;Yongjoo Kim;Jonghee Youn;Sanghyun Park;Yunheung Paek
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2008.11a
    • /
    • pp.882-884
    • /
    • 2008
  • Randomly accessing data tens to hundreds of times larger than the hardware cache sharply degrades cache performance because of low memory access locality. For example, in the vehicle Global Positioning System (GPS) programs in common use today, one of the core modules computes the receiver's position from the data received from up to 32 satellites, and this module accounts for more than 50% of total execution time. The module receives satellite signals in real time and stores them in a buffer memory; since the required data are not stored sequentially, they must be read back in random order. The resulting low locality makes it difficult to complete the processing within the real-time deadline. Improving the low memory access locality inherent in the algorithms of such communication applications would require changes at the algorithm level, which is costly, so this work instead transforms the data structures used in order to raise locality. As a result, we achieved a twofold speedup in the core module and a 14% improvement in overall system performance.
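
The data-structure transformation can be pictured with a generic example (hypothetical, not the authors' GPS code): grouping fields that are always used together into one record, so that a random access to one sample touches a single cache line instead of several widely separated arrays.

    /* Before: parallel arrays; a random access to sample i touches three
     * distant cache lines (low spatial locality). */
    struct samples_before {
        float i_phase[32768];
        float q_phase[32768];
        int   sat_id[32768];
    };

    /* After: one record per sample; the same random access now touches a
     * single cache line, raising locality without changing the algorithm. */
    struct sample { float i_phase, q_phase; int sat_id; };
    struct samples_after { struct sample s[32768]; };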

Acceleration of computation speed for elastic wave simulation using a Graphic Processing Unit (그래픽 프로세서를 이용한 탄성파 수치모사의 계산속도 향상)

  • Nakata, Norimitsu;Tsuji, Takeshi;Matsuoka, Toshifumi
    • Geophysics and Geophysical Exploration
    • /
    • v.14 no.1
    • /
    • pp.98-104
    • /
    • 2011
  • Numerical simulation in exploration geophysics provides important insights into subsurface wave propagation phenomena. Although elastic wave simulations take longer to compute than acoustic simulations, an elastic simulator can construct more realistic wavefields, including shear components, and is therefore suitable for exploring the responses of elastic bodies. To overcome the long calculation times, we use a Graphic Processing Unit (GPU) to accelerate the elastic wave simulation. Because a GPU has many processors and a wide memory bandwidth, we can use it in a parallelised computing architecture. The GPU board used in this study is an NVIDIA Tesla C1060, which has 240 processors and a 102 GB/s memory bandwidth. Although NVIDIA provides a parallel computing architecture (CUDA), we must still optimise the usage of the different types of memory on the GPU device, and the sequence of calculations, to obtain a significant speedup. In this study, we simulate two-dimensional (2D) and three-dimensional (3D) elastic wave propagation using the Finite-Difference Time-Domain (FDTD) method on GPUs. We adopt the staggered-grid method, one of the conventional FD schemes, since it achieves sufficient accuracy for numerical modelling in geophysics. Our simulator optimises memory usage on the GPU device to reduce data access times, using the faster kinds of memory as much as possible; this is a key factor in GPU computing. With one GPU device and optimised memory usage, the computation ran more than 14 times faster in the 2D simulation, and over 6 times faster in the 3D simulation, than on one CPU. Furthermore, with three GPUs, we accelerated the 3D simulation by a factor of 10.
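
One of the staggered-grid FDTD updates such a simulator performs can be sketched on the CPU as follows (array names are hypothetical; this is the standard velocity-stress form, not the authors' code). The GPU version described above would map each (i, k) point to a thread and stage the stencil neighbourhood in fast on-chip memory to cut global-memory traffic.

    /* Second-order staggered-grid update of the horizontal velocity vx in
     * a 2D elastic FDTD step: vx is advanced using the x-derivative of the
     * normal stress txx and the z-derivative of the shear stress txz.
     * buoy holds 1/density at the vx grid positions. */
    void update_vx(float *vx, const float *txx, const float *txz,
                   const float *buoy, int nx, int nz,
                   float dt, float dx, float dz)
    {
        for (int k = 1; k < nz - 1; k++) {
            for (int i = 1; i < nx - 1; i++) {
                int id = k * nx + i;
                float dtxx_dx = (txx[id] - txx[id - 1]) / dx;
                float dtxz_dz = (txz[id] - txz[id - nx]) / dz;
                vx[id] += dt * buoy[id] * (dtxx_dx + dtxz_dz);
            }
        }
    }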

The Implementation of Real-time Performance Monitor for Multi-thread Application (멀티스레드 어플리케이션을 위한 실시간 성능모니터의 구현)

  • Kim, Jin-Hyuk;Shin, Kwang-Sik;Yoon, Wan-Oh;Lee, Chang-Ho;Choi, Sang-Bang
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.48 no.3
    • /
    • pp.82-90
    • /
    • 2011
  • Multi-core systems are becoming common as microprocessors develop. With this change in the performance-improvement paradigm, conventional single-threaded applications are being replaced with multi-threaded ones, and because developing multi-threaded applications is complex, performance monitoring tools are used to optimize application performance. Conventional tools focus on the performance metrics themselves rather than on user friendliness or real-time support. A real-time performance monitor can identify problems while a multi-threaded application is running and check its real-time operating status, which makes it a more effective tool for finding the cause of a problem than a non-real-time monitor offering only simple performance indicators. In this paper, we propose RMPM (Real-time Multi-core Performance Monitor), a real-time performance monitoring tool for multi-core systems. The observation period is optimized by weighing the overhead incurred by the measurement period against its accuracy. The monitor shows not only system-wide CPU usage, memory usage, and network usage, but also how overhead is distributed across the threads of an application.
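
On Linux, the kind of per-thread figures such a monitor aggregates can be sampled from the /proc filesystem; a minimal sketch (field positions follow proc(5); the function name is hypothetical):

    #include <stdio.h>

    /* Read a thread's user and system CPU time in clock ticks: fields 14
     * (utime) and 15 (stime) of /proc/<pid>/task/<tid>/stat. A monitor
     * would sample this periodically per thread and convert the tick
     * deltas into a CPU-usage percentage over the sampling interval. */
    int read_thread_cpu(int pid, int tid,
                        unsigned long *utime, unsigned long *stime)
    {
        char path[64];
        snprintf(path, sizeof path, "/proc/%d/task/%d/stat", pid, tid);
        FILE *f = fopen(path, "r");
        if (!f)
            return -1;
        /* comm (field 2) is parenthesised and may contain spaces, so it is
         * skipped with %*[^)]; fields 4-13 are skipped with suppressed
         * conversions before utime and stime are read. */
        int n = fscanf(f,
            "%*d (%*[^)]) %*c %*d %*d %*d %*d %*d %*u %*u %*u %*u %*u %lu %lu",
            utime, stime);
        fclose(f);
        return (n == 2) ? 0 : -1;
    }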

Effects of Passivation Thin Films on the Optical Properties of the Green Organic Light Emitting Diodes (페시베이션 박막이 녹색 유기발광다이오드의 광학특성에 미치는 영향)

  • Mun, Sae Chan;Lee, Sang Hee;Park, Byung Min;Pyee, Jaeho;Chang, Ho Jung
    • Journal of the Microelectronics and Packaging Society
    • /
    • v.23 no.1
    • /
    • pp.11-15
    • /
    • 2016
  • Organic light-emitting diodes (OLEDs) have been studied for large flexible displays, light sources, and Internet-of-Things hardware. However, OLEDs are vulnerable to the external environment because of the low work function of their metals and their reactive organic materials; in particular, device operation tends to deteriorate rapidly on exposure to oxygen and moisture. To prevent this, domestic and overseas studies are under way on various methods such as ALD, PVD, and CVD, but these involve complex processes and high cost. It is therefore important to develop low-cost, simple-process passivation thin films that protect devices by blocking the penetration of oxygen and moisture. In this study, to improve reliability, passivation thin films were spin-coated onto green OLEDs, and the changes in the optical properties of the prepared devices were investigated at various doping concentrations of sodium alginate (SA). The passivation solutions were synthesized from a polyvinyl alcohol (PVA) host doped with SA in amounts of 10, 20, and 40 wt%. The best barrier properties were obtained for the samples with 40 wt% SA. The results show that the passivation films can be optimized by using a mixed solution of PVA and SA.

A Design of PRESENT Crypto-Processor Supporting ECB/CBC/OFB/CTR Modes of Operation and Key Lengths of 80/128-bit (ECB/CBC/OFB/CTR 운영모드와 80/128-비트 키 길이를 지원하는 PRESENT 암호 프로세서 설계)

  • Kim, Ki-Bbeum;Cho, Wook-Lae;Shin, Kyung-Wook
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.20 no.6
    • /
    • pp.1163-1170
    • /
    • 2016
  • This paper describes a hardware implementation of the ultra-lightweight block cipher PRESENT, which is specified in the lightweight-cryptography standard ISO/IEC 29192-2. The PRESENT crypto-processor supports two key lengths, 80 and 128 bits, as well as four modes of operation: ECB, CBC, OFB, and CTR. It has an on-the-fly key scheduler with a master key register, so it can process consecutive blocks of plaintext/ciphertext without reloading the master key. To achieve a lightweight implementation, the key scheduler was optimized to share circuits between the 80-bit and 128-bit key lengths. The round block was designed with a 64-bit datapath, so one round transformation for encryption/decryption is processed per clock cycle. The crypto-processor was verified on a Virtex5 FPGA device. Synthesized with a 0.18 μm CMOS cell library, it occupies 8,100 gate equivalents (GE), and the estimated throughput is about 908 Mbps at a maximum operating clock frequency of 454 MHz.
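
The round transformation that the processor completes in one clock cycle is compact enough to sketch in software. Below is a plain C reference of a single PRESENT encryption round; the S-box and the bit permutation follow the published PRESENT specification, not the authors' RTL.

    #include <stdint.h>

    /* PRESENT S-box from the cipher specification. */
    static const uint8_t SBOX[16] = {
        0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2
    };

    /* One encryption round: addRoundKey, sBoxLayer (16 parallel 4-bit
     * S-boxes), then pLayer (bit i moves to i*16 mod 63, bit 63 fixed).
     * The paper's round block performs this on a 64-bit datapath in a
     * single clock cycle. */
    uint64_t present_round(uint64_t state, uint64_t round_key)
    {
        state ^= round_key;                          /* addRoundKey */

        uint64_t s = 0;                              /* sBoxLayer   */
        for (int i = 0; i < 16; i++)
            s |= (uint64_t)SBOX[(state >> (4 * i)) & 0xF] << (4 * i);

        uint64_t p = 0;                              /* pLayer      */
        for (int i = 0; i < 64; i++) {
            int j = (i == 63) ? 63 : (i * 16) % 63;
            p |= ((s >> i) & 1ULL) << j;
        }
        return p;
    }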

Maximum Power Point Tracking operation of Thermoelectric Module without Current Sensor (전류센서가 없는 열전모듈의 최대전력점 추적방식)

  • Kim, Tae-Kyung;Park, Dae-Su;Oh, Sung-Chul
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.18 no.9
    • /
    • pp.436-443
    • /
    • 2017
  • Recently, the development of new energy technologies has become a hot topic owing to problems such as global warming. Unlike renewable energy technologies such as photovoltaic, solar-thermal, and wind power, which are optimized for medium or higher output power, energy harvesting produces very small output power and had received little attention; with the recent revitalization of the mobile industry, however, its utility has been re-evaluated, and maximum power point tracking techniques have been actively researched. This paper proposes a new MPPT (Maximum Power Point Tracking) control method for a thermoelectric module (TEM) driving a load resistance. The V-I curve characteristics and internal resistance of the TEM were analyzed, and conventional MPPT control methods were compared. The P&O (Perturbation and Observation) method is more accurate but less economical than the CV (Constant Voltage) method, because it uses two sensors to measure the source voltage and current; the CV method uses only one voltage sensor and is therefore cheaper, but it does not match the MPP precisely. A method was therefore designed to track the MPP of the TEM by combining the advantages of the two control methods. The proposed MPPT control method was verified by PSIM simulation and an H/W implementation.
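
For reference, the baseline P&O loop compared in the abstract looks roughly like the sketch below (hypothetical names; the paper's own method avoids the current sensor by exploiting the TEM's linear V-I curve, where the maximum power point sits near half the open-circuit voltage).

    /* Classic perturb-and-observe MPPT baseline: nudge the converter duty
     * cycle, observe whether output power rose or fell, and keep stepping
     * in the direction that increases power. Note it needs both a voltage
     * and a current measurement, which is the cost the paper removes. */
    typedef struct {
        double duty;        /* current PWM duty cycle   */
        double step;        /* signed perturbation size */
        double prev_power;  /* power at the last sample */
    } mppt_state;

    double perturb_observe(mppt_state *s, double v_meas, double i_meas)
    {
        double power = v_meas * i_meas;
        if (power < s->prev_power)
            s->step = -s->step;               /* power fell: reverse   */
        s->duty += s->step;
        if (s->duty < 0.05) s->duty = 0.05;   /* clamp to a safe range */
        if (s->duty > 0.95) s->duty = 0.95;
        s->prev_power = power;
        return s->duty;          /* next duty cycle for the converter */
    }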

SLEDS:A System-Level Event-Driven Simulator for Asynchronous Microprocessors (SLEDS:비동기 마이크로프로세서를 위한 상위 수준 사건구동식 시뮬레이터)

  • Choi, Sang-Ik;Lee, Jeong-Gun;Kim, Eui-Seok;Lee, Dong-Ik
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.29 no.1
    • /
    • pp.42-56
    • /
    • 2002
  • It is possible, but not efficient, to model and simulate asynchronous microprocessors with existing HDLs (Hardware Description Languages) such as VHDL or Verilog: the description becomes too complex, and the simulation takes too long to explore the design space. It is therefore necessary to establish a methodology, and to develop a tool, for modeling the handshake protocols of asynchronous microprocessors easily and simulating them quickly. With this objective, an efficient CAD (Computer-Aided Design) tool, SLEDS (System-Level Event-Driven Simulator), was developed; it evaluates processor performance at the system level by modeling with a simple description and simulating with an event-driven engine. The ultimate goal of SLEDS is to find the conditions under which a system yields high performance by balancing the delay of each module. In addition, SLEDS aims at verifying a design by comparing the expected results with the actual ones obtained by executing the defined behavior.
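
The event-driven core of such a simulator reduces to a time-ordered event queue; a minimal generic sketch (not the actual SLEDS source, whose internals the abstract does not detail):

    /* Minimal event-driven simulation kernel: events carry a timestamp and
     * a handler; the loop pops the earliest event, advances simulated time,
     * and lets the handler schedule follow-up events (e.g. a handshake
     * acknowledgement after a module's modelled delay). A production tool
     * would use a priority queue rather than a sorted list. */
    typedef struct event {
        double time;                        /* simulated timestamp  */
        void (*handler)(struct event *);    /* module behaviour     */
        struct event *next;
    } event;

    static event *queue;                    /* sorted by time       */
    static double now;

    void schedule(event *e)                 /* insert in time order */
    {
        event **p = &queue;
        while (*p && (*p)->time <= e->time)
            p = &(*p)->next;
        e->next = *p;
        *p = e;
    }

    void run(double t_end)                  /* main simulation loop */
    {
        while (queue && queue->time <= t_end) {
            event *e = queue;
            queue = e->next;
            now = e->time;
            e->handler(e);                  /* may call schedule()  */
        }
    }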

The Performance Analysis of GPU-based Cloth simulation according to the Change of Work Group Configuration (워크 그룹 구성 변화에 따른 GPU 기반 천 시뮬레이션의 성능 분석)

  • Choi, Young-Hwan;Hong, Min;Lee, Seung-Hyun;Choi, Yoo-Joo
    • Journal of Internet Computing and Services
    • /
    • v.18 no.3
    • /
    • pp.29-36
    • /
    • 2017
  • These days, 3D dynamic simulation is closely tied to many industries. In the past, physics-based 3D simulation was used mainly in car-crash and construction-related fields, but today it also plays an important role in movies and games. Representing a 3D object realistically requires many mathematical computations, and it is difficult for a CPU-based application to process such a large amount of calculation in real time. With advanced graphics hardware and improved architectures, the GPU can now be used for general-purpose computation as well as graphics, and GPU-based approaches have been applied in various research fields. In this paper, we analyze how the performance of two GPU-based cloth simulation algorithms varies with the execution properties of GPU shaders, in order to optimize GPU-based cloth simulation. Cloth simulation is implemented with a spring-centric algorithm and a node-centric algorithm, parallelized on the GPU using the compute shader of GLSL 4.3. We compare the performance of the two algorithms as the size and dimension of the work group change. Each test is repeated 10 times over 5,000 frames, and the results are reported as the average FPS. The experimental results show that the node-centric algorithm executes faster than the spring-centric algorithm.
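
The node-centric update can be sketched in plain C as below (the paper implements it as a GLSL 4.3 compute shader with one invocation per node; the names and the CSR-style adjacency layout are hypothetical). Because each node gathers forces from its own springs and writes only its own entry, parallel invocations never write the same location, which is consistent with the speed advantage reported above.

    #include <math.h>

    typedef struct { float x, y, z; } vec3;

    /* Node-centric mass-spring step: the outer loop body corresponds to
     * one compute-shader invocation. adj/adj_off give each node's springs
     * in CSR form; rest_len[e] is the rest length of adjacency entry e.
     * Assumes len > 0 (coincident nodes are not handled in this sketch). */
    void node_centric_step(vec3 *pos, vec3 *vel,
                           const int *adj, const int *adj_off,
                           const float *rest_len,
                           int n_nodes, float k, float dt)
    {
        for (int n = 0; n < n_nodes; n++) {
            vec3 f = {0.0f, 0.0f, 0.0f};
            for (int e = adj_off[n]; e < adj_off[n + 1]; e++) {
                int m = adj[e];                      /* neighbour node */
                vec3 d = { pos[m].x - pos[n].x,
                           pos[m].y - pos[n].y,
                           pos[m].z - pos[n].z };
                float len = sqrtf(d.x*d.x + d.y*d.y + d.z*d.z);
                float s = k * (len - rest_len[e]) / len;  /* Hooke's law */
                f.x += s * d.x;  f.y += s * d.y;  f.z += s * d.z;
            }
            vel[n].x += dt * f.x;  vel[n].y += dt * f.y;  vel[n].z += dt * f.z;
            pos[n].x += dt * vel[n].x;
            pos[n].y += dt * vel[n].y;
            pos[n].z += dt * vel[n].z;
        }
    }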