• Title/Summary/Keyword: Policy Optimization


MAPPO based Hyperparameter Optimization for CNN (MAPPO 기반 CNN 하이퍼 파라미터 최적화)

  • Ma, Zhixin;Joe, Inwhee
    • Annual Conference of KIPS
    • /
    • 2022.05a
    • /
    • pp.446-447
    • /
    • 2022
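  • For most machine learning and deep learning models, the choice of hyperparameters has a large impact on model performance. Experts must therefore spend considerable time on hyperparameter tuning when building a model for a task. Many algorithms address Hyperparameter Optimization (HPO), but most of them require actual experimental results at each epoch to carry out the search. To reduce the time and computational resources needed for HPO search, this paper proposes a Multi-agent Proximal Policy Optimization (MAPPO) reinforcement learning algorithm. Experimental results on two image classification datasets show that our model outperforms other existing methods in speed and accuracy.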

Developing Novel Algorithms to Reduce the Data Requirements of the Capture Matrix for a Wind Turbine Certification (풍력 발전기 평가를 위한 수집 행렬 데이터 절감 알고리즘 개발)

  • Lee, Jehyun;Choi, Jungchul
    • New & Renewable Energy
    • /
    • v.16 no.1
    • /
    • pp.15-24
    • /
    • 2020
  • For mechanical load testing of wind turbines, a capture matrix is constructed over a range of wind speeds according to the international standard IEC 61400-13. The conventional method wastes a considerable amount of data through its invalid-data policy: the data are segmented into 10-minute blocks and the invalid ones are removed. Previously, we suggested an alternative way to reduce the total amount of data needed to build a capture matrix, but the efficient selection of data remained an open question. This paper introduces optimization algorithms that construct the capture matrix with less data. Heuristic algorithms (simple stacking and lowest frequency first), a population method (particle swarm optimization), and Q-learning with epsilon-greedy exploration are compared. All algorithms perform better than the conventional method, although the size of the improvement varies widely. Among the algorithms, the best performance was achieved by the heuristic method (lowest frequency first), with particle swarm optimization performing similarly: approximately 28% data reduction on average and more than 40% at maximum. Unexpectedly, the worst performance came from Q-learning, which had been a promising candidate at the outset. This study is useful not only for wind turbine evaluation, particularly from the viewpoint of cost, but also for understanding the nature of wind speed data.
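
As a generic illustration of the Q-learning with epsilon-greedy exploration named in the abstract above, the following minimal Python sketch shows tabular action selection and the one-step update. The function names, the list-of-lists Q-table, and the parameters `alpha`, `gamma`, and `epsilon` are assumptions made for illustration; the abstract does not describe the paper's state, action, or reward design, so this is not the authors' implementation.

```python
import random


def epsilon_greedy(q_row, epsilon, rng):
    """Pick a random action with probability epsilon, else a greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    # Greedy choice; ties resolve to the lowest action index.
    return q_row.index(max(q_row))


def q_update(q, s, a, reward, s_next, alpha, gamma, done=False):
    """One-step tabular Q-learning update, applied in place to q."""
    # Bootstrapped target: reward plus discounted best next-state value.
    target = reward if done else reward + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])


if __name__ == "__main__":
    rng = random.Random(0)
    q = [[0.0, 0.0], [1.0, 2.0]]  # hypothetical 2-state, 2-action table
    action = epsilon_greedy(q[1], epsilon=0.1, rng=rng)
    q_update(q, s=0, a=1, reward=1.0, s_next=1, alpha=0.5, gamma=0.9)
    print(action, q[0][1])
```

Setting `epsilon` to 0 makes the selection purely greedy, while larger values trade exploitation for exploration.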

A Dynamic Price Formation System and Its Welfare Analysis in Quantity Space: An Application to Korean Fish Markets

  • Park, Hoan-Jae
    • The Journal of Fisheries Business Administration
    • /
    • v.41 no.2
    • /
    • pp.107-133
    • /
    • 2010
  • As policy makers are often concerned about the dynamic effects of demand behavior and the welfare analysis of quantity changes, the paper shows how dynamic price formation systems can be built to analyze the effects of policy options on markets dynamically. The paper develops a dynamic model of price formation for fish from the intertemporal optimization of the consumer choice problem. While the resulting model has a form similar to error-correction-type dynamic price formation systems, it provides rational demand behavior, in contrast to the myopic behavior of error correction demand models. The paper also develops appropriate tools for dynamic welfare analysis in quantity space using only short-run demand estimates, both theoretically and empirically, as a first attempt in the literature on price formation and fisheries. The empirical results for Korean fish markets show that the dynamic model and the welfare measures are reasonably plausible. The methodology and theory of this research can be applied and extended to commodity aggregation, dynamic demand estimation, and the dynamic welfare effects of regulation within a similar framework. It is thus hoped that this work will encourage applications in demand-side economics.

Scheduling of Wafer Burn-In Test Process Using Simulation and Reinforcement Learning (강화학습과 시뮬레이션을 활용한 Wafer Burn-in Test 공정 스케줄링)

  • Soon-Woo Kwon;Won-Jun Oh;Seong-Hyeok Ahn;Hyun-Seo Lee;Hoyeoul Lee;In-Beom Park
    • Journal of the Semiconductor & Display Technology
    • /
    • v.23 no.2
    • /
    • pp.107-113
    • /
    • 2024
  • Scheduling of semiconductor test facilities is crucial because effective scheduling contributes to the profits of semiconductor enterprises and enhances the quality of semiconductor products. This study aims to solve the scheduling problem for the wafer burn-in test facilities of the semiconductor back-end process by utilizing simulation and deep reinforcement learning-based methods. To solve the scheduling problem considered in this study, we propose novel state, action, and reward designs based on the Markov decision process. Furthermore, a neural network is trained with a recent RL method, proximal policy optimization. Experimental results showed that the proposed method outperformed traditional heuristic-based scheduling techniques, achieving a higher due date compliance rate of jobs in terms of total job completion time.


Active control of flow around a 2D square cylinder using plasma actuators (2차원 사각주 주위 유동의 플라즈마 능동제어에 대한 연구)

  • Paraskovia Kolesova;Mustafa G. Yousif;Hee-Chang Lim
    • Journal of the Korean Society of Visualization
    • /
    • v.22 no.2
    • /
    • pp.44-54
    • /
    • 2024
  • This study investigates the effectiveness of using a plasma actuator for active control of turbulent flow around a finite square cylinder. The primary objective is to analyze the impact of plasma actuators on flow separation and wake region characteristics, which are critical for reducing drag and suppressing vortex-induced vibrations. Direct Numerical Simulation (DNS) was employed to explore the flow dynamics at various operational parameters, including different actuation frequencies and voltages. The proposed methodology employs a neural network trained using the Proximal Policy Optimization (PPO) algorithm to determine optimal control policies for plasma actuators. This network is integrated with a computational fluid dynamics (CFD) solver for real-time control. Results indicate that this deep reinforcement learning (DRL)-based strategy outperforms existing methods in controlling flow, demonstrating robustness and adaptability across various flow conditions, which highlights its potential for practical applications.

A Generalized N-Policy for an M/M/1 Queueing System and Its Optimization

  • Bae, Jong-Ho;Kim, Jong-Woo;Lee, Eui-Yong
    • Proceedings of the Korean Statistical Society Conference
    • /
    • 2002.05a
    • /
    • pp.61-66
    • /
    • 2002
  • We consider a generalized N-policy for an M/M/1 queueing system. The idle server starts to work with ordinary service rate when a customer arrives. If the number of customers in the system reaches N, the service rate gets faster and continues until the system becomes empty. Otherwise, the server finishes the busy period with ordinary service rate. We obtain the limiting distribution of the number of customers in the system. After assigning various operating costs to the system, we show that there exists a unique fast service rate minimizing the long-run average cost per unit time.
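
The policy described in the abstract above can be sketched as a small event-driven simulation. This is a minimal sketch under assumptions: the function name, parameter names (`lam`, `mu_slow`, `mu_fast`, `n_threshold`, `horizon`), and the numeric values in the usage lines are hypothetical, and the code only estimates the time-average number of customers by simulation; it does not reproduce the paper's analytic limiting distribution or its cost model.

```python
import random


def simulate_generalized_n_policy(lam, mu_slow, mu_fast, n_threshold, horizon, seed=0):
    """Simulate the queue as a continuous-time Markov chain.

    Returns (time-average number in system, fraction of time in fast mode).
    """
    rng = random.Random(seed)
    t, n, fast = 0.0, 0, False
    area = 0.0       # integral of the number in system over time
    fast_time = 0.0  # total time spent serving at the fast rate
    while True:
        # No service while the system is empty; otherwise slow or fast rate.
        mu = 0.0 if n == 0 else (mu_fast if fast else mu_slow)
        total_rate = lam + mu
        dt = rng.expovariate(total_rate)
        step = min(dt, horizon - t)  # truncate the last sojourn at the horizon
        area += n * step
        if fast:
            fast_time += step
        if t + dt >= horizon:
            break
        t += dt
        if rng.random() < lam / total_rate:
            n += 1  # arrival
            if n >= n_threshold:
                fast = True  # queue reached N: switch to the fast rate
        else:
            n -= 1  # service completion
            if n == 0:
                fast = False  # system emptied: next busy period starts slow
    return area / horizon, fast_time / horizon


if __name__ == "__main__":
    mean_n, fast_frac = simulate_generalized_n_policy(
        lam=1.0, mu_slow=2.0, mu_fast=4.0, n_threshold=3, horizon=50_000.0
    )
    print(f"mean number in system: {mean_n:.3f}, fast-mode fraction: {fast_frac:.3f}")
```

When `mu_fast` equals `mu_slow` the model reduces to an ordinary M/M/1 queue, whose mean number in system is ρ/(1−ρ) with ρ = `lam`/`mu_slow`, which gives a simple sanity check on the simulation.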


Optimal Sizing of In-Plant and Leased Storage Spaces under a Randomized Storage Policy (임의 저장방식 하에서 기업 내 저장공간과 외부의 임차공간에 대한 최적 규모 결정)

  • Lee, Moon-Kyu
    • Journal of Korean Institute of Industrial Engineers
    • /
    • v.30 no.4
    • /
    • pp.294-300
    • /
    • 2004
  • This paper considers a trade-off effect between in-house storage space and leased storage space in generic warehouses operated under a randomized storage assignment policy. The amount of in-house storage space is determined based on the law of large numbers satisfying a given service level of protection against space shortages. Excess space requirement is assumed to be met via leased storage space. A new analytic model is formulated for determining the excess space such that the total cost of storage space is minimized. Finally, computational results are provided for the systems where the standard economic-order-quantity inventory model is used for all items.

Development of an Annual Expenditure Assessment Model for Amenity-oriented Policy-making in Rural Areas (어메니티 지향적 지방행정을 위한 정책평가모델의 개발)

  • Jung, Nam-Su;Lee, Ji-Min;Lee, Jeong-Jae
    • Journal of Korean Society of Rural Planning
    • /
    • v.10 no.2 s.23
    • /
    • pp.43-49
    • /
    • 2004
  • With growing public concern about the efficiency and effects of regional policies, their assessment has become an important issue. Several studies have examined the economic effects of policies using conventional cost/benefit analysis, but few have assessed amenity-oriented policies. This study therefore develops an Annual Expenditure Assessment Model (AEAM) for amenity-oriented policy-making in rural areas. As preliminary work for model development, a hierarchical index system for rural development and a classification system for expenditure were designed. Based on the highly significant relationship between rural amenities and local government expenditure, a linear optimization model for maximizing regional amenity was constructed. The applicability of the model was confirmed through a case study of Sunchang-gun, Chonbuk-province.

Differential Burn-in and Reliability Screening Policy Using Yield Information Based on Spatial Stochastic Processes (공간적 확률 과정 기반의 수율 정보를 이용한 번인과 신뢰성 검사 정책)

  • Hwang, Jung Yoon;Shim, Younghak
    • Journal of Korean Society of Industrial and Systems Engineering
    • /
    • v.35 no.4
    • /
    • pp.1-9
    • /
    • 2012
  • Decisions on reliability screening rules and burn-in policies are based on the estimated reliability. The variability in a semiconductor manufacturing process not only causes quality problems but also makes reliability estimation more complicated. This study investigates the nonuniformity of integrated circuit reliability according to the defect density distribution within a wafer and between wafers, and then develops an optimal burn-in policy based on the estimated reliability. A new reliability estimation model based on yield information is developed using a spatial stochastic process. Spatial defect density variation is reflected in the reliability estimation, and the defect densities of each die location are considered as input variables of the burn-in optimization. Reliability screening and the optimal burn-in policy subject to burn-in cost minimization are examined, and numerical experiments are conducted.

A Design Problem of a Service System with Bi-functional Servers (이중작업능력의 서버로 구성된 서비스시스템 설계)

  • Kim, Sung-Chul
    • Journal of the Korean Operations Research and Management Science Society
    • /
    • v.32 no.3
    • /
    • pp.17-31
    • /
    • 2007
  • In this paper, we consider a service system with bi-functional servers, which can switch between the primary service room and the secondary room. A service policy is characterized by the switching points, which depend on the queue length in the primary service room and the service level requirement constraint of the secondary room. The primary service room is modeled as a Markovian queueing system, and the throughput of the primary service room is a function of the total number of bi-functional servers, the buffer capacity of the primary service room, and the service policy. Revenue is obtained from throughput, and costs are incurred for servers and buffers. We study the problem of simultaneously determining the optimal number of servers, buffer capacity, and service policy to maximize the profit of the service system, and develop an algorithm that can be applied successfully with a small number of computations.