• Title/Abstract/Keyword: Policy Optimization

307 results (processing time: 0.023 s)

열과 전기 제약을 고려한 최적화 CHP 운전 (OPTIMIZATION OF CHP OPERATION WITH HEAT AND ELECTRICITY CONSTRAINTS)

  • 밍뉴엔;최낙현;아지자;윤용태
    • Proceedings of the KIEE Conference / KIEE 2008 Autumn Conference, Power Technology Section / pp.457-459 / 2008
  • This paper presents the optimization of a CHP (combined heat and power) plant under a deregulated market. A boiler is added as a separate heat source, which gives the plant flexible and efficient operation. The objective of the optimization is to maximize profit over a 24-hour period by making unit commitment decisions, called the "optimal policy". Dynamic programming is introduced as an effective and efficient solution method, and finally an example is solved to illustrate the optimal policy of such a CHP plant with a boiler.
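The unit-commitment decision described above lends itself to a compact dynamic program. The sketch below is illustrative only: the hourly prices, fuel cost, start-up cost, and output level are made-up values, not figures from the paper.

```python
# Toy backward dynamic program for hourly on/off commitment of a CHP unit.
def optimal_policy(prices, fuel_cost=30.0, startup_cost=20.0, output=10.0):
    """Return (max profit, on/off schedule) over len(prices) hours."""
    T = len(prices)
    best = [[0.0, 0.0] for _ in range(T + 1)]   # best[t][state]; 0=off, 1=on
    choice = [[0, 0] for _ in range(T)]
    for t in range(T - 1, -1, -1):
        hourly = (prices[t] - fuel_cost) * output     # profit if running hour t
        for state in (0, 1):
            off = best[t + 1][0]
            on = hourly + best[t + 1][1] - (startup_cost if state == 0 else 0.0)
            if on > off:
                best[t][state], choice[t][state] = on, 1
            else:
                best[t][state], choice[t][state] = off, 0
    schedule, state = [], 0                           # unit starts switched off
    for t in range(T):
        state = choice[t][state]
        schedule.append(state)
    return best[0][0], schedule

prices = [25, 28, 35, 40, 38, 26] * 4   # 24 made-up hourly prices
profit, schedule = optimal_policy(prices)
```

The DP runs the unit only in the profitable hours and pays the start-up cost once per contiguous on-block, which is the "optimal policy" structure the abstract refers to.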


수명교체와 예비품 재고 정책의 통합 최적화 (Joint Optimization of Age Replacement and Spare Provisioning Policy)

  • 임성욱;박영택
    • Journal of Korean Society for Quality Management / Vol.40 No.1 / pp.88-91 / 2012
  • Joint optimization of preventive age replacement and inventory policy is considered in this paper. There are three decision variables in the problem: (i) preventive replacement age of the operating unit, (ii) order quantity per order and (iii) reorder point for spare replenishment. Preventive replacement age and order quantity are jointly determined so as to minimize the expected cost rate, and then the reorder point for meeting a desired service level is found. A numerical example is included to explain the joint optimization model.
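A minimal numerical sketch of the age-replacement half of this problem, using the classic expected-cost-rate criterion C(T) = (c_p R(T) + c_f F(T)) / ∫₀ᵀ R(t) dt. The Weibull lifetime and all cost values are assumptions for illustration, not the paper's joint model (which also optimizes order quantity and reorder point).

```python
import math

def cost_rate(T, c_p=1.0, c_f=10.0, shape=2.0, scale=100.0, steps=1000):
    """Expected cost per unit time when replacing preventively at age T."""
    R = lambda t: math.exp(-((t / scale) ** shape))  # Weibull survival function
    h = T / steps
    # trapezoidal estimate of the expected cycle length, integral of R on [0, T]
    area = sum(0.5 * h * (R(i * h) + R((i + 1) * h)) for i in range(steps))
    return (c_p * R(T) + c_f * (1.0 - R(T))) / area

best_T = min(range(10, 201, 5), key=cost_rate)       # grid search over ages
```

Because the failure rate is increasing (shape > 1) and a failure costs ten times a planned replacement, the cost-minimizing age falls well below the scale parameter.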

Pareto-Based Multi-Objective Optimization for Two-Block Class-Based Storage Warehouse Design

  • Sooksaksun, Natanaree
    • Industrial Engineering and Management Systems / Vol.11 No.4 / pp.331-338 / 2012
  • This research proposes a Pareto-based multi-objective optimization approach to class-based storage warehouse design, considering a two-block warehouse that operates under the class-based storage policy in a low-level, picker-to-part and narrow aisle warehousing system. A mathematical model is formulated to determine the number of aisles, the length of aisle and the partial length of each pick aisle to allocate to each product class that minimizes the travel distance and maximizes the usable storage space. A solution approach based on multiple objective particle swarm optimization is proposed to find the Pareto front of the problems. Numerical examples are given to show how to apply the proposed algorithm. The results from the examples show that the proposed algorithm can provide design alternatives to conflicting warehouse design decisions.
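The dominance filter at the core of any Pareto-based method such as the one above fits in a few lines. The design tuples below are hypothetical (travel distance to minimize, usable space to maximize), not results from the paper.

```python
def pareto_front(designs):
    """Keep the non-dominated (travel_distance, usable_space) pairs:
    lower distance is better, higher space is better."""
    front = []
    for a in designs:
        dominated = any(b != a and b[0] <= a[0] and b[1] >= a[1] for b in designs)
        if not dominated:
            front.append(a)
    return front

designs = [(10, 50), (12, 60), (11, 40), (15, 70), (14, 55)]
front = pareto_front(designs)   # the remaining designs form the trade-off curve
```

A multi-objective PSO maintains exactly such an archive of non-dominated particles while the swarm explores the design space.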

근위 정책 최적화를 활용한 자산 배분에 관한 연구 (A Study on Asset Allocation Using Proximal Policy Optimization)

  • 이우식
    • Journal of the Korean Society of Industry Convergence / Vol.25 No.4_2 / pp.645-653 / 2022
  • Recently, deep reinforcement learning has been applied in a variety of industries, such as games, robotics, autonomous vehicles, and data center cooling systems. Reinforcement learning enables automated asset allocation without the need for ongoing monitoring, as the agent is free to choose its own policies. The purpose of this paper is to carry out an empirical analysis of the performance of asset allocation strategies; the strategies considered are conventional Mean-Variance Optimization (MVO) and Proximal Policy Optimization (PPO). According to the findings, PPO outperformed both its benchmark index and MVO. This paper demonstrates how dynamic asset allocation can benefit from the development of a reinforcement learning algorithm.
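PPO's defining component is the clipped surrogate objective L = E[min(r_t A_t, clip(r_t, 1 − ε, 1 + ε) A_t)], where r_t is the new-to-old policy probability ratio and A_t the advantage. A minimal sketch with made-up ratio and advantage values (not data from the paper):

```python
def ppo_clip_objective(ratios, advantages, eps=0.2):
    """Average clipped surrogate over a batch of (ratio, advantage) pairs."""
    total = 0.0
    for r, a in zip(ratios, advantages):
        clipped = max(1.0 - eps, min(r, 1.0 + eps))   # clip(r, 1-eps, 1+eps)
        total += min(r * a, clipped * a)              # pessimistic bound
    return total / len(ratios)

obj = ppo_clip_objective([0.9, 1.5, 1.1], [1.0, 2.0, -1.0])
```

The clipping keeps each policy update close to the previous policy, which is what makes PPO stable enough for noisy objectives like portfolio returns.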

시뮬레이티드 어닐링을 이용한(m, n)중 연속(r,s) : F 시스템의 정비모형 (A Maintenance Design of Connected-(r, s)-out-of-(m, n) F System Using Simulated Annealing)

  • 이상헌;강영태;신동열
    • Journal of the Korean Institute of Industrial Engineers / Vol.34 No.1 / pp.98-107 / 2008
  • The purpose of this paper is to present an optimization scheme that minimizes the expected cost per unit time. This study considers a linear connected-(r, s)-out-of-(m, n):F lattice system whose components are ordered like the elements of a linear (m, n)-matrix. We assume that all components are identical, s-independent, and in state 1 (operating) or 0 (failed). The system fails whenever at least one connected (r, s)-submatrix of failed components occurs. To find the optimal threshold for maintenance intervention, we use a simulated annealing (SA) algorithm for the cost optimization procedure; the expected cost per unit time is obtained by Monte Carlo simulation. We also perform a sensitivity analysis on the different cost parameters. In this study, a utility maintenance model is constructed to minimize the expense under the full-equipment policy, based on a comparison of the full-equipment policy and the preventive maintenance policy. The full-equipment cycle and unit cost rate are obtained by the simulated annealing algorithm, which converges quickly for multi-component systems and is therefore well suited to this optimization decision problem.
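A generic simulated-annealing loop of the kind used above can be sketched as follows. The quadratic toy cost and all tuning constants (initial temperature, cooling rate, neighbourhood size) are illustrative assumptions; in the paper's setting a Monte Carlo cost evaluation would take the place of the lambda.

```python
import math
import random

def simulated_annealing(cost, x0, temp=10.0, cooling=0.95, iters=500, seed=1):
    """Minimize cost(x) by annealed random search starting from x0."""
    rng = random.Random(seed)
    x = best = x0
    for _ in range(iters):
        cand = x + rng.uniform(-1.0, 1.0)            # neighbour move
        delta = cost(cand) - cost(x)
        if delta < 0 or rng.random() < math.exp(-delta / temp):
            x = cand                                 # accept: always if better,
        if cost(x) < cost(best):                     # sometimes if worse
            best = x                                 # remember the global best
        temp *= cooling                              # geometric cooling schedule
    return best

best = simulated_annealing(lambda x: (x - 3.0) ** 2, x0=0.0)
```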

경영 시뮬레이션 게임에서 PPO 알고리즘을 적용한 강화학습의 유용성에 관한 연구 (A Study about the Usefulness of Reinforcement Learning in Business Simulation Games using PPO Algorithm)

  • 양의홍;강신진;조성현
    • Journal of Korea Game Society / Vol.19 No.6 / pp.61-70 / 2019
  • This paper examines whether game agents can autonomously achieve given goals when reinforcement learning is applied in the domain of business simulation games. The system applies the PPO (Proximal Policy Optimization) algorithm in the Unity Machine Learning (ML) Agents environment so that a game agent automatically discovers how to play in order to reach its goal. To confirm its usefulness, simulation experiments were conducted on five game scenarios. The results confirm that the game agent achieves its goals through learning even as various in-game environment variables change.

수요가 재생 도착과정을 따르는 (s, S) 재고 시스템에서 시뮬레이션 민감도 분석을 이용한 최적 전략 (Optimal Policy for (s, S) Inventory System Characterized by Renewal Arrival Process of Demand through Simulation Sensitivity Analysis)

  • 권치명
    • Journal of the Korea Society for Simulation / Vol.12 No.3 / pp.31-40 / 2003
  • This paper studies the optimal policy for a class of (s, S) inventory control systems in which demands arrive according to a renewal process. To minimize the average cost over the simulation period, we apply a stochastic optimization algorithm that uses the gradients with respect to the parameters s and S. We obtain the gradients of the objective function with respect to the order-up-to level S and the reorder point s via a combined perturbation method, which alternates between infinitesimal perturbation analysis and smoothed perturbation analysis according to whether an ordering-event change occurs. The optimal estimates of s and S from our simulation results are quite accurate, which we attribute to the low-noise gradient estimates produced by the regenerative system simulation and their effect on the search procedure of the stochastic optimization algorithm. Future work includes extending the model to more general inventory systems with respect to the demand distribution, backlogging policy, lead time, and demand inter-arrival times, and improving the efficiency of the stochastic optimization algorithm's search for an improving point of (s, S).
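The simulated (s, S) dynamics themselves fit in a few lines. The sketch below only estimates the average cost per period; the demand distribution, cost values, and zero-lead-time assumption are simplifications, and no perturbation-analysis gradient estimation is attempted here.

```python
import random

def average_cost(s, S, periods=10_000, hold=1.0, order_fixed=30.0, seed=42):
    """Average per-period cost of an order-up-to-S policy with reorder point s."""
    rng = random.Random(seed)
    inv, total = S, 0.0
    for _ in range(periods):
        inv -= rng.randint(0, 9)          # i.i.d. demand stands in for renewals
        if inv <= s:
            total += order_fixed          # fixed ordering cost
            inv = S                       # order up to S, zero lead time assumed
        total += hold * max(inv, 0)       # holding cost on on-hand stock
    return total / periods

cost = average_cost(s=5, S=40)
```

A gradient-based search such as the paper's would differentiate this average cost with respect to s and S instead of evaluating it on a grid.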


Policy implication of nuclear energy's potential for energy optimization and CO2 mitigation: A case study of Fujian, China

  • Peng, Lihong;Zhang, Yi;Li, Feng;Wang, Qian;Chen, Xiaochou;Yu, Ang
    • Nuclear Engineering and Technology / Vol.51 No.4 / pp.1154-1162 / 2019
  • China is undertaking an energy reform from fossil fuels to clean energy in order to meet its CO2 intensity (CI) reduction commitments. After hydropower, nuclear energy has the most potential, based on a comparison with the rest of the world and an analysis of the government's energy consumption (EC) plan. This paper establishes a CI forecasting model that responds to energy policy, built on the national and provincial EC plans. The model is applied to Fujian Province to predict its CI from 2016 to 2020. The results show that CI declines by 43%-53% relative to 2005 under five economic-growth scenarios for 2020. Furthermore, Fujian will achieve the national goals ahead of schedule because EC is controlled and the nuclear energy ratio increases to 16.4% (the proportion of non-fossil sources in primary energy is 26.7%). Finally, the development of nuclear energy in China and worldwide is analyzed, and several policies for energy optimization and CI reduction are proposed.

성과기반 군수지원체계의 정비정책 최적화를 위한 PIDO 기법 적용에 관한 연구 (A Study on the Application of PIDO Technique for the Maintenance Policy Optimization Considering the Performance-Based Logistics Support System)

  • 주현준;이재천
    • Journal of the Korea Academia-Industrial cooperation Society / Vol.15 No.2 / pp.632-637 / 2014
  • Performance-based logistics (PBL) has recently attracted much attention as a method of logistics support for weapon systems. The basic concept is that logistics support is provided through commercial contracts during the operational phase, which requires the logistics support elements to be determined from the system development phase. It is also necessary to extend the existing single performance measure to multiple performance measures. As system structures grow more complex, conventional optimization techniques face limitations, so the applicability of genetic algorithms needs to be assessed. This study identifies the requirements for level-of-repair analysis that considers the PBL concept from the system development phase, before the operational phase. It also proposes applying a PIDO (Process Integration and Design Optimization)-based optimization technique, in which performance measures and constraints are easy to set and change, to determine maintenance policy alternatives according to user requirements before the operational phase. The genetic algorithms of the PIAnO and ModelCenter tools, which implement the PIDO concept, are confirmed to be applicable to the maintenance policy optimization problem.
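A bare-bones genetic algorithm of the kind PIDO frameworks provide can be sketched as below. The bit-string encoding and the "onemax" fitness (count of 1 bits) are placeholders for a real maintenance-policy encoding and its cost/performance evaluation.

```python
import random

def genetic_search(fitness, n_bits=20, pop_size=30, gens=60, seed=0):
    """Maximize fitness over bit strings with truncation selection,
    one-point crossover, and single-bit mutation."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]            # keep the better half
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_bits)
            child = a[:cut] + b[cut:]             # one-point crossover
            child[rng.randrange(n_bits)] ^= 1     # single-bit mutation
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

best = genetic_search(sum)   # "onemax" toy fitness: number of 1 bits
```

Keeping the parents unchanged makes the best solution monotonically non-decreasing, which is the elitism these tools typically rely on.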

저가 Redundant Manipulator의 최적 경로 생성을 위한 Deep Deterministic Policy Gradient(DDPG) 학습 (Learning Optimal Trajectory Generation for Low-Cost Redundant Manipulator using Deep Deterministic Policy Gradient(DDPG))

  • 이승현;진성호;황성현;이인호
    • The Journal of Korea Robotics Society / Vol.17 No.1 / pp.58-67 / 2022
  • In this paper, we propose an approach to resolving the workspace inaccuracy of low-cost redundant manipulators built with low-resolution encoders and low-stiffness links. When manipulators are manufactured with such low-cost encoders and links, workspace inaccuracy issues arise, and trajectory generation based on conventional forward/inverse kinematics that ignores these issues introduces the risk of end-effector fluctuations. Hence, we propose an optimized trajectory generation method based on the DDPG (Deep Deterministic Policy Gradient) algorithm for low-cost redundant manipulators reaching a target position in Euclidean space. We designed the DDPG reward to minimize the distance to the target along with the Jacobian condition number. The training environment is a simulator implementing real-world physics with randomly generated joint-space errors; the test environment is a real robotic experiment, which demonstrated our approach.
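The abstract's reward shaping, distance plus a weighted Jacobian condition number, can be illustrated on a planar 2-link arm. The link lengths and the weight w are assumptions, and the paper's manipulator is redundant rather than this minimal 2R example.

```python
import math

def jacobian_2link(q1, q2, l1=1.0, l2=1.0):
    """Standard velocity Jacobian of a planar 2R arm."""
    j11 = -l1 * math.sin(q1) - l2 * math.sin(q1 + q2)
    j12 = -l2 * math.sin(q1 + q2)
    j21 = l1 * math.cos(q1) + l2 * math.cos(q1 + q2)
    j22 = l2 * math.cos(q1 + q2)
    return ((j11, j12), (j21, j22))

def condition_number(J):
    """Ratio of singular values of a 2x2 matrix, via eigenvalues of J^T J."""
    (a, b), (c, d) = J
    t = a * a + b * b + c * c + d * d           # trace of J^T J
    det = abs(a * d - b * c)                    # |det J| = product of sing. values
    disc = math.sqrt(max(t * t - 4 * det * det, 0.0))
    s_max = math.sqrt((t + disc) / 2)
    s_min = math.sqrt(max((t - disc) / 2, 1e-12))
    return s_max / s_min

def reward(distance, q1, q2, w=0.1):
    """Penalize distance to target and ill-conditioned (near-singular) poses."""
    return -(distance + w * condition_number(jacobian_2link(q1, q2)))
```

Near a singular pose (e.g. a fully stretched arm, q2 = 0) the condition number blows up, so the agent is steered toward well-conditioned configurations where joint errors amplify least.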