• Title/Abstract/Keyword: Policy Optimization

307 search results

Policy Safety Stock Cost Optimization: Xerox Consumable Supply Chain Case Study

  • 서은석, 대한산업공학회지, Vol. 41, No. 5, pp. 511-520, 2015
  • Inventory, cost, and level of service are three interrelated key metrics that most supply chain organizations strive to optimize. One way to achieve this goal is to create a simulation model for conducting sensitivity analysis and optimization over the different supply chain policies that could be implemented in actual operation. This paper presents a case of Xerox global supply chain modeling and analysis used to assess several "what if" scenarios for the consumable policy safety stock. The simulation model, combined with an analytical cost model and an optimization module, is used to optimize the policy safety stock level so as to achieve the lowest total value chain cost. It was shown quantitatively that the policy safety stock can be reduced, but the savings are offset by the inbound premium transportation cost to expedite supplies in shortage and the outbound premium transportation cost to send supplies to customers via express shipment, requiring a fine balance.
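
To make that tradeoff concrete, here is a minimal sketch that grid-searches a safety stock level against a total cost combining holding cost with inbound and outbound premium transportation penalties for shortages. The demand model and all cost parameters are hypothetical, not figures from the Xerox study.

```python
import numpy as np

rng = np.random.default_rng(0)
demand = rng.poisson(lam=100, size=10_000)      # hypothetical daily demand

def total_cost(safety_stock, base_stock=100,
               hold=1.0, inbound_premium=8.0, outbound_premium=5.0):
    """Holding cost grows with the stock on hand; premium freight costs
    are incurred only on the units short of available stock."""
    available = base_stock + safety_stock
    shortage = np.maximum(demand - available, 0)
    holding = hold * np.maximum(available - demand, 0)
    premium = (inbound_premium + outbound_premium) * shortage
    return (holding + premium).mean()

levels = np.arange(0, 60)
costs = [total_cost(s) for s in levels]
print(f"lowest-cost safety stock: {levels[int(np.argmin(costs))]} units")
```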

Evaluation of Human Demonstration Augmented Deep Reinforcement Learning Policies via Object Manipulation with an Anthropomorphic Robot Hand

  • 박나현;오지헌;류가현;;;김태성, 정보처리학회논문지:소프트웨어 및 데이터공학, Vol. 10, No. 5, pp. 179-186, 2021
  • For a robot to manipulate objects in diverse and complex ways as humans do, object grasping by an anthropomorphic robot hand is essential. To train anthropomorphic robot hands with a high degree of freedom (DoF), reinforcement learning optimization methods augmented with human demonstrations have been proposed. In this study, we verify the effectiveness of behavior cloning by comparing the performance of Demonstration Augmented Natural Policy Gradient (DA-NPG) against plain NPG, and we evaluate the DA-NPG, DA-Trust Region Policy Optimization (DA-TRPO), and DA-Proximal Policy Optimization (DA-PPO) optimization methods on object manipulation tasks performed by an anthropomorphic robot hand on six objects. After training, NPG achieved an average grasp success rate of 60%, versus an average of 99.33% for DA-NPG, demonstrating that behavior cloning is effective for reinforcement learning of object manipulation with anthropomorphic robot hands. In addition, DA-NPG showed performance similar to DA-TRPO, succeeded in grasping all six objects, and was the most stable, whereas DA-TRPO and DA-PPO each failed on some objects and showed unstable performance. The proposed method is expected to be useful for developing object manipulation intelligence for anthropomorphic robot hands when applied to real humanoid robots in the future.
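
As a rough sketch of how demonstration augmentation works in this family of methods, the PyTorch snippet below adds a behavior-cloning term on demonstration data to an ordinary policy gradient loss. The network shape, noise scale, and weighting are hypothetical; this is an illustration of the idea, not the paper's exact DA-NPG update (which uses the natural gradient).

```python
import torch
import torch.nn as nn

# Hypothetical observation/action sizes for a multi-DoF hand.
policy = nn.Sequential(nn.Linear(24, 64), nn.Tanh(), nn.Linear(64, 20))
opt = torch.optim.Adam(policy.parameters(), lr=3e-4)

def demo_augmented_update(obs, actions, advantages,
                          demo_obs, demo_actions, bc_weight=0.1):
    dist = torch.distributions.Normal(policy(obs), 0.1)
    pg_loss = -(dist.log_prob(actions).sum(-1) * advantages).mean()
    bc_loss = ((policy(demo_obs) - demo_actions) ** 2).mean()  # clone demos
    loss = pg_loss + bc_weight * bc_loss    # demo-augmented objective
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```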

Healthcare Optimization: Current Status and Vitalization Suggestions

  • 강성홍;김병인;전치혁;최병관;이신호, 대한산업공학회지, Vol. 39, No. 4, pp. 313-324, 2013
  • Healthcare optimization is mandatory for strengthening the competitiveness of the domestic healthcare industry. It aims to increase service quality, patient safety, and system efficiency. This paper reviews various healthcare optimization cases from developed countries, synopsizes the current status of the domestic healthcare industry, points out several reasons why healthcare optimization is not active in Korea, and suggests ways to vitalize it.

A Study on Load Distribution of Gaming Server Using Proximal Policy Optimization

  • 박정민;김혜영;조성현, 한국게임학회 논문지, Vol. 19, No. 3, pp. 5-14, 2019
  • Game servers are built on distributed servers. A distributed game server divides the load evenly across individual server nodes through a set of load balancing algorithms, so that response time to client requests and server availability are managed efficiently. In this paper, we propose a load balancing agent that uses the Greedy algorithm from existing work and PPO (Proximal Policy Optimization), a policy gradient branch of reinforcement learning, in a simulation environment, and we evaluate its performance through comparative analysis against previous studies after simulation.
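
For readers unfamiliar with PPO, its core is a clipped surrogate objective that limits how far a single update can move the policy. The snippet below is the standard textbook form of that loss, not the paper's server-specific agent.

```python
import torch

def ppo_clip_loss(log_prob_new, log_prob_old, advantages, clip_eps=0.2):
    """Clipped surrogate objective: cap the probability ratio so one
    update cannot move the policy too far from the data-collecting one."""
    ratio = torch.exp(log_prob_new - log_prob_old)
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    return -torch.min(ratio * advantages, clipped * advantages).mean()
```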

Optimization of Job-Shop Schedule Considering Deadlock Avoidance

  • 정동준;이두용;임성진, 대한기계학회논문집A, Vol. 24, No. 8, pp. 2131-2142, 2000
  • As recent production facilities are usually operated with unmanned material-handling systems, developing an efficient schedule with deadlock avoidance has become a critical problem. Related research on deadlock avoidance usually focuses on real-time control of manufacturing systems using a deadlock avoidance policy, but little off-line optimization of deadlock-free schedules has been reported. This paper presents an optimization method for deadlock-free scheduling of a job-shop system with no buffers. A deadlock-free schedule is acquired by a procedure that generates candidate lists of waiting operations and applies a deadlock avoidance policy. To verify the proposed approach, simulation results are presented for minimizing makespan in three problem types. According to the simulation results, the effect of each deadlock avoidance policy depends on the type of problem. When the proposed LOEM (Last Operation Exclusion Method) is employed, both makespan and the computing time for optimization are reduced.
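
One generic way to realize a deadlock avoidance policy in a bufferless job shop is to reject any dispatch that would close a cycle in the machine wait-for graph. The sketch below illustrates that idea under simplified assumptions; it is not the paper's LOEM policy.

```python
# A machine "waits for" another when the job holding it needs that
# machine next; a cycle in this relation is a circular wait (deadlock).
def has_cycle(wait_for):
    """Detect a cycle in a {machine: next_machine} wait-for mapping."""
    for start in wait_for:
        seen, node = set(), start
        while node in wait_for:
            if node in seen:
                return True
            seen.add(node)
            node = wait_for[node]
    return False

def safe_to_dispatch(holding, job_next, machine, job):
    """holding: machine -> job occupying it; job_next: job -> machine
    it needs after its current operation. Tentatively place the job
    and reject the move if it would create a circular wait."""
    holding = dict(holding)
    holding[machine] = job
    wait_for = {m: job_next[j] for m, j in holding.items()
                if job_next.get(j) in holding}
    return not has_cycle(wait_for)
```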

An Efficient Load Balancing Scheme for Gaming Server Using Proximal Policy Optimization Algorithm

  • Kim, Hye-Young, Journal of Information Processing Systems, Vol. 17, No. 2, pp. 297-305, 2021
  • A large amount of data is being generated in gaming servers due to the increase in the number of users and the variety of game services provided. In particular, load balancing schemes for gaming servers are a crucial consideration. The existing literature proposes algorithms that distribute server loads by mostly concentrating on load balancing and cooperative offloading. However, many proposed schemes impose heavy restrictions and assumptions, and such limited service classification methods are not enough to satisfy the wide range of service requirements. We propose a load balancing agent that combines the dynamic allocation programming method, a type of greedy algorithm, with proximal policy optimization, a reinforcement learning method. We also compare the performance of our proposed scheme with that of ProGreGA, a scheme from the previous literature, by running a simulation.
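
The greedy half of such a hybrid can be as simple as least-loaded assignment; the sketch below tracks server loads with a min-heap, with the understanding that an RL agent would refine or replace this rule. Names and loads are illustrative, not taken from the paper.

```python
import heapq

def allocate(session_loads, n_servers):
    """Assign each incoming session to the currently least-loaded server."""
    heap = [(0.0, s) for s in range(n_servers)]   # (load, server id)
    heapq.heapify(heap)
    placement = {}
    for sid, load in enumerate(session_loads):
        total, server = heapq.heappop(heap)
        placement[sid] = server
        heapq.heappush(heap, (total + load, server))
    return placement

print(allocate([3.0, 1.0, 2.0, 5.0, 0.5], n_servers=2))
```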

Restructuring Primary Health Care Network to Maximize Utilization and Reduce Patient Out-of-pocket Expenses

  • Bardhan, Amit Kumar;Kumar, Kaushal, Asian Journal of Innovation and Policy, Vol. 8, No. 1, pp. 122-140, 2019
  • Providing free primary care to everyone is an important goal pursued by many countries under universal health care programs. Countries like India need to utilize their limited capacity efficiently toward this purpose. Unfortunately, for a variety of reasons, patients incur substantial travel and out-of-pocket expenses to obtain primary care from publicly funded facilities. We propose a set-covering optimization model to assist health policy-makers in managing existing capacity in a better way. Decision-making should consider upgrading centers with better potential to reduce patient expenses and reallocating capacity from less preferred facilities. A multinomial logit choice model is used to predict patient preferences. In this article, a brief background and literature survey are presented along with the mixed integer linear programming (MILP) optimization model. The working of the model is illustrated with the help of numerical experiments.
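
A multinomial logit model of the kind mentioned assigns each facility a choice probability via a softmax over utilities. In the sketch below the utility coefficients for travel cost, out-of-pocket expense, and quality are hypothetical, chosen only to show the mechanics.

```python
import numpy as np

def choice_probs(travel_cost, oop_expense, quality,
                 b_travel=-0.8, b_oop=-0.5, b_quality=1.2):
    """Probability a patient picks each facility: softmax over a linear
    utility that falls with travel cost and out-of-pocket expense."""
    utility = (b_travel * np.asarray(travel_cost)
               + b_oop * np.asarray(oop_expense)
               + b_quality * np.asarray(quality))
    expu = np.exp(utility - utility.max())        # numerically stable
    return expu / expu.sum()

print(choice_probs(travel_cost=[1, 3, 5],
                   oop_expense=[2, 1, 0.5],
                   quality=[1, 2, 3]))
```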

Ant Colony Optimization Approach to the Utility Maintenance Model for Connected-(r, s)-out of-(m, n): F System

  • 이상헌;신동열, 산업공학, Vol. 21, No. 3, pp. 254-261, 2008
  • The connected-(r,s)-out of-(m,n):F system is an important topic in redundancy design for complex system reliability and its maintenance policy. Previous studies applied Monte Carlo simulation, genetic algorithms, and simulated annealing to tackle the difficulty of the maintenance policy problem, suggesting the most suitable maintenance cycles for optimizing the maintenance pattern of the connected-(r,s)-out of-(m,n):F system. However, the genetic algorithm requires relatively long execution times, while simulated annealing improves computation time but yields rather poor solutions. In this paper, we propose an ant colony optimization approach for the connected-(r,s)-out of-(m,n):F system that determines the maintenance cycle and the minimum unit cost. Computational results show that the ant colony optimization algorithm is superior to the genetic algorithm, simulated annealing, and tabu search in both execution time and solution quality.
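
A bare-bones ant colony optimization loop over candidate maintenance cycles looks like the following; the pheromone deposit and evaporation rates and the toy cost function are placeholders, not the paper's maintenance model.

```python
import random

def aco(candidates, cost, n_ants=20, n_iter=50, rho=0.1, q=1.0):
    pher = {c: 1.0 for c in candidates}          # pheromone per candidate
    best, best_cost = None, float("inf")
    for _ in range(n_iter):
        for _ in range(n_ants):
            total = sum(pher.values())
            c = random.choices(candidates,
                               weights=[pher[x] / total for x in candidates])[0]
            cc = cost(c)
            if cc < best_cost:
                best, best_cost = c, cc
            pher[c] += q / cc                    # reward visited choice
        for x in pher:                           # evaporation step
            pher[x] *= 1 - rho
    return best, best_cost

# Toy cost: the cheapest maintenance cycle is near 12 periods.
print(aco(list(range(1, 25)), cost=lambda c: (c - 12) ** 2 + 3))
```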

Cloud Task Scheduling Based on Proximal Policy Optimization Algorithm for Lowering Energy Consumption of Data Center

  • Yang, Yongquan;He, Cuihua;Yin, Bo;Wei, Zhiqiang;Hong, Bowei, KSII Transactions on Internet and Information Systems (TIIS), Vol. 16, No. 6, pp. 1877-1891, 2022
  • As a part of cloud computing technology, algorithms for cloud task scheduling have an important influence on cloud computing in data centers. In our earlier work, we proposed DeepEnergyJS, designed on the original policy gradient reinforcement learning algorithm, and verified its effectiveness through simulation experiments. In this study, we use the Proximal Policy Optimization (PPO) algorithm to update DeepEnergyJS to DeepEnergyJSV2.0. First, we verify the convergence of the PPO algorithm on the Alibaba Cluster Data V2018 dataset. Then we contrast it with the original policy gradient algorithm in terms of convergence rate, converged value, and stability. The results indicate that PPO performed better on the training and test data sets than the policy gradient algorithm, as well as other general heuristic algorithms such as First Fit, Random, and Tetris. DeepEnergyJSV2.0 achieves better energy efficiency than DeepEnergyJS by about 7.814%.
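
For reference, the First Fit heuristic the abstract compares against places each task on the first machine with enough remaining capacity; the capacities and task sizes below are illustrative.

```python
def first_fit(tasks, capacities):
    """Place each task on the first machine that can still hold it."""
    remaining = list(capacities)
    placement = []
    for demand in tasks:
        for m, free in enumerate(remaining):
            if demand <= free:
                remaining[m] -= demand
                placement.append(m)
                break
        else:
            placement.append(None)               # no machine fits the task
    return placement

print(first_fit(tasks=[4, 8, 1, 4, 2], capacities=[10, 10]))
```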

Application of Stochastic Optimization Method to (s, S) Inventory System

  • Chimyung Kwon, 한국시뮬레이션학회논문지, Vol. 12, No. 2, pp. 1-11, 2003
  • In this paper, we focus on finding an optimal policy for a class of (s, S) inventory control systems. To this end, we use perturbation analysis and apply a stochastic optimization algorithm to minimize the average cost over a period. We obtain the gradients of the objective function with respect to the order-up-to level S and the reorder point s via a combined perturbation method, which uses infinitesimal perturbation analysis and smoothed perturbation analysis alternately according to whether ordering events change. Our simulation results indicate that the optimal estimates of s and S obtained from the stochastic optimization algorithm are quite accurate. We attribute this to the low noise of the gradient estimates from the regenerative system simulation and their effect on the search procedure of the stochastic optimization algorithm. Future directions stemming from this research pertain to extending the approach to more general inventory systems with regard to demand distribution, backlogging policy, lead time, and review period, and to improving the efficiency of the stochastic optimization algorithm's search for improving points of (s, S).
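
A minimal sketch of the setup: simulate an (s, S) policy and descend on estimated gradients of the average cost. Here simple finite differences stand in for the paper's combined infinitesimal/smoothed perturbation estimator, lead time is zero, and all cost parameters are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

def avg_cost(s, S, periods=2000, hold=1.0, shortage=9.0, order_fixed=25.0):
    """Average per-period cost of an (s, S) policy under Poisson demand."""
    inv, total = S, 0.0
    for d in rng.poisson(5, size=periods):
        inv -= d
        total += hold * max(inv, 0) + shortage * max(-inv, 0)
        if inv <= s:                              # reorder up to S
            total += order_fixed
            inv = S
    return total / periods

s, S, lr, eps = 2.0, 20.0, 0.5, 1.0
for _ in range(100):                              # stochastic gradient descent
    gs = (avg_cost(s + eps, S) - avg_cost(s - eps, S)) / (2 * eps)
    gS = (avg_cost(s, S + eps) - avg_cost(s, S - eps)) / (2 * eps)
    s, S = s - lr * gs, S - lr * gS
print(f"estimated optimum: s={s:.1f}, S={S:.1f}")
```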
