Cooperative Multi-Agent Reinforcement Learning-Based Behavior Control of Grid Sortation Systems in Smart Factory

  • Received: 2020.03.13
  • Accepted: 2020.04.24
  • Published: 2020.08.31

Abstract

A smart factory applies digital automation across the entire production process, including design, development, manufacturing, and distribution. It is an intelligent factory in which IoT devices installed in internal facilities and machines collect process data in real time and analyze it so that the factory can control itself. Unlike a game, where a virtual character is driven as a single object, smart factory equipment operates as a physical combination of many hardware components. In other words, multiple devices must perform individual actions simultaneously to achieve a shared goal. Because a smart factory can collect process data in real time, using reinforcement learning instead of conventional machine learning makes behavior control possible without training data prepared in advance. In the real world, however, repeating training for tens of millions of iterations or more is impossible because of physical wear and time constraints. This paper therefore uses a simulator to develop a grid sortation system, focusing on transport facilities, one of the complex environments in the smart factory field, and designs cooperative multi-agent reinforcement learning to demonstrate that efficient behavior control is possible.

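As a rough illustration of the cooperative setting described in the abstract, where several devices act individually toward one shared goal, the following minimal sketch trains independent tabular learners on a shared team reward. It is not the paper's simulator, grid layout, or algorithm. Everything in it is an assumption made for illustration: a hypothetical one-dimensional line of sorter cells, the `PASS`/`DIVERT` action names, the reward values, and the hyperparameters.

```python
import random

PASS, DIVERT = 0, 1  # hypothetical actions for one sorter cell


def run_episode(q, dest, rng, epsilon):
    """Send one parcel down the line; return (visited (agent, action) pairs, team reward)."""
    visited = []
    for agent in range(len(q)):
        values = q[agent][dest]
        if rng.random() < epsilon:
            action = rng.choice((PASS, DIVERT))  # explore
        else:
            # Greedy choice; ties are broken in favor of PASS.
            action = PASS if values[PASS] >= values[DIVERT] else DIVERT
        visited.append((agent, action))
        if action == DIVERT:
            # The parcel leaves the line here; success only at its destination cell.
            return visited, 1.0 if agent == dest else 0.0
    return visited, 0.0  # parcel fell off the end of the line


def train(n_cells=3, episodes=3000, alpha=0.1, epsilon=0.2, seed=0):
    """Independent learners: one table per cell, indexed as q[agent][dest][action]."""
    rng = random.Random(seed)
    q = [[[0.0, 0.0] for _ in range(n_cells)] for _ in range(n_cells)]
    for _ in range(episodes):
        dest = rng.randrange(n_cells)
        visited, reward = run_episode(q, dest, rng, epsilon)
        # Every cell that handled the parcel is updated with the same team reward.
        for agent, action in visited:
            q[agent][dest][action] += alpha * (reward - q[agent][dest][action])
    return q


def greedy_success_rate(q):
    """Fraction of destinations delivered correctly when all cells act greedily."""
    rng = random.Random(0)
    hits = sum(run_episode(q, d, rng, epsilon=0.0)[1] for d in range(len(q)))
    return hits / len(q)


if __name__ == "__main__":
    print(greedy_success_rate(train()))
```

Each cell learns only from the team outcome, so an upstream cell is rewarded for passing a parcel along only when a downstream cell then diverts it correctly. A simulator plays the role of `run_episode` here, which is what makes thousands of training episodes affordable without wearing out physical equipment.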
