To minimize picking time when products are released from the warehouse, products should be stored close to the exit. Currently, the warehouse assigns storage locations according to each product's demand rank, that is, its frequency of arrival and departure: items with lower demand ranks are stored far from the exit, and items with higher demand ranks are stored close to it. However, a product stored far from the exit because of its low demand rank may still be scheduled for delivery earlier than the products located near the exit. In such cases, the transit time at release increases. To solve this problem, we use the idle time of the stocker in the warehouse to rearrange the products in order of their delivery times. For relocating the items, we used temporal difference learning with Q-learning control, one type of reinforcement learning. The results of rearranging the products with this reinforcement learning method were compared with and analyzed against the results of the existing method.
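The temporal difference update with Q-learning control mentioned above can be illustrated with a minimal sketch. The following is an assumption-laden toy, not the paper's actual formulation: the state is simply the slot index a product occupies (slot 0 nearest the exit), the two actions move it one slot toward or away from the exit, and the reward favors reaching the exit-side slot, standing in for shorter picking time.

```python
import random

# Toy sketch of tabular Q-learning (TD control) for a simplified
# relocation task. States, actions, and rewards are illustrative
# assumptions, not the paper's actual warehouse model.
N_SLOTS = 5                      # hypothetical number of storage slots
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

def step(state, action):
    """Action 0 moves one slot toward the exit, action 1 moves away."""
    nxt = max(0, state - 1) if action == 0 else min(N_SLOTS - 1, state + 1)
    # Reward favors the slot nearest the exit (shorter picking time).
    reward = 1.0 if nxt == 0 else -0.1
    done = nxt == 0
    return nxt, reward, done

def train(episodes=500, seed=0):
    random.seed(seed)
    q = {(s, a): 0.0 for s in range(N_SLOTS) for a in (0, 1)}
    for _ in range(episodes):
        state = random.randrange(1, N_SLOTS)
        done = False
        while not done:
            # Epsilon-greedy action selection.
            if random.random() < EPSILON:
                action = random.choice((0, 1))
            else:
                action = 0 if q[(state, 0)] >= q[(state, 1)] else 1
            nxt, reward, done = step(state, action)
            # Q-learning TD update: bootstrap with the max over next actions.
            target = reward + GAMMA * max(q[(nxt, 0)], q[(nxt, 1)])
            q[(state, action)] += ALPHA * (target - q[(state, action)])
            state = nxt
    return q

q = train()
```

After training, the greedy policy derived from `q` prefers moving items toward the exit from every interior slot, which is the behavior the relocation scheme aims for during stocker idle time.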