Capacitated Fab Scheduling Approximation using Average Reward TD(λ) Learning based on System Feature Functions


  • Choi, Jin-Young (Division of Industrial and Information Systems Engineering, Ajou University)
  • Received : 2011.10.06
  • Accepted : 2011.12.15
  • Published : 2011.12.31

Abstract

In this paper, we propose a logical control-based actor-critic algorithm as an efficient approach to approximating the capacitated fab scheduling problem. We apply the average reward temporal-difference learning method to estimate the relative value functions of system states, while avoiding deadlock situations through Banker's algorithm. We evaluate the suggested algorithm on the Intel mini-fab re-entrant line, performing a numerical experiment on randomly generated sample system configurations. We show that the suggested method achieves superior performance compared to other well-known heuristics.
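
The critic in this kind of formulation maintains a linearly parameterized relative value function over system feature functions together with a running estimate of the long-run average reward, in the spirit of the average cost TD(λ) method of Tsitsiklis and Van Roy [17]. The sketch below is illustrative only: the feature map phi, the simulated transition function step, and the admissible_actions routine (assumed to return only deadlock-free actions, e.g., as screened by Banker's algorithm) are hypothetical placeholders, not interfaces from the paper.

    import numpy as np

    def avg_reward_td_lambda(phi, step, admissible_actions, s0, n_features,
                             n_steps=100_000, alpha=0.01, beta=0.001, lam=0.7,
                             epsilon=0.1, seed=0):
        """Illustrative average reward TD(lambda) critic with a linear
        relative value function, paired with an epsilon-greedy actor.

        phi(s)                -> feature vector of length n_features
        step(s, a)            -> (next_state, reward) from one simulated transition
        admissible_actions(s) -> list of deadlock-free actions in state s
                                 (e.g., pre-screened by Banker's algorithm)
        """
        rng = np.random.default_rng(seed)
        theta = np.zeros(n_features)   # weights of the relative value function
        rho = 0.0                      # running estimate of the average reward
        trace = np.zeros(n_features)   # eligibility trace
        s = s0
        for _ in range(n_steps):
            # One-sample lookahead over admissible actions; for stochastic
            # transitions this previews a single sampled outcome per action
            # (an illustrative simplification).
            candidates = [(a, *step(s, a)) for a in admissible_actions(s)]
            if rng.random() < epsilon:
                _, s_next, r = candidates[rng.integers(len(candidates))]
            else:
                _, s_next, r = max(candidates,
                                   key=lambda c: c[2] + theta @ phi(c[1]))
            # Average reward TD(lambda) update of the critic.
            delta = r - rho + theta @ phi(s_next) - theta @ phi(s)
            rho += beta * (r - rho)          # track the long-run average reward
            trace = lam * trace + phi(s)     # accumulate eligibility
            theta += alpha * delta * trace
            s = s_next
        return theta, rho

In this sketch, rho tracks the average reward gain while theta parameterizes the differential (relative) value function; the eligibility trace with parameter λ propagates temporal-difference errors backward over recently visited features.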

References

  1. Bertsekas, D. P. and Tsitsiklis, J. N.; Neuro-Dynamic Programming, Athena Scientific, 1996.
  2. Choi, J. Y. and Reveliotis, S. A.; "Relative value function approximation for the capacitated re-entrant line scheduling problem," IEEE Trans. Autom. Science and Eng., 2(3) : 285-299, 2005. https://doi.org/10.1109/TASE.2005.849085
  3. Choi, J. Y. and Kim, S. B.; "Computationally efficient neuro-dynamic programming approximation method for the capacitated re-entrant line scheduling problem," International Journal of Production Research (accepted).
  4. Jolliffe, I. T.; Principal Component Analysis, Springer-Verlag, 2002.
  5. Kumar, P. R.; "Scheduling manufacturing systems of re-entrant lines," in Stochastic Modeling and Analysis of Manufacturing Systems, D. D. Yao, Ed. Berlin, Germany : Springer-Verlag, 325-360, 1994.
  6. Kumar, P. R.; "Scheduling semiconductor manufacturing plants," IEEE Control Syst. Mag., 14(6) : 33-40, 1994.
  7. Kumar, S. and Kumar, P. R.; "Fluctuation smoothing policies are stable for stochastic re-entrant lines," Discrete Event Dynamic Systems : Theory and Applications, 6 : 361-370, 1996. https://doi.org/10.1007/BF01797136
  8. Kumar, S. and Kumar, P. R.; "Queueing network models in the design and analysis of semiconductor wafer fabs," IEEE Trans. Robot. Automat., 17(5) : 548-561, 2001. https://doi.org/10.1109/70.964657
  9. Lu, S. H. and Kumar, P. R.; "Distributed scheduling based on due dates and buffer priorities," IEEE Trans. Autom. Control, 36(12) : 1406-1416, 1991. https://doi.org/10.1109/9.106156
  10. Lu, S. H., Ramaswamy, D., and Kumar, P. R.; "Efficient scheduling policies to reduce mean and variance of cycle-time in semiconductor manufacturing plants," IEEE Trans. Semicond. Manuf., 7(3) : 374-385, 1994. https://doi.org/10.1109/66.311341
  11. Puterman, M. L.; Markov Decision Processes : Discrete Stochastic Dynamic Programming, New York : Wiley, 1994.
  12. Reveliotis, S. A.; "The destabilizing effect of blocking due to finite buffering capacity in multi-class queueing networks," IEEE Trans. Autom. Control, 45(3) : 585-588, 2000. https://doi.org/10.1109/9.847750
  13. Reveliotis, S. A.; Real-time management of resource allocation systems, Springer, 2005.
  14. Rossetti, M. D., Hill, R. R., Johansson, B., Dunkin, A., and Ingalls, R. G.; "A simulation-based approximate dynamic programming approach for the control of the Intel mini-fab benchmark model," In Proceedings of the 2009 Winter Simulation Conference, 2009.
  15. Sutton, R. S. and Barto, A. G.; Reinforcement Learning : An Introduction, MIT Press, 1998.
  16. Tsitsiklis, J. N. and Van Roy, B.; "Feature-based methods for large scale dynamic programming," Machine Learning, 22 : 59-94, 1996.
  17. Tsitsiklis, J. N. and Van Roy, B.; "Average cost temporal-difference learning," Automatica, 35 : 1799-1808, 1999. https://doi.org/10.1016/S0005-1098(99)00099-0
  18. Wein, L. M.; "Scheduling semiconductor wafer fabrication," IEEE Trans. Semicond. Manufact., 1(3) : 115-130, 1988. https://doi.org/10.1109/66.4384