1. Sutton, R. S., 1988, Learning to predict by the methods of temporal differences, Machine Learning, Vol. 3, pp. 9-44.
2. So, J. H., Cho, S. H., Song, M. H. and Park, M. S., 2001, Experimental study on control performance of reinforcement learning method, Proceedings of the SAREK, pp. 697-701.
3. Hang, C. C., Astrom, K. J. and Ho, W. K., 1991, Refinements of the Ziegler-Nichols tuning formula, IEE Proceedings Part D: Control Theory and Applications, Vol. 138, No. 2, pp. 111-118.
4. Anderson, C. W., Hittle, D. C., Katz, A. D. and Kretchmar, R. M., 1997, Synthesis of reinforcement learning, neural networks, and PI control applied to a simulated heating coil, Artificial Intelligence in Engineering, Vol. 11, No. 4, pp. 421-429.
5. Virk, G. S. and Loveday, D. L., 1992, A comparison of predictive, PID, and on/off techniques for energy management and control, Proceedings of ASHRAE, pp. 3-10.
6. Sutton, R. S. and Barto, A. G., 1998, Reinforcement Learning: An Introduction, Cambridge, MA: MIT Press, pp. 51-85.
7. Anderson, C. W., 1993, Q-learning with hidden-unit restarting, Advances in Neural Information Processing Systems, Vol. 5, S. J. Hanson, J. D. Cowan and C. L. Giles, eds., Morgan Kaufmann Publishers, San Mateo, CA, pp. 81-88.
8. Watkins, C. and Dayan, P., 1992, Technical note: Q-learning, Machine Learning, Vol. 8, pp. 279-292.
9. Ministry of Commerce, Industry and Energy, 2003, Total energy consumption report, pp. 1-80.
10. Barto, A. G., Bradtke, S. J. and Singh, S. P., 1995, Learning to act using real-time dynamic programming, Artificial Intelligence, Special Volume: Computational Research on Interaction and Agency, Vol. 72, No. 1, pp. 81-138.