1 |
S. Singh, T. Jaakkola, M. L. Littman, and C. Szepesvári, "Convergence results for single-step on-policy reinforcement-learning algorithms," Machine Learning, Vol.38, No.3, pp. 287-308, 2000
|
2 |
R. Sutton and A. Barto, Reinforcement Learning. MIT Press, 2000
|
3 |
M. N. Ahmadabadi and M. Asadpour, "Expertness based cooperative Q-learning," IEEE Trans. on Systems, Man, and Cybernetics, Part B, Vol.32, No.1, pp. 66-76, 2002
|
4 |
J. Nie and S. Haykin, "A dynamic channel assignment policy through Q-learning," IEEE Trans. on Neural Networks, Vol.10, No.6, pp. 1443-1455, 1999
|
5 |
T. Mitchell, Machine Learning. McGraw-Hill, 1997
|
6 |
S. Tekinay and B. Jabbari, "Handover and channel assignment in mobile cellular networks," IEEE Communications Magazine, Vol.29, No.11, pp. 42-46, 1991
|
7 |
A. Y. Ng, D. Harada, and S. Russell, "Policy invariance under reward transformations: theory and application to reward shaping," in Proc. of the 16th Int. Conf. on Machine Learning, pp. 278-287, 1999
|
8 |
M. L. Littman, "Algorithms for sequential decision making," Unpublished Ph.D. Thesis, Brown University, Providence, RI, 1996
|
9 |
H. S. Chang, "Reinforcement Learning with Supervision by Combining Multiple Learnings and Expert Advices," in Proc. of the 2006 American Control Conference, pp. 4159-4164, 2006
|