• Title/Summary/Keyword: Exploration Bonus

A Localized Adaptive QoS Routing Scheme Using POMDP and Exploration Bonus Techniques (POMDP와 Exploration Bonus를 이용한 지역적이고 적응적인 QoS 라우팅 기법)

  • Han Jeong-Soo
    • The Journal of Korean Institute of Communications and Information Sciences, v.31 no.3B, pp.175-182, 2006
  • In this paper, we propose a localized adaptive QoS routing scheme using POMDP and exploration bonus techniques. We also show that the CEA technique, which uses expectation values, can solve the POMDP problem simply, since performing dynamic programming to solve a POMDP is highly computationally expensive. We use an exploration bonus to search for detour paths that are better than the current path, and for this we propose an algorithm (SEMA) that searches multiple paths. In particular, we evaluate the service success rate and the average hop count with respect to the performance parameters $\phi$ and k, which are defined as the exploration count and the exploration interval. The results show that the larger $\phi$ is, the better the detour path search, and that increasing n increases the amount of exploration.

Hierarchical Reinforcement Learning with Exploration Bonus (탐색 강화 계층적 강화 학습)

  • 이승준;장병탁
    • Proceedings of the Korean Information Science Society Conference, 2001.10b, pp.151-153, 2001
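  • The performance of basic reinforcement learning algorithms such as Q-Learning drops sharply as the problem size grows. Two reasons are that the distance to the goal increases, which makes learning difficult, and that undirected exploration is used, which makes efficient exploration difficult. To address these problems, hierarchical reinforcement learning models that shorten the distance to the goal and various directed exploration models have been proposed. In this paper, we combine the two and present a reinforcement learning model that introduces an exploration bonus into a hierarchical reinforcement learning model to enable directed exploration.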
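
The idea of directed exploration through an exploration bonus can be illustrated with a minimal sketch. This is a generic count-based bonus added to tabular Q-learning, not the model from the paper above; the names `bonus`, `select_action`, `q_update`, the bonus form `beta / sqrt(count + 1)`, and the parameters `alpha`, `gamma`, and `beta` are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def bonus(count, beta=1.0):
    """Hypothetical count-based bonus: larger for rarely tried actions."""
    return beta / math.sqrt(count + 1)


def select_action(q, counts, state, actions, beta=1.0):
    """Directed exploration: maximize Q-value plus exploration bonus."""
    return max(
        actions,
        key=lambda a: q[(state, a)] + bonus(counts[(state, a)], beta),
    )


def q_update(q, counts, state, action, reward, next_state, actions,
             alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update; also records the visit count."""
    counts[(state, action)] += 1
    best_next = max(q[(next_state, a)] for a in actions)
    td_error = reward + gamma * best_next - q[(state, action)]
    q[(state, action)] += alpha * td_error


# Usage: Q-values and visit counts default to zero for unseen pairs.
q = defaultdict(float)
counts = defaultdict(int)
actions = ["left", "right"]
counts[("s", "left")] = 10  # "left" has already been tried often
chosen = select_action(q, counts, "s", actions)
q_update(q, counts, "s", chosen, 1.0, "s2", actions)
```

With equal Q-values, the rarely tried action receives the larger bonus and is selected, whereas undirected schemes such as epsilon-greedy would pick at random.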

A Localized Adaptive QoS Routing using TD(${\lambda}$) method (TD(${\lambda}$) 기법을 사용한 지역적이며 적응적인 QoS 라우팅 기법)

  • Han Jeong-Soo
    • The Journal of Korean Institute of Communications and Information Sciences, v.30 no.5B, pp.304-309, 2005
  • In this paper, we propose a localized adaptive QoS routing scheme using the TD method and evaluate the performance of various exploration methods for path selection. In particular, extensive simulation shows that the proposed routing algorithm, combined with an exploration method based on an exploration bonus, significantly reduces the overall blocking probability compared with other path selection (exploration) methods, because the proposed exploration method adapts better to the network environment when a path is selected.