
Opportunistic Spectrum Access Based on a Constrained Multi-Armed Bandit Formulation  

Ai, Jing (Department of Electrical, Computer and Systems Engineering, Rensselaer Polytechnic Institute)
Abouzeid, Alhussein A. (Department of Electrical, Computer and Systems Engineering, Rensselaer Polytechnic Institute)
Abstract
Tracking and exploiting instantaneous spectrum opportunities are fundamental challenges in opportunistic spectrum access (OSA) in the presence of the bursty traffic of primary users and the limited spectrum sensing capability of secondary users. To take advantage of the history of spectrum sensing and access decisions, a sequential decision framework is widely used to design optimal policies. However, many existing schemes, based on a partially observed Markov decision process (POMDP) framework, yield optimal policies that are non-stationary in nature, which makes them difficult to compute and implement. This work therefore pursues stationary OSA policies, which are efficient yet low-complexity, while still incorporating practical factors such as spectrum sensing errors and a priori unknown statistical spectrum knowledge. First, with an approximation on channel evolution, OSA is formulated in a multi-armed bandit (MAB) framework. As a result, the optimal policy is specified by the well-known Gittins index rule, under which the channel with the largest Gittins index is always selected. Then, closed-form formulas with tunable approximation are derived for the Gittins indices, and a reinforcement learning algorithm is designed to compute them, depending on whether the Markovian channel parameters are available a priori or not. Finally, the superiority of the proposed scheme over existing schemes, in terms of policy quality and optimality, is demonstrated via extensive experiments.
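
As a brief illustration of the index rule summarized above (a minimal sketch in our own notation; the paper's closed-form approximations are not reproduced here): writing $\nu_i(x)$ for the Gittins index of channel $i$ in (belief) state $x$ and $\beta \in (0,1)$ for the discount factor, the stationary policy selects in each slot $t$ the channel

$$ a(t) = \arg\max_{1 \le i \le N} \nu_i\big(x_i(t)\big), $$

where each index is computed for its channel in isolation, as the maximal ratio of expected discounted reward $r_i$ to expected discounted time over stopping times $\tau \ge 1$:

$$ \nu_i(x) = \sup_{\tau \ge 1} \frac{\mathbb{E}\left[ \sum_{t=0}^{\tau-1} \beta^t \, r_i(x_i(t)) \,\middle|\, x_i(0) = x \right]}{\mathbb{E}\left[ \sum_{t=0}^{\tau-1} \beta^t \,\middle|\, x_i(0) = x \right]}. $$

This is the standard textbook definition of the Gittins index; the decomposition into per-channel computations is what makes the resulting policy stationary and low-complexity.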
Keywords
Multi-armed bandit (MAB) problem; opportunistic spectrum access (OSA); partially observed Markov decision process (POMDP); reinforcement learning (RL)
Citations & Related Records

Times Cited By Web of Science: 1
Times Cited By Scopus: 2
1 P. Whittle, "Restless bandits: Activity allocation in a changing world," J. Appl. Prob., vol. 25, pp. 287–298, 1988
2 Q. Zhao, B. Krishnamachari, and K. Liu, "Low-complexity approaches to spectrum opportunity tracking," in Proc. CrownCom, Orlando, FL, Aug. 2007
3 G. Koole, Z. Liu, and R. Righter, "Optimal transmission policies for noisy channels," Operations Research, vol. 49, no. 6, pp. 892–899, Nov. 2001
4 P. Whittle, "Multi-armed bandits and the Gittins index," J. Royal Statistical Society, Series B (Methodological), vol. 42, no. 2, pp. 143–149, 1980
5 P. P. Varaiya, J. C. Walrand, and C. Buyukkoc, "Extensions of the multiarmed bandit problem: The discounted case," IEEE Trans. Autom. Control, vol. 30, no. 5, pp. 426–439, May 1985
6 K. Liu and Q. Zhao, "A restless bandit formulation of opportunistic access: Indexability and index policy," in Proc. IEEE Workshop on Netw. Technol. for Software Defined Radio (SDR) Networks, June 2008
7 Y. Chen, Q. Zhao, and A. Swami, "Joint design and separation principle for opportunistic spectrum access in the presence of sensing errors," IEEE Trans. Inf. Theory, vol. 54, no. 5, pp. 2053–2071, May 2008
8 L. Lai et al., "Cognitive medium access: Exploration, exploitation and competition," submitted to IEEE/ACM Trans. Netw., Oct. 2007
9 S. Geirhofer, L. Tong, and B. M. Sadler, "Dynamic spectrum access in the time domain: Modeling and exploiting white space," IEEE Commun. Mag., vol. 45, no. 5, pp. 66–72, May 2007
10 M. O. Duff, "Q-learning for bandit problems," in Proc. ICML, July 1995, pp. 209–217
11 Q. Zhao, L. Tong, A. Swami, and Y. Chen, "Decentralized cognitive MAC for opportunistic spectrum access in ad hoc networks: A POMDP framework," IEEE J. Sel. Areas Commun., vol. 25, no. 3, pp. 589–600, Apr. 2007
12 The MathWorks Inc., "MATLAB," [Online]. Available: http://www.mathworks.com/
13 D. V. Djonin, Q. Zhao, and V. Krishnamurthy, "Optimality and complexity of opportunistic spectrum access: A truncated Markov decision process formulation," in Proc. IEEE ICC, June 2007, pp. 5787–5792
14 M. N. Katehakis and A. F. Veinott, Jr., "The multi-armed bandit problem: Decomposition and computation," Mathematics of Operations Research, vol. 12, no. 2, pp. 262–268, May 1987
15 Q. Zhao and B. M. Sadler, "A survey of dynamic spectrum access," IEEE Signal Process. Mag., vol. 24, no. 3, pp. 79–89, May 2007
16 R. D. Smallwood and E. J. Sondik, "The optimal control of partially observable Markov processes over a finite horizon," Operations Research, vol. 21, no. 5, pp. 1071–1088, 1973
17 J. N. Tsitsiklis, "Asynchronous stochastic approximation and Q-learning," Mach. Learning, vol. 16, no. 3, pp. 185–202, Sept. 1994
18 J. C. Gittins and D. M. Jones, "A dynamic allocation index for the sequential design of experiments," in Progress in Statistics (European Meeting of Statisticians, Budapest, 1972), 1974, pp. 241–266
19 R. Engelman et al. (2002, Nov.). Spectrum policy task force. Federal Commun. Commission. [Online]. Available: http://www.fcc.gov/sptf/files/SEWGFinalReport_1.pdf
20 J. Mitola and G. Q. Maguire, "Cognitive radio: Making software radios more personal," IEEE Pers. Commun., vol. 6, no. 4, pp. 13–18, Aug. 1999
21 S. Geirhofer, L. Tong, and B. M. Sadler, "Dynamic spectrum access in WLAN channels: Empirical model and its stochastic analysis," in Proc. ACM TAPAS, Boston, MA, USA, Aug. 2006
22 D. P. Bertsekas, Dynamic Programming and Optimal Control, 2nd ed., vol. 2, Athena Scientific, 2001
23 V. Krishnamurthy and R. J. Evans, "Hidden Markov model multiarm bandits: A methodology for beam scheduling in multitarget tracking," IEEE Trans. Signal Process., vol. 49, no. 12, pp. 2893–2908, Dec. 2001