• Title/Abstract/Keyword: Markov game

Search results: 33 items (processing time: 0.024 seconds)

Non-Cooperative Game Joint Hidden Markov Model for Spectrum Allocation in Cognitive Radio Networks

  • Jiao, Yan
    • International journal of advanced smart convergence
    • /
    • Vol. 7, No. 1
    • /
    • pp.15-23
    • /
    • 2018
  • Spectrum allocation is a key operation in cognitive radio networks (CRNs), where secondary users (SUs) are usually selfish and seek to maximize their own utility. In view of this, much prior literature has proposed spectrum allocation based on non-cooperative game models. However, most of these models assume complete information about the CRN. In practice, it is difficult to obtain complete information about primary users (PUs) in a dynamic wireless environment subject to noise uncertainty, shadowing, and fading. In this paper, we propose a scheme that joins a non-cooperative game with a hidden Markov model for spectrum allocation in CRNs. First, we propose a new hidden Markov model with which SUs predict the sensing results of their competitors. Then, we introduce the proposed hidden Markov model into the non-cooperative game: each SU predicts the sensing results of its competitors before playing the game. The simulation results show that the proposed scheme improves the energy efficiency of the network and the utilization of SUs.
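
As a minimal sketch of the prediction step described above (assuming, purely for illustration, a two-state PU channel and invented transition/emission matrices; the paper's actual model and parameters are not given in the abstract), an SU can filter a competitor's past sensing reports with the forward algorithm and project one step ahead:

```python
import numpy as np

# Hypothetical two-state channel: 0 = PU idle, 1 = PU busy.
# A: state transitions, B: emission probabilities of a competitor's
# sensing reports, pi: initial distribution (all values illustrative).
A  = np.array([[0.8, 0.2],
               [0.3, 0.7]])
B  = np.array([[0.9, 0.1],   # P(report | channel idle)
               [0.2, 0.8]])  # P(report | channel busy)
pi = np.array([0.5, 0.5])

def predict_next_report(reports):
    """Filter the hidden channel state from past sensing reports with
    the forward algorithm, then predict the next report distribution."""
    alpha = pi * B[:, reports[0]]
    for obs in reports[1:]:
        alpha = (alpha @ A) * B[:, obs]
        alpha /= alpha.sum()        # normalize to avoid underflow
    return (alpha @ A) @ B          # one-step prediction over reports

print(predict_next_report([0, 0, 1, 1]))  # -> [P(idle report), P(busy report)]
```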

Network Security Situation Assessment Method Based on Markov Game Model

  • Li, Xi;Lu, Yu;Liu, Sen;Nie, Wei
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • Vol. 12, No. 5
    • /
    • pp.2414-2428
    • /
    • 2018
  • To address the problem that current network security situation assessment methods focus only on attack behaviors, this paper proposes a network security situation assessment method based on the Markov decision process and game theory. The method takes the Markov game model as its core and uses four-level data fusion to evaluate the network security situation. In this process, the Nash equilibrium point of the game is used to determine the impact on network security. Experiments show that the results of this method are largely consistent with expert evaluation data. Because the method takes full account of the interaction between attackers and defenders, it is closer to reality and can assess the network security situation accurately.
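
The abstract does not reproduce the payoff quantification, so the following is only a hedged illustration of the equilibrium computation such a method rests on: the value and an equilibrium strategy of a small zero-sum attack-defense matrix game, obtained with the standard linear-programming formulation. The payoff matrix is invented for the example.

```python
import numpy as np
from scipy.optimize import linprog

def solve_zero_sum(P):
    """Nash equilibrium (maximin strategy) of a zero-sum matrix game.
    P[i, j] = payoff to the row player (attacker) when row i meets
    column j.  Returns the game value and the row player's mixed
    strategy via the standard LP formulation."""
    m, n = P.shape
    # Variables: x (mixed strategy, length m) and v (game value).
    # Maximize v  s.t.  P^T x >= v,  sum(x) = 1,  x >= 0.
    c = np.zeros(m + 1); c[-1] = -1.0            # linprog minimizes
    A_ub = np.hstack([-P.T, np.ones((n, 1))])    # v - (P^T x)_j <= 0
    b_ub = np.zeros(n)
    A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])
    b_eq = np.array([1.0])
    bounds = [(0, None)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds, method="highs")
    return res.x[-1], res.x[:-1]

# Illustrative 3x3 attack-defense payoff matrix (not from the paper).
P = np.array([[3.0, 1.0, 4.0],
              [2.0, 3.0, 1.0],
              [1.0, 2.0, 3.0]])
value, attack_mix = solve_zero_sum(P)
print("game value (security impact):", value)
print("attacker equilibrium strategy:", attack_mix)
```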

The Ramp-Rate Constraint Effects on the Generators' Equilibrium Strategy in Electricity Markets

  • Joung, Man-Ho;Kim, Jin-Ho
    • Journal of Electrical Engineering and Technology
    • /
    • Vol. 3, No. 4
    • /
    • pp.509-513
    • /
    • 2008
  • In this paper, we investigate how generators' ramp-rate constraints may influence their equilibrium strategy formulation. In the market model proposed in this study, the generators' ramp-rate constraints are explicitly represented. In order to fully characterize the inter-temporal nature of the ramp-rate constraints, a dynamic game model is presented. The subgame perfect Nash equilibrium is adopted as the solution of the game, and a backward induction procedure for computing it is designed. The inter-temporal nature of the ramp-rate constraints gives the game the Markov property, and we have found that this property significantly simplifies the characterization of the subgame perfect Nash equilibrium. Finally, a simple numerical illustration of an electricity market is presented to demonstrate the successful application of the proposed approach.
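
A minimal sketch of the backward-induction idea, assuming a two-period discrete-output duopoly with an invented linear demand curve and ramp limit (none of these numbers come from the paper): the period-2 Nash equilibrium is solved for every period-1 outcome, and, thanks to the Markov property, only the current outputs enter the continuation values.

```python
import itertools

# Two generators choose discrete outputs in two periods; between the
# periods each output may change by at most RAMP.  Demand and cost
# numbers are illustrative only.
Q_LEVELS = [0, 1, 2, 3, 4]
RAMP = 1
A_DEM, COST = 10.0, 2.0            # price = A_DEM - total output

def profit(qi, qj):
    return (A_DEM - (qi + qj) - COST) * qi

def feasible(q_prev):
    return [q for q in Q_LEVELS if abs(q - q_prev) <= RAMP]

def stage_nash(acts1, acts2, cont=lambda a, b: (0.0, 0.0)):
    """First pure-strategy Nash equilibrium of a stage game; cont()
    supplies each player's continuation value, which by the Markov
    property depends only on the current outputs.  Returns None if no
    pure equilibrium exists (a limitation of this sketch)."""
    def pay(a, b):
        c1, c2 = cont(a, b)
        return profit(a, b) + c1, profit(b, a) + c2
    for a, b in itertools.product(acts1, acts2):
        u1, u2 = pay(a, b)
        if all(pay(a2, b)[0] <= u1 for a2 in acts1) and \
           all(pay(a, b2)[1] <= u2 for b2 in acts2):
            return (a, b), (u1, u2)
    return None, (float("nan"),) * 2

# Backward induction: solve the period-2 subgame for every period-1
# outcome, then solve period 1 (initial outputs assumed unconstrained).
def continuation(a, b):
    _, vals = stage_nash(feasible(a), feasible(b))
    return vals

eq1, _ = stage_nash(Q_LEVELS, Q_LEVELS, continuation)
print("subgame perfect period-1 outputs:", eq1)
```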

STOPPING TIMES IN THE GAME ROCK-PAPER-SCISSORS

  • Jeong, Kyeonghoon;Yoo, Hyun Jae
    • Bulletin of the Korean Mathematical Society
    • /
    • Vol. 56, No. 6
    • /
    • pp.1497-1510
    • /
    • 2019
  • In this paper we compute the stopping times in the game rock-paper-scissors. By exploiting a recurrence relation, we compute the mean values of the stopping times. On the other hand, by constructing the transition matrix of a Markov chain associated with the game, we also obtain the distribution of the stopping times and thereby compute the mean stopping times again. We then show that the mean stopping times increase exponentially fast as the number of participants increases.
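
Assuming the usual elimination rule (a round is decisive only when exactly two distinct hands appear, and the players showing the losing hand leave), the chain on the number of surviving players is absorbing at one survivor, and the mean stopping times follow from the fundamental matrix, as in the sketch below.

```python
import numpy as np
from math import comb

def mean_stopping_time(n):
    """Expected number of rounds until a single winner remains in
    n-player rock-paper-scissors, assuming each player throws
    uniformly at random and a round eliminates the losing hand only
    when exactly two distinct hands appear."""
    # States k = 2..n are transient; state 1 (one survivor) absorbs.
    # From k players, 3*C(k, j) of the 3^k throw profiles leave
    # exactly j winners (choose which pair of hands appears, then
    # which j players threw the winning hand).
    size = n - 1                      # transient states k = 2..n
    Q = np.zeros((size, size))
    for k in range(2, n + 1):
        decisive = 0.0
        for j in range(1, k):
            p = 3 * comb(k, j) / 3**k
            decisive += p
            if j >= 2:                # j survivors, still transient
                Q[k - 2, j - 2] += p
        Q[k - 2, k - 2] += 1 - decisive   # tie: nobody is eliminated
    # Fundamental matrix N = (I - Q)^(-1); its row sums are the
    # expected numbers of rounds until absorption.
    N = np.linalg.inv(np.eye(size) - Q)
    t = N @ np.ones(size)
    return t[n - 2]                   # start from k = n players

for n in range(2, 8):
    print(n, "players:", round(mean_stopping_time(n), 4))
```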

Optimal Network Defense Strategy Selection Based on Markov Bayesian Game

  • Wang, Zengguang;Lu, Yu;Li, Xi;Nie, Wei
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • Vol. 13, No. 11
    • /
    • pp.5631-5652
    • /
    • 2019
  • Existing defense strategy selection methods based on game theory generally produce the optimal defense strategy as a mixed strategy, which is hard for network managers to understand and implement. To address this problem, we construct an incomplete-information stochastic game model that combines Bayesian game theory with the Markov decision-making method to predict the multi-stage attack-defense process through dynamic analysis. In addition, the payoffs are quantified from the impact value of attack-defense actions. On this basis, we design an optimal defense strategy selection method that uses defense effectiveness as its criterion. The feasibility of the proposed method is verified through a representative experiment. Compared with classical game-theoretic strategy selection methods, the proposed method selects the optimal strategy for the multi-stage attack-defense process in the form of a pure strategy, which proves more operable than the compared methods.
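
As a hedged illustration of pure-strategy selection (the paper's defense-effectiveness criterion and payoff quantification are not reproduced here), the sketch below simply picks the single defense whose worst-case payoff over attacker actions is largest, which yields an implementable pure strategy rather than a mixed-strategy distribution:

```python
import numpy as np

# Rows: candidate pure defense strategies; columns: attacker actions.
# D[i, j] = quantified defense payoff (illustrative values only).
D = np.array([[4.0, 2.0, 3.0],
              [5.0, 1.0, 2.0],
              [3.0, 3.0, 3.5]])

# For each defense, assume the attacker plays its most damaging
# response, then choose the defense with the largest worst-case payoff.
worst_case = D.min(axis=1)
best = int(np.argmax(worst_case))
print("selected defense:", best, "worst-case payoff:", worst_case[best])
```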

Markov Decision Process for Curling Strategies

  • 배기욱;박동현;김동현;신하용
    • Journal of the Korean Institute of Industrial Engineers
    • /
    • Vol. 42, No. 1
    • /
    • pp.65-72
    • /
    • 2016
  • Curling is often compared to chess because of the variety and importance of its strategies. To win a curling game, it is important to select optimal strategies at decision-making points. However, there is a lack of research on optimal curling strategies. 'Aggressive' and 'Conservative' play are common curling strategies; nevertheless, even these two strategies have never been studied before. In this study, a Markov decision process is applied to curling strategy analysis, with the two strategies defined as its actions. By solving the model, the optimal strategy can be found for any in-game state.
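
A minimal value-iteration sketch for such a two-action curling MDP; the three abstract states and all transition probabilities and rewards below are invented placeholders, not the paper's model:

```python
import numpy as np

S, GAMMA = 3, 0.95                   # abstract in-game states
ACTIONS = ["Aggressive", "Conservative"]
# P[a][s, s'] = transition probability, R[a][s] = expected reward
# (all numbers illustrative only).
P = [np.array([[0.5, 0.3, 0.2],
               [0.3, 0.4, 0.3],
               [0.2, 0.3, 0.5]]),
     np.array([[0.7, 0.2, 0.1],
               [0.2, 0.6, 0.2],
               [0.1, 0.2, 0.7]])]
R = [np.array([1.0, 0.2, -0.5]),
     np.array([0.6, 0.1, -0.2])]

V = np.zeros(S)
for _ in range(1000):                # value iteration
    Qsa = np.array([R[a] + GAMMA * P[a] @ V for a in range(len(ACTIONS))])
    V_new = Qsa.max(axis=0)
    if np.max(np.abs(V_new - V)) < 1e-9:
        break
    V = V_new
policy = Qsa.argmax(axis=0)          # optimal action per state
print({s: ACTIONS[policy[s]] for s in range(S)})
```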

A Markov Game based QoS Control Scheme for the Next Generation Internet of Things

  • 김승욱
    • Journal of KIISE
    • /
    • Vol. 42, No. 11
    • /
    • pp.1423-1429
    • /
    • 2015
  • Recently, as the Internet has expanded, applications that create new value have been increasing. The Internet of Things (IoT), a new concept for the future Internet that emphasizes the interconnection of networked physical objects, has attracted considerable attention, but satisfying the different quality-of-service (QoS) requirements on the IoT is quite difficult. In this paper, we present an efficient resource allocation method that can satisfy the diverse QoS requirements of IoT systems. Based on a Markov game model, the proposed method allocates IoT resources efficiently so as to maximize system performance. Simulation results show that the proposed method outperforms existing schemes under realistic IoT conditions.
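
The abstract gives no algorithmic detail, so the following is only a generic sketch of solving a small two-player zero-sum Markov game by Shapley's value iteration, the textbook baseline for Markov-game-based control; every state, action, reward, and transition below is randomly generated for illustration:

```python
import numpy as np
from scipy.optimize import linprog

N_S, N_A, N_B, GAMMA = 3, 2, 2, 0.9
rng = np.random.default_rng(0)
R = rng.uniform(-1, 1, (N_S, N_A, N_B))           # stage payoffs
T = rng.dirichlet(np.ones(N_S), (N_S, N_A, N_B))  # transitions

def matrix_game_value(M):
    """Value of the zero-sum matrix game M via linear programming."""
    m, n = M.shape
    shift = M.min()
    Ms = M - shift + 1.0               # make all payoffs positive
    # min sum(y)  s.t.  Ms^T y >= 1, y >= 0;  value(Ms) = 1/sum(y)
    res = linprog(np.ones(m), A_ub=-Ms.T, b_ub=-np.ones(n),
                  bounds=[(0, None)] * m, method="highs")
    return 1.0 / res.fun + shift - 1.0

V = np.zeros(N_S)
for _ in range(200):                   # Shapley value iteration
    V_new = np.array([matrix_game_value(R[s] + GAMMA * T[s] @ V)
                      for s in range(N_S)])
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new
print("equilibrium state values:", V)
```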

Some Recent Results of Approximation Algorithms for Markov Games and their Applications

  • 장형수
    • Korean Society for Computational and Applied Mathematics: Conference Proceedings
    • /
    • Program and Abstracts of the 2003 KSCAM Conference
    • /
    • pp.15-15
    • /
    • 2003
  • We provide some recent results on approximation algorithms for solving Markov games and discuss their applications to problems that arise in computer science. We consider a receding horizon approach as an approximate solution to two-person zero-sum Markov games with an infinite-horizon discounted cost criterion. We present error bounds with respect to the optimal equilibrium value of the game when both players take “correlated” receding horizon policies that are based on exact or approximate solutions of receding finite-horizon subgames. Motivated by the worst-case optimal control of queueing systems by Altman, we then analyze error bounds when the minimizer plays the (approximate) receding horizon control and the maximizer plays the worst-case policy. We give two heuristic examples of the approximate receding horizon control. We extend “parallel rollout” and “hindsight optimization” into the Markov game setting within the framework of the approximate receding horizon approach and analyze their performance. In the parallel rollout approach, the minimizing player seeks to dynamically combine multiple heuristic policies from a set, improving the performance of all of the heuristic policies simultaneously, under the guess that the maximizing player has chosen a fixed worst-case policy. Given $\varepsilon$>0, we give the value of the receding horizon which guarantees that the parallel rollout policy with that horizon, played by the minimizer, “dominates” any heuristic policy in the set by $\varepsilon$. In the hindsight optimization approach, the minimizing player makes a decision based on its expected optimal hindsight performance over a finite horizon. We finally discuss practical implementations of the receding horizon approaches via simulation, together with applications.
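
A Monte Carlo sketch of the parallel-rollout idea in this setting: at each state the minimizer simulates each heuristic policy over a finite (receding) horizon against a fixed, assumed worst-case maximizer policy and commits to the first action of the best one. The queueing dynamics, cost, and candidate policies are invented for the example:

```python
import random

# Illustrative controlled chain: state = queue length 0..MAX_Q; the
# minimizer picks a service rate, the (assumed fixed, worst-case)
# maximizer picks an arrival burst.  All numbers are made up.
MAX_Q, HORIZON, N_ROLLOUTS = 10, 8, 200

def step(state, u_min, u_max, rng):
    arrivals = u_max if rng.random() < 0.5 else 0
    served = u_min if rng.random() < 0.7 else 0
    nxt = min(MAX_Q, max(0, state + arrivals - served))
    cost = nxt + 0.5 * u_min           # holding cost + control effort
    return nxt, cost

heuristics = [lambda s: 0, lambda s: 1, lambda s: 2]  # candidate policies
worst_case = lambda s: 2                              # fixed maximizer

def rollout_cost(policy, state, rng):
    total = 0.0
    for _ in range(HORIZON):
        state, c = step(state, policy(state), worst_case(state), rng)
        total += c
    return total

def parallel_rollout_action(state, rng):
    """Estimate each heuristic's finite-horizon cost by simulation and
    play the best one's action now (repeat at every step: receding
    horizon)."""
    est = [sum(rollout_cost(pi, state, rng) for _ in range(N_ROLLOUTS))
           / N_ROLLOUTS for pi in heuristics]
    return heuristics[est.index(min(est))](state)

rng = random.Random(0)
print("action at queue length 5:", parallel_rollout_action(5, rng))
```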


A redistribution model for spatially dependent Parrondo games

  • 이지연
    • Journal of the Korean Data and Information Science Society
    • /
    • Vol. 27, No. 1
    • /
    • pp.121-130
    • /
    • 2016
  • N players sit in a circle and play the spatially dependent Parrondo game B. In game B, one of the players is chosen at random; the chosen player tosses a coin whose probability of heads depends on the states of the two neighboring players, winning 1 unit on heads and losing 1 unit on tails. In game A', a randomly chosen player passes 1 unit of his own winnings to one of the remaining N - 1 players, selected at random; since the total fortune of all players is unchanged, game A' is always fair to the group as a whole. If game B is losing while the mixed game C, which combines games A' and B, is winning, the Parrondo effect exists; if game B is winning while the mixed game C is losing, the anti-Parrondo effect exists. We first show that the lumpability condition for reducing the state space of the Markov chain holds for games A', B, and the mixed game C, and we compute the expected payoffs of games B and C on the reduced state space. Using these, we confirm the existence of the Parrondo and anti-Parrondo effects and, in particular for $3 \leq N \leq 6$, we map the regions of the probability parameters in which each effect occurs.
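
A Monte Carlo sketch of the redistribution model; following the common spatially dependent Parrondo formulation, the selected player's heads probability is assumed here to depend on how many of its two neighbors won their most recent game, and the probability values are placeholders rather than the paper's parameters:

```python
import random

# N players in a circle.  In game B the heads probability p[w] of the
# selected player's coin depends on w = number of its two neighbors
# that won their last game (an assumed state definition; p values are
# placeholders).
N, ROUNDS = 6, 200_000
p = {0: 0.1, 1: 0.5, 2: 0.9}

def simulate(gamma, rng):
    """gamma = probability of playing game A' (redistribution) in the
    mixed game C; gamma = 0 reproduces game B alone."""
    capital = [0] * N
    won = [True] * N                   # last-game outcome per player
    for _ in range(ROUNDS):
        i = rng.randrange(N)
        if rng.random() < gamma:       # game A': pass 1 unit to another
            j = rng.randrange(N - 1)
            j = j if j < i else j + 1
            capital[i] -= 1
            capital[j] += 1
        else:                          # game B: neighbor-dependent coin
            w = won[(i - 1) % N] + won[(i + 1) % N]
            if rng.random() < p[w]:
                capital[i] += 1
                won[i] = True
            else:
                capital[i] -= 1
                won[i] = False
    return sum(capital) / ROUNDS       # average total gain per round

rng = random.Random(1)
print("game B alone :", simulate(0.0, rng))
print("mixed game C :", simulate(0.5, rng))
```

Whether B loses while C wins (the Parrondo effect) depends on where the p values fall in the parameter regions the paper maps; the placeholders above merely demonstrate the mechanics.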

Deep Q-Network based Game Agents

  • 한동기;김명섭;김재윤;김정수
    • The Journal of Korea Robotics Society
    • /
    • Vol. 14, No. 3
    • /
    • pp.157-162
    • /
    • 2019
  • The video game Tetris is one of the most popular games, and it is well known that its rules can be modeled as a Markov decision process (MDP). This paper presents a DQN (Deep Q-Network) based game agent for Tetris. To this end, the state is defined as the captured image of the Tetris game board, and the reward is designed as a function of the lines cleared by the game agent. The actions are defined as left, right, rotate, drop, and a finite number of their combinations. In addition, PER (Prioritized Experience Replay) is employed to enhance learning performance. More than 500,000 episodes are used to train the network, and the game agent uses the trained network to make decisions. The performance of the developed algorithm is validated not only via simulation but also on a real Tetris robot agent made of a camera, two Arduinos, four servo motors, and 3D-printed artificial fingers.
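
A minimal DQN training skeleton consistent with the setup above (board image as state, the four base actions); the 20x10 grayscale board size is an assumption, the environment and the cleared-lines reward function are omitted, and uniform replay stands in for the PER used in the paper:

```python
import random
from collections import deque

import torch
import torch.nn as nn
import torch.nn.functional as F

ACTIONS = ["left", "right", "rotate", "drop"]  # paper also combines these

class DQN(nn.Module):
    """Small CNN over the captured board image (assumed 1 x 20 x 10)."""
    def __init__(self, n_actions=len(ACTIONS)):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 16, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=3, padding=1)
        self.fc1 = nn.Linear(32 * 20 * 10, 128)
        self.fc2 = nn.Linear(128, n_actions)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        x = x.flatten(1)
        return self.fc2(F.relu(self.fc1(x)))

policy_net, target_net = DQN(), DQN()
target_net.load_state_dict(policy_net.state_dict())
optimizer = torch.optim.Adam(policy_net.parameters(), lr=1e-4)
replay = deque(maxlen=100_000)     # uniform replay; the paper uses PER
GAMMA, BATCH = 0.99, 32

def select_action(state, eps):
    """Epsilon-greedy action for a single state tensor (1, 20, 10)."""
    if random.random() < eps:
        return random.randrange(len(ACTIONS))
    with torch.no_grad():
        return int(policy_net(state.unsqueeze(0)).argmax(1))

def train_step():
    """One gradient step on a uniformly sampled minibatch."""
    if len(replay) < BATCH:
        return
    s, a, r, s2, done = zip(*random.sample(replay, BATCH))
    s, s2 = torch.stack(s), torch.stack(s2)
    a = torch.tensor(a).unsqueeze(1)
    r = torch.tensor(r, dtype=torch.float32)
    done = torch.tensor(done, dtype=torch.float32)
    q = policy_net(s).gather(1, a).squeeze(1)
    with torch.no_grad():
        q2 = target_net(s2).max(1).values
    target = r + GAMMA * (1 - done) * q2   # zero bootstrap at terminal
    loss = F.smooth_l1_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```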