
Global Optimization for Energy Efficient Resource Management by Game Based Distributed Learning in Internet of Things

  • Ju, ChunHua (College of Computer Science & Information Engineering, Zhejiang Gongshang University) ;
  • Shao, Qi (College of Computer Science & Information Engineering, Zhejiang Gongshang University)
  • Received : 2015.03.22
  • Accepted : 2015.08.02
  • Published : 2015.10.31

Abstract

This paper studies distributed energy efficient resource management in the Internet of Things (IoT). Wireless communication networks support the IoT without the limitations of distance and location, which significantly impels its development. We study communication channel and energy management in the wireless-network-supported IoT to improve the capabilities of connection, communication, sharing and collaboration, by using game theory and a distributed learning algorithm. First, we formulate an energy efficient neighbor collaborative game model and prove that the proposed game is an exact potential game. Second, we design a distributed energy efficient channel selection learning algorithm to obtain the global optimum in a distributed manner. We prove that the proposed algorithm asymptotically converges to the global optimum with geometric speed. Finally, we conduct simulations to verify the theoretical analysis and the performance of the proposed algorithm.

Keywords

1. Introduction

The Internet of Things (IoT) has recently drawn much attention from academia as well as industry. The term IoT was proposed by Kevin Ashton in 1999. With the development of the internet and wireless communication, the IoT has come to be seen as an important technological revolution that brings us into a new era of ubiquitous connectivity, computing, and communication [1]. The possible applications of the IoT are significant, such as climate monitoring, transport safety, home automation, health care, supply chains, agriculture, rural development, border security and military applications. In the IoT, objects (Obs, including people and things) are expected to have the capabilities of sensing physical objects, communicating, and operating smartly without human intervention. The major technologies expected to dominate IoT applications include wireless sensor networks (WSN) and mobile communications [2].

Collaboration is an important feature of the IoT. The IoT is widely regarded as a global network which allows communication and collaboration between people and things, and between things themselves [3]. Wire-connected Obs can share information via the internet [4]; however, many more Obs need wireless connections to overcome the limitations of distance and location. Fast wireless communication can support wide collaboration in the IoT through wireless networks, which significantly impels the development of the IoT. Research on wireless-communication-based collaboration in the IoT therefore deserves much attention.

In fact, collaboration in the IoT requires a huge amount of information exchange, which consumes resources such as communication channels, bandwidth and energy. In practical collaboration via wireless networks, the channel resource is very important and limited, so channel resource management is an important problem. In addition, energy consumption is a very sensitive issue in the IoT [3], especially for sensors.

There has been some research on IoT technologies. The authors in [5-7] focused on surroundings sensing in the IoT based on sensor network technologies. Identification in the IoT was studied in [8-10]. The authors in [11-12] studied communication protocols. Tracking effects and energy consumption optimization were studied in [13]. In all, to the best of our knowledge, only limited research has focused on energy optimization in the IoT, and channel resource management in the IoT has received little attention.

Considering the importance of energy-sensitive channel resource management in collaboration via wireless networks, in this paper we focus on wireless channel selection for the Obs, towards energy efficiency. Due to the dense and dynamic deployment of objects in the IoT [14], centralized control in network management would be highly inefficient because of its high complexity.

Game theory [15] is a powerful tool for studying the interactions among multiple users and has been used for distributed optimization in much research on distributed networks [16-17]. Owing to its good properties for distributed global optimization, the potential game has also been applied in several studies. In [18], the authors proposed a multi-cell coordination approach to mitigate the mutual interference among base stations in frequency-slotted cellular networks. The authors in [19] proposed a local altruistic game and a local congestion game, both of which belong to local interaction games, in cognitive radio networks. In [20], the authors investigated the problem of joint base station selection and resource allocation in an orthogonal frequency division multiple access heterogeneous cellular network, analyzed this problem using potential game theoretic approaches, and proposed two variants of Max-logit learning algorithms which achieved outstanding performance.

For the IoT, there are only a few game-based studies to the best of our knowledge. The authors in [14] proposed an energy-aware trust derivation scheme using a game theoretic approach, which managed overhead while maintaining adequate security in WSNs. In [3], a service providing model was built using a differential game model; the game solution was obtained under the conditions of grand coalition, feedback Nash equilibrium and intermediate coalitions, and an allocation policy was obtained by Shapley theory. Nevertheless, studies on the distributed global energy consumption optimization in the IoT, which we focus on, are very limited.

To sum up, multi-Ob energy consumption optimization is an important and challenging problem in the IoT, and the distributed optimization approach is promising but has received little attention until now. In this paper, we solve the challenging multi-Ob energy efficient channel selection problem in the IoT in a distributed optimization manner to achieve the global optimum, by using potential game theory and a distributed learning algorithm. First, we formulate an energy efficient neighbor collaborative game model and prove that the proposed game is an exact potential game. Second, we design a distributed energy efficient channel selection learning algorithm to obtain the global optimum in a distributed manner. Third, we analyze the converging speed and the computing complexity of the proposed algorithm, and we prove that the proposed algorithm asymptotically converges to the global optimum with geometric speed. Finally, we conduct simulations to verify the theoretical analysis and the performance of the proposed algorithm.

The rest of this paper is organized as follows. In Section II, we present the system model and problem formulation. In Section III, we formulate the proposed game model and investigate its properties. In Section IV, we propose the distributed learning algorithm and analyze its convergence, converging speed and computing complexity. In Section V, simulation results and discussion are presented. Finally, we provide conclusions in Section VI.

 

2. System Model and Problem Formulation

We consider a local interactive network [21] consisting of M objects (Obs), which are connected via various wireless communication technologies and the internet. Obs can transmit their data via 3G mobile communication channels, WiFi channels, Bluetooth and so on. Generally speaking, Obs use wireless frequency channels to communicate. Local interaction is a feature of the IoT: the Obs are spatially distributed and the effect of each Ob's action is limited to its neighboring Obs [1]. For example, in Fig. 1, the data transmissions of Ob2 and Ob3 would collide if they used the same frequency channel, because of their neighboring relationship, whereas Ob3 and Ob6 could use the same frequency channel without collision because they are not neighbors.

Fig. 1. System model

Importantly, because the quality of a wireless channel depends on the location and the channel frequency, choosing a different frequency channel implies a different energy consumption. Denote SCH = {1,2,...,N} as the set of channels and SOb = {1,2,...,M} as the set of Obs. If Obm chooses channel n, then according to the Shannon equation the obtained capacity is given by:

where Wn is the bandwidth of channel n, N0 is the noise power spectral density, Pm is the transmitting power of Obm, dm is the distance between Obm and the destination of its data transmission, γn is the path loss exponent of channel n, and θn,m is the instantaneous random component of the path loss [22] on channel n for Obm.

The transmission power of Obm is then given by:

Due to the lack of central control, collisions may occur when more than one Ob chooses the same channel in a local region. The typical slotted Aloha transmission mechanism is adopted to reduce collisions, where each Ob accesses the channel with probability α. The data transmission of Obm on channel n is successful when no other neighboring Ob attempts to access this channel. Denote Ωm as the set of Obm's neighbors (including Obm itself), and |Ωm| as the number of Obm's neighbors. The probability of successful transmission is given by:

For each Ob, a higher probability of successful transmission and a lower transmission power are both desired. We define the reward of an Ob, e.g., Obm, as follows:

From the perspective of the network, the optimization objective is to maximize the sum of the Obs' utilities:

Remark: From the perspective of each Ob, there is a tradeoff between channel quality and collision probability when making a decision in a network consisting of multiple Obs. Good channels, i.e., channels which consume lower power, are more likely to be chosen by other Obs, which leads to a higher probability of collision. From the perspective of the network, the optimization of network utility and stability is expected. The global optimum of network utility could be obtained in a centralized manner by exhaustive search; however, the computational complexity is extremely high. For example, even in a relatively small scenario where N=5 and M=30, the number of possible strategy profiles of the Obs is 5^30 ≈ 9.3132 × 10^20. The desired method is one which obtains the global optimum in a distributed manner to lower the computational complexity. We solve this problem by using game theory [15], one of the important mathematical tools for analyzing interactions among multiple objects.
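To make the model concrete, the following sketch computes the two quantities that drive an Ob's choice: the transmit power needed to sustain a target rate on a channel (by inverting the standard Shannon capacity with distance-based path loss, in the spirit of equations (1)-(2)), and the slotted-Aloha success probability. The function names, the α value and the contending-neighbor count are illustrative assumptions, not the paper's exact formulas.

```python
import math

def required_power(rate_bps, bandwidth_hz, noise_psd, distance, gamma, theta=1.0):
    """Transmit power needed to reach a target rate, obtained by inverting the
    Shannon capacity C = W * log2(1 + P * d^(-gamma) * theta / (N0 * W)).
    Illustrative form standing in for equations (1)-(2)."""
    snr_needed = 2.0 ** (rate_bps / bandwidth_hz) - 1.0
    return snr_needed * noise_psd * bandwidth_hz * distance ** gamma / theta

def aloha_success_prob(alpha, contenders):
    """Slotted-Aloha success probability: the Ob accesses the channel (probability
    alpha) while none of the other (contenders - 1) contending neighbors do."""
    return alpha * (1.0 - alpha) ** (contenders - 1)

# Example with values from the simulation section: 1 Mbit/s on a 1 MHz channel,
# unit distance, gamma = 2, N0 = -130 dB (alpha = 0.3 is an arbitrary choice here).
n0 = 10 ** (-130 / 10)
p = required_power(1e6, 1e6, n0, distance=1.0, gamma=2.0)
q = aloha_success_prob(alpha=0.3, contenders=4)
print(f"required power: {p:.3e} W, success probability: {q:.3f}")
```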

 

3. The Energy Efficient Neighbor Collaborative Game

In this section, we study the distributed optimization of the formulated energy efficiency problem by using game theory. Every Ob is regarded as a player in the game, and we define the energy efficient neighbor collaborative game (EENCG) as:

where SCH is the set of channels, SOb is the set of Obs, Am is the available strategy set of Obm ∈ SOb, am ∈ Am is the chosen action of Obm, um is the utility function of player Obm, and um(am, a−m) denotes Obm's utility, where am is the action of player Obm and a−m is the action profile of the other players.

Based on the feature of collaboration in the IoT [3], and motivated by local collaboration in biological systems [30], [31] and the collaboration designs in networks [18-20], [23-24], [27], we define the following utility function of a player, e.g., Obm, to obtain the global optimum through distributed decisions:

where φi is the reward of player Obi and Ωm is the set of neighbors of Obm (including Obm itself).
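A minimal sketch of this neighbor-collaborative utility, assuming the per-Ob rewards φi have already been computed (the toy numbers and neighbor sets below are illustrative):

```python
def local_utility(m, rewards, neighbors):
    """u_m: the sum of the rewards phi_i over Obm's neighborhood Omega_m,
    which includes Obm itself."""
    return sum(rewards[i] for i in neighbors[m])

# Toy example: 4 Obs; Ob0's neighborhood is {0, 1, 2}.
rewards = [0.5, 0.2, 0.8, 0.1]
neighbors = {0: [0, 1, 2], 1: [0, 1], 2: [0, 2, 3], 3: [2, 3]}
print(local_utility(0, rewards, neighbors))   # 1.5
```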

Definition 1: Denote an action profile of the players as a = {a1, a2,..., aM}. An action profile is a pure strategy Nash equilibrium (NE) if and only if no player can improve its utility by deviating unilaterally, i.e.,

Theorem 1: The globally optimal solution to the proposed energy efficient channel selection problem (5) is a pure strategy NE of the proposed EENCG game.

Proof: The following proof is based on potential game theory [23]. We define the network utility as the potential function of the EENCG:

When Obm unilaterally changes its action from am to a′m while the other Obs hold their original strategies, the change in the value of the potential function is:

Note that mutual influence does not occur when Obs are not neighbors (Fig. 1). Then for any Obi ∈ {SOb∖Ωm}, i.e., any Ob that is not a neighbor of Obm, there is no influence from Obm's action change, i.e.,

Then we have

This result shows that the change in the potential function equals the change in the individual utility function. Thus, according to the theory in [18], the proposed EENCG game is an exact potential game, which has at least one pure strategy NE point, and the potential function is Γ(am, a−m) = φnet.

Furthermore, according to [23], any global or local maximum of the potential function is a pure strategy NE of an exact potential game. Note that the proposed potential function equals the network utility; therefore the global maximum of the network utility, i.e., the globally optimal solution to the proposed problem (5), is a pure strategy NE of the proposed EENCG game. Hence, the theorem is proved.
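The exact-potential property can also be checked numerically: when one Ob unilaterally switches channels, the change in the network utility (the potential) equals the change in that Ob's neighborhood utility. The sketch below uses an illustrative reward (channel quality divided by the number of contending neighbors) standing in for the reward in equation (4), and assumes a symmetric neighbor relation.

```python
import random

def per_ob_rewards(assign, neighbors, quality):
    """Illustrative per-Ob reward (a stand-in for eq. (4)):
    channel quality divided by the number of neighbors sharing the channel."""
    out = []
    for m, ch in enumerate(assign):
        contenders = sum(1 for i in neighbors[m] if assign[i] == ch)
        out.append(quality[ch] / contenders)
    return out

def network_utility(assign, neighbors, quality):
    return sum(per_ob_rewards(assign, neighbors, quality))

def local_utility(m, assign, neighbors, quality):
    r = per_ob_rewards(assign, neighbors, quality)
    return sum(r[i] for i in neighbors[m])            # u_m = sum of phi_i over Omega_m

random.seed(1)
quality = [1.0, 1.5, 2.0]                             # 3 channels
neighbors = {0: [0, 1, 2], 1: [0, 1, 3], 2: [0, 2],   # symmetric neighborhoods
             3: [1, 3, 4], 4: [3, 4, 5], 5: [4, 5]}
a = [random.randrange(len(quality)) for _ in neighbors]

m = 2
b = list(a)
b[m] = (a[m] + 1) % len(quality)                      # Ob m deviates unilaterally

d_net = network_utility(b, neighbors, quality) - network_utility(a, neighbors, quality)
d_loc = local_utility(m, b, neighbors, quality) - local_utility(m, a, neighbors, quality)
assert abs(d_net - d_loc) < 1e-12                     # exact potential property
print(d_net, d_loc)
```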

 

4. Decentralized Energy Efficient Channel Selection Learning Algorithm

According to the above theoretical analysis, a method which obtains the best NE of the game also achieves the globally optimal solution of our problem. In this section, we design a distributed energy efficient channel selection learning (DEECSL) algorithm to obtain the global optimum in a distributed manner.

For potential games, several learning algorithms can achieve the NE points, such as best response dynamics [28] and no-regret learning [24]. However, these learning algorithms may converge to suboptimal NE points rather than the optimal NE point.

In order to escape from suboptimal NE points and finally converge to the optimal NE, the proposed DEECSL algorithm introduces a probabilistic decision mechanism inspired by Boltzmann exploration strategy [29]. The DEECSL algorithm is shown in Table 1.

Table 1. The proposed DEECSL Algorithm

In step 4, the learning parameter β reflects the tradeoff between exploration and exploitation. A smaller β implies that players are more likely to explore suboptimal actions, whereas a larger β implies that players are prone to choose the best response action. Note that the probabilistic decision in step 4 is designed to prevent the learning algorithm from converging to suboptimal NE points.
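As a sketch of one DEECSL-style iteration consistent with the description of steps 3-4 (an Ob selected uniformly at random, i.e., with probability 1/M, evaluates each candidate channel against its neighbors' current choices and samples its next channel from a Boltzmann distribution with parameter β), using the same illustrative reward stand-in as above:

```python
import math
import random

def local_utility(m, assign, neighbors, quality):
    """Neighborhood utility u_m with the illustrative reward used earlier
    (channel quality divided by the number of contending neighbors)."""
    total = 0.0
    for i in neighbors[m]:
        ch = assign[i]
        contenders = sum(1 for j in neighbors[i] if assign[j] == ch)
        total += quality[ch] / contenders
    return total

def deecsl_step(assign, neighbors, quality, beta):
    """One learning iteration: a uniformly chosen Ob samples its next channel with
    probability proportional to exp(beta * u_m), as in the Boltzmann rule of step 4."""
    m = random.randrange(len(assign))                      # chosen with probability 1/M
    utils = []
    for ch in range(len(quality)):                         # score every candidate channel
        trial = list(assign)
        trial[m] = ch
        utils.append(local_utility(m, trial, neighbors, quality))
    weights = [math.exp(beta * u) for u in utils]
    probs = [w / sum(weights) for w in weights]            # larger beta -> closer to best response
    assign[m] = random.choices(range(len(quality)), weights=probs)[0]
    return assign

random.seed(2)
quality = [1.0, 1.5, 2.0]
neighbors = {0: [0, 1, 2], 1: [0, 1, 3], 2: [0, 2],
             3: [1, 3, 4], 4: [3, 4, 5], 5: [4, 5]}
assign = [random.randrange(len(quality)) for _ in neighbors]
for _ in range(200):
    assign = deecsl_step(assign, neighbors, quality, beta=5.0)
print(assign)
```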

Theorem 2: The proposed DEECSL algorithm converges to a unique stationary distribution of the players' strategy profiles, and the global optimum is achieved with arbitrarily high probability for a sufficiently large β. The unique stationary distribution of the players' strategy profiles is given by:

where Γ(a) is the potential function. A = A1⊗A2⊗...⊗AM is the set of strategy profiles of all the players, where ⊗ is the Cartesian product.
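For reference, the standard stationary distribution of log-linear (Boltzmann) learning, a sketch consistent with the description above (Γ the potential function, A the set of strategy profiles), has the Gibbs form:

```latex
\pi(a) \;=\; \frac{\exp\{\beta\,\Gamma(a)\}}{\sum_{a'\in A}\exp\{\beta\,\Gamma(a')\}},
\qquad a \in A .
```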

Proof: The following proof is based on the theory in [29] and Theorem 6.2 in [30], which is also used in [18], [19], [23]. Denote the network state at the t-th iteration by S(t) = (S1(t), S2(t),...,Sm(t),...,SM(t)), where Sm(t) is Obm's working channel in the t-th iteration. According to the proposed algorithm, S(t + 1) is determined only by S(t), which means that S(t) is a discrete time Markov process with a unique stationary distribution [29].

Denote S(t + 1) = a2 and S(t) = a1, and denote the transition probability from state a1 to a2 as Pa1,a2 and the transition probability from state a2 to a1 as Pa2,a1. In the proposed algorithm (step 3), at most one element differs between a1 and a2.

When a2 = a1, we have:

When a2 ≠ a1, one player changes its working channel, which means that exactly one element of the network state has changed, e.g., Obm's working channel changes from Sm to S′m by adopting action a′m. In step 3, the probability for Obm to be chosen to update its action is 1/M. Then we have:

Similarly, suppose Obm’s working channel from Sm to by adopting action , we have:

Because the game has been proved to be an exact potential game, we have:

Based on (18), we have:

Thus,

Considering all the states of the network, we have:

According to discrete time Markov process theory [29], the proposed algorithm has the unique stationary distribution (14), which satisfies the balance equation of the Markov process.
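The steps above amount to a detailed-balance argument: because the game is an exact potential game, the ratio of the forward and backward transition probabilities between two states that differ in one player's channel equals the ratio of the corresponding Gibbs weights. Schematically (a sketch of the standard argument, with P denoting the transition matrix):

```latex
\begin{align*}
% detailed balance between two states that differ in one player's channel
\exp\{\beta\,\Gamma(a_1)\}\,P_{a_1,a_2} &= \exp\{\beta\,\Gamma(a_2)\}\,P_{a_2,a_1},\\
% summing over a_1 and normalizing gives the balance (stationarity) equation
\sum_{a_1\in A}\pi(a_1)\,P_{a_1,a_2} &= \pi(a_2),
\qquad \pi(a)\propto \exp\{\beta\,\Gamma(a)\}.
\end{align*}
```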

Furthermore, according to Theorem 1, the globally optimal solution of the network utility is exactly the best pure strategy NE of the game. Suppose that aopt is the globally optimal action profile of the players; according to the design of the potential function, we have

As shown above, the algorithm converges to the unique stationary distribution (14). When β → ∞, exp{βΓ(aopt)} ≫ exp{βΓ(a)}, ∀a ∈ {A∖aopt}.

Then the probability of the globally optimal solution aopt will be

This result means that the proposed learning algorithm converges to the global optimum with an arbitrarily high probability. Thus, the proof is completed.

Notably, the information exchange in the proposed algorithm is strictly constrained to neighboring players, which keeps the cost under control. Global information is not needed by any player. In other words, the proposed algorithm achieves the global optimum with only neighboring information exchange.

Furthermore, we analyze the converging speed and the computing complexity of the proposed algorithm, which are two important factors in practical applications.

Theorem 3: The proposed DEECSL algorithm asymptotically converges to the global optimum with geometric speed, and its computing complexity is O(N).

Proof: The following proof is based on the theory and approach in [32], which are also used in [33]. According to Theorem 2 above, the network state S(t) is a discrete time Markov process with a unique stationary distribution, denoted here by Sπ. Denote the set of all possible states of this Markov process as ΞS.

Denote the initial state in the Initialization step of the proposed algorithm as SI. The network state then moves from SI to Sπ through the learning and updating process of the algorithm. Denote the transition matrix of the Markov process as P, which describes the transition probabilities between the possible network states.

Following [32], define the distance between two states S1 and S2 as d(S1, S2), which measures the variation between S1 and S2. Then the distance between SI and Sπ is d(SI, Sπ). Construct the matrix ℵ = (SI, Sπ)^T and define ℑ = (SI^T P, Sπ^T P)^T.

The ergodic coefficient [32] is defined as follows:

According to the properties of the ergodic coefficient [32], we have:

Each updating step in the learning algorithm causes a state transition according to the transition matrix. Suppose the network state is SI^T P^k after the k-th state transition, i.e., after the k-th learning step. Then the distance between the k-th state and the stationary state is:

Furthermore, for δ(P)^k, according to the definition of the ergodic coefficient [27], we have:

where Pij is the transition probability from state Si to Sj.

Regarding the state transitions in the proposed algorithm: according to step 4, a network state transition happens when the selected updating player changes its strategy based on equation (13). Suppose the updating player is Obm, its current strategy is am and the corresponding current network state is Si. Denote its possible next strategy as a′m and the corresponding network state as Sj. According to the mixed strategy (13), the network state transition probability from state Si to Sj is:

where |Am| is the number of possible strategies for Obm.

Consider the biggest gap in utility between any two possible states of the Markov process ΞS. According to the utility function, this gap is a finite nonnegative constant. Then we have:

As a result, we have:

Furthermore:

This means that the algorithm converges from the initial state to the stationary network state with geometric speed.

Regarding the computing complexity: in the proposed algorithm, the computation mainly lies in the strategy selection, i.e., for the selected strategy-updating player Obm, the probabilities of the possible strategies need to be computed by:

There are |Am| ≤ |SCH| = N possible strategies for Obm to be evaluated. Therefore the computing complexity is O(N). Thus, the proof is completed.

It should be noted that a geometric converging speed is acceptable for most applications, and the computing complexity is much smaller than that of the centralized exhaustive search approach, whose computing complexity is O(N^M).
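To put these complexities in perspective, a quick back-of-the-envelope comparison using the example from Section 2 (N = 5 channels, M = 30 Obs):

```python
N, M = 5, 30
per_update_evaluations = N            # DEECSL: one update scores N candidate channels
exhaustive_profiles = N ** M          # centralized exhaustive search enumerates N^M profiles
print(per_update_evaluations)         # 5
print(f"{exhaustive_profiles:.4e}")   # ~9.3132e+20
```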

 

5. Numeric Results and Discussion

In this section, we evaluate the performance of the proposed algorithm by Matlab simulations. Without loss of generality, the following parameters are set. The number of channels is 5. Similar to [31], we set the path loss exponent as γ = 2 and the noise power as N0 = −130 dB. The transmission distance for the Obs is the unit distance for simplicity. The transmission data rate is 1 Mbit/s. The instantaneous random components are assumed to be unit constants and the channels are assumed to undergo Rayleigh fading with unit mean. The neighboring relationship of the Obs is randomly generated. The bandwidths of the channels are set as [1, 1.5, 2, 2.5, 3] MHz to simulate different channels. The results are obtained by running 3000 independent experiments and taking the average.

Table 2. Simulation Parameters
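As a sketch, a minimal parameter setup consistent with the values listed above (the dictionary keys below are illustrative) would be:

```python
sim_params = {
    "num_channels": 5,
    "channel_bandwidths_MHz": [1, 1.5, 2, 2.5, 3],
    "path_loss_exponent": 2,
    "noise_power_dB": -130,
    "transmission_distance": 1,            # unit distance
    "data_rate_Mbps": 1,
    "fading": "Rayleigh, unit mean",
    "neighbor_topology": "randomly generated",
    "num_independent_runs": 3000,
}
```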

The simulations show the performance of the proposed DEECSL algorithm compared with other approaches and verify the theoretical analysis of the proposed algorithm, i.e., its convergence and optimality. We first verify the convergence and the optimality of the proposed algorithm in a relatively small scenario. Then we present a further performance comparison in a relatively large scenario to verify the adaptability of the proposed algorithm.

To show the convergence and the optimality of the proposed algorithm, we compare its performance with the globally optimal solution obtained by exhaustive search. We simulate a 15-player scenario so that the exhaustive search remains within the available computing capacity. Fig. 2 shows that the proposed DEECSL algorithm converges to the globally optimal network utility after some learning iterations. This result verifies the theoretical analysis of the proposed approach.

Fig. 2. The convergence and optimality of the proposed DEECSL algorithm, M=15.

To show the advantage of the collaboration design in the proposed DEECSL algorithm, we also show the performance of the traditional non-collaboration approach, which is widely adopted in game theoretic learning algorithms. In the non-collaboration approach, each player does not consider the other neighboring players' benefits and makes its decision based only on its own reward. The results show that the proposed collaborative DEECSL algorithm clearly outperforms the non-collaboration approach. The main reason is that players in the non-collaboration approach do not consider the global reward, which also results in the instability of the non-collaboration approach. This comparison verifies the necessity and the advantage of the collaboration based utility function design in the proposed game model. In other words, without this utility function design, the system would not converge to a stable state and the performance would be worse.

To show the advantage of the probabilistic decision design in the proposed algorithm, we compare it with the best response (BR) algorithm [28]. Fig. 2 also shows that the proposed DEECSL algorithm achieves better performance than the BR algorithm. Here the BR algorithm is also based on the collaborative approach. The BR algorithm is a typical learning algorithm for games to achieve NE points. The difference between the BR algorithm and the proposed DEECSL algorithm is that players in the BR algorithm choose the best strategy from all the candidate strategies rather than making the probabilistic decision according to equation (13). Due to this "choosing the best" strategy, the BR algorithm's performance increases faster in the beginning and it converges earlier. However, because players always choose the currently best strategies, they may miss some potentially better strategies in the long term. As a result, the BR learning algorithm may converge to a local optimum, which reduces the converged performance.

Compared with the BR algorithm, the proposed algorithm can escape from local optima and achieve the global optimum thanks to the probabilistic decision design. In other words, the probabilistic decision based on equation (13), rather than always choosing the current best strategy, gives players the chance to make the right decision in the long term.

To verify the proposed algorithm further, we simulate a larger network where the number of players is 50. Due to the limitation of computation capability, the globally optimal solution by exhaustive search is not given. We compare the proposed algorithm with the BR learning algorithm and the non-collaboration approach. Fig. 3 shows the convergence of the proposed DEECSL algorithm again, and the proposed DEECSL algorithm again achieves better performance than the BR algorithm. The non-collaboration approach achieves the worst performance and is unstable owing to the lack of collaboration. This result again verifies the theoretical analysis of the proposed algorithm and its performance advantage over the other approaches, which means that the proposed approach is widely applicable in different scenarios.

Fig. 3. The proposed DEECSL algorithm vs the BR learning algorithm, M=50.

 

6. Conclusion

In this paper, we studied distributed energy efficient resource management in the Internet of Things. We proposed an energy efficient neighbor collaborative game model and proved that the proposed game is an exact potential game. Then, to achieve the global optimum of the proposed energy efficient channel selection problem in a distributed manner, we designed a distributed energy efficient channel selection learning algorithm that requires only neighboring information exchange. We proved that the proposed DEECSL algorithm asymptotically converges to the global optimum with geometric speed. Simulation results verified the theoretical analysis and the performance of the proposed algorithm.
