
FCBAFL: An Energy-Conserving Federated Learning Approach in Industrial Internet of Things

  • Bin Qiu (School of Computer Science and Engineering, Guilin University of Technology) ;
  • Duan Li (School of Computer Science and Engineering, Guilin University of Technology) ;
  • Xian Li (School of Electronics and Information Engineering, Shenzhen University) ;
  • Hailin Xiao (School of Computer Science and Information Engineering, Hubei University)
  • Received : 2024.04.01
  • Accepted : 2024.08.18
  • Published : 2024.09.30

Abstract

Federated learning (FL) has been proposed as an emerging distributed machine learning framework, which lowers the risk of privacy leakage by training models without uploading raw data, and has therefore been widely adopted in the Industrial Internet of Things (IIoT). Despite this, FL still faces challenges, including non-independent and identically distributed (Non-IID) data and device heterogeneity, which may hinder model convergence. To address these issues, a local surrogate function is first constructed for each device to ensure a smooth decline in the global loss. Then, aiming to minimize the system energy consumption, an FL approach with joint CPU frequency control and bandwidth allocation, called FCBAFL, is proposed. Specifically, the maximum delay of a single round is treated as a uniform delay constraint, and a limited-memory Broyden-Fletcher-Goldfarb-Shanno bounded (L-BFGS-B) algorithm is employed to find the optimal bandwidth allocation under a fixed CPU frequency. The result is then used to derive the optimal CPU frequency. Numerical simulation results show that the proposed FCBAFL algorithm converges better than the baseline algorithm and outperforms other schemes in reducing energy consumption.

Keywords

1. Introduction

Traditional industrial manufacturing is becoming increasingly intelligent and information-driven owing to the rapid progress of the Internet of Things (IoT) and artificial intelligence (AI) [1-2]. As a result, the Industrial Internet of Things (IIoT) has emerged, with billions of diverse industrial devices connected at the network edge [3]. Moreover, a large number of industrial tasks rely on real-time monitoring and decision support, for which various sensors and controllers generate massive amounts of data [4-5]. Since these data usually contain private information [6], it is essential to protect such information from leakage during data processing. The traditional centralized machine learning (ML) approach may cause privacy leakage because it collects all the data into a data center for uniform handling [7-8]. For example, some IIoT applications involve the location information of transportation vehicles, and unauthorized access to or leakage of such location data may threaten the security and privacy of supply chains [9]. Moreover, massive industrial equipment data would need to be transmitted over the wireless network, which incurs unacceptable communication traffic [10].

As a distributed ML paradigm, FL is expected to alleviate the concerns about data privacy leakage and huge overhead posed by centralized ML [11]. This technology deploys ML models on multiple devices [12-13] and allows them to cooperatively construct a shared learning model while keeping all training data on the edge devices [14]. FL only needs to send model parameters to the data center, which significantly reduces cost while protecting data privacy well. Consequently, FL has been quickly applied and developed [15-24]. On one hand, advances in memory elements provide favorable conditions for collecting and storing large amounts of data. On the other hand, edge devices are typically equipped with high-performance chips, which makes it easy to complete the model training tasks in FL.

Despite the above advantages, FL still faces a couple of challenges that need to be tackled. In practice, the data distribution across devices is usually non-independent and identically distributed (Non-IID) [15]. Many researchers have devoted considerable effort to addressing this problem [16-18]. For example, the widely used FedAvg algorithm employs an averaged stochastic gradient descent (SGD) strategy [16]; it performs well on heterogeneous data but still has limitations due to the lack of convergence analysis. Based on a derived upper bound on the expected weight divergence, Zhao et al. [17] introduced a federated averaging approach that mitigates the distribution divergence present in Non-IID data. You et al. [18] proposed federated gradient scheduling, an enhanced method based on historical gradient sampling; it gathers and samples user gradients to address the Non-IID problem, and also exploits differential privacy techniques to strengthen privacy protection. Nevertheless, the above methods for the Non-IID problem ignore the massive cost incurred in FL.

In reality, FL requires intensive exchanges of model updates between the server and clients during model training [19]. Unfortunately, deep learning models have grown increasingly large in recent years, often containing billions or even trillions of learnable parameters, which incurs huge communication and computing overheads [20-21]. Motivated by this, many researchers have studied energy conservation in wireless FL networks [22-24]. Alishahi et al. [22] adopted a low-complexity bisection algorithm and jointly considered the various resources of the devices to minimize the overall energy consumption of a wireless FL network. Kim et al. [23] proposed a joint dataset and computation management scheme that integrates learning efficiency and global energy consumption. To lower the communication cost caused by frequent model interactions in FL, Malan et al. [24] proposed a novel approach combining FL and gradual layer freezing, which reduces the transmission cost while guaranteeing training performance. However, most of these works ignore the performance differences among devices.

To solve the above problems for FL in IIoT systems, a novel solution is proposed in this article. To handle the heterogeneous data, we assume that the loss function is smooth and strongly convex for theoretical analysis. Considering local computation and model transmission, we first investigate the computation delay and energy consumption of heterogeneous devices. The ultimate objective of this article is then formulated as an energy minimization problem under a delay constraint. Owing to the nonconvexity of the original problem, we decompose it into two simpler subproblems, namely a CPU frequency control subproblem and a bandwidth allocation subproblem, and devise an iterative method to solve them. Finally, the performance of the proposed scheme is evaluated through a wide range of experiments. In summary, the contributions of this article are mainly as follows:

(1) We propose a novel FL algorithm to address the challenges of Non-IID data and device heterogeneity in the IIoT scenario. Specifically, a local proxy function is constructed for each device to ensure a smooth decline in the global loss, a hyperparameter η is introduced to trade off between local and global gradient estimation, and the linear convergence of the method is proved theoretically.

(2) To cope with the high energy consumption caused by model training in FL, a joint CPU frequency control and bandwidth allocation approach, called FCBAFL, is proposed. Considering synchronous communication and the limited bandwidth resource, the ultimate goal of this article is characterized as minimizing energy consumption under a delay constraint. Owing to the nonconvexity of the original problem, the proposed scheme splits it into two subproblems.

(3) Extensive numerical simulations show that FCBAFL achieves faster and more stable convergence than the baseline algorithm on the unbalanced MNIST dataset. Moreover, the proposed algorithm can reduce the energy consumption of the IIoT system to a certain extent while satisfying various delay constraints.

The remainder of the article is organized as follows. Section 2 presents the system model and problem formulation. Section 3 provides the solution to the problem. Section 4 offers experimental numerical results that demonstrate the superiority of the proposed method. Section 5 summarizes the work of this article.

2. System Model

We consider an FL system in an IIoT scenario, which includes a central server and multiple edge devices, as shown in Fig. 1. The central server is located in the center of the region and provides basic model communication and aggregation services. The set of edge devices is defined as Ν = {1, 2,..., N}, where N denotes the number of edge devices. The edge devices generate local models via local training. The dataset produced on device n is expressed as \(\begin{align}D_{n}=\left\{\left(x_{i}, y_{i}\right)\right\}_{i=1}^{d_{n}}\end{align}\), where xi is the i-th input sample and yi denotes its output label, and dn represents the data size of Dn. Therefore, the local loss function on device n's dataset can be expressed as

\(\begin{align}F_{n}(\omega)=\frac{\sum_{\left(x_{i}, y_{i}\right) \in D_{n}} f_{n}\left(\omega, x_{i}, y_{i}\right)}{d_{n}}\end{align}\)       (1)

Fig. 1. System model.

where ω and fn(ω, xi, yi) represent the model parameter and loss function, respectively. The purpose of FL is to find an optimal model parameter ω* ∈ Rd to minimize the global loss function, denoted as

\(\begin{align}\omega^{*}=\underset{\omega}{\arg \min } F(\omega) \stackrel{\Delta}{=} \underset{\omega}{\arg \min } \frac{\sum_{n \in \mathrm{~N}} d_{n} F_{n}(\omega)}{\sum_{n \in \mathrm{~N}} d_{n}}\end{align}\)       (2)
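For concreteness, the following is a minimal PyTorch sketch of the objective in (2), assuming each device n exposes a callable that evaluates its local loss Fn(ω) of (1) over Dn; the names global_loss, local_loss_fns and data_sizes are illustrative.

```python
import torch

def global_loss(model, local_loss_fns, data_sizes):
    """Weighted global loss F(w) of Eq. (2).

    local_loss_fns[n](model) is assumed to return F_n(w) of Eq. (1),
    i.e., the average loss of `model` over device n's dataset D_n.
    """
    d = torch.tensor(data_sizes, dtype=torch.float32)
    losses = torch.stack([loss_fn(model) for loss_fn in local_loss_fns])
    return torch.sum(d * losses) / torch.sum(d)
```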

2.1 FL Process

To facilitate the analysis, this article assumes that the local loss function is smooth and strongly convex.

Assumption 1: Fn(⋅) is L-smooth, ∀ω, ω' ∈ Rd

\(\begin{align}F_{n}(\omega) \leq F_{n}\left(\omega^{\prime}\right)+<\nabla F_{n}\left(\omega^{\prime}\right), \omega-\omega^{\prime}>+\frac{L}{2}\left\|\omega-\omega^{\prime}\right\|^{2}\end{align}\)       (3)

Assumption 2: Fn(⋅) is μ-strongly convex (when the Hessian of Fn(⋅) is only positive semi-definite, strong convexity can be ensured by adding an l2 regularizer), ∀ω, ω' ∈ Rd

\(\begin{align}F_{n}(\omega) \geq F_{n}\left(\omega^{\prime}\right)+<\nabla F_{n}\left(\omega^{\prime}\right), \omega-\omega^{\prime}>+\frac{\mu}{2}\left\|\omega-\omega^{\prime}\right\|^{2}\end{align}\)       (4)

These assumptions hold, for example, for the l2-regularized linear regression model \(\begin{align}f_{n}(\omega)=\frac{1}{2}\left(\left\langle x_{i}, \omega\right\rangle-y_{i}\right)^{2}+\frac{\mu}{2}\|\omega\|^{2}\end{align}\). Let ρ = L/µ denote the condition number of Fn(⋅)'s Hessian matrix. In each round of iteration, the central server interacts with the edge devices through the following process.

2.1.1 Local model updates

Device n first receives the global model parameter and gradient information (given in (10) and (11) below, respectively) from the previous global round and minimizes the proxy loss function, which is denoted as

\(\begin{align}\min L_{n}^{t}(\omega)=F_{n}(\omega)+<\eta \nabla \bar{F}^{t-1}-\nabla F_{n}\left(\omega^{t-1}\right), \omega>\end{align}\)       (5)

Instead of ∇F(ωt-1), we utilize \(\begin{align}\nabla \bar{F}^{t-1}\end{align}\) as the estimate of the global gradient, because the latter can be obtained by the server from the edge device information, while the former is unrealistic to obtain. Moreover, we have

\(\begin{align}\nabla L_{n}^{t}(\omega)=\nabla F_{n}(\omega)+\eta \nabla \bar{F}^{t-1}-\nabla F_{n}\left(\omega^{t-1}\right)\end{align}\)       (6)

where a variable hyperparameter η is introduced to achieve a weighted estimation of the local and global gradients. The key idea of this algorithm is that each device solves (5) and obtains an approximate solution \(\begin{align}\omega_{n}^{t}\end{align}\) that satisfies

||∇Ltntn)||≤θ||∇Ltnt−1)||, ∀n       (7)

Here, θ ∈ [0,1] denotes the local accuracy. Note that \(\begin{align}L_{n}^{t}(\omega)\end{align}\) is also L-smooth and μ-strongly convex because it has the same Hessian matrix as Fn(⋅). Therefore, the objective in (5) can be solved by the gradient descent algorithm, which achieves linear convergence [25], as follows

\(\begin{align}L_{n}^{t}\left(x_{k}\right)-L_{n}^{t}\left(x^{*}\right) \leq c(1-h)^{k}\left(L_{n}^{t}\left(x_{0}\right)-L_{n}^{t}\left(x^{*}\right)\right)\end{align}\)       (8)

where xk and x* represent the model of the k-th local iteration and the optimal solution of (5), respectively. Both c and h are constants whose values depend on ρ. According to [14], problem (5) can be solved while satisfying (7) within Ql rounds, which is given by

\(\begin{align}Q_{l}=\frac{2}{h} \log \frac{c \rho}{\theta}\end{align}\)       (9)
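The local update described by (5)-(9) can be sketched as plain gradient descent on the surrogate. The helper below is illustrative: it uses a fixed number of Ql steps in place of the accuracy check (7), and local_grad_fn stands in for the device's gradient oracle.

```python
import torch

def local_update(w_prev, grad_global, local_grad_fn, eta, lr, Q_l):
    """Gradient descent on the local surrogate L_n^t of Eq. (5).

    w_prev and grad_global are the broadcast global model w^{t-1} and the
    aggregated gradient estimate of Eq. (11); local_grad_fn(w) is assumed
    to return the local gradient grad F_n(w).
    """
    correction = eta * grad_global - local_grad_fn(w_prev)  # constant term in Eq. (6)
    w = w_prev.clone()
    for _ in range(Q_l):
        g = local_grad_fn(w) + correction                   # surrogate gradient, Eq. (6)
        w = w - lr * g
    return w, local_grad_fn(w)          # upload w_n^t and grad F_n(w_n^t)
```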

2.1.2 Global model updates

The local model ωtn and gradient ∇Fntn) at each edge device are sent to the central server for aggregating a new global model and gradient, which are denoted respectively as

\(\begin{align}\omega^{t}=\frac{\sum_{n \in \mathrm{~N}} d_{n} \omega_{n}^{t}}{\sum_{n \in \mathrm{~N}} d_{n}}\end{align}\)       (10)

\(\begin{align}\nabla F^{t}=\frac{\sum_{n \in \mathrm{~N}} d_{n} \nabla F_{n}\left(\omega_{n}^{t}\right)}{\sum_{n \in \mathrm{~N}} d_{n}}\end{align}\)       (11)

The central server then broadcasts them to all devices. This process is repeated for Qg rounds until the global objective in (2) converges; that is, for an arbitrarily small constant ε > 0, (12) is satisfied

\(\begin{align}F\left(\omega^{t}\right)-F\left(\omega^{*}\right) \leq \varepsilon, \forall t \geq Q_{g}\end{align}\)       (12)

where ω* is the optimal solution to (2). The number of global rounds required for the global model to reach this convergence condition is

\(\begin{align}Q_{g}=\frac{1}{\theta} \log \frac{F\left(\omega^{0}\right)-F\left(\omega^{*}\right)}{\varepsilon}\end{align}\)       (13)

The proof of this conclusion can also be found in [14].
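The following is a minimal sketch of the server-side aggregation in (10) and (11), assuming the local models and gradients arrive as flattened PyTorch tensors of the same length.

```python
import torch

def aggregate(local_models, local_grads, data_sizes):
    """Server-side aggregation of Eqs. (10) and (11)."""
    d = torch.tensor(data_sizes, dtype=torch.float32)
    weights = (d / d.sum()).unsqueeze(1)                        # d_n / sum(d_n), shape (N, 1)
    w_new = (weights * torch.stack(local_models)).sum(dim=0)    # Eq. (10)
    g_new = (weights * torch.stack(local_grads)).sum(dim=0)     # Eq. (11)
    return w_new, g_new
```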

2.2 Computation Model and Communication Model

In the local model updating phase, the computation delay and energy consumption generated by a single local training of device n are respectively

\(\begin{align}T_{n}^{c p}=\frac{c_{n} d_{n}}{f_{n}}\end{align}\)       (14)

\(\begin{align}E_{n}^{c p}=c_{n} d_{n} f_{n}^{2} \sigma\end{align}\)       (15)

where cn and fn denote the number of CPU cycles required for device n to compute a data unit and its operating frequency, respectively. Let σ denote the effective capacitance coefficient of the devices' computation module [26].
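As a minimal sketch, the per-iteration computation cost of a device can be evaluated directly from (14) and (15); the function name and argument order are illustrative.

```python
def computation_cost(c_n, d_n, f_n, sigma):
    """Per-iteration computation delay (14) and energy (15) of device n,
    with c_n in cycles per data unit, d_n the local data size and f_n the
    CPU frequency in Hz."""
    t_cp = c_n * d_n / f_n
    e_cp = c_n * d_n * f_n ** 2 * sigma
    return t_cp, e_cp
```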

The communication process includes two stages, uplink and downlink, for local model uploading and global model broadcasting, respectively. Since the power and bandwidth of the central server are much larger than those of the devices, the downlink broadcasting delay can be ignored compared with the uplink transmission delay. All devices upload models based on the orthogonal frequency division multiple access protocol after completing local training. According to the Shannon formula, the data transmission rate of device n is denoted as

\(\begin{align}R_{n}=b_{n} B \log _{2}\left(1+\frac{p_{n} h_{n}}{b_{n} B N_{0}}\right)\end{align}\)       (16)

where B represents the total bandwidth of the central server, bn ∈ (0,1) is the proportion of bandwidth allocated by the central server for device n in each round of communication. pn and hn denote the transmission power and channel gain of device n, respectively. And N0 is the noise power spectral density. Then, the transmission delay and energy consumption incurred by device n during a communication round are respectively

\(\begin{align}T_{n}^{c o}=\frac{s_{n}}{R_{n}}\end{align}\)       (17)

\(\begin{align}E_{n}^{c o}=p_{n} T_{n}^{c o}=\frac{p_{n} S_{n}}{b_{n} B \log _{2}\left(1+\frac{p_{n} h_{n}}{b_{n} B N_{0}}\right)}\end{align}\)       (18)

where sn represents the data size of the model parameter \(\begin{align}\omega_{n}\end{align}\) and gradient \(\begin{align}\nabla F_{n}\left(\omega_{n}\right)\end{align}\) uploaded by device n.
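Similarly, a minimal sketch of the uplink cost of device n from (16)-(18), assuming N0 is expressed so that bnBN0 gives the noise power:

```python
import numpy as np

def communication_cost(b_n, p_n, h_n, s_n, B, N0):
    """Uplink rate (16), transmission delay (17) and energy (18) of device n."""
    r_n = b_n * B * np.log2(1.0 + p_n * h_n / (b_n * B * N0))
    t_co = s_n / r_n
    e_co = p_n * t_co
    return r_n, t_co, e_co
```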

2.3 Unified Model

Note that the global model can only be updated after all local models have been received, which can be presented as

\(\begin{align}\begin{aligned} T_{g} & =\max _{n \in \mathrm{~N}}\left(T_{n}^{c o}+Q_{l} T_{n}^{c p}\right) \\ & =\max _{n \in \mathrm{~N}}\left\{\frac{s_{n}}{b_{n} B \log _{2}\left(1+\frac{p_{n} h_{n}}{b_{n} B N_{0}}\right)}+\frac{Q_{l} c_{n} d_{n}}{f_{n}}\right\}\end{aligned}\end{align}\)       (19)

We should note that Tg must satisfy the QoS requirement QT, which is expressed as

\(\begin{align}\max _{n \in \mathrm{~N}}\left\{\frac{s_{n}}{b_{n} B \log _{2}\left(1+\frac{p_{n} h_{n}}{b_{n} B N_{0}}\right)}+\frac{Q_{l} c_{n} d_{n}}{f_{n}}\right\} \leq Q_{T}\end{align}\)       (20)

and it can be transformed into

\(\begin{align}\frac{s_{n}}{b_{n} B \log _{2}\left(1+\frac{p_{n} h_{n}}{b_{n} B N_{0}}\right)}+\frac{Q_{l} c_{n} d_{n}}{f_{n}} \leq Q_{T}, \forall n \in \mathrm{~N}\end{align}\)       (21)

Similarly, the energy consumption of all devices in a single FL round can be calculated as

\(\begin{align}\begin{aligned} E_{g} & =\sum_{n \in \mathbb{N}}\left(E_{n}^{c o}+Q_{l} E_{n}^{c p}\right) \\ & =\sum_{n \in \mathrm{~N}}\left[\frac{p_{n} s_{n}}{b_{n} B \log _{2}\left(1+\frac{p_{n} h_{n}}{b_{n} B N_{0}}\right)}+Q_{l} c_{n} d_{n} f_{n}^{2} \sigma\right]\end{aligned}\end{align}\)       (22)
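Putting the two cost models together, the following is a minimal NumPy sketch of the per-round delay (19) and energy (22), vectorized over the N devices; the array-based interface is an illustrative choice.

```python
import numpy as np

def round_cost(b, f, p, h, s, c, d, B, N0, sigma, Q_l):
    """Per-round delay T_g of Eq. (19) and energy E_g of Eq. (22);
    b, f, p, h, s, c, d are length-N NumPy arrays."""
    r = b * B * np.log2(1.0 + p * h / (b * B * N0))   # Eq. (16)
    t_co, e_co = s / r, p * s / r                     # Eqs. (17)-(18)
    t_cp, e_cp = c * d / f, sigma * c * d * f ** 2    # Eqs. (14)-(15)
    T_g = np.max(t_co + Q_l * t_cp)                   # Eq. (19)
    E_g = np.sum(e_co + Q_l * e_cp)                   # Eq. (22)
    return T_g, E_g
```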

To facilitate the description of the problem below, let b = {b1, b2, ..., bN} denote the set of bandwidth proportions allocated by the central server to all devices in a round of FL communication, and let f = {f1, f2, ..., fN} denote the set of CPU frequencies of all devices.

2.4 Problem Formulation

Apparently, the goals of minimizing system latency and energy consumption cannot be satisfied at the same time. For example, to decrease the training delay, an edge device needs to operate at a high frequency, which leads to higher energy consumption. Thus, the problem we address is minimizing the energy consumption while meeting the delay constraint, which is given by

\(\begin{align}\boldsymbol{P}_{0}: \min _{b, f} \sum_{n \in \mathrm{~N}}\left[\frac{p_{n} s_{n}}{b_{n} B \log _{2}\left(1+\frac{p_{n} h_{n}}{b_{n} B N_{0}}\right)}+Q_{l} c_{n} d_{n} f_{n}^{2} \sigma\right]\end{align}\)       (23)

s.t. bn ∈ (0,1), ∀n∈Ν       (23a)

\(\begin{align}f_{n}^{\min } \leq f_{n} \leq f_{n}^{\max }, \forall n \in \mathrm{~N}\end{align}\)       (23b)

\(\begin{align}p_{n}^{\min } \leq p_{n} \leq p_{n}^{\max }, \forall n \in \mathrm{~N}\end{align}\)       (23c)

\(\begin{align}\sum_{n \in \mathbb{N}} b_{n}=1\end{align}\)       (23d)

\(\begin{align}\frac{s_{n}}{b_{n} B \log _{2}\left(1+\frac{p_{n} h_{n}}{b_{n} B N_{0}}\right)}+\frac{Q_{l} c_{n} d_{n}}{f_{n}} \leq Q_{T}, \forall n \in \mathrm{~N}\end{align}\)       (23e)

where constraint (23a) restricts the proportion of bandwidth allocated by the central server to device n in one round of communication. The ranges of the CPU frequency and transmission power of each device are indicated by constraints (23b) and (23c), respectively. Constraint (23d) represents that the bandwidth of the central server is fully exploited in each round of communication. Constraint (23e) indicates the QoS requirement, i.e., the delay constraint.

3. Solution to the Problem

In this section, we present an FL approach based on CPU frequency control and bandwidth allocation to address the issue of minimizing energy consumption under the delay constraint.

Apparently, P0 is challenging because the variables bn and fn are coupled with each other in (23e). Thus, we decompose it into two subproblems: the energy consumption minimization for global model aggregation and local model training, which are respectively expressed as

\(\begin{align}\boldsymbol{P}_{1}: \min _{b} \sum_{n \in \mathrm{~N}} \frac{p_{n} s_{n}}{b_{n} B \log _{2}\left(1+\frac{p_{n} h_{n}}{b_{n} B N_{0}}\right)}\end{align}\)       (24)

s.t. (23a), (23c), (23d), (23e)       (24a)

\(\begin{align}\boldsymbol{P}_{2}: \min _{f} \sum_{n \in \mathbb{N}} Q_{l} c_{n} d_{n} f_{n}^{2} \sigma\end{align}\)       (25)

s.t. (23b), (23e)       (25a)

For simplicity, the transmit power pn of each device is set to a uniform constant value p*n, and the size of the model parameters sn is constant. Obviously, by fixing the CPU frequency at f*n, P1 becomes a bandwidth allocation problem under multiple bounds, which can be transformed into

\(\begin{align}\boldsymbol{P}_{1}^{\prime}: \min _{\boldsymbol{b}} \sum_{n \in \mathbb{N}} \frac{p_{n}^{*} s_{n}}{b_{n} B \log _{2}\left(1+\frac{p_{n}^{*} h_{n}}{b_{n} B N_{0}}\right)}\end{align}\)       (26)

s.t. \(\begin{align}\frac{s_{n}}{b_{n} B \log _{2}\left(1+\frac{p_{n}^{*} h_{n}}{b_{n} B N_{0}}\right)}+\frac{Q_{l} c_{n} d_{n}}{f_{n}^{*}} \leq Q_{T}\end{align}\)       (26a)

Then, a limited-memory bounded quasi-Newton method called L-BFGS-B is utilized to solve it [27]. It is an extension of the L-BFGS algorithm, in which the scaling matrix stores only the information of the most recent iterations and is updated after each iteration, which greatly reduces the memory required for computation. However, L-BFGS itself is only suitable for unconstrained optimization problems. To overcome this limitation, L-BFGS-B employs strategies such as backtracking and line search with a limited maximum step size to handle constrained optimization problems. Thus, it offers reasonable memory requirements, small per-iteration cost, and fast computation.

For the convenience of description, we rewrite P'1 as a function with b as the independent variable, denoted by f(b). The specific flow of the L-BFGS-B algorithm is as follows (a minimal code sketch is given after the stopping conditions below):

step 1: Set the initial value b0 and an integer m that determines the number of limited-memory correction pairs stored, define the initial limited-memory matrix, and let k = 0.

step 2: Calculate the gradient using the chain rule.

\(\begin{align}\nabla f(\boldsymbol{b})=\left(\frac{\partial f}{\partial b_{1}}, \frac{\partial f}{\partial b_{2}}, \ldots, \frac{\partial f}{\partial b_{N}}\right)\end{align}\)       (27)

step 3: Calculate the search direction by the direct method.

pk = -(Hk∇f(bk))       (28)

where Hk denotes the limited-memory approximation of the inverse Hessian matrix at the k-th iteration.

step 4: A line search is performed along the direction pk, the step-size factor αk is computed, and the parameter is updated to reduce the function value.

bk+1 = bk + αkpk       (29)

step 5: Update Hk and check for convergence.

The optimal solution b* of P'1 can be obtained by repeating the above steps until one of the following three conditions is met:

(i). The maximum number of iterations is reached.

(ii). The reduction of the objective function falls below a given tolerance.

(iii). The norm of the projected gradient is sufficiently small.
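A minimal sketch of solving P'1 with SciPy's L-BFGS-B implementation is given below. The box bounds realize (23a); handling the sum constraint (23d) by a quadratic penalty plus renormalization and omitting the delay constraint (26a) are simplifications made for illustration, not the paper's exact procedure.

```python
import numpy as np
from scipy.optimize import minimize

def solve_bandwidth(p, s, h, B, N0, b_min=1e-3, penalty=1e3):
    """Minimize the uplink energy of Eq. (26) over b with L-BFGS-B.

    p, s, h are length-N NumPy arrays of transmit powers, upload sizes
    and channel gains; the returned b sums to 1 after renormalization.
    """
    N = len(p)

    def objective(b):
        rate = b * B * np.log2(1.0 + p * h / (b * B * N0))
        return np.sum(p * s / rate) + penalty * (np.sum(b) - 1.0) ** 2

    res = minimize(objective, np.full(N, 1.0 / N), method="L-BFGS-B",
                   bounds=[(b_min, 1.0)] * N, options={"maxiter": 200})
    return res.x / np.sum(res.x)       # project back onto sum(b) = 1
```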

Similarly, by substituting the optimal bandwidth allocation b* into constraint (23e), P2 can be converted into

\(\begin{align}\boldsymbol{P}_{2}^{\prime}: \min _{f} \sum_{n \in \mathrm{~N}} Q_{l} c_{n} d_{n} f_{n}^{2} \sigma\end{align}\)       (30)

s.t. \(\begin{align}\max \left\{f_{n}^{\min }, f_{n}^{*}\right\} \leq f_{n} \leq f_{n}^{\max }, \forall n \in \mathrm{~N}\end{align}\)       (30a)

where

\(\begin{align}f_{n}^{*}=\frac{Q_{l} c_{n} d_{n}}{Q_{T}-\frac{s_{n}}{b_{n}^{*} B \log _{2}\left(1+\frac{p_{n}^{*} h_{n}}{b_{n}^{*} B N_{0}}\right)}}\end{align}\)       (31)

Here, \(\begin{align}f_{n}=\max \left\{f_{n}^{\min }, f_{n}^{*}\right\}\end{align}\) is the closed-form solution of P'2, which is referred to as the optimal CPU frequency in this article. The proof is simple: since fn is always positive and the objective function in P'2 is monotonically increasing with respect to fn, fn should take the minimum value in its feasible domain (30a). In conjunction with the previous section, the detailed flow of the FCBAFL algorithm is shown in Algorithm 1.
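The following is a minimal sketch of the closed-form frequency rule in (30a)-(31), assuming the delay budget QT exceeds every device's uplink delay so the denominator in (31) stays positive.

```python
import numpy as np

def optimal_frequency(b, p, s, h, c, d, B, N0, Q_l, Q_T, f_min, f_max):
    """Closed-form CPU frequencies of P2': Eq. (31) clipped to (30a)."""
    t_co = s / (b * B * np.log2(1.0 + p * h / (b * B * N0)))   # Eq. (17)
    f_star = Q_l * c * d / (Q_T - t_co)                        # Eq. (31)
    return np.minimum(np.maximum(f_star, f_min), f_max)        # Eq. (30a)
```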

Algorithm 1 FCBAFL

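A minimal Python sketch of one FCBAFL round is given below, chaining the helpers sketched in the previous sections; the `sys` parameter bundle and the per-device interface (grad_fn, data_size) are illustrative assumptions.

```python
def fcbafl_round(devices, model, grad_global, eta, lr, Q_l, Q_T, sys):
    """One FCBAFL round: resource allocation followed by FL training.

    `sys` is assumed to bundle the per-device arrays p, s, h, c, d and the
    scalars B, N0, f_min, f_max of Section 2; all names are illustrative.
    """
    # Step 1: bandwidth allocation with fixed CPU frequency (P1', L-BFGS-B)
    b = solve_bandwidth(sys.p, sys.s, sys.h, sys.B, sys.N0)
    # Step 2: closed-form CPU frequencies under the delay constraint (P2', Eq. (31))
    f = optimal_frequency(b, sys.p, sys.s, sys.h, sys.c, sys.d,
                          sys.B, sys.N0, Q_l, Q_T, sys.f_min, sys.f_max)
    # Step 3: local surrogate updates on every device, Eqs. (5)-(9)
    results = [local_update(model, grad_global, dev.grad_fn, eta, lr, Q_l)
               for dev in devices]
    local_models, local_grads = map(list, zip(*results))
    # Step 4: server-side aggregation, Eqs. (10)-(11)
    model, grad_global = aggregate(local_models, local_grads,
                                   [dev.data_size for dev in devices])
    return model, grad_global, b, f
```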

4. Simulation Results

In the simulation experiments, we implement the FCBAFL algorithm in PyTorch. The IIoT scenario considered in this article contains a central server and N = 100 edge devices. The bandwidth B of the central server is 1 MHz. Each device's channel gain is generated as Rayleigh fading, and the path-loss model is 128.1 + 37.6log10(d[km]) [28]. The noise power spectral density is -174 dBm/MHz. The transmit power pn of each device is a uniform constant, i.e., p*n = 0.5 W, and the effective capacitance coefficient is σ = 10^-23. The amount of data dn on each device follows a uniform distribution over [5, 10] MB, and the data are randomly divided, with 75% for training and 25% for testing. The size sn of the model parameters and gradient uploaded by each device is about 5×10^5 bits. cn, the number of CPU cycles needed by a device to compute a unit of data, follows a uniform distribution over [10, 30] cycles/bit, and fn, the CPU frequency of a device, follows a uniform distribution over [1, 2] GHz. In the following experiments, we first evaluate the proposed FCBAFL in comparison with the classical FedAvg [16] algorithm on the MNIST dataset [29].
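For reference, the following is a minimal sketch of this configuration as a NumPy parameter dictionary; the dBm-to-watt and MB-to-bit conversions, as well as the dictionary layout, are illustrative assumptions rather than the paper's exact setup.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100                                       # number of edge devices
cfg = dict(
    B=1e6,                                    # total bandwidth: 1 MHz
    N0=10 ** (-174 / 10) * 1e-3,              # noise PSD: -174 dBm in watts
    p=np.full(N, 0.5),                        # transmit power: 0.5 W
    sigma=1e-23,                              # effective capacitance coefficient
    s=np.full(N, 5e5),                        # uploaded model/gradient size: ~5e5 bits
    d=rng.uniform(5, 10, N) * 8e6,            # local data size: 5-10 MB, in bits
    c=rng.uniform(10, 30, N),                 # CPU cycles per bit
    f_min=np.full(N, 1e9),                    # minimum CPU frequency: 1 GHz
    f_max=np.full(N, 2e9),                    # maximum CPU frequency: 2 GHz
)
```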

4.1 FL Performance Comparison

Fig. 2 and Fig. 3 validate the effect of different parameters on the testing accuracy; both algorithms use the same number of local iterations Ql = 20 and global communication rounds Qg = 500. Fig. 2 demonstrates the effect of different batch sizes and η on the convergence speed of both algorithms, while Fig. 3 further validates the effect of η on FCBAFL with full-batch training (batchsize = ∞).

Fig. 2. Comparison of testing accuracy.

Fig. 3. Comparison of testing accuracy.

In Fig. 2, we first analyze the effect of the batch size on the convergence performance by observing the two dashed lines and the two solid lines, respectively. It can be seen that the smaller the batch size, the faster both algorithms converge. This is because, when the amount of data is constant, training with a smaller batch size performs more frequent model updates and thus adapts to changes in the training data more quickly. When the batch size is fixed, e.g., batchsize = 20 or 40, FCBAFL performs better than FedAvg. This is because FCBAFL introduces the hyperparameter η, which enables a better trade-off between the local gradient and the global gradient. We also find that increasing η makes FCBAFL converge faster, because a larger η implies that the local update is pulled closer to the global gradient, as equation (6) illustrates.

In Fig. 3, both methods use full-batch training, i.e., batchsize = ∞, which again verifies the effect of the hyperparameter η on FCBAFL. In particular, when η takes a small value such as 0.2, FCBAFL does not perform as well as FedAvg, but the situation can still be improved by increasing η. We also notice another phenomenon: compared with Fig. 2, the four curves in Fig. 3 oscillate significantly less. This is because the small number of data samples in small-batch training leads to noisy gradient estimation, and the local gradient is prone to deviate from the global gradient, causing instability in the training process, whereas full-batch training uses more data samples per batch and thus the training process is smoother.

Fig. 4 and Fig. 5 depict the effect of different parameters on the training loss of both FL algorithms under the same comparison settings as Fig. 2 and Fig. 3. Similarly, Fig. 4 demonstrates the effect of different batch sizes and η on the convergence speed of both algorithms, while Fig. 5 further validates the effect of η on FCBAFL with full-batch training (batchsize = ∞).

Fig. 4. Comparison of training loss.

Fig. 5. Comparison of training loss.

Fig. 4 and Fig. 5 show the same trends as Fig. 2 and Fig. 3: the introduction of the hyperparameter η leads to better convergence performance of FCBAFL than FedAvg.

4.2 Energy Consumption Comparison

We further demonstrate the advantages of the proposed FCBAFL method in reducing resource consumption in Fig. 6-Fig. 8. Fig. 6 compares the energy consumption of several frequency control strategies under different delay constraints when the number of devices is constant. Fig. 7 depicts the variation of the energy consumption of various frequency control strategies with the number of devices. Fig. 8 examines the variation of the energy consumption of different bandwidth allocation strategies with the number of devices.

Fig. 6. Comparison of energy consumption generated by various CPU frequencies under different delay constraints.

Fig. 7. Comparison of energy consumption generated by various CPU frequencies under different number of devices.

Fig. 8. Comparison of energy consumption generated by various bandwidth allocation strategies under different number of devices.

Since FCBAFL combines frequency control and bandwidth allocation, both the optimal frequency (OF) strategy and the optimal bandwidth allocation (OA) strategy in the following experiments aim at verifying the advantages of FCBAFL in reducing energy consumption. In this regard, we perform three experiments.

(1). When the number of devices is fixed, the average bandwidth allocation strategy is adopted, and all devices train at different CPU frequencies under different delay constraints.

The three CPU frequencies are as follows:

(a). Random Frequency (RF): All devices take random values within the feasible range of CPU frequencies, i.e., \(\begin{align}f_{n} \in\left[f_{n}^{\min }, f_{n}^{\max }\right]\end{align}\), ∀n∈Ν.

(b). Average Frequency (AF): All devices perform the training task at the average frequency, i.e., \(\begin{align}f_{n}=\left(f_{n}^{\min }+f_{n}^{\max }\right) / 2\end{align}\), ∀n∈Ν.

(c). Optimal Frequency (OF): All devices use the optimal CPU frequency according to equation (31), i.e., \(\begin{align}f_{n}=\max \left\{f_{n}^{\min }, f_{n}^{*}\right\}\end{align}\), ∀n∈Ν.

Fig. 6 reflects the impact of various CPU frequencies on the energy consumption under different delay constraints. It can be seen that RF is inferior to AF in most cases due to its randomness, and OF always produces the lowest energy consumption. In detail, OF reduces the total energy consumption by about 18.3% and 17.9% compared with RF and AF, respectively.

(2). The average bandwidth allocation strategy is adopted, and all devices train at different CPU frequencies under different numbers of devices.

Fig. 7 depicts the effect of various CPU frequencies on the energy consumption under different numbers of devices. OF still performs best among the three strategies. The other two frequencies produce almost the same energy consumption, because in practical circumstances the communication energy consumption is usually greater than the training energy consumption, and the average allocation strategy is applied uniformly, so the difference among the three schemes is very small. However, compared with RF and AF, OF still reduces the total energy consumption by 10.1% and 7.8%, respectively.

(3). All devices train at the optimal CPU frequency, and different bandwidth allocation strategies are adopted under different numbers of devices.

The four bandwidth allocation strategies are as follows (a minimal sketch of the baseline rules is given after the list):

(a). Random Allocation (RA): The central server randomly allocates bandwidth to the devices, i.e., bn ∈ (0,1), ∀n∈Ν.

(b). Average Allocation (AA): The central server allocates the same bandwidth to all devices, i.e., bn=1/N, ∀n∈Ν.

(c). SNR-Based Allocation (SA) [30]: The central server allocates bandwidth according to the signal-to-noise ratio (SNR) of each device, i.e., \(\begin{align}b_{n}=\left(p_{n} h_{n} / N_{0}\right) / \sum_{n \in \mathbb{N}}\left(p_{n} h_{n} / N_{0}\right)\end{align}\), ∀n∈Ν.

(d). Optimal Allocation (OA): The optimal solution b* which is obtained by the L-BFGS-B algorithm.
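The following is a minimal sketch of the three baseline rules (OA is obtained from the L-BFGS-B sketch in Section 3); normalizing the random shares of RA so that (23d) holds is an assumption made for illustration.

```python
import numpy as np

def baseline_allocations(p, h, N0, rng=np.random.default_rng()):
    """Baseline bandwidth proportions RA, AA and SA for N devices."""
    N = len(p)
    ra = rng.random(N)
    ra = ra / ra.sum()            # RA: random shares, normalized to sum to 1
    aa = np.full(N, 1.0 / N)      # AA: equal shares
    snr = p * h / N0
    sa = snr / snr.sum()          # SA: proportional to each device's SNR
    return ra, aa, sa
```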

Fig. 8 illustrates the impact of different bandwidth allocation policies on the energy consumption under different numbers of devices. All four curves trend upward. The RA strategy does not consider the device's performance and channel condition at all, resulting in low resource utilization and thus higher energy consumption. The AA strategy allocates the bandwidth equally, which may lead to overuse of resources by some devices and a shortage for others. The SA strategy allocates bandwidth according to the device's channel condition, but it does not consider the local training process, so the overall energy consumption is still high. The OA strategy optimizes the objective function by considering the delay constraint generated by local training and model transmission, thus achieving the optimal allocation and the lowest energy consumption [31]. Compared with RA, AA and SA, the OA strategy reduces the total energy consumption by about 18.9%, 7.9% and 2.6%, respectively.

5. Conclusion

In this article, we propose an iterative FL approach, called FCBAFL, for the wireless IIoT scenario. Its convergence is established theoretically by designing a local proxy function for each edge device to solve the local problem. We then formulate the objective of minimizing energy consumption while satisfying the delay constraint under synchronous communication, and describe it as a joint optimization problem of CPU frequency control and bandwidth allocation, which is subsequently split into two simple subproblems and solved separately. Simulation results demonstrate that FCBAFL converges better than the baseline algorithm and reduces the energy consumption to a certain extent.

Acknowledgements

This work was supported by the Guangxi Key Research and Development Program under Grants AB23075175 and AB23026034, National Natural Science Foundation of China under Grant 62341117, Guangxi Natural Science Foundation under Grant 2024GXNSFAA010458, and the Innovation Project of Guangxi Graduate Education under Grant YCSW2024362.

References

  1. Z. Zhou, X. Chen, E. Li, L. Zeng, K. Luo, and J. Zhang, "Edge Intelligence: Paving the Last Mile of Artificial Intelligence With Edge Computing," Proceedings of the IEEE, vol.107, no.8, pp.1738-1762, Aug. 2019.
  2. S. K. Jagatheesaperumal, M. Rahouti, K. Ahmad, A. Al-Fuqaha, and M. Guizani, "The Duo of Artificial Intelligence and Big Data for Industry 4.0: Applications, Techniques, Challenges, and Future Research Directions," IEEE Internet of Things Journal, vol.9, no.15, pp.12861-12885, Aug. 2022.
  3. L. Yang, X. Chen, S. M. Perlaza, and J. Zhang, "Special Issue on Artificial-Intelligence-Powered Edge Computing for Internet of Things," IEEE Internet of Things Journal, vol.7, no.10, pp.9224-9226, Oct. 2020.
  4. M. A. Rahman, M. S. Hossain, A. J. Showail, N. A. Alrajeh, and A. Ghoneim, "AI-Enabled IIoT for Live Smart City Event Monitoring," IEEE Internet of Things Journal, vol.10, no.4, pp.2872-2880, Feb. 2023.
  5. Y. Chi, Y. Dong, Z. J. Wang, F. R. Yu, and V. C. M. Leung, "Knowledge-Based Fault Diagnosis in Industrial Internet of Things: A Survey," IEEE Internet of Things Journal, vol.9, no.15, pp.12886-12900, Aug. 2022.
  6. X. Deng, J. Li, C. Ma et al., "Low-Latency Federated Learning With DNN Partition in Distributed Industrial IoT Networks," IEEE Journal on Selected Areas in Communications, vol.41, no.3, pp.755-775, Mar. 2023.
  7. J. Huang, L. Kong, G. Chen, M.-Y. Wu, X. Liu, and P. Zeng, "Towards Secure Industrial IoT: Blockchain System With Credit-Based Consensus Mechanism," IEEE Transactions on Industrial Informatics, vol.15, no.6, pp.3680-3689, Jun. 2019.
  8. T. Hafeez, L. Xu, and G. Mcardle, "Edge Intelligence for Data Handling and Predictive Maintenance in IIOT," IEEE Access, vol.9, pp.49355-49371, Mar. 2021.
  9. P. Zhang, P. Gan, G. S. Aujla, and R. S. Batth, "Reinforcement Learning for Edge Device Selection Using Social Attribute Perception in Industry 4.0," IEEE Internet of Things Journal, vol.10, no.4, pp.2784-2792, Feb. 2023.
  10. Z. Tong, J. Cai, J. Mei, K. Li, and K. Li, "Dynamic Energy-Saving Offloading Strategy Guided by Lyapunov Optimization for IoT Devices," IEEE Internet of Things Journal, vol.9, no.20, pp.19903-19915, Oct. 2022.
  11. Y. Liu, Z. Ma, Y. Yang, X. Liu, J. Ma, and K. Ren, "RevFRF: Enabling Cross-Domain Random Forest Training With Revocable Federated Learning," IEEE Transactions on Dependable and Secure Computing, vol.19, no.6, pp.3671-3685, Nov.-Dec. 2022.
  12. K. Jung, I. Baek, S. Kim, and Y. D. Chung, "LAFD: Local-Differentially Private and Asynchronous Federated Learning With Direct Feedback Alignment," IEEE Access, vol.11, pp.86754-86769, 2023.
  13. X. Cao, G. Sun, H. Yu, and M. Guizani, "PerFED-GAN: Personalized Federated Learning via Generative Adversarial Networks," IEEE Internet of Things Journal, vol.10, no.5, pp.3749-3762, Mar. 2023.
  14. C. T. Dinh, N. H. Tran, M. N. H. Nguyen et al., "Federated Learning Over Wireless Networks: Convergence Analysis and Resource Allocation," IEEE/ACM Transactions on Networking, vol.29, no.1, pp.398-409, Feb. 2021.
  15. J. Lu, H. Liu, R. Jia, J. Wang, L. Sun, and S. Wan, "Toward Personalized Federated Learning Via Group Collaboration in IIoT," IEEE Transactions on Industrial Informatics, vol.19, no.8, pp.8923-8932, Aug. 2023.
  16. J. Mills, J. Hu, and G. Min, "Communication-Efficient Federated Learning for Wireless Edge Intelligence in IoT," IEEE Internet of Things Journal, vol.7, no.7, pp.5986-5994, Jul. 2020.
  17. Z. Zhao, C. Feng, W. Hong et al., "Federated Learning With Non-IID Data in Wireless Networks," IEEE Transactions on Wireless Communications, vol.21, no.3, pp.1927-1942, Mar. 2022.
  18. X. You, X. Liu, N. Jiang, J. Cai, and Z. Ying, "Reschedule Gradients: Temporal Non-IID Resilient Federated Learning," IEEE Internet of Things Journal, vol.10, no.1, pp.747-762, Jan. 2023.
  19. S. Park, and W. Choi, "Regulated Subspace Projection Based Local Model Update Compression for Communication-Efficient Federated Learning," IEEE Journal on Selected Areas in Communications, vol.41, no.4, pp.964-976, Apr. 2023.
  20. S. Ji, W. Jiang, A. Walid, and X. Li, "Dynamic Sampling and Selective Masking for Communication-Efficient Federated Learning," IEEE Intelligent Systems, vol.37, no.2, pp.27-34, Mar.-Apr. 2022.
  21. J. Shu, W. Zhang, Y. Zhou, Z. Cheng, and L. T. Yang, "FLAS: Computation and Communication Efficient Federated Learning via Adaptive Sampling," IEEE Transactions on Network Science and Engineering, vol.9, no.4, pp.2003-2014, Jul.-Aug. 2022.
  22. M. Alishahi, P. Fortier, W. Hao, X. Li, and M. Zeng, "Energy Minimization for Wireless-Powered Federated Learning Network With NOMA," IEEE Wireless Communications Letters, vol.12, no.5, pp.833-837, May 2023.
  23. J. Kim, D. Kim, J. Lee, and J. Hwang, "A Novel Joint Dataset and Computation Management Scheme for Energy-Efficient Federated Learning in Mobile Edge Computing," IEEE Wireless Communications Letters, vol.11, no.5, pp.898-902, May 2022.
  24. E. Malan, V. Peluso, A. Calimera, and E. Macii, "Communication-Efficient Federated Learning With Gradual Layer Freezing," IEEE Embedded Systems Letters, vol.15, no.1, pp.25-28, Mar. 2023.
  25. Y. Nesterov, Lectures on Convex Optimization (2nd. ed.), Springer Publishing Company, SOIA, vol.137, 2018.
  26. X. Ji, J. Tian, H. Zhang, D. Wu, and T. Li, "Joint Device Selection and Bandwidth Allocation for Cost-Efficient Federated Learning in Industrial Internet of Things," IEEE Internet of Things Journal, vol.10, no.10, pp.9148-9160, May 2023.
  27. L. Cantos, M. Awais, and Y. H. Kim, "Max-Min Rate Optimization for Uplink IRS-NOMA With Receive Beamforming," IEEE Wireless Communications Letters, vol.11, no.12, pp.2512-2516, Dec. 2022.
  28. B. Zhong, Z. Zhang, D. Zhang, K. Long, and H. Cao, "Partial Relay Selection in Decode and Forward Cooperative Cognitive Radio Networks over Rayleigh Fading Channels," KSII Transactions on Internet and Information Systems, vol.8, no.11, pp.3967-3983, 2014.
  29. L. Cui, J. Ma, Y. Zhou, and S. Yu, "Boosting Accuracy of Differentially Private Federated Learning in Industrial IoT With Sparse Responses," IEEE Transactions on Industrial Informatics, vol.19, no.1, pp.910-920, Jan. 2023.
  30. X. Wang, S. Lv, X. Wang, and Z. Zhang, "Greedy Heuristic Resource Allocation Algorithm for Device-to-Device Aided Cellular Systems with System Level Simulations," KSII Transactions on Internet and Information Systems, vol.12, no.4, pp.1415-1435, 2018.
  31. B. Qiu, X. Chang, X. Li, H. Xiao and Z. Zhang, "Federated Learning-Based Channel Estimation for RIS-Aided Communication Systems," IEEE Wireless Communications Letters, vol.13, no.8, Aug. 2024.