
1551-3203 (c) 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TII.2019.2942179, IEEE Transactions on Industrial Informatics

JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2015 1

Differentially Private Asynchronous Federated Learning for Mobile Edge Computing in Urban Informatics

Yunlong Lu, Student Member, IEEE, Xiaohong Huang, Member, IEEE, Yueyue Dai, Student Member, IEEE, Sabita Maharjan, Member, IEEE, and Yan Zhang, Senior Member, IEEE

Abstract—Driven by technologies such as Mobile Edge Computing (MEC) and 5G, recent years have witnessed the rapid development of urban informatics, where a large amount of data is generated. To cope with the growing data, Artificial Intelligence (AI) algorithms have been widely exploited. Federated learning is a promising paradigm for distributed edge computing, which enables edge nodes to train models locally without transmitting their data to a server. However, the security and privacy concerns of federated learning hinder its wide deployment in urban applications such as vehicular networks. In this paper, we propose a Differentially Private Asynchronous Federated Learning (DP-AFL) scheme for resource sharing in vehicular networks. To build a secure and robust federated learning scheme, we incorporate Local Differential Privacy into federated learning to protect the privacy of updated local models. We further propose a random distributed update scheme to eliminate the security threats introduced by a centralized curator. Moreover, we boost convergence in our proposed scheme through updates verification and weighted aggregation. We evaluate our scheme on three real-world datasets. Numerical results show the high accuracy and efficiency of our proposed scheme, while it preserves data privacy.

Index Terms—Data sharing, Local differential privacy, Federated learning, Urban informatics

I. INTRODUCTION

Urban informatics [1] is an emerging field that applies information technology in urban areas to improve the living experience of citizens. The advancements in technologies such as 5G, edge computing, and the Internet of Things (IoT) have paved the way for the rapid development of urban informatics, resulting in an exponential growth of data generated by urban infrastructures. In urban vehicular networks, due to the limited computing resources and bandwidth of wireless networks, it is challenging for vehicles to use the massive data for improving services such as autonomous driving and traffic prediction. The emergence of Mobile Edge Computing (MEC) makes it possible for edge nodes such as vehicles, wireless sensors and

This work was supported by Joint Funds of National Natural Science Foundation of China and Xinjiang under Project U1603261, and National Natural Science Foundation of China under Project 61602055.

Y. Lu and X. Huang are with the Institute of Network Technology, Beijing University of Posts and Telecommunications, Beijing, China (e-mail: [email protected]; [email protected]).

Y. Dai is with the University of Electronic Science and Technology of China, Chengdu, China (e-mail: [email protected]).

S. Maharjan is with Simula Metropolitan Center for Digital Engineering and the University of Oslo, Norway (e-mail: [email protected]).

Y. Zhang is with the University of Oslo, Norway (e-mail: [email protected]).

IoT devices, which are equipped with computing and storage capability, to store and process data locally. In [2], a UAV-based MEC system was studied for computation offloading with limited local processing capabilities. In [3], the authors proposed a deep learning approach to address the challenges of varying edge server states and vehicular offloading modes in MEC, for achieving efficient and optimal task offloading schemes. In [4], the authors studied a multi-user multi-edge-node computation offloading problem and proposed a model-free reinforcement learning offloading mechanism to maximize the long-term utilities. In [5], the authors addressed the resource allocation problem using convex and quasi-convex optimization techniques, and further proposed a novel heuristic algorithm for the task offloading problem, which can achieve a suboptimal solution in polynomial time.

Artificial Intelligence (AI) is a powerful paradigm for addressing complex problems that cannot be solved by conventional algorithms, such as solving NP-hard problems [6] and making accurate predictions. Applied in urban informatics, AI has achieved tremendous success in efficient and effective resource allocation. In [7], the authors adopted a deep Q-learning approach for designing optimal offloading schemes in urban informatics, jointly considering the target server and the transmission mode. In [8], jointly taking network traffic and computation workload into consideration, the authors explored the trade-off between energy consumption and service delay with reinforcement learning in vehicular networks. In [9], the authors further explored deploying edge intelligence with blockchain for resource allocation in industrial IoT.

With MEC addressing the problem of limited resources, and AI enabling edge nodes to process and analyze massive data for classification and prediction, resource allocation problems in urban vehicular networks, such as data sharing and content caching, have been well explored. The sharing of resources enables vehicles to work together to analyze the large amount of distributed data and improve the quality of services. In [10], the authors presented an algorithm for allocating tasks to distributed energy resources by combining a new control strategy with an information and communication technology architecture. In [11], the authors proposed a mathematical framework to model the performance of a hierarchical shared edge caching system for content delivery in smart industrial and connected car applications. Despite the tremendous progress made by machine learning in resource


allocation, it is still a challenging task to perform distributed machine learning on MEC-empowered edge nodes, due to the constrained resources and privacy concerns such as data leakage and data eavesdropping.

Federated Learning [12], [13] can introduce considerable benefits to edge computing for achieving edge intelligence in urban informatics. As a decentralized machine learning technique, federated learning addresses privacy concerns by distributing the training work to distributed users. The users train their local models by edge computing based on their local data. Federated learning, which leverages edge computing to run AI algorithms, is an effective way to realize edge intelligence. Instead of sending data to a centralized curator for centralized training, as in conventional machine learning algorithms, users in federated learning transmit none of their raw data to others, which protects data privacy. Several advanced works have already exploited federated learning for edge computing. In [14], the authors proposed a control algorithm which determines the best trade-off between local updates and global parameter aggregation in federated learning for edge computing systems. In [15], the authors proposed a federated learning based proactive content caching scheme built on a hierarchical architecture consisting of users and a server. With its decentralized nature, tolerance of constrained resources, and reliance on local storage and local training, federated learning has become a promising paradigm for MEC in urban informatics. Thus, we exploit federated learning for resource sharing in urban vehicular networks.

However, to deploy federated learning in urban vehicular networks, some new challenges need to be tackled. First, the mobility of vehicles makes it difficult to maintain continuous synchronized communication between the cloud server and the client vehicles. Second, the centralized curator for aggregation is vulnerable to security threats, which can lead to the failure of the whole learning phase. Third, if inferred by an adversary, the updated models may leak the private information [16] of client vehicles. In this paper, we propose a Differentially Private Asynchronous Federated Learning (DP-AFL) scheme for secure resource sharing in vehicular networks. Our main contributions are as follows.

• We enhance the privacy of updated models in federated learning by incorporating local differential privacy into the gradient descent training scheme.

• We develop a new asynchronous federated learning architecture that leverages a distributed peer-to-peer update scheme instead of centralized updates, mitigating the security threats incurred by a centralized curator.

• We boost the convergence of the proposed asynchronous federated learning by verifying updates and performing weighted aggregation.

The remainder of this paper is organized as follows. In Section II, we present the threat model and formulate our problem. In Section III, our DP-AFL scheme is described in detail. A tree-based deployment of DP-AFL is discussed in Section IV. In Section V, we present numerical results for our scheme on three real-world datasets. Finally, we summarize the paper in Section VI.

II. SYSTEM MODEL AND PROBLEM FORMULATION

We consider the scenario of applying federated learning in urban informatics for tasks such as content caching and data sharing. As depicted in Fig. 1, the vehicular network consists of Macro Base Stations (MBSs), a number of Road Side Units (RSUs), and moving vehicles. Each vehicle is equipped with computing and caching resources. The MBSs have powerful computing and storage capabilities, which enable them to perform a large number of computing and caching tasks. Each RSU is connected to the MBSs and is capable of computing and caching. An RSU communicates with the MBSs through uplink communication and with the vehicles within its range through downlink V2R communication. Vehicles involved in a common computing task (e.g., training a prediction model) work together to complete the task collaboratively, using machine learning methods. Fig. 2 shows the architecture of our federated learning in a vehicular network, which consists of local training, distributed update, and boosting. In local training, a vehicle executes a noise-added gradient descent algorithm to obtain its local model. The vehicle then sends its local model to others by distributed update. For a vehicle, there are three types of communication channels in the network: Vehicle-to-Vehicle (V2V), Vehicle-to-RSU (V2R), and Vehicle-to-MBS (V2M). Boosting accelerates the convergence process by assessing the quality of training. In addition, vehicles in the proposed federated learning keep all data stored and processed locally, which helps protect the privacy of data providers.

Federated learning is a new and promising scheme in mobile edge computing for resource allocation tasks. As two main applications, we consider the well explored content caching and data sharing in our scheme. In content caching, users train local models using their own data, and the server aggregates all the updated models from users to generate a recommendation list of files for caching. As for data sharing, all the data providers involved collaboratively train a global data model by leveraging federated learning. The data model contains valid information about the requested data. Due to the benefits of federated learning, we explore deploying federated learning in urban informatics for data-related training tasks, especially in urban vehicular networks.

A. Threat Model

Compared with centralized machine learning methods, federated learning relieves the privacy problem in the learning phase by training locally. However, it also brings some new security concerns. We analyze the possible privacy issues from two main aspects: vulnerability of the curator and vulnerability of the clients. First, we consider threats towards the curator. Since the curator collects all updated models and aggregates the updates, it is a crucial centralized component in a federated learning scheme. The centralized aggregating scheme is vulnerable to a malfunction of the curator. If the global model to be distributed to client users is distorted, or there



Fig. 1: Federated learning in vehicular networks

is a single point of failure, the whole learning phase may collapse. Moreover, attackers may learn private information from the model parameters. Since the clients train local models based on the global model received from the curator, the curator has a chance to learn information about the clients' data by distributing a customized malicious model to clients and collecting their parameters. The clients will not be aware of the malicious model from the curator. For example, the server could assume a particular distribution that client data may be drawn from. Then, instead of letting clients minimize a loss function, the server lets clients maximize the likelihood function of the parameters of the chosen distribution on their own data. In this way, the server can obtain the data distribution information of the clients.

Second, we consider threats towards client users, which can be incurred by malicious participants. An adversary can learn the private information of client users by analyzing their updates (e.g., through a differential attack). Moreover, Byzantine attacks also exist in the learning scheme. In a Byzantine attack, malicious client users may provide bad or low-quality updates to the curator instead of valid updates, while they can still get the global model from the curator. As a consequence, the accuracy of the global model will be reduced, which may lead to the failure of the whole learning phase and discourage other honest client users from participating. Thus, a new robust federated learning scheme is required to further address the privacy and security issues in traditional federated learning.

B. Problem Formulation

We consider applying federated learning in vehicular networks for resource allocation tasks, while addressing the privacy and security issues. The set of RSUs is denoted as U = {u_1, u_2, ..., u_m}, and the set of vehicles is denoted as V = {v_1, v_2, ..., v_K}. Each vehicle v_i possesses a local dataset D_i = {(x_1, y_1), ..., (x_n, y_n)}, where x_j is the input for machine learning models and y_j is the desired output (i.e., the label of x_j). D_i is a subset of the whole training set D = ∪ D_i. A vehicle connects to an RSU when it is located in its coverage area. The goal is to learn a global predictive model M = h(w, x) over the training set D. For each vehicle v_i, the loss function over dataset D_i is

F_i(w) = \frac{1}{|D_i|} \sum_{j \in D_i} f_j(h(w, x_j), y_j),   (1)

where f_j(h(w, x_j), y_j) is the loss for the j-th data sample (x_j, y_j) with model parameters w.

We define the objective function F(w) as:

F(w) = \frac{1}{|D|} \sum_{j \in D} f_j(h(w, x_j), y_j) = \frac{1}{|D|} \sum_{i=1}^{K} |D_i| \cdot F_i(w).   (2)
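As a quick sanity check on Eq. (2), the global loss is the dataset-size-weighted average of the local losses, which equals the loss over the pooled data. The sketch below is an illustration only: the per-vehicle datasets and the squared loss for f_j are assumptions, not the paper's setup.

```python
import numpy as np

def local_loss(w, X, y):
    """F_i(w): mean squared loss of a linear model h(w, x) = w . x on D_i."""
    return float(np.mean((X @ w - y) ** 2))

def global_loss(w, datasets):
    """F(w) = (1/|D|) * sum_i |D_i| * F_i(w), per Eq. (2)."""
    total = sum(len(y) for _, y in datasets)
    return sum(len(y) * local_loss(w, X, y) for X, y in datasets) / total

rng = np.random.default_rng(0)
datasets = [(rng.normal(size=(n, 3)), rng.normal(size=n)) for n in (10, 20, 30)]
w = np.zeros(3)

# The weighted average of local losses equals the loss on the pooled data.
X_all = np.vstack([X for X, _ in datasets])
y_all = np.concatenate([y for _, y in datasets])
assert np.isclose(global_loss(w, datasets), local_loss(w, X_all, y_all))
```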

The goal of our federated learning is to train a global model M = h(w, x) in the scenario of vehicular networks. It is an optimization problem to minimize F(w) while providing a strong privacy guarantee. That is,

h(w) = \arg\min_{w \in \{w(t) : t < T\}} F(w)
s.t. \Pr(w_i \in \mathbb{R}^d) \le \exp(\varepsilon) \Pr(w'_i \in \mathbb{R}^d), \quad \forall v_i \in V, \ i \in \{1, 2, \dots, K\},   (3)

where w(t) is the parameter set of the aggregated model at round t, and T is the maximum number of updating rounds. \Pr(w_i \in \mathbb{R}^d) \le \exp(\varepsilon) \Pr(w'_i \in \mathbb{R}^d) is the ε-privacy guarantee for the update parameters w_i, and w(t) is derived from Eq. (4):

w(t+1) = w(t) + \frac{1}{K} \sum_{i=1}^{K} \Delta w_i,   (4)

where \Delta w_i is the update from client v_i in round t.
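Eq. (4) simply averages the K client updates into the new global parameters. A minimal sketch, with array shapes and values chosen purely for illustration:

```python
import numpy as np

def aggregate(w_t, deltas):
    """w(t+1) = w(t) + (1/K) * sum_i delta_w_i, per Eq. (4)."""
    K = len(deltas)
    return w_t + sum(deltas) / K

w_t = np.array([1.0, 2.0])
deltas = [np.array([0.2, -0.4]), np.array([0.4, 0.0])]
w_next = aggregate(w_t, deltas)
assert np.allclose(w_next, [1.3, 1.8])
```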

III. PRIVACY PRESERVED ASYNCHRONOUS FEDERATED LEARNING

Since finding an exact solution to problem (3) is NP-hard, we consider incorporating a new federated learning scheme into vehicular networks to address it. The goal of our federated learning is to find a set of parameters for the model which minimizes the loss function. Due to the mobility of vehicles, an asynchronous federated learning scheme without a curator is developed for vehicular networks.

Our proposed DP-AFL consists of three procedures: local differentially private gradient descent training, distributed random update, and convergence boosting.

A. Local Differential Privacy: gradient descent training

We leverage gradient descent, an effective iterative optimization method, to minimize the loss function in (3). Gradient descent minimizes the objective function F(w) by updating the parameters in the direction of the negative gradient −∇F(w). For a vehicle v_i, the goal of local training is to find the model parameters w_i by moving along −∇F_i(w), defined in Eq. (5):

\nabla F_i(w) = \frac{\partial F(y_i, f(x_i))}{\partial f(x_i)}.   (5)


[Figure: (1) Local Training (local data → gradient descent + LDP noise → local model m(t)), (2) Distributed Update across the vehicular network, (3) Boosting (updates verification, Score(v_i)), and (4) Weighted Aggregation producing the global model M(t).]

Fig. 2: The proposed differentially private asynchronous federated learning

For vehicle v_i in iteration t, a local update model (parameter vector) w_i(t) is computed according to Eq. (6):

w_i(t) = w_i(t-1) - \alpha_t \cdot \nabla F_i(w_i(t-1)),   (6)

where \alpha_t is the step size for moving in the direction of the negative gradient.

Local differential privacy (LDP) [17] is a recently proposed measure that provides a strong privacy guarantee. Unlike conventional differential privacy [18], which provides guarantees in the data analysis phase, LDP focuses on privacy towards data collectors during the data collection process. To protect the privacy of the updated parameters and achieve local differential privacy, we apply the Gaussian mechanism [19] to the update model of each vehicle, adding noise to perturb the parameters, as described in Eq. (7):

w_i(t) = w_i(t-1) - \alpha_t \cdot (\nabla F_i(w_i(t-1)) + \mathcal{N}(0, \sigma^2 \cdot S_f^2)),   (7)

where \mathcal{N}(0, \sigma^2 S_f^2) is the added Gaussian noise with mean 0 and standard deviation \sigma \cdot S_f. By incorporating local differential privacy into the local learning process, vehicle v_i trains the noised update w_i(t) locally in iteration t.
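One noised step of Eq. (7) can be sketched as below. The quadratic setup, the gradient value, and the constants for the step size α_t, noise multiplier σ, and sensitivity S_f are illustrative assumptions, not values from the paper.

```python
import numpy as np

def noised_step(w_prev, grad, alpha, sigma, S_f, rng):
    """Descent step opposite the gradient with Gaussian perturbation,
    w_i(t) = w_i(t-1) - alpha_t * (grad + N(0, sigma^2 * S_f^2)), per Eq. (7)."""
    noise = rng.normal(0.0, sigma * S_f, size=w_prev.shape)
    return w_prev - alpha * (grad + noise)

rng = np.random.default_rng(42)
w = np.array([0.5, -0.5])
grad = np.array([0.1, 0.2])  # hypothetical local gradient

w_new = noised_step(w, grad, alpha=0.1, sigma=1.0, S_f=0.5, rng=rng)
# With sigma = 0 the update reduces to a plain gradient descent step.
w_clean = noised_step(w, grad, alpha=0.1, sigma=0.0, S_f=0.5, rng=rng)
assert np.allclose(w_clean, w - 0.1 * grad)
```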

To ensure ε-privacy, the total privacy cost must be limited to ε. For a vehicle v_i with a total of T iterations, the privacy budget (cost) should be split across the iterations such that ε = \sum_{t=1}^{T} ε_t. For example, given a fixed T, the privacy budget can be divided evenly over the iterations: ε_t = ε/T, 1 ≤ t ≤ T. In iteration t, if the accumulated privacy cost \sum_{τ=1}^{t} ε_τ > ε, the learning should be stopped.

Taking both the optimization iterations needed for accurate results and the privacy cost of privacy preservation into consideration, we allocate the privacy budget adaptively to each iteration. In our gradient optimization process, the initial privacy cost is ε_0. If the gradients contribute much to the global model, more noise (i.e., a smaller privacy cost ε) should be added to protect the privacy, since the gradients are approaching the convergence results. Thus the privacy cost for iteration t is ε_t = ε_0 · (1 − α). Otherwise, if the gradients contribute little, a larger privacy cost is assigned, ε_t = ε_0 · (1 + β), which means less noise is added to the gradients.
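The adaptive budget schedule above can be sketched as follows. The boolean contribution test and the constants ε_0, α, β, and the total budget are illustrative assumptions; the paper does not fix them.

```python
def allocate_budget(eps0, contributions, alpha=0.2, beta=0.2, eps_total=1.0):
    """Assign eps_t = eps0*(1-alpha) to high-contribution iterations (more noise)
    and eps_t = eps0*(1+beta) otherwise; stop once the total budget would be exceeded."""
    spent, schedule = 0.0, []
    for contributes_much in contributions:
        eps_t = eps0 * (1 - alpha) if contributes_much else eps0 * (1 + beta)
        if spent + eps_t > eps_total:
            break  # accumulated privacy cost would exceed eps: stop learning
        spent += eps_t
        schedule.append(eps_t)
    return schedule

sched = allocate_budget(eps0=0.2, contributions=[True, False, True, False, True])
assert sum(sched) <= 1.0
assert sched[0] == 0.2 * 0.8  # high-contribution iteration gets the smaller eps
```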

B. Distributed Update: random sub-gossip update

As one of the challenges of deploying federated learning in vehicular networks, the mobility of vehicles makes it difficult to maintain a synchronized update mechanism between a centralized curator and the clients. To address the mobility issue and get rid of the centralized curator, we propose a distributed random-gossip update scheme.

In each iteration, a participating vehicle v_i runs a random selection algorithm, based on the available communication and computing resources, to sample a subset of vehicles V_s ⊂ V for updating. Unlike classic federated learning, where all clients send updates to the centralized curator, vehicle v_i only gossips its update w_i(t) to the sampled vehicles, which improves the reliability of our federated learning. Since the time complexity of updating is O(log(|V|)), the size of V_s usually takes a small value to reduce the redundant transmission of parameters and the cost of computing and communication resources. To balance transmission delay and computing cost, the size can be adjusted dynamically according to the communication conditions and the computing capability of the receiving vehicles. Since the gossip scheme achieves convergent consistency, the update w_i(t) will be disseminated to all related vehicles in O(log(n)) time. In addition, the Mean Absolute Error (MAE) of model w_i(t), calculated by Eq. (8), is broadcast together with w_i(t):

MAE = \frac{1}{|d_i|} \sum_{j=1}^{|d_i|} |y_j - w_i(x_j)|,   (8)

where a low MAE denotes a good quality of the model. The random sub-gossip scheme prevents malicious participants from continuously receiving the updates of v_i, which mitigates the risk of inferring the data of v_i by analyzing the updated models. For vehicle v_j, it collects


all received updated models and calculates its global model m_j(t) = m_j(t-1) + \sum_i w_i(t).

Algorithm 1 illustrates the process of our proposed random update scheme.

Algorithm 1 Random sub-gossip update scheme

Input: local model set {m_i(t−1)}, participating vehicles V
Output: new local model set {m_i(t)}

1: for each participant v_i ∈ V do
2:   if t ≤ T then
3:     v_i collects all updated noised models w_j(t) sent from other participating vehicles
4:     v_i trains its new global model m_i(t) = m_i(t−1) + Σ w_j(t)
5:     v_i calculates the MAE of m_i(t) to quantify the quality of the trained model
6:     v_i adds noise to the new global model, m_i(t) = m_i(t) + Noise, to achieve local differential privacy
7:     v_i runs the random algorithm to sample a subset of the participating vehicles, V_s ⊂ V
8:     v_i broadcasts m_i(t) and MAE(m_i(t)) to the vehicles v_k ∈ V_s
9:   else
10:    End the training
11:  end if
12: end for
13: return {m_i(t)}
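A minimal sketch of one iteration of Algorithm 1 for a single vehicle, covering the aggregation of received models (step 4), the MAE of Eq. (8) (step 5), LDP noise (step 6), and the random sampling of gossip targets (step 7). The linear model, the noise scale, and the data are illustrative assumptions.

```python
import numpy as np

def mae(model_w, X, y):
    """Eq. (8): mean absolute error of a linear model on the local dataset."""
    return float(np.mean(np.abs(y - X @ model_w)))

def gossip_round(m_prev, received, X, y, all_vehicles, s, rng, sigma=0.01):
    """One iteration of Algorithm 1 from the point of view of one vehicle."""
    m_new = m_prev + sum(received)                      # step 4: aggregate received updates
    quality = mae(m_new, X, y)                          # step 5: quantify model quality
    m_new = m_new + rng.normal(0, sigma, m_new.shape)   # step 6: add LDP noise
    targets = rng.choice(all_vehicles, size=s, replace=False)  # step 7: sample V_s ⊂ V
    return m_new, quality, list(targets)                # step 8: broadcast to targets

rng = np.random.default_rng(1)
X, y = rng.normal(size=(8, 2)), rng.normal(size=8)
m, q, targets = gossip_round(np.zeros(2), [np.ones(2) * 0.1], X, y,
                             all_vehicles=np.arange(20), s=3, rng=rng)
assert len(targets) == 3 and q >= 0.0
```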

C. Convergence Boosting: updates verification and incentive weighted aggregation

It is unfair if all participants are treated equally and obtain the same global model at the end of federated learning, regardless of their contributions in the learning phase. We propose a reward algorithm in our DP-AFL to incentivize participants to provide good data and compute accurate models. The goal of our convergence boosting is to increase the contribution of good models and reduce that of bad ones.

In iteration t, a participating vehicle v_i holds its local dataset d_i = {(x_j, y_j)}, its global model m_i(t−1), and a set of K received updated models {w_j}, together with the MAE of each model w_j. v_i selects the models with low MAE to execute the aggregation and drops the models with bad quality, as defined in Eq. (9):

m_i(t) = m_i(t-1) + \frac{1}{N} \sum_{k=1}^{N} \gamma_k \cdot w_k(t),   (9)

where N is the number of selected good models, and \gamma_k is the weight of updated model w_k in the aggregation. For simplicity, \gamma_k can be defined as in Eq. (10):

\gamma_k = 1 - \frac{mae(k)}{\sum_{j=1}^{N} mae(j)}.   (10)
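Eqs. (9)–(10) can be sketched as follows; the model vectors and MAE values are purely illustrative:

```python
import numpy as np

def gamma_weights(maes):
    """gamma_k = 1 - mae(k) / sum_j mae(j), per Eq. (10); lower MAE -> larger weight."""
    total = sum(maes)
    return [1 - m / total for m in maes]

def weighted_aggregate(m_prev, models, maes):
    """m_i(t) = m_i(t-1) + (1/N) * sum_k gamma_k * w_k(t), per Eq. (9)."""
    gammas = gamma_weights(maes)
    N = len(models)
    return m_prev + sum(g * w for g, w in zip(gammas, models)) / N

models = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
m = weighted_aggregate(np.zeros(2), models, maes=[0.2, 0.8])
# The low-MAE model gets the larger weight: gamma = [0.8, 0.2].
assert np.allclose(m, [0.4, 0.1])
```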

In the proposed scheme, we require the model trainers to provide the MAE of their updated model to ensure the efficiency of the aggregation process. However, this is risky, since the trainers may cheat others by providing a fake MAE. To address this issue, we leverage a directed acyclic trust graph (DATG) in the asynchronous mechanism to ensure that each trainer honestly provides the correct MAE for its updated model, as shown in Fig. 3. Each node in the DATG represents an updated model, with the attributes model w_i(t), MAE, and cumulative weight. In each iteration, a participant randomly verifies the MAE of a proportion ρ of the updated models it has received, based on its own data, and broadcasts the verification results to the other participants. The cumulative weight of a node i (model m_i(t)) is calculated by

weight(i) = MAE(mi(t)) + Σ_{j=1}^{m} MAEj(mi(t)),   (11)

where Σ_{j=1}^{m} MAEj(mi(t)) is the sum of the verified MAEs received by vehicle vi from other participants in iteration t. An edge in the DATG denotes that a node is verified by the connected node in its training phase. Thus, all verification results for an updated model are recorded in the DATG. We calculate the cumulative node weight S(vi) of a participant vi as its trust score:

S(vi) = Σ_{t=1}^{T} weight(mi(t)),   (12)
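A minimal sketch of the cumulative weight and trust score in Eqs. (11)-(12); the data layout (one pair of self-reported MAE and peer-verified MAEs per iteration) is an illustrative assumption.

```python
# Sketch of the DATG trust score in Eqs. (11)-(12).
# Each history entry is (self_mae, [verified MAEs received from peers]).

def node_weight(self_mae, verified_maes):
    """Eq. (11): weight(i) = MAE(m_i(t)) + sum_j MAE_j(m_i(t))."""
    return self_mae + sum(verified_maes)

def trust_score(history):
    """Eq. (12): S(v_i) = sum over iterations t = 1..T of weight(m_i(t))."""
    return sum(node_weight(self_mae, verified) for self_mae, verified in history)
```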

where T is the current iteration. Thus, we incorporate the trust score of participants and re-adjust the aggregation in Eq. (9) into

mi(t) = mi(t−1) + (1/N) Σ_{k=1}^{N} γ · S(vi) · wk(t),   (13)

where mi(t) is trained by participating vehicle vi.

Fig. 3: The directed acyclic trust graph (DATG) for updates verification, with one node mi(t) per iteration accumulating into Score(vi)

Algorithm 2 illustrates the overall process of our proposed asynchronous federated learning scheme.

IV. TREE-BASED DP-AFL FOR DATA SHARING

We apply our DP-AFL scheme to data sharing in vehicular networks by adopting a tree-based gradient descent model, Gradient Boosting Decision Tree (GBDT), for local training. The fundamental principle of learning in GBDT is similar to the learning process in federated learning. The training of GBDT can be performed with little training data and can achieve high accuracy in a short time. The goal of the training process is to iteratively minimize the loss function for a data sharing request. Vehicle vi first submits a data sharing request req for a specific category of data dreq for tasks



Algorithm 2 Differentially private asynchronous federated learning
Input: datasets D = {di}, participating vehicles V = {vi}
Output: global model M
1: for each vehicle vi ∈ V do
2:   while t ≤ T do
3:     if t = 1 then
4:       Train a new model m̂i(t) based on its local dataset di ⊂ D
5:       Compute the noise-added mi(t) for local differential privacy and compute its MAE
6:       Broadcast mi(t) and its MAE to a subset of randomly sampled vehicles Vs ⊂ V
7:     else
8:       Collect all updated models wi(t) from other vehicles
9:       Compute the global model mi(t) = mi(t−1) + (1/N) Σ_{k=1}^{N} γ · S(vi) · wk(t), and its MAE(mi(t))
10:      Perturb the model mi(t) by applying the Gaussian mechanism to achieve local differential privacy
11:      Sample a subset of vehicles Vs ⊂ V with vk ∈ Vs
12:      Broadcast mi(t) and MAE(mi(t)) to Vs
13:    end if
14:  end while
15: end for
16: return M
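The random sampling of the broadcast subset Vs ⊂ V in Algorithm 2 (steps 6 and 11) can be sketched as below; the sampling fraction and the function name are illustrative assumptions, as the text does not fix them.

```python
# Sketch of the random sub-gossip peer sampling used for distributed updates:
# each vehicle sends its perturbed model to a random subset V_s of the others.
import random

def sample_gossip_peers(vehicles, self_id, fraction=0.5, seed=None):
    """Return a random subset V_s of the other vehicles to receive the update."""
    rng = random.Random(seed)
    peers = [v for v in vehicles if v != self_id]
    k = max(1, int(len(peers) * fraction))  # at least one recipient
    return rng.sample(peers, k)
```

Because each vehicle samples its recipients independently, no centralized curator ever sees all updates, which is the property the random distributed update scheme relies on.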

such as traffic prediction and driving path planning. Instead of returning the raw data to the requester directly, we provide the data models learned by our federated learning scheme, to protect the privacy of the data providers. The objective is to train a global model M on the distributed dataset dreq, which can provide requesters with predictions for their data sharing requests. We leverage our DP-AFL scheme to address the optimization problem in Eq. (14):

f = arg min_{f, req, dreq} Σ_{vi∈V} Σ_{(xj,yj)∈dvi} L(yj, m(xj) + w(xj)),   (14)

where m(xj) is the previous prediction and w(x) is the newly built model in one iteration. Here, w(x) is a regression tree which splits the feature space into several regions. For each feature region r, a value θr is assigned to r. Thus, w(x) can be defined as:

w(x) = Σ_r θr · Ir(x),   (15)

Ir(x) = 1 if x ∈ r, and Ir(x) = 0 if x ∉ r.   (16)
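The piecewise-constant prediction of Eqs. (15)-(16) can be sketched as follows, using one-dimensional interval regions as a simplifying assumption (a real regression tree partitions a multi-dimensional feature space along its split nodes).

```python
# Sketch of w(x) = sum_r theta_r * I_r(x) from Eqs. (15)-(16),
# with regions taken as half-open intervals on a single feature.

def tree_predict(x, regions):
    """regions: list of ((low, high), theta_r) pairs partitioning the feature line."""
    for (low, high), theta in regions:
        if low <= x < high:   # Eq. (16): I_r(x) = 1 iff x falls inside region r
            return theta      # Eq. (15): in a partition, exactly one region matches
    return 0.0                # x outside all regions: every indicator is 0
```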

Eq. (14) can then be specialized to finding the best θ = {θr1, θr2, ...} that satisfies Eq. (17) under the privacy budget ε:

θ = arg min_{θ, ε} Σ_{vi∈V} Σ_{(xj,yj)∈dvi} L(yj, m(xj) + Σ_r θr · Ir(xj)).   (17)

The residual, calculated approximately as the opposite gradient of the loss function, is used to quantify the performance of the split θ. To solve problem (17), we execute iterative training with distributed updates and weighted aggregation in each iteration. The split, denoted by θ, moves in the steepest-descent direction (i.e., along the opposite gradient) to approach the optimal solution.

We apply the Gaussian mechanism to the splitting nodes and the leaf nodes of the regression tree trained by each participant in one iteration. For each participating vehicle vi, the noise is added before the update phase to achieve local differential privacy. For a total privacy budget ε, we first estimate the number of iterations as T. We then calculate the benchmark privacy budget for each iteration as εi = ε/T. In iteration t ≤ T, εt can be adjusted according to the training quality and the privacy budget already accumulated, where a higher training quality and a larger accumulated budget lead to a smaller iteration budget εt. The accumulated privacy budget must satisfy ε1 + ... + εt < ε to guarantee ε-privacy. Otherwise, the iterative training is stopped to prevent privacy leakage. The differentially private regression tree is shown in Fig. 4. The splitting nodes depict the model parameters θ, and the noise added to the nodes ensures the differential privacy of the model parameters.
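The budget split εi = ε/T, the stopping condition ε1 + ... + εt < ε, and the Gaussian perturbation described above can be sketched as below; the σ calibration follows the standard Gaussian-mechanism formula, and the sensitivity and δ values are illustrative assumptions, not values fixed by the scheme.

```python
# Sketch of per-iteration budget allocation and Gaussian perturbation.
import math
import random

def iteration_budget(total_eps, T):
    """Benchmark per-iteration budget eps_i = eps / T."""
    return total_eps / T

def can_continue(spent, next_eps, total_eps):
    """Guarantee eps_1 + ... + eps_t < eps before spending the next budget."""
    return spent + next_eps < total_eps

def gaussian_perturb(params, eps, delta=1e-5, sensitivity=1.0, seed=None):
    """Add Gaussian noise calibrated to (eps, delta)-DP to each tree parameter."""
    rng = random.Random(seed)
    # Standard Gaussian-mechanism scale: sigma >= sqrt(2 ln(1.25/delta)) * S / eps
    sigma = sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / eps
    return [p + rng.gauss(0.0, sigma) for p in params]
```

When `can_continue` fails, training stops and the current global model is returned, which is exactly the budget-exhaustion behavior described above.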

Fig. 4: A differentially private regression tree, with noise added to its splitting nodes and leaf nodes

Through the tree-based DP-AFL process, the requesting vehicle vreq obtains the global data model M for its requests. The global data model can be used for a series of applications such as making predictions and analyzing unknown data. In this way, the value of data is shared and utilized among vehicles in urban vehicular networks. Algorithm 3 describes the complete process of our proposed tree-based DP-AFL.

V. SECURITY ANALYSIS AND NUMERICAL RESULTS

A. Security Analysis

1) Algorithm 3 satisfies local ε-differential privacy: According to the definition of local differential privacy, the process satisfies local differential privacy if it does so in the data collection phase. For the privacy cost across iterations, we divide the privacy budget ε among the iterations such that ε1 + ε2 + ... + εT ≤ ε. According to the sequential composability of differential privacy, the process thus satisfies ε-differential privacy.

For the splitting phase in each regression tree, the privacy budget is only spent in Step 5, in which Gaussian noise is


Algorithm 3 Tree-based federated learning for vehicular data sharing
Input: data request Req, participating vehicles V, datasets D
Output: data model M
1: while accuracy ≤ Threshold and t ≤ T do
2:   for each vehicle vi ∈ V do
3:     Collect all updated models wi(t) from other vehicles
4:     Run local GBDT training for request Req on its local data di ⊂ D, obtain the aggregated regression tree Tri(t) = Tri(t−1) + Σj T̃rj(t), and compute its MAE(Tri(t))
5:     Add noise to the local regression tree Tri(t) to get the locally differentially private regression tree T̃ri(t)
6:     Run the random sub-gossip process to broadcast T̃ri(t) and MAE(Tri(t)) to other vehicles
7:     Randomly sample a fraction of updated models to perform verification with its data di
8:     Update the results to the local DATG, and broadcast the results to other vehicles
9:   end for
10: end while
11: return M to the requester

added to the splitting nodes and leaf nodes of the regression tree. Since these nodes cover disjoint feature subsets of the whole feature set, according to the parallel composability of differential privacy, this phase satisfies ε-differential privacy.

Thus, Algorithm 3 satisfies local ε-differential privacy.
2) Our random update scheme enhances data privacy and system security: We leverage the proposed asynchronous random update scheme to replace the conventional server-client mode. It mitigates the security risks caused by the failure of a centralized server, and protects the privacy of the updated models from inference attacks.

3) Our proposed DATG incentive scheme ensures the learning quality: The DATG is an asynchronous record of the performance of each participant in the federated learning phase. By measuring the cumulative weight of a participant, providers who contribute good data models obtain a high score and contribute more to the global model. On the contrary, negative providers tend to be ignored in the learning process.

B. Numerical Results

We apply our proposed federated learning scheme to a gradient descent regression model, GBDT, to execute collaborative learning tasks for data sharing and resource caching in urban vehicular networks.

We evaluate our proposed scheme on three real-world datasets: the Reuters dataset [20], the 20 newsgroups dataset [21], and the Ohsumed dataset [22], which are among the most popular datasets for learning tasks on unstructured data. The data in the three datasets takes the form of unstructured short text, similar to the unprocessed, fragmented data generated in the urban environment.

1) The Reuters dataset: this dataset contains 90 classes and 10,788 files of news stories that appeared on the Reuters newswire. The mean number of terms per file ranges from 93 to 1263, and the total vocabulary size is 35,247.

2) The 20 newsgroups dataset: a collection of 20,000 newsgroup documents partitioned into 20 groups. Some of the groups are close to others (e.g., rec.sport.baseball / rec.sport.hockey), while some are totally different.

3) The Ohsumed dataset: a subset of the MEDLINE database, a medical literature database maintained by the National Library of Medicine. The dataset consists of 50,216 medical abstract files in 23 disease categories.

Fig. 5: The accuracy (AUC) in various groups of the Reuters dataset (3, 6, and 9 data providers)

Fig. 6: Running time (ms) in various groups of the Reuters dataset (3, 6, and 9 data providers)

As mentioned before, we use these data to simulate the attribute-based unstructured data generated in urban vehicular networks, such as the configuration files of vehicular applications and status log files. To simulate a distributed application scenario with multiple parties, we divide each dataset into shards according to category. Each shard represents the dataset of one participant in our distributed scenario. Note that, compared with a real scenario, our partition is more stochastic. Thus, a participant may hold several totally different categories of data, which increases the difficulty of the classification task. Also, the size of each shard is quite discrete because the number of files in each category is fixed, which means the values on the Y-axis in our results can also be discrete. We adopt the Receiver Operating Characteristic (ROC) curve, widely used in machine learning tasks, to illustrate the accuracy of our learned model. To be more specific, the model accuracy is quantified by the Area Under the ROC Curve (AUC).
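For reference, the AUC metric used in our evaluation can be computed directly from predicted scores via the rank-based (Mann-Whitney) formulation; this is a generic sketch of the metric, not the paper's evaluation code.

```python
# Sketch of AUC computation: the probability that a randomly chosen positive
# example is scored above a randomly chosen negative one (ties count 0.5).

def auc(labels, scores):
    """labels: 0/1 ground truth; scores: model outputs. Returns area under ROC."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```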


Fig. 7: The accuracy in various groups of the Ohsumed dataset (per-group accuracy and mean accuracy)

Fig. 8: Running time (ms) in various groups of the Ohsumed dataset (per-group running time and mean running time)

We first evaluate our proposed federated learning scheme on the Reuters dataset with various numbers (i.e., 3, 6, 9) of data providers. The accuracy results in Fig. 5 show the high accuracy of our proposed scheme. The running time is presented in Fig. 6. Then, Fig. 7 and Fig. 8 show the statistical results, accuracy and running time, of our proposed scheme on the Ohsumed dataset. The accuracy in Fig. 7 ranges from 0.575 to 0.9, which is also high on Ohsumed compared with existing methods [23]. From Fig. 5 and Fig. 7, we conclude that the accuracy changes little as the number of data providers increases, indicating the scalability of our proposed scheme. In Fig. 6 and Fig. 8, the running time changes little as the data size increases, but it increases noticeably with the number of data providers. This is because our proposed federated learning executes parallel local training, so the running time grows little with the data size; however, with more data providers, more updated models need to be transmitted in our federated learning, which increases the time spent on additional computation and transmission.

We further evaluate the convergence of our proposed method on three subsets of the Ohsumed dataset, in a scenario with three participating nodes. Fig. 9 clearly shows that the training loss converges to a small value within a few iterations, which indicates that our proposed method achieves a good convergence rate.

We also evaluate the performance of our proposed scheme on the 20 newsgroups dataset. Fig. 10 shows the statistical accuracy results in various groups. We observe that the accuracy of our scheme is statistically good, with an average value of 0.918. As mentioned earlier, the unsmooth shape of the curve is due to the discrete data sizes of the various groups.

Fig. 9: Training loss over iterations on various Ohsumed subsets (4138, 5193, and 7155 files)

Fig. 10: Accuracy (AUC) in various groups of 20 newsgroups

We further test the proposed scheme on 20 newsgroups with different numbers of data providers. Fig. 11 and Fig. 12 show the statistical results in terms of accuracy and running time. As illustrated earlier, the overall accuracy on various groups with different numbers of data providers is similar. For a fixed number of data providers, the running time changes little as the size of the dataset increases, while it grows as the number of providers increases. Similar to our previous analysis, we conclude that the running time increases with the number of data providers and changes little with the dataset size. Furthermore, comparison results of our proposed method against three benchmark schemes, Simple Graph Convolution Networks (SGCN), GraphStar, and Text Graph Convolution Networks (Text GCN), on various datasets are depicted in Fig. 13. Our proposed method achieves higher average accuracy than the other schemes on the two test datasets.

By evaluating our proposed federated learning on the three real-world datasets, we make the following observations: 1) our proposed federated learning can execute parallel local training; 2) an increase in data providers hardly affects the accuracy of the model; and 3) more training time is needed to learn the global model as the number of data providers increases.

VI. CONCLUSION

In this article, we proposed an asynchronous federated learning scheme for edge computing in vehicular networks. To protect the updated models of each participant, we incorporated local differential privacy into the gradient descent local training process. Due to the security and privacy concerns brought by a centralized curator, we proposed a random peer-to-peer update mechanism to replace the conventional update


Fig. 11: The accuracy (AUC) in various groups of 20 newsgroups (3, 5, and 10 data providers)

Fig. 12: Running time (ms) in various groups of 20 newsgroups (3, 5, and 10 data providers)

scheme between a centralized server and clients. Moreover, we verified the quality of updates and aggregated the updated models based on their weights to boost the convergence of our proposed scheme. We evaluated our proposed scheme on three popular real-world datasets and achieved good performance in terms of accuracy and running time, while protecting the privacy of the training data.

REFERENCES

[1] A. Mondal, P. Rao, and S. K. Madria, "Mobile computing, internet of things, and big data for urban informatics," in 2016 17th IEEE International Conference on Mobile Data Management (MDM), vol. 2, June 2016, pp. 8–11.

[2] S. Jeong, O. Simeone, and J. Kang, "Mobile edge computing via a UAV-mounted cloudlet: Optimization of bit allocation and path planning," IEEE Transactions on Vehicular Technology, vol. 67, no. 3, pp. 2049–2063, March 2018.

[3] K. Zhang, Y. Zhu, S. Leng, Y. He, S. Maharjan, and Y. Zhang, "Deep learning empowered task offloading for mobile edge computing in urban informatics," IEEE Internet of Things Journal, 2019.

[4] T. Q. Dinh, Q. D. La, T. Q. S. Quek, and H. Shin, "Learning for computation offloading in mobile edge computing," IEEE Transactions on Communications, vol. 66, no. 12, pp. 6353–6367, Dec 2018.

[5] T. X. Tran and D. Pompili, "Joint task offloading and resource allocation for multi-server mobile-edge computing networks," IEEE Transactions on Vehicular Technology, vol. 68, no. 1, pp. 856–868, Jan 2019.

[6] I. Bello, H. Pham, Q. V. Le, M. Norouzi, and S. Bengio, "Neural combinatorial optimization with reinforcement learning," 2016.

[7] K. Zhang, S. Leng, X. Peng, P. Li, S. Maharjan, and Y. Zhang, "Artificial intelligence inspired transmission scheduling in cognitive vehicular communications and networks," IEEE Internet of Things, to be published.

[8] Y. Wang, K. Wang, H. Huang, T. Miyazaki, and S. Guo, "Traffic and computation co-offloading with reinforcement learning in fog computing for industrial applications," IEEE Transactions on Industrial Informatics, vol. 15, no. 2, pp. 976–986, Feb 2019.

Fig. 13: Average accuracy comparison on various datasets

[9] K. Zhang, Y. Zhu, S. Maharjan, and Y. Zhang, "Edge intelligence and blockchain empowered 5G beyond for industrial internet of things," IEEE Network Magazine, to be published.

[10] C. Giovanelli, O. Kilkki, S. Sierla, I. Seilonen, and V. Vyatkin, "Task allocation algorithm for energy resources providing frequency containment reserves," IEEE Transactions on Industrial Informatics, vol. 15, no. 2, pp. 677–688, Feb 2019.

[11] R. W. L. Coutinho and A. Boukerche, "Modeling and analysis of a shared edge caching system for connected cars and industrial IoT-based applications," IEEE Transactions on Industrial Informatics, pp. 1–1, 2019.

[12] J. Konecny, H. B. McMahan, F. X. Yu, P. Richtarik, A. T. Suresh, and D. Bacon, "Federated learning: Strategies for improving communication efficiency," arXiv preprint arXiv:1610.05492, 2016.

[13] H. B. McMahan, E. Moore, D. Ramage, S. Hampson et al., "Communication-efficient learning of deep networks from decentralized data," arXiv preprint arXiv:1602.05629, 2016.

[14] S. Wang, T. Tuor, T. Salonidis, K. K. Leung, C. Makaya, T. He, and K. Chan, "Adaptive federated learning in resource constrained edge computing systems," IEEE Journal on Selected Areas in Communications, 2019.

[15] Z. Yu, J. Hu, G. Min, H. Lu, Z. Zhao, H. Wang, and N. Georgalas, "Federated learning based proactive content caching in edge computing," in 2018 IEEE Global Communications Conference (GLOBECOM). IEEE, 2018, pp. 1–6.

[16] X. Huang, Y. Lu, D. Li, and M. Ma, "A novel mechanism for fast detection of transformed data leakage," IEEE Access, vol. 6, pp. 35926–35936, 2018.

[17] Z. Qin, T. Yu, Y. Yang, I. Khalil, X. Xiao, and K. Ren, "Generating synthetic decentralized social graphs with local differential privacy," in Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security. ACM, 2017, pp. 425–438.

[18] C. Dwork, "Differential privacy in new settings," in Proceedings of the Twenty-First Annual ACM-SIAM Symposium on Discrete Algorithms. SIAM, 2010, pp. 174–183.

[19] M. Abadi, A. Chu, I. Goodfellow, H. B. McMahan, I. Mironov, K. Talwar, and L. Zhang, "Deep learning with differential privacy," in Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. ACM, 2016, pp. 308–318.

[20] D. D. Lewis, "Reuters dataset," http://www.daviddlewis.com/resources/testcollections/, 2018.

[21] T. Mitchell, "20 newsgroups dataset," http://www.qwone.com/~jason/20Newsgroups, 2019.

[22] Moschitti, "Ohsumed dataset," http://disi.unitn.it/moschitti/corpora/ohsumed-all-docs.tar.gz, 2019.

[23] Rstoj, "Text classification on Ohsumed," https://paperswithcode.com/sota/text-classification-on-ohsumed, 2019.


Yunlong Lu received the B.S. degree in electronic information science and technology from Beijing Forestry University, Beijing, China, in 2012 and the M.S. degree from the School of Computer Science, Beijing University of Posts and Telecommunications (BUPT), Beijing, China, in 2015. He is currently working towards the Ph.D. degree in Computer Science and Technology with the Institute of Network Technology, BUPT, and is a visiting Ph.D. student with the University of Oslo, Norway. His current research interests include blockchain, wireless networks, and privacy-preserving machine learning.

Xiaohong Huang received her B.E. degree from Beijing University of Posts and Telecommunications (BUPT), Beijing, China, in 2000 and her Ph.D. degree from the School of Electrical and Electronic Engineering (EEE), Nanyang Technological University, Singapore, in 2005. Since 2005, Dr. Huang has been with BUPT, where she is now an associate professor and director of the Network and Information Center in the Institute of Network Technology. Dr. Huang has published more than 50 academic papers in the areas of WDM optical networks, IP networks and other related fields. Her current interests are Internet architecture, software defined networking, and network function virtualization.

Yueyue Dai received the B.Sc. degree in communication and information engineering from the University of Electronic Science and Technology of China (UESTC) in 2014, where she is currently pursuing the Ph.D. degree. She is now a visiting Ph.D. student with the University of Oslo, Norway. Her current research interests include wireless networks, mobile edge computing, Internet of Vehicles, blockchain, and deep reinforcement learning.

Sabita Maharjan (M'09) received the Ph.D. degree in networks and distributed systems from the Simula Research Laboratory and the University of Oslo, Norway, in 2013. She is currently a Senior Research Scientist at the Simula Metropolitan Center for Digital Engineering, Norway, and an Associate Professor at the University of Oslo. Her current research interests include wireless networks, network security and resilience, smart grid communications, Internet of Things, machine-to-machine communication, software-defined wireless networking, and the Internet of Vehicles.

Yan Zhang is a Full Professor at the Department of Informatics, University of Oslo, Norway. His current research interests include next-generation wireless networks leading to 5G and beyond, and green and secure cyber-physical systems (e.g., smart grid and transport). He received a Ph.D. degree from the School of Electrical & Electronics Engineering, Nanyang Technological University, Singapore. He is an Editor of several IEEE publications, including IEEE Communications Magazine, IEEE Network, IEEE Transactions on Vehicular Technology, IEEE Transactions on Industrial Informatics, IEEE Transactions on Green Communications and Networking, IEEE Communications Surveys & Tutorials, IEEE Internet of Things, IEEE Systems Journal and IEEE Vehicular Technology Magazine. He has served in chair positions at a number of conferences, including IEEE GLOBECOM 2017, IEEE PIMRC 2016, and IEEE SmartGridComm 2015. He is an IEEE VTS (Vehicular Technology Society) Distinguished Lecturer and a Fellow of IET. He serves as the Chair of IEEE ComSoc TCGCC (Technical Committee on Green Communications & Computing). He was named a 2018 "Highly Cited Researcher" by Clarivate Analytics.