A Deep Reinforcement Learning
Approach for Data Migration in
Multi-access Edge Computing
26-28 November
Santa Fe, Argentina
Dario Bruneo
University of Messina, Italy
Smart services in Smart environments
• IoT diffusion
• Cloud-based approach
– What about latency?
Multi-access Edge Computing (MEC)
• ETSI standard
• placing nodes with computation capabilities (MEC servers) close to the network edge
• MEC Vs. Fog Computing
– explicit interaction with network elements
– (network) information gathering
A 5G MEC-enabled LTE scenario
Challenges
• Resource allocation
• Application migration (App Containerization)
• Proactive Vs. Reactive approaches
AI-based techniques: Machine Learning
Reinforcement Learning (RL)
• Learning through a trial-and-error process
• Well suited to sequential decision-making problems
• Markov Decision Process (MDP) formalism
• Q-Learning (model free approach)
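The trial-and-error loop behind Q-learning can be sketched in a few lines. This is a minimal, illustrative tabular version; the grid-world states, the reward value, and the hyperparameters below are arbitrary assumptions, not taken from the slides:

```python
import random

# Tabular Q-learning: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1
ACTIONS = ["up", "down", "left", "right"]
Q = {}  # maps (state, action) -> estimated return; model-free, no transition model

def q(s, a):
    return Q.get((s, a), 0.0)

def choose_action(s):
    # epsilon-greedy: explore with probability EPSILON, otherwise exploit
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q(s, a))

def update(s, a, r, s_next):
    # one trial-and-error step: move the estimate toward the observed return
    best_next = max(q(s_next, a2) for a2 in ACTIONS)
    Q[(s, a)] = q(s, a) + ALPHA * (r + GAMMA * best_next - q(s, a))

# one fictitious transition: taking "up" in state (3, 1) yields reward -0.04
update((3, 1), "up", -0.04, (3, 2))
```

Because the table is empty at the start, this first update simply stores a fraction of the observed reward; repeated episodes propagate value backwards through the state space.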
[Slide background: Russell & Norvig, Artificial Intelligence: A Modern Approach (Prentice Hall), Ch. 17 "Making Complex Decisions", Figure 17.2 — (a) an optimal policy for the stochastic environment with R(s) = −0.04 in the nonterminal states; (b) optimal policies for four different ranges of R(s).]
Transition model of the uncertain system: probability 0.8 of moving in the intended direction, 0.1 for each perpendicular direction.
Deep RL
• Q-learning does not converge when the number of states is too large (e.g., 10^20)
• Deep RL introduced by DeepMind
• Using a Deep Neural Network (DNN) to predict the Q-values for a given state
[Diagram: DNN mapping the system state (input) to Q-values for each action (output)]
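The idea of replacing the Q-table with a network — state in, one Q-value per action out — can be mocked up with a toy forward pass. The layer size and the random weights here are illustrative stand-ins (a real DNN would be trained); only the input dimension 21 and the 9 actions match the scenario described later in the deck:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-layer network: 21-element state vector in, 9 Q-values out.
# Sizes match the MEC scenario (21-tuple state, 3 servers x 3 apps = 9 actions);
# the weights are random placeholders, not a trained model.
STATE_DIM, HIDDEN, N_ACTIONS = 21, 32, 9
W1 = rng.normal(size=(STATE_DIM, HIDDEN)) * 0.1
b1 = np.zeros(HIDDEN)
W2 = rng.normal(size=(HIDDEN, N_ACTIONS)) * 0.1
b2 = np.zeros(N_ACTIONS)

def q_values(state):
    """Forward pass: predict the Q-values of all actions in one shot."""
    h = np.maximum(0.0, state @ W1 + b1)   # ReLU hidden layer
    return h @ W2 + b2

state = rng.normal(size=STATE_DIM)
best_action = int(np.argmax(q_values(state)))  # greedy action index
```

The point of the architecture is that one network evaluation replaces a lookup over an intractably large table.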
Applying Deep RL to MEC LTE scenarios
machine learning frameworks like TensorFlow, Theano, and others. The main feature of this library is its simplicity, which allows building complex neural network topologies with just a few lines of code in a scikit-learn fashion, while keeping the power of the neural network engine that runs underneath. Using Keras, we built a feedforward fully connected deep neural network composed of n hidden layers between the input layer, whose dimension is given by the cardinality ‖T‖ of the state tuple, and the output layer, whose dimension is given by the cardinality ‖Z‖ of the action set. In order to integrate the two systems, which run in Python and C++ environments respectively, we implemented a mechanism that lets them communicate using text files. With reference to Fig. 2, the deep RL engine waits for the generation of the files containing the current state of the system; once it receives the data, it generates as output a text file which contains the action to execute on the simulator. On the OMNeT++ side, we used an asynchronous timer which periodically checks for the availability of the action file; as soon as the file is available, the simulator reads the action code and changes the server destination address for the UEs that are running the application indicated in the action, thus emulating the data migration of the application from one server to another. After the action execution, the RL agent observes the reward, obtained as a combination of several performance indexes provided by the OMNeT++ simulator, checking whether the action performed has increased it or not. The reward in this sense is used by the agent as feedback which helps it understand whether the executed action is a valid choice in that specific system state.
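The Python half of this file-based handshake can be sketched as follows. File names, the state encoding (whitespace-separated numbers), and the timeout logic are illustrative assumptions, not details from the paper:

```python
import os
import time

# File-based IPC between the deep RL engine (Python) and OMNeT++ (C++).
# Names and formats below are assumptions for illustration.
STATE_FILE, ACTION_FILE = "state.txt", "action.txt"

def wait_for_state(directory, poll=0.1, timeout=5.0):
    """Block until the simulator has written the current-state file."""
    path = os.path.join(directory, STATE_FILE)
    deadline = time.time() + timeout
    while not os.path.exists(path):
        if time.time() > deadline:
            raise TimeoutError("no state file from the simulator")
        time.sleep(poll)
    with open(path) as f:
        state = [float(x) for x in f.read().split()]
    os.remove(path)  # consume the file so the next step starts clean
    return state

def write_action(directory, action_code):
    """Publish the chosen action; the simulator's timer polls for this file."""
    tmp = os.path.join(directory, ACTION_FILE + ".tmp")
    with open(tmp, "w") as f:
        f.write(str(action_code))
    # atomic rename so the polling side never reads a half-written file
    os.replace(tmp, os.path.join(directory, ACTION_FILE))
```

Writing to a temporary file and renaming it avoids the race where the asynchronous timer on the OMNeT++ side fires while the action file is only partially written.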
5. RESULTS
In this section, we present a preliminary scenario that we built to test the feasibility of the system, in which we only consider the presence of MEC servers, without the possibility to use the Cloud. Fig. 4 shows the structure of the network, composed of three eNBs, a set of devices with K = 9, a set of MEC servers with N = 3, a set of applications with M = 3, and a set of actions with Z = ‖N‖ · ‖M‖, where each action corresponds to the migration of an app taken from the Apps set to a server taken from the MEC set. The data rate provided by the cables which connect the eNBs is equal to 10 Gbps, except for the ones that connect the routers to the PGW, where the data rate is 3 Mbps to emulate traffic congestion, thus creating a realistic scenario in which we can test the performance of our algorithm. On the OMNeT++ side, it is possible to set several parameters for the simulation by using a configuration file called omnetpp.ini; since the number of parameters to set is very large, we summarize them in a table.
Table 1 shows the main parameters we set for the simulation. We consider a total of nine users who follow a random mobility pattern, moving at a speed of 1.5 mps, which is a fairly good approximation of human walking speed. With respect to the applications, we consider three constant bit rate (CBR) applications, each of which can be run by only one MEC server at a time. As already said at the beginning of this section, our goal is to test the feasibility of the technique
Configuration Parameters
Number of users: 9
User mobility: RandomWayPointMobility
User speed: 1.5 mps
Number of applications: 3
Application type: UDP ConstantBitRate
Packet size: 1500 B
Simulation Time: 420 seconds
Table 1 – omnetpp.ini configuration.
we are proposing; for this reason, we decided to realize a scenario with a manageable number of users and applications, in order to keep the training time of the DNN low. In such a context, the state which defines the network can be expressed as follows:
Figure 4 – OMNeT++/SimuLTE simulation scenario.
s = < UE_eNB1, UE_eNB2, UE_eNB3,
      eNB1_app1, eNB1_app2, eNB1_app3,
      eNB2_app1, eNB2_app2, eNB2_app3,
      eNB3_app1, eNB3_app2, eNB3_app3,
      Mec1_app1, Mec1_app2, Mec1_app3,
      Mec2_app1, Mec2_app2, Mec2_app3,
      Mec3_app1, Mec3_app2, Mec3_app3 >    (10)
where:
• UE_eNBj represents the number of devices connected to the j-th eNB;
• eNBj_appk represents the number of devices which are running the k-th application in the j-th eNB;
• MECi_appk is a boolean flag that indicates if the i-th MEC server is running the k-th application.
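The 21 components of Eq. (10) — 3 per-eNB user counts, 9 per-eNB app counts, and 9 MEC boolean flags — can be flattened into the vector fed to the DNN. The helper below and its argument layout are an illustrative sketch, not the paper's code:

```python
# Flatten the state of Eq. (10): 3 per-eNB user counts, 3x3 per-eNB app
# counts, and 3x3 boolean MEC flags -> a 21-element vector.
N_ENB, N_MEC, N_APP = 3, 3, 3

def build_state(ue_per_enb, enb_app_counts, mec_app_flags):
    """ue_per_enb: [UE_eNB1..UE_eNB3]; enb_app_counts[j][k]: devices running
    app k+1 on eNB j+1; mec_app_flags[i][k]: 1 if MEC i+1 runs app k+1."""
    state = list(ue_per_enb)
    for j in range(N_ENB):
        state.extend(enb_app_counts[j])
    for i in range(N_MEC):
        state.extend(mec_app_flags[i])
    return state

# example: 9 users split 4/3/2 across the eNBs, each app on one MEC server
s = build_state([4, 3, 2],
                [[2, 1, 1], [1, 1, 1], [1, 0, 1]],
                [[1, 0, 0], [0, 1, 0], [0, 0, 1]])
```

The resulting vector has exactly ‖T‖ = 21 entries, matching the input dimension of the network described in Section 4.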
With respect to the reward, we first defined as a QoS performance index the percentage of received data corresponding to the i-th application app_i as:
D_appi = ReceivedTHR / (Σ SentTHR · packetSize)    (11)
where the sum is extended to all the UEs that run the i-th application. In particular, we evaluated the average of the
State: user position and app distribution
Actions: app migration
Reward: combination of network performance indexes
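The reward summarized above can be sketched from the per-app QoS index of Eq. (11) — the fraction of sent data actually received. How the per-app indexes are combined is not spelled out beyond "combination of performance indexes", so the plain average below is an assumption:

```python
def received_fraction(received_bytes, sent_packets, packet_size=1500):
    """QoS index D_app (Eq. 11): received data over the data sent by all
    UEs running the app; sent traffic is counted in packets of packet_size."""
    sent_bytes = sum(sent_packets) * packet_size
    return received_bytes / sent_bytes if sent_bytes else 0.0

def reward(per_app_stats):
    """Combine the per-application indexes into one scalar reward.
    Plain averaging is an assumption; the paper only says 'combination'."""
    d = [received_fraction(r, s) for r, s in per_app_stats]
    return sum(d) / len(d)

# three apps: (received bytes, packets sent per UE running the app)
r = reward([(3_000_000, [1000, 1000]),   # app1: all sent data received
            (1_500_000, [1000, 1000]),   # app2: half received
            (750_000,   [1000])])        # app3: half received
```

With these made-up numbers the reward is (1 + 0.5 + 0.5) / 3 = 2/3; an action that relieves the congested 3 Mbps links would push the index toward 1.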
N: set of MEC/Cloud servers
M: set of Applications
D: percentage of received data
The proposed algorithm
1. Init
2. Action selection
3. Action execution
4. DNN training
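The loop above can be sketched as a minimal training skeleton. Every component here — the q_net and env objects and their methods — is an illustrative stub standing in for the real DNN, simulator, and training logic:

```python
import random

def run_episode(n_steps, q_net, env, epsilon=0.1):
    """Init -> (action selection -> action execution -> DNN training) loop."""
    state = env.reset()                        # 1. Init
    for _ in range(n_steps):
        qs = q_net.predict(state)              # 2. Action selection:
        if random.random() < epsilon:          #    epsilon-greedy over the
            action = random.randrange(len(qs)) #    predicted Q-values
        else:
            action = max(range(len(qs)), key=qs.__getitem__)
        next_state, reward = env.step(action)  # 3. Action execution
        q_net.train(state, action, reward, next_state)  # 4. DNN training
        state = next_state
```

In the actual system, env.step would hand the action to OMNeT++ through the text-file mechanism and read back the resulting state and reward.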
Creating a Deep RL environment for MEC
MEC-LTE environment
• OMNeT++ – INET
– SimuLTE MEC extension
Deep RL engine
• Keras on top of TensorFlow
– build complex neural network topologies with just a few lines of code
– keeping the power of the neural network engine that runs underneath
OMNeT++ (C++) and Keras (Python) have been integrated by implementing a mechanism to let them communicate using text files
Experimental results
• 3 eNBs
• 9 UE
• 3 MEC Servers
• 3 Applications (CBR)
• Random walk (walking speed)
• Training for 25,000 simulation seconds
[Plot: percentage of received data D vs. simulation time (sec), comparing "No policy" and "Deep RL"]
Comparison with a «static» policy where no App migration is performed:
The Deep RL algorithm is able to promptly react to wrong actions
The Deep RL algorithm outperforms the «static» policy
Conclusions and Future Work
• We presented a machine learning approach to address the dynamics of the network environment in a 5G MEC-enabled LTE scenario
• We designed a Deep RL algorithm and tested it in a realistic simulated scenario, demonstrating the feasibility of the technique
• Future work will be devoted to:
– better integration between OMNeT++/SimuLTE and Keras/TensorFlow
– analysis of more complex scenarios
– comparison with other solutions
Thank you