

Research Article
Intelligent Ramp Control for Incident Response Using Dyna-Q Architecture

Chao Lu,1,2 Yanan Zhao,1 and Jianwei Gong1

1 School of Mechanical Engineering, Beijing Institute of Technology, Beijing 100081, China
2 Institute for Transport Studies, University of Leeds, Leeds LS2 9JT, UK

Correspondence should be addressed to Chao Lu; tscllugmailcom

Received 18 June 2015; Revised 22 September 2015; Accepted 28 September 2015

Academic Editor: Dongsuk Kum

Copyright © 2015 Chao Lu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Reinforcement learning (RL) has shown great potential for motorway ramp control, especially under the congestion caused by incidents. However, existing applications, which are limited to single-agent tasks and based on Q-learning, have inherent drawbacks for dealing with coordinated ramp control problems. To solve these problems, a Dyna-Q based multiagent reinforcement learning (MARL) system named Dyna-MARL has been developed in this paper. Dyna-Q is an extension of Q-learning that combines model-free and model-based methods to obtain benefits from both sides. The performance of Dyna-MARL is tested in a simulated motorway segment in the UK with real traffic data collected from AM peak hours. The test results, compared with Isolated RL and noncontrolled situations, show that Dyna-MARL can achieve a superior performance in improving the traffic operation with respect to increasing total throughput and reducing total travel time and CO2 emission. Moreover, with a suitable coordination strategy, Dyna-MARL can maintain a highly equitable motorway system by balancing the travel time of road users from different on-ramps.

Hindawi Publishing Corporation, Mathematical Problems in Engineering, Volume 2015, Article ID 896943, 16 pages, http://dx.doi.org/10.1155/2015/896943

1. Introduction

Traffic congestion occurs when the traffic demand for a road network approaches or exceeds its available road capacity. Even slight losses of the balance between demand and capacity on motorways can lead to long travel delays, high energy consumption, and severe environmental problems. Therefore, how to alleviate traffic congestion and maintain the demand-capacity balance has become one of the main concerns of the transport community. To this end, a number of traffic control devices, such as variable speed limit (VSL), variable message sign (VMS), and ramp control systems, have been developed under the umbrella of intelligent transportation systems (ITS). Among these advanced systems, ramp control (also known as ramp metering) has been widely used and has proved to be an effective control method for different kinds of congestion on motorways [1].

Generally, traffic congestion can be classified into two categories: recurrent congestion and nonrecurrent congestion. Recurrent congestion is caused by the daily traffic operation with temporarily increased traffic demand in peak hours [2]. Considering the daily peak traffic on motorways, recurrent congestion is the main concern of many existing ramp control systems. For instance, fixed-time systems (also known as pretimed systems) use historical data collected from daily peak hours to generate control strategies offline and trigger these strategies at fixed times (e.g., morning or evening peak hours) of each day [1]. Local traffic-responsive systems, such as the demand-capacity method, ALINEA [3], and its variations [4], can respond to the real-time traffic and keep the outflow or road density of the motorway mainline close to some target value (e.g., road capacity or critical density). Usually, these target values should be defined in advance according to the so-called fundamental diagram, which is derived from the daily traffic data. To deal with network-wide problems, traffic-responsive systems have been extended to coordinated ramp control systems, such as Flow [5], System Wide Adaptive Ramp Metering (SWARM) [6], and Zone algorithms [7]. Similar to local traffic-responsive systems, these coordinated systems also attempt to make the outflow of the motorway mainline approach a predetermined target value, which is usually the road capacity. Another group of systems focuses on formulating different control scenarios as optimisation problems and using optimal control techniques (e.g., model predictive control) to solve them. The purpose of these systems is to maximise or minimise an objective function, not to achieve some predefined target value. Examples of these systems can be found in [8–12], where macroscopic traffic flow models were combined with control systems to formulate optimal control problems.

Although the aforementioned systems have shown their effectiveness in different scenarios, recurrent congestion is still their main focus, and a component that can deal with nonrecurrent congestion is not included in them. Unlike recurrent congestion caused by the increased traffic demand in peak hours, nonrecurrent congestion is mainly induced by incidents, and thus it is usually referred to as incident-induced congestion [2, 13]. Traffic incidents are nonrecurrent events, such as road accidents, vehicle breakdowns, and unexpected obstacles, that may block one or more lanes of the motorway mainline. The temporary lane blockage will interrupt the normal operation of traffic flow and lead to a rapid reduction of road capacity [14]. In this case, fixed-time and simple traffic-responsive systems, which depend on the information collected from daily traffic operation or on a predefined target value, are not applicable. Therefore, more sophisticated systems that can respond to incidents are required. During the last decades, a series of such ramp control systems have been designed, most of which are based on optimisation techniques. For example, an optimal control structure using a simple macroscopic traffic flow model was proposed in [15] to deal with incident-induced congestion. A more complex system with consideration of dynamic incident duration was developed in [16], which can be solved by the linear programming technique. In the research presented in [17, 18], both lane-changing and queuing behaviour during the incident were incorporated into a modelling structure and solved by a stochastic optimal control system. Although these systems are based on different technologies, they all need a model to predict traffic conditions and use these predictions to accomplish the control process.

Model-based methods usually have poor adaptability when a mismatch between simulation models and the real controlled environment emerges [19–21]. To overcome this limitation, another optimisation-based method, reinforcement learning (RL), was introduced to the ramp control area. This method is based on the Markov decision process (MDP) and dynamic programming (DP), and it can approximately solve the optimisation problem through continuous learning without any models. The first ramp control system using RL to solve incident-induced problems was developed in [19, 22]. The basic RL algorithm named Q-learning was adopted by this system to alleviate traffic congestion caused by incidents. After this work, several Q-learning systems considering both local (e.g., [23, 24]) and coordinated (e.g., [25, 26]) control problems were proposed. However, Q-learning can only learn from real interactions with the traffic operation and cannot make full use of historical data (or models). Because of this limitation, Q-learning usually has a low learning speed and needs a great number of trials to obtain the best control strategy in some complex scenarios such as incident-induced congestion [27]. This problem is even worse in coordinated ramp control problems with exponentially increased state and action spaces, which lead to the so-called "curse of dimensionality" [28]. One solution to speed up the learning process and deal with incidents efficiently has been proposed in our previous work [27, 29]. This system used the Dyna-Q architecture to combine model-free Q-learning with a model-based method and can be used to accomplish single-agent tasks.

In this paper, the previous single-agent system is extended to a multiagent case that can deal with a network-wide problem with multiple ramp controllers. We refer to this system as Dyna-MARL, which adopts a multiagent RL (MARL) strategy based on the Dyna-Q architecture. The rest of this paper is organised as follows. Section 2 briefly introduces the basic knowledge of RL, including single-agent and multiagent cases. The architecture of Dyna-MARL is described in Section 3. After that, Sections 4 and 5 give a detailed description of the models, elements, and related algorithm of Dyna-MARL. The simulation experiments and relevant results are discussed in Section 6. Section 7 finally gives some conclusions and introduces the future work.

2. Reinforcement Learning

RL is a subclass of machine learning. In the following subsections, two kinds of RL problems, namely, single-agent and multiagent RL, will be briefly introduced.

2.1. Single-Agent RL. The problem of single-agent RL is usually defined as an MDP that can be represented by a tuple $(S, P, R, C)$ [30]. $S$ is the state space used to describe the external environment. $C$ is the control action set containing the executable actions of the agent. $P$ is the state transition probability: for a state pair $s, s' \in S$, $P_c(s, s')$ represents the probability of reaching state $s'$ after executing action $c$ at state $s$. $R: S \times C \to \mathbb{R}$ is the reward function; $R(s, c)$ denotes the immediate reward after taking action $c$ at state $s$. Based on these definitions, the $Q$ value is defined for each state-action pair $(s, c)$ as shown below:

\[
Q^{\pi}(s, c) = E\left\{ \sum_{n=0}^{\infty} \gamma^{n} R\left(s^{k+n+1}, c^{k+n+1}\right) \,\middle|\, s^{k} = s,\ c^{k} = c \right\}, \tag{1}
\]

where $k$ is the time index and $n$ is the number of time steps. $s^{k} \in S$ and $c^{k} \in C$ are the environment state and the executed control action at time step $k$, respectively. $\gamma \in [0, 1]$ is the discount factor, which indicates the importance of the following predicted rewards. In $\gamma^{n}$, $n$ is a power. $\pi$ is the policy, corresponding to a sequence of actions. The optimal policy can be obtained by maximising the $Q$ value.

The most widely used algorithm in the literature for estimating the maximum $Q$ value is Q-learning [31]. By using the updating equation given below, Q-learning can maximise the $Q$ value for each state-action pair:

\[
Q^{k+1}\left(s^{k}, c^{k}\right) = Q^{k}\left(s^{k}, c^{k}\right) + \alpha \left[ R^{k}\left(s^{k}, c^{k}\right) + \gamma \max_{c^{k+1}} Q^{k}\left(s^{k+1}, c^{k+1}\right) - Q^{k}\left(s^{k}, c^{k}\right) \right], \tag{2}
\]

where $Q^{k+1}(s^{k}, c^{k})$ and $Q^{k}(s^{k}, c^{k})$ are the $Q$ values for state-action pair $(s^{k}, c^{k})$ at the $(k+1)$th step and the $k$th step, respectively, and $Q^{k}(s^{k+1}, c^{k+1})$ is the $Q$ value for the state-action pair $(s^{k+1}, c^{k+1})$ at the $k$th step. $\alpha \in [0, 1]$ is the learning rate. $\gamma$ and $\alpha$ can be regulated according to different problems.
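As an illustration of update (2), the following is a minimal tabular sketch in Python. The state labels, the three discrete metering levels, and the default values of the learning rate and discount factor are hypothetical choices made for the example; they are not taken from the paper.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.9):
    """Apply update (2) to one state-action pair and return the new Q value."""
    # Greedy estimate of the best value reachable from the next state.
    best_next = max(q[(next_state, c)] for c in actions)
    # Temporal-difference error: bracketed term in (2).
    td_error = reward + gamma * best_next - q[(state, action)]
    q[(state, action)] += alpha * td_error
    return q[(state, action)]


# Hypothetical example: Q table initialised to zero, three metering levels.
q_table = defaultdict(float)
metering_actions = [0, 1, 2]
new_q = q_learning_update(q_table, "congested", 1, reward=-5.0,
                          next_state="free", actions=metering_actions)
```

With an all-zero table, the only nonzero term in the bracket is the reward, so the updated entry moves a fraction `alpha` of the way towards it.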

2.2. Multiagent Scenarios. In multiagent scenarios, an MDP for the single-agent case can be extended to a stochastic game (SG), or Markov game, in which a group of agents try to obtain some equilibrium solutions through coordination or competition [28].

In the absence of competition, all agents involved in a game have a common goal, which is to maximise the global $Q$ value; this forms a coordinated MARL problem. In this case, the policy optimisation is determined by the actions executed by all agents.

For solving a coordinated MARL problem, the update equation (2) for Q-learning can be easily extended to represent the global $Q$ value update [28]:

\[
Q^{k+1}\left(s^{k}, c_{1}^{k}, \ldots, c_{n}^{k}\right) = Q^{k}\left(s^{k}, c_{1}^{k}, \ldots, c_{n}^{k}\right) + \alpha \left[ R^{k}\left(s^{k}, c_{1}^{k}, \ldots, c_{n}^{k}\right) + \gamma \max_{c_{1}^{k+1}, \ldots, c_{n}^{k+1}} Q^{k}\left(s^{k+1}, c_{1}^{k+1}, \ldots, c_{n}^{k+1}\right) - Q^{k}\left(s^{k}, c_{1}^{k}, \ldots, c_{n}^{k}\right) \right]. \tag{3}
\]

The only difference from (2) is that $Q$ and $R$ in (3) relate to $n$ actions $c_{1}, \ldots, c_{n}$ executed by $n$ agents, rather than to a single action $c$.
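The joint update (3) can be sketched in the same tabular style, with the table keyed by the state and the tuple of all agents' actions. This is only an illustrative sketch: the function name, the three agents, and their three-level action sets are assumptions made for the example.

```python
from collections import defaultdict
from itertools import product


def joint_q_update(q, state, joint_action, reward, next_state, action_sets,
                   alpha=0.1, gamma=0.9):
    """Apply update (3): the max runs over every joint action of all agents."""
    best_next = max(q[(next_state, joint)] for joint in product(*action_sets))
    key = (state, tuple(joint_action))
    q[key] += alpha * (reward + gamma * best_next - q[key])
    return q[key]


# Hypothetical setting: three agents, each with three metering levels.
action_sets = [[0, 1, 2]] * 3
joint_q = defaultdict(float)
joint_q[("s1", (2, 2, 2))] = 10.0  # one known value in the next state
updated = joint_q_update(joint_q, "s0", (0, 1, 2), reward=1.0,
                         next_state="s1", action_sets=action_sets)

# The number of joint actions is the product of the individual set sizes.
num_joint_actions = len(list(product(*action_sets)))
```

Even in this small example the maximisation already enumerates 27 joint actions, and each additional agent multiplies that number by the size of its action set.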

2.3. Solutions for Coordinated MARL. It can be seen from (3) that, as the number of agents grows, the combinations of actions and the resultant computational complexity increase exponentially, which may make the problem unsolvable within a required time limit [28]. Therefore, a commonly used method is to decompose the global $Q$ value into several local $Q$ values, each of which can be maximised by a few relevant agents rather than all agents [32]. Based on this distributed method, several strategies have been proposed. In [28], these strategies fall into three categories: coordination-based, coordination-free, and indirect coordination strategies.

Coordination-based strategies need local $Q$ values to be updated according to the actions executed by all relevant agents (named joint actions) at each time step [28]. The decision-making process of each agent is based on the information received from all other related agents with sufficient communication. This complicates the problem. On the other hand, coordination-free (or independent) strategies, such as the distributed Q-learning algorithm, make each agent update the corresponding local $Q$ values based on its own actions [33]. Therefore, each agent makes its decisions independently without increasing computational complexity. However, this computational efficiency comes at the expense of nonguaranteed convergence [32]. Indirect coordination strategies try to find a balance between the above two methods. By applying indirect strategies, each agent can maintain models for its cooperative partners and update local $Q$ values without knowing all the information of other agents at each step [28]. Based on high-quality models, this method can reduce the problem complexity and guarantee convergence with limited coordination.

3. Dyna-Q Based Indirect Coordination Strategy

Because of the benefits introduced in the above section, the indirect coordination strategy has been applied in [34] for solving urban traffic control problems. In that work, each agent maintains a model for estimating the action selection probability of its neighbours and uses this information to optimise control strategies. In this paper, we extend this method to motorway systems by applying the Dyna-Q architecture.

Under the Dyna-Q architecture, a modified macroscopic flow model named the asymmetric cell transmission model (ACTM) and the Q-learning algorithm are combined to deal with coordinated MARL problems. In this section, the application of Dyna-Q will be introduced.

3.1. Dyna-Q Architecture. The Dyna-Q architecture is an extension of standard Q-learning that integrates planning, acting, and learning [30]. Unlike Q-learning, which learns from the real experience without a model, Dyna-Q learns a model and uses this model to guide the agent [35]. After capturing the real experience, two loops run to learn optimal policies that can obtain the maximum $Q$ value in the Dyna-Q architecture (see Figure 1).

In loop I, direct RL is the standard Q-learning process that can be used to interact with the real external environment. Loop II contains two main tasks: (1) model learning is used to improve the model accuracy by obtaining new knowledge from real experience; (2) planning is the same process as direct RL except that it uses the experience generated by a model. Acting is the action execution process.

Applying a model, the agent can predict the reactions of its external environment and other agents before executing a specific action, which provides an opportunity for the agent to update the $Q$ value before receiving the real feedback. Simultaneously, direct RL is running to update the $Q$ value through the real interaction. Therefore, the optimal policy is learned through both real experience and predictions. By using this strategy, Dyna-Q can learn faster than Q-learning in many situations [30].

Figure 1: Dyna-Q architecture.

Figure 2: System architecture.

Although a model is maintained in the Dyna-Q architecture, the whole system is different from model-based control methods such as model predictive control (MPC). The model in the Dyna-Q architecture is a complementary component, which is used to speed up the learning process and simplify the coordination of agents. The optimal control actions are learnt from both real and simulated experience. Without models, the Dyna-Q architecture is equivalent to the Q-learning technique and can still work as a model-free system. MPC, on the other hand, is dependent on the model, which means that it cannot work without models. Therefore, Dyna-Q can be considered as a combination of model-free and model-based methods [27].
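The interplay of acting, direct RL, model learning, and planning can be sketched with a generic tabular Dyna-Q loop. This is a minimal sketch under simplifying assumptions, not the system developed in this paper: the learned model is a plain lookup table of observed transitions rather than a traffic flow model, and the four-state chain environment, the function names, and all parameter values are hypothetical.

```python
import random
from collections import defaultdict


def dyna_q(env_step, start_state, actions, episodes=20, planning_steps=10,
           alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Generic tabular Dyna-Q with a lookup-table model of past transitions."""
    rng = random.Random(seed)
    q = defaultdict(float)
    model = {}  # (state, action) -> (reward, next_state, done)

    def update(s, c, r, s_next, done):
        target = r if done else r + gamma * max(q[(s_next, b)] for b in actions)
        q[(s, c)] += alpha * (target - q[(s, c)])

    for _ in range(episodes):
        s, done = start_state, False
        while not done:
            # Acting: epsilon-greedy with random tie-breaking.
            if rng.random() < epsilon:
                c = rng.choice(actions)
            else:
                best = max(q[(s, b)] for b in actions)
                c = rng.choice([b for b in actions if q[(s, b)] == best])
            r, s_next, done = env_step(s, c)
            update(s, c, r, s_next, done)       # loop I: direct RL
            model[(s, c)] = (r, s_next, done)   # loop II: model learning
            for _ in range(planning_steps):     # loop II: planning
                ps, pc = rng.choice(list(model))
                pr, pn, pd = model[(ps, pc)]
                update(ps, pc, pr, pn, pd)
            s = s_next
    return q


def chain_step(state, action):
    """Toy chain of states 0..3: action 1 moves right, 0 moves left."""
    nxt = max(0, min(3, state + (1 if action == 1 else -1)))
    done = nxt == 3
    return (1.0 if done else 0.0), nxt, done


q_values = dyna_q(chain_step, start_state=0, actions=[0, 1])
```

Setting `planning_steps=0` reduces the loop to plain Q-learning, which mirrors the remark above that the architecture still works as a model-free system without models.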

3.2. System Architecture. Each agent in the motorway control system is designed on the basis of the Dyna-Q architecture and controls one prespecified motorway section.

A simplified motorway segment is shown in Figure 2 for analysis. This segment contains three motorway sections ($i$, $i+1$, $i+2$) with detectors located at the boundaries. Each motorway section is divided into a number of cells ($j, j+1, \ldots, j+8$) according to its layout and geometric features. Generally, three kinds of cells exist in the motorway: on-ramp cells that are linked with on-ramps ($j+2$, $j+5$, $j+8$), off-ramp cells that are linked with off-ramps ($j$, $j+3$, $j+6$), and normal cells ($j+1$, $j+4$, $j+7$). In this paper, we define that each motorway section can have at most one on-ramp cell.

The typical Dyna-Q architecture presented in Figure 2 is detailed for each agent here. Take agent $i+1$ for example: experience consists of the traffic arrival and departure rates observed from the detectors of motorway section $i+1$, as well as the information received from agent $i$, which is applied to improve the models. In the model component, two models are maintained. An asymmetric cell transmission model (ACTM) with estimated traffic arrival and departure rates is used to simulate the traffic flow dynamics in the relevant motorway sections. A probability model of the action selection of agent $i$ at the current state is updated for the further planning process.

Figure 3: Fundamental diagram during the incident. (a) Incident extent and critical section. (b) Flow versus density.

To reduce the complexity of MARL, as in many real applications, some conventions are used to restrict the action selection of an agent [28]. Specifically, in our design, each agent only communicates with its spatial neighbours. For instance, agent $i+1$ receives the control action and traffic information from agent $i$ and sends its own information to agent $i+2$. For the case shown in Figure 2, we assume that motorway section $i$ is the critical section where an incident occurs. In this situation, agent $i$ plays a more important role than the other agents in dealing with incidents. Agent $i$ can be considered as the chief controller that makes decisions according to its own knowledge about the traffic and incident situations. The other agents should regulate their control policies based on the reaction of agent $i$.

Therefore, two $Q$ values are defined for two kinds of agents. If motorway section $i$ is the critical section, the $Q$ value of agent $i$ is only related to its own state and action space and can be updated by the same equation denoted by (2).

If motorway section $i$ is a normal section without incidents, the $Q$ value of agent $i$ can be calculated by

\[
\begin{aligned}
Q_{i}^{k+1}\left(s_{i}^{k}, c_{i-1}^{k}, c_{i}^{k}\right) &= Q_{i}^{k}\left(s_{i}^{k}, c_{i-1}^{k}, c_{i}^{k}\right) + \alpha \Bigg[ R_{i}^{k}\left(s_{i}^{k}, c_{i-1}^{k}, c_{i}^{k}\right) \\
&\quad + \gamma \max_{c_{i}^{k+1}} \sum_{c_{i-1}^{k+1}} p\left(c_{i-1}^{k+1} \mid s_{i}^{k+1}\right) \cdot Q_{i}^{k}\left(s_{i}^{k+1}, c_{i-1}^{k+1}, c_{i}^{k+1}\right) - Q_{i}^{k}\left(s_{i}^{k}, c_{i-1}^{k}, c_{i}^{k}\right) \Bigg], \\
p\left(c_{i-1}^{k+1} \mid s_{i}^{k+1}\right) &= \frac{\operatorname{count}\left(s_{i}^{k+1}, c_{i-1}^{k+1}\right)}{\sum_{c_{i-1} \in C_{i-1}} \operatorname{count}\left(s_{i}^{k+1}, c_{i-1}\right)},
\end{aligned} \tag{4}
\]

where $R_{i}^{k}(s_{i}^{k}, c_{i-1}^{k}, c_{i}^{k})$ is the immediate reward obtained by agent $i$ at time step $k$ when $c_{i-1}^{k}$ and $c_{i}^{k}$ are the actions executed by agents $i-1$ and $i$. Similarly, $Q_{i}^{k+1}(s_{i}^{k}, c_{i-1}^{k}, c_{i}^{k})$ and $Q_{i}^{k}(s_{i}^{k}, c_{i-1}^{k}, c_{i}^{k})$ are the $Q$ values for agent $i$ at step $k+1$ and step $k$, respectively. $C_{i-1}$ is the action set of agent $i-1$. $\operatorname{count}(s_{i}^{k+1}, c_{i-1}^{k+1})$ returns the number of visits for the state-action pair $(s_{i}^{k+1}, c_{i-1}^{k+1})$. Thus, $p(c_{i-1}^{k+1} \mid s_{i}^{k+1})$ is the probability of agent $i-1$ selecting action $c_{i-1}^{k+1}$ at state $s_{i}^{k+1}$. The models and the related symbols shown in Figure 2 will be specified in the following section.
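A possible tabular reading of (4) is sketched below: the agent keeps visit counts of its upstream neighbour's actions per state, turns them into the probability $p$, and uses the expected $Q$ value over the neighbour's next action inside the maximisation. The class and method names are hypothetical, and the uniform fallback used before any visit has been recorded is an assumption of this sketch that (4) itself does not specify.

```python
from collections import defaultdict


class NeighbourAwareAgent:
    """Tabular agent that updates Q as in (4) using a count-based model."""

    def __init__(self, own_actions, neighbour_actions, alpha=0.1, gamma=0.9):
        self.own_actions = list(own_actions)
        self.neighbour_actions = list(neighbour_actions)
        self.alpha, self.gamma = alpha, gamma
        self.q = defaultdict(float)    # (state, neighbour action, own action)
        self.count = defaultdict(int)  # (state, neighbour action) -> visits

    def observe_neighbour(self, state, neighbour_action):
        self.count[(state, neighbour_action)] += 1

    def neighbour_prob(self, state, neighbour_action):
        total = sum(self.count[(state, b)] for b in self.neighbour_actions)
        if total == 0:
            # Assumption: uniform probability before any observation.
            return 1.0 / len(self.neighbour_actions)
        return self.count[(state, neighbour_action)] / total

    def expected_best_value(self, state):
        # max over own action of the expectation over the neighbour's action.
        return max(
            sum(self.neighbour_prob(state, b) * self.q[(state, b, c)]
                for b in self.neighbour_actions)
            for c in self.own_actions
        )

    def update(self, state, neighbour_action, own_action, reward, next_state):
        key = (state, neighbour_action, own_action)
        target = reward + self.gamma * self.expected_best_value(next_state)
        self.q[key] += self.alpha * (target - self.q[key])
        return self.q[key]


agent = NeighbourAwareAgent(own_actions=[0, 1], neighbour_actions=[0, 1])
for observed in (0, 0, 0, 1):          # neighbour chose action 0 three times
    agent.observe_neighbour("s1", observed)
agent.q[("s1", 0, 1)] = 4.0
agent.q[("s1", 1, 1)] = 8.0
new_value = agent.update("s0", 0, 1, reward=1.0, next_state="s1")
```

Because the neighbour's action enters only through its estimated probability, the agent does not need the neighbour's actual next decision when it plans, which is the point of the indirect coordination strategy.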

4. Modified Asymmetric Cell Transmission Model

A first-order macroscopic traffic flow model named the asymmetric cell transmission model (ACTM) is applied as one of the models in the Dyna-Q architecture. This model is derived from the widely used cell transmission model (CTM) [36] and has been used for ramp control problems [11, 37]. In this paper, we modify ACTM to incorporate the traffic dynamics under incident conditions.

4.1. Traffic Dynamics during the Incident. As shown in Figure 3(a), when an incident happens in the critical section, one or more lanes of the motorway will be blocked according to the incident extent. Because of the lane blockage, the incident may reduce the normal road capacity and spatial storage space, which will produce a new relationship between traffic flow and road density, that is, the fundamental diagram presented in Figure 3(b). As suggested by [38], additional parameters can be used to regulate the fundamental diagram for incident situations. We introduce three parameters ($\lambda_{1}, \lambda_{2}, \lambda_{3} \in [0, 1]$) to reflect these new dynamics. These three parameters are defined as $\lambda_{1} = v_{j}^{\mathrm{In}} / v_{j}$, $\lambda_{2} = w_{j}^{\mathrm{In}} / w_{j}$, and $\lambda_{3} = d_{j,\mathrm{main}}^{\mathrm{In},\max} / d_{j,\mathrm{main}}^{\max}$. $v_{j}$ and $w_{j}$ are the free flow speed and congestion wave speed of cell $j$, and $d_{j,\mathrm{main}}^{\max}$ is the maximum departure flow of cell $j$; $v_{j}^{\mathrm{In}}$, $w_{j}^{\mathrm{In}}$, and $d_{j,\mathrm{main}}^{\mathrm{In},\max}$ are these three variables during the incident. $\rho_{j,C}$ and $\rho_{j,C}^{\mathrm{In}}$ are the critical densities for normal and incident situations. $\rho_{j,J}$ and $\rho_{j,J}^{\mathrm{In}}$ are the jam densities for normal and incident situations.
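To illustrate how the three parameters reshape the flow-density relationship, the sketch below assumes the piecewise-linear (trapezoidal) fundamental diagram commonly used with cell transmission models, in which flow is the minimum of a free-flow branch, a capacity limit, and a congested branch. The function names, the units (km/h, veh/km, veh/h), and all numerical values are hypothetical and are not taken from the paper; the incident jam density is passed in directly rather than derived.

```python
def flow(density, v, w, d_max, rho_jam):
    """Trapezoidal fundamental diagram: flow (veh/h) at a density (veh/km)."""
    free_flow_branch = v * density
    congested_branch = w * (rho_jam - density)
    return max(0.0, min(free_flow_branch, d_max, congested_branch))


def incident_flow(density, v, w, d_max, rho_jam_in, lam1, lam2, lam3):
    """Same diagram with speeds and capacity scaled by lambda_1..lambda_3."""
    return flow(density, lam1 * v, lam2 * w, lam3 * d_max, rho_jam_in)


# Hypothetical cell: v = 100 km/h, w = 20 km/h, capacity 6000 veh/h.
normal_q = flow(20.0, v=100.0, w=20.0, d_max=6000.0, rho_jam=400.0)
incident_q = incident_flow(20.0, v=100.0, w=20.0, d_max=6000.0,
                           rho_jam_in=300.0, lam1=0.8, lam2=0.5, lam3=0.5)
incident_q_dense = incident_flow(100.0, v=100.0, w=20.0, d_max=6000.0,
                                 rho_jam_in=300.0, lam1=0.8, lam2=0.5,
                                 lam3=0.5)
```

At the same density, the incident case yields a lower flow than the normal case in this example, and at higher density the congested branch becomes the binding limit.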


4.2. Modified ACTM. Given the three incident-related parameters, the traffic dynamics in each cell can be derived from the fundamental diagram illustrated in Figure 3(b) and represented by the following equations.

Departure rates of the mainline and on-ramp:

\[
\begin{aligned}
d_{j,\mathrm{main}}^{k} &= \min\left\{ \lambda_{1} \cdot \frac{v_{j}}{l_{j}} \cdot \left( q_{j,\mathrm{main}}^{k} + \theta_{j} \cdot d_{j,\mathrm{on}}^{k} \cdot \Delta t - d_{j,\mathrm{off}}^{k} \cdot \Delta t \right),\ \lambda_{2} \cdot \frac{w_{j-1}}{l_{j}} \cdot \left( q_{j-1,\mathrm{main}}^{\max} - q_{j-1,\mathrm{main}}^{k} - \theta_{j} \cdot d_{j-1,\mathrm{on}}^{k} \cdot \Delta t \right),\ \lambda_{3} \cdot d_{j,\mathrm{main}}^{k,\max} \right\}, \\
d_{j,\mathrm{on}}^{k} &=
\begin{cases}
\min\left\{ \dfrac{q_{j,\mathrm{main}}^{k} + a_{j,\mathrm{on}}^{k} \cdot \Delta t}{\Delta t},\ \eta_{j} \cdot \dfrac{q_{j,\mathrm{main}}^{\max} - q_{j,\mathrm{main}}^{k}}{\Delta t},\ \dfrac{c_{i}^{k}}{\Delta t} \right\} & \text{if } j \text{ is a metered on-ramp cell}, \\[2ex]
\min\left\{ \dfrac{q_{j,\mathrm{main}}^{k} + a_{j,\mathrm{on}}^{k} \cdot \Delta t}{\Delta t},\ \eta_{j} \cdot \dfrac{q_{j,\mathrm{main}}^{\max} - q_{j,\mathrm{main}}^{k}}{\Delta t} \right\} & \text{if } j \text{ is an unmetered on-ramp cell}.
\end{cases}
\end{aligned} \tag{5}
\]

Conservation of the mainline and on-ramp:

\[
\begin{aligned}
q_{j,\mathrm{main}}^{k+1} &= q_{j,\mathrm{main}}^{k} + \Delta t \cdot \left( a_{j,\mathrm{main}}^{k} + d_{j,\mathrm{on}}^{k} - d_{j,\mathrm{main}}^{k} - d_{j,\mathrm{off}}^{k} \right), \\
q_{j,\mathrm{on}}^{k+1} &= q_{j,\mathrm{on}}^{k} + \Delta t \cdot \left( a_{j,\mathrm{on}}^{k} - d_{j,\mathrm{on}}^{k} \right),
\end{aligned} \tag{6}
\]

where $a^{k}_{j,\mathrm{main}}$ and $d^{k}_{j,\mathrm{main}}$ are the mainline arrival and departure rates for cell $j$ at step $k$; $a^{k}_{j,\mathrm{on}}$ and $d^{k}_{j,\mathrm{on}}$ are the on-ramp arrival and departure rates in cell $j$ at step $k$; $d^{k}_{j,\mathrm{off}}$ is the off-ramp departure rate for cell $j$ at step $k$ (if cell $j$ is not an off-ramp cell, $d^{k}_{j,\mathrm{off}} = 0$); $q^{k}_{j,\mathrm{main}}$ represents the number of vehicles on the mainline of cell $j$ at step $k$; and $q^{\max}_{j,\mathrm{main}}$ is the maximum of this value, limited by the mainline space of cell $j$. Similarly, $q^{k}_{j,\mathrm{on}}$ and $q^{\max}_{j,\mathrm{on}}$ denote the current (at step $k$) and maximum number of vehicles in the on-ramp of cell $j$, respectively. $\Delta t$ (min) is the time duration between two consecutive time steps; $c^{k}_{i}$ is the metering rate for the on-ramp cell of the $i$th motorway section at step $k$; $\eta_j \in [0, 1]$ is the flow allocation parameter of cell $j$; and $\theta_j \in [0, 1]$ is the flow blending parameter of traffic flow from the on-ramp to the mainline of cell $j$. The unit of all the arrival and departure rates is modified to veh/min in this study.
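The conservation update in (6) can be sketched in a few lines of Python. This is a minimal illustration only: the names `CellState` and `conserve` are hypothetical and do not come from the paper, and the departure rates are assumed to have been computed beforehand (e.g., by the ACTM flow equations).

```python
from dataclasses import dataclass


@dataclass
class CellState:
    q_main: float  # vehicles on the mainline of the cell
    q_on: float    # vehicles on the on-ramp of the cell


def conserve(cell, a_main, a_on, d_main, d_on, d_off, dt):
    """Apply one step of (6). Rates are in veh/min and dt is in min."""
    q_main_next = cell.q_main + dt * (a_main + d_on - d_main - d_off)
    q_on_next = cell.q_on + dt * (a_on - d_on)
    return CellState(q_main_next, q_on_next)


# Example: 0.5-minute step with illustrative (made-up) rates.
nxt = conserve(CellState(100.0, 20.0), a_main=60, a_on=10,
               d_main=55, d_on=8, d_off=5, dt=0.5)
print(nxt)  # CellState(q_main=104.0, q_on=21.0)
```

For a cell without an off-ramp, `d_off` is simply passed as 0, matching the convention stated above.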

For motorway section $i$ with $J$ cells, the number of vehicles in the mainline can be calculated by $q^{k}_{i,\mathrm{main}} = \sum_{j=1}^{J} q^{k}_{j,\mathrm{main}}$, while the number of vehicles in the on-ramp of motorway section $i$ is given by $q^{k}_{i,\mathrm{on}} = q^{k}_{j,\mathrm{on}}$. In this way, the maximum numbers of vehicles in the mainline and on-ramp of motorway section $i$ are given by $q^{\max}_{i,\mathrm{main}} = \sum_{j=1}^{J} q^{\max}_{j,\mathrm{main}}$ and $q^{\max}_{i,\mathrm{on}} = q^{\max}_{j,\mathrm{on}}$.

4.3. Estimation of Arrival and Departure Rates. Arrival rates of the boundary cells in each motorway section (such as $j + 2$, $j + 5$, and $j + 8$) and of all the on-ramps, as well as the departure rates of off-ramps, are inputs of the ACTM for each planning step between two real control steps. Considering the short duration of the planning process (10 steps), we assume that these rates remain stable during planning and estimate them directly from the recent flow data collected from detectors. The method described by Wang [16] is used here for the estimation; it simply averages the most recently observed data to obtain the predicted flow rates. In our model, we use the flow data collected from the last $N$ time steps ($N = 5$). Therefore, these three rates can be calculated by

\[
\begin{aligned}
a^{k,k+1}_{i,\mathrm{main}} = a^{k,k+1}_{j,\mathrm{main}} &= \frac{\sum_{n=0}^{N-1} a^{k-n}_{j,\mathrm{main}}}{N}, && \text{if } j \text{ is the boundary cell}, \\
a^{k,k+1}_{i,\mathrm{on}} = a^{k,k+1}_{j,\mathrm{on}} &= \frac{\sum_{n=0}^{N-1} a^{k-n}_{j,\mathrm{on}}}{N}, && \text{if } j \text{ is the on-ramp cell}, \\
d^{k,k+1}_{j,\mathrm{off}} &= \frac{\sum_{n=0}^{N-1} d^{k-n}_{j,\mathrm{off}}}{N}, && \text{if } j \text{ is the off-ramp cell},
\end{aligned}
\tag{7}
\]

where $a^{k,k+1}_{j,\mathrm{main}}$ and $a^{k,k+1}_{j,\mathrm{on}}$ are the estimated arrival rates of the mainline and on-ramp of cell $j$ for the planning steps between real steps $k$ and $k + 1$, and $d^{k,k+1}_{j,\mathrm{off}}$ is the estimated off-ramp departure rate of cell $j$. If cell $j$ is the boundary cell of motorway section $i$, the arrival or departure rate of this cell is also the arrival or departure rate of motorway section $i$.
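The averaging in (7) amounts to a moving average over the last $N$ detector readings. A minimal Python sketch follows; the class name `RateEstimator` is hypothetical, and averaging over whatever is available when fewer than $N$ readings have been observed is an assumption made here, not something the paper specifies.

```python
from collections import deque


class RateEstimator:
    """Moving average of the most recent `window` observed rates (veh/min)."""

    def __init__(self, window=5):
        # deque with maxlen silently discards the oldest reading when full
        self.history = deque(maxlen=window)

    def observe(self, rate):
        self.history.append(rate)

    def estimate(self):
        if not self.history:
            raise ValueError("no observations yet")
        # Assumption: with fewer than `window` readings, average what exists.
        return sum(self.history) / len(self.history)


est = RateEstimator(window=5)
for rate in [10, 12, 14, 16, 18, 20]:
    est.observe(rate)
print(est.estimate())  # 16.0, the mean of the last five readings
```

One estimator instance would be kept per boundary cell, on-ramp, and off-ramp, and its estimate held constant over the planning steps between two control steps.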

5. Definition of RL Elements

Apart from the architecture and models defined in Section 3, three basic elements (environment state, control action, and reward function) must be specified to form an RL problem. This section details these three elements and the relevant algorithm.

5.1. Environment State. Environment states of a motorway section are composed of mainline states and on-ramp states. The same method as in [27, 29] is used here to obtain the state space. For the mainline of motorway section $i$, the number of vehicles ranges from 0 to the maximum number $q^{\max}_{i,\mathrm{main}}$, and this range is uniformly divided into $n_i$ intervals. Each interval represents a state of the mainline. Therefore, each mainline section can be represented by a state set $S_{i,\mathrm{main}}$ with $n_i$ states. Similarly, on-ramp traffic is represented by a state set $S_{i,\mathrm{on}}$ with $m_i$ states according to the maximum number of vehicles $q^{\max}_{i,\mathrm{on}}$. $n_i$ and $m_i$ should be adjusted for different motorway sections according to the section length. In this way, if motorway section $i$ is the critical section, the external traffic environment is represented by

\[
S_i = S_{i,\mathrm{main}} \times S_{i,\mathrm{on}}, \quad s^{k}_{i} \in S_i,
\tag{8}
\]

which contains $n_i \cdot m_i$ states. At each time step, a state $s^{k}_{i}$ is selected from $S_i$ as the environment state. If motorway section $i$ is a normal section, the state sets of its neighbouring agent should be incorporated. Thus, the traffic state is represented by

\[
S_i = S_{i,\mathrm{main}} \times S_{i,\mathrm{on}} \times S_{i-1,\mathrm{main}} \times S_{i-1,\mathrm{on}}, \quad s^{k}_{i} \in S_i,
\tag{9}
\]

which contains $n_i \cdot m_i \cdot n_{i-1} \cdot m_{i-1}$ states.
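The uniform discretisation behind (8) can be illustrated with a short Python sketch. The function names and the interval counts used in the example ($n_i = 10$, $m_i = 4$) are assumptions for illustration; the paper does not report the values of $n_i$ and $m_i$ in this section.

```python
def interval_index(q, q_max, n_intervals):
    """Map a vehicle count in [0, q_max] to one of n uniform intervals (0-based)."""
    if not 0 <= q <= q_max:
        raise ValueError("vehicle count outside [0, q_max]")
    idx = int(q * n_intervals / q_max)
    return min(idx, n_intervals - 1)  # q == q_max belongs to the last interval


def section_state(q_main, q_on, q_max_main, q_max_on, n, m):
    """Index of the joint state (mainline interval, on-ramp interval) as in (8)."""
    return interval_index(q_main, q_max_main, n) * m + interval_index(q_on, q_max_on, m)


# Illustrative values: 930 of 1860 mainline vehicles, 54 of 108 on-ramp vehicles.
print(section_state(930, 54, 1860, 108, n=10, m=4))  # 22
```

For a normal section, the state in (9) could be indexed in the same way by combining this index with the index of the neighbouring section, giving $n_i \cdot m_i \cdot n_{i-1} \cdot m_{i-1}$ distinct values.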

5.2. Control Action. In a ramp control problem, the aim of the control action is to regulate the number of vehicles entering the mainline in each control step. Similar to [29], we adopt flow control as the control action, which can be represented by an action set $C = \{4, 6, 8, 10, 12, 14, 16, 18, 20\}$ with 9 flow rates between the minimum (4 veh/min) and maximum (20 veh/min) values.

Exploitation and exploration are two basic behaviours of the RL agent. Exploitation means that the agent takes the control action that has yielded the most reward in its previous experience. Exploration, by contrast, means that the agent tries other actions whose estimated rewards are currently lower. In order to balance these two behaviours, we use the $\varepsilon$-greedy policy to select control actions [30]. Specifically, this policy takes a random action with probability $\varepsilon$ and chooses the greedy action (with the maximum $Q$ value) with probability $1 - \varepsilon$ for each control step.

The action selection probability can be formally expressed as

\[
p\left(c^{k}_{i} \mid s^{k}_{i}\right) =
\begin{cases}
1 - \varepsilon, & \text{if } c^{k}_{i} = \arg\max_{c^{k}_{i}} \left(Q^{k-1}\left(s^{k}_{i}, c^{k}_{i}\right)\right), \\
\varepsilon, & \text{otherwise}.
\end{cases}
\tag{10}
\]
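A minimal Python sketch of the $\varepsilon$-greedy selection over the action set $C$ is given below. The dictionary-based $Q$ table, the default value of 0 for unvisited pairs, and the function name are assumptions made for illustration. Note that in this common implementation the random draw is uniform over all of $C$, so it may also return the greedy action.

```python
import random

ACTIONS = [4, 6, 8, 10, 12, 14, 16, 18, 20]  # action set C, in veh/min


def epsilon_greedy(q_values, state, epsilon=0.1, rng=random):
    """Pick a random action with probability epsilon, else the greedy action.

    q_values maps (state, action) pairs to Q estimates; unvisited pairs
    are treated as 0.0 (an assumption of this sketch).
    """
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda c: q_values.get((state, c), 0.0))


q = {(3, 12): 1.0}                        # only action 12 has a positive estimate
print(epsilon_greedy(q, 3, epsilon=0.0))  # 12, since epsilon = 0 is purely greedy
```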

5.3. Reward Function. The reward function is used to calculate the immediate reward after executing a specific action at each time step, which guides the agent towards its objective. Considering a common objective of traffic control systems (i.e., minimising total travel time), we define our reward to guide the agent to minimise total time spent (TTS) through the learning process.

TTS is defined as the total time spent by vehicles in the network during a period of time. For our case, TTS can be obtained from the following equation:

\[
\mathrm{TTS} = \Delta t \cdot \sum_{k=0}^{K} \left(q^{k}_{i,\mathrm{main}} + q^{k}_{i,\mathrm{on}}\right).
\tag{11}
\]

In the above equation, $\Delta t$ is a fixed value; therefore, minimising TTS is equivalent to minimising the number of vehicles on the network, $\sum_{k=0}^{K} (q^{k}_{i,\mathrm{main}} + q^{k}_{i,\mathrm{on}})$. To minimise this value, the reward function defined here is composed of two negative rewards that penalise vehicles on the mainline and on the on-ramp. The formal reward function at step $k$ is defined according to two situations.

(1) Motorway Section $i$ Is the Critical Section. Consider

\[
R^{k}_{i}\left(s^{k}_{i}, c^{k}_{i}\right) =
\begin{cases}
-\dfrac{q^{k}_{i,\mathrm{main}} + q^{k}_{i,\mathrm{on}}}{q^{\max}_{i,\mathrm{main}} + q^{\max}_{i,\mathrm{on}}}, & \text{if } q^{k}_{i,\mathrm{main}} < q^{\max}_{i,\mathrm{main}},\ q^{k}_{i,\mathrm{on}} < q^{\max}_{i,\mathrm{on}}, \\
-1, & \text{otherwise},
\end{cases}
\tag{12}
\]

where $R^{k}_{i}(s^{k}_{i}, c^{k}_{i})$ is the immediate reward for agent $i$ in state $s^{k}_{i}$ when executing action $c^{k}_{i}$ at control step $k$. $q^{\max}_{i,\mathrm{main}}$ and $q^{\max}_{i,\mathrm{on}}$ are used to normalise the number of vehicles on the mainline and on-ramp, which guarantees that $R^{k}_{i}(s^{k}_{i}, c^{k}_{i}) \in [-1, 0]$.

(2) Motorway Section $i$ Is Not the Critical Section. Here, a new negative reward is introduced to maintain the system equity, that is, to make sure that the on-ramp queues and related travel times at different on-ramps are close to each other:

\[
R^{k}_{i}\left(s^{k}_{i}, c^{k}_{i-1}, c^{k}_{i}\right) =
\begin{cases}
-\dfrac{q^{k}_{i,\mathrm{main}} + q^{k}_{i,\mathrm{on}}}{q^{\max}_{i,\mathrm{main}} + q^{\max}_{i,\mathrm{on}}} - \dfrac{\left|q^{k}_{i,\mathrm{on}} - q^{k}_{i-1,\mathrm{on}}\right|}{\max\left(q^{\max}_{i,\mathrm{on}}, q^{\max}_{i-1,\mathrm{on}}\right)}, & \text{if } q^{k}_{i,\mathrm{main}} < q^{\max}_{i,\mathrm{main}},\ q^{k}_{i,\mathrm{on}} < q^{\max}_{i,\mathrm{on}}, \\
-2, & \text{otherwise}.
\end{cases}
\tag{13}
\]


For each agent $i$ and episode do
    $L \leftarrow \mathrm{CEIL}(\mathrm{IncidentDuration} / \Delta t)$
    IF $i$ is the critical section
        Initialise $R^{0}_{i}(s_i, c_i)$, $Q^{0}_{i}(s_i, c_i)$
    ELSE
        Initialise $R^{0}_{i}(s_i, c_{i-1}, c_i)$, $Q^{0}_{i}(s_i, c_{i-1}, c_i)$, $P^{0}_{i}(s_i, c_{i-1})$
    For each control step $k \in K$ do (Loop I)
        (i) get detected data from each cell $j$: $a^{k}_{j,\mathrm{main}}$, $d^{k}_{j,\mathrm{main}}$, $d^{k}_{j,\mathrm{off}}$, $a^{k}_{j,\mathrm{on}}$, $d^{k}_{j,\mathrm{on}}$
        (ii) get state $s^{k}_{i}$ through (8) and (9)
        (iii) get action $c^{k}_{i}$ by $\varepsilon$-greedy policy (10)
        (iv) get $q^{k}_{j,\mathrm{main}}$, $q^{k}_{j,\mathrm{on}}$ through (6) and do $q^{k}_{i,\mathrm{main}} \leftarrow \sum_{j=1}^{J} q^{k}_{j,\mathrm{main}}$, $q^{k}_{i,\mathrm{on}} \leftarrow q^{k}_{j,\mathrm{on}}$
        IF $i$ is the critical section
            update $R^{k}_{i}(s^{k}_{i}, c^{k}_{i})$, $Q^{k}_{i}(s^{k}_{i}, c^{k}_{i})$ through (2) and (12)
        ELSE
            update $R^{k}_{i}(s^{k}_{i}, c^{k}_{i-1}, c^{k}_{i})$, $Q^{k}_{i}(s^{k}_{i}, c^{k}_{i-1}, c^{k}_{i})$, $p(c^{k}_{i-1} \mid s^{k}_{i})$ through (4) and (13)
        IF $s^{k}_{i} = s^{\mathrm{initial}}$ and $k + 1 \geq L$
            end the algorithm
        ELSE
            get $a^{k,k+1}_{i,\mathrm{main}}$, $a^{k,k+1}_{i,\mathrm{on}}$, $d^{k,k+1}_{j,\mathrm{off}}$ by (7) and do $l \leftarrow k$, $s^{l}_{i} \leftarrow s^{k}_{i}$, $q^{l}_{j,\mathrm{main}} \leftarrow q^{k}_{j,\mathrm{main}}$, $q^{l}_{j,\mathrm{on}} \leftarrow q^{k}_{j,\mathrm{on}}$, and start Loop II
        For each planning step $l \in L$ do (Loop II)
            (i) generate flow rates for each cell $j$: $d^{l}_{j,\mathrm{main}}$, $d^{l}_{j,\mathrm{on}}$ through (5)
            (ii) get the state $s^{l}_{i}$
            (iii) get $q^{l}_{j,\mathrm{main}}$, $q^{l}_{j,\mathrm{on}}$ and do $q^{l}_{i,\mathrm{main}} \leftarrow \sum_{j=1}^{J} q^{l}_{j,\mathrm{main}}$, $q^{l}_{i,\mathrm{on}} \leftarrow q^{l}_{j,\mathrm{on}}$
            (iv) get action $c^{l}_{i}$ by $\varepsilon$-greedy policy
            IF $i$ is the critical section
                update $R^{l}_{i}(s^{l}_{i}, c^{l}_{i})$, $Q^{l}_{i}(s^{l}_{i}, c^{l}_{i})$
            ELSE
                update $R^{l}_{i}(s^{l}_{i}, c^{l}_{i-1}, c^{l}_{i})$, $Q^{l}_{i}(s^{l}_{i}, c^{l}_{i-1}, c^{l}_{i})$, $p(c^{k}_{i-1} \mid s^{k}_{i})$
            IF ($l = k + 9$) or ($s^{l}_{i} = s^{\mathrm{initial}}$ and $l + 1 \geq L$)
                go back to Loop I
            ELSE
                repeat Loop II
        EndFor
    EndFor
EndFor

Algorithm 1: Algorithm for Dyna-MARL.

Compared to (12), a new term $|q^{k}_{i,\mathrm{on}} - q^{k}_{i-1,\mathrm{on}}| / \max(q^{\max}_{i,\mathrm{on}}, q^{\max}_{i-1,\mathrm{on}})$ is added to (13), which is a penalty for the on-ramp queue difference between motorway sections $i$ and $i - 1$. As two adjacent agents cooperate in this situation, $R^{k}_{i}(s^{k}_{i}, c^{k}_{i-1}, c^{k}_{i})$ depends on two control actions, $c^{k}_{i-1}$ and $c^{k}_{i}$. $\max(\cdot, \cdot)$ returns the larger of its two arguments and is used for normalisation.
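The two reward definitions (12) and (13) translate directly into code. The following Python sketch is illustrative only; the function names are hypothetical, and the example values simply reuse capacities of the kind reported later for the test segment.

```python
def reward_critical(q_main, q_on, q_max_main, q_max_on):
    """Reward (12) for the critical section; always lies in [-1, 0]."""
    if q_main < q_max_main and q_on < q_max_on:
        return -(q_main + q_on) / (q_max_main + q_max_on)
    return -1.0


def reward_normal(q_main, q_on, q_on_prev, q_max_main, q_max_on, q_max_on_prev):
    """Reward (13) for a normal section; adds the queue-difference penalty.

    q_on_prev and q_max_on_prev refer to the on-ramp of section i - 1.
    """
    if q_main < q_max_main and q_on < q_max_on:
        occupancy = (q_main + q_on) / (q_max_main + q_max_on)
        imbalance = abs(q_on - q_on_prev) / max(q_max_on, q_max_on_prev)
        return -occupancy - imbalance
    return -2.0


print(reward_critical(930, 54, 1860, 108))          # -0.5
print(reward_normal(1440, 45, 18, 2880, 90, 108))   # -0.75
```

In the second call the occupancy term contributes $-0.5$ and the queue difference of 27 vehicles, normalised by 108, contributes a further $-0.25$.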

5.4. Description of the Algorithm. Based on the Dyna-$Q$ architecture and the RL elements defined in the previous subsections, an algorithm named Dyna-MARL is developed and described in this subsection. Two main loops, corresponding to direct RL and planning as shown in Figure 1, are detailed in Dyna-MARL. Between two real control steps in Loop I, 10 planning steps are run in Loop II. The pseudocode of Dyna-MARL is given in Algorithm 1.

An episode in Dyna-MARL represents a control cycle, which starts from incident occurrence and terminates when the traffic state returns to the initial state $s^{\mathrm{initial}}$, that is, the traffic state before the incident occurrence. Incident duration is assumed to be known in advance.
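The two-loop structure (one real control step followed by 10 planning steps) can be sketched as a generic tabular Dyna-$Q$ loop. The Python code below is a simplified single-agent illustration under several assumptions: it uses the standard one-step $Q$-learning update in place of the paper's update equations, it covers only the critical-section case without a neighbour action, it omits the episode termination test, and `real_step` and `model_step` are hypothetical callables standing in for the detector-based observation and the ACTM prediction, respectively.

```python
import random
from collections import defaultdict


def q_update(Q, s, c, r, s_next, actions, alpha=0.2, gamma=0.8):
    """Standard one-step Q-learning update (an assumption of this sketch)."""
    best_next = max(Q[(s_next, a)] for a in actions)
    Q[(s, c)] += alpha * (r + gamma * best_next - Q[(s, c)])


def choose(Q, s, actions, epsilon, rng):
    """Epsilon-greedy action selection."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: Q[(s, a)])


def dyna_q_episode(real_step, model_step, s0, actions, n_control,
                   n_planning=10, epsilon=0.1, seed=0):
    """Run n_control real steps, each followed by n_planning simulated steps.

    real_step(s, c) and model_step(s, c) both return (reward, next_state).
    """
    rng = random.Random(seed)
    Q = defaultdict(float)
    s = s0
    for _ in range(n_control):                    # Loop I: direct RL
        c = choose(Q, s, actions, epsilon, rng)
        r, s_next = real_step(s, c)
        q_update(Q, s, c, r, s_next, actions)
        s_plan = s_next
        for _ in range(n_planning):               # Loop II: planning
            c_plan = choose(Q, s_plan, actions, epsilon, rng)
            r_plan, s_plan_next = model_step(s_plan, c_plan)
            q_update(Q, s_plan, c_plan, r_plan, s_plan_next, actions)
            s_plan = s_plan_next
        s = s_next
    return Q
```

The default learning rate and discount factor in `q_update` are illustrative choices. Because the planning loop reuses the same update rule on model-generated transitions, each real control step contributes 11 updates to the $Q$ table rather than one, which is the main practical benefit of the Dyna architecture.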

6. Case Study and Results

One of the metered motorway segments (southbound direction) of the M6 in the UK is chosen for the case study. This segment lies between junction 21A (J21A) and junction 25 (J25), with an approximate length of 12.4 km (see Figure 4). Taking the noncontrolled (NC) situation as the baseline, we designed a series of experiments to compare the proposed Dyna-MARL algorithm with Isolated RL ($Q$-learning without coordination). The experiments and relevant results are described as follows.

6.1. Partitions of the Test Segment. The test motorway segment, with a three-lane mainline, three metered on-ramps, and five off-ramps, is simulated by AIMSUN [39], which is a microscopic traffic simulation package. According to the detector locations and road layout, the whole segment is divided into three sections. Each section contains a metered on-ramp. Motorway section 3 is divided into 4 cells, and motorway sections 1 and 2 are both divided into 3 cells. The partitions of each section can be seen in Figure 5. According to the section length, the maximum numbers of vehicles in each mainline section and on-ramp are as follows: $q^{\max}_{1,\mathrm{main}} = 1860$, $q^{\max}_{2,\mathrm{main}} = 2880$, $q^{\max}_{3,\mathrm{main}} = 2880$, $q^{\max}_{1,\mathrm{on}} = 108$, $q^{\max}_{2,\mathrm{on}} = 90$, and $q^{\max}_{3,\mathrm{on}} = 120$.

Figure 4: Test motorway segment of M6 (junctions J21A to J25; origins O, O1 (from A579), O2 (from A580), and O3 (from A49); destinations D, D1 and D2 (to M62), D3 (to A579), D4 (to A580), and D5 (to A58)).

Figure 5: Partitions of test segment (sections 1 to 3 with section and cell boundaries, incident locations a, b, and c, road section lengths in km, and flow direction from O to D).

6.2. Real Data Source. Real detector data collected from 17 loop detectors located in the motorway segment (including both the mainline and the on- and off-ramps) are used for the case study; these data can be extracted from the Traffic Information System (HATRIS) [40]. The traffic count data are averaged from April 2012 to March 2013 with 15-minute intervals. Only working day data (from Monday to Friday) are used because of the dramatic reduction of traffic load at weekends. Some of the detector data collected from the mainline and the three on-ramps are presented in Figure 6, from which we can see that two peak periods, the AM peak period (around 07:00:00–09:00:00) and the PM peak period (around 16:00:00–18:00:00), exist during daily traffic operation.

In the test site, ramp metering only works at peak hours. Meanwhile, it is valuable to test the performance of the proposed algorithm in the high-demand situation: if it can work under a high traffic load, it should also be useful for common situations. Therefore, the AM peak period with heavy traffic load is considered for the case study. Specifically, we use the averaged traffic data during the AM peak period collected from TRADS to estimate the OD (origins and destinations) matrix for the simulation. A model proposed in [41] is adopted by AIMSUN to do the estimation, where the number of iterations is set to 1000 to reach convergence. Table 1 shows the OD matrix estimated from real traffic data.

Figure 6: Real averaged traffic data (15-minute traffic count versus time of day for sections 1 to 3 and on-ramps 1 to 3).

Table 1: OD matrix estimated.

Origins/destinations    D      D5    D4    D3    D2     D1     Totals
O                       2089   375   686   728   1169   771    5818
O3                      875    65    212   193   117    46     1507
O2                      886    0     0     61    315    216    1477
O1                      824    0     0     0     292    226    1343
Totals                  4675   440   898   981   1893   1258   10146

Table 2: Parameters for ACTM.

Parameter    $d^{\max}_{\mathrm{main}}$    $v$         $w$          $\theta$    $\eta$    $\lambda_1$    $\lambda_2$    $\lambda_3$
Value        6300 veh/h                    107 km/h    11.6 km/h    0.5         0.16      0.55           0.9            0.6

6.3. Incident Scenarios. Considering the difficulty of capturing real incident data, we simulate some incident scenarios in AIMSUN. To make each ramp meter work during the incident, the incident is located near the most downstream motorway section, that is, motorway section 1. Three incident scenarios A, B, and C are therefore designed, corresponding to three different incident locations a, b, and c (as illustrated in Figure 5), respectively.

The simulation experiment lasts for one and a half hours, from 07:00:00 to 08:30:00, during the AM peak period. After 30 minutes of normal operation (for warm-up), the incident is triggered at 07:30:00 and lasts for 30 minutes. In the preliminary experiments designed in this paper, an incident with one lane blocked is considered. The parameters introduced here can also be adjusted for multiple lane-blockage situations. The incident extent is 50 meters, which is assumed to be constant during the incident.

Learning-related parameters are set to typical values [30]; that is, $\alpha$ is 0.2, $\gamma$ is 0.8, and $\varepsilon$ is 0.1. Other parameters related to the ACTM are calibrated and summarised in Table 2. All the cells have the same $\theta$ and $\eta$.

6.4. Results. The comparison of Dyna-MARL, Isolated RL, and NC is conducted from three aspects: density evolution, some general indicators, and the system equity. The experimental results are described as follows.

(1) Density Evolution. We can see from Figure 7 that four dense areas exist during the traffic operation. Three of them, near the on-ramp entrances (motorway length around 0.5 km, 5 km, and 10 km), are caused by heavy traffic loads from the on-ramps. The dense area close to the segment end forms because of the incident.

In scenario A, the incident location is close to on-ramp 1 (O1). Without control, this incident leads to severe congestion, which blocks on-ramp 1 and propagates to motorway section 2 (around 9 km in Figure 7(a)). Under this scenario, Isolated RL cannot alleviate incident-induced congestion effectively (see Figure 7(b)). At the beginning of congestion formation, without coordination, only the nearest ramp controller reacts to the congestion. Because of the space limit of the on-ramp, one ramp controller is insufficient to dissolve this congestion, which still propagates to motorway section 2. Dyna-MARL, on the other hand, coordinates all three ramp controllers and makes full use of the storage space of the three on-ramps to deal with incident-induced congestion. In this way, mainline congestion can be restricted to a smaller area and does not propagate to motorway section 2 (see Figure 7(c)).

For scenarios B and C, the incidents are near the motorway end and far from on-ramp 1. Because on-ramp 1 is not blocked, the incidents do not lead to severe congestion. Under such circumstances, both Isolated RL and Dyna-MARL work well in easing congestion in the mainline. As shown in Figures 7(e)–7(i), compared with the NC situation, both Isolated RL and Dyna-MARL can restrict the congestion to a small range near the on-ramp entrances.

(2) General Indicators. In this comparison, some general indicators, including total travel time (to be reduced), total throughput (to be improved), and total CO$_2$ emission (to be reduced), are used to show how the proposed system can benefit road users. These indicators are widely used in the transport community to test the performance of newly developed traffic control systems.

As shown in Figure 8(a), compared with the NC situation, both Isolated RL and Dyna-MARL can reduce the total travel time of road users in all three scenarios. Specifically, Isolated RL decreases total travel time by up to 6.2%, while Dyna-MARL achieves a maximum reduction of 12.2% (see Figure 8(d)). The comparison of total throughput is presented in Figure 8(b). Dyna-MARL can improve the total throughput by up to 2.3% (see Figure 8(d)), which outperforms Isolated RL in all three scenarios. In scenario B, Isolated RL even fails to improve the total throughput. For the comparison of total CO$_2$ emission (shown in Figure 8(c)), both Isolated RL and Dyna-MARL achieve their best performance in scenario B, with reductions of 4.7% and 4.6%, respectively. In scenarios A and C, Dyna-MARL performs much better than Isolated RL.

Through the above comparison, we can see that Dyna-MARL outperforms Isolated RL for almost all the scenarios and indicators.

Figure 7: Density profiles (motorway density in veh/km/lane over motorway length in km and absolute time in min) for (a) NC in scenario A, (b) Isolated RL in scenario A, (c) Dyna-MARL in scenario A, (d) NC in scenario B, (e) Isolated RL in scenario B, (f) Dyna-MARL in scenario B, (g) NC in scenario C, (h) Isolated RL in scenario C, and (i) Dyna-MARL in scenario C.

(3) System Equity. Although the general indicators presented in comparison (2) are effective for testing the performance of different systems, they cannot measure system equity, which is also an important aspect of system performance. In this paper, we only consider spatial equity, which is defined as a measurement of the equity of user delays on different on-ramps [42]. In this study, we assume that road users from all three on-ramps have the same importance. If users from different on-ramps experience similar travel times, the control system is defined as an equitable system; conversely, a large queue difference leads to a highly inequitable system. In [43], the variance of travel time on different on-ramps is used as an indicator to measure system equity. Similar to [43], for the sake of comparison, the standard deviation is considered in our case. This indicator is defined as

\[
\mathrm{SD}(k) = \sqrt{\frac{\sum_{i=1}^{n} \left[\bar{t}^{k} - t^{k}_{i}\right]^{2}}{n}},
\tag{14}
\]

where $\mathrm{SD}(k)$ is the standard deviation of the travel time of different on-ramps at time step $k$, $t^{k}_{i}$ is the estimated total travel time of on-ramp $i$ at step $k$, and $\bar{t}^{k}$ is the averaged total travel time of the $n$ on-ramps at step $k$.
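The indicator in (14) is the population standard deviation (dividing by $n$ rather than $n - 1$) of the on-ramp travel times at a given step. A minimal Python sketch, with a hypothetical function name and made-up travel times, is as follows.

```python
import math


def travel_time_sd(times):
    """Population standard deviation of on-ramp travel times, as in (14)."""
    n = len(times)
    if n == 0:
        raise ValueError("at least one on-ramp travel time is required")
    mean = sum(times) / n
    return math.sqrt(sum((mean - t) ** 2 for t in times) / n)


# Three on-ramps with illustrative total travel times (h) at one time step.
print(travel_time_sd([3.0, 3.0, 3.0]))  # 0.0, a perfectly equitable system
print(travel_time_sd([2.0, 4.0, 6.0]))  # about 1.633
```

The same value is returned by `statistics.pstdev` from the Python standard library, which could be used instead.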

The results of the system equity comparison can be seen in Figure 9. For the NC situation, good equity can be maintained in scenarios B and C because entering vehicles are not restricted (as shown in Figures 9(b) and 9(c)). However, when one of the on-ramp entrances is blocked by the congestion in scenario A, a long queue forms and leads to imbalance and resultant inequity for users on different on-ramps (see Figure 9(a)). For the controlled cases, Isolated RL performs poorly in all scenarios. This is because the ramp controller near the congestion takes much more restrictive measures on the controlled traffic than the other controllers do. Because of the coordination strategy, Dyna-MARL outperforms Isolated RL in maintaining system equity in all scenarios, especially during the incident (from 07:30:00 to 08:00:00).

Figure 8: Comparison of general measures for different scenarios (A, B, and C) under NC, Isolated RL, and Dyna-MARL: (a) total travel time (h), (b) total throughput (veh), (c) total CO$_2$ emission ($\times 10^4$ kg), and (d) reduction from NC (%) in total throughput, total travel time, and total CO$_2$ emission.

7. Conclusions and Future Work

A Dyna-$Q$ based multiagent reinforcement learning method, referred to as Dyna-MARL, for motorway ramp control has been developed in this paper. Dyna-MARL is compared with Isolated RL ($Q$-learning without coordination) and the noncontrolled situation in a simulation environment. Real traffic data collected from a metered motorway segment in the UK are used to build the simulation.

Through a series of simulation-based experiments, we can conclude the following: (1) Isolated RL can improve the motorway performance in terms of increasing total throughput and reducing total travel time and CO$_2$ emission, but this improvement comes at the expense of poor system equity across different on-ramps; (2) with a suitable coordination strategy, much higher system equity can be achieved by Dyna-MARL; (3) in addition to the system equity, Dyna-MARL outperforms Isolated RL in almost all scenarios regarding all indicators, which means that Dyna-MARL can deal with network-wide problems effectively.

Although the simulation tests have shown some positive results regarding the performance of Dyna-MARL, only a simplified incident scenario with fixed duration is considered in the current work. In practice, incident duration is highly unstable and affected by a number of factors, such as weather conditions, road conditions, and the arrival time of the incident management team. Therefore, incident duration should be treated as an uncertainty, which will be investigated in our future work.

Figure 9: Standard deviation (h) over time of day (07:00:00 to 08:30:00) for NC, Isolated RL, and Dyna-MARL in different scenarios: (a) scenario A, (b) scenario B, and (c) scenario C.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This paper is supported by the China Scholarship Council and the University of Leeds (CSC-University of Leeds scholarship) and partially supported by the National Natural Science Foundation of China (Grant nos. 91420203 and 61271376). The authors would like to thank the institutions that support this study.

References

[1] M. Papageorgiou and A. Kotsialos, "Freeway ramp metering: an overview," IEEE Transactions on Intelligent Transportation Systems, vol. 3, no. 4, pp. 271–281, 2002.

[2] A. Skabardonis, P. Varaiya, and K. F. Petty, "Measuring recurrent and nonrecurrent traffic congestion," Transportation Research Record, vol. 1856, pp. 118–124, 2003.

[3] M. Papageorgiou, H. Hadj-Salem, and J.-M. Blosseville, "ALINEA: a local feedback control law for on-ramp metering," Journal of the Transportation Research Board, vol. 1320, pp. 58–64, 1991.

[4] E. Smaragdis and M. Papageorgiou, "Series of new local ramp metering strategies," Transportation Research Record, vol. 1856, pp. 74–86, 2003.

[5] L. N. Jacobson, K. C. Henry, and O. Mehyar, "Real-time metering algorithm for centralized control," Transportation Research Record, vol. 1232, pp. 17–26, 1989.

[6] G. Paesani, J. Kerr, P. Perovich, and F. Khosravi, "System wide adaptive ramp metering (SWARM)," in Proceedings of the 7th ITS America Annual Meeting and Exposition: Merging the Transportation and Communications Revolutions, Washington, DC, USA, June 1997.

[7] R. Lau, Ramp Metering by Zone - The Minnesota Algorithm, Minnesota Department of Transportation, 1997.

[8] H. M. Zhang and W. W. Recker, "On optimal freeway ramp control policies for congested traffic corridors," Transportation Research Part B: Methodological, vol. 33, no. 6, pp. 417–436, 1999.

[9] A. Kotsialos, M. Papageorgiou, and F. Middelham, "Optimal coordinated ramp metering with advanced motorway optimal control," Transportation Research Record, no. 1748, pp. 55–65, 2001.

[10] A. Hegyi, B. De Schutter, and H. Hellendoorn, "Model predictive control for optimal coordination of ramp metering and variable speed limits," Transportation Research Part C: Emerging Technologies, vol. 13, no. 3, pp. 185–209, 2005.

[11] G. Gomes and R. Horowitz, "Optimal freeway ramp metering using the asymmetric cell transmission model," Transportation Research Part C: Emerging Technologies, vol. 14, no. 4, pp. 244–262, 2006.

[12] A. H. F. Chow and Y. Li, "Robust optimization of dynamic motorway traffic via ramp metering," IEEE Transactions on Intelligent Transportation Systems, vol. 15, no. 3, pp. 1374–1380, 2014.

[13] R. W. Hall, "Non-recurrent congestion: how big is the problem? Are traveler information systems the solution?" Transportation Research Part C, vol. 1, no. 1, pp. 89–103, 1993.

[14] P. Prevedouros, B. Halkias, K. Papandreou, and P. Kopelias, "Freeway incidents in the United States, United Kingdom, and Attica Tollway, Greece: characteristics, available capacity, and models," Transportation Research Record, vol. 2047, pp. 57–65, 2008.

[15] T. L. Greenlee and H. J. Payne, "Freeway ramp metering strategies for responding to incidents," in Proceedings of the IEEE Conference on Decision and Control including the 16th Symposium on Adaptive Processes and a Special Symposium on Fuzzy Set Theory and Applications, pp. 987–992, New Orleans, LA, USA, December 1977.

[16] M. H. Wang, Optimal ramp metering policies for nonrecurring congestion with uncertain incident duration [Ph.D. thesis], Purdue University, West Lafayette, Ind, USA, 1994.

[17] J.-B. Sheu, "Stochastic modeling of the dynamics of incident-induced lane traffic states for incident-responsive local ramp control," Physica A: Statistical Mechanics and its Applications, vol. 386, no. 1, pp. 365–380, 2007.

[18] J.-B. Sheu and M.-S. Chang, "Stochastic optimal-control approach to automatic incident-responsive coordinated ramp control," IEEE Transactions on Intelligent Transportation Systems, vol. 8, no. 2, pp. 359–367, 2007.

[19] C. Jacob and B. Abdulhai, "Machine learning for multi-jurisdictional optimal traffic corridor control," Transportation Research Part A: Policy and Practice, vol. 44, no. 2, pp. 53–64, 2010.

[20] M. Davarynejad, A. Hegyi, J. Vrancken, and J. van den Berg, "Motorway ramp-metering control with queuing consideration using Q-learning," in Proceedings of the 14th International IEEE Conference on Intelligent Transportation Systems (ITSC '11), pp. 1652–1658, IEEE, Washington, DC, USA, October 2011.

[21] K. Rezaee, B. Abdulhai, and H. Abdelgawad, "Application of reinforcement learning with continuous state space to ramp metering in real-world conditions," in Proceedings of the 15th International IEEE Conference on Intelligent Transportation Systems (ITSC '12), pp. 1590–1595, IEEE, Anchorage, Alaska, USA, September 2012.

[22] C. Jacob and B. Abdulhai, "Automated adaptive traffic corridor control using reinforcement learning: approach and case studies," Transportation Research Record, vol. 1959, pp. 1–8, 2006.

[23] K. Rezaee, B. Abdulhai, and H. Abdelgawad, "Self-learning adaptive ramp metering: analysis of design parameters on a test case in Toronto, Canada," Transportation Research Record, vol. 2396, pp. 10–18, 2013.

[24] X.-J. Wang, X.-M. Xi, and G.-F. Gao, "Reinforcement learning ramp metering without complete information," Journal of Control Science and Engineering, vol. 2012, Article ID 208456, 8 pages, 2012.

[25] A. Fares and W. Gomaa, "Multi-agent reinforcement learning control for ramp metering," in Progress in Systems Engineering, vol. 330 of Advances in Intelligent Systems and Computing, pp. 167–173, Springer, Basel, Switzerland, 2015.

[26] K. Veljanovska, K. M. Bombol, and T. Maher, "Reinforcement learning technique in multiple motorway access control strategy design," PROMET-Traffic & Transportation, vol. 22, no. 2, pp. 117–123, 2010.

[27] C. Lu, H. Chen, and S. Grant-Muller, "An indirect reinforcement learning approach for ramp control under incident-induced congestion," in Proceedings of the 16th International IEEE Conference on Intelligent Transportation Systems (ITSC '13), pp. 979–984, IEEE, The Hague, The Netherlands, October 2013.

[28] L. Busoniu, R. Babuska, and B. De Schutter, “A comprehensive survey of multiagent reinforcement learning,” IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 38, no. 2, pp. 156–172, 2008.

[29] C. Lu, H. Chen, and S. Grant-Muller, “Indirect reinforcement learning for incident-responsive ramp control,” Procedia - Social and Behavioral Sciences, vol. 111, pp. 1112–1122, 2014.

[30] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, 1998.

[31] C. C. H. Watkins and P. Dayan, “Q-learning,” Machine Learning, vol. 8, no. 3-4, pp. 279–292, 1992.

[32] J. R. Kok and N. Vlassis, “Collaborative multiagent reinforcement learning by payoff propagation,” Journal of Machine Learning Research, vol. 7, pp. 1789–1828, 2006.

[33] C. Guestrin, M. G. Lagoudakis, and R. Parr, “Coordinated reinforcement learning,” in Proceedings of the 19th International Conference on Machine Learning, pp. 227–234, Sydney, Australia, July 2002.

[34] S. El-Tantawy, B. Abdulhai, and H. Abdelgawad, “Multiagent reinforcement learning for integrated network of adaptive traffic signal controllers (MARLIN-ATSC): methodology and large-scale application on downtown Toronto,” IEEE Transactions on Intelligent Transportation Systems, vol. 14, no. 3, pp. 1140–1150, 2013.

[35] L. P. Kaelbling, M. L. Littman, and A. W. Moore, “Reinforcement learning: a survey,” Journal of Artificial Intelligence Research, vol. 4, pp. 237–285, 1996.

[36] C. F. Daganzo, “The cell transmission model: a dynamic representation of highway traffic consistent with the hydrodynamic theory,” Transportation Research Part B: Methodological, vol. 28, no. 4, pp. 269–287, 1994.

[37] J. Haddad, M. Ramezani, and N. Geroliminis, “Cooperative traffic control of a mixed network with two urban regions and a freeway,” Transportation Research Part B: Methodological, vol. 54, pp. 17–36, 2013.

[38] H. Mongeot and J.-B. Lesort, “Analytical expressions of incident-induced flow dynamics perturbations: using macroscopic theory and extension of Lighthill-Whitham theory,” Transportation Research Record, vol. 1710, pp. 58–68, 2000.

[39] Transport Simulation Systems, Aimsun User’s Manual 6.1, TTS, Barcelona, Spain, 2010.

[40] Highways England, “Hatris Homepage,” 2013, https://www.hatris.co.uk.

[41] E. Cascetta, “Estimation of trip matrices from traffic counts and survey data: a generalized least squares estimator,” Transportation Research B, vol. 18, no. 4-5, pp. 289–299, 1984.

[42] L. Zhang and D. Levinson, “Balancing efficiency and equity of ramp meters,” Journal of Transportation Engineering, vol. 131, no. 6, pp. 477–481, 2005.

[43] A. Kotsialos and M. Papageorgiou, “Efficiency and equity properties of freeway network-wide ramp metering with AMOC,” Transportation Research Part C: Emerging Technologies, vol. 12, no. 6, pp. 401–420, 2004.




the outflow of motorway mainline approach a predetermined target value, which is usually the road capacity. Another group of systems focuses on formulating different control scenarios as optimisation problems and using optimal control techniques (e.g., model predictive control) to solve them. The purpose of these systems is to maximise or minimise an objective function, not to achieve some predefined target value. Examples of these systems can be found in [8–12], where macroscopic traffic flow models were combined with control systems to formulate optimal control problems.

Although the aforementioned systems have shown their effectiveness in different scenarios, recurrent congestion is still their main focus, and they do not include a component that can deal with nonrecurrent congestion. Unlike recurrent congestion caused by the increased traffic demand in peak hours, nonrecurrent congestion is mainly induced by incidents, and thus it is usually referred to as incident-induced congestion [2, 13]. Traffic incidents are nonrecurrent events, such as road accidents, vehicle breakdowns, and unexpected obstacles, that may block one or more lanes of the motorway mainline. The temporary lane blockage will interrupt the normal operation of traffic flow and lead to a rapid reduction of road capacity [14]. In this case, fixed-time and simple traffic-responsive systems, which are dependent on the information collected from daily traffic operation or a predefined target value, are not applicable. Therefore, more sophisticated systems that can respond to incidents are required. During the last decades, a series of such ramp control systems have been designed, most of which are based on optimisation techniques. For example, an optimal control structure using a simple macroscopic traffic flow model was proposed in [15] to deal with incident-induced congestion. A more complex system with consideration of dynamic incident duration was developed in [16], which can be solved by the linear programming technique. In the research presented in [17, 18], both lane-changing and queuing behaviour during the incident were incorporated into a modelling structure and solved by a stochastic optimal control system. Although these systems are based on different technologies, they all need a model to predict traffic conditions and use these predictions to accomplish the control process.

Model-based methods usually have poor adaptability when the mismatch between simulation models and the real controlled environment emerges [19–21]. To overcome this limitation, another optimisation-based method, reinforcement learning (RL), was introduced to the ramp control area. This method is based on the Markov decision process (MDP) and dynamic programming (DP), which can approximately solve the optimisation problem through continuous learning without any models. The first ramp control system using RL to solve incident-induced problems was developed in [19, 22]. The basic RL algorithm named Q-learning was adopted by this system to alleviate traffic congestion caused by incidents. After this work, several Q-learning systems considering both local (e.g., [23, 24]) and coordinated (e.g., [25, 26]) control problems were proposed. However, Q-learning can only learn from real interactions with the traffic operation and cannot make full use of historical data (or models). Because of this limitation, Q-learning usually has a low learning speed and needs a great number of trials to obtain the best control strategy in some complex scenarios such as incident-induced congestion [27]. This problem is even worse in the coordinated ramp control problems with exponentially increased state and action spaces, which will lead to the so-called “curse of dimensionality” [28]. One solution to speed up the learning process and deal with incidents efficiently has been proposed in our previous work [27, 29]. This system used the Dyna-Q architecture to combine model-free Q-learning with a model-based method and can be used to accomplish single-agent tasks.

In this paper, the previous single-agent system is extended to a multiagent case that can deal with a network-wide problem with multiple ramp controllers. We refer to this system as Dyna-MARL, which adopts a multiagent RL (MARL) strategy based on the Dyna-Q architecture. The rest of this paper is organised as follows. Section 2 briefly introduces the basic knowledge of RL, including single-agent and multiagent cases. The architecture of Dyna-MARL is described in Section 3. After that, Sections 4 and 5 give the detailed description of the models, elements, and related algorithm of Dyna-MARL. The simulation experiments and relevant results are discussed in Section 6. Section 7 finally gives some conclusions and introduces the future work.

2. Reinforcement Learning

RL is a subclass of machine learning. In the following subsections, two kinds of RL problems, namely, single-agent and multiagent RL, will be briefly introduced.

2.1. Single-Agent RL. The problem of single-agent RL is usually defined as an MDP that can be represented by a tuple $(S, P, R, C)$ [30]. $S$ is the state space used to describe the external environment. $C$ is the control action set containing executable actions of the agent. $P$ is the state transition probability. For a state pair $s, s' \in S$, $P_c(s, s')$ represents the probability of reaching state $s'$ after executing action $c$ at state $s$. $R : S \times C \to \mathbb{R}$ is the reward function. $R(s, c)$ denotes the immediate reward after taking action $c$ at state $s$. Based on these definitions, the Q value is defined for each state-action pair $(s, c)$ and shown below:

\[
Q^{\pi}(s, c) = E\left\{ \sum_{n=0}^{\infty} \gamma^{n} R\left(s^{k+n+1}, c^{k+n+1}\right) \,\middle|\, s^{k} = s,\ c^{k} = c \right\}, \tag{1}
\]

where $k$ is the time index and $n$ is the number of time steps. $s^{k} \in S$ and $c^{k} \in C$ are the environment state and executed control action at time step $k$, respectively. $\gamma \in [0, 1]$ is the discount factor, which indicates the importance of the following predicted rewards. For $\gamma^{n}$, $n$ is the power. $\pi$ is the policy corresponding to a sequence of actions. The optimal policy can be obtained by maximising the Q value.

The most widely used algorithm in the literature for estimating the maximum Q value is Q-learning [31]. By using the updating equation as given below, Q-learning can maximise the Q value for each state-action pair:

\[
Q^{k+1}(s^{k}, c^{k}) = Q^{k}(s^{k}, c^{k}) + \alpha \left[ R^{k}(s^{k}, c^{k}) + \gamma \max_{c^{k+1}} Q^{k}(s^{k+1}, c^{k+1}) - Q^{k}(s^{k}, c^{k}) \right], \tag{2}
\]

where $Q^{k+1}(s^{k}, c^{k})$ and $Q^{k}(s^{k}, c^{k})$ are the Q values for state-action pair $(s^{k}, c^{k})$ at the $(k+1)$th step and $k$th step, respectively, and $Q^{k}(s^{k+1}, c^{k+1})$ is the Q value for the state-action pair $(s^{k+1}, c^{k+1})$ at the $k$th step. $\alpha \in [0, 1]$ is the learning rate. $\gamma$ and $\alpha$ can be regulated according to different problems.
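The update in (2) can be illustrated with a minimal tabular sketch in Python. The dictionary-based Q table, the function name `q_learning_update`, and the values of `alpha` and `gamma` are illustrative assumptions and are not taken from the paper.

```python
from collections import defaultdict


def q_learning_update(Q, s, c, reward, s_next, actions, alpha=0.1, gamma=0.9):
    """Apply one tabular update of the form of equation (2), in place.

    Q is a mapping from (state, action) pairs to Q values; unseen pairs
    default to 0.0 (an assumption of this sketch).
    """
    # Greedy value of the next state: max over c^{k+1} of Q^k(s^{k+1}, c^{k+1}).
    best_next = max(Q[(s_next, c_next)] for c_next in actions)
    # Temporal-difference error inside the square brackets of (2).
    td_error = reward + gamma * best_next - Q[(s, c)]
    Q[(s, c)] += alpha * td_error
    return Q[(s, c)]


Q = defaultdict(float)
actions = [4, 6, 8]  # hypothetical subset of metering rates
new_value = q_learning_update(Q, s=0, c=4, reward=1.0, s_next=1, actions=actions)
```

With an empty table, the first update moves the Q value a fraction `alpha` of the way towards the observed reward.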

2.2. Multiagent Scenarios. In multiagent scenarios, an MDP for the single-agent case can be extended to a stochastic game (SG), or Markov game, in which a group of agents try to obtain some equilibrium solutions through coordination or competition [28].

In the absence of competition, all agents involved in a game have a common goal to maximise the global Q value, which forms a coordinated MARL problem. In this case, the policy optimisation is determined by actions executed by all agents.

For solving a coordinated MARL problem, the update equation (2) for Q-learning can be easily extended to represent the global Q value update [28]:

\[
\begin{aligned}
Q^{k+1}(s^{k}, c_{1}^{k}, \ldots, c_{n}^{k}) = {} & Q^{k}(s^{k}, c_{1}^{k}, \ldots, c_{n}^{k}) + \alpha \Big[ R^{k}(s^{k}, c_{1}^{k}, \ldots, c_{n}^{k}) \\
& + \gamma \max_{c_{1}^{k+1}, \ldots, c_{n}^{k+1}} Q^{k}(s^{k+1}, c_{1}^{k+1}, \ldots, c_{n}^{k+1}) - Q^{k}(s^{k}, c_{1}^{k}, \ldots, c_{n}^{k}) \Big].
\end{aligned} \tag{3}
\]

The only difference from (2) is that $Q$ and $R$ in (3) relate to $n$ actions $c_{1}, \ldots, c_{n}$ executed by $n$ agents rather than to a single action $c$.

2.3. Solutions for Coordinated MARL. It can be seen from (3) that, as the number of agents grows, combinations of actions and the resultant computational complexity are increased exponentially, which may make the problem unsolvable within a required time limit [28]. Therefore, a commonly used method is to decompose the global Q value into several local Q values, each of which can be maximised by a few relevant agents rather than all agents [32]. Based on this distributed method, several strategies have been proposed. In [28], these strategies fall into three categories: coordination-based, coordination-free, and indirect coordination strategies.
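The exponential growth of joint actions can be made concrete with a short Python sketch. It assumes, purely for illustration, that every agent uses the nine metering rates listed as the action set in Section 5.2; the helper name `joint_action_count` is hypothetical.

```python
from itertools import product

# Nine flow rates in veh/min, as in the action set of Section 5.2.
ACTIONS = [4, 6, 8, 10, 12, 14, 16, 18, 20]


def joint_action_count(n_agents):
    """Number of joint actions when each of n_agents picks from ACTIONS."""
    return len(ACTIONS) ** n_agents


# Explicit enumeration for two agents; each element is a pair (c_1, c_2).
joint_actions_two_agents = list(product(ACTIONS, repeat=2))
growth = [joint_action_count(n) for n in range(1, 6)]
```

Under this assumption, the maximisation in (3) ranges over 81 joint actions for two agents and 59049 for five, which motivates decomposing the global Q value into local ones.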

Coordination-based strategies need local Q values to be updated according to actions executed by all relevant agents (named joint actions) at each time step [28]. The decision-making process of each agent is based on the information received from all other related agents with sufficient communication. This will complicate the problem. On the other hand, coordination-free (or independent) strategies, such as the distributed Q-learning algorithm, make each agent update the corresponding local Q values based on its own actions [33]. Therefore, each agent makes its decisions independently without increasing computational complexity. However, this computational efficiency is at the expense of nonguaranteed convergence [32]. Indirect coordination strategies try to find a balance between the above two methods. By applying indirect strategies, each agent can maintain models for its cooperative partners and update local Q values without knowing all the information of other agents at each step [28]. Based on high-quality models, this method can reduce the problem complexity and guarantee convergence with limited coordination.

3. Dyna-Q Based Indirect Coordination Strategy

Because of the benefits introduced in the above section, the indirect coordination strategy has been applied in [34] for solving urban traffic control problems. In their work, each agent maintains a model for estimating the action selection probability of its neighbours and uses this information to optimise control strategies. In this paper, we extend this method to motorway systems by applying the Dyna-Q architecture.

Under the Dyna-Q architecture, a modified macroscopic flow model named the asymmetric cell transmission model (ACTM) and the Q-learning algorithm are combined together to deal with coordinated MARL problems. In this section, the application of Dyna-Q will be introduced.

3.1. Dyna-Q Architecture. The Dyna-Q architecture is an extension of standard Q-learning that integrates planning, acting, and learning together [30]. Unlike Q-learning, which learns from the real experience without a model, Dyna-Q learns a model and uses this model to guide the agent [35]. After capturing the real experience, two loops run to learn optimal policies that can obtain the maximum Q value in the Dyna-Q architecture (see Figure 1).

In loop I, direct RL is the standard Q-learning process that can be used to interact with the real external environment. Loop II contains two main tasks: (1) model learning is used to improve the model accuracy through obtaining new knowledge from real experience; (2) planning is the same process as direct RL except that it uses the experience generated by a model. Acting is the action execution process.

Applying a model, the agent can predict reactions of its external environment and other agents before executing a specific action, which provides an opportunity for the agent to update the Q value before receiving the real feedback. Simultaneously, direct RL is running to update the Q value through the real interaction. Therefore, the optimal policy is learned through both real experience and predictions. By using this strategy, Dyna-Q can learn faster than Q-learning in many situations [30].

Figure 1: Dyna-Q architecture. (Components: value/policy, experience, and model, linked by acting, direct RL, model learning, and planning.)

Figure 2: System architecture. (Agent architecture of agent $i+1$: experience consists of (i) real traffic arrival and departure rates and (ii) information from agent $i$, namely, $c_{i}^{k}$, $q_{i,\text{on}}^{k}$, $d_{j,\text{main}}^{k}$, and $d_{j,\text{off}}^{k}$; the model consists of (i) the ACTM and estimated traffic arrival and departure rates of sections $i$ and $i+1$ and (ii) the action probability $p(c_{i}^{k} \mid s_{i+1}^{k})$; the Q value is $Q_{i+1}^{k+1}(s_{i+1}^{k}, c_{i}^{k}, c_{i+1}^{k})$. Agent coordination: agents $i$, $i+1$, and $i+2$ control motorway sections $i$, $i+1$, and $i+2$, which are divided into cells $j$ to $j+8$ with mainline, on-ramp, and off-ramp arrival rates $a$ and departure rates $d$.)

Although a model is maintained in the Dyna-Q architecture, the whole system is different from the model-based control method such as model predictive control (MPC). The model in the Dyna-Q architecture is a complementary component, which is used to speed up the learning process and simplify the coordination of agents. The optimal control actions are learnt from both real and simulated experience. Without models, the Dyna-Q architecture is equivalent to the Q-learning technique and can still work as a model-free system. MPC, on the other hand, is dependent on the model, which means it cannot work without models. Therefore, Dyna-Q can be considered as a combination of model-free and model-based methods [27].
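The two loops of Section 3.1 can be sketched in Python as a generic tabular Dyna-Q step. This is only an illustration under simplifying assumptions: the learned model is a lookup table of the last observed outcome for each state-action pair (in the paper the model is the ACTM together with an action probability model), and the function name, parameter values, and toy transition are hypothetical.

```python
import random
from collections import defaultdict


def dyna_q_step(Q, model, s, c, reward, s_next, actions,
                alpha=0.1, gamma=0.9, planning_steps=10, rng=random):
    """One real control step of tabular Dyna-Q: direct RL, model learning, planning."""

    def update(state, action, r, nxt):
        # Same Q-learning update as equation (2).
        best_next = max(Q[(nxt, a)] for a in actions)
        Q[(state, action)] += alpha * (r + gamma * best_next - Q[(state, action)])

    # Loop I: direct RL from the real experience.
    update(s, c, reward, s_next)
    # Loop II, task 1: model learning (here, simply remember the observed outcome).
    model[(s, c)] = (reward, s_next)
    # Loop II, task 2: planning with experience generated by the model.
    for _ in range(planning_steps):
        (ps, pc), (pr, pn) = rng.choice(list(model.items()))
        update(ps, pc, pr, pn)


Q = defaultdict(float)
model = {}
dyna_q_step(Q, model, s=0, c=4, reward=1.0, s_next=1, actions=[4, 6],
            rng=random.Random(0))
```

If `planning_steps` is set to 0, the function reduces to plain Q-learning, which mirrors the remark that the architecture can still work as a model-free system without models.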

3.2. System Architecture. Each agent in the motorway control system is designed on the basis of the Dyna-Q architecture and controls one prespecified motorway section.

A simplified motorway segment is shown in Figure 2 for analysis. This segment contains three motorway sections ($i$, $i+1$, $i+2$) with detectors located at the boundaries. Each motorway section is divided into a number of cells ($j, j+1, \ldots, j+8$) according to its layout and geometric features. Generally, three kinds of cells exist in the motorway: on-ramp cells that are linked with on-ramps ($j+2$, $j+5$, $j+8$), off-ramp cells linked with off-ramps ($j$, $j+3$, $j+6$), and normal cells ($j+1$, $j+4$, $j+7$). In this paper, we define that each motorway section can have at most one on-ramp cell.

The typical Dyna-Q architecture presented in Figure 2 is detailed for each agent here. Take agent $i+1$ for example: experience consists of traffic arrival and departure rates observed from the detectors of motorway section $i+1$ as well as the information received from agent $i$, which is applied to improve models. In the model component, two models are maintained. An asymmetric cell transmission model (ACTM) with estimated traffic arrival and departure rates is used to simulate the traffic flow dynamics in relevant motorway sections. A probability model of action selection of agent $i$ at the current state is updated for the further planning process.

Figure 3: Fundamental diagram during the incident. (a) Incident extent within the critical section. (b) Flow against density for normal and incident conditions, showing the maximum flows $d_{j,\text{main}}^{\max}$ and $d_{j,\text{main}}^{\text{In},\max}$, free flow speeds $v_{j}$ and $v_{j}^{\text{In}}$, congestion wave speeds $w_{j}$ and $w_{j}^{\text{In}}$, critical densities $\rho_{j,C}$ and $\rho_{j,C}^{\text{In}}$, and jam densities $\rho_{j,J}$ and $\rho_{j,J}^{\text{In}}$.

To reduce the complexity of MARL, as in many real applications, some conventions are used to restrict the action selection of an agent [28]. Specifically, in our design, each agent only communicates with its spatial neighbours. For instance, agent $i+1$ receives the control action and traffic information from agent $i$ and sends its own information to agent $i+2$. For the case shown in Figure 2, we assume motorway section $i$ is the critical section where an incident occurs. In this situation, agent $i$ plays a more important role than other agents in dealing with incidents. Agent $i$ can be considered as the chief controller that makes decisions according to its own knowledge about the traffic and incident situations. Other agents should regulate their control policies based on the reaction of agent $i$.

Therefore, two Q values are defined for two kinds of agents. If motorway section $i$ is the critical section, the Q value of agent $i$ is only related to its own state and action space and can be updated by (2).

If motorway section $i$ is a normal section without incidents, the Q value of agent $i$ can be calculated by

\[
\begin{aligned}
Q_{i}^{k+1}(s_{i}^{k}, c_{i-1}^{k}, c_{i}^{k}) = {} & Q_{i}^{k}(s_{i}^{k}, c_{i-1}^{k}, c_{i}^{k}) + \alpha \Big[ R_{i}^{k}(s_{i}^{k}, c_{i-1}^{k}, c_{i}^{k}) \\
& + \gamma \max_{c_{i}^{k+1}} \sum_{c_{i-1}^{k+1}} p(c_{i-1}^{k+1} \mid s_{i}^{k+1}) \cdot Q_{i}^{k}(s_{i}^{k+1}, c_{i-1}^{k+1}, c_{i}^{k+1}) - Q_{i}^{k}(s_{i}^{k}, c_{i-1}^{k}, c_{i}^{k}) \Big], \\
p(c_{i-1}^{k+1} \mid s_{i}^{k+1}) = {} & \frac{\operatorname{count}(s_{i}^{k+1}, c_{i-1}^{k+1})}{\sum_{c_{i-1} \in C_{i-1}} \operatorname{count}(s_{i}^{k+1}, c_{i-1})},
\end{aligned} \tag{4}
\]

where $R_{i}^{k}(s_{i}^{k}, c_{i-1}^{k}, c_{i}^{k})$ is the immediate reward obtained by agent $i$ at time step $k$ when $c_{i-1}^{k}$ and $c_{i}^{k}$ are the actions executed by agents $i-1$ and $i$. Similarly, $Q_{i}^{k+1}(s_{i}^{k}, c_{i-1}^{k}, c_{i}^{k})$ and $Q_{i}^{k}(s_{i}^{k}, c_{i-1}^{k}, c_{i}^{k})$ are the Q values for agent $i$ at step $k+1$ and step $k$, respectively. $C_{i-1}$ is the action set of agent $i-1$. $\operatorname{count}(s_{i}^{k+1}, c_{i-1}^{k+1})$ returns the number of visits for state-action pair $(s_{i}^{k+1}, c_{i-1}^{k+1})$. Thus, $p(c_{i-1}^{k+1} \mid s_{i}^{k+1})$ is the probability of agent $i-1$ selecting action $c_{i-1}^{k+1}$ at state $s_{i}^{k+1}$. Models and the related symbols shown in Figure 2 will be specified in the following section.
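A minimal Python sketch of the update in (4) is given below. It assumes dictionary-based tables for the Q values and visit counts, and it falls back to a uniform distribution when a state has never been visited; this fallback, the function names, and the numbers in the example are assumptions of the sketch rather than details given in the paper.

```python
from collections import defaultdict


def neighbour_action_prob(counts, s, neighbour_actions):
    """Empirical p(c_{i-1} | s_i) from visit counts, as in the second line of (4)."""
    total = sum(counts[(s, c)] for c in neighbour_actions)
    if total == 0:
        # Assumption of this sketch: uniform probabilities before any visit.
        return {c: 1.0 / len(neighbour_actions) for c in neighbour_actions}
    return {c: counts[(s, c)] / total for c in neighbour_actions}


def coordinated_update(Q, counts, s, c_prev, c_own, reward, s_next,
                       own_actions, neighbour_actions, alpha=0.1, gamma=0.9):
    """Update Q_i(s, c_{i-1}, c_i) with the expectation over the neighbour's next action."""
    p = neighbour_action_prob(counts, s_next, neighbour_actions)
    # max over own next action of the probability-weighted sum over neighbour actions.
    best_expected = max(
        sum(p[cn] * Q[(s_next, cn, co)] for cn in neighbour_actions)
        for co in own_actions
    )
    key = (s, c_prev, c_own)
    Q[key] += alpha * (reward + gamma * best_expected - Q[key])
    return Q[key]


Q = defaultdict(float)
counts = defaultdict(int)
counts[(1, 4)] = 3  # neighbour chose 4 veh/min three times when agent i was in state 1
counts[(1, 6)] = 1  # and 6 veh/min once
Q[(1, 4, 4)] = 2.0
Q[(1, 6, 4)] = 6.0
value = coordinated_update(Q, counts, s=0, c_prev=4, c_own=4, reward=1.0, s_next=1,
                           own_actions=[4, 6], neighbour_actions=[4, 6])
```

In this example the neighbour model gives probabilities 0.75 and 0.25, so the best expected next value is 3.0 and the updated Q value is 0.37. The caller is assumed to increment `counts` whenever a neighbour action is observed.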

4. Modified Asymmetric Cell Transmission Model

A first-order macroscopic traffic flow model named the asymmetric cell transmission model (ACTM) is applied as one of the models in the Dyna-Q architecture. This model is derived from the widely used cell transmission model (CTM) [36] and has been used for ramp control problems [11, 37]. In this paper, we modify the ACTM to incorporate the traffic dynamics under incident conditions.

4.1. Traffic Dynamics during the Incident. As shown in Figure 3(a), when an incident happens in the critical section, one or more lanes of the motorway will be blocked according to the incident extent. Because of the lane blockage, the incident may reduce the normal road capacity and spatial storage space, which will produce a new relationship between traffic flow and road density, that is, the fundamental diagram presented in Figure 3(b). As suggested by [38], additional parameters can be used to regulate the fundamental diagram for incident situations. We introduce three parameters ($\lambda_{1}, \lambda_{2}, \lambda_{3} \in [0, 1]$) to reflect these new dynamics. These three parameters are defined as $\lambda_{1} = v_{j}^{\text{In}} / v_{j}$, $\lambda_{2} = w_{j}^{\text{In}} / w_{j}$, and $\lambda_{3} = d_{j,\text{main}}^{\text{In},\max} / d_{j,\text{main}}^{\max}$. $v_{j}$ and $w_{j}$ are the free flow speed and congestion wave speed of cell $j$. $d_{j,\text{main}}^{\max}$ is the maximum departure flow of cell $j$. $v_{j}^{\text{In}}$, $w_{j}^{\text{In}}$, and $d_{j,\text{main}}^{\text{In},\max}$ are these three variables during the incident. $\rho_{j,C}$ and $\rho_{j,C}^{\text{In}}$ are the critical densities for normal and incident situations. $\rho_{j,J}$ and $\rho_{j,J}^{\text{In}}$ are the jam densities for normal and incident situations.

4.2. Modified ACTM. Given three incident-related parameters, the traffic dynamics in each cell can be derived from the fundamental diagram illustrated in Figure 3(b) and represented by the following equations.

Departure rates of the mainline and on-ramp:

\[
\begin{aligned}
d_{j,\text{main}}^{k} &= \min\Big\{ \lambda_{1} \cdot \frac{v_{j}}{l_{j}} \cdot \big(q_{j,\text{main}}^{k} + \theta_{j} \cdot d_{j,\text{on}}^{k} \cdot \Delta t - d_{j,\text{off}}^{k} \cdot \Delta t\big), \\
&\qquad\quad \lambda_{2} \cdot \frac{w_{j-1}}{l_{j}} \cdot \big(q_{j-1,\text{main}}^{\max} - q_{j-1,\text{main}}^{k} - \theta_{j} \cdot d_{j-1,\text{on}}^{k} \cdot \Delta t\big),\ \lambda_{3} \cdot d_{j,\text{main}}^{k,\max} \Big\}, \\
d_{j,\text{on}}^{k} &=
\begin{cases}
\min\left\{ \dfrac{q_{j,\text{main}}^{k} + a_{j,\text{on}}^{k} \cdot \Delta t}{\Delta t},\ \eta_{j} \cdot \dfrac{q_{j,\text{main}}^{\max} - q_{j,\text{main}}^{k}}{\Delta t},\ \dfrac{c_{i}^{k}}{\Delta t} \right\}, & \text{if } j \text{ is a metered on-ramp cell}, \\[2ex]
\min\left\{ \dfrac{q_{j,\text{main}}^{k} + a_{j,\text{on}}^{k} \cdot \Delta t}{\Delta t},\ \eta_{j} \cdot \dfrac{q_{j,\text{main}}^{\max} - q_{j,\text{main}}^{k}}{\Delta t} \right\}, & \text{if } j \text{ is an unmetered on-ramp cell}.
\end{cases}
\end{aligned} \tag{5}
\]

Conservation of the mainline and on-ramp:

\[
\begin{aligned}
q_{j,\text{main}}^{k+1} &= q_{j,\text{main}}^{k} + \Delta t \cdot \big(a_{j,\text{main}}^{k} + d_{j,\text{on}}^{k} - d_{j,\text{main}}^{k} - d_{j,\text{off}}^{k}\big), \\
q_{j,\text{on}}^{k+1} &= q_{j,\text{on}}^{k} + \Delta t \cdot \big(a_{j,\text{on}}^{k} - d_{j,\text{on}}^{k}\big),
\end{aligned} \tag{6}
\]

where $a_{j,\text{main}}^{k}$ and $d_{j,\text{main}}^{k}$ are the mainline arrival and departure rates for cell $j$ at step $k$. $a_{j,\text{on}}^{k}$ and $d_{j,\text{on}}^{k}$ are the on-ramp arrival and departure rates in cell $j$ at step $k$. $d_{j,\text{off}}^{k}$ is the off-ramp departure rate for cell $j$ at step $k$ (if cell $j$ is not an off-ramp cell, $d_{j,\text{off}}^{k} = 0$). $q_{j,\text{main}}^{k}$ represents the number of vehicles on the mainline of cell $j$ at step $k$. $q_{j,\text{main}}^{\max}$ is the maximum of this value, limited by the mainline space of cell $j$. Similarly, $q_{j,\text{on}}^{k}$ and $q_{j,\text{on}}^{\max}$ denote the current (at step $k$) and maximum number of vehicles in the on-ramp of cell $j$, respectively. $\Delta t$ (min) is the time duration between two consecutive time steps. $c_{i}^{k}$ is the metering rate for the on-ramp cell of the $i$th motorway section at step $k$. $\eta_{j} \in [0, 1]$ is the flow allocation parameter of cell $j$. $\theta_{j} \in [0, 1]$ is the flow blending parameter of traffic flow from the on-ramp to the mainline of cell $j$. The unit of all the arrival and departure rates is modified to veh/min in this study.
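The mainline part of (5) and (6) can be sketched in Python as follows. The function and argument names are hypothetical, the on-ramp departure rate is treated as a given input, and the numerical values are chosen only to make the arithmetic easy to follow; none of them come from the paper's calibration.

```python
def mainline_departure(q_main, d_on, d_off, q_prev, q_prev_max, d_on_prev,
                       v, w_prev, length, d_max, theta, dt,
                       lam1=1.0, lam2=1.0, lam3=1.0):
    """Mainline departure rate of cell j (veh/min): minimum of the three terms in (5).

    Arguments with the suffix _prev refer to the quantities indexed j-1 in (5).
    lam1, lam2, lam3 are the incident parameters; 1.0 means no incident.
    """
    term_speed = lam1 * v / length * (q_main + theta * d_on * dt - d_off * dt)
    term_space = lam2 * w_prev / length * (q_prev_max - q_prev - theta * d_on_prev * dt)
    term_capacity = lam3 * d_max
    return min(term_speed, term_space, term_capacity)


def mainline_conservation(q_main, a_main, d_on, d_main, d_off, dt):
    """Number of vehicles on the mainline of cell j at the next step, following (6)."""
    return q_main + dt * (a_main + d_on - d_main - d_off)


# Illustrative values only: speeds in km/min, length in km, rates in veh/min.
common = dict(q_main=20.0, d_on=4.0, d_off=0.0, q_prev=30.0, q_prev_max=100.0,
              d_on_prev=0.0, v=2.0, w_prev=0.5, length=1.0, d_max=60.0,
              theta=0.5, dt=1.0)
normal = mainline_departure(**common)
incident = mainline_departure(**common, lam3=0.5)
q_next = mainline_conservation(q_main=20.0, a_main=10.0, d_on=4.0,
                               d_main=incident, d_off=0.0, dt=1.0)
```

With these numbers the three terms are 44, 35, and 60 veh/min without an incident, so the departure rate is 35 veh/min; halving the capacity term with `lam3=0.5` makes capacity the binding constraint at 30 veh/min.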

For motorway section $i$ with $J$ cells, the number of vehicles in the mainline can be calculated by $q_{i,\text{main}}^{k} = \sum_{j=1}^{J} q_{j,\text{main}}^{k}$, while the number of vehicles in the on-ramp of motorway section $i$ is given by $q_{i,\text{on}}^{k} = q_{j,\text{on}}^{k}$, where $j$ is the on-ramp cell of the section. In this way, the maximum numbers of vehicles in the mainline and on-ramp of motorway section $i$ are given by $q_{i,\text{main}}^{\max} = \sum_{j=1}^{J} q_{j,\text{main}}^{\max}$ and $q_{i,\text{on}}^{\max} = q_{j,\text{on}}^{\max}$.

4.3. Estimation of Arrival and Departure Rates. Arrival rates of the boundary cells in each motorway section (such as $j+2$, $j+5$, and $j+8$) and all the on-ramps, as well as the departure rates of off-ramps, are inputs of the ACTM for each planning step between two real control steps. Considering the short time of the planning process (10 steps), we assume these rates can remain stable during the planning and are estimated directly from the recent flow data collected from detectors. The method described by Wang [16] is used here to do the estimation, which simply averages the most recently observed data to get the predicted flow rates. In our model, we use the flow data collected from the last $N$ time steps ($N = 5$). Therefore, these three rates can be calculated by

\[
\begin{aligned}
a_{i,\text{main}}^{k,k+1} = a_{j,\text{main}}^{k,k+1} &= \frac{\sum_{n=0}^{N-1} a_{j,\text{main}}^{k-n}}{N}, && \text{if } j \text{ is the boundary cell}, \\
a_{i,\text{on}}^{k,k+1} = a_{j,\text{on}}^{k,k+1} &= \frac{\sum_{n=0}^{N-1} a_{j,\text{on}}^{k-n}}{N}, && \text{if } j \text{ is the on-ramp cell}, \\
d_{j,\text{off}}^{k,k+1} &= \frac{\sum_{n=0}^{N-1} d_{j,\text{off}}^{k-n}}{N}, && \text{if } j \text{ is the off-ramp cell},
\end{aligned} \tag{7}
\]

where $a_{j,\text{main}}^{k,k+1}$ and $a_{j,\text{on}}^{k,k+1}$ are the estimated arrival rates of the mainline and on-ramp of cell $j$ for the planning steps between real steps $k$ and $k+1$. $d_{j,\text{off}}^{k,k+1}$ is the estimated off-ramp departure rate of cell $j$. If cell $j$ is the boundary cell of motorway section $i$, the arrival or departure rate of this cell is also the arrival or departure rate of motorway section $i$.
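The averaging in (7) amounts to a moving average over the last $N$ detector readings, which can be sketched in Python as below. The class name `RateEstimator` is hypothetical, and averaging over fewer than $N$ readings while the history is still filling up is an assumption of this sketch, not something stated in the paper.

```python
from collections import deque


class RateEstimator:
    """Average of the last N observed rates, used as the forecast during planning."""

    def __init__(self, window=5):
        # A bounded deque silently drops the oldest reading once N are stored.
        self.history = deque(maxlen=window)

    def observe(self, rate):
        """Record the rate (veh/min) measured at the latest real control step."""
        self.history.append(rate)

    def estimate(self):
        """Return the mean of the stored readings."""
        if not self.history:
            raise ValueError("no observations yet")
        return sum(self.history) / len(self.history)


estimator = RateEstimator(window=5)
for rate in [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]:
    estimator.observe(rate)
forecast = estimator.estimate()
```

After six readings only the last five (12 to 20 veh/min) remain in the window, so the forecast is 16 veh/min. One such estimator would be kept per boundary-cell arrival rate, on-ramp arrival rate, and off-ramp departure rate.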

5. Definition of RL Elements

In addition to the architecture and models defined in the previous sections, three basic elements (environment state, control action, and reward function) should be specified to form an RL problem. This section details these three elements and the relevant algorithm.

5.1. Environment State. Environment states of a motorway section are composed of mainline states and on-ramp states. The same method mentioned in [27, 29] is used here to obtain the state space. Generally, for the mainline of motorway section $i$, the number of vehicles ranges from 0 to the maximum number $q^{\max}_{i,\mathrm{main}}$, and this range is uniformly divided into $n_i$ intervals. Each interval represents a state of the mainline. Therefore, each mainline section can be represented by a state set $S_{i,\mathrm{main}}$ with $n_i$ states. Similarly, on-ramp traffic is represented by a state set $S_{i,\mathrm{on}}$ with $m_i$ states according to the maximum number of vehicles $q^{\max}_{i,\mathrm{on}}$. $n_i$ and $m_i$ should be adjusted for different motorway sections according to the section length. In this way, if motorway section $i$ is the critical section, the external traffic environment is represented by

$$
S_i = S_{i,\mathrm{main}} \times S_{i,\mathrm{on}}, \qquad s_i^k \in S_i,
\tag{8}
$$

which contains $n_i \cdot m_i$ states. At each time step, a state $s_i^k$ will be selected from $S_i$ as the environment state. If motorway section $i$ is a normal section, the state sets of its neighbour agent should be incorporated. Thus, the traffic state is represented by

$$
S_i = S_{i,\mathrm{main}} \times S_{i,\mathrm{on}} \times S_{i-1,\mathrm{main}} \times S_{i-1,\mathrm{on}}, \qquad s_i^k \in S_i,
\tag{9}
$$

which contains $n_i \cdot m_i \cdot n_{i-1} \cdot m_{i-1}$ states.
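One way to picture the state spaces in (8) and (9) is as a uniform binning of vehicle counts followed by a flattening of the bin indices. The sketch below is illustrative only: the function names and the interval counts ($n_i = 10$, $m_i = 6$) are assumptions, since the text does not give the values of $n_i$ and $m_i$; the maximum vehicle numbers are the ones reported for section 1 in the case study.

```python
def interval_index(vehicles, q_max, n_intervals):
    """Map a vehicle count in [0, q_max] to one of n_intervals uniform bins."""
    vehicles = min(max(vehicles, 0), q_max)
    idx = int(vehicles * n_intervals / q_max)
    return min(idx, n_intervals - 1)  # q_max itself falls in the last bin

def critical_state(q_main, q_on, q_max_main, q_max_on, n, m):
    """Index of s in S_main x S_on (Eq. (8)); range 0 .. n*m - 1."""
    main_bin = interval_index(q_main, q_max_main, n)
    on_bin = interval_index(q_on, q_max_on, m)
    return main_bin * m + on_bin

def normal_state(own_state, neighbour_state, neighbour_size):
    """Index of s in the product space of Eq. (9).

    own_state and neighbour_state are indices produced by critical_state
    for sections i and i-1; neighbour_size is n_{i-1} * m_{i-1}.
    """
    return own_state * neighbour_size + neighbour_state

# Hypothetical interval counts n = 10 and m = 6 for section 1.
s1 = critical_state(930, 54, q_max_main=1860, q_max_on=108, n=10, m=6)
```

With these assumed interval counts, half-full mainline and half-full on-ramp land in bins 5 and 3, so `s1` is `5 * 6 + 3`.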

5.2. Control Action. In a ramp control problem, the aim of the control action is to regulate the number of vehicles entering the mainline in each control step. Similar to [29], we adopt flow control as the control action, which can be represented by an action set $C = \{4, 6, 8, 10, 12, 14, 16, 18, 20\}$ with 9 flow rates between the minimum (4 veh/min) and maximum (20 veh/min) values.

Exploitation and exploration are two basic behaviours of the RL agent. Exploitation means that the agent takes the control action that has yielded the most reward in its previous experience. Exploration, in contrast, means that the agent tries new actions with lower estimated rewards. In order to balance these two behaviours, we use the $\varepsilon$-greedy policy to select control actions [30]. Specifically, this policy takes a random action with probability $\varepsilon$ and chooses the greedy action (with the maximum $Q$ value) with probability $1-\varepsilon$ for each control step.

The action selection probability can be formally expressed as

$$
p\left(c_i^k \mid s_i^k\right) =
\begin{cases}
1-\varepsilon, & \text{if } c_i^k = \arg\max_{c_i^k} Q^{k-1}\left(s_i^k, c_i^k\right),\\
\varepsilon, & \text{otherwise}.
\end{cases}
\tag{10}
$$
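The selection rule in (10) can be sketched as follows. This is a common textbook variant, not necessarily the exact rule used in the experiments: here the random branch draws uniformly from all nine actions (so the greedy action can also be drawn at random), and the function name and the list-based layout of the $Q$ values are assumptions.

```python
import random

ACTIONS = [4, 6, 8, 10, 12, 14, 16, 18, 20]  # metering rates in veh/min

def epsilon_greedy(q_row, epsilon=0.1, rng=random):
    """Pick a metering rate from ACTIONS.

    q_row[a] is the current Q estimate of ACTIONS[a] in the present state.
    With probability epsilon a uniformly random action is returned,
    otherwise the action with the largest Q estimate.
    """
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(range(len(ACTIONS)), key=lambda a: q_row[a])
    return ACTIONS[best]

# Hypothetical Q estimates for one state; index 3 (10 veh/min) is best.
q_row = [-0.9, -0.7, -0.6, -0.2, -0.4, -0.5, -0.6, -0.8, -0.9]
greedy_action = epsilon_greedy(q_row, epsilon=0.0)
```

Setting `epsilon=0.0` disables exploration, which is convenient for checking the greedy branch in isolation.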

5.3. Reward Function. The reward function is used to calculate the immediate reward after executing a specific action at each time step, which guides the agent to achieve its objective. Considering a common objective of traffic control systems (i.e., minimising total travel time), we define our reward to guide the agent to minimise total time spent (TTS) through the learning process.

TTS is defined as the total time spent by vehicles in the network during a period of time. For our case, TTS can be obtained from the following equation:

$$
\mathrm{TTS} = \Delta t \cdot \sum_{k=0}^{K} \left(q^{k}_{i,\mathrm{main}} + q^{k}_{i,\mathrm{on}}\right).
\tag{11}
$$

In the above equation, $\Delta t$ is a fixed value; therefore, minimising TTS is equivalent to minimising the number of vehicles on the network, $\sum_{k=0}^{K}\left(q^{k}_{i,\mathrm{main}} + q^{k}_{i,\mathrm{on}}\right)$. To minimise this value, the reward function defined here is composed of two negative rewards used to indicate penalties for vehicles on the mainline and on-ramp. The formal reward function at step $k$ is defined according to two situations.

(1) Motorway Section $i$ Is the Critical Section. Consider

$$
R_i^k\left(s_i^k, c_i^k\right) =
\begin{cases}
-\dfrac{q^{k}_{i,\mathrm{main}} + q^{k}_{i,\mathrm{on}}}{q^{\max}_{i,\mathrm{main}} + q^{\max}_{i,\mathrm{on}}}, & \text{if } q^{k}_{i,\mathrm{main}} < q^{\max}_{i,\mathrm{main}},\ q^{k}_{i,\mathrm{on}} < q^{\max}_{i,\mathrm{on}},\\[2ex]
-1, & \text{otherwise},
\end{cases}
\tag{12}
$$

where $R_i^k(s_i^k, c_i^k)$ is the immediate reward for agent $i$ in state $s_i^k$ when executing action $c_i^k$ at control step $k$. $q^{\max}_{i,\mathrm{main}}$ and $q^{\max}_{i,\mathrm{on}}$ are used to normalise the number of vehicles on the mainline and on-ramp, which guarantees that $R_i^k(s_i^k, c_i^k) \in [-1, 0]$.
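As a quick numerical check of (12), the reward for the critical section can be written as a small function. The function name is an assumption, and the vehicle counts in the example are made up; the maxima are those reported for section 1 in the case study.

```python
def reward_critical(q_main, q_on, q_max_main, q_max_on):
    """Immediate reward of Eq. (12) for the critical section.

    Returns the negative normalised vehicle count while both the mainline
    and the on-ramp are below capacity, and -1 once either one is full.
    """
    if q_main < q_max_main and q_on < q_max_on:
        return -(q_main + q_on) / (q_max_main + q_max_on)
    return -1.0

# Section 1 maxima; hypothetical half-full mainline and on-ramp.
r_half = reward_critical(930, 54, q_max_main=1860, q_max_on=108)
# Mainline at capacity: the penalty saturates at -1.
r_full = reward_critical(1860, 10, q_max_main=1860, q_max_on=108)
```

An empty section gives a reward of 0 and a saturated one gives -1, which matches the stated range $[-1, 0]$.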

(2) Motorway Section $i$ Is Not the Critical Section. Here, a new negative reward is introduced to maintain the system equity, that is, to make sure that the on-ramp queues and related travel times at different on-ramps are close to each other:

$$
R_i^k\left(s_i^k, c_{i-1}^k, c_i^k\right) =
\begin{cases}
-\dfrac{q^{k}_{i,\mathrm{main}} + q^{k}_{i,\mathrm{on}}}{q^{\max}_{i,\mathrm{main}} + q^{\max}_{i,\mathrm{on}}} - \dfrac{\left|q^{k}_{i,\mathrm{on}} - q^{k}_{i-1,\mathrm{on}}\right|}{\max\left(q^{\max}_{i,\mathrm{on}}, q^{\max}_{i-1,\mathrm{on}}\right)}, & \text{if } q^{k}_{i,\mathrm{main}} < q^{\max}_{i,\mathrm{main}},\ q^{k}_{i,\mathrm{on}} < q^{\max}_{i,\mathrm{on}},\\[2ex]
-2, & \text{otherwise}.
\end{cases}
\tag{13}
$$

Algorithm 1: Algorithm for Dyna-MARL.

For each agent $i$ and episode do
  $L \leftarrow \mathrm{CEIL}(\mathrm{IncidentDuration} / \Delta t)$
  IF $i$ is the critical section
    Initialise $R_i^0(s_i, c_i)$, $Q_i^0(s_i, c_i)$
  ELSE
    Initialise $R_i^0(s_i, c_{i-1}, c_i)$, $Q_i^0(s_i, c_{i-1}, c_i)$, $P_i^0(s_i, c_{i-1})$
  For each control step $k \in K$ do (Loop I)
    (i) get detected data from each cell $j$: $a^k_{j,\mathrm{main}}$, $d^k_{j,\mathrm{main}}$, $d^k_{j,\mathrm{off}}$, $a^k_{j,\mathrm{on}}$, $d^k_{j,\mathrm{on}}$
    (ii) get state $s_i^k$ through (8) and (9)
    (iii) get action $c_i^k$ by $\varepsilon$-greedy policy (10)
    (iv) get $q^k_{j,\mathrm{main}}$, $q^k_{j,\mathrm{on}}$ through (6) and do $q^k_{i,\mathrm{main}} \leftarrow \sum_{j=1}^{J} q^k_{j,\mathrm{main}}$, $q^k_{i,\mathrm{on}} \leftarrow q^k_{j,\mathrm{on}}$
    IF $i$ is the critical section
      update $R_i^k(s_i^k, c_i^k)$, $Q_i^k(s_i^k, c_i^k)$ through (2) and (12)
    ELSE
      update $R_i^k(s_i^k, c_{i-1}^k, c_i^k)$, $Q_i^k(s_i^k, c_{i-1}^k, c_i^k)$, $p(c_{i-1}^k \mid s_i^k)$ through (4) and (13)
    IF $s_i^k = s_{\mathrm{initial}}$ and $k + 1 \ge L$
      end the algorithm
    ELSE
      get $a^{k,k+1}_{i,\mathrm{main}}$, $a^{k,k+1}_{i,\mathrm{on}}$, $d^{k,k+1}_{j,\mathrm{off}}$ by (7) and do $l \leftarrow k$, $s_i^l \leftarrow s_i^k$, $q^l_{j,\mathrm{main}} \leftarrow q^k_{j,\mathrm{main}}$, $q^l_{j,\mathrm{on}} \leftarrow q^k_{j,\mathrm{on}}$, and start loop II
    For each planning step $l \in L$ do (Loop II)
      (i) generate flow rates for each cell $j$: $d^l_{j,\mathrm{main}}$, $d^l_{j,\mathrm{on}}$ through (5)
      (ii) get the state $s_i^l$
      (iii) get $q^l_{j,\mathrm{main}}$, $q^l_{j,\mathrm{on}}$ and do $q^l_{i,\mathrm{main}} \leftarrow \sum_{j=1}^{J} q^l_{j,\mathrm{main}}$, $q^l_{i,\mathrm{on}} \leftarrow q^l_{j,\mathrm{on}}$
      (iv) get action $c_i^l$ by $\varepsilon$-greedy policy
      IF $i$ is the critical section
        update $R_i^l(s_i^l, c_i^l)$, $Q_i^l(s_i^l, c_i^l)$
      ELSE
        update $R_i^l(s_i^l, c_{i-1}^l, c_i^l)$, $Q_i^l(s_i^l, c_{i-1}^l, c_i^l)$, $p(c_{i-1}^k \mid s_i^k)$
      IF ($l = k + 9$) or ($s_i^l = s_{\mathrm{initial}}$ and $l + 1 \ge L$)
        go back to loop I
      ELSE
        repeat loop II
    EndFor
  EndFor
EndFor

Compared to (12), a new term $\left|q^{k}_{i,\mathrm{on}} - q^{k}_{i-1,\mathrm{on}}\right| / \max\left(q^{\max}_{i,\mathrm{on}}, q^{\max}_{i-1,\mathrm{on}}\right)$ is added into (13), which is a penalty for the on-ramp queue difference between motorway sections $i$ and $i-1$. As two adjacent agents cooperate in this situation, $R_i^k(s_i^k, c_{i-1}^k, c_i^k)$ is related to two control actions, $c_{i-1}^k$ and $c_i^k$. $\max(\cdot, \cdot)$ returns the maximum value of two given parameters and is used for normalisation.
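The effect of the equity penalty in (13) is easiest to see with numbers. The following sketch evaluates the reward of a normal (noncritical) section; the function name and the queue lengths are illustrative assumptions, while the maxima are those reported in the case study for section 2 and for the on-ramp of its neighbour, section 1.

```python
def reward_normal(q_main, q_on, q_on_prev,
                  q_max_main, q_max_on, q_max_on_prev):
    """Immediate reward of Eq. (13) for a normal section i.

    The first term penalises vehicles on the mainline and on-ramp of
    section i; the second penalises the on-ramp queue difference between
    sections i and i-1, normalised by the larger on-ramp capacity.
    """
    if q_main < q_max_main and q_on < q_max_on:
        occupancy = (q_main + q_on) / (q_max_main + q_max_on)
        equity = abs(q_on - q_on_prev) / max(q_max_on, q_max_on_prev)
        return -occupancy - equity
    return -2.0

# Section 2 (2880 / 90 vehicles) with neighbouring on-ramp 1 (108 vehicles).
balanced = reward_normal(1440, 45, 45, 2880, 90, 108)    # equal queues
unbalanced = reward_normal(1440, 45, 99, 2880, 90, 108)  # queue gap of 54
```

With equal on-ramp queues only the occupancy term remains; a gap of 54 vehicles adds a further penalty of 54/108, so the agent is pushed towards balancing its queue with its neighbour's.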

5.4. Description of the Algorithm. Based on the Dyna-$Q$ architecture and the RL elements defined in the previous subsections, an algorithm, Dyna-MARL, is developed and described in this subsection. Two main loops corresponding to direct RL and planning, shown in Figure 1, are detailed in Dyna-MARL. Between two real control steps in loop I, 10 planning steps will be run in loop II. The pseudocode of Dyna-MARL can be seen in Algorithm 1.

An episode in Dyna-MARL represents a control cycle, which starts from the incident occurrence and terminates when the traffic state returns to the initial state $s_{\mathrm{initial}}$, that is, the traffic state before the incident occurrence. Incident duration is assumed to be known in advance.
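The two-loop structure (one real control step followed by 10 planning steps) can be sketched for a single agent as below. This is a simplified, single-agent illustration under several assumptions: the update is the standard tabular $Q$-learning rule, the `real_step` and `model_step` callables are hypothetical stand-ins for the simulator and the ACTM, planning starts from the most recently observed state, and the termination test and neighbour coordination of Algorithm 1 are omitted.

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.2, 0.8, 0.1   # learning parameters from Section 6.3
PLANNING_STEPS = 10                      # planning steps per real control step
ACTIONS = [4, 6, 8, 10, 12, 14, 16, 18, 20]

def choose(q, state, rng):
    """Epsilon-greedy choice over the metering rates."""
    if rng.random() < EPSILON:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(state, a)])

def q_update(q, state, action, reward, next_state):
    """Standard tabular Q-learning update (assumed form)."""
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])

def run_episode(real_step, model_step, initial_state, max_steps, rng=None):
    rng = rng or random.Random(0)
    q = defaultdict(float)
    state, updates = initial_state, 0
    for _ in range(max_steps):                     # Loop I: direct RL
        action = choose(q, state, rng)
        reward, next_state = real_step(state, action)
        q_update(q, state, action, reward, next_state)
        updates += 1
        sim_state = next_state
        for _ in range(PLANNING_STEPS):            # Loop II: planning with the model
            sim_action = choose(q, sim_state, rng)
            sim_reward, sim_next = model_step(sim_state, sim_action)
            q_update(q, sim_state, sim_action, sim_reward, sim_next)
            updates += 1
            sim_state = sim_next
        state = next_state
    return q, updates

def toy_step(state, action):
    """Hypothetical dynamics: state is a queue level 0..5; higher rates drain it."""
    next_state = min(5, max(0, state + 1 - action // 8))
    return -next_state / 5, next_state

q_table, n_updates = run_episode(toy_step, toy_step, initial_state=3, max_steps=20)
```

Each real step therefore contributes 11 value updates (1 direct and 10 simulated), which is the sense in which planning lets the agent learn more from each real observation.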

6. Case Study and Results

One of the metered motorway segments (southbound direction) of the M6 in the UK is chosen for the case study. This segment is between junction 21A (J21A) and junction 25 (J25), with an approximate length of 12.4 km (see Figure 4). Taking the noncontrolled (NC) situation as the baseline, we designed a series of experiments to compare the proposed Dyna-MARL algorithm with Isolated RL ($Q$-learning without coordination). Experiments and relevant results are described as follows.

6.1. Partitions of the Test Segment. The test motorway segment, with a three-lane mainline, three metered on-ramps, and five off-ramps, is simulated by AIMSUN [39], which is a microscopic traffic simulation package. According to the detector locations and road layout, the whole segment is divided into three sections. Each section contains a metered on-ramp. Motorway section 3 is divided into 4 cells and motorway sections 2 and 3 are both divided into 3 cells. The partitions of each section can be seen in Figure 5. According to the section length, the maximum numbers of vehicles in each mainline section and on-ramp are as follows: $q^{\max}_{1,\mathrm{main}} = 1860$, $q^{\max}_{2,\mathrm{main}} = 2880$, $q^{\max}_{3,\mathrm{main}} = 2880$, $q^{\max}_{1,\mathrm{on}} = 108$, $q^{\max}_{2,\mathrm{on}} = 90$, and $q^{\max}_{3,\mathrm{on}} = 120$.

Figure 4: Test motorway segment of M6. [Map of junctions J21A, J22, J23, J24, and J25 with origins O, O1 (from A579), O2 (from A580), and O3 (from A49) and destinations D, D1 and D2 (to M62), D3 (to A579), D4 (to A580), and D5 (to A58).]

Figure 5: Partitions of test segment. [Schematic of Sections 1, 2, and 3 between J21A and J25, showing section boundaries, cell boundaries, incident locations a, b, and c, road section lengths (unit: km), flow direction, on-ramps O1 to O3, and off-ramps D1 to D5.]

6.2. Real Data Source. Real detector data collected from 17 loop detectors located in the motorway segment (including both the mainline and on-/off-ramps) are used for the case study; these data can be extracted from the Traffic Information System (HATRIS) [40]. The traffic count data are averaged from April 2012 to March 2013 with 15-minute intervals. Only working day data (from Monday to Friday) are used because of the dramatic reduction of traffic load at weekends. Some of the detector data collected from the mainline and three on-ramps are presented in Figure 6, from which we can see that two peak periods, the AM peak period (around 07:00:00–09:00:00) and the PM peak period (around 16:00:00–18:00:00), exist during the daily traffic operation.

In the test site, ramp metering only works at peak hours. Meanwhile, it is valuable to test the performance of the proposed algorithm in the high-demand situation: if it works under a high traffic load, it should also be useful for common situations. Therefore, the AM peak period with heavy traffic load is considered for the case study. Specifically, we use the averaged traffic data during the AM peak period collected from TRADS to estimate the OD (origins and destinations) matrix for the simulation. A model proposed in [41] is adopted by AIMSUN to do the estimation, where the number of iterations is set as 1000 to get convergence. Table 1 shows the OD matrix estimated from real traffic data.

Figure 6: Real averaged traffic data. [15-minute traffic count versus time of day (00:00:00 to 00:00:00) for Sections 1, 2, and 3 and On-ramps 1, 2, and 3.]

Table 1: OD matrix estimated.

| Origins/destinations | D | D5 | D4 | D3 | D2 | D1 | Totals |
|---|---|---|---|---|---|---|---|
| O | 2089 | 375 | 686 | 728 | 1169 | 771 | 5818 |
| O3 | 875 | 65 | 212 | 193 | 117 | 46 | 1507 |
| O2 | 886 | 0 | 0 | 61 | 315 | 216 | 1477 |
| O1 | 824 | 0 | 0 | 0 | 292 | 226 | 1343 |
| Totals | 4675 | 440 | 898 | 981 | 1893 | 1258 | 10146 |

Table 2: Parameters for ACTM.

| Parameter | $d^{\max}_{\mathrm{main}}$ | $v$ | $w$ | $\theta$ | $\eta$ | $\lambda_1$ | $\lambda_2$ | $\lambda_3$ |
|---|---|---|---|---|---|---|---|---|
| Value | 6300 veh/h | 107 km/h | 11.6 km/h | 0.5 | 0.16 | 0.55 | 0.9 | 0.6 |

6.3. Incident Scenarios. Considering the difficulty of capturing real incident data, we simulate some incident scenarios in AIMSUN. To make each ramp meter work during the incident, the incident is located near the most downstream motorway section, that is, motorway section 1. Therefore, three incident scenarios, A, B, and C, are designed, corresponding to three different incident locations a, b, and c (as illustrated in Figure 5), respectively.

The simulation experiment lasts for one and a half hours, from 07:00:00 to 08:30:00, during the AM peak period. After 30 minutes of normal operation (for warm-up), the incident is triggered at 07:30:00 and lasts for 30 minutes. In the preliminary experiments designed in this paper, an incident with one lane blocked is considered. The parameters introduced here can also be adjusted for multiple lane-blockage situations. The incident extent is 50 meters, which is assumed to be constant during the incident.

Learning-related parameters are set to typical values [30]; that is, $\alpha$ is 0.2, $\gamma$ is 0.8, and $\varepsilon$ is 0.1. Other parameters related to the ACTM are calibrated and summarised in Table 2. All the cells have the same $\theta$ and $\eta$.

6.4. Results. The comparison of Dyna-MARL, Isolated RL, and NC is conducted from three aspects: density evolution, some general indicators, and the system equity. The experimental results are described as follows.

(1) Density Evolution. We can see from Figure 7 that four dense areas exist during the traffic operation. Three of them, near the on-ramp entrances (motorway length around 0.5 km, 5 km, and 10 km), are caused by heavy traffic loads from the on-ramps. The dense area close to the segment end forms because of the incident.

In scenario A, the incident location is close to on-ramp 1 (O1). Without control, this incident leads to severe congestion, which blocks on-ramp 1 and propagates to motorway section 2 (around 9 km in Figure 7(a)). Under this scenario, Isolated RL cannot alleviate incident-induced congestion effectively (see Figure 7(b)). At the beginning of congestion formation, without coordination, only the nearest ramp controller reacts to the congestion. Because of the limited storage space of the on-ramp, one ramp controller is insufficient to dissolve this congestion, which still propagates to motorway section 2. Dyna-MARL, on the other hand, coordinates all three ramp controllers and makes full use of the storage space of the three on-ramps to deal with incident-induced congestion. In this way, mainline congestion can be restricted to a smaller area and will not propagate to motorway section 2 (see Figure 7(c)).

For scenarios B and C, incidents are near the motorway end and far from on-ramp 1. Without blocking on-ramp 1, incidents do not lead to severe congestion. Under such circumstances, both Isolated RL and Dyna-MARL work well in easing congestion on the mainline. As shown in Figures 7(e)–7(i), compared with the NC situation, both Isolated RL and Dyna-MARL can restrict the congestion to a small range near the on-ramp entrances.

(2) General Indicators. In this comparison, some general indicators, including total travel time (should be reduced), total throughput (should be improved), and total CO$_2$ emission (should be reduced), are used to show how the proposed system can benefit road users. These indicators are widely used in the transport community to test the performance of newly developed traffic control systems.

As shown in Figure 8(a), compared with the NC situation, both Isolated RL and Dyna-MARL can reduce the total travel time of road users in all three scenarios. Specifically, Isolated RL decreases total travel time by up to 6.2%, while Dyna-MARL achieves a maximum reduction of 12.2% (see Figure 8(d)). The comparison of total throughput is presented in Figure 8(b). Dyna-MARL can improve the total throughput by up to 2.3% (see Figure 8(d)) and outperforms Isolated RL in all three scenarios. In scenario B, Isolated RL even fails to improve the total throughput. For the comparison of total CO$_2$ emission (shown in Figure 8(c)), both Isolated RL and Dyna-MARL achieve their best performance in scenario B, with reductions of 4.7% and 4.6%, respectively. In scenarios A and C, Dyna-MARL performs much better than Isolated RL.

Through the above comparison, we can see that Dyna-MARL outperforms Isolated RL for almost all the scenarios and indicators.

Figure 7: Density profiles for (a) NC in scenario A, (b) Isolated RL in scenario A, (c) Dyna-MARL in scenario A, (d) NC in scenario B, (e) Isolated RL in scenario B, (f) Dyna-MARL in scenario B, (g) NC in scenario C, (h) Isolated RL in scenario C, and (i) Dyna-MARL in scenario C. [Each panel plots motorway density (veh/km/lane) against motorway length (km) and absolute time (min).]

(3) System Equity. Although the general indicators presented in comparison (2) are effective for testing the performance of different systems, they cannot measure system equity, which is also an important aspect of the system performance. In this paper, we only consider the spatial equity issue, which is defined as a measurement of the equity of user delays on different on-ramps [42]. In this study, we assume that the road users from all three on-ramps have the same importance. If all users from different on-ramps experience a similar travel time, the control system is defined as an equitable system. This term is used to measure the system equity; that is, a large queue difference leads to a highly inequitable system. In [43], the variance of travel time on different on-ramps is used as an indicator to measure system equity. Similar to [43], for the sake of comparison, the standard deviation is considered in our case. This indicator is defined as

$$
\mathrm{SD}(k) = \sqrt{\frac{\sum_{i=1}^{n}\left[\bar{t}^{k} - t_i^{k}\right]^2}{n}},
\tag{14}
$$

where $\mathrm{SD}(k)$ is the standard deviation of travel time of different on-ramps at time step $k$, $t_i^k$ is the estimated total travel time of on-ramp $i$ at step $k$, and $\bar{t}^{k}$ is the averaged total travel time of the $n$ on-ramps at step $k$.

The results of the system equity comparison can be seen in Figure 9. For the NC situation, good equity can be maintained in scenarios B and C because entering vehicles are not restricted (as shown in Figures 9(b) and 9(c)). However, when one of the on-ramp entrances is blocked by the congestion in scenario A, a long queue forms and leads to imbalance and resultant inequity for users on different on-ramps (see Figure 9(a)). For controlled cases, Isolated RL performs poorly in all scenarios. This is because the ramp controller near the congestion restricts the controlled traffic much more than the other controllers do. Because of the coordination strategy, Dyna-MARL outperforms Isolated RL in maintaining system equity in all scenarios, especially during the incident (from 07:30:00 to 08:00:00).

Figure 8: Comparison of general measures for different scenarios. [Panels (a) total travel time (h), (b) total throughput (veh), and (c) total CO$_2$ emission ($\times 10^4$ kg) for NC, Isolated RL, and Dyna-MARL in scenarios A, B, and C; panel (d) reduction from NC (%) in total throughput, total travel time, and total CO$_2$ emission.]

7. Conclusions and Future Work

A Dyna-$Q$ based multiagent reinforcement learning method, referred to as Dyna-MARL, for motorway ramp control has been developed in this paper. Dyna-MARL is compared with Isolated RL ($Q$-learning without coordination) and the noncontrolled situation in a simulation environment. Real traffic data collected from a metered motorway segment in the UK are used to build the simulation.

Through a series of simulation-based experiments, we can conclude the following: (1) Isolated RL can improve the motorway performance in terms of increasing total throughput and reducing total travel time and CO$_2$ emission, but this improvement comes at the expense of poor system equity across different on-ramps; (2) with a suitable coordination strategy, much higher system equity can be achieved by Dyna-MARL; (3) in addition to the system equity, Dyna-MARL outperforms Isolated RL in almost all scenarios regarding all indicators, which means that Dyna-MARL can deal with network-wide problems effectively.

Although the simulation tests have shown some positive results regarding the performance of Dyna-MARL, only a simplified incident scenario with fixed duration is considered in the current work. In practice, incident duration is highly unstable and affected by a number of factors, such as weather conditions, road conditions, and the arrival time of the incident management team. Therefore, incident duration should be treated as an uncertainty, which will be investigated in our future work.

Figure 9: Standard deviation for different scenarios: (a) scenario A, (b) scenario B, and (c) scenario C. [Each panel plots standard deviation (h) against time of day (07:00:00 to 08:30:00) for NC, Isolated RL, and Dyna-MARL.]

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This paper is supported by China Scholarship Council and University of Leeds (CSC-University of Leeds scholarship) and partially supported by the National Natural Science Foundation of China (Grant nos. 91420203 and 61271376). The authors would like to thank the institutions that support this study.

References

[1] M. Papageorgiou and A. Kotsialos, "Freeway ramp metering: an overview," IEEE Transactions on Intelligent Transportation Systems, vol. 3, no. 4, pp. 271–281, 2002.

[2] A. Skabardonis, P. Varaiya, and K. F. Petty, "Measuring recurrent and nonrecurrent traffic congestion," Transportation Research Record, vol. 1856, pp. 118–124, 2003.

[3] M. Papageorgiou, H. Hadj-Salem, and J.-M. Blosseville, "ALINEA: a local feedback control law for on-ramp metering," Journal of the Transportation Research Board, vol. 1320, pp. 58–64, 1991.

[4] E. Smaragdis and M. Papageorgiou, "Series of new local ramp metering strategies," Transportation Research Record, vol. 1856, pp. 74–86, 2003.

[5] L. N. Jacobson, K. C. Henry, and O. Mehyar, "Real-time metering algorithm for centralized control," Transportation Research Record, vol. 1232, pp. 17–26, 1989.

[6] G. Paesani, J. Kerr, P. Perovich, and F. Khosravi, "System wide adaptive ramp metering (SWARM)," in Proceedings of the 7th ITS America Annual Meeting and Exposition: Merging the Transportation and Communications Revolutions, Washington, DC, USA, June 1997.

[7] R. Lau, Ramp Metering by Zone - The Minnesota Algorithm, Minnesota Department of Transportation, 1997.

[8] H. M. Zhang and W. W. Recker, "On optimal freeway ramp control policies for congested traffic corridors," Transportation Research Part B: Methodological, vol. 33, no. 6, pp. 417–436, 1999.

[9] A. Kotsialos, M. Papageorgiou, and F. Middelham, "Optimal coordinated ramp metering with advanced motorway optimal control," Transportation Research Record, no. 1748, pp. 55–65, 2001.

[10] A. Hegyi, B. De Schutter, and H. Hellendoorn, "Model predictive control for optimal coordination of ramp metering and variable speed limits," Transportation Research Part C: Emerging Technologies, vol. 13, no. 3, pp. 185–209, 2005.

[11] G. Gomes and R. Horowitz, "Optimal freeway ramp metering using the asymmetric cell transmission model," Transportation Research Part C: Emerging Technologies, vol. 14, no. 4, pp. 244–262, 2006.

[12] A. H. F. Chow and Y. Li, "Robust optimization of dynamic motorway traffic via ramp metering," IEEE Transactions on Intelligent Transportation Systems, vol. 15, no. 3, pp. 1374–1380, 2014.

[13] R. W. Hall, "Non-recurrent congestion: how big is the problem? Are traveler information systems the solution?" Transportation Research Part C, vol. 1, no. 1, pp. 89–103, 1993.

[14] P. Prevedouros, B. Halkias, K. Papandreou, and P. Kopelias, "Freeway incidents in the United States, United Kingdom, and Attica Tollway, Greece: characteristics, available capacity, and models," Transportation Research Record, vol. 2047, pp. 57–65, 2008.

[15] T. L. Greenlee and H. J. Payne, "Freeway ramp metering strategies for responding to incidents," in Proceedings of the IEEE Conference on Decision and Control including the 16th Symposium on Adaptive Processes and a Special Symposium on Fuzzy Set Theory and Applications, pp. 987–992, New Orleans, LA, USA, December 1977.

[16] M. H. Wang, Optimal ramp metering policies for nonrecurring congestion with uncertain incident duration [Ph.D. thesis], Purdue University, West Lafayette, Ind, USA, 1994.

[17] J.-B. Sheu, "Stochastic modeling of the dynamics of incident-induced lane traffic states for incident-responsive local ramp control," Physica A: Statistical Mechanics and its Applications, vol. 386, no. 1, pp. 365–380, 2007.

[18] J.-B. Sheu and M.-S. Chang, "Stochastic optimal-control approach to automatic incident-responsive coordinated ramp control," IEEE Transactions on Intelligent Transportation Systems, vol. 8, no. 2, pp. 359–367, 2007.

[19] C. Jacob and B. Abdulhai, "Machine learning for multi-jurisdictional optimal traffic corridor control," Transportation Research Part A: Policy and Practice, vol. 44, no. 2, pp. 53–64, 2010.

[20] M. Davarynejad, A. Hegyi, J. Vrancken, and J. van den Berg, "Motorway ramp-metering control with queuing consideration using Q-learning," in Proceedings of the 14th International IEEE Conference on Intelligent Transportation Systems (ITSC '11), pp. 1652–1658, IEEE, Washington, DC, USA, October 2011.

[21] K. Rezaee, B. Abdulhai, and H. Abdelgawad, "Application of reinforcement learning with continuous state space to ramp metering in real-world conditions," in Proceedings of the 15th International IEEE Conference on Intelligent Transportation Systems (ITSC '12), pp. 1590–1595, IEEE, Anchorage, Alaska, USA, September 2012.

[22] C. Jacob and B. Abdulhai, "Automated adaptive traffic corridor control using reinforcement learning: approach and case studies," Transportation Research Record, vol. 1959, pp. 1–8, 2006.

[23] K. Rezaee, B. Abdulhai, and H. Abdelgawad, "Self-learning adaptive ramp metering: analysis of design parameters on a test case in Toronto, Canada," Transportation Research Record, vol. 2396, pp. 10–18, 2013.

[24] X.-J. Wang, X.-M. Xi, and G.-F. Gao, "Reinforcement learning ramp metering without complete information," Journal of Control Science and Engineering, vol. 2012, Article ID 208456, 8 pages, 2012.

[25] A. Fares and W. Gomaa, "Multi-agent reinforcement learning control for ramp metering," in Progress in Systems Engineering, vol. 330 of Advances in Intelligent Systems and Computing, pp. 167–173, Springer, Basel, Switzerland, 2015.

[26] K. Veljanovska, K. M. Bombol, and T. Maher, "Reinforcement learning technique in multiple motorway access control strategy design," PROMET-Traffic & Transportation, vol. 22, no. 2, pp. 117–123, 2010.

[27] C. Lu, H. Chen, and S. Grant-Muller, "An indirect reinforcement learning approach for ramp control under incident-induced congestion," in Proceedings of the 16th International IEEE Conference on Intelligent Transportation Systems (ITSC '13), pp. 979–984, IEEE, The Hague, The Netherlands, October 2013.

[28] L. Busoniu, R. Babuska, and B. De Schutter, "A comprehensive survey of multiagent reinforcement learning," IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 38, no. 2, pp. 156–172, 2008.

[29] C. Lu, H. Chen, and S. Grant-Muller, "Indirect reinforcement learning for incident-responsive ramp control," Procedia - Social and Behavioral Sciences, vol. 111, pp. 1112–1122, 2014.

[30] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, 1998.

[31] C. J. C. H. Watkins and P. Dayan, "Q-learning," Machine Learning, vol. 8, no. 3-4, pp. 279–292, 1992.

[32] J. R. Kok and N. Vlassis, "Collaborative multiagent reinforcement learning by payoff propagation," Journal of Machine Learning Research, vol. 7, pp. 1789–1828, 2006.

[33] C. Guestrin, M. G. Lagoudakis, and R. Parr, "Coordinated reinforcement learning," in Proceedings of the 19th International Conference on Machine Learning, pp. 227–234, Sydney, Australia, July 2002.

[34] S. El-Tantawy, B. Abdulhai, and H. Abdelgawad, "Multiagent reinforcement learning for integrated network of adaptive traffic signal controllers (MARLIN-ATSC): methodology and large-scale application on downtown Toronto," IEEE Transactions on Intelligent Transportation Systems, vol. 14, no. 3, pp. 1140–1150, 2013.

[35] L. P. Kaelbling, M. L. Littman, and A. W. Moore, "Reinforcement learning: a survey," Journal of Artificial Intelligence Research, vol. 4, pp. 237–285, 1996.

[36] C. F. Daganzo, "The cell transmission model: a dynamic representation of highway traffic consistent with the hydrodynamic theory," Transportation Research Part B: Methodological, vol. 28, no. 4, pp. 269–287, 1994.

[37] J. Haddad, M. Ramezani, and N. Geroliminis, "Cooperative traffic control of a mixed network with two urban regions and a freeway," Transportation Research Part B: Methodological, vol. 54, pp. 17–36, 2013.

[38] H. Mongeot and J.-B. Lesort, "Analytical expressions of incident-induced flow dynamics perturbations: using macroscopic theory and extension of Lighthill-Whitham theory," Transportation Research Record, vol. 1710, pp. 58–68, 2000.

[39] Transport Simulation Systems, Aimsun User's Manual 6.1, TTS, Barcelona, Spain, 2010.

[40] Highways England, "Hatris Homepage," 2013, https://www.hatris.co.uk/.

[41] E. Cascetta, "Estimation of trip matrices from traffic counts and survey data: a generalized least squares estimator," Transportation Research Part B, vol. 18, no. 4-5, pp. 289–299, 1984.

[42] L. Zhang and D. Levinson, "Balancing efficiency and equity of ramp meters," Journal of Transportation Engineering, vol. 131, no. 6, pp. 477–481, 2005.

[43] A. Kotsialos and M. Papageorgiou, "Efficiency and equity properties of freeway network-wide ramp metering with AMOC," Transportation Research Part C: Emerging Technologies, vol. 12, no. 6, pp. 401–420, 2004.



the updating equation as given below, $Q$-learning can maximise the $Q$ value for each state-action pair:

$$Q^{k+1}(s^k, c^k) = Q^k(s^k, c^k) + \alpha \left[ R^k(s^k, c^k) + \gamma \max_{c^{k+1}} Q^k(s^{k+1}, c^{k+1}) - Q^k(s^k, c^k) \right], \tag{2}$$

where $Q^{k+1}(s^k, c^k)$ and $Q^k(s^k, c^k)$ are the $Q$ values for state-action pair $(s^k, c^k)$ at the $(k+1)$th step and the $k$th step, respectively, and $Q^k(s^{k+1}, c^{k+1})$ is the $Q$ value for the state-action pair $(s^{k+1}, c^{k+1})$ at the $k$th step. $\alpha \in [0, 1]$ is the learning rate. $\gamma$ and $\alpha$ can be regulated according to different problems.
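As an illustration of update (2), the following minimal Python sketch applies one tabular $Q$-learning step. The dictionary-based table, the function name `q_update`, and the numerical values of `alpha`, `gamma`, the reward, and the actions are assumptions chosen for the example; they are not taken from the paper.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions,
             alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update, following (2).

    q maps (state, action) pairs to Q values; alpha and gamma are
    illustrative defaults, not values reported in the paper.
    """
    best_next = max(q[(next_state, a)] for a in actions)
    td_error = reward + gamma * best_next - q[(state, action)]
    q[(state, action)] += alpha * td_error
    return q[(state, action)]


q_table = defaultdict(float)   # unseen pairs start at 0
actions = [4, 6, 8]            # hypothetical metering rates (veh/min)
new_value = q_update(q_table, state=0, action=4, reward=-0.5,
                     next_state=1, actions=actions)
```

With an all-zero table, the updated entry is simply `alpha * reward`, which makes the role of the learning rate easy to see.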

2.2. Multiagent Scenarios. In multiagent scenarios, an MDP for the single-agent case can be extended to a stochastic game (SG), or Markov game, in which a group of agents try to obtain some equilibrium solutions through coordination or competition [28].

In the absence of competition, all agents involved in a game have a common goal, which is to maximise the global $Q$ value; this forms a coordinated MARL problem. In this case, the policy optimisation is determined by the actions executed by all agents.

For solving a coordinated MARL problem, the update equation (2) for $Q$-learning can be easily extended to represent the global $Q$ value update [28]:

$$Q^{k+1}(s^k, c_1^k, \ldots, c_n^k) = Q^k(s^k, c_1^k, \ldots, c_n^k) + \alpha \left[ R^k(s^k, c_1^k, \ldots, c_n^k) + \gamma \max_{c_1^{k+1}, \ldots, c_n^{k+1}} Q^k(s^{k+1}, c_1^{k+1}, \ldots, c_n^{k+1}) - Q^k(s^k, c_1^k, \ldots, c_n^k) \right]. \tag{3}$$

The only difference from (2) is that $Q$ and $R$ in (3) relate to $n$ actions $c_1, \ldots, c_n$ executed by $n$ agents rather than to a single action $c$.

2.3. Solutions for Coordinated MARL. It can be seen from (3) that, as the number of agents grows, the number of action combinations and the resultant computational complexity increase exponentially, which may make the problem unsolvable within a required time limit [28]. Therefore, a commonly used method is to decompose the global $Q$ value into several local $Q$ values, each of which can be maximised by a few relevant agents rather than all agents [32]. Based on this distributed method, several strategies have been proposed. In [28], these strategies fall into three categories: coordination-based, coordination-free, and indirect coordination strategies.

Coordination-based strategies need local $Q$ values to be updated according to the actions executed by all relevant agents (named joint actions) at each time step [28]. The decision making process of each agent is based on the information received from all other related agents, which requires sufficient communication and complicates the problem. On the other hand, coordination-free (or independent) strategies, such as the distributed $Q$-learning algorithm, make each agent update the corresponding local $Q$ values based on its own actions [33]. Therefore, each agent makes its decisions independently without increasing computational complexity. However, this computational efficiency is at the expense of nonguaranteed convergence [32]. Indirect coordination strategies try to find a balance between the above two methods. By applying indirect strategies, each agent can maintain models for its cooperative partners and update local $Q$ values without knowing all the information of other agents at each step [28]. Based on high-quality models, this method can reduce the problem complexity and guarantee convergence with limited coordination.
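To make the exponential growth concrete, the short Python sketch below counts the joint actions that a global $Q$ table would have to cover, compared with the pairwise case in which an agent only considers one neighbour. The function name `joint_action_count` is hypothetical, and the choice of 9 actions per agent is borrowed from the action set defined later in Section 5.2 purely for illustration.

```python
def joint_action_count(n_actions, n_agents):
    """Number of joint actions when each of n_agents picks one of n_actions."""
    return n_actions ** n_agents


# 9 metering rates per agent (illustrative, see Section 5.2).
single_agent = joint_action_count(9, 1)   # independent learner
neighbour_pair = joint_action_count(9, 2) # agent plus one neighbour
three_agents = joint_action_count(9, 3)   # fully joint, 3 ramps
ten_agents = joint_action_count(9, 10)    # fully joint, 10 ramps
```

Restricting each local $Q$ value to an agent and its single upstream neighbour keeps the action dimension at 81 combinations regardless of how many ramps the motorway has, whereas the fully joint table grows by a factor of 9 with every added agent.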

3. Dyna-$Q$ Based Indirect Coordination Strategy

Because of the benefits introduced in the above section, the indirect coordination strategy has been applied in [34] for solving urban traffic control problems. In that work, each agent maintains a model for estimating the action selection probability of its neighbours and uses this information to optimise control strategies. In this paper, we extend this method to motorway systems by applying the Dyna-$Q$ architecture.

Under the Dyna-$Q$ architecture, a modified macroscopic flow model named the asymmetric cell transmission model (ACTM) and the $Q$-learning algorithm are combined to deal with coordinated MARL problems. In this section, the application of Dyna-$Q$ will be introduced.

3.1. Dyna-$Q$ Architecture. The Dyna-$Q$ architecture is an extension of standard $Q$-learning that integrates planning, acting, and learning [30]. Unlike $Q$-learning, which learns from real experience without a model, Dyna-$Q$ learns a model and uses this model to guide the agent [35]. After capturing the real experience, two loops run to learn optimal policies that can obtain the maximum $Q$ value in the Dyna-$Q$ architecture (see Figure 1).

In loop I, direct RL is the standard $Q$-learning process that interacts with the real external environment. Loop II contains two main tasks: (1) model learning, which improves the model accuracy by obtaining new knowledge from real experience, and (2) planning, which is the same process as direct RL except that it uses the experience generated by a model. Acting is the action execution process.

Applying a model, the agent can predict the reactions of its external environment and other agents before executing a specific action, which provides an opportunity for the agent to update the $Q$ value before receiving the real feedback. Simultaneously, direct RL is running to update the $Q$ value through the real interaction. Therefore, the optimal policy is learned through both real experience and predictions. By using this strategy, Dyna-$Q$ can learn faster than $Q$-learning in many situations [30].

Figure 1: Dyna-$Q$ architecture (value/policy, experience, and model, linked by acting, direct RL, model learning, and planning).

Figure 2: System architecture. Agent architecture: the Dyna-$Q$ loop of agent $i+1$, whose experience comprises (i) real traffic arrival and departure rates and (ii) information from agent $i$ ($c_i^k$, $q_{i,\mathrm{on}}^k$, $d_{j,\mathrm{main}}^k$, $d_{j,\mathrm{off}}^k$) and whose model comprises (i) the ACTM with estimated traffic arrival and departure rates of sections $i$ and $i+1$ and (ii) the action probability $p(c_i^k \mid s_{i+1}^k)$. Agent coordination: agents $i$, $i+1$, and $i+2$ controlling motorway sections $i$, $i+1$, and $i+2$, which consist of cells $j$ to $j+8$.

Although a model is maintained in the Dyna-$Q$ architecture, the whole system is different from model-based control methods such as model predictive control (MPC). The model in the Dyna-$Q$ architecture is a complementary component, which is used to speed up the learning process and simplify the coordination of agents. The optimal control actions are learnt from both real and simulated experience. Without models, the Dyna-$Q$ architecture is equivalent to the $Q$-learning technique and can still work as a model-free system. MPC, on the other hand, depends on the model and cannot work without it. Therefore, Dyna-$Q$ can be considered as a combination of model-free and model-based methods [27].
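The interplay of acting, direct RL, model learning, and planning described in this subsection can be sketched with a generic tabular Dyna-$Q$ loop. The code below is a textbook-style illustration in the spirit of [30], not the traffic controller developed in this paper: the deterministic lookup-table model, the toy chain environment `toy_env`, and all parameter values are assumptions made for the example.

```python
import random
from collections import defaultdict


def dyna_q(env_step, start_state, actions, n_real_steps=200,
           n_planning=10, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Generic tabular Dyna-Q with a deterministic lookup-table model."""
    rng = random.Random(seed)
    q = defaultdict(float)
    model = {}                      # (state, action) -> (reward, next_state)
    state = start_state

    def update(s, a, r, s_next):
        best_next = max(q[(s_next, b)] for b in actions)
        q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])

    for _ in range(n_real_steps):
        # Acting: epsilon-greedy choice with random tie-breaking.
        if rng.random() < epsilon:
            action = rng.choice(actions)
        else:
            best = max(q[(state, b)] for b in actions)
            action = rng.choice([b for b in actions if q[(state, b)] == best])
        reward, next_state = env_step(state, action)
        update(state, action, reward, next_state)        # direct RL
        model[(state, action)] = (reward, next_state)    # model learning
        for _ in range(n_planning):                      # planning
            s, a = rng.choice(list(model))
            r, s_next = model[(s, a)]
            update(s, a, r, s_next)
        state = next_state
    return q


def toy_env(state, action):
    """Hypothetical 5-state chain: action 1 moves right, action 0 moves left."""
    next_state = min(state + 1, 4) if action == 1 else max(state - 1, 0)
    reward = 1.0 if next_state == 4 else 0.0
    return reward, next_state


q_values = dyna_q(toy_env, start_state=0, actions=[0, 1])
```

Setting `n_planning=0` reduces the loop to plain $Q$-learning, which mirrors the remark above that the architecture still works as a model-free system when no model is used.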

3.2. System Architecture. Each agent in the motorway control system is designed on the basis of the Dyna-$Q$ architecture and controls one prespecified motorway section.

A simplified motorway segment is shown in Figure 2 for analysis. This segment contains three motorway sections ($i$, $i+1$, $i+2$) with detectors located at the boundaries. Each motorway section is divided into a number of cells ($j, j+1, \ldots, j+8$) according to its layout and geometric features. Generally, three kinds of cells exist in the motorway: on-ramp cells that are linked with on-ramps ($j+2$, $j+5$, $j+8$), off-ramp cells that are linked with off-ramps ($j$, $j+3$, $j+6$), and normal cells ($j+1$, $j+4$, $j+7$). In this paper, each motorway section can have at most one on-ramp cell.

The typical Dyna-$Q$ architecture presented in Figure 2 is detailed for each agent here. Take agent $i+1$ for example: experience consists of the traffic arrival and departure rates observed from the detectors of motorway section $i+1$, as well as the information received from agent $i$, which is applied to improve the models. In the model component, two models are maintained. An asymmetric cell transmission model (ACTM) with estimated traffic arrival and departure rates is used to simulate the traffic flow dynamics in the relevant motorway sections. A probability model of the action selection of agent $i$ at the current state is updated for the subsequent planning process.

Figure 3: Fundamental diagram during the incident. (a) Incident extent within the critical section. (b) Flow against density for cell $j$ under normal and incident conditions, with maximum departure flows $d_{j,\mathrm{main}}^{\max}$ and $d_{j,\mathrm{main}}^{\mathrm{In},\max}$, free flow speeds $v_j$ and $v_j^{\mathrm{In}}$, congestion wave speeds $w_j$ and $w_j^{\mathrm{In}}$, critical densities $\rho_{j,C}$ and $\rho_{j,C}^{\mathrm{In}}$, and jam densities $\rho_{j,J}$ and $\rho_{j,J}^{\mathrm{In}}$.

To reduce the complexity of MARL, like many real applications, some conventions are used to restrict the action selection of an agent [28]. Specifically, in our design, each agent only communicates with its spatial neighbours. For instance, agent $i+1$ receives the control action and traffic information from agent $i$ and sends its own information to agent $i+2$. For the case shown in Figure 2, we assume motorway section $i$ is the critical section, where an incident occurs. In this situation, agent $i$ plays a more important role than other agents in dealing with incidents. Agent $i$ can be considered as the chief controller that makes decisions according to its own knowledge about the traffic and incident situations. Other agents should regulate their control policies based on the reaction of agent $i$.

Therefore, two $Q$ values are defined for two kinds of agents. If motorway section $i$ is the critical section, the $Q$ value of agent $i$ is only related to its own state and action space and can be updated by (2).

If motorway section $i$ is a normal section without incidents, the $Q$ value of agent $i$ can be calculated by

$$Q_i^{k+1}(s_i^k, c_{i-1}^k, c_i^k) = Q_i^k(s_i^k, c_{i-1}^k, c_i^k) + \alpha \left[ R_i^k(s_i^k, c_{i-1}^k, c_i^k) + \gamma \max_{c_i^{k+1}} \sum_{c_{i-1}^{k+1}} p(c_{i-1}^{k+1} \mid s_i^{k+1}) \cdot Q_i^k(s_i^{k+1}, c_{i-1}^{k+1}, c_i^{k+1}) - Q_i^k(s_i^k, c_{i-1}^k, c_i^k) \right],$$

$$p(c_{i-1}^{k+1} \mid s_i^{k+1}) = \frac{\mathrm{count}(s_i^{k+1}, c_{i-1}^{k+1})}{\sum_{c_{i-1} \in C_{i-1}} \mathrm{count}(s_i^{k+1}, c_{i-1})}, \tag{4}$$

where $R_i^k(s_i^k, c_{i-1}^k, c_i^k)$ is the immediate reward obtained by agent $i$ at time step $k$ when $c_{i-1}^k$ and $c_i^k$ are the actions executed by agents $i-1$ and $i$. Similarly, $Q_i^{k+1}(s_i^k, c_{i-1}^k, c_i^k)$ and $Q_i^k(s_i^k, c_{i-1}^k, c_i^k)$ are the $Q$ values for agent $i$ at step $k+1$ and step $k$, respectively. $C_{i-1}$ is the action set of agent $i-1$. $\mathrm{count}(s_i^{k+1}, c_{i-1}^{k+1})$ returns the number of visits for state-action pair $(s_i^{k+1}, c_{i-1}^{k+1})$. Thus, $p(c_{i-1}^{k+1} \mid s_i^{k+1})$ is the probability of agent $i-1$ selecting action $c_{i-1}^{k+1}$ at state $s_i^{k+1}$. The models and the related symbols shown in Figure 2 will be specified in the following section.
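The two parts of (4) can be sketched in Python as a count-based neighbour model plus an expectation-based $Q$ update. The class name `NeighbourModel`, the function `coordinated_q_update`, the uniform fallback for unvisited states, and the numerical values are all assumptions introduced for this illustration; the paper does not specify how unvisited states are handled.

```python
from collections import defaultdict


class NeighbourModel:
    """Count-based estimate of p(c_{i-1} | s_i), second line of (4)."""

    def __init__(self, neighbour_actions):
        self.neighbour_actions = list(neighbour_actions)
        self.counts = defaultdict(int)

    def observe(self, state, neighbour_action):
        self.counts[(state, neighbour_action)] += 1

    def prob(self, state, neighbour_action):
        total = sum(self.counts[(state, c)] for c in self.neighbour_actions)
        if total == 0:
            # Assumption: uniform guess when the state has not been visited.
            return 1.0 / len(self.neighbour_actions)
        return self.counts[(state, neighbour_action)] / total


def coordinated_q_update(q, model, state, neighbour_action, action, reward,
                         next_state, actions, alpha=0.1, gamma=0.9):
    """One update of Q_i(s_i, c_{i-1}, c_i), first line of (4)."""
    def expected_q(own_action):
        return sum(model.prob(next_state, cn) * q[(next_state, cn, own_action)]
                   for cn in model.neighbour_actions)

    best_next = max(expected_q(c) for c in actions)
    key = (state, neighbour_action, action)
    q[key] += alpha * (reward + gamma * best_next - q[key])
    return q[key]


# Illustrative numbers only.
actions = [4, 6]
model = NeighbourModel(actions)
for observed in (4, 4, 4, 6):
    model.observe(1, observed)          # neighbour chose 4 three times, 6 once
q = defaultdict(float)
q[(1, 4, 4)] = 1.0
q[(1, 6, 6)] = 2.0
value = coordinated_q_update(q, model, state=0, neighbour_action=4, action=4,
                             reward=-0.5, next_state=1, actions=actions)
```

Here the expected value of own action 4 in the next state is 0.75 and that of own action 6 is 0.5, so the maximisation in (4) selects 0.75 before the usual temporal-difference step is applied.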

4. Modified Asymmetric Cell Transmission Model

A first-order macroscopic traffic flow model named the asymmetric cell transmission model (ACTM) is applied as one of the models in the Dyna-$Q$ architecture. This model is derived from the widely used cell transmission model (CTM) [36] and has been used for ramp control problems [11, 37]. In this paper, we modify the ACTM to incorporate the traffic dynamics under incident conditions.

4.1. Traffic Dynamics during the Incident. As shown in Figure 3(a), when an incident happens in the critical section, one or more lanes of the motorway will be blocked according to the incident extent. Because of the lane blockage, the incident may reduce the normal road capacity and spatial storage space, which will produce a new relationship between traffic flow and road density, that is, the fundamental diagram presented in Figure 3(b). As suggested by [38], additional parameters can be used to regulate the fundamental diagram for incident situations. We introduce three parameters ($\lambda_1, \lambda_2, \lambda_3 \in [0, 1]$) to reflect these new dynamics. These three parameters are defined as $\lambda_1 = v_j^{\mathrm{In}} / v_j$, $\lambda_2 = w_j^{\mathrm{In}} / w_j$, and $\lambda_3 = d_{j,\mathrm{main}}^{\mathrm{In},\max} / d_{j,\mathrm{main}}^{\max}$. $v_j$ and $w_j$ are the free flow speed and congestion wave speed of cell $j$. $d_{j,\mathrm{main}}^{\max}$ is the maximum departure flow of cell $j$. $v_j^{\mathrm{In}}$, $w_j^{\mathrm{In}}$, and $d_{j,\mathrm{main}}^{\mathrm{In},\max}$ are these three variables during the incident. $\rho_{j,C}$ and $\rho_{j,C}^{\mathrm{In}}$ are the critical densities for normal and incident situations. $\rho_{j,J}$ and $\rho_{j,J}^{\mathrm{In}}$ are the jam densities for normal and incident situations.
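A small numerical sketch may help to show how the three parameters reshape the fundamental diagram. The code below assumes a triangular flow-density relationship, as is usual for CTM-type models, so that the critical density is the capacity divided by the free flow speed and the jam density adds the capacity divided by the wave speed. The triangular shape, the function names, and all numbers are assumptions for illustration and are not values reported in the paper.

```python
def incident_parameters(v, w, d_max, lam1, lam2, lam3):
    """Scale free flow speed, wave speed, and capacity by lambda_1..lambda_3."""
    return lam1 * v, lam2 * w, lam3 * d_max


def triangular_densities(v, w, d_max):
    """Critical and jam density of a triangular fundamental diagram.

    Assumes flow = v * rho below the critical density and
    flow = w * (rho_jam - rho) above it.
    """
    rho_critical = d_max / v
    rho_jam = rho_critical + d_max / w
    return rho_critical, rho_jam


# Hypothetical three-lane cell: v in km/h, w in km/h, d_max in veh/h.
v, w, d_max = 100.0, 20.0, 6000.0
normal = triangular_densities(v, w, d_max)

# Hypothetical incident: slower traffic and half the capacity.
v_in, w_in, d_in = incident_parameters(v, w, d_max, lam1=0.8, lam2=1.0, lam3=0.5)
incident = triangular_densities(v_in, w_in, d_in)
```

Under these illustrative numbers, both the critical and the jam density shrink during the incident, which corresponds to the reduced capacity and storage space described above.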


4.2. Modified ACTM. Given the three incident-related parameters, the traffic dynamics in each cell can be derived from the fundamental diagram illustrated in Figure 3(b) and represented by the following equations.

Departure rates of the mainline and on-ramp:

$$d_{j,\mathrm{main}}^k = \min \left\{ \lambda_1 \cdot \frac{v_j}{l_j} \cdot \left( q_{j,\mathrm{main}}^k + \theta_j \cdot d_{j,\mathrm{on}}^k \cdot \Delta t - d_{j,\mathrm{off}}^k \cdot \Delta t \right),\; \lambda_2 \cdot \frac{w_{j-1}}{l_j} \cdot \left( q_{j-1,\mathrm{main}}^{\max} - q_{j-1,\mathrm{main}}^k - \theta_j \cdot d_{j-1,\mathrm{on}}^k \cdot \Delta t \right),\; \lambda_3 \cdot d_{j,\mathrm{main}}^{k,\max} \right\},$$

$$d_{j,\mathrm{on}}^k = \begin{cases} \min \left\{ \dfrac{q_{j,\mathrm{main}}^k + a_{j,\mathrm{on}}^k \cdot \Delta t}{\Delta t},\; \eta_j \cdot \dfrac{q_{j,\mathrm{main}}^{\max} - q_{j,\mathrm{main}}^k}{\Delta t},\; \dfrac{c_i^k}{\Delta t} \right\}, & \text{if } j \text{ is a metered on-ramp cell}, \\[2ex] \min \left\{ \dfrac{q_{j,\mathrm{main}}^k + a_{j,\mathrm{on}}^k \cdot \Delta t}{\Delta t},\; \eta_j \cdot \dfrac{q_{j,\mathrm{main}}^{\max} - q_{j,\mathrm{main}}^k}{\Delta t} \right\}, & \text{if } j \text{ is an unmetered on-ramp cell}. \end{cases} \tag{5}$$

Conservation of the mainline and on-ramp:

$$q_{j,\mathrm{main}}^{k+1} = q_{j,\mathrm{main}}^k + \Delta t \cdot \left( a_{j,\mathrm{main}}^k + d_{j,\mathrm{on}}^k - d_{j,\mathrm{main}}^k - d_{j,\mathrm{off}}^k \right),$$

$$q_{j,\mathrm{on}}^{k+1} = q_{j,\mathrm{on}}^k + \Delta t \cdot \left( a_{j,\mathrm{on}}^k - d_{j,\mathrm{on}}^k \right), \tag{6}$$

where $a_{j,\mathrm{main}}^k$ and $d_{j,\mathrm{main}}^k$ are the mainline arrival and departure rates for cell $j$ at step $k$; $a_{j,\mathrm{on}}^k$ and $d_{j,\mathrm{on}}^k$ are the on-ramp arrival and departure rates in cell $j$ at step $k$; and $d_{j,\mathrm{off}}^k$ is the off-ramp departure rate for cell $j$ at step $k$ (if cell $j$ is not an off-ramp cell, $d_{j,\mathrm{off}}^k = 0$). $q_{j,\mathrm{main}}^k$ represents the number of vehicles on the mainline of cell $j$ at step $k$, and $q_{j,\mathrm{main}}^{\max}$ is the maximum of this value, limited by the mainline space of cell $j$. Similarly, $q_{j,\mathrm{on}}^k$ and $q_{j,\mathrm{on}}^{\max}$ denote the current (at step $k$) and maximum number of vehicles in the on-ramp of cell $j$, respectively. $\Delta t$ (min) is the time duration between two consecutive time steps. $c_i^k$ is the metering rate for the on-ramp cell of the $i$th motorway section at step $k$. $\eta_j \in [0, 1]$ is the flow allocation parameter of cell $j$. $\theta_j \in [0, 1]$ is the flow blending parameter of traffic flow from the on-ramp to the mainline of cell $j$. The unit of all the arrival and departure rates is veh/min in this study.

For motorway section $i$ with $J$ cells, the number of vehicles in the mainline can be calculated by $q_{i,\mathrm{main}}^k = \sum_{j=1}^{J} q_{j,\mathrm{main}}^k$, while the number of vehicles in the on-ramp of motorway section $i$ is given by $q_{i,\mathrm{on}}^k = q_{j,\mathrm{on}}^k$. In the same way, the maximum numbers of vehicles in the mainline and on-ramp of motorway section $i$ are given by $q_{i,\mathrm{main}}^{\max} = \sum_{j=1}^{J} q_{j,\mathrm{main}}^{\max}$ and $q_{i,\mathrm{on}}^{\max} = q_{j,\mathrm{on}}^{\max}$.
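The conservation step (6) and the section-level aggregation can be sketched as follows. The function names `conserve` and `section_totals`, the dictionary layout of the cells, and the numerical values are hypothetical; rates are in veh/min and `dt` is in minutes, as stated above.

```python
def conserve(q_main, q_on, a_main, a_on, d_main, d_on, d_off, dt):
    """Advance the vehicle counts of one cell by one step, following (6)."""
    q_main_next = q_main + dt * (a_main + d_on - d_main - d_off)
    q_on_next = q_on + dt * (a_on - d_on)
    return q_main_next, q_on_next


def section_totals(cells):
    """Aggregate cell counts into section counts.

    cells is a list of dicts with key 'q_main' and, for the single
    on-ramp cell of the section, also 'q_on'.
    """
    q_main = sum(cell["q_main"] for cell in cells)
    q_on = sum(cell.get("q_on", 0.0) for cell in cells)
    return q_main, q_on


# Illustrative on-ramp cell.
q_main_next, q_on_next = conserve(q_main=100.0, q_on=10.0, a_main=60.0,
                                  a_on=8.0, d_main=55.0, d_on=6.0,
                                  d_off=5.0, dt=0.5)
section = section_totals([
    {"q_main": q_main_next, "q_on": q_on_next},
    {"q_main": 80.0},
    {"q_main": 95.0},
])
```

Because each section has at most one on-ramp cell, summing the optional `q_on` entries returns exactly the queue of that cell.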

4.3. Estimation of Arrival and Departure Rates. Arrival rates of the boundary cells in each motorway section (such as $j+2$, $j+5$, and $j+8$) and all the on-ramps, as well as the departure rates of off-ramps, are inputs of the ACTM for each planning step between two real control steps. Considering the short duration of the planning process (10 steps), we assume these rates remain stable during the planning and estimate them directly from the recent flow data collected from detectors. The method described by Wang [16] is used here to do the estimation, which simply averages the most recently observed data to get the predicted flow rates. In our model, we use the flow data collected from the last $N$ time steps ($N = 5$). Therefore, these three rates can be calculated by

$$a_{i,\mathrm{main}}^{k,k+1} = a_{j,\mathrm{main}}^{k,k+1} = \frac{\sum_{n=0}^{N-1} a_{j,\mathrm{main}}^{k-n}}{N}, \quad \text{if } j \text{ is the boundary cell},$$

$$a_{i,\mathrm{on}}^{k,k+1} = a_{j,\mathrm{on}}^{k,k+1} = \frac{\sum_{n=0}^{N-1} a_{j,\mathrm{on}}^{k-n}}{N}, \quad \text{if } j \text{ is the on-ramp cell},$$

$$d_{j,\mathrm{off}}^{k,k+1} = \frac{\sum_{n=0}^{N-1} d_{j,\mathrm{off}}^{k-n}}{N}, \quad \text{if } j \text{ is the off-ramp cell}, \tag{7}$$

where $a_{j,\mathrm{main}}^{k,k+1}$ and $a_{j,\mathrm{on}}^{k,k+1}$ are the estimated arrival rates of the mainline and on-ramp of cell $j$ for the planning steps between real steps $k$ and $k+1$, and $d_{j,\mathrm{off}}^{k,k+1}$ is the estimated off-ramp departure rate of cell $j$. If cell $j$ is the boundary cell of motorway section $i$, the arrival or departure rate of this cell is also the arrival or departure rate of motorway section $i$.
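The averaging in (7) amounts to a moving average over the last $N$ detector readings. A minimal Python sketch is given below; the class name `RateEstimator` is hypothetical, and averaging over fewer than $N$ readings at the start of an episode is an assumption, since the paper does not describe the warm-up behaviour.

```python
from collections import deque


class RateEstimator:
    """Moving average over the most recent N observed rates, as in (7)."""

    def __init__(self, window=5):
        self.history = deque(maxlen=window)   # old readings drop out automatically

    def observe(self, rate):
        self.history.append(rate)

    def estimate(self):
        if not self.history:
            raise ValueError("no observations yet")
        # Assumption: use whatever is available if fewer than N readings exist.
        return sum(self.history) / len(self.history)


estimator = RateEstimator(window=5)
for rate in (10.0, 20.0, 30.0, 40.0, 50.0, 60.0):   # illustrative veh/min
    estimator.observe(rate)
predicted = estimator.estimate()   # average of the last five readings
```

One estimator instance would be kept per boundary cell, on-ramp, and off-ramp, and its output held constant over the planning steps that follow a real control step.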

5. Definition of RL Elements

Except for the architecture and models defined in Section 3, three basic elements (environment state, control action, and reward function) should be specified to form an RL problem. This section details these three elements and the relevant algorithm.

5.1. Environment State. Environment states of a motorway section are composed of mainline states and on-ramp states. The same method mentioned in [27, 29] is used here to obtain the state space. Generally, for the mainline of motorway section $i$, the number of vehicles ranges from 0 to the maximum number $q_{i,\mathrm{main}}^{\max}$, which is uniformly divided into $n_i$ intervals. Each interval represents a state of the mainline. Therefore, each mainline section can be represented by a state set $S_{i,\mathrm{main}}$ with $n_i$ states. Similarly, on-ramp traffic is represented by a state set $S_{i,\mathrm{on}}$ with $m_i$ states according to the maximum number of vehicles $q_{i,\mathrm{on}}^{\max}$. $n_i$ and $m_i$ should be adjusted for different motorway sections according to the section length. In this way, if motorway section $i$ is the critical section, the external traffic environment is represented by

$$S_i = S_{i,\mathrm{main}} \times S_{i,\mathrm{on}}, \quad s_i^k \in S_i, \tag{8}$$

which contains $n_i \cdot m_i$ states. At each time step, a state $s_i^k$ will be selected from $S_i$ as the environment state. If motorway section $i$ is a normal section, the state sets of its neighbour agent should be incorporated. Thus, the traffic state is represented by

$$S_i = S_{i,\mathrm{main}} \times S_{i,\mathrm{on}} \times S_{i-1,\mathrm{main}} \times S_{i-1,\mathrm{on}}, \quad s_i^k \in S_i, \tag{9}$$

which contains $n_i \cdot m_i \cdot n_{i-1} \cdot m_{i-1}$ states.
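The discretisation behind (8) and (9) can be sketched as a mapping from vehicle counts to a single state index. The function names and the interval counts used in the example (10 mainline and 4 on-ramp intervals) are assumptions for illustration; the capacities 1860 and 108 are the section 1 values quoted in Section 6.1.

```python
def interval_index(vehicles, q_max, n_intervals):
    """Index of the uniform interval of [0, q_max] that contains vehicles."""
    if vehicles < 0:
        raise ValueError("vehicle count cannot be negative")
    # A count equal to q_max is assigned to the last interval.
    return min(int(vehicles * n_intervals // q_max), n_intervals - 1)


def section_state(q_main, q_on, q_max_main, q_max_on, n, m):
    """State index in S_main x S_on, as in (8); ranges over n * m values."""
    return interval_index(q_main, q_max_main, n) * m + interval_index(q_on, q_max_on, m)


def normal_section_state(own_state, neighbour_state, n_prev, m_prev):
    """State index in the product space of (9)."""
    return own_state * (n_prev * m_prev) + neighbour_state


own = section_state(q_main=930, q_on=54, q_max_main=1860, q_max_on=108, n=10, m=4)
neighbour = section_state(q_main=0, q_on=0, q_max_main=1860, q_max_on=108, n=10, m=4)
combined = normal_section_state(own, neighbour, n_prev=10, m_prev=4)
```

Flattening the product space into one integer is only a convenience for indexing a $Q$ table; the number of distinct states still equals the product of the interval counts.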

5.2. Control Action. In a ramp control problem, the aim of the control action is to regulate the number of vehicles entering the mainline in each control step. Similar to [29], we adopt flow control as the control action, which can be represented by an action set $C = \{4, 6, 8, 10, 12, 14, 16, 18, 20\}$ with 9 flow rates between the minimum (4 veh/min) and maximum (20 veh/min) values.

Exploitation and exploration are two basic behaviours of the RL agent. Exploitation means the agent takes the control action that, according to previous experience, yields the highest reward. Exploration, instead, means the agent tries other actions that currently appear less rewarding. In order to balance these two behaviours, we use the $\varepsilon$-greedy policy to select control actions [30]. Specifically, this policy takes a random action with probability $\varepsilon$ and chooses the greedy action (with the maximum $Q$ value) with probability $1 - \varepsilon$ for each control step.

The action selection probability can be formally expressed as

$$p(c_i^k \mid s_i^k) = \begin{cases} 1 - \varepsilon, & \text{if } c_i^k = \arg\max_{c_i^k} \left( Q^{k-1}(s_i^k, c_i^k) \right), \\ \varepsilon, & \text{otherwise}. \end{cases} \tag{10}$$
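The $\varepsilon$-greedy rule can be sketched in a few lines of Python. The function name `epsilon_greedy`, the dictionary representation of one row of the $Q$ table, and the $Q$ values are illustrative assumptions. The sketch follows the verbal description in the text, so the random branch samples uniformly from the whole action set.

```python
import random


def epsilon_greedy(q_row, epsilon, rng=random):
    """Pick an action from q_row, a mapping from action to Q value.

    With probability epsilon a uniformly random action is returned,
    otherwise the action with the largest Q value.
    """
    actions = list(q_row)
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=q_row.get)


q_row = {4: -0.9, 6: -0.2, 8: -0.5}      # illustrative Q values for one state
greedy_action = epsilon_greedy(q_row, epsilon=0.0)
explored_action = epsilon_greedy(q_row, epsilon=1.0, rng=random.Random(1))
```

With uniform sampling in the random branch, each individual non-greedy action is selected with probability $\varepsilon / |C|$, so (10) is best read as giving the total probability of the exploratory branch.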

5.3. Reward Function. The reward function is used to calculate the immediate reward after executing a specific action at each time step, which guides the agent to achieve its objective. Considering a common objective of traffic control systems (i.e., minimising total travel time), we define our reward to guide the agent to minimise the total time spent (TTS) through the learning process.

TTS is defined as the total time spent by vehicles in the network during a period of time. For our case, TTS can be obtained from the following equation:

$$\mathrm{TTS} = \Delta t \cdot \sum_{k=0}^{K} \left( q_{i,\mathrm{main}}^k + q_{i,\mathrm{on}}^k \right). \tag{11}$$

In the above equation, $\Delta t$ is a fixed value; therefore, minimising TTS is equivalent to minimising the number of vehicles on the network, $\sum_{k=0}^{K} (q_{i,\mathrm{main}}^k + q_{i,\mathrm{on}}^k)$. To minimise this value, the reward function defined here is composed of two negative rewards used to indicate penalties for vehicles on the mainline and on-ramp. The formal reward function at step $k$ is defined according to two situations.

(1) Motorway Section $i$ Is the Critical Section. Consider

$$R_i^k(s_i^k, c_i^k) = \begin{cases} -\dfrac{q_{i,\mathrm{main}}^k + q_{i,\mathrm{on}}^k}{q_{i,\mathrm{main}}^{\max} + q_{i,\mathrm{on}}^{\max}}, & \text{if } q_{i,\mathrm{main}}^k < q_{i,\mathrm{main}}^{\max},\; q_{i,\mathrm{on}}^k < q_{i,\mathrm{on}}^{\max}, \\ -1, & \text{otherwise}, \end{cases} \tag{12}$$

where $R_i^k(s_i^k, c_i^k)$ is the immediate reward for agent $i$ in state $s_i^k$ when executing action $c_i^k$ at control step $k$. $q_{i,\mathrm{main}}^{\max}$ and $q_{i,\mathrm{on}}^{\max}$ are used to normalise the number of vehicles on the mainline and on-ramp, which guarantees that $R_i^k(s_i^k, c_i^k) \in [-1, 0]$.

(2) Motorway Section $i$ Is Not the Critical Section. Here, a new negative reward is introduced to maintain the system equity, that is, to make sure that the on-ramp queues and related travel times at different on-ramps are close to each other:

$$R_i^k(s_i^k, c_{i-1}^k, c_i^k) = \begin{cases} -\dfrac{q_{i,\mathrm{main}}^k + q_{i,\mathrm{on}}^k}{q_{i,\mathrm{main}}^{\max} + q_{i,\mathrm{on}}^{\max}} - \dfrac{\left| q_{i,\mathrm{on}}^k - q_{i-1,\mathrm{on}}^k \right|}{\max \left( q_{i,\mathrm{on}}^{\max}, q_{i-1,\mathrm{on}}^{\max} \right)}, & \text{if } q_{i,\mathrm{main}}^k < q_{i,\mathrm{main}}^{\max},\; q_{i,\mathrm{on}}^k < q_{i,\mathrm{on}}^{\max}, \\ -2, & \text{otherwise}. \end{cases} \tag{13}$$


For each agent $i$ and episode do
    $L \leftarrow \mathrm{CEIL}(\mathrm{IncidentDuration} / \Delta t)$
    IF $i$ is the critical section
        Initialise $R_i^0(s_i, c_i)$, $Q_i^0(s_i, c_i)$
    ELSE
        Initialise $R_i^0(s_i, c_{i-1}, c_i)$, $Q_i^0(s_i, c_{i-1}, c_i)$, $P_i^0(s_i, c_{i-1})$
    For each control step $k \in K$ do (Loop I)
        (i) get detected data from each cell $j$: $a_{j,\mathrm{main}}^k$, $d_{j,\mathrm{main}}^k$, $d_{j,\mathrm{off}}^k$, $a_{j,\mathrm{on}}^k$, $d_{j,\mathrm{on}}^k$
        (ii) get state $s_i^k$ through (8) and (9)
        (iii) get action $c_i^k$ by $\varepsilon$-greedy policy (10)
        (iv) get $q_{j,\mathrm{main}}^k$, $q_{j,\mathrm{on}}^k$ through (6) and do $q_{i,\mathrm{main}}^k \leftarrow \sum_{j=1}^{J} q_{j,\mathrm{main}}^k$, $q_{i,\mathrm{on}}^k \leftarrow q_{j,\mathrm{on}}^k$
        IF $i$ is the critical section
            update $R_i^k(s_i^k, c_i^k)$, $Q_i^k(s_i^k, c_i^k)$ through (2) and (12)
        ELSE
            update $R_i^k(s_i^k, c_{i-1}^k, c_i^k)$, $Q_i^k(s_i^k, c_{i-1}^k, c_i^k)$, $p(c_{i-1}^k \mid s_i^k)$ through (4) and (13)
        IF $s_i^k = s_{\mathrm{initial}}$ and $k + 1 \geq L$
            end the algorithm
        ELSE
            get $a_{i,\mathrm{main}}^{k,k+1}$, $a_{i,\mathrm{on}}^{k,k+1}$, $d_{j,\mathrm{off}}^{k,k+1}$ by (7) and do $l \leftarrow k$, $s_i^l \leftarrow s_i^k$, $q_{j,\mathrm{main}}^l \leftarrow q_{j,\mathrm{main}}^k$, $q_{j,\mathrm{on}}^l \leftarrow q_{j,\mathrm{on}}^k$, and start loop II
        For each planning step $l \in L$ do (Loop II)
            (i) generate flow rates for each cell $j$: $d_{j,\mathrm{main}}^l$, $d_{j,\mathrm{on}}^l$ through (5)
            (ii) get the state $s_i^l$
            (iii) get $q_{j,\mathrm{main}}^l$, $q_{j,\mathrm{on}}^l$ and do $q_{i,\mathrm{main}}^l \leftarrow \sum_{j=1}^{J} q_{j,\mathrm{main}}^l$, $q_{i,\mathrm{on}}^l \leftarrow q_{j,\mathrm{on}}^l$
            (iv) get action $c_i^l$ by $\varepsilon$-greedy policy
            IF $i$ is the critical section
                update $R_i^l(s_i^l, c_i^l)$, $Q_i^l(s_i^l, c_i^l)$
            ELSE
                update $R_i^l(s_i^l, c_{i-1}^l, c_i^l)$, $Q_i^l(s_i^l, c_{i-1}^l, c_i^l)$, $p(c_{i-1}^k \mid s_i^k)$
            IF ($l = k + 9$) or ($s_i^l = s_{\mathrm{initial}}$ and $l + 1 \geq L$)
                go back to loop I
            ELSE
                repeat loop II
        EndFor
    EndFor
EndFor

Algorithm 1: Algorithm for Dyna-MARL.

Compared to (12), a new term $\left| q_{i,\mathrm{on}}^k - q_{i-1,\mathrm{on}}^k \right| / \max \left( q_{i,\mathrm{on}}^{\max}, q_{i-1,\mathrm{on}}^{\max} \right)$ is added into (13), which is a penalty for the on-ramp queue difference between motorway sections $i$ and $i-1$. As two adjacent agents cooperate in this situation, $R_i^k(s_i^k, c_{i-1}^k, c_i^k)$ is related to the two control actions $c_{i-1}^k$ and $c_i^k$. $\max(\cdot, \cdot)$ returns the maximum value of two given parameters, which is used for normalisation.
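The two reward definitions (12) and (13) translate directly into code. In the Python sketch below, the function names are hypothetical and the vehicle counts are made up for illustration; the capacities 2880, 90, and 108 are borrowed from the values listed in Section 6.1.

```python
def critical_reward(q_main, q_on, q_max_main, q_max_on):
    """Reward (12) for the agent of the critical section, in [-1, 0]."""
    if q_main < q_max_main and q_on < q_max_on:
        return -(q_main + q_on) / (q_max_main + q_max_on)
    return -1.0


def normal_reward(q_main, q_on, q_on_prev, q_max_main, q_max_on, q_max_on_prev):
    """Reward (13) for a non-critical section, in [-2, 0].

    q_on_prev and q_max_on_prev refer to the on-ramp of section i-1.
    """
    if q_main < q_max_main and q_on < q_max_on:
        efficiency = (q_main + q_on) / (q_max_main + q_max_on)
        equity = abs(q_on - q_on_prev) / max(q_max_on, q_max_on_prev)
        return -efficiency - equity
    return -2.0


r_critical = critical_reward(q_main=1440, q_on=45, q_max_main=2880, q_max_on=90)
r_normal = normal_reward(q_main=1440, q_on=45, q_on_prev=18,
                         q_max_main=2880, q_max_on=90, q_max_on_prev=108)
```

In this example the section is half full, which costs 0.5, and the queue difference of 27 vehicles normalised by 108 adds an equity penalty of 0.25.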

5.4. Description of the Algorithm. Based on the Dyna-$Q$ architecture and the RL elements defined in the previous subsections, an algorithm named Dyna-MARL is developed and described in this subsection. The two main loops corresponding to direct RL and planning shown in Figure 1 are detailed in Dyna-MARL. Between two real control steps in loop I, 10 planning steps will be run in loop II. The pseudocode of Dyna-MARL is given in Algorithm 1.

An episode in Dyna-MARL represents a control cycle, which starts from the incident occurrence and terminates when the traffic state returns to the initial state $s_{\mathrm{initial}}$, that is, the traffic state before the incident occurrence. The incident duration is assumed to be known in advance.
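The stopping rules of the two loops in Algorithm 1 can be written as small predicates, which may make the control flow easier to follow. The function names, the 25-minute incident, and the 2-minute step length used below are assumptions for illustration only.

```python
import math


def planning_horizon(incident_duration_min, dt_min):
    """L in Algorithm 1: number of steps covered by the incident."""
    return math.ceil(incident_duration_min / dt_min)


def episode_finished(state, initial_state, k, horizon):
    """Loop I stops once the initial state is recovered after the incident."""
    return state == initial_state and k + 1 >= horizon


def planning_finished(state, initial_state, l, k, horizon, n_planning=10):
    """Loop II stops after n_planning steps (l = k, ..., k + n_planning - 1)
    or as soon as the episode end condition holds."""
    return l == k + n_planning - 1 or (state == initial_state and l + 1 >= horizon)


horizon = planning_horizon(incident_duration_min=25, dt_min=2)   # 13 steps
early = episode_finished(state=3, initial_state=3, k=5, horizon=horizon)
done = episode_finished(state=3, initial_state=3, k=12, horizon=horizon)
```

The check `l == k + n_planning - 1` corresponds to the condition $l = k + 9$ in Algorithm 1 when 10 planning steps are run between two real control steps.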

6. Case Study and Results

One of the metered motorway segments (southbound direction) of the M6 in the UK is chosen for the case study. This segment is between junction 21A (J21A) and junction 25 (J25), with an approximate length of 12.4 km (see Figure 4). Taking the noncontrolled (NC) situation as the baseline, we designed a series of experiments to compare the proposed Dyna-MARL algorithm with Isolated RL ($Q$-learning without coordination). The experiments and relevant results are described as follows.

6.1. Partitions of the Test Segment. The test motorway segment, with a three-lane mainline, three metered on-ramps, and five off-ramps, is simulated by AIMSUN [39], which is a microscopic traffic simulation package. According to the detector locations and road layout, the whole segment is divided into three sections. Each section contains a metered on-ramp. Motorway section 3 is divided into 4 cells, and motorway sections 1 and 2 are both divided into 3 cells. The partitions of each section can be seen in Figure 5. According to the section length, the maximum number of vehicles in each mainline section and on-ramp is as follows: $q^{\max}_{1,\mathrm{main}} = 1860$, $q^{\max}_{2,\mathrm{main}} = 2880$, $q^{\max}_{3,\mathrm{main}} = 2880$, $q^{\max}_{1,\mathrm{on}} = 108$, $q^{\max}_{2,\mathrm{on}} = 90$, and $q^{\max}_{3,\mathrm{on}} = 120$.

Figure 4: Test motorway segment of M6, between J21A and J25 (via J22, J23, and J24), with origins O, O1 (from A579), O2 (from A580), and O3 (from A49) and destinations D, D1 and D2 (to M62), D3 (to A579), D4 (to A580), and D5 (to A58).

Figure 5: Partitions of test segment (sections 1 to 3 between J21A and J25, showing section boundaries, cell boundaries, incident locations a, b, and c, road section lengths in km, the flow direction, origins O and O1 to O3, and destinations D and D1 to D5).

6.2. Real Data Source. Real detector data collected from 17 loop detectors located in the motorway segment (including both the mainline and on-/off-ramps) are used for the case study; these data can be extracted from the Traffic Information System (HATRIS) [40]. The traffic count data are averaged from April 2012 to March 2013 with 15-minute intervals. Only working day data (from Monday to Friday) are used, due to the dramatic reduction of traffic load at weekends. Some of the detector data collected from the mainline and three on-ramps are presented in Figure 6, from which we can see that two peak periods, the AM peak period (around 07:00:00–09:00:00) and the PM peak period (around 16:00:00–18:00:00), exist during the daily traffic operation.

In the test site, ramp metering only works at peak hours. Meanwhile, it is valuable to test the performance of the proposed algorithm in a high-demand situation: if it can work under a high traffic load, it should also be useful for common situations. Therefore, the AM peak period with heavy traffic load is considered for the case study. Specifically, we use the averaged traffic data during the AM peak period collected from TRADS to estimate the OD (origins and destinations) matrix for the simulation. A model proposed in [41] is adopted by AIMSUN to do the estimation, where the number of iterations is set as 1000 to get convergence. Table 1 shows the OD matrix estimated from real traffic data.

Figure 6: Real averaged traffic data (15-minute traffic count against time of day for mainline sections 1 to 3 and on-ramps 1 to 3).

Table 1: OD matrix estimated.

Origins/destinations    D      D5    D4    D3    D2     D1     Totals
O                       2089   375   686   728   1169   771    5818
O3                      875    65    212   193   117    46     1507
O2                      886    0     0     61    315    216    1477
O1                      824    0     0     0     292    226    1343
Totals                  4675   440   898   981   1893   1258   10146

Table 2: Parameters for ACTM.

Parameter                     Value
$d^{\max}_{\mathrm{main}}$    6300 veh/h
$v$                           107 km/h
$w$                           11.6 km/h
$\theta$                      0.5
$\eta$                        0.16
$\lambda_1$                   0.55
$\lambda_2$                   0.9
$\lambda_3$                   0.6

6.3. Incident Scenarios. Considering the difficulty of capturing real incident data, we simulate some incident scenarios in AIMSUN. To make each ramp meter work during the incident, the incident is located near the most downstream motorway section, that is, motorway section 1. Therefore, three incident scenarios, A, B, and C, are designed, corresponding to three different incident locations a, b, and c (as illustrated in Figure 5), respectively.

The simulation experiment lasts for one and a half hours, from 07:00:00 to 08:30:00, during the AM peak period. After 30 minutes of normal operation (for warm-up), the incident is triggered at 07:30:00 and lasts for 30 minutes. In the preliminary experiments designed in this paper, an incident with one lane blocked is considered. The parameters introduced here can also be adjusted for multiple lane-blockage situations. The incident extent is 50 meters, which is assumed to be constant during the incident.

Learning-related parameters are set as typical values [30]; that is, $\alpha$ is 0.2, $\gamma$ is 0.8, and $\varepsilon$ is 0.1. Other parameters related to ACTM are calibrated and summarised in Table 2. All the cells have the same $\theta$ and $\eta$.

6.4. Results. The comparison of Dyna-MARL, Isolated RL, and NC is conducted from three aspects: density evolution, some general indicators, and the system equity. The experimental results are described as follows.

(1) Density Evolution. We can see from Figure 7 that four dense areas exist during the traffic operation. Three of them, near on-ramp entrances (motorway length around 0.5 km, 5 km, and 10 km), are caused by heavy traffic loads from on-ramps. The dense area close to the segment end forms due to the incident.

In scenario A, the incident location is close to on-ramp 1 (O1). Without control, this incident leads to severe congestion, which blocks on-ramp 1 and propagates to motorway section 2 (around 9 km in Figure 7(a)). Under this scenario, Isolated RL cannot alleviate incident-induced congestion effectively (see Figure 7(b)). At the beginning of congestion formation, without coordination, only the nearest ramp controller reacts to the congestion. Because of the limited space on the on-ramp, one ramp controller is insufficient to dissolve this congestion, which still propagates to motorway section 2. Dyna-MARL, on the other hand, coordinates all three ramp controllers and makes full use of the storage space of three on-ramps to deal with incident-induced congestion. In this way, mainline congestion can be restricted to a smaller area and does not propagate to motorway section 2 (see Figure 7(c)).

For scenarios B and C, incidents are near the motorway end and far from on-ramp 1. Without blocking on-ramp 1, incidents do not lead to severe congestion. Under such circumstances, both Isolated RL and Dyna-MARL work well in easing congestion in the mainline. As shown in Figures 7(e)–7(i), compared with the NC situation, both Isolated RL and Dyna-MARL can restrict the congestion to a small range near the on-ramp entrances.

(2) General Indicators. In this comparison, some general indicators, including total travel time (should be reduced), total throughput (should be improved), and total CO2 emission (should be reduced), are used to show how the proposed system can benefit road users. These indicators are widely used in the transport community to test the performance of newly developed traffic control systems.

As shown in Figure 8(a), compared with the NC situation, both Isolated RL and Dyna-MARL can reduce the total travel time of road users in all three scenarios. Specifically, Isolated RL decreases total travel time by up to 6.2%, while Dyna-MARL achieves a maximum reduction of 12.2% (see Figure 8(d)). The comparison of total throughput is presented in Figure 8(b). Dyna-MARL can improve the total throughput by up to 2.3% (see Figure 8(d)), which outperforms Isolated RL in all three scenarios. In scenario B, Isolated RL even fails to improve the total throughput. For the comparison of total CO2 emission (shown in Figure 8(c)), both Isolated RL and Dyna-MARL achieve their best performance in scenario B, with a reduction of 4.7% and 4.6%, respectively. In scenarios A and C, Dyna-MARL has a much better performance than Isolated RL.

Through the above comparison, we can see that Dyna-MARL outperforms Isolated RL for almost all the scenarios and indicators.

Figure 7: Density profiles (motorway density in veh/km/lane against motorway length in km and absolute time in min) for (a) NC in scenario A, (b) Isolated RL in scenario A, (c) Dyna-MARL in scenario A, (d) NC in scenario B, (e) Isolated RL in scenario B, (f) Dyna-MARL in scenario B, (g) NC in scenario C, (h) Isolated RL in scenario C, and (i) Dyna-MARL in scenario C.

(3) System Equity. Although the general indicators presented in comparison (2) are effective for testing the performance of different systems, they cannot measure system equity, which is also an important aspect of the system performance. In this paper, we only consider the spatial equity issue, which is defined as a measurement of the equity of user delays on different on-ramps [42]. In this study, we assume the road users from all three on-ramps have the same importance. If all users from different on-ramps experience a similar travel time, the control system is defined as an equitable system. This term is used to measure the system equity; that is, a large queue difference leads to a highly inequitable system. In [43], the variance of travel time on different on-ramps is used as an indicator to measure system equity. Similar to [43], for the sake of comparison, the standard deviation is considered in our case. This indicator is defined as

$$
\mathrm{SD}(k) = \sqrt{\frac{\sum_{i=1}^{n} \left[\bar{t}^{k} - t^{k}_{i}\right]^{2}}{n}}, \tag{14}
$$

where $\mathrm{SD}(k)$ is the standard deviation of the travel time of different on-ramps at time step $k$, $t^{k}_{i}$ is the estimated total travel time of on-ramp $i$ at step $k$, and $\bar{t}^{k}$ is the averaged total travel time of $n$ on-ramps at step $k$.
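The equity indicator in (14) is a population standard deviation over the on-ramps at one time step, which the following short Python sketch computes directly. The function name `travel_time_sd` and the travel times in the example are hypothetical and are not results from the experiments.

```python
import math


def travel_time_sd(travel_times):
    """Standard deviation of on-ramp travel times at one step, as in (14)."""
    n = len(travel_times)
    if n == 0:
        raise ValueError("at least one on-ramp is required")
    mean_t = sum(travel_times) / n
    # Divide by n (not n - 1): (14) is the population form.
    return math.sqrt(sum((mean_t - t) ** 2 for t in travel_times) / n)


# Hypothetical total travel times (h) on three on-ramps at one time step.
balanced = travel_time_sd([4.0, 4.0, 4.0])
unbalanced = travel_time_sd([1.0, 4.0, 7.0])
print(balanced, unbalanced)
```

A value of zero means that users on all on-ramps experience the same travel time; larger values indicate a less equitable system.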

The results of the system equity comparison can be seen in Figure 9. For the NC situation, good equity can be maintained in scenarios B and C, because entering vehicles are not restricted (as shown in Figures 9(b) and 9(c)). However, when one of the on-ramp entrances is blocked by the congestion in scenario A, a long queue forms and leads to imbalance and resultant inequity for users on different on-ramps (see Figure 9(a)). For controlled cases, Isolated RL performs poorly in all scenarios. This is because the ramp controller near the congestion takes much more restrictive measures on the controlled traffic than the other controllers. Because of the coordination strategy, Dyna-MARL outperforms Isolated RL in maintaining system equity in all scenarios, especially during the incident (from 07:30:00 to 08:00:00).

Figure 8: Comparison of general measures for NC, Isolated RL, and Dyna-MARL in scenarios A, B, and C: (a) total travel time (h), (b) total throughput (veh), (c) total CO2 emission (kg, ×10^4), and (d) reduction from NC (%) in total throughput, total travel time, and total CO2 emission.

7. Conclusions and Future Work

A Dyna-Q based multiagent reinforcement learning method, referred to as Dyna-MARL, for motorway ramp control has been developed in this paper. Dyna-MARL is compared with Isolated RL (Q-learning without coordination) and the noncontrolled situation in a simulation environment. Real traffic data collected from a metered motorway segment in the UK are used to form the simulation.

Through a series of simulation-based experiments, we can conclude the following: (1) Isolated RL can improve the motorway performance in terms of increasing total throughput and reducing total travel time and CO2 emission, but this improvement is at the expense of poor system equity on different on-ramps; (2) with a suitable coordination strategy, much higher system equity can be achieved by Dyna-MARL; (3) in addition to the system equity, Dyna-MARL outperforms Isolated RL in almost all scenarios regarding all indicators, which means Dyna-MARL can deal with network-wide problems effectively.

Although the simulation tests have shown some positive results regarding the performance of Dyna-MARL, only a simplified incident scenario with fixed duration is considered in the current work. In practice, incident duration is highly variable and affected by a number of factors, such as weather conditions, road conditions, and the arrival time of the incident management team. Therefore, incident duration should be treated as an uncertainty, which will be investigated in our future work.

Figure 9: Standard deviation (h) of travel time against time of day (07:00:00 to 08:30:00) for NC, Isolated RL, and Dyna-MARL: (a) scenario A, (b) scenario B, and (c) scenario C.


Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This paper is supported by the China Scholarship Council and the University of Leeds (CSC-University of Leeds scholarship) and partially supported by the National Natural Science Foundation of China (Grant nos. 91420203 and 61271376). The authors would like to thank the institutions that support this study.

References

[1] M. Papageorgiou and A. Kotsialos, “Freeway ramp metering: an overview,” IEEE Transactions on Intelligent Transportation Systems, vol. 3, no. 4, pp. 271–281, 2002.

[2] A. Skabardonis, P. Varaiya, and K. F. Petty, “Measuring recurrent and nonrecurrent traffic congestion,” Transportation Research Record, vol. 1856, pp. 118–124, 2003.

[3] M. Papageorgiou, H. Hadj-Salem, and J.-M. Blosseville, “ALINEA: a local feedback control law for on-ramp metering,” Journal of the Transportation Research Board, vol. 1320, pp. 58–64, 1991.

[4] E. Smaragdis and M. Papageorgiou, “Series of new local ramp metering strategies,” Transportation Research Record, vol. 1856, pp. 74–86, 2003.

[5] L. N. Jacobson, K. C. Henry, and O. Mehyar, “Real-time metering algorithm for centralized control,” Transportation Research Record, vol. 1232, pp. 17–26, 1989.

[6] G. Paesani, J. Kerr, P. Perovich, and F. Khosravi, “System wide adaptive ramp metering (SWARM),” in Proceedings of the 7th ITS America Annual Meeting and Exposition: Merging the Transportation and Communications Revolutions, Washington, DC, USA, June 1997.

[7] R. Lau, Ramp Metering by Zone - The Minnesota Algorithm, Minnesota Department of Transportation, 1997.

[8] H. M. Zhang and W. W. Recker, “On optimal freeway ramp control policies for congested traffic corridors,” Transportation Research Part B: Methodological, vol. 33, no. 6, pp. 417–436, 1999.

[9] A. Kotsialos, M. Papageorgiou, and F. Middelham, “Optimal coordinated ramp metering with advanced motorway optimal control,” Transportation Research Record, no. 1748, pp. 55–65, 2001.

[10] A. Hegyi, B. De Schutter, and H. Hellendoorn, “Model predictive control for optimal coordination of ramp metering and variable speed limits,” Transportation Research Part C: Emerging Technologies, vol. 13, no. 3, pp. 185–209, 2005.

[11] G. Gomes and R. Horowitz, “Optimal freeway ramp metering using the asymmetric cell transmission model,” Transportation Research Part C: Emerging Technologies, vol. 14, no. 4, pp. 244–262, 2006.

[12] A. H. F. Chow and Y. Li, “Robust optimization of dynamic motorway traffic via ramp metering,” IEEE Transactions on Intelligent Transportation Systems, vol. 15, no. 3, pp. 1374–1380, 2014.

[13] R. W. Hall, “Non-recurrent congestion: how big is the problem? Are traveler information systems the solution?” Transportation Research Part C, vol. 1, no. 1, pp. 89–103, 1993.

[14] P. Prevedouros, B. Halkias, K. Papandreou, and P. Kopelias, “Freeway incidents in the United States, United Kingdom, and Attica Tollway, Greece: characteristics, available capacity, and models,” Transportation Research Record, vol. 2047, pp. 57–65, 2008.

[15] T. L. Greenlee and H. J. Payne, “Freeway ramp metering strategies for responding to incidents,” in Proceedings of the IEEE Conference on Decision and Control including the 16th Symposium on Adaptive Processes and a Special Symposium on Fuzzy Set Theory and Applications, pp. 987–992, New Orleans, LA, USA, December 1977.

[16] M. H. Wang, Optimal ramp metering policies for nonrecurring congestion with uncertain incident duration [Ph.D. thesis], Purdue University, West Lafayette, Ind, USA, 1994.

[17] J.-B. Sheu, “Stochastic modeling of the dynamics of incident-induced lane traffic states for incident-responsive local ramp control,” Physica A: Statistical Mechanics and its Applications, vol. 386, no. 1, pp. 365–380, 2007.

[18] J.-B. Sheu and M.-S. Chang, “Stochastic optimal-control approach to automatic incident-responsive coordinated ramp control,” IEEE Transactions on Intelligent Transportation Systems, vol. 8, no. 2, pp. 359–367, 2007.

[19] C. Jacob and B. Abdulhai, “Machine learning for multi-jurisdictional optimal traffic corridor control,” Transportation Research Part A: Policy and Practice, vol. 44, no. 2, pp. 53–64, 2010.

[20] M. Davarynejad, A. Hegyi, J. Vrancken, and J. van den Berg, “Motorway ramp-metering control with queuing consideration using Q-learning,” in Proceedings of the 14th International IEEE Conference on Intelligent Transportation Systems (ITSC '11), pp. 1652–1658, IEEE, Washington, DC, USA, October 2011.

[21] K. Rezaee, B. Abdulhai, and H. Abdelgawad, “Application of reinforcement learning with continuous state space to ramp metering in real-world conditions,” in Proceedings of the 15th International IEEE Conference on Intelligent Transportation Systems (ITSC '12), pp. 1590–1595, IEEE, Anchorage, Alaska, USA, September 2012.

[22] C. Jacob and B. Abdulhai, “Automated adaptive traffic corridor control using reinforcement learning: approach and case studies,” Transportation Research Record, vol. 1959, pp. 1–8, 2006.

[23] K. Rezaee, B. Abdulhai, and H. Abdelgawad, “Self-learning adaptive ramp metering: analysis of design parameters on a test case in Toronto, Canada,” Transportation Research Record, vol. 2396, pp. 10–18, 2013.

[24] X.-J. Wang, X.-M. Xi, and G.-F. Gao, “Reinforcement learning ramp metering without complete information,” Journal of Control Science and Engineering, vol. 2012, Article ID 208456, 8 pages, 2012.

[25] A. Fares and W. Gomaa, “Multi-agent reinforcement learning control for ramp metering,” in Progress in Systems Engineering, vol. 330 of Advances in Intelligent Systems and Computing, pp. 167–173, Springer, Basel, Switzerland, 2015.

[26] K. Veljanovska, K. M. Bombol, and T. Maher, “Reinforcement learning technique in multiple motorway access control strategy design,” PROMET-Traffic & Transportation, vol. 22, no. 2, pp. 117–123, 2010.

[27] C. Lu, H. Chen, and S. Grant-Muller, “An indirect reinforcement learning approach for ramp control under incident-induced congestion,” in Proceedings of the 16th International IEEE Conference on Intelligent Transportation Systems (ITSC '13), pp. 979–984, IEEE, The Hague, The Netherlands, October 2013.


[28] L. Busoniu, R. Babuska, and B. De Schutter, “A comprehensive survey of multiagent reinforcement learning,” IEEE Transactions on Systems, Man, and Cybernetics Part C: Applications and Reviews, vol. 38, no. 2, pp. 156–172, 2008.

[29] C. Lu, H. Chen, and S. Grant-Muller, “Indirect reinforcement learning for incident-responsive ramp control,” Procedia - Social and Behavioral Sciences, vol. 111, pp. 1112–1122, 2014.

[30] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, 1998.

[31] C. C. H. Watkins and P. Dayan, “Q-learning,” Machine Learning, vol. 8, no. 3-4, pp. 279–292, 1992.

[32] J. R. Kok and N. Vlassis, “Collaborative multiagent reinforcement learning by payoff propagation,” Journal of Machine Learning Research, vol. 7, pp. 1789–1828, 2006.

[33] C. Guestrin, M. G. Lagoudakis, and R. Parr, “Coordinated reinforcement learning,” in Proceedings of the 19th International Conference on Machine Learning, pp. 227–234, Sydney, Australia, July 2002.

[34] S. El-Tantawy, B. Abdulhai, and H. Abdelgawad, “Multiagent reinforcement learning for integrated network of adaptive traffic signal controllers (MARLIN-ATSC): methodology and large-scale application on downtown Toronto,” IEEE Transactions on Intelligent Transportation Systems, vol. 14, no. 3, pp. 1140–1150, 2013.

[35] L. P. Kaelbling, M. L. Littman, and A. W. Moore, “Reinforcement learning: a survey,” Journal of Artificial Intelligence Research, vol. 4, pp. 237–285, 1996.

[36] C. F. Daganzo, “The cell transmission model: a dynamic representation of highway traffic consistent with the hydrodynamic theory,” Transportation Research Part B: Methodological, vol. 28, no. 4, pp. 269–287, 1994.

[37] J. Haddad, M. Ramezani, and N. Geroliminis, “Cooperative traffic control of a mixed network with two urban regions and a freeway,” Transportation Research Part B: Methodological, vol. 54, pp. 17–36, 2013.

[38] H. Mongeot and J.-B. Lesort, “Analytical expressions of incident-induced flow dynamics perturbations: using macroscopic theory and extension of Lighthill-Whitham theory,” Transportation Research Record, vol. 1710, pp. 58–68, 2000.

[39] Transport Simulation Systems, Aimsun User's Manual 6.1, TTS, Barcelona, Spain, 2010.

[40] Highways England, “Hatris Homepage,” 2013, https://www.hatris.co.uk/.

[41] E. Cascetta, “Estimation of trip matrices from traffic counts and survey data: a generalized least squares estimator,” Transportation Research Part B, vol. 18, no. 4-5, pp. 289–299, 1984.

[42] L. Zhang and D. Levinson, “Balancing efficiency and equity of ramp meters,” Journal of Transportation Engineering, vol. 131, no. 6, pp. 477–481, 2005.

[43] A. Kotsialos and M. Papageorgiou, “Efficiency and equity properties of freeway network-wide ramp metering with AMOC,” Transportation Research Part C: Emerging Technologies, vol. 12, no. 6, pp. 401–420, 2004.


Figure 1: Dyna-Q architecture (value/policy, experience, and model components, linked by acting, direct RL, model learning, and planning).

Figure 2: System architecture. Agent coordination: agents $i$, $i + 1$, and $i + 2$ control motorway sections $i$, $i + 1$, and $i + 2$, which are divided into cells $j$ to $j + 8$ with mainline, on-ramp, and off-ramp arrival ($a$) and departure ($d$) rates. Agent architecture (agent $i + 1$): experience consists of (i) real traffic arrival and departure rates and (ii) information from agent $i$ ($c^{k}_{i}$, $q^{k}_{i,\mathrm{on}}$, $d^{k}_{j,\mathrm{main}}$, and $d^{k}_{j,\mathrm{off}}$); the model consists of (i) ACTM and estimated traffic arrival and departure rates of sections $i$ and $i + 1$ and (ii) the action probability $p(c^{k}_{i} \mid s^{k}_{i+1})$; the value/policy component holds the Q value $Q^{k+1}_{i+1}(s^{k}_{i+1}, c^{k}_{i}, c^{k}_{i+1})$. The components are linked by acting, direct RL, model learning, and planning.

Dyna-Q can learn faster than Q-learning in many situations [30].

Although a model is maintained in the Dyna-Q architecture, the whole system is different from model-based control methods such as model predictive control (MPC). The model in the Dyna-Q architecture is a complementary component, which is used to speed up the learning process and simplify the coordination of agents. The optimal control actions are learnt from both real and simulated experience. Without models, the Dyna-Q architecture is equivalent to the Q-learning technique and can still work as a model-free system. MPC, on the other hand, depends on the model, which means it cannot work without one. Therefore, Dyna-Q can be considered as a combination of model-free and model-based methods [27].

3.2. System Architecture. Each agent in the motorway control system is designed on the basis of the Dyna-Q architecture and controls one prespecified motorway section.

A simplified motorway segment is shown in Figure 2 for analysis. This segment contains three motorway sections ($i$, $i + 1$, $i + 2$) with detectors located at boundaries. Each motorway section is divided into a number of cells ($j$, $j + 1$, ..., $j + 8$) according to its layout and geometric features. Generally, three kinds of cells exist in the motorway: on-ramp cells that are linked with on-ramps ($j + 2$, $j + 5$, $j + 8$), off-ramp cells linked with off-ramps ($j$, $j + 3$, $j + 6$), and normal cells ($j + 1$, $j + 4$, $j + 7$). In this paper, we define that each motorway section can have at most one on-ramp cell.

The typical Dyna-Q architecture presented in Figure 2 is detailed for each agent here. Take agent $i + 1$ for example: experience consists of traffic arrival and departure rates observed from the detectors of motorway section $i + 1$, as well as the information received from agent $i$, which is applied to improve the models. In the model component, two models are maintained. An asymmetric cell transmission model (ACTM) with estimated traffic arrival and departure rates is used to simulate the traffic flow dynamics in the relevant motorway sections. A probability model of the action selection of agent $i$ at the current state is updated for the further planning process.

Figure 3: Fundamental diagram during the incident: (a) incident extent in the critical section; (b) flow against density under normal and incident conditions, showing the maximum departure flows $d^{\max}_{j,\mathrm{main}}$ and $d^{\mathrm{In},\max}_{j,\mathrm{main}}$, the free flow speeds $v_{j}$ and $v^{\mathrm{In}}_{j}$, the congestion wave speeds $w_{j}$ and $w^{\mathrm{In}}_{j}$, the critical densities $\rho_{j,C}$ and $\rho^{\mathrm{In}}_{j,C}$, and the jam densities $\rho_{j,J}$ and $\rho^{\mathrm{In}}_{j,J}$.

To reduce the complexity of MARL, as in many real applications, some conventions are used to restrict the action selection of an agent [28]. Specifically, in our design, each agent only communicates with its spatial neighbours. For instance, agent $i + 1$ receives the control action and traffic information from agent $i$ and sends its own information to agent $i + 2$. For the case shown in Figure 2, we assume motorway section $i$ is the critical section, where an incident occurs. In this situation, agent $i$ plays a more important role than other agents in dealing with incidents. Agent $i$ can be considered as the chief controller, which makes decisions according to its own knowledge about the traffic and incident situations. Other agents should regulate their control policies based on the reaction of agent $i$.

Therefore, two $Q$ values are defined for two kinds of agents. If motorway section $i$ is the critical section, the $Q$ value of agent $i$ is only related to its own state and action space and can be updated by the same equation denoted by (2).

If motorway section $i$ is a normal section without incidents, the $Q$ value of agent $i$ can be calculated by

$$
\begin{aligned}
Q^{k+1}_{i}\left(s^{k}_{i}, c^{k}_{i-1}, c^{k}_{i}\right) &= Q^{k}_{i}\left(s^{k}_{i}, c^{k}_{i-1}, c^{k}_{i}\right) + \alpha \Big[ R^{k}_{i}\left(s^{k}_{i}, c^{k}_{i-1}, c^{k}_{i}\right) \\
&\quad + \gamma \max_{c^{k+1}_{i}} \sum_{c^{k+1}_{i-1}} p\left(c^{k+1}_{i-1} \mid s^{k+1}_{i}\right) \cdot Q^{k}_{i}\left(s^{k+1}_{i}, c^{k+1}_{i-1}, c^{k+1}_{i}\right) - Q^{k}_{i}\left(s^{k}_{i}, c^{k}_{i-1}, c^{k}_{i}\right) \Big], \\
p\left(c^{k+1}_{i-1} \mid s^{k+1}_{i}\right) &= \frac{\operatorname{count}\left(s^{k+1}_{i}, c^{k+1}_{i-1}\right)}{\sum_{c_{i-1} \in C_{i-1}} \operatorname{count}\left(s^{k+1}_{i}, c_{i-1}\right)},
\end{aligned}
\tag{4}
$$

where $R^{k}_{i}(s^{k}_{i}, c^{k}_{i-1}, c^{k}_{i})$ is the immediate reward obtained by agent $i$ at time step $k$ when $c^{k}_{i-1}$ and $c^{k}_{i}$ are the actions executed by agents $i - 1$ and $i$. Similarly, $Q^{k+1}_{i}(s^{k}_{i}, c^{k}_{i-1}, c^{k}_{i})$ and $Q^{k}_{i}(s^{k}_{i}, c^{k}_{i-1}, c^{k}_{i})$ are the $Q$ values for agent $i$ at step $k + 1$ and step $k$, respectively. $C_{i-1}$ is the action set of agent $i - 1$. $\operatorname{count}(s^{k+1}_{i}, c^{k+1}_{i-1})$ returns the number of visits for the state-action pair $(s^{k+1}_{i}, c^{k+1}_{i-1})$. Thus, $p(c^{k+1}_{i-1} \mid s^{k+1}_{i})$ is the probability of agent $i - 1$ selecting action $c^{k+1}_{i-1}$ at state $s^{k+1}_{i}$. Models and the related symbols shown in Figure 2 will be specified in the following section.
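The coordinated update in (4) can be sketched in Python as follows. This is an illustrative tabular version under stated assumptions, not the implementation used in the experiments: the two-element action sets, the state labels, the reward values, the point at which the visit counter is incremented, and the uniform fallback used before any visit has been recorded are all hypothetical choices.

```python
from collections import defaultdict

ALPHA, GAMMA = 0.2, 0.8          # assumed learning rate and discount factor
OWN_ACTIONS = (0, 1)             # hypothetical action set of agent i
NEIGHBOUR_ACTIONS = (0, 1)       # hypothetical action set C_{i-1} of agent i-1

q = defaultdict(float)           # Q_i(s, c_{i-1}, c_i)
visits = defaultdict(int)        # count(s, c_{i-1})


def neighbour_prob(state, c_prev):
    """Empirical p(c_{i-1} | s): visit count normalised over C_{i-1}."""
    total = sum(visits[(state, c)] for c in NEIGHBOUR_ACTIONS)
    if total == 0:
        # Assumption: uniform probabilities before any visit is recorded.
        return 1.0 / len(NEIGHBOUR_ACTIONS)
    return visits[(state, c_prev)] / total


def coordinated_update(s, c_prev, c_own, reward, s_next):
    """One update of Q_i following the structure of (4)."""
    visits[(s, c_prev)] += 1     # assumption: count the observed pair here
    # Expected Q over the neighbour's next action, maximised over own action.
    expected_best = max(
        sum(neighbour_prob(s_next, cp) * q[(s_next, cp, c)]
            for cp in NEIGHBOUR_ACTIONS)
        for c in OWN_ACTIONS
    )
    key = (s, c_prev, c_own)
    q[key] += ALPHA * (reward + GAMMA * expected_best - q[key])
    return q[key]


first = coordinated_update("congested", c_prev=1, c_own=0,
                           reward=-1.0, s_next="free")
second = coordinated_update("free", c_prev=0, c_own=1,
                            reward=0.5, s_next="congested")
print(first, second)
```

Because agent $i$ cannot know the next action of agent $i - 1$ in advance, the sketch weights each candidate neighbour action by its empirical frequency instead of maximising over the joint action space.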

4. Modified Asymmetric Cell Transmission Model

A first-order macroscopic traffic flow model named the asymmetric cell transmission model (ACTM) is applied as one of the models in the Dyna-Q architecture. This model is derived from the widely used cell transmission model (CTM) [36] and has been used for ramp control problems [11, 37]. In this paper, we modify ACTM to incorporate the traffic dynamics under incident conditions.

4.1. Traffic Dynamics during the Incident. As shown in Figure 3(a), when an incident happens in the critical section, one or more lanes of the motorway will be blocked according to the incident extent. Because of the lane blockage, the incident may reduce the normal road capacity and spatial storage space, which produces a new relationship between traffic flow and road density, that is, the fundamental diagram presented in Figure 3(b). As suggested by [38], additional parameters can be used to regulate the fundamental diagram for incident situations. We introduce three parameters ($\lambda_1, \lambda_2, \lambda_3 \in [0, 1]$) to reflect these new dynamics. These three parameters are defined as $\lambda_1 = v_j^{\mathrm{In}} / v_j$, $\lambda_2 = w_j^{\mathrm{In}} / w_j$, and $\lambda_3 = d_{j,\mathrm{main}}^{\mathrm{In},\max} / d_{j,\mathrm{main}}^{\max}$. $v_j$ and $w_j$ are the free flow speed and congestion wave speed of cell $j$; $d_{j,\mathrm{main}}^{\max}$ is the maximum departure flow of cell $j$; $v_j^{\mathrm{In}}$, $w_j^{\mathrm{In}}$, and $d_{j,\mathrm{main}}^{\mathrm{In},\max}$ are these three variables during the incident. $\rho_{j,C}$ and $\rho_{j,C}^{\mathrm{In}}$ are the critical densities for normal and incident situations; $\rho_{j,J}$ and $\rho_{j,J}^{\mathrm{In}}$ are the jam densities for normal and incident situations.
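As a rough illustration of how the three scaling parameters reshape the fundamental diagram, the sketch below assumes the triangular diagram commonly used with CTM-type models, in which the critical density is the capacity divided by the free flow speed and the congested branch falls to zero at the jam density with slope equal to the wave speed. The names `FundamentalDiagram` and `incident_diagram` and the numbers in the usage note are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class FundamentalDiagram:
    free_flow_speed: float  # v_j, km/h
    wave_speed: float       # w_j, km/h
    max_departure: float    # maximum departure flow of the cell, veh/h

    @property
    def critical_density(self):
        # Triangular shape (assumed): the free-flow branch reaches capacity here.
        return self.max_departure / self.free_flow_speed

    @property
    def jam_density(self):
        # Triangular shape (assumed): the congested branch reaches zero flow here.
        return self.critical_density + self.max_departure / self.wave_speed


def incident_diagram(normal, lam1, lam2, lam3):
    """Scale speed, wave speed, and capacity by lambda_1, lambda_2, lambda_3."""
    for lam in (lam1, lam2, lam3):
        if not 0.0 <= lam <= 1.0:
            raise ValueError("lambda parameters must lie in [0, 1]")
    return FundamentalDiagram(
        free_flow_speed=lam1 * normal.free_flow_speed,
        wave_speed=lam2 * normal.wave_speed,
        max_departure=lam3 * normal.max_departure,
    )
```

For example, a cell with a free flow speed of 100 km/h, a wave speed of 20 km/h, and a capacity of 6000 veh/h has a jam density of 360 veh/km under this assumption; halving the speed and the capacity while keeping the wave speed reduces the jam density to 210 veh/km.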


4.2. Modified ACTM. Given three incident-related parameters, the traffic dynamics in each cell can be derived from the fundamental diagram illustrated in Figure 3(b) and represented by the following equations.

Departure rates of the mainline and on-ramp:

$$
\begin{aligned}
d_{j,\mathrm{main}}^{k} &= \min\left\{ \lambda_1 \cdot \frac{v_j}{l_j} \cdot \left(q_{j,\mathrm{main}}^{k} + \theta_j \cdot d_{j,\mathrm{on}}^{k} \cdot \Delta t - d_{j,\mathrm{off}}^{k} \cdot \Delta t\right),\; \lambda_2 \cdot \frac{w_{j-1}}{l_j} \cdot \left(q_{j-1,\mathrm{main}}^{\max} - q_{j-1,\mathrm{main}}^{k} - \theta_j \cdot d_{j-1,\mathrm{on}}^{k} \cdot \Delta t\right),\; \lambda_3 \cdot d_{j,\mathrm{main}}^{k,\max} \right\}, \\
d_{j,\mathrm{on}}^{k} &=
\begin{cases}
\min\left\{ \dfrac{q_{j,\mathrm{main}}^{k} + a_{j,\mathrm{on}}^{k} \cdot \Delta t}{\Delta t},\; \dfrac{\eta_j \cdot \left(q_{j,\mathrm{main}}^{\max} - q_{j,\mathrm{main}}^{k}\right)}{\Delta t},\; \dfrac{c_i^{k}}{\Delta t} \right\} & \text{if } j \text{ is metered on-ramp cell}, \\
\min\left\{ \dfrac{q_{j,\mathrm{main}}^{k} + a_{j,\mathrm{on}}^{k} \cdot \Delta t}{\Delta t},\; \dfrac{\eta_j \cdot \left(q_{j,\mathrm{main}}^{\max} - q_{j,\mathrm{main}}^{k}\right)}{\Delta t} \right\} & \text{if } j \text{ is unmetered on-ramp cell}.
\end{cases}
\end{aligned}
\tag{5}
$$

Conservation of the mainline and on-ramp:

$$
\begin{aligned}
q_{j,\mathrm{main}}^{k+1} &= q_{j,\mathrm{main}}^{k} + \Delta t \cdot \left(a_{j,\mathrm{main}}^{k} + d_{j,\mathrm{on}}^{k} - d_{j,\mathrm{main}}^{k} - d_{j,\mathrm{off}}^{k}\right), \\
q_{j,\mathrm{on}}^{k+1} &= q_{j,\mathrm{on}}^{k} + \Delta t \cdot \left(a_{j,\mathrm{on}}^{k} - d_{j,\mathrm{on}}^{k}\right),
\end{aligned}
\tag{6}
$$

where $a_{j,\mathrm{main}}^{k}$ and $d_{j,\mathrm{main}}^{k}$ are the mainline arrival and departure rates for cell $j$ at step $k$; $a_{j,\mathrm{on}}^{k}$ and $d_{j,\mathrm{on}}^{k}$ are the on-ramp arrival and departure rates in cell $j$ at step $k$; $d_{j,\mathrm{off}}^{k}$ is the off-ramp departure rate for cell $j$ at step $k$ (if cell $j$ is not an off-ramp cell, $d_{j,\mathrm{off}}^{k} = 0$). $q_{j,\mathrm{main}}^{k}$ represents the number of vehicles on the mainline of cell $j$ at step $k$; $q_{j,\mathrm{main}}^{\max}$ is the maximum of this value, limited by the mainline space of cell $j$. Similarly, $q_{j,\mathrm{on}}^{k}$ and $q_{j,\mathrm{on}}^{\max}$ denote the current (at step $k$) and maximum number of vehicles in the on-ramp of cell $j$, respectively. $\Delta t$ (min) is the time duration between two consecutive time steps. $c_i^{k}$ is the metering rate for the on-ramp cell of the $i$th motorway section at step $k$. $\eta_j \in [0, 1]$ is the flow allocation parameter of cell $j$. $\theta_j \in [0, 1]$ is the flow blending parameter of traffic flow from the on-ramp to the mainline of cell $j$. The unit of all the arrival and departure rates is modified to veh/min in this study.

For motorway section $i$ with $J$ cells, the number of vehicles in the mainline can be calculated by $q_{i,\mathrm{main}}^{k} = \sum_{j=1}^{J} q_{j,\mathrm{main}}^{k}$, while the number of vehicles in the on-ramp of motorway section $i$ is given by $q_{i,\mathrm{on}}^{k} = q_{j,\mathrm{on}}^{k}$. In this way, the maximum numbers of vehicles in the mainline and on-ramp of motorway section $i$ are given by $q_{i,\mathrm{main}}^{\max} = \sum_{j=1}^{J} q_{j,\mathrm{main}}^{\max}$ and $q_{i,\mathrm{on}}^{\max} = q_{j,\mathrm{on}}^{\max}$.

4.3. Estimation of Arrival and Departure Rates. Arrival rates of the boundary cells in each motorway section (such as $j+2$, $j+5$, and $j+8$) and all the on-ramps, as well as the departure rates of off-ramps, are inputs of the ACTM for each planning step between two real control steps. Considering the short time of the planning process (10 steps), we assume these rates remain stable during the planning and are estimated directly from the recent flow data collected from detectors. The method described by Wang [16] is used here to do the estimation, which simply averages the most recently observed data to get the predicted flow rates. In our model, we use the flow data collected from the last $N$ time steps ($N = 5$). Therefore, these three rates can be calculated by

$$
\begin{aligned}
a_{i,\mathrm{main}}^{k,k+1} = a_{j,\mathrm{main}}^{k,k+1} &= \frac{\sum_{n=0}^{N-1} a_{j,\mathrm{main}}^{k-n}}{N} && \text{if } j \text{ is the boundary cell}, \\
a_{i,\mathrm{on}}^{k,k+1} = a_{j,\mathrm{on}}^{k,k+1} &= \frac{\sum_{n=0}^{N-1} a_{j,\mathrm{on}}^{k-n}}{N} && \text{if } j \text{ is the on-ramp cell}, \\
d_{j,\mathrm{off}}^{k,k+1} &= \frac{\sum_{n=0}^{N-1} d_{j,\mathrm{off}}^{k-n}}{N} && \text{if } j \text{ is the off-ramp cell},
\end{aligned}
\tag{7}
$$

where $a_{j,\mathrm{main}}^{k,k+1}$ and $a_{j,\mathrm{on}}^{k,k+1}$ are the estimated arrival rates of the mainline and on-ramp of cell $j$ for the planning steps between real steps $k$ and $k+1$; $d_{j,\mathrm{off}}^{k,k+1}$ is the estimated off-ramp departure rate of cell $j$. If cell $j$ is the boundary cell of motorway section $i$, the arrival or departure rate of this cell is also the arrival or departure rate of motorway section $i$.
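The averaging in (7) amounts to a moving average over the last $N$ detector readings. A minimal sketch follows; the class name `RateEstimator` is hypothetical, and averaging over fewer than $N$ readings at the start of a run is an assumption, since the paper does not describe the start-up behaviour.

```python
from collections import deque


class RateEstimator:
    """Moving average of the most recent detector readings, as in (7)."""

    def __init__(self, window=5):
        # A bounded deque drops the oldest reading once `window` readings are stored.
        self.history = deque(maxlen=window)

    def add(self, rate):
        self.history.append(rate)

    def estimate(self):
        if not self.history:
            raise ValueError("no observations yet")
        # Assumption: with fewer than `window` readings, average what is available.
        return sum(self.history) / len(self.history)
```

One estimator instance would be kept per boundary cell, on-ramp, and off-ramp, and its estimate held fixed during the planning steps that follow each real control step.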

5. Definition of RL Elements

In addition to the architecture and models defined in Section 3, three basic elements (environment state, control action, and reward function) should be specified to form an RL problem. This section details these three elements and the relevant algorithm.

5.1. Environment State. Environment states of a motorway section are composed of mainline states and on-ramp states. The same method mentioned in [27, 29] is used here to obtain the state space. Generally, for the mainline of motorway section $i$, the number of vehicles ranges from 0 to the maximum number $q_{i,\mathrm{main}}^{\max}$, which is uniformly divided into $n_i$ intervals. Each interval represents a state of the mainline. Therefore, each mainline section can be represented by a state set $S_{i,\mathrm{main}}$ with $n_i$ states. Similarly, on-ramp traffic is represented by a state set $S_{i,\mathrm{on}}$ with $m_i$ states according to the maximum number of vehicles $q_{i,\mathrm{on}}^{\max}$. $n_i$ and $m_i$ should be adjusted for different motorway sections according to the section length. In this way, if motorway section $i$ is the critical section, the external traffic environment is represented by

$$S_i = S_{i,\mathrm{main}} \times S_{i,\mathrm{on}}, \quad s_i^k \in S_i, \tag{8}$$

which contains $n_i \cdot m_i$ states. At each time step, a state $s_i^k$ will be selected from $S_i$ as the environment state. If motorway section $i$ is a normal section, the state sets of its neighbour agent should be incorporated. Thus, the traffic state is represented by

$$S_i = S_{i,\mathrm{main}} \times S_{i,\mathrm{on}} \times S_{i-1,\mathrm{main}} \times S_{i-1,\mathrm{on}}, \quad s_i^k \in S_i, \tag{9}$$

which contains $n_i \cdot m_i \cdot n_{i-1} \cdot m_{i-1}$ states.
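The uniform partition behind (8) and (9) can be sketched as follows. The function names are hypothetical, and the interval counts used in the usage note are illustrative values, not the ones used in the experiments.

```python
def interval_index(vehicles, max_vehicles, n_intervals):
    """0-based index of the uniform interval of [0, max_vehicles] containing `vehicles`."""
    clipped = min(max(vehicles, 0), max_vehicles)
    index = int(clipped * n_intervals / max_vehicles)
    # The maximum itself is assigned to the last interval.
    return min(index, n_intervals - 1)


def critical_section_state(q_main, q_on, q_main_max, q_on_max, n, m):
    """Single index over S_main x S_on, as in (8); there are n * m such states."""
    return interval_index(q_main, q_main_max, n) * m + interval_index(q_on, q_on_max, m)


def normal_section_state(own_state, neighbour_state, n_neighbour_states):
    """Single index over the product in (9), built from two section-level indices."""
    return own_state * n_neighbour_states + neighbour_state
```

With 10 mainline intervals and 6 on-ramp intervals, a section holding 930 of at most 1860 mainline vehicles and 54 of at most 108 on-ramp vehicles maps to state index 33 out of 60.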

5.2. Control Action. In a ramp control problem, the aim of the control action is to regulate the number of vehicles entering the mainline in each control step. Similar to [29], we adopt flow control as the control action, which can be represented by an action set $C = \{4, 6, 8, 10, 12, 14, 16, 18, 20\}$ with 9 flow rates between the minimum (4 veh/min) and maximum (20 veh/min) values.

Exploitation and exploration are two basic behaviours of the RL agent. Exploitation means the agent takes the control action that can get the most rewards from the previous experience. Exploration, instead, means the agent tries new actions with less rewards. In order to balance these two behaviours, we use the $\varepsilon$-greedy policy to select control actions [30]. Specifically, this policy takes a random action with probability $\varepsilon$ and chooses the greedy action (with the maximum $Q$ value) with probability $1 - \varepsilon$ for each control step.

The action selection probability can be formally expressed as

$$
p\left(c_i^k \mid s_i^k\right) =
\begin{cases}
1 - \varepsilon & \text{if } c_i^k = \arg\max_{c_i^k} \left(Q^{k-1}\left(s_i^k, c_i^k\right)\right), \\
\varepsilon & \text{otherwise}.
\end{cases}
\tag{10}
$$
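A common way to implement the $\varepsilon$-greedy rule is sketched below. The function name and the dictionary representation of the $Q$ values are assumptions. In this standard implementation, the random draw is uniform over all actions, so it may also return the greedy action.

```python
import random

ACTIONS = (4, 6, 8, 10, 12, 14, 16, 18, 20)  # veh/min, the action set C


def epsilon_greedy(q_values, epsilon=0.1, rng=random):
    """Pick an action from `q_values`, a dict mapping action -> Q value in the current state."""
    if rng.random() < epsilon:
        # Exploration: uniform random action.
        return rng.choice(list(q_values))
    # Exploitation: action with the largest Q value (first one on ties).
    return max(q_values, key=q_values.get)
```

Passing a seeded `random.Random` instance as `rng` makes a run reproducible, which is convenient when comparing control strategies on the same simulated scenario.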

5.3. Reward Function. The reward function is used to calculate the immediate reward after executing a specific action at each time step, which guides the agent to achieve its objective. Considering a common objective of traffic control systems (i.e., minimising total travel time), we define our reward to guide the agent to minimise total time spent (TTS) through the learning process.

TTS is defined as the total time spent by vehicles in the network during a period of time. For our case, TTS can be obtained from the following equation:

$$\mathrm{TTS} = \Delta t \cdot \sum_{k=0}^{K} \left(q_{i,\mathrm{main}}^{k} + q_{i,\mathrm{on}}^{k}\right). \tag{11}$$

In the above equation, $\Delta t$ is a fixed value; therefore, minimising TTS is equivalent to minimising the number of vehicles on the network, $\sum_{k=0}^{K} (q_{i,\mathrm{main}}^{k} + q_{i,\mathrm{on}}^{k})$. To minimise this value, the reward function defined here is composed of two negative rewards used to indicate penalties for vehicles on the mainline and on-ramp. The formal reward function at step $k$ is defined according to two situations.

(1) Motorway Section $i$ Is the Critical Section. Consider

$$
R_i^k\left(s_i^k, c_i^k\right) =
\begin{cases}
-\dfrac{q_{i,\mathrm{main}}^{k} + q_{i,\mathrm{on}}^{k}}{q_{i,\mathrm{main}}^{\max} + q_{i,\mathrm{on}}^{\max}} & \text{if } q_{i,\mathrm{main}}^{k} < q_{i,\mathrm{main}}^{\max},\; q_{i,\mathrm{on}}^{k} < q_{i,\mathrm{on}}^{\max}, \\
-1 & \text{otherwise},
\end{cases}
\tag{12}
$$

where $R_i^k(s_i^k, c_i^k)$ is the immediate reward for agent $i$ in state $s_i^k$ when executing action $c_i^k$ at control step $k$. $q_{i,\mathrm{main}}^{\max}$ and $q_{i,\mathrm{on}}^{\max}$ are used to normalise the number of vehicles on the mainline and on-ramp, which guarantees that $R_i^k(s_i^k, c_i^k) \in [-1, 0]$.

(2) Motorway Section $i$ Is Not the Critical Section. Here, a new negative reward is introduced to maintain the system equity, that is, to make sure that the on-ramp queues and related travel times at different on-ramps are close to each other:

$$
R_i^k\left(s_i^k, c_{i-1}^k, c_i^k\right) =
\begin{cases}
-\dfrac{q_{i,\mathrm{main}}^{k} + q_{i,\mathrm{on}}^{k}}{q_{i,\mathrm{main}}^{\max} + q_{i,\mathrm{on}}^{\max}} - \dfrac{\left|q_{i,\mathrm{on}}^{k} - q_{i-1,\mathrm{on}}^{k}\right|}{\max\left(q_{i,\mathrm{on}}^{\max}, q_{i-1,\mathrm{on}}^{\max}\right)} & \text{if } q_{i,\mathrm{main}}^{k} < q_{i,\mathrm{main}}^{\max},\; q_{i,\mathrm{on}}^{k} < q_{i,\mathrm{on}}^{\max}, \\
-2 & \text{otherwise}.
\end{cases}
\tag{13}
$$


For each agent $i$ and episode do
    $L \leftarrow \mathrm{CEIL}(\mathrm{IncidentDuration} / \Delta t)$
    IF $i$ is the critical section
        Initialise $R_i^0(s_i, c_i)$, $Q_i^0(s_i, c_i)$
    ELSE
        Initialise $R_i^0(s_i, c_{i-1}, c_i)$, $Q_i^0(s_i, c_{i-1}, c_i)$, $P_i^0(s_i, c_{i-1})$
    For each control step $k \in K$ do (Loop I)
        (i) get detected data from each cell $j$: $a_{j,\mathrm{main}}^{k}$, $d_{j,\mathrm{main}}^{k}$, $d_{j,\mathrm{off}}^{k}$, $a_{j,\mathrm{on}}^{k}$, $d_{j,\mathrm{on}}^{k}$
        (ii) get state $s_i^k$ through (8) and (9)
        (iii) get action $c_i^k$ by $\varepsilon$-greedy policy (10)
        (iv) get $q_{j,\mathrm{main}}^{k}$, $q_{j,\mathrm{on}}^{k}$ through (6) and do $q_{i,\mathrm{main}}^{k} \leftarrow \sum_{j=1}^{J} q_{j,\mathrm{main}}^{k}$, $q_{i,\mathrm{on}}^{k} \leftarrow q_{j,\mathrm{on}}^{k}$
        IF $i$ is the critical section
            update $R_i^k(s_i^k, c_i^k)$, $Q_i^k(s_i^k, c_i^k)$ through (2) and (12)
        ELSE
            update $R_i^k(s_i^k, c_{i-1}^k, c_i^k)$, $Q_i^k(s_i^k, c_{i-1}^k, c_i^k)$, $p(c_{i-1}^k \mid s_i^k)$ through (4) and (13)
        IF $s_i^k = s_{\mathrm{initial}}$ and $k + 1 \geq L$
            end the algorithm
        ELSE
            get $a_{i,\mathrm{main}}^{k,k+1}$, $a_{i,\mathrm{on}}^{k,k+1}$, $d_{j,\mathrm{off}}^{k,k+1}$ by (7) and do $l \leftarrow k$, $s_i^l \leftarrow s_i^k$, $q_{j,\mathrm{main}}^{l} \leftarrow q_{j,\mathrm{main}}^{k}$, $q_{j,\mathrm{on}}^{l} \leftarrow q_{j,\mathrm{on}}^{k}$, and start loop II
        For each planning step $l \in L$ do (Loop II)
            (i) generate flow rates for each cell $j$: $d_{j,\mathrm{main}}^{l}$, $d_{j,\mathrm{on}}^{l}$ through (5)
            (ii) get the state $s_i^l$
            (iii) get $q_{j,\mathrm{main}}^{l}$, $q_{j,\mathrm{on}}^{l}$ and do $q_{i,\mathrm{main}}^{l} \leftarrow \sum_{j=1}^{J} q_{j,\mathrm{main}}^{l}$, $q_{i,\mathrm{on}}^{l} \leftarrow q_{j,\mathrm{on}}^{l}$
            (iv) get action $c_i^l$ by $\varepsilon$-greedy policy
            IF $i$ is the critical section
                update $R_i^l(s_i^l, c_i^l)$, $Q_i^l(s_i^l, c_i^l)$
            ELSE
                update $R_i^l(s_i^l, c_{i-1}^l, c_i^l)$, $Q_i^l(s_i^l, c_{i-1}^l, c_i^l)$, $p(c_{i-1}^k \mid s_i^k)$
            IF ($l = k + 9$) or ($s_i^l = s_{\mathrm{initial}}$ and $l + 1 \geq L$)
                go back to loop I
            ELSE
                repeat loop II
        EndFor
    EndFor
EndFor

Algorithm 1: Algorithm for Dyna-MARL.

Compared to (12), a new term $|q_{i,\mathrm{on}}^{k} - q_{i-1,\mathrm{on}}^{k}| / \max(q_{i,\mathrm{on}}^{\max}, q_{i-1,\mathrm{on}}^{\max})$ is added into (13), which is a penalty for the on-ramp queue difference between motorway sections $i$ and $i-1$. As two adjacent agents cooperate in this situation, $R_i^k(s_i^k, c_{i-1}^k, c_i^k)$ is related to two control actions, $c_{i-1}^k$ and $c_i^k$. $\max(\cdot, \cdot)$ returns the maximum value of two given parameters, which is used for normalisation.
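The two reward definitions, (12) and (13), translate directly into code. The sketch below uses hypothetical function names and takes the vehicle counts and their maxima as plain numbers.

```python
def reward_critical(q_main, q_on, q_main_max, q_on_max):
    """Reward (12) for the critical section; the result lies in [-1, 0]."""
    if q_main < q_main_max and q_on < q_on_max:
        return -(q_main + q_on) / (q_main_max + q_on_max)
    return -1.0


def reward_normal(q_main, q_on, q_on_prev, q_main_max, q_on_max, q_on_prev_max):
    """Reward (13) for a normal section, with the queue-difference penalty."""
    if q_main < q_main_max and q_on < q_on_max:
        occupancy = (q_main + q_on) / (q_main_max + q_on_max)
        # Penalty for unequal on-ramp queues in sections i and i-1.
        imbalance = abs(q_on - q_on_prev) / max(q_on_max, q_on_prev_max)
        return -occupancy - imbalance
    return -2.0
```

For instance, a normal section that is half full (1485 of 2970 vehicles in total) and whose on-ramp queue differs from its neighbour's by 54 vehicles, with a larger on-ramp maximum of 108, receives a reward of -1.0: one half from occupancy and one half from the queue imbalance.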

5.4. Description of the Algorithm. Based on the Dyna-$Q$ architecture and RL elements defined in previous subsections, an algorithm, Dyna-MARL, is developed and described in this subsection. Two main loops corresponding to direct RL and planning shown in Figure 1 are detailed in Dyna-MARL. Between two real control steps in loop I, 10 planning steps will be run in loop II. The pseudocode of Dyna-MARL can be seen in Algorithm 1.

An episode in Dyna-MARL represents a control cycle which starts from incident occurrence and terminates when the traffic state returns to the initial state $s_{\mathrm{initial}}$, that is, the traffic state before the incident occurrence. Incident duration is assumed to be known in advance.
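The interplay of the two loops can be sketched for a single agent with a tabular $Q$ function. This is a simplified illustration rather than the Dyna-MARL implementation: the neighbour-action model, the ACTM, and the episode termination test are abstracted away, `real_step` and `model_step` are hypothetical callables that return a (reward, next state) pair, and the default learning parameters are only placeholders.

```python
import random


def q_update(q, state, action, reward, next_state, actions, alpha=0.2, gamma=0.8):
    """Standard tabular Q-learning update on the dictionary `q`."""
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)


def choose(q, state, actions, epsilon, rng):
    """Epsilon-greedy choice among `actions` (a tuple or list)."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q.get((state, a), 0.0))


def dyna_control_step(q, state, actions, real_step, model_step,
                      planning_steps=10, epsilon=0.1, rng=random):
    """One real control step (loop I) followed by simulated planning steps (loop II)."""
    # Loop I: act on the real system and learn from the observed outcome.
    action = choose(q, state, actions, epsilon, rng)
    reward, next_state = real_step(state, action)
    q_update(q, state, action, reward, next_state, actions)

    # Loop II: start from the state observed at this control step and
    # learn from transitions generated by the model instead of the road.
    sim_state = state
    for _ in range(planning_steps):
        sim_action = choose(q, sim_state, actions, epsilon, rng)
        sim_reward, sim_next = model_step(sim_state, sim_action)
        q_update(q, sim_state, sim_action, sim_reward, sim_next, actions)
        sim_state = sim_next
    return next_state
```

Each real control step therefore yields eleven $Q$ updates instead of one, which is the motivation for adding a model to a learning controller that has only a short incident period to learn from.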

6. Case Study and Results

One of the metered motorway segments (southbound direction) of the M6 in the UK is chosen for the case study. This segment is between junction 21A (J21A) and junction 25 (J25), with an approximate length of 12.4 km (see Figure 4). Taking the noncontrolled (NC) situation as the baseline, we designed a series of experiments to compare the proposed Dyna-MARL algorithm with Isolated RL ($Q$-learning without coordination). Experiments and relevant results are described as follows.

6.1. Partitions of the Test Segment. The test motorway segment, with a three-lane mainline, three metered on-ramps, and five off-ramps, is simulated by AIMSUN [39], which is a microscopic traffic simulation package. According to the detector locations and road layout, the whole segment is divided into three sections. Each section contains a metered on-ramp. Motorway section 3 is divided into 4 cells, and motorway sections 1 and 2 are both divided into 3 cells. The partitions of each section can be seen in Figure 5. According to the section length, the maximum numbers of vehicles in each mainline section and on-ramp are as follows: $q_{1,\mathrm{main}}^{\max} = 1860$, $q_{2,\mathrm{main}}^{\max} = 2880$, $q_{3,\mathrm{main}}^{\max} = 2880$, $q_{1,\mathrm{on}}^{\max} = 108$, $q_{2,\mathrm{on}}^{\max} = 90$, and $q_{3,\mathrm{on}}^{\max} = 120$.

Figure 4: Test motorway segment of M6, from origin O (J25) to destination D (J21A) via J24, J23, and J22, with on-ramps O1 (from A579), O2 (from A580), and O3 (from A49) and off-ramps D1 (to M62), D2 (to M62), D3 (to A579), D4 (to A580), and D5 (to A58).

Figure 5: Partitions of test segment, showing Sections 1 to 3 between D and O, the section and cell boundaries, the road section lengths (unit: km), the flow direction, and the incident locations a, b, and c.

6.2. Real Data Source. Real detector data collected from 17 loop detectors located in the motorway segment (including both mainline and on-/off-ramps) are used for the case study, which can be extracted from Traffic Information System (HATRIS) [40]. These traffic count data are averaged from April 2012 to March 2013 with 15-minute intervals. Only working day data (from Monday to Friday) are used due to the dramatic reduction of traffic load at weekends. Some of the detector data collected from the mainline and three on-ramps are presented in Figure 6, from which we can see that two peak periods, the AM peak period (around 07:00:00–09:00:00) and the PM peak period (around 16:00:00–18:00:00), exist during the daily traffic operation.

In the test site, ramp metering only works at peak hours. Meanwhile, it is valuable to test the performance of the proposed algorithm in the high demand situation. If it can work under the high traffic load, it should also be useful for common situations. Therefore, the AM peak period with heavy traffic load is considered for the case study. Specifically, we use the averaged traffic data during the AM peak period collected from TRADS to estimate the OD (origins and destinations) matrix for the simulation. A model proposed in [41] is adopted by AIMSUN to do the estimation, where the number of iterations is set as 1000 to get convergence. Table 1 shows the OD matrix estimated from real traffic data.

Figure 6: Real averaged traffic data (15-minute traffic count against time of day for Sections 1 to 3 and On-ramps 1 to 3).

Table 1: OD matrix estimated.

| Origins/destinations | D | D5 | D4 | D3 | D2 | D1 | Totals |
|---|---|---|---|---|---|---|---|
| O | 2089 | 375 | 686 | 728 | 1169 | 771 | 5818 |
| O3 | 875 | 65 | 212 | 193 | 117 | 46 | 1507 |
| O2 | 886 | 0 | 0 | 61 | 315 | 216 | 1477 |
| O1 | 824 | 0 | 0 | 0 | 292 | 226 | 1343 |
| Totals | 4675 | 440 | 898 | 981 | 1893 | 1258 | 10146 |

Table 2: Parameters for ACTM.

| Parameter | $d_{\mathrm{main}}^{\max}$ | $v$ | $w$ | $\theta$ | $\eta$ | $\lambda_1$ | $\lambda_2$ | $\lambda_3$ |
|---|---|---|---|---|---|---|---|---|
| Value | 6300 veh/h | 107 km/h | 11.6 km/h | 0.5 | 0.16 | 0.55 | 0.9 | 0.6 |

6.3. Incident Scenarios. Considering the difficulty of capturing real incident data, we simulate some incident scenarios in AIMSUN. To make each ramp meter work during the incident, the incident is located near the most downstream motorway section, that is, motorway section 1. Therefore, three incident scenarios A, B, and C are designed, corresponding to three different incident locations a, b, and c (as illustrated in Figure 5), respectively.

The simulation experiment lasts for one and a half hours, from 07:00:00 to 08:30:00 during the AM peak period. After 30 minutes of normal operation (for warm-up), the incident is triggered at 07:30:00 and lasts for 30 minutes. In the preliminary experiments designed in this paper, the incident with one lane blocked is considered. Parameters introduced here can also be regulated for multiple lane-blockage situations. The incident extent is 50 meters, which is assumed to be constant during the incident.

Learning-related parameters are set as typical values [30]; that is, $\alpha$ is 0.2, $\gamma$ is 0.8, and $\varepsilon$ is 0.1. Other parameters related to ACTM are calibrated and summarised in Table 2. All the cells have the same $\theta$ and $\eta$.

6.4. Results. The comparison of Dyna-MARL, Isolated RL, and NC is conducted from three aspects: density evolution, some general indicators, and the system equity. The experimental results are described as follows.

(1) Density Evolution. We can see from Figure 7 that four dense areas exist during the traffic operation. Three of them, near on-ramp entrances (motorway length around 0.5 km, 5 km, and 10 km), are caused by heavy traffic loads from on-ramps. The dense area close to the segment end forms due to the incident.

In scenario A, the incident location is close to on-ramp 1 (O1). Without control, this incident leads to severe congestion which blocks on-ramp 1 and propagates to motorway section 2 (around 9 km in Figure 7(a)). Under this scenario, Isolated RL cannot alleviate incident-induced congestion effectively (see Figure 7(b)). At the beginning of congestion formation, without coordination, only the nearest ramp controller reacts to the congestion. Because of the space limit of the on-ramp, one ramp controller is insufficient to dissolve this congestion, which still propagates to motorway section 2. Dyna-MARL, on the other hand, coordinates all three ramp controllers and makes full use of the storage space of three on-ramps to deal with incident-induced congestion. In this way, mainline congestion can be restricted to a smaller area and will not propagate to motorway section 2 (see Figure 7(c)).

For scenarios B and C, incidents are near the motorway end and far from on-ramp 1. Without blocking on-ramp 1, incidents do not lead to severe congestion. Under such circumstances, both Isolated RL and Dyna-MARL work well on easing congestion in the mainline. As shown in Figures 7(e)–7(i), compared with the NC situation, both Isolated RL and Dyna-MARL can restrict the congestion to a small range near the on-ramp entrances.

(2) General Indicators. In this comparison, some general indicators, including total travel time (should be reduced), total throughput (should be improved), and total CO$_2$ emission (should be reduced), are used to show how the proposed system can benefit road users. These indicators are widely used in the transport community to test the performance of newly developed traffic control systems.

As shown in Figure 8(a), compared with the NC situation, both Isolated RL and Dyna-MARL can reduce the total travel time of road users in all three scenarios. Specifically, Isolated RL decreases total travel time by up to 6.2%, while Dyna-MARL achieves a maximum reduction of 12.2% (see Figure 8(d)). The comparison of total throughput is presented in Figure 8(b). Dyna-MARL can improve the total throughput by up to 2.3% (see Figure 8(d)), which outperforms Isolated RL in all three scenarios. In scenario B, Isolated RL even fails to improve the total throughput. For the comparison of total CO$_2$ emission (shown in Figure 8(c)), both Isolated RL and Dyna-MARL achieve their best performance in scenario B, with reductions of 4.7% and 4.6%, respectively. In scenarios A and C, Dyna-MARL has a much better performance than Isolated RL.

Through the above comparison, we can see that Dyna-MARL outperforms Isolated RL for almost all the scenarios and indicators.

Figure 7: Density profiles (motorway density in veh/km/lane over motorway length in km and absolute time in min) for (a) NC in scenario A, (b) Isolated RL in scenario A, (c) Dyna-MARL in scenario A, (d) NC in scenario B, (e) Isolated RL in scenario B, (f) Dyna-MARL in scenario B, (g) NC in scenario C, (h) Isolated RL in scenario C, and (i) Dyna-MARL in scenario C.

(3) System Equity. Although the general indicators presented in comparison (2) have shown their effectiveness in testing the performance of different systems, they cannot measure the issue of system equity, which is also an important aspect of the system performance. In this paper, we only consider the spatial equity issue, which is defined as a measurement of equity of user delays on different on-ramps [42]. In this study, we assume the road users from all three on-ramps have the same importance. If all users from different on-ramps experience similar travel times, the control system is defined as an equitable system. This term is used to measure the system equity; that is, a large queue difference leads to a highly inequitable system. In [43], the variance of travel time on different on-ramps is used as an indicator to measure system equity. Similar to [43], for the sake of comparison, the standard deviation is considered in our case. This indicator is defined as

$$\mathrm{SD}(k) = \sqrt{\frac{\sum_{i=1}^{n} \left[\bar{t}^{k} - t_i^{k}\right]^2}{n}}, \tag{14}$$

where $\mathrm{SD}(k)$ is the standard deviation of travel time of different on-ramps at time step $k$; $t_i^k$ is the estimated total travel time of on-ramp $i$ at step $k$; $\bar{t}^k$ is the averaged total travel time of $n$ on-ramps at step $k$.
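The indicator in (14) is the population standard deviation of the on-ramp travel times at one time step. A minimal sketch, with a hypothetical function name, is as follows.

```python
import math


def travel_time_sd(travel_times):
    """Population standard deviation of on-ramp travel times at one step, as in (14)."""
    n = len(travel_times)
    mean = sum(travel_times) / n
    return math.sqrt(sum((mean - t) ** 2 for t in travel_times) / n)
```

Equal travel times on all on-ramps give a value of zero, the most equitable outcome under this measure. Python's `statistics.pstdev` computes the same quantity.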

Results of the comparison of system equity can be seen in Figure 9. For the NC situation, good equity can be maintained in scenarios B and C because entering vehicles are not restricted (as shown in Figures 9(b) and 9(c)). However, when one of the on-ramp entrances is blocked by the congestion in scenario A, a long queue forms and leads to imbalance and resultant inequity for users on different on-ramps (see Figure 9(a)). For controlled cases, Isolated RL performs poorly in all scenarios. This is because the ramp controller near the congestion takes much more restrictive measures on the controlled traffic than the other controllers. Because of the coordination strategy, Dyna-MARL outperforms Isolated RL on maintaining system equity in all scenarios, especially during the incident (from 07:30:00 to 08:00:00).

Figure 8: Comparison of general measures for NC, Isolated RL, and Dyna-MARL in scenarios A, B, and C: (a) total travel time (h), (b) total throughput (veh), (c) total CO$_2$ emission ($\times 10^4$ kg), and (d) reduction from NC (%) in total throughput, total travel time, and total CO$_2$ emission.

7. Conclusions and Future Work

A Dyna-$Q$ based multiagent reinforcement learning method, referred to as Dyna-MARL, for motorway ramp control has been developed in this paper. Dyna-MARL is compared with Isolated RL ($Q$-learning without coordination) and the noncontrolled situation under the simulation environment. Real traffic data collected from a metered motorway segment in the UK are used to form the simulation.

Through a series of simulation-based experiments, we can conclude the following: (1) Isolated RL can improve the motorway performance in terms of increasing total throughput and reducing total travel time and CO$_2$ emission, but this improvement is at the expense of poor system equity on different on-ramps; (2) with a suitable coordination strategy, much higher system equity can be achieved by Dyna-MARL; (3) in addition to the system equity, Dyna-MARL outperforms Isolated RL in almost all scenarios regarding all indicators, which means Dyna-MARL can deal with the network-wide problems effectively.

Although the simulation tests have shown some positive results regarding the performance of Dyna-MARL, a simplified incident scenario with fixed duration is considered in the current work. In the practical situation, incident duration is highly unstable and affected by a number of factors, such as weather conditions, road conditions, and the arrival time of the incident management team. Therefore, incident duration should be considered as an uncertainty, which will be investigated in our future work.

Figure 9: Standard deviation (h) against time of day (07:00:00 to 08:30:00) for NC, Isolated RL, and Dyna-MARL in different scenarios: (a) scenario A, (b) scenario B, and (c) scenario C.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This paper is supported by China Scholarship Council and University of Leeds (CSC-University of Leeds scholarship) and partially supported by the National Natural Science Foundation of China (Grant nos. 91420203 and 61271376). The authors would like to thank the institutions that support this study.

References

[1] M. Papageorgiou and A. Kotsialos, "Freeway ramp metering: an overview," IEEE Transactions on Intelligent Transportation Systems, vol. 3, no. 4, pp. 271–281, 2002.

[2] A. Skabardonis, P. Varaiya, and K. F. Petty, "Measuring recurrent and nonrecurrent traffic congestion," Transportation Research Record, vol. 1856, pp. 118–124, 2003.

[3] M. Papageorgiou, H. Hadj-Salem, and J.-M. Blosseville, "ALINEA: a local feedback control law for on-ramp metering," Journal of the Transportation Research Board, vol. 1320, pp. 58–64, 1991.

[4] E. Smaragdis and M. Papageorgiou, "Series of new local ramp metering strategies," Transportation Research Record, vol. 1856, pp. 74–86, 2003.

[5] L. N. Jacobson, K. C. Henry, and O. Mehyar, "Real-time metering algorithm for centralized control," Transportation Research Record, vol. 1232, pp. 17–26, 1989.

[6] G. Paesani, J. Kerr, P. Perovich, and F. Khosravi, "System wide adaptive ramp metering (SWARM)," in Proceedings of the 7th ITS America Annual Meeting and Exposition: Merging the Transportation and Communications Revolutions, Washington, DC, USA, June 1997.

[7] R. Lau, Ramp Metering by Zone - The Minnesota Algorithm, Minnesota Department of Transportation, 1997.

[8] H. M. Zhang and W. W. Recker, "On optimal freeway ramp control policies for congested traffic corridors," Transportation Research Part B: Methodological, vol. 33, no. 6, pp. 417–436, 1999.

[9] A. Kotsialos, M. Papageorgiou, and F. Middelham, "Optimal coordinated ramp metering with advanced motorway optimal control," Transportation Research Record, no. 1748, pp. 55–65, 2001.

[10] A. Hegyi, B. De Schutter, and H. Hellendoorn, "Model predictive control for optimal coordination of ramp metering and variable speed limits," Transportation Research Part C: Emerging Technologies, vol. 13, no. 3, pp. 185–209, 2005.

[11] G. Gomes and R. Horowitz, "Optimal freeway ramp metering using the asymmetric cell transmission model," Transportation Research Part C: Emerging Technologies, vol. 14, no. 4, pp. 244–262, 2006.

[12] A. H. F. Chow and Y. Li, "Robust optimization of dynamic motorway traffic via ramp metering," IEEE Transactions on Intelligent Transportation Systems, vol. 15, no. 3, pp. 1374–1380, 2014.

[13] R. W. Hall, "Non-recurrent congestion: how big is the problem? Are traveler information systems the solution?" Transportation Research Part C, vol. 1, no. 1, pp. 89–103, 1993.

[14] P. Prevedouros, B. Halkias, K. Papandreou, and P. Kopelias, "Freeway incidents in the United States, United Kingdom, and Attica Tollway, Greece: characteristics, available capacity, and models," Transportation Research Record, vol. 2047, pp. 57–65, 2008.

[15] T. L. Greenlee and H. J. Payne, "Freeway ramp metering strategies for responding to incidents," in Proceedings of the IEEE Conference on Decision and Control including the 16th Symposium on Adaptive Processes and a Special Symposium on Fuzzy Set Theory and Applications, pp. 987–992, New Orleans, LA, USA, December 1977.

[16] M. H. Wang, Optimal ramp metering policies for nonrecurring congestion with uncertain incident duration [Ph.D. thesis], Purdue University, West Lafayette, Ind, USA, 1994.

[17] J.-B. Sheu, "Stochastic modeling of the dynamics of incident-induced lane traffic states for incident-responsive local ramp control," Physica A: Statistical Mechanics and its Applications, vol. 386, no. 1, pp. 365–380, 2007.

[18] J.-B. Sheu and M.-S. Chang, "Stochastic optimal-control approach to automatic incident-responsive coordinated ramp control," IEEE Transactions on Intelligent Transportation Systems, vol. 8, no. 2, pp. 359–367, 2007.

[19] C. Jacob and B. Abdulhai, "Machine learning for multi-jurisdictional optimal traffic corridor control," Transportation Research Part A: Policy and Practice, vol. 44, no. 2, pp. 53–64, 2010.

[20] M. Davarynejad, A. Hegyi, J. Vrancken, and J. van den Berg, "Motorway ramp-metering control with queuing consideration using Q-learning," in Proceedings of the 14th International IEEE Conference on Intelligent Transportation Systems (ITSC '11), pp. 1652–1658, IEEE, Washington, DC, USA, October 2011.

[21] K. Rezaee, B. Abdulhai, and H. Abdelgawad, "Application of reinforcement learning with continuous state space to ramp metering in real-world conditions," in Proceedings of the 15th International IEEE Conference on Intelligent Transportation Systems (ITSC '12), pp. 1590–1595, IEEE, Anchorage, Alaska, USA, September 2012.

[22] C. Jacob and B. Abdulhai, "Automated adaptive traffic corridor control using reinforcement learning: approach and case studies," Transportation Research Record, vol. 1959, pp. 1–8, 2006.

[23] K. Rezaee, B. Abdulhai, and H. Abdelgawad, "Self-learning adaptive ramp metering: analysis of design parameters on a test case in Toronto, Canada," Transportation Research Record, vol. 2396, pp. 10–18, 2013.

[24] X.-J. Wang, X.-M. Xi, and G.-F. Gao, "Reinforcement learning ramp metering without complete information," Journal of Control Science and Engineering, vol. 2012, Article ID 208456, 8 pages, 2012.

[25] A. Fares and W. Gomaa, "Multi-agent reinforcement learning control for ramp metering," in Progress in Systems Engineering, vol. 330 of Advances in Intelligent Systems and Computing, pp. 167–173, Springer, Basel, Switzerland, 2015.

[26] K. Veljanovska, K. M. Bombol, and T. Maher, "Reinforcement learning technique in multiple motorway access control strategy design," PROMET-Traffic & Transportation, vol. 22, no. 2, pp. 117–123, 2010.

[27] C. Lu, H. Chen, and S. Grant-Muller, "An indirect reinforcement learning approach for ramp control under incident-induced congestion," in Proceedings of the 16th International IEEE Conference on Intelligent Transportation Systems (ITSC '13), pp. 979–984, IEEE, The Hague, The Netherlands, October 2013.


[28] L. Busoniu, R. Babuska, and B. De Schutter, "A comprehensive survey of multiagent reinforcement learning," IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 38, no. 2, pp. 156–172, 2008.

[29] C. Lu, H. Chen, and S. Grant-Muller, "Indirect reinforcement learning for incident-responsive ramp control," Procedia - Social and Behavioral Sciences, vol. 111, pp. 1112–1122, 2014.

[30] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, 1998.

[31] C. C. H. Watkins and P. Dayan, "Q-learning," Machine Learning, vol. 8, no. 3-4, pp. 279–292, 1992.

[32] J. R. Kok and N. Vlassis, "Collaborative multiagent reinforcement learning by payoff propagation," Journal of Machine Learning Research, vol. 7, pp. 1789–1828, 2006.

[33] C. Guestrin, M. G. Lagoudakis, and R. Parr, "Coordinated reinforcement learning," in Proceedings of the 19th International Conference on Machine Learning, pp. 227–234, Sydney, Australia, July 2002.

[34] S. El-Tantawy, B. Abdulhai, and H. Abdelgawad, "Multiagent reinforcement learning for integrated network of adaptive traffic signal controllers (MARLIN-ATSC): methodology and large-scale application on downtown Toronto," IEEE Transactions on Intelligent Transportation Systems, vol. 14, no. 3, pp. 1140–1150, 2013.

[35] L. P. Kaelbling, M. L. Littman, and A. W. Moore, "Reinforcement learning: a survey," Journal of Artificial Intelligence Research, vol. 4, pp. 237–285, 1996.

[36] C. F. Daganzo, "The cell transmission model: a dynamic representation of highway traffic consistent with the hydrodynamic theory," Transportation Research Part B: Methodological, vol. 28, no. 4, pp. 269–287, 1994.

[37] J. Haddad, M. Ramezani, and N. Geroliminis, "Cooperative traffic control of a mixed network with two urban regions and a freeway," Transportation Research Part B: Methodological, vol. 54, pp. 17–36, 2013.

[38] H. Mongeot and J.-B. Lesort, "Analytical expressions of incident-induced flow dynamics perturbations: using macroscopic theory and extension of Lighthill-Whitham theory," Transportation Research Record, vol. 1710, pp. 58–68, 2000.

[39] Transport Simulation Systems, Aimsun User's Manual 6.1, TTS, Barcelona, Spain, 2010.

[40] Highways England, "Hatris Homepage," 2013, https://www.hatris.co.uk.

[41] E. Cascetta, "Estimation of trip matrices from traffic counts and survey data: a generalized least squares estimator," Transportation Research B, vol. 18, no. 4-5, pp. 289–299, 1984.

[42] L. Zhang and D. Levinson, "Balancing efficiency and equity of ramp meters," Journal of Transportation Engineering, vol. 131, no. 6, pp. 477–481, 2005.

[43] A. Kotsialos and M. Papageorgiou, "Efficiency and equity properties of freeway network-wide ramp metering with AMOC," Transportation Research Part C: Emerging Technologies, vol. 12, no. 6, pp. 401–420, 2004.


Figure 3: Fundamental diagram during the incident: (a) incident extent in the critical section; (b) flow versus density for normal and incident conditions.

experience consists of traffic arrival and departure rates observed from the detectors of motorway section $i+1$, as well as the information received from agent $i$, which is applied to improve the models. In the model component, two models are maintained. An asymmetric cell transmission model (ACTM) with estimated traffic arrival and departure rates is used to simulate the traffic flow dynamics in relevant motorway sections. A probability model of the action selection of agent $i$ at the current state is updated for the further planning process.

To reduce the complexity of MARL, as in many real applications, some conventions are used to restrict the action selection of an agent [28]. Specifically, in our design, each agent only communicates with its spatial neighbours. For instance, agent $i+1$ receives the control action and traffic information from agent $i$ and sends its own information to agent $i+2$. For the case shown in Figure 2, we assume motorway section $i$ is the critical section where an incident occurs. In this situation, agent $i$ plays a more important role than other agents in dealing with incidents. Agent $i$ can be considered as the chief controller that makes decisions according to its own knowledge about the traffic and incident situations. Other agents should regulate their control policies based on the reaction of agent $i$.

Therefore, two $Q$ values are defined for two kinds of agents. If motorway section $i$ is the critical section, the $Q$ value of agent $i$ is only related to its own state and action space, which can be updated by the same equation denoted by (2).

If motorway section $i$ is the normal section without incidents, the $Q$ value of agent $i$ can be calculated by

$$
\begin{aligned}
Q_i^{k+1}(s_i^k, c_{i-1}^k, c_i^k) &= Q_i^k(s_i^k, c_{i-1}^k, c_i^k) + \alpha \Big[ R_i^k(s_i^k, c_{i-1}^k, c_i^k) \\
&\quad + \gamma \max_{c_i^{k+1}} \sum_{c_{i-1}^{k+1}} p(c_{i-1}^{k+1} \mid s_i^{k+1}) \cdot Q_i^k(s_i^{k+1}, c_{i-1}^{k+1}, c_i^{k+1}) - Q_i^k(s_i^k, c_{i-1}^k, c_i^k) \Big], \\
p(c_{i-1}^{k+1} \mid s_i^{k+1}) &= \frac{\operatorname{count}(s_i^{k+1}, c_{i-1}^{k+1})}{\sum_{c_{i-1} \in C_{i-1}} \operatorname{count}(s_i^{k+1}, c_{i-1})},
\end{aligned}
\tag{4}
$$

where $R_i^k(s_i^k, c_{i-1}^k, c_i^k)$ is the immediate reward obtained by agent $i$ at time step $k$ when $c_{i-1}^k$ and $c_i^k$ are the actions executed by agents $i-1$ and $i$. Similarly, $Q_i^{k+1}(s_i^k, c_{i-1}^k, c_i^k)$ and $Q_i^k(s_i^k, c_{i-1}^k, c_i^k)$ are the $Q$ values for agent $i$ at step $k+1$ and step $k$, respectively. $C_{i-1}$ is the action set of agent $i-1$. $\operatorname{count}(s_i^{k+1}, c_{i-1}^{k+1})$ returns the number of visits for the state-action pair $(s_i^{k+1}, c_{i-1}^{k+1})$. Thus, $p(c_{i-1}^{k+1} \mid s_i^{k+1})$ is the probability of agent $i-1$ selecting action $c_{i-1}^{k+1}$ at state $s_i^{k+1}$. Models and the related symbols shown in Figure 2 will be specified in the following section.
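The update in (4) can be illustrated with a small tabular sketch. This is not the authors' implementation: the class name `NeighbourAwareQ`, the default values of `alpha` and `gamma`, and the uniform fallback used before any visit has been counted are all assumptions made for illustration.

```python
class NeighbourAwareQ:
    """Tabular Q(s, c_prev, c) with a count-based model of the neighbour's action."""

    def __init__(self, actions, neighbour_actions, alpha=0.1, gamma=0.9):
        self.actions = list(actions)                      # own action set C_i
        self.neighbour_actions = list(neighbour_actions)  # neighbour action set C_{i-1}
        self.alpha = alpha  # learning rate (hypothetical default)
        self.gamma = gamma  # discount factor (hypothetical default)
        self.q = {}         # (s, c_prev, c) -> Q value, missing entries read as 0
        self.count = {}     # (s, c_prev) -> number of visits

    def observe_neighbour(self, state, c_prev):
        """Record that the neighbour chose c_prev while this agent was in `state`."""
        self.count[(state, c_prev)] = self.count.get((state, c_prev), 0) + 1

    def neighbour_prob(self, state, c_prev):
        """Empirical p(c_prev | state), the second line of (4)."""
        total = sum(self.count.get((state, c), 0) for c in self.neighbour_actions)
        if total == 0:
            # Assumption: fall back to a uniform guess before any observation.
            return 1.0 / len(self.neighbour_actions)
        return self.count.get((state, c_prev), 0) / total

    def expected_best(self, state):
        """max over own actions of the Q value averaged over neighbour actions."""
        return max(
            sum(self.neighbour_prob(state, cp) * self.q.get((state, cp, c), 0.0)
                for cp in self.neighbour_actions)
            for c in self.actions
        )

    def update(self, s, c_prev, c, reward, s_next):
        """Apply the first line of (4) to one transition and return the new Q value."""
        key = (s, c_prev, c)
        old = self.q.get(key, 0.0)
        target = reward + self.gamma * self.expected_best(s_next)
        self.q[key] = old + self.alpha * (target - old)
        return self.q[key]
```

Keeping the visit counts separate from the $Q$ table means the neighbour model can be refreshed from every message received from agent $i-1$, independently of when the $Q$ value itself is updated.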

4. Modified Asymmetric Cell Transmission Model

A first-order macroscopic traffic flow model named asymmetric cell transmission model (ACTM) is applied as one of the models in the Dyna-$Q$ architecture. This model is derived from the widely used cell transmission model (CTM) [36] and has been used for ramp control problems [11, 37]. In this paper, we modify ACTM to incorporate the traffic dynamics under incident conditions.

4.1. Traffic Dynamics during the Incident. As shown in Figure 3(a), when an incident happens in the critical section, one or more lanes of the motorway will be blocked according to the incident extent. Because of the lane blockage, the incident may reduce the normal road capacity and spatial storage space, which will produce a new relationship between traffic flow and road density, that is, the fundamental diagram presented in Figure 3(b). As suggested by [38], additional parameters can be used to regulate the fundamental diagram for incident situations. We introduce three parameters ($\lambda_1, \lambda_2, \lambda_3 \in [0, 1]$) to reflect these new dynamics. These three parameters are defined as $\lambda_1 = v_j^{\mathrm{In}} / v_j$, $\lambda_2 = w_j^{\mathrm{In}} / w_j$, and $\lambda_3 = d_{j,\mathrm{main}}^{\mathrm{In},\max} / d_{j,\mathrm{main}}^{\max}$. $v_j$ and $w_j$ are the free flow speed and congestion wave speed of cell $j$. $d_{j,\mathrm{main}}^{\max}$ is the maximum departure flow of cell $j$. $v_j^{\mathrm{In}}$, $w_j^{\mathrm{In}}$, and $d_{j,\mathrm{main}}^{\mathrm{In},\max}$ are these three variables during the incident. $\rho_{j,C}$ and $\rho_{j,C}^{\mathrm{In}}$ are the critical densities for normal and incident situations. $\rho_{j,J}$ and $\rho_{j,J}^{\mathrm{In}}$ are the jam densities for normal and incident situations.


4.2. Modified ACTM. Given three incident-related parameters, the traffic dynamics in each cell can be derived from the fundamental diagram illustrated in Figure 3(b) and represented by the following equations.

Departure rates of the mainline and on-ramp:

$$
\begin{aligned}
d_{j,\mathrm{main}}^{k} &= \min\Big\{ \lambda_1 \cdot \frac{v_j}{l_j} \cdot \big(q_{j,\mathrm{main}}^{k} + \theta_j \cdot d_{j,\mathrm{on}}^{k} \cdot \Delta t - d_{j,\mathrm{off}}^{k} \cdot \Delta t\big), \\
&\qquad\quad \lambda_2 \cdot \frac{w_{j-1}}{l_j} \cdot \big(q_{j-1,\mathrm{main}}^{\max} - q_{j-1,\mathrm{main}}^{k} - \theta_j \cdot d_{j-1,\mathrm{on}}^{k} \cdot \Delta t\big),\; \lambda_3 \cdot d_{j,\mathrm{main}}^{k,\max} \Big\}, \\
d_{j,\mathrm{on}}^{k} &=
\begin{cases}
\min\left\{ \dfrac{q_{j,\mathrm{main}}^{k} + a_{j,\mathrm{on}}^{k} \cdot \Delta t}{\Delta t},\; \eta_j \cdot \dfrac{q_{j,\mathrm{main}}^{\max} - q_{j,\mathrm{main}}^{k}}{\Delta t},\; \dfrac{c_i^k}{\Delta t} \right\} & \text{if } j \text{ is metered on-ramp cell}, \\[2ex]
\min\left\{ \dfrac{q_{j,\mathrm{main}}^{k} + a_{j,\mathrm{on}}^{k} \cdot \Delta t}{\Delta t},\; \eta_j \cdot \dfrac{q_{j,\mathrm{main}}^{\max} - q_{j,\mathrm{main}}^{k}}{\Delta t} \right\} & \text{if } j \text{ is unmetered on-ramp cell}.
\end{cases}
\end{aligned}
\tag{5}
$$

Conservation of the mainline and on-ramp:

$$
\begin{aligned}
q_{j,\mathrm{main}}^{k+1} &= q_{j,\mathrm{main}}^{k} + \Delta t \cdot \big(a_{j,\mathrm{main}}^{k} + d_{j,\mathrm{on}}^{k} - d_{j,\mathrm{main}}^{k} - d_{j,\mathrm{off}}^{k}\big), \\
q_{j,\mathrm{on}}^{k+1} &= q_{j,\mathrm{on}}^{k} + \Delta t \cdot \big(a_{j,\mathrm{on}}^{k} - d_{j,\mathrm{on}}^{k}\big),
\end{aligned}
\tag{6}
$$

where $a_{j,\mathrm{main}}^{k}$ and $d_{j,\mathrm{main}}^{k}$ are the mainline arrival and departure rates for cell $j$ at step $k$. $a_{j,\mathrm{on}}^{k}$ and $d_{j,\mathrm{on}}^{k}$ are the on-ramp arrival and departure rates in cell $j$ at step $k$. $d_{j,\mathrm{off}}^{k}$ is the off-ramp departure rate for cell $j$ at step $k$ (if cell $j$ is not an off-ramp cell, $d_{j,\mathrm{off}}^{k} = 0$). $q_{j,\mathrm{main}}^{k}$ represents the number of vehicles on the mainline of cell $j$ at step $k$. $q_{j,\mathrm{main}}^{\max}$ is the maximum of this value, limited by the mainline space of cell $j$. Similarly, $q_{j,\mathrm{on}}^{k}$ and $q_{j,\mathrm{on}}^{\max}$ denote the current (at step $k$) and maximum number of vehicles in the on-ramp of cell $j$, respectively. $\Delta t$ (min) is the time duration between two consecutive time steps. $c_i^k$ is the metering rate for the on-ramp cell of the $i$th motorway section at step $k$. $\eta_j \in [0, 1]$ is the flow allocation parameter of cell $j$. $\theta_j \in [0, 1]$ is the flow blending parameter of traffic flow from the on-ramp to the mainline of cell $j$. The unit of all the arrival and departure rates is modified to veh/min in this study.

For motorway section $i$ with $J$ cells, the number of vehicles in the mainline can be calculated by $q_{i,\mathrm{main}}^{k} = \sum_{j=1}^{J} q_{j,\mathrm{main}}^{k}$, while the number of vehicles in the on-ramp of motorway section $i$ is given by $q_{i,\mathrm{on}}^{k} = q_{j,\mathrm{on}}^{k}$. In this way, the maximum number of vehicles in the mainline and on-ramp of motorway section $i$ is given by $q_{i,\mathrm{main}}^{\max} = \sum_{j=1}^{J} q_{j,\mathrm{main}}^{\max}$ and $q_{i,\mathrm{on}}^{\max} = q_{j,\mathrm{on}}^{\max}$.

4.3. Estimation of Arrival and Departure Rates. Arrival rates of the boundary cells in each motorway section (such as $j+2$, $j+5$, and $j+8$) and all the on-ramps, as well as the departure rates of off-ramps, are inputs of the ACTM for each planning step between two real control steps. Considering the short time of the planning process (10 steps), we assume these rates remain stable during the planning and are estimated directly from the recent flow data collected from detectors. The method described by Wang [16] is used here to do the estimation, which simply averages the most recently observed data to get the predicted flow rates. In our model, we use the flow data collected from the last $N$ time steps ($N = 5$). Therefore, these three rates can be calculated by

$$
\begin{aligned}
a_{i,\mathrm{main}}^{k,k+1} &= a_{j,\mathrm{main}}^{k,k+1} = \frac{\sum_{n=0}^{N-1} a_{j,\mathrm{main}}^{k-n}}{N}, && \text{if } j \text{ is the boundary cell}, \\
a_{i,\mathrm{on}}^{k,k+1} &= a_{j,\mathrm{on}}^{k,k+1} = \frac{\sum_{n=0}^{N-1} a_{j,\mathrm{on}}^{k-n}}{N}, && \text{if } j \text{ is the on-ramp cell}, \\
d_{j,\mathrm{off}}^{k,k+1} &= \frac{\sum_{n=0}^{N-1} d_{j,\mathrm{off}}^{k-n}}{N}, && \text{if } j \text{ is the off-ramp cell},
\end{aligned}
\tag{7}
$$

where $a_{j,\mathrm{main}}^{k,k+1}$ and $a_{j,\mathrm{on}}^{k,k+1}$ are the estimated arrival rates of the mainline and on-ramp of cell $j$ for the planning steps between real steps $k$ and $k+1$. $d_{j,\mathrm{off}}^{k,k+1}$ is the estimated off-ramp departure rate of cell $j$. If cell $j$ is the boundary cell of motorway section $i$, the arrival or departure rate of this cell is also the arrival or departure rate of motorway section $i$.
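The averaging in (7) amounts to a fixed-length moving window over the most recent detector readings. A minimal sketch follows; the class name `RateEstimator` is hypothetical, and averaging over fewer than $N$ samples while the window is still filling is an assumption that (7) does not address.

```python
from collections import deque


class RateEstimator:
    """Moving average of the last `window` observed rates, as in (7)."""

    def __init__(self, window=5):
        # deque with maxlen silently drops the oldest sample once full.
        self.samples = deque(maxlen=window)

    def add(self, rate):
        """Store the rate observed at the latest real control step (veh/min)."""
        self.samples.append(rate)

    def estimate(self):
        """Return the rate assumed constant over the coming planning steps."""
        if not self.samples:
            raise ValueError("no observations yet")
        # Assumption: average whatever is available until the window is full.
        return sum(self.samples) / len(self.samples)
```

One estimator instance per boundary cell, on-ramp, and off-ramp would supply the three inputs that the ACTM needs for each planning run.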

5. Definition of RL Elements

In addition to the architecture and models defined in Section 3, three basic elements, namely, environment state, control action, and reward function, should be specified to form an RL problem. This section details these three elements and the relevant algorithm.

5.1. Environment State. Environment states of a motorway section are composed of mainline states and on-ramp states. The same method mentioned in [27, 29] is used here to obtain the state space. Generally, for the mainline of motorway section $i$, the number of vehicles ranges from 0 to the maximum number $q_{i,\mathrm{main}}^{\max}$, which is uniformly divided into $n_i$ intervals. Each interval represents a state of the mainline. Therefore, each mainline section can be represented by a state set $S_{i,\mathrm{main}}$ with $n_i$ states. Similarly, on-ramp traffic is represented by a state set $S_{i,\mathrm{on}}$ with $m_i$ states according to the maximum number of vehicles $q_{i,\mathrm{on}}^{\max}$. $n_i$ and $m_i$ should be adjusted for different motorway sections according to the section length. In this way, if motorway section $i$ is the critical section, the external traffic environment is represented by

$$
S_i = S_{i,\mathrm{main}} \times S_{i,\mathrm{on}}, \quad s_i^k \in S_i,
\tag{8}
$$

which contains $n_i \cdot m_i$ states. At each time step, a state $s_i^k$ will be selected from $S_i$ as the environment state. If motorway section $i$ is a normal section, the state sets of its neighbour agent should be incorporated. Thus, the traffic state is represented by

$$
S_i = S_{i,\mathrm{main}} \times S_{i,\mathrm{on}} \times S_{i-1,\mathrm{main}} \times S_{i-1,\mathrm{on}}, \quad s_i^k \in S_i,
\tag{9}
$$

which contains $n_i \cdot m_i \cdot n_{i-1} \cdot m_{i-1}$ states.
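The discretisation behind (8) and (9) can be sketched as follows. The function names and the interval counts used in the usage note are hypothetical, and clipping out-of-range counts into the first or last interval is an assumption.

```python
def interval_index(vehicles, max_vehicles, n_intervals):
    """Index of the uniform interval of [0, max_vehicles] that contains `vehicles`."""
    if max_vehicles <= 0 or n_intervals <= 0:
        raise ValueError("max_vehicles and n_intervals must be positive")
    # Assumption: counts outside the range are clipped to the nearest bound.
    clipped = min(max(vehicles, 0), max_vehicles)
    # The upper bound itself belongs to the last interval.
    return min(int(clipped * n_intervals / max_vehicles), n_intervals - 1)


def section_state(q_main, q_on, q_main_max, q_on_max, n, m):
    """State index in S_main x S_on for one section, as in (8): n * m states."""
    return interval_index(q_main, q_main_max, n) * m + interval_index(q_on, q_on_max, m)


def joint_state(own_state, neighbour_state, n_prev, m_prev):
    """State index for a normal section, as in (9): own state x neighbour state."""
    return own_state * (n_prev * m_prev) + neighbour_state
```

For example, with a mainline limit of 1860 vehicles, an on-ramp limit of 108 vehicles, and hypothetical values $n_i = 10$ and $m_i = 4$, `section_state(930, 54, 1860, 108, 10, 4)` falls into mainline interval 5 and on-ramp interval 2. The product structure of (9) makes the table of a normal section grow with the neighbour's state count, which is one reason to keep $n_i$ and $m_i$ small.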

5.2. Control Action. In a ramp control problem, the aim of the control action is to regulate the number of vehicles entering the mainline in each control step. Similar to [29], we adopt flow control as the control action, which can be represented by an action set $C = \{4, 6, 8, 10, 12, 14, 16, 18, 20\}$ with 9 flow rates between the minimum (4 veh/min) and maximum (20 veh/min) values.

Exploitation and exploration are two basic behaviours of the RL agent. Exploitation means the agent takes the control action that has obtained the most reward in its previous experience. Exploration, instead, means the agent tries new actions that currently promise less reward. In order to balance these two behaviours, we use the $\varepsilon$-greedy policy to select control actions [30]. Specifically, this policy takes a random action with probability $\varepsilon$ and chooses the greedy action (with the maximum $Q$ value) with probability $1 - \varepsilon$ for each control step.

The action selection probability can be formally expressed as

$$
p(c_i^k \mid s_i^k) =
\begin{cases}
1 - \varepsilon & \text{if } c_i^k = \arg\max_{c_i^k} \big(Q^{k-1}(s_i^k, c_i^k)\big), \\
\varepsilon & \text{otherwise}.
\end{cases}
\tag{10}
$$
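A minimal sketch of the selection rule described in the text: with probability $\varepsilon$ a uniformly random action is drawn from the action set, and otherwise the action with the largest $Q$ value is taken. The function name, the dictionary representation of the $Q$ values, and the value of $\varepsilon$ are assumptions for illustration only.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick an action from `q_values`, a dict mapping action -> Q value.

    `rng` may be the `random` module or a seeded `random.Random` instance.
    """
    if rng.random() < epsilon:
        # Exploration: any action, including the greedy one, may be drawn.
        return rng.choice(list(q_values))
    # Exploitation: the action with the maximum Q value.
    return max(q_values, key=q_values.get)
```

Because the random draw can also return the greedy action, the greedy action is selected slightly more often than $1 - \varepsilon$ in this sketch.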

5.3. Reward Function. The reward function is used to calculate the immediate reward after executing a specific action at each time step, which guides the agent to achieve its objective. Considering a common objective of traffic control systems (i.e., minimising total travel time), we define our reward to guide the agent to minimise total time spent (TTS) through the learning process.

TTS is defined as the total time spent by vehicles in the network during a period of time. For our case, TTS can be obtained from the following equation:

$$
\mathrm{TTS} = \Delta t \cdot \sum_{k=0}^{K} \big(q_{i,\mathrm{main}}^{k} + q_{i,\mathrm{on}}^{k}\big).
\tag{11}
$$

In the above equation, $\Delta t$ is a fixed value; therefore, minimising TTS is equivalent to minimising the number of vehicles on the network, $\sum_{k=0}^{K} (q_{i,\mathrm{main}}^{k} + q_{i,\mathrm{on}}^{k})$. To minimise this value, the reward function defined here is composed of two negative rewards used to indicate penalties for vehicles on the mainline and on-ramp. The formal reward function at step $k$ is defined according to two situations.

(1) Motorway Section $i$ Is the Critical Section. Consider

$$
R_i^k(s_i^k, c_i^k) =
\begin{cases}
-\dfrac{q_{i,\mathrm{main}}^{k} + q_{i,\mathrm{on}}^{k}}{q_{i,\mathrm{main}}^{\max} + q_{i,\mathrm{on}}^{\max}} & \text{if } q_{i,\mathrm{main}}^{k} < q_{i,\mathrm{main}}^{\max},\; q_{i,\mathrm{on}}^{k} < q_{i,\mathrm{on}}^{\max}, \\[2ex]
-1 & \text{otherwise},
\end{cases}
\tag{12}
$$

where $R_i^k(s_i^k, c_i^k)$ is the immediate reward for agent $i$ in state $s_i^k$ when executing action $c_i^k$ at control step $k$. $q_{i,\mathrm{main}}^{\max}$ and $q_{i,\mathrm{on}}^{\max}$ are used to normalise the number of vehicles on the mainline and on-ramp, which guarantees that $R_i^k(s_i^k, c_i^k) \in [-1, 0]$.

(2) Motorway Section $i$ Is Not the Critical Section. Here, a new negative reward is introduced to maintain the system equity, that is, to make sure that the on-ramp queues and related travel times at different on-ramps are close to each other:

$$
R_i^k(s_i^k, c_{i-1}^k, c_i^k) =
\begin{cases}
-\dfrac{q_{i,\mathrm{main}}^{k} + q_{i,\mathrm{on}}^{k}}{q_{i,\mathrm{main}}^{\max} + q_{i,\mathrm{on}}^{\max}} - \dfrac{\big|q_{i,\mathrm{on}}^{k} - q_{i-1,\mathrm{on}}^{k}\big|}{\max\big(q_{i,\mathrm{on}}^{\max}, q_{i-1,\mathrm{on}}^{\max}\big)} & \text{if } q_{i,\mathrm{main}}^{k} < q_{i,\mathrm{main}}^{\max},\; q_{i,\mathrm{on}}^{k} < q_{i,\mathrm{on}}^{\max}, \\[2ex]
-2 & \text{otherwise}.
\end{cases}
\tag{13}
$$


For each agent $i$ and episode do
    $L \leftarrow \mathrm{CEIL}(\mathrm{IncidentDuration} / \Delta t)$
    IF $i$ is the critical section
        Initialise $R_i^0(s_i, c_i)$, $Q_i^0(s_i, c_i)$
    ELSE
        Initialise $R_i^0(s_i, c_{i-1}, c_i)$, $Q_i^0(s_i, c_{i-1}, c_i)$, $P_i^0(s_i, c_{i-1})$
    For each control step $k \in K$ do (Loop I)
        (i) get detected data from each cell $j$: $a_{j,\mathrm{main}}^{k}$, $d_{j,\mathrm{main}}^{k}$, $d_{j,\mathrm{off}}^{k}$, $a_{j,\mathrm{on}}^{k}$, $d_{j,\mathrm{on}}^{k}$
        (ii) get state $s_i^k$ through (8) and (9)
        (iii) get action $c_i^k$ by $\varepsilon$-greedy policy (10)
        (iv) get $q_{j,\mathrm{main}}^{k}$, $q_{j,\mathrm{on}}^{k}$ through (6) and do $q_{i,\mathrm{main}}^{k} \leftarrow \sum_{j=1}^{J} q_{j,\mathrm{main}}^{k}$, $q_{i,\mathrm{on}}^{k} \leftarrow q_{j,\mathrm{on}}^{k}$
        IF $i$ is the critical section
            update $R_i^k(s_i^k, c_i^k)$, $Q_i^k(s_i^k, c_i^k)$ through (2) and (12)
        ELSE update $R_i^k(s_i^k, c_{i-1}^k, c_i^k)$, $Q_i^k(s_i^k, c_{i-1}^k, c_i^k)$, $p(c_{i-1}^k \mid s_i^k)$ through (4) and (13)
        IF $s_i^k = s_{\mathrm{initial}}$ and $k + 1 \ge L$, end the algorithm
        ELSE get $a_{i,\mathrm{main}}^{k,k+1}$, $a_{i,\mathrm{on}}^{k,k+1}$, $d_{j,\mathrm{off}}^{k,k+1}$ by (7) and do $l \leftarrow k$, $s_i^l \leftarrow s_i^k$, $q_{j,\mathrm{main}}^{l} \leftarrow q_{j,\mathrm{main}}^{k}$, $q_{j,\mathrm{on}}^{l} \leftarrow q_{j,\mathrm{on}}^{k}$, and start loop II
        For each planning step $l \in L$ do (Loop II)
            (i) generate flow rates for each cell $j$: $d_{j,\mathrm{main}}^{l}$, $d_{j,\mathrm{on}}^{l}$ through (5)
            (ii) get the state $s_i^l$
            (iii) get $q_{j,\mathrm{main}}^{l}$, $q_{j,\mathrm{on}}^{l}$ and do $q_{i,\mathrm{main}}^{l} \leftarrow \sum_{j=1}^{J} q_{j,\mathrm{main}}^{l}$, $q_{i,\mathrm{on}}^{l} \leftarrow q_{j,\mathrm{on}}^{l}$
            (iv) get action $c_i^l$ by $\varepsilon$-greedy policy
            IF $i$ is the critical section
                update $R_i^l(s_i^l, c_i^l)$, $Q_i^l(s_i^l, c_i^l)$
            ELSE update $R_i^l(s_i^l, c_{i-1}^l, c_i^l)$, $Q_i^l(s_i^l, c_{i-1}^l, c_i^l)$, $p(c_{i-1}^k \mid s_i^k)$
            IF ($l = k + 9$) or ($s_i^l = s_{\mathrm{initial}}$ and $l + 1 \ge L$), go back to loop I
            ELSE repeat loop II
        EndFor
    EndFor
EndFor

Algorithm 1: Algorithm for Dyna-MARL.

Compared to (12), a new term, $|q_{i,\mathrm{on}}^{k} - q_{i-1,\mathrm{on}}^{k}| / \max(q_{i,\mathrm{on}}^{\max}, q_{i-1,\mathrm{on}}^{\max})$, is added into (13), which is a penalty for the on-ramp queue difference between motorway sections $i$ and $i-1$. As two adjacent agents cooperate in this situation, $R_i^k(s_i^k, c_{i-1}^k, c_i^k)$ is related to two control actions, $c_{i-1}^k$ and $c_i^k$. $\max(\cdot, \cdot)$ returns the maximum value of two given parameters, which is used for normalisation.
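The two reward definitions (12) and (13) translate directly into code. The sketch below is illustrative only and the function names are hypothetical; the vehicle limits used in the usage note are the section limits quoted in Section 6.1, while the current vehicle counts are made-up inputs.

```python
def reward_critical(q_main, q_on, q_main_max, q_on_max):
    """Reward (12) for the agent of the critical section, always in [-1, 0]."""
    if q_main < q_main_max and q_on < q_on_max:
        return -(q_main + q_on) / (q_main_max + q_on_max)
    return -1.0  # mainline or on-ramp is full


def reward_normal(q_main, q_on, q_on_prev, q_main_max, q_on_max, q_on_prev_max):
    """Reward (13) for a normal section, with the on-ramp queue equity penalty."""
    if q_main < q_main_max and q_on < q_on_max:
        occupancy = (q_main + q_on) / (q_main_max + q_on_max)
        # Penalty for the queue difference between on-ramps i and i-1.
        equity = abs(q_on - q_on_prev) / max(q_on_max, q_on_prev_max)
        return -occupancy - equity
    return -2.0  # mainline or on-ramp is full
```

For instance, `reward_normal(1440, 45, 18, 2880, 90, 108)` combines an occupancy of 0.5 with a queue difference of 27 vehicles normalised by 108, giving a reward of -0.75.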

5.4. Description of the Algorithm. Based on the Dyna-$Q$ architecture and RL elements defined in previous subsections, an algorithm, Dyna-MARL, is developed and described in this subsection. Two main loops, corresponding to direct RL and planning shown in Figure 1, are detailed in Dyna-MARL. Between two real control steps in loop I, 10 planning steps will be run in loop II. The pseudocode of Dyna-MARL can be seen in Algorithm 1.

An episode in Dyna-MARL represents a control cycle, which starts from incident occurrence and terminates when the traffic state returns to the initial state $s_{\mathrm{initial}}$, that is, the traffic state before the incident occurrence. Incident duration is assumed to be known in advance.
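The two-loop structure can be summarised with a simplified single-agent skeleton. It only mirrors the control flow described above (one real step followed by up to 10 planning steps, with the stated termination test); the function name and the callables `real_step`, `model_step`, `choose_action`, and `learn` are hypothetical placeholders for the detector interface, the ACTM, the action selection rule, and the $Q$ update.

```python
def run_episode(state0, real_step, model_step, choose_action, learn,
                incident_steps, max_steps, planning_steps=10):
    """Run one episode and return (number of real updates, number of planned updates).

    real_step / model_step: (state, action) -> (next_state, reward)
    incident_steps: the incident duration expressed in control steps (L)
    """
    state = state0
    real_updates = planned_updates = 0
    for k in range(max_steps):                    # Loop I: real control steps
        action = choose_action(state)
        next_state, reward = real_step(state, action)
        learn(state, action, reward, next_state)  # direct RL update
        real_updates += 1
        state = next_state
        if state == state0 and k + 1 >= incident_steps:
            break                                 # traffic is back to the initial state
        sim_state = state
        for l in range(k, k + planning_steps):    # Loop II: planning with the model
            sim_action = choose_action(sim_state)
            sim_next, sim_reward = model_step(sim_state, sim_action)
            learn(sim_state, sim_action, sim_reward, sim_next)
            planned_updates += 1
            sim_state = sim_next
            if sim_state == state0 and l + 1 >= incident_steps:
                break                             # simulated traffic has recovered
    return real_updates, planned_updates
```

Passing the same `learn` callable to both loops reflects the Dyna idea that real and simulated experience update the same value function.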

6. Case Study and Results

One of the metered motorway segments (southbound direction) of the M6 in the UK is chosen for the case study. This segment is between junction 21A (J21A) and junction 25 (J25), with an approximate length of 12.4 km (see Figure 4). Taking the noncontrolled (NC) situation as the baseline, we designed a series of experiments to compare the proposed Dyna-MARL algorithm with Isolated RL ($Q$-learning without coordination). Experiments and relevant results are described as follows.

6.1. Partitions of the Test Segment. The test motorway segment, with a three-lane mainline, three metered on-ramps, and five off-ramps, is simulated by AIMSUN [39], which is a microscopic traffic simulation package. According to the detector locations and road layout, the whole segment is divided into three sections. Each section contains a metered on-ramp. Motorway section 3 is divided into 4 cells, and motorway sections 1 and 2 are both divided into 3 cells. The partitions of each section can be seen in Figure 5. According to the section length, the maximum number of vehicles in each mainline section and on-ramp is as follows: $q_{1,\mathrm{main}}^{\max} = 1860$, $q_{2,\mathrm{main}}^{\max} = 2880$, $q_{3,\mathrm{main}}^{\max} = 2880$, $q_{1,\mathrm{on}}^{\max} = 108$, $q_{2,\mathrm{on}}^{\max} = 90$, and $q_{3,\mathrm{on}}^{\max} = 120$.

Figure 4: Test motorway segment of M6, from origin O to destination D through junctions J25, J24, J23, J22, and J21A, with on-ramps O1 (from A579), O2 (from A580), and O3 (from A49) and off-ramps D1 (to M62), D2 (to M62), D3 (to A579), D4 (to A580), and D5 (to A58).

Figure 5: Partitions of test segment into Sections 1 to 3 between J21A and J25, showing section boundaries, cell boundaries, road section lengths (unit: km), flow direction, and incident locations a, b, and c.

62 Real Data Source Real detector data collected from 17loop detectors located in the motorway segment (includingboth mainline and on-off-ramps) are used for case studywhich can be extracted from Traffic Information System(HATRIS) [40] These traffic count data are averaged fromApril 2012 to March 2013 with 15-minute intervals Onlyworking day data (from Monday to Friday) are used due tothe dramatic reduction of traffic load in weekends Some ofthe detector data collected frommainline and three on-rampsare presented in Figure 6 from which we can see that twopeak periods including AM peak period (around 070000ndash090000) and PM peak period (around 160000ndash180000)exist during the daily traffic operation

In the test site, ramp metering only works at peak hours. Meanwhile, it is valuable to test the performance of the proposed algorithm in the high demand situation. If it can work under the high traffic load, it should also be useful for common situations. Therefore, the AM peak period with heavy

Figure 6: Real averaged traffic data (15-minute traffic count against time of day for sections 1 to 3 and on-ramps 1 to 3).

traffic load is considered for the case study. Specifically, we use the averaged traffic data during the AM peak period collected from TRADS to estimate the OD (origins and destinations) matrix for the simulation. A model proposed in [41] is adopted by AIMSUN to do the estimation, where the number of iterations


Table 1: OD matrix estimated.

Origins/destinations   D      D5    D4    D3    D2     D1     Totals
O                      2089   375   686   728   1169   771    5818
O3                     875    65    212   193   117    46     1507
O2                     886    0     0     61    315    216    1477
O1                     824    0     0     0     292    226    1343
Totals                 4675   440   898   981   1893   1258   10146

Table 2: Parameters for ACTM.

Parameter   $d^{\max}_{\text{main}}$   $v$        $w$         $\theta$   $\eta$   $\lambda_1$   $\lambda_2$   $\lambda_3$
Value       6300 veh/h                 107 km/h   11.6 km/h   0.5        0.16     0.55          0.9           0.6

is set as 1000 to get convergence. Table 1 shows the OD matrix estimated from real traffic data.

6.3. Incident Scenarios. Considering the difficulty of capturing real incident data, we simulate some incident scenarios in AIMSUN. To make each ramp meter work during the incident, the incident is located near the most downstream motorway section, that is, motorway section 1. Therefore, three incident scenarios A, B, and C are designed, corresponding to three different incident locations a, b, and c (as illustrated in Figure 5), respectively.

The simulation experiment lasts for one and a half hours, from 07:00:00 to 08:30:00, during the AM peak period. After 30 minutes of normal operation (for warm-up), the incident is triggered at 07:30:00 and lasts for 30 minutes. In the preliminary experiments designed in this paper, an incident with one lane blocked is considered. Parameters introduced here can also be regulated for multiple lane-blockage situations. The incident extent is 50 meters, which is assumed to be constant during the incident.

Learning-related parameters are set as typical values [30]; that is, $\alpha$ is 0.2, $\gamma$ is 0.8, and $\varepsilon$ is 0.1. Other parameters related to ACTM are calibrated and summarised in Table 2. All the cells have the same $\theta$ and $\eta$.
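To illustrate the role of the learning rate and discount rate, the standard one-step $Q$-learning update of [30, 31] can be sketched as follows. This is a minimal tabular sketch only: the dictionary-based table, the function name `q_update`, and the example states and reward are illustrative assumptions and do not reproduce the implementation used in the experiments.

```python
from collections import defaultdict

ALPHA, GAMMA = 0.2, 0.8  # learning rate and discount rate quoted in the text


def q_update(q, state, action, reward, next_state, actions,
             alpha=ALPHA, gamma=GAMMA):
    """Apply one tabular Q-learning update and return the new Q value.

    q is a mapping from (state, action) pairs to Q values (hypothetical layout).
    """
    # Best value reachable from the next state under the current estimates.
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    # Move the old estimate a fraction alpha towards the target.
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Hypothetical usage with three metering rates and made-up states.
q_table = defaultdict(float)  # unseen pairs start at 0.0
example_actions = [4, 6, 8]
new_value = q_update(q_table, "s0", 4, -0.5, "s1", example_actions)
```

With a small $\alpha$ such as 0.2, each new observation shifts the stored estimate only one-fifth of the way towards the target, which damps the effect of noisy rewards.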

6.4. Results. The comparison of Dyna-MARL, Isolated RL, and NC is conducted from three aspects: density evolution, some general indicators, and the system equity. The experimental results are described as follows.

(1) Density Evolution. We can see from Figure 7 that four dense areas exist during the traffic operation. Three of them, near on-ramp entrances (motorway length around 0.5 km, 5 km, and 10 km), are caused by heavy traffic loads from on-ramps. The dense area close to the segment end forms due to the incident.

In scenario A, the incident location is close to on-ramp 1 (O1). Without control, this incident leads to severe congestion, which blocks on-ramp 1 and propagates to motorway section 2 (around 9 km in Figure 7(a)). Under this scenario, Isolated RL cannot alleviate incident-induced congestion effectively (see Figure 7(b)). At the beginning of congestion formation, without coordination, only the nearest ramp controller reacts to the congestion. Because of the space limit of the on-ramp, one ramp controller is insufficient to dissolve this congestion, which still propagates to motorway section 2. Dyna-MARL, on the other hand, coordinates all three ramp controllers and makes full use of the storage space of three on-ramps to deal with incident-induced congestion. In this way, mainline congestion can be restricted to a smaller area and will not propagate to motorway section 2 (see Figure 7(c)).

For scenarios B and C, incidents are near the motorway end and far from on-ramp 1. Without blocking on-ramp 1, incidents do not lead to severe congestion. Under such circumstances, both Isolated RL and Dyna-MARL work well on easing congestion in the mainline. As shown in Figures 7(d)–7(i), compared with the NC situation, both Isolated RL and Dyna-MARL can restrict the congestion to a small range near the on-ramp entrances.

(2) General Indicators. In this comparison, some general indicators, including total travel time (should be reduced), total throughput (should be improved), and total CO$_2$ emission (should be reduced), are used to show how the proposed system can benefit road users. These indicators are widely used in the transport community to test the performance of newly developed traffic control systems.

As shown in Figure 8(a), compared with the NC situation, both Isolated RL and Dyna-MARL can reduce the total travel time of road users in all three scenarios. Specifically, Isolated RL decreases total travel time by up to 6.2%, while Dyna-MARL achieves a maximum reduction of 12.2% (see Figure 8(d)). The comparison of total throughput is presented in Figure 8(b). Dyna-MARL can improve the total throughput by up to 2.3% (see Figure 8(d)), which outperforms Isolated RL in all three scenarios. In scenario B, Isolated RL even fails to improve the total throughput. For the comparison of total CO$_2$ emission (shown in Figure 8(c)), both Isolated RL and Dyna-MARL achieve their best performance in scenario B, with reductions of 4.7% and 4.6%, respectively. In scenarios A and C, Dyna-MARL has a much better performance than Isolated RL.

Through the above comparison, we can see that Dyna-MARL outperforms Isolated RL for almost all the scenarios and indicators.

(3) System Equity. Although the general indicators presented in comparison (2) have shown their effectiveness in testing the performance of different systems, they cannot measure the issue of system equity, which is also an important aspect of the system performance. In this paper, we only consider the spatial equity issue, which is defined as a measurement of equity of user delays on different on-ramps [42]. In this study, we assume the road users from all three on-ramps have the same importance. If all users from different on-ramps experience similar travel times, the control system is defined as an equitable system. This term is used to measure the system equity; that is, a large queue difference leads to a highly inequitable system. In [43], the variance of travel time on different on-ramps is used as an indicator to measure

Figure 7: Density profiles (motorway density in veh/km/lane against motorway length in km and absolute time in min) for (a) NC in scenario A, (b) Isolated RL in scenario A, (c) Dyna-MARL in scenario A, (d) NC in scenario B, (e) Isolated RL in scenario B, (f) Dyna-MARL in scenario B, (g) NC in scenario C, (h) Isolated RL in scenario C, and (i) Dyna-MARL in scenario C.

system equity. Similar to [43], for the sake of comparison, the standard deviation is considered in our case. This indicator is defined as

$$\mathrm{SD}(k) = \sqrt{\frac{\sum_{i=1}^{n} \left[\bar{t}^{k} - t_i^{k}\right]^{2}}{n}}, \tag{14}$$

where $\mathrm{SD}(k)$ is the standard deviation of travel time of different on-ramps at time step $k$, $t_i^{k}$ is the estimated total travel time of on-ramp $i$ at step $k$, and $\bar{t}^{k}$ is the averaged total travel time of $n$ on-ramps at step $k$.
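As a small illustration of equation (14), the indicator can be computed from the per-ramp travel times at one time step as sketched below. The function name and the sample values are hypothetical and are not taken from the experiments.

```python
import math


def travel_time_sd(travel_times):
    """Population standard deviation of on-ramp travel times, as in equation (14).

    travel_times holds the estimated total travel time of each on-ramp at one step.
    """
    n = len(travel_times)
    if n == 0:
        raise ValueError("at least one on-ramp is required")
    mean = sum(travel_times) / n  # averaged total travel time over n on-ramps
    return math.sqrt(sum((mean - t) ** 2 for t in travel_times) / n)


# Hypothetical values (hours) for three on-ramps at a single step.
sd_balanced = travel_time_sd([2.0, 2.0, 2.0])    # identical ramps give zero
sd_unbalanced = travel_time_sd([1.0, 3.0, 2.0])  # unequal ramps give a positive value
```

Note that the sum is divided by $n$ rather than $n - 1$, so this is the population form of the standard deviation, matching the equation as written.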

Results of the comparison of system equity can be seen in Figure 9. For the NC situation, good equity can be maintained in scenarios B and C because entering vehicles are not restricted (as shown in Figures 9(b) and 9(c)). However, when one of the on-ramp entrances is blocked by the congestion in scenario A, a long queue forms and leads to imbalance and resultant inequity for users on different on-ramps (see Figure 9(a)). For controlled cases, Isolated RL performs poorly in all scenarios. This is because the ramp controller near the congestion takes much more restrictive measures on the controlled traffic than other controllers. Because of the coordination strategy, Dyna-MARL outperforms Isolated RL on maintaining system equity in all scenarios, especially during the incident (from 07:30:00 to 08:00:00).

Figure 8: Comparison of general measures for NC, Isolated RL, and Dyna-MARL in scenarios A, B, and C: (a) total travel time (h), (b) total throughput (veh), (c) total CO$_2$ emission ($\times 10^{4}$ kg), and (d) reduction from NC (%) in total throughput, total travel time, and total CO$_2$ emission.

7. Conclusions and Future Work

A Dyna-$Q$ based multiagent reinforcement learning method, referred to as Dyna-MARL, for motorway ramp control has been developed in this paper. Dyna-MARL is compared with Isolated RL ($Q$-learning without coordination) and the noncontrolled situation under the simulation environment. Real traffic data collected from a metered motorway segment in the UK are used to form the simulation.

Through a series of simulation-based experiments, we can conclude the following: (1) Isolated RL can improve the motorway performance in terms of increasing total throughput and reducing total travel time and CO$_2$ emission, but this improvement is at the expense of poor system equity on different on-ramps; (2) with a suitable coordination strategy, much higher system equity can be achieved by Dyna-MARL; (3) in addition to the system equity, Dyna-MARL outperforms Isolated RL in almost all scenarios regarding all indicators, which means Dyna-MARL can deal with the network-wide problems effectively.

Although the simulation tests have shown some positive results regarding the performance of Dyna-MARL, only a simplified incident scenario with fixed duration is considered in the current work. In the practical situation, incident duration is highly variable and affected by a number of factors, such as weather conditions, road conditions, and the arrival time of the incident management team. Therefore, incident duration should be considered as an uncertainty, which will be investigated in our future work.

Figure 9: Standard deviation (h) against time of day (07:00:00 to 08:30:00) for NC, Isolated RL, and Dyna-MARL in different scenarios: (a) scenario A, (b) scenario B, and (c) scenario C.


Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This paper is supported by China Scholarship Council and University of Leeds (CSC-University of Leeds scholarship) and partially supported by the National Natural Science Foundation of China (Grant nos. 91420203 and 61271376). The authors would like to thank the institutions that support this study.

References

[1] M. Papageorgiou and A. Kotsialos, “Freeway ramp metering: an overview,” IEEE Transactions on Intelligent Transportation Systems, vol. 3, no. 4, pp. 271–281, 2002.

[2] A. Skabardonis, P. Varaiya, and K. F. Petty, “Measuring recurrent and nonrecurrent traffic congestion,” Transportation Research Record, vol. 1856, pp. 118–124, 2003.

[3] M. Papageorgiou, H. Hadj-Salem, and J.-M. Blosseville, “ALINEA: a local feedback control law for on-ramp metering,” Journal of the Transportation Research Board, vol. 1320, pp. 58–64, 1991.

[4] E. Smaragdis and M. Papageorgiou, “Series of new local ramp metering strategies,” Transportation Research Record, vol. 1856, pp. 74–86, 2003.

[5] L. N. Jacobson, K. C. Henry, and O. Mehyar, “Real-time metering algorithm for centralized control,” Transportation Research Record, vol. 1232, pp. 17–26, 1989.

[6] G. Paesani, J. Kerr, P. Perovich, and F. Khosravi, “System wide adaptive ramp metering (SWARM),” in Proceedings of the 7th ITS America Annual Meeting and Exposition: Merging the Transportation and Communications Revolutions, Washington, DC, USA, June 1997.

[7] R. Lau, Ramp Metering by Zone - The Minnesota Algorithm, Minnesota Department of Transportation, 1997.

[8] H. M. Zhang and W. W. Recker, “On optimal freeway ramp control policies for congested traffic corridors,” Transportation Research Part B: Methodological, vol. 33, no. 6, pp. 417–436, 1999.

[9] A. Kotsialos, M. Papageorgiou, and F. Middelham, “Optimal coordinated ramp metering with advanced motorway optimal control,” Transportation Research Record, no. 1748, pp. 55–65, 2001.

[10] A. Hegyi, B. De Schutter, and H. Hellendoorn, “Model predictive control for optimal coordination of ramp metering and variable speed limits,” Transportation Research Part C: Emerging Technologies, vol. 13, no. 3, pp. 185–209, 2005.

[11] G. Gomes and R. Horowitz, “Optimal freeway ramp metering using the asymmetric cell transmission model,” Transportation Research Part C: Emerging Technologies, vol. 14, no. 4, pp. 244–262, 2006.

[12] A. H. F. Chow and Y. Li, “Robust optimization of dynamic motorway traffic via ramp metering,” IEEE Transactions on Intelligent Transportation Systems, vol. 15, no. 3, pp. 1374–1380, 2014.

[13] R. W. Hall, “Non-recurrent congestion: how big is the problem? Are traveler information systems the solution?” Transportation Research Part C, vol. 1, no. 1, pp. 89–103, 1993.

[14] P. Prevedouros, B. Halkias, K. Papandreou, and P. Kopelias, “Freeway incidents in the United States, United Kingdom, and Attica Tollway, Greece: characteristics, available capacity, and models,” Transportation Research Record, vol. 2047, pp. 57–65, 2008.

[15] T. L. Greenlee and H. J. Payne, “Freeway ramp metering strategies for responding to incidents,” in Proceedings of the IEEE Conference on Decision and Control including the 16th Symposium on Adaptive Processes and a Special Symposium on Fuzzy Set Theory and Applications, pp. 987–992, New Orleans, LA, USA, December 1977.

[16] M. H. Wang, Optimal ramp metering policies for nonrecurring congestion with uncertain incident duration [Ph.D. thesis], Purdue University, West Lafayette, Ind, USA, 1994.

[17] J.-B. Sheu, “Stochastic modeling of the dynamics of incident-induced lane traffic states for incident-responsive local ramp control,” Physica A: Statistical Mechanics and its Applications, vol. 386, no. 1, pp. 365–380, 2007.

[18] J.-B. Sheu and M.-S. Chang, “Stochastic optimal-control approach to automatic incident-responsive coordinated ramp control,” IEEE Transactions on Intelligent Transportation Systems, vol. 8, no. 2, pp. 359–367, 2007.

[19] C. Jacob and B. Abdulhai, “Machine learning for multi-jurisdictional optimal traffic corridor control,” Transportation Research Part A: Policy and Practice, vol. 44, no. 2, pp. 53–64, 2010.

[20] M. Davarynejad, A. Hegyi, J. Vrancken, and J. van den Berg, “Motorway ramp-metering control with queuing consideration using Q-learning,” in Proceedings of the 14th International IEEE Conference on Intelligent Transportation Systems (ITSC ’11), pp. 1652–1658, IEEE, Washington, DC, USA, October 2011.

[21] K. Rezaee, B. Abdulhai, and H. Abdelgawad, “Application of reinforcement learning with continuous state space to ramp metering in real-world conditions,” in Proceedings of the 15th International IEEE Conference on Intelligent Transportation Systems (ITSC ’12), pp. 1590–1595, IEEE, Anchorage, Alaska, USA, September 2012.

[22] C. Jacob and B. Abdulhai, “Automated adaptive traffic corridor control using reinforcement learning: approach and case studies,” Transportation Research Record, vol. 1959, pp. 1–8, 2006.

[23] K. Rezaee, B. Abdulhai, and H. Abdelgawad, “Self-learning adaptive ramp metering: analysis of design parameters on a test case in Toronto, Canada,” Transportation Research Record, vol. 2396, pp. 10–18, 2013.

[24] X.-J. Wang, X.-M. Xi, and G.-F. Gao, “Reinforcement learning ramp metering without complete information,” Journal of Control Science and Engineering, vol. 2012, Article ID 208456, 8 pages, 2012.

[25] A. Fares and W. Gomaa, “Multi-agent reinforcement learning control for ramp metering,” in Progress in Systems Engineering, vol. 330 of Advances in Intelligent Systems and Computing, pp. 167–173, Springer, Basel, Switzerland, 2015.

[26] K. Veljanovska, K. M. Bombol, and T. Maher, “Reinforcement learning technique in multiple motorway access control strategy design,” PROMET-Traffic & Transportation, vol. 22, no. 2, pp. 117–123, 2010.

[27] C. Lu, H. Chen, and S. Grant-Muller, “An indirect reinforcement learning approach for ramp control under incident-induced congestion,” in Proceedings of the 16th International IEEE Conference on Intelligent Transportation Systems (ITSC ’13), pp. 979–984, IEEE, The Hague, The Netherlands, October 2013.


[28] L. Busoniu, R. Babuska, and B. De Schutter, “A comprehensive survey of multiagent reinforcement learning,” IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 38, no. 2, pp. 156–172, 2008.

[29] C. Lu, H. Chen, and S. Grant-Muller, “Indirect reinforcement learning for incident-responsive ramp control,” Procedia - Social and Behavioral Sciences, vol. 111, pp. 1112–1122, 2014.

[30] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, 1998.

[31] C. C. H. Watkins and P. Dayan, “Q-learning,” Machine Learning, vol. 8, no. 3-4, pp. 279–292, 1992.

[32] J. R. Kok and N. Vlassis, “Collaborative multiagent reinforcement learning by payoff propagation,” Journal of Machine Learning Research, vol. 7, pp. 1789–1828, 2006.

[33] C. Guestrin, M. G. Lagoudakis, and R. Parr, “Coordinated reinforcement learning,” in Proceedings of the 19th International Conference on Machine Learning, pp. 227–234, Sydney, Australia, July 2002.

[34] S. El-Tantawy, B. Abdulhai, and H. Abdelgawad, “Multiagent reinforcement learning for integrated network of adaptive traffic signal controllers (MARLIN-ATSC): methodology and large-scale application on downtown Toronto,” IEEE Transactions on Intelligent Transportation Systems, vol. 14, no. 3, pp. 1140–1150, 2013.

[35] L. P. Kaelbling, M. L. Littman, and A. W. Moore, “Reinforcement learning: a survey,” Journal of Artificial Intelligence Research, vol. 4, pp. 237–285, 1996.

[36] C. F. Daganzo, “The cell transmission model: a dynamic representation of highway traffic consistent with the hydrodynamic theory,” Transportation Research Part B: Methodological, vol. 28, no. 4, pp. 269–287, 1994.

[37] J. Haddad, M. Ramezani, and N. Geroliminis, “Cooperative traffic control of a mixed network with two urban regions and a freeway,” Transportation Research Part B: Methodological, vol. 54, pp. 17–36, 2013.

[38] H. Mongeot and J.-B. Lesort, “Analytical expressions of incident-induced flow dynamics perturbations: using macroscopic theory and extension of Lighthill-Whitham theory,” Transportation Research Record, vol. 1710, pp. 58–68, 2000.

[39] Transport Simulation Systems, Aimsun User’s Manual 6.1, TTS, Barcelona, Spain, 2010.

[40] Highways England, “Hatris Homepage,” 2013, https://www.hatris.co.uk.

[41] E. Cascetta, “Estimation of trip matrices from traffic counts and survey data: a generalized least squares estimator,” Transportation Research Part B, vol. 18, no. 4-5, pp. 289–299, 1984.

[42] L. Zhang and D. Levinson, “Balancing efficiency and equity of ramp meters,” Journal of Transportation Engineering, vol. 131, no. 6, pp. 477–481, 2005.

[43] A. Kotsialos and M. Papageorgiou, “Efficiency and equity properties of freeway network-wide ramp metering with AMOC,” Transportation Research Part C: Emerging Technologies, vol. 12, no. 6, pp. 401–420, 2004.


4.2. Modified ACTM. Given three incident-related parameters, the traffic dynamics in each cell can be derived from the fundamental diagram illustrated in Figure 3(b) and represented by the following equations.

Departure rates of the mainline and on-ramp:

$$d^{k}_{j,\text{main}} = \min\left\{ \lambda_1 \cdot \frac{v_j}{l_j} \cdot \left(q^{k}_{j,\text{main}} + \theta_j \cdot d^{k}_{j,\text{on}} \cdot \Delta t - d^{k}_{j,\text{off}} \cdot \Delta t\right),\; \lambda_2 \cdot \frac{w_{j-1}}{l_j} \cdot \left(q^{\max}_{j-1,\text{main}} - q^{k}_{j-1,\text{main}} - \theta_j \cdot d^{k}_{j-1,\text{on}} \cdot \Delta t\right),\; \lambda_3 \cdot d^{k,\max}_{j,\text{main}} \right\},$$

$$d^{k}_{j,\text{on}} = \begin{cases} \min\left\{ \dfrac{q^{k}_{j,\text{main}} + a^{k}_{j,\text{on}} \cdot \Delta t}{\Delta t},\; \eta_j \cdot \dfrac{q^{\max}_{j,\text{main}} - q^{k}_{j,\text{main}}}{\Delta t},\; \dfrac{c^{k}_{i}}{\Delta t} \right\} & \text{if } j \text{ is metered on-ramp cell,} \\[2ex] \min\left\{ \dfrac{q^{k}_{j,\text{main}} + a^{k}_{j,\text{on}} \cdot \Delta t}{\Delta t},\; \eta_j \cdot \dfrac{q^{\max}_{j,\text{main}} - q^{k}_{j,\text{main}}}{\Delta t} \right\} & \text{if } j \text{ is unmetered on-ramp cell.} \end{cases} \tag{5}$$
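The mainline part of equation (5) takes the minimum of three bounds, each scaled by one incident-related parameter. A minimal sketch of that computation is given below. The function and argument names are hypothetical, the speeds are assumed to be expressed in km/min and the cell length in km so that the result is in veh/min, and `d_max` stands for the capacity term $d^{k,\max}_{j,\text{main}}$.

```python
def mainline_departure_rate(q_main, d_on, d_off,
                            q_down, q_down_max, d_on_down,
                            v, w_down, cell_length, theta, d_max, dt,
                            lam1=1.0, lam2=1.0, lam3=1.0):
    """Mainline departure rate of cell j, first line of equation (5).

    Arguments with the suffix _down refer to cell j-1 in the equation.
    Units are assumptions: speeds in km/min, length in km, rates in veh/min.
    """
    # First bound: vehicles in cell j, adjusted for ramp flows, moving at speed v.
    sending = lam1 * v / cell_length * (q_main + theta * d_on * dt - d_off * dt)
    # Second bound: remaining space in cell j-1, released at wave speed w.
    receiving = lam2 * w_down / cell_length * (
        q_down_max - q_down - theta * d_on_down * dt)
    # Third bound: maximum departure rate, reduced by the incident parameter.
    capacity = lam3 * d_max
    return min(sending, receiving, capacity)


# Hypothetical numbers chosen only to make the three bounds easy to read.
rate = mainline_departure_rate(q_main=40, d_on=4, d_off=2,
                               q_down=100, q_down_max=200, d_on_down=0,
                               v=1.0, w_down=0.5, cell_length=1.0,
                               theta=0.5, d_max=105, dt=1.0)
```

Setting all three parameters to 1 recovers the bounds without any incident effect; lowering any of them tightens the corresponding bound.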

Conservation of the mainline and on-ramp:

$$q^{k+1}_{j,\text{main}} = q^{k}_{j,\text{main}} + \Delta t \cdot \left(a^{k}_{j,\text{main}} + d^{k}_{j,\text{on}} - d^{k}_{j,\text{main}} - d^{k}_{j,\text{off}}\right),$$

$$q^{k+1}_{j,\text{on}} = q^{k}_{j,\text{on}} + \Delta t \cdot \left(a^{k}_{j,\text{on}} - d^{k}_{j,\text{on}}\right), \tag{6}$$

where $a^{k}_{j,\text{main}}$ and $d^{k}_{j,\text{main}}$ are the mainline arrival and departure rates for cell $j$ at step $k$; $a^{k}_{j,\text{on}}$ and $d^{k}_{j,\text{on}}$ are the on-ramp arrival and departure rates in cell $j$ at step $k$; $d^{k}_{j,\text{off}}$ is the off-ramp departure rate for cell $j$ at step $k$ (if cell $j$ is not an off-ramp cell, $d^{k}_{j,\text{off}} = 0$); $q^{k}_{j,\text{main}}$ represents the number of vehicles on the mainline of cell $j$ at step $k$; $q^{\max}_{j,\text{main}}$ is the maximum of this value, limited by the mainline space of cell $j$. Similarly, $q^{k}_{j,\text{on}}$ and $q^{\max}_{j,\text{on}}$ denote the current (at step $k$) and maximum number of vehicles in the on-ramp of cell $j$, respectively. $\Delta t$ (min) is the time duration between two consecutive time steps; $c^{k}_{i}$ is the metering rate for the on-ramp cell of the $i$th motorway section at step $k$; $\eta_j \in [0, 1]$ is the flow allocation parameter of cell $j$; $\theta_j \in [0, 1]$ is the flow blending parameter of traffic flow from the on-ramp to the mainline of cell $j$. The unit of all the arrival and departure rates is modified to veh/min in this study.
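The conservation relations in equation (6) amount to a simple bookkeeping step per cell, sketched below under the assumption that all rates are already given in veh/min and $\Delta t$ in minutes. The function name and the sample numbers are hypothetical.

```python
def update_cell(q_main, q_on, a_main, a_on, d_main, d_on, d_off, dt):
    """Advance the vehicle counts of one cell by one time step (equation (6)).

    Returns the new mainline and on-ramp vehicle counts as a tuple.
    """
    # Mainline gains mainline arrivals and on-ramp inflow, loses its departures.
    q_main_next = q_main + dt * (a_main + d_on - d_main - d_off)
    # On-ramp gains its arrivals and loses the vehicles released to the mainline.
    q_on_next = q_on + dt * (a_on - d_on)
    return q_main_next, q_on_next


# Hypothetical step: 100 vehicles on the mainline, 10 queued on the ramp.
next_main, next_on = update_cell(q_main=100, q_on=10, a_main=30, a_on=5,
                                 d_main=28, d_on=4, d_off=2, dt=1.0)
```

A quick consistency check is that the total number of vehicles in the cell changes only by external arrivals minus mainline and off-ramp departures, because the on-ramp release $d^{k}_{j,\text{on}}$ cancels between the two relations.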

For motorway section $i$ with $J$ cells, the number of vehicles in the mainline can be calculated by $q^{k}_{i,\text{main}} = \sum_{j=1}^{J} q^{k}_{j,\text{main}}$, while the number of vehicles in the on-ramp of motorway section $i$ is given by $q^{k}_{i,\text{on}} = q^{k}_{j,\text{on}}$, with $j$ the on-ramp cell of the section. In this way, the maximum number of vehicles in the mainline and on-ramp of motorway section $i$ is given by $q^{\max}_{i,\text{main}} = \sum_{j=1}^{J} q^{\max}_{j,\text{main}}$ and $q^{\max}_{i,\text{on}} = q^{\max}_{j,\text{on}}$.

4.3. Estimation of Arrival and Departure Rates. Arrival rates of the boundary cells in each motorway section (such as $j + 2$, $j + 5$, and $j + 8$) and all the on-ramps, as well as the departure rates of off-ramps, are inputs of the ACTM for each planning step between two real control steps. Considering the short time of the planning process (10 steps), we assume these rates remain stable during the planning and estimate them directly from the recent flow data collected from detectors. The method described by Wang [16] is used here to do the estimation, which simply averages the most recently observed data to get the predicted flow rates. In our model, we use the flow data collected from the last $N$ time steps ($N = 5$). Therefore, these three rates can be calculated by

$$a^{k,k+1}_{i,\text{main}} = a^{k,k+1}_{j,\text{main}} = \frac{\sum_{n=0}^{N-1} a^{k-n}_{j,\text{main}}}{N}, \quad \text{if } j \text{ is the boundary cell,}$$

$$a^{k,k+1}_{i,\text{on}} = a^{k,k+1}_{j,\text{on}} = \frac{\sum_{n=0}^{N-1} a^{k-n}_{j,\text{on}}}{N}, \quad \text{if } j \text{ is the on-ramp cell,}$$

$$d^{k,k+1}_{j,\text{off}} = \frac{\sum_{n=0}^{N-1} d^{k-n}_{j,\text{off}}}{N}, \quad \text{if } j \text{ is the off-ramp cell,} \tag{7}$$

where $a^{k,k+1}_{j,\text{main}}$ and $a^{k,k+1}_{j,\text{on}}$ are the estimated arrival rates of the mainline and on-ramp of cell $j$ for the planning steps between real steps $k$ and $k + 1$, and $d^{k,k+1}_{j,\text{off}}$ is the estimated off-ramp departure rate of cell $j$. If cell $j$ is the boundary cell of motorway section $i$, the arrival or departure rate of this cell is also the arrival or departure rate of motorway section $i$.
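Each line of equation (7) is a moving average over the last $N$ observations. One way to maintain such an estimate online is sketched below. The class name is hypothetical, and averaging over fewer than $N$ samples during start-up is an assumption of this sketch, not something stated in the text.

```python
from collections import deque


class RateEstimator:
    """Moving average of the most recent detector rates, as in equation (7)."""

    def __init__(self, window=5):
        # A bounded deque silently drops the oldest sample once it is full.
        self.samples = deque(maxlen=window)

    def observe(self, rate):
        """Store the rate measured at the latest real time step."""
        self.samples.append(rate)

    def estimate(self):
        """Return the average of the stored rates (fewer than N at start-up)."""
        if not self.samples:
            raise ValueError("no observations yet")
        return sum(self.samples) / len(self.samples)


# Hypothetical on-ramp arrival rates (veh/min) over six consecutive steps.
estimator = RateEstimator(window=5)
for observed in [10, 20, 30, 40, 50, 60]:
    estimator.observe(observed)
predicted = estimator.estimate()  # average of the last five values
```

One estimator instance would be kept per boundary cell, on-ramp, and off-ramp, and its estimate held constant over the planning steps between two real control steps.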

5. Definition of RL Elements

Apart from the architecture and models defined in Section 3, three basic elements, namely, environment state, control action, and reward function, should be specified to form an RL problem. This section details these three elements and the relevant algorithm.

5.1. Environment State. Environment states of a motorway section are composed of mainline states and on-ramp states. The same method mentioned in [27, 29] is used here to obtain the state space. Generally, for the mainline of motorway section $i$, the number of vehicles ranges from 0 to the maximum number $q^{\max}_{i,\text{main}}$, which is uniformly divided into $n_i$ intervals. Each interval represents a state of the mainline. Therefore, each mainline section can be represented by a state set $S_{i,\text{main}}$ with $n_i$ states. Similarly, on-ramp traffic is represented by a state set $S_{i,\text{on}}$ with $m_i$ states according to the maximum number of vehicles $q^{\max}_{i,\text{on}}$. $n_i$ and $m_i$ should be adjusted for different motorway sections according to the section length. In this way, if motorway section $i$ is the critical section, the external traffic environment is represented by

$$S_i = S_{i,\text{main}} \times S_{i,\text{on}}, \quad s^{k}_{i} \in S_i, \tag{8}$$

which contains $n_i \cdot m_i$ states. At each time step, a state $s^{k}_{i}$ will be selected from $S_i$ as the environment state. If motorway section $i$ is a normal section, state sets of its neighbour agent should be incorporated. Thus, the traffic state is represented by

$$S_i = S_{i,\text{main}} \times S_{i,\text{on}} \times S_{i-1,\text{main}} \times S_{i-1,\text{on}}, \quad s^{k}_{i} \in S_i, \tag{9}$$

which contains $n_i \cdot m_i \cdot n_{i-1} \cdot m_{i-1}$ states.
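The uniform discretisation and the product state sets of equations (8) and (9) can be sketched as follows. The helper names, the interval counts ($n_i = 10$, $m_i = 4$), and the choice to flatten the component indices into a single integer are illustrative assumptions; only the maxima 1860 and 108 are taken from the case study.

```python
def interval_index(q, q_max, n_intervals):
    """Map a vehicle count in [0, q_max] to one of n uniform intervals."""
    if q_max <= 0 or n_intervals <= 0:
        raise ValueError("q_max and n_intervals must be positive")
    q = min(max(q, 0), q_max)  # clamp counts outside the valid range
    # The count q_max itself falls into the last interval.
    return min(int(q * n_intervals / q_max), n_intervals - 1)


def encode_state(indices, sizes):
    """Flatten component indices into one state number (mixed-radix code)."""
    state = 0
    for index, size in zip(indices, sizes):
        state = state * size + index
    return state


# Critical section, equation (8): mainline and on-ramp components only.
n_i, m_i = 10, 4  # hypothetical numbers of intervals
main_idx = interval_index(930, 1860, n_i)
on_idx = interval_index(60, 108, m_i)
state_id = encode_state((main_idx, on_idx), (n_i, m_i))
```

For a normal section, the same `encode_state` call would take four indices and four sizes, giving $n_i \cdot m_i \cdot n_{i-1} \cdot m_{i-1}$ distinct state numbers as in equation (9).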

5.2. Control Action. In a ramp control problem, the aim of the control action is to regulate the number of vehicles entering the mainline in each control step. Similar to [29], we adopt flow control as the control action, which can be represented by an action set $C = \{4, 6, 8, 10, 12, 14, 16, 18, 20\}$ with 9 flow rates between the minimum (4 veh/min) and maximum (20 veh/min) values.

Exploitation and exploration are two basic behaviours of the RL agent. Exploitation means the agent takes the control action that has yielded the most reward in its previous experience. Exploration, instead, means the agent tries new actions whose estimated rewards are lower. In order to balance these two behaviours, we use the $\varepsilon$-greedy policy to select control actions [30]. Specifically, this policy takes a random action with probability $\varepsilon$ and chooses the greedy action (with the maximum $Q$ value) with probability $1 - \varepsilon$ for each control step.

The action selection probability can be formally expressed as

$$p\left(c^{k}_{i} \mid s^{k}_{i}\right) = \begin{cases} 1 - \varepsilon & \text{if } c^{k}_{i} = \arg\max_{c^{k}_{i}} \left(Q^{k-1}\left(s^{k}_{i}, c^{k}_{i}\right)\right), \\ \varepsilon & \text{otherwise.} \end{cases} \tag{10}$$
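A minimal sketch of $\varepsilon$-greedy selection over the action set $C$ is given below. It follows the common textbook form [30], in which the random action is drawn uniformly from all of $C$ (so the greedy action can also be picked during exploration); this detail, the function name, and the dictionary of $Q$ values are assumptions of the sketch.

```python
import random

ACTIONS = (4, 6, 8, 10, 12, 14, 16, 18, 20)  # metering rates in veh/min, set C


def select_action(q_values, epsilon=0.1, rng=random):
    """Pick a metering rate with the epsilon-greedy rule.

    q_values maps each action to its current Q value for the present state;
    actions missing from the mapping are treated as having value 0.0.
    """
    if rng.random() < epsilon:
        # Explore: uniform draw over the whole action set.
        return rng.choice(ACTIONS)
    # Exploit: action with the largest Q value (first one wins on ties).
    return max(ACTIONS, key=lambda c: q_values.get(c, 0.0))


# Hypothetical Q values for one state; a seeded generator keeps runs repeatable.
example_q = {8: -0.40, 12: -0.25, 16: -0.55}
chosen = select_action(example_q, epsilon=0.1, rng=random.Random(0))
```

Passing the random generator as an argument is a design choice that makes the selection reproducible in tests without touching the global random state.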

53 Reward Function Reward function is used to calculatethe immediate reward after executing a specific action at eachtime step which guides the agent to achieve its objectiveConsidering a common objective of traffic control system(ie minimising total travel time) we define our reward toguide the agent to minimise total time spent (TTS) throughlearning process

TTS is defined as the total time spent by vehicles in thenetwork during a period of time For our case TTS can beobtained from the following equation

TTS = Δ119905 sdot119870

sum

119896=0

(119902119896

119894main + 119902119896

119894on) (11)

In the above equation, $\Delta t$ is a fixed value; therefore, minimising TTS is equivalent to minimising the number of vehicles on the network, $\sum_{k=0}^{K}\left(q_{i,\mathrm{main}}^{k} + q_{i,\mathrm{on}}^{k}\right)$. To minimise this value, the reward function defined here is composed of two negative rewards used to indicate penalties for vehicles on the mainline and on-ramp. The formal reward function at step $k$ is defined according to two situations.
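As a small illustration of (11), the following Python sketch accumulates TTS from per-step vehicle counts. The function name `total_time_spent` and the choice of hours as the unit of $\Delta t$ are assumptions for this example, not details given in the paper.

```python
def total_time_spent(q_main, q_on, dt_hours):
    """TTS as in (11): step length times the summed vehicle counts.

    q_main and q_on hold the number of vehicles on the mainline and on the
    on-ramp at each control step k = 0..K (assumed to be equal-length lists).
    """
    if len(q_main) != len(q_on):
        raise ValueError("q_main and q_on must cover the same control steps")
    return dt_hours * sum(m + o for m, o in zip(q_main, q_on))
```

Because `dt_hours` is a constant factor, comparing two control strategies by TTS gives the same ranking as comparing them by the summed vehicle counts alone.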

(1) Motorway Section $i$ Is the Critical Section. Consider

$$
R_i^k\left(s_i^k, c_i^k\right) =
\begin{cases}
-\dfrac{q_{i,\mathrm{main}}^{k} + q_{i,\mathrm{on}}^{k}}{q_{i,\mathrm{main}}^{\max} + q_{i,\mathrm{on}}^{\max}}, & \text{if } q_{i,\mathrm{main}}^{k} < q_{i,\mathrm{main}}^{\max},\ q_{i,\mathrm{on}}^{k} < q_{i,\mathrm{on}}^{\max}, \\
-1, & \text{otherwise,}
\end{cases}
\tag{12}
$$

where $R_i^k(s_i^k, c_i^k)$ is the immediate reward for agent $i$ in state $s_i^k$ when executing action $c_i^k$ at control step $k$. $q_{i,\mathrm{main}}^{\max}$ and $q_{i,\mathrm{on}}^{\max}$ are used to normalise the number of vehicles on the mainline and on-ramp, which guarantees that $R_i^k(s_i^k, c_i^k) \in [-1, 0]$.

(2) Motorway Section $i$ Is Not the Critical Section. Here a new negative reward is introduced to maintain system equity, that is, to make sure that the on-ramp queues and related travel times at different on-ramps are close to each other:

$$
R_i^k\left(s_i^k, c_{i-1}^k, c_i^k\right) =
\begin{cases}
-\dfrac{q_{i,\mathrm{main}}^{k} + q_{i,\mathrm{on}}^{k}}{q_{i,\mathrm{main}}^{\max} + q_{i,\mathrm{on}}^{\max}} - \dfrac{\left|q_{i,\mathrm{on}}^{k} - q_{i-1,\mathrm{on}}^{k}\right|}{\max\left(q_{i,\mathrm{on}}^{\max}, q_{i-1,\mathrm{on}}^{\max}\right)}, & \text{if } q_{i,\mathrm{main}}^{k} < q_{i,\mathrm{main}}^{\max},\ q_{i,\mathrm{on}}^{k} < q_{i,\mathrm{on}}^{\max}, \\
-2, & \text{otherwise.}
\end{cases}
\tag{13}
$$


For each agent $i$ and episode do
  $L \leftarrow \mathrm{CEIL}(\mathrm{IncidentDuration} / \Delta t)$
  IF $i$ is the critical section
    Initialise $R_i^0(s_i, c_i)$, $Q_i^0(s_i, c_i)$
  ELSE
    Initialise $R_i^0(s_i, c_{i-1}, c_i)$, $Q_i^0(s_i, c_{i-1}, c_i)$, $P_i^0(s_i, c_{i-1})$
  For each control step $k \in K$ do (Loop I)
    (i) get detected data from each cell $j$: $a_{j,\mathrm{main}}^k$, $d_{j,\mathrm{main}}^k$, $d_{j,\mathrm{off}}^k$, $a_{j,\mathrm{on}}^k$, $d_{j,\mathrm{on}}^k$
    (ii) get state $s_i^k$ through (8) and (9)
    (iii) get action $c_i^k$ by $\varepsilon$-greedy policy (10)
    (iv) get $q_{j,\mathrm{main}}^k$, $q_{j,\mathrm{on}}^k$ through (6) and do $q_{i,\mathrm{main}}^k \leftarrow \sum_{j=1}^{J} q_{j,\mathrm{main}}^k$, $q_{i,\mathrm{on}}^k \leftarrow q_{j,\mathrm{on}}^k$
    IF $i$ is the critical section
      update $R_i^k(s_i^k, c_i^k)$, $Q_i^k(s_i^k, c_i^k)$ through (2) and (12)
    ELSE
      update $R_i^k(s_i^k, c_{i-1}^k, c_i^k)$, $Q_i^k(s_i^k, c_{i-1}^k, c_i^k)$, $p(c_{i-1}^k \mid s_i^k)$ through (4) and (13)
    IF $s_i^k = s_{\mathrm{initial}}$ and $k + 1 \ge L$
      end the algorithm
    ELSE
      get $a_{i,\mathrm{main}}^{k,k+1}$, $a_{i,\mathrm{on}}^{k,k+1}$, $d_{j,\mathrm{off}}^{k,k+1}$ by (7) and do $l \leftarrow k$, $s_i^l \leftarrow s_i^k$, $q_{j,\mathrm{main}}^l \leftarrow q_{j,\mathrm{main}}^k$, $q_{j,\mathrm{on}}^l \leftarrow q_{j,\mathrm{on}}^k$ and start loop II
    For each planning step $l \in L$ do (Loop II)
      (i) generate flow rates for each cell $j$: $d_{j,\mathrm{main}}^l$, $d_{j,\mathrm{on}}^l$ through (5)
      (ii) get the state $s_i^l$
      (iii) get $q_{j,\mathrm{main}}^l$, $q_{j,\mathrm{on}}^l$ and do $q_{i,\mathrm{main}}^l \leftarrow \sum_{j=1}^{J} q_{j,\mathrm{main}}^l$, $q_{i,\mathrm{on}}^l \leftarrow q_{j,\mathrm{on}}^l$
      (iv) get action $c_i^l$ by $\varepsilon$-greedy policy
      IF $i$ is the critical section
        update $R_i^l(s_i^l, c_i^l)$, $Q_i^l(s_i^l, c_i^l)$
      ELSE
        update $R_i^l(s_i^l, c_{i-1}^l, c_i^l)$, $Q_i^l(s_i^l, c_{i-1}^l, c_i^l)$, $p(c_{i-1}^k \mid s_i^k)$
      IF ($l = k + 9$) or ($s_i^l = s_{\mathrm{initial}}$ and $l + 1 \ge L$)
        go back to loop I
      ELSE
        repeat loop II
    EndFor
  EndFor
EndFor

Algorithm 1: Algorithm for Dyna-MARL.

Compared to (12), a new term $\left|q_{i,\mathrm{on}}^{k} - q_{i-1,\mathrm{on}}^{k}\right| / \max\left(q_{i,\mathrm{on}}^{\max}, q_{i-1,\mathrm{on}}^{\max}\right)$ is added into (13), which is a penalty for the on-ramp queue difference between motorway sections $i$ and $i-1$. As two adjacent agents cooperate in this situation, $R_i^k(s_i^k, c_{i-1}^k, c_i^k)$ is related to two control actions, $c_{i-1}^k$ and $c_i^k$. $\max(\cdot, \cdot)$ returns the maximum value of two given parameters, which is used for normalisation.
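The two reward cases in (12) and (13) can be sketched in Python as follows. The function names `reward_critical` and `reward_normal` and the argument names (with `nb` standing for the neighbouring section $i-1$) are hypothetical and chosen only for this illustration.

```python
def reward_critical(q_main, q_on, q_main_max, q_on_max):
    """Reward (12): normalised vehicle-count penalty, bounded in [-1, 0]."""
    if q_main < q_main_max and q_on < q_on_max:
        return -(q_main + q_on) / (q_main_max + q_on_max)
    return -1.0


def reward_normal(q_main, q_on, q_on_nb, q_main_max, q_on_max, q_on_nb_max):
    """Reward (13): the penalty of (12) plus an on-ramp queue-difference term.

    q_on_nb and q_on_nb_max describe the on-ramp of the neighbouring
    section i-1; the second term penalises unequal on-ramp queues.
    """
    if q_main < q_main_max and q_on < q_on_max:
        load_penalty = (q_main + q_on) / (q_main_max + q_on_max)
        equity_penalty = abs(q_on - q_on_nb) / max(q_on_max, q_on_nb_max)
        return -load_penalty - equity_penalty
    return -2.0
```

For example, a half-full critical section gives `reward_critical(930, 54, 1860, 108)`, which evaluates to -0.5, while any section at or above its capacity receives the fixed lower bound (-1 or -2).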

5.4. Description of the Algorithm. Based on the Dyna-$Q$ architecture and the RL elements defined in the previous subsections, an algorithm, Dyna-MARL, is developed and described in this subsection. The two main loops, corresponding to direct RL and planning as shown in Figure 1, are detailed in Dyna-MARL. Between two real control steps in loop I, 10 planning steps are run in loop II. The pseudocode of Dyna-MARL is given in Algorithm 1.

An episode in Dyna-MARL represents a control cycle, which starts from the incident occurrence and terminates when the traffic state returns to the initial state $s_{\mathrm{initial}}$, that is, the traffic state before the incident occurrence. Incident duration is assumed to be known in advance.
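The interplay of the two loops can be sketched, for the single-agent (critical section) case only, as one real control step followed by 10 planning steps. This is a simplified illustration under stated assumptions, not the authors' implementation: `real_step` and `model_step` are hypothetical callables returning `(reward, next_state)`, with `model_step` standing in for the traffic flow model used for planning; the $Q$ update is assumed to take the standard $Q$-learning form; and the termination tests of Algorithm 1 and the coordination terms for non-critical sections are omitted.

```python
import random
from collections import defaultdict

ACTIONS = [4, 6, 8, 10, 12, 14, 16, 18, 20]  # flow rates, veh/min
ALPHA, GAMMA, EPSILON = 0.2, 0.8, 0.1        # values reported in Section 6.3
PLANNING_STEPS = 10                          # planning steps per real step


def choose(q, s, rng):
    """Epsilon-greedy action selection over the tabular Q function."""
    if rng.random() < EPSILON:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda c: q[(s, c)])


def q_update(q, s, c, r, s_next):
    """Standard Q-learning update (assumed form of the paper's update rule)."""
    best_next = max(q[(s_next, c2)] for c2 in ACTIONS)
    q[(s, c)] += ALPHA * (r + GAMMA * best_next - q[(s, c)])


def dyna_step(q, s, real_step, model_step, rng):
    """One real control step (loop I) followed by planning steps (loop II)."""
    c = choose(q, s, rng)
    r, s_next = real_step(s, c)          # direct RL from the real environment
    q_update(q, s, c, r, s_next)
    s_plan = s                           # planning restarts from the state at step k
    for _ in range(PLANNING_STEPS):
        c_plan = choose(q, s_plan, rng)
        r_plan, s_plan_next = model_step(s_plan, c_plan)  # simulated experience
        q_update(q, s_plan, c_plan, r_plan, s_plan_next)
        s_plan = s_plan_next
    return s_next
```

A caller would keep `q = defaultdict(float)` across control steps and feed the returned state back into the next call to `dyna_step`.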

6. Case Study and Results

One of the metered motorway segments (southbound direction) of the M6 in the UK is chosen for the case study. This segment is between junction 21A (J21A) and junction 25 (J25), with an approximate length of 12.4 km (see Figure 4). Taking the noncontrolled (NC) situation as the baseline, we designed a series of experiments to compare the proposed Dyna-MARL algorithm with Isolated RL ($Q$-learning without coordination). The experiments and relevant results are described as follows.

6.1. Partitions of the Test Segment. The test motorway segment, with a three-lane mainline, three metered on-ramps, and five off-ramps, is simulated in AIMSUN [39], which is a microscopic traffic simulation package. According to the detector locations and road layout, the whole segment is divided into three sections. Each section contains a metered on-ramp. Motorway section 3 is divided into 4 cells, and motorway sections 1 and 2 are both divided into 3 cells. The partitions of each section can be seen in Figure 5. According to the section length, the maximum numbers of vehicles in each mainline section and on-ramp are as follows: $q_{1,\mathrm{main}}^{\max} = 1860$, $q_{2,\mathrm{main}}^{\max} = 2880$, $q_{3,\mathrm{main}}^{\max} = 2880$, $q_{1,\mathrm{on}}^{\max} = 108$, $q_{2,\mathrm{on}}^{\max} = 90$, and $q_{3,\mathrm{on}}^{\max} = 120$.

Figure 4: Test motorway segment of M6 (junctions J21A to J25; origins O, O1 (from A579), O2 (from A580), and O3 (from A49); destinations D, D1 and D2 (to M62), D3 (to A579), D4 (to A580), and D5 (to A58)).

Figure 5: Partitions of test segment (sections 1 to 3 between J21A and J25, showing section boundaries, cell boundaries, incident locations a, b, and c, road section lengths in km, and flow direction).

6.2. Real Data Source. Real detector data collected from 17 loop detectors located in the motorway segment (including the mainline and the on- and off-ramps) are used for the case study; these data can be extracted from the Traffic Information System (HATRIS) [40]. These traffic count data are averaged from April 2012 to March 2013 with 15-minute intervals. Only working day data (from Monday to Friday) are used because of the dramatic reduction of traffic load at weekends. Some of the detector data collected from the mainline and three on-ramps are presented in Figure 6, from which we can see that two peak periods, the AM peak period (around 07:00:00–09:00:00) and the PM peak period (around 16:00:00–18:00:00), exist during the daily traffic operation.

In the test site, ramp metering only works at peak hours. Meanwhile, it is valuable to test the performance of the proposed algorithm in the high demand situation. If it can work under the high traffic load, it should also be useful for common situations. Therefore, the AM peak period with heavy traffic load is considered for the case study. Specifically, we use the averaged traffic data during the AM peak period collected from TRADS to estimate the OD (origins and destinations) matrix for the simulation. A model proposed in [41] is adopted by AIMSUN to do the estimation, where the number of iterations is set as 1000 to get convergence. Table 1 shows the OD matrix estimated from real traffic data.

Figure 6: Real averaged traffic data (15-minute traffic count against time of day for sections 1 to 3 and on-ramps 1 to 3).

Table 1: OD matrix estimated.

| Origins/destinations | D | D5 | D4 | D3 | D2 | D1 | Totals |
| --- | --- | --- | --- | --- | --- | --- | --- |
| O | 2089 | 375 | 686 | 728 | 1169 | 771 | 5818 |
| O3 | 875 | 65 | 212 | 193 | 117 | 46 | 1507 |
| O2 | 886 | 0 | 0 | 61 | 315 | 216 | 1477 |
| O1 | 824 | 0 | 0 | 0 | 292 | 226 | 1343 |
| Totals | 4675 | 440 | 898 | 981 | 1893 | 1258 | 10146 |

Table 2: Parameters for ACTM.

| Parameter | $d_{\mathrm{main}}^{\max}$ | $v$ | $w$ | $\theta$ | $\eta$ | $\lambda_1$ | $\lambda_2$ | $\lambda_3$ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Value | 6300 veh/h | 107 km/h | 11.6 km/h | 0.5 | 0.16 | 0.55 | 0.9 | 0.6 |

6.3. Incident Scenarios. Considering the difficulty of capturing real incident data, we simulate some incident scenarios in AIMSUN. To make each ramp meter work during the incident, the incident is located near the most downstream motorway section, that is, motorway section 1. Therefore, three incident scenarios, A, B, and C, are designed, corresponding to three different incident locations a, b, and c (as illustrated in Figure 5), respectively.

The simulation experiment lasts for one and a half hours, from 07:00:00 to 08:30:00, during the AM peak period. After 30 minutes of normal operation (for warm-up), the incident is triggered at 07:30:00 and lasts for 30 minutes. In the preliminary experiments designed in this paper, an incident with one lane blocked is considered. The parameters introduced here can also be adjusted for multiple lane-blockage situations. The incident extent is 50 meters, which is assumed to be constant during the incident.

Learning-related parameters are set to typical values [30], that is, $\alpha$ is 0.2, $\gamma$ is 0.8, and $\varepsilon$ is 0.1. Other parameters related to ACTM are calibrated and summarised in Table 2. All the cells have the same $\theta$ and $\eta$.

6.4. Results. The comparison of Dyna-MARL, Isolated RL, and NC is conducted from three aspects: density evolution, some general indicators, and the system equity. The experimental results are described as follows.

(1) Density Evolution. We can see from Figure 7 that four dense areas exist during the traffic operation. Three of them, near the on-ramp entrances (motorway length around 0.5 km, 5 km, and 10 km), are caused by heavy traffic loads from the on-ramps. The dense area close to the segment end forms because of the incident.

In scenario A, the incident location is close to on-ramp 1 (O1). Without control, this incident leads to severe congestion, which blocks on-ramp 1 and propagates to motorway section 2 (around 9 km in Figure 7(a)). Under this scenario, Isolated RL cannot alleviate incident-induced congestion effectively (see Figure 7(b)). At the beginning of congestion formation, without coordination, only the nearest ramp controller reacts to the congestion. Because of the limited storage space of the on-ramp, one ramp controller is insufficient to dissolve this congestion, which still propagates to motorway section 2. Dyna-MARL, on the other hand, coordinates all three ramp controllers and makes full use of the storage space of the three on-ramps to deal with incident-induced congestion. In this way, mainline congestion can be restricted to a smaller area and will not propagate to motorway section 2 (see Figure 7(c)).

For scenarios B and C, the incidents are near the motorway end and far from on-ramp 1. Without blocking on-ramp 1, the incidents do not lead to severe congestion. Under such circumstances, both Isolated RL and Dyna-MARL work well in easing congestion on the mainline. As shown in Figures 7(e)–7(i), compared with the NC situation, both Isolated RL and Dyna-MARL can restrict the congestion to a small range near the on-ramp entrances.

(2) General Indicators. In this comparison, some general indicators, including total travel time (should be reduced), total throughput (should be improved), and total CO$_2$ emission (should be reduced), are used to show how the proposed system can benefit road users. These indicators are widely used in the transport community to test the performance of newly developed traffic control systems.

As shown in Figure 8(a), compared with the NC situation, both Isolated RL and Dyna-MARL can reduce the total travel time of road users in all three scenarios. Specifically, Isolated RL decreases total travel time by up to 6.2%, while Dyna-MARL achieves a maximum reduction of 12.2% (see Figure 8(d)). The comparison of total throughput is presented in Figure 8(b). Dyna-MARL can improve the total throughput by up to 2.3% (see Figure 8(d)), which outperforms Isolated RL in all three scenarios. In scenario B, Isolated RL even fails to improve the total throughput. For the comparison of total CO$_2$ emission (shown in Figure 8(c)), both Isolated RL and Dyna-MARL achieve their best performance in scenario B, with reductions of 4.7% and 4.6%, respectively. In scenarios A and C, Dyna-MARL has a much better performance than Isolated RL.

Through the above comparison, we can see that Dyna-MARL outperforms Isolated RL for almost all the scenarios and indicators.

Figure 7: Density profiles (motorway density in veh/km/lane over motorway length in km and absolute time in min) for (a) NC in scenario A, (b) Isolated RL in scenario A, (c) Dyna-MARL in scenario A, (d) NC in scenario B, (e) Isolated RL in scenario B, (f) Dyna-MARL in scenario B, (g) NC in scenario C, (h) Isolated RL in scenario C, and (i) Dyna-MARL in scenario C.

(3) System Equity. Although the general indicators presented in comparison (2) are effective for testing the performance of different systems, they cannot measure system equity, which is also an important aspect of system performance. In this paper, we only consider the spatial equity issue, which is defined as a measurement of the equity of user delays on different on-ramps [42]. In this study, we assume that the road users from all three on-ramps have the same importance. If all users from different on-ramps experience similar travel times, the control system is defined as an equitable system. This term is used to measure the system equity; that is, a large queue difference leads to a highly inequitable system. In [43], the variance of travel time on different on-ramps is used as an indicator to measure system equity. Similar to [43], for the sake of comparison, the standard deviation is considered in our case. This indicator is defined as

$$
\mathrm{SD}(k) = \sqrt{\frac{\sum_{i=1}^{n}\left[\bar{t}^{k} - t_i^{k}\right]^{2}}{n}},
\tag{14}
$$

where $\mathrm{SD}(k)$ is the standard deviation of the travel time of different on-ramps at time step $k$, $t_i^k$ is the estimated total travel time of on-ramp $i$ at step $k$, and $\bar{t}^k$ is the averaged total travel time of $n$ on-ramps at step $k$.
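The equity indicator in (14) is a population standard deviation over the on-ramps at one time step. A minimal Python sketch, with the hypothetical function name `travel_time_sd`, is:

```python
import math


def travel_time_sd(travel_times):
    """Standard deviation (14) of the on-ramp travel times at one time step.

    travel_times holds the estimated total travel time of each on-ramp;
    the divisor is n (population form), matching the definition in (14).
    """
    n = len(travel_times)
    if n == 0:
        raise ValueError("at least one on-ramp travel time is required")
    mean = sum(travel_times) / n
    return math.sqrt(sum((mean - t) ** 2 for t in travel_times) / n)
```

Identical travel times on all on-ramps give a value of zero, which corresponds to a perfectly equitable system under this indicator; larger values indicate a larger imbalance between on-ramps.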

Results of the comparison of system equity can be seen in Figure 9. For the NC situation, good equity can be maintained in scenarios B and C because entering vehicles are not restricted (as shown in Figures 9(b) and 9(c)). However, when one of the on-ramp entrances is blocked by the congestion in scenario A, a long queue forms and leads to imbalance and resultant inequity for users on different on-ramps (see Figure 9(a)). For the controlled cases, Isolated RL performs poorly in all scenarios. This is because the ramp controller near the congestion takes much more restrictive measures on the controlled traffic than the other controllers. Because of the coordination strategy, Dyna-MARL outperforms Isolated RL in maintaining system equity in all scenarios, especially during the incident (from 07:30:00 to 08:00:00).

Figure 8: Comparison of general measures for different scenarios (NC, Isolated RL, and Dyna-MARL in scenarios A, B, and C): (a) total travel time (h), (b) total throughput (veh), (c) total CO$_2$ emission ($\times 10^4$ kg), and (d) reduction from NC (%) for total throughput, total travel time, and total CO$_2$ emission.

7. Conclusions and Future Work

A Dyna-$Q$ based multiagent reinforcement learning method, referred to as Dyna-MARL, for motorway ramp control has been developed in this paper. Dyna-MARL is compared with Isolated RL ($Q$-learning without coordination) and the noncontrolled situation in a simulation environment. Real traffic data collected from a metered motorway segment in the UK are used to build the simulation.

Through a series of simulation-based experiments, we can conclude the following: (1) Isolated RL can improve the motorway performance in terms of increasing total throughput and reducing total travel time and CO$_2$ emission, but this improvement is at the expense of poor system equity on different on-ramps; (2) with a suitable coordination strategy, much higher system equity can be achieved by Dyna-MARL; (3) in addition to the system equity, Dyna-MARL outperforms Isolated RL in almost all scenarios regarding all indicators, which means Dyna-MARL can deal with network-wide problems effectively.

Although the simulation tests have shown some positive results regarding the performance of Dyna-MARL, only a simplified incident scenario with fixed duration is considered in the current work. In practice, incident duration is highly variable and affected by a number of factors, such as weather conditions, road conditions, and the arrival time of the incident management team. Therefore, incident duration should be treated as an uncertainty, which will be investigated in our future work.

Figure 9: Standard deviation (h) against time of day (07:00:00 to 08:30:00) for NC, Isolated RL, and Dyna-MARL in different scenarios: (a) scenario A, (b) scenario B, and (c) scenario C.


Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This paper is supported by the China Scholarship Council and the University of Leeds (CSC-University of Leeds scholarship) and partially supported by the National Natural Science Foundation of China (Grant nos. 91420203 and 61271376). The authors would like to thank the institutions that support this study.

References

[1] M. Papageorgiou and A. Kotsialos, "Freeway ramp metering: an overview," IEEE Transactions on Intelligent Transportation Systems, vol. 3, no. 4, pp. 271–281, 2002.

[2] A. Skabardonis, P. Varaiya, and K. F. Petty, "Measuring recurrent and nonrecurrent traffic congestion," Transportation Research Record, vol. 1856, pp. 118–124, 2003.

[3] M. Papageorgiou, H. Hadj-Salem, and J.-M. Blosseville, "ALINEA: a local feedback control law for on-ramp metering," Journal of the Transportation Research Board, vol. 1320, pp. 58–64, 1991.

[4] E. Smaragdis and M. Papageorgiou, "Series of new local ramp metering strategies," Transportation Research Record, vol. 1856, pp. 74–86, 2003.

[5] L. N. Jacobson, K. C. Henry, and O. Mehyar, "Real-time metering algorithm for centralized control," Transportation Research Record, vol. 1232, pp. 17–26, 1989.

[6] G. Paesani, J. Kerr, P. Perovich, and F. Khosravi, "System wide adaptive ramp metering (SWARM)," in Proceedings of the 7th ITS America Annual Meeting and Exposition: Merging the Transportation and Communications Revolutions, Washington, DC, USA, June 1997.

[7] R. Lau, Ramp Metering by Zone - The Minnesota Algorithm, Minnesota Department of Transportation, 1997.

[8] H. M. Zhang and W. W. Recker, "On optimal freeway ramp control policies for congested traffic corridors," Transportation Research Part B: Methodological, vol. 33, no. 6, pp. 417–436, 1999.

[9] A. Kotsialos, M. Papageorgiou, and F. Middelham, "Optimal coordinated ramp metering with advanced motorway optimal control," Transportation Research Record, no. 1748, pp. 55–65, 2001.

[10] A. Hegyi, B. De Schutter, and H. Hellendoorn, "Model predictive control for optimal coordination of ramp metering and variable speed limits," Transportation Research Part C: Emerging Technologies, vol. 13, no. 3, pp. 185–209, 2005.

[11] G. Gomes and R. Horowitz, "Optimal freeway ramp metering using the asymmetric cell transmission model," Transportation Research Part C: Emerging Technologies, vol. 14, no. 4, pp. 244–262, 2006.

[12] A. H. F. Chow and Y. Li, "Robust optimization of dynamic motorway traffic via ramp metering," IEEE Transactions on Intelligent Transportation Systems, vol. 15, no. 3, pp. 1374–1380, 2014.

[13] R. W. Hall, "Non-recurrent congestion: how big is the problem? Are traveler information systems the solution?" Transportation Research Part C, vol. 1, no. 1, pp. 89–103, 1993.

[14] P. Prevedouros, B. Halkias, K. Papandreou, and P. Kopelias, "Freeway incidents in the United States, United Kingdom, and Attica Tollway, Greece: characteristics, available capacity, and models," Transportation Research Record, vol. 2047, pp. 57–65, 2008.

[15] T. L. Greenlee and H. J. Payne, "Freeway ramp metering strategies for responding to incidents," in Proceedings of the IEEE Conference on Decision and Control including the 16th Symposium on Adaptive Processes and a Special Symposium on Fuzzy Set Theory and Applications, pp. 987–992, New Orleans, LA, USA, December 1977.

[16] M. H. Wang, Optimal ramp metering policies for nonrecurring congestion with uncertain incident duration [Ph.D. thesis], Purdue University, West Lafayette, Ind, USA, 1994.

[17] J.-B. Sheu, "Stochastic modeling of the dynamics of incident-induced lane traffic states for incident-responsive local ramp control," Physica A: Statistical Mechanics and its Applications, vol. 386, no. 1, pp. 365–380, 2007.

[18] J.-B. Sheu and M.-S. Chang, "Stochastic optimal-control approach to automatic incident-responsive coordinated ramp control," IEEE Transactions on Intelligent Transportation Systems, vol. 8, no. 2, pp. 359–367, 2007.

[19] C. Jacob and B. Abdulhai, "Machine learning for multi-jurisdictional optimal traffic corridor control," Transportation Research Part A: Policy and Practice, vol. 44, no. 2, pp. 53–64, 2010.

[20] M. Davarynejad, A. Hegyi, J. Vrancken, and J. van den Berg, "Motorway ramp-metering control with queuing consideration using Q-learning," in Proceedings of the 14th International IEEE Conference on Intelligent Transportation Systems (ITSC '11), pp. 1652–1658, IEEE, Washington, DC, USA, October 2011.

[21] K. Rezaee, B. Abdulhai, and H. Abdelgawad, "Application of reinforcement learning with continuous state space to ramp metering in real-world conditions," in Proceedings of the 15th International IEEE Conference on Intelligent Transportation Systems (ITSC '12), pp. 1590–1595, IEEE, Anchorage, Alaska, USA, September 2012.

[22] C. Jacob and B. Abdulhai, "Automated adaptive traffic corridor control using reinforcement learning: approach and case studies," Transportation Research Record, vol. 1959, pp. 1–8, 2006.

[23] K. Rezaee, B. Abdulhai, and H. Abdelgawad, "Self-learning adaptive ramp metering: analysis of design parameters on a test case in Toronto, Canada," Transportation Research Record, vol. 2396, pp. 10–18, 2013.

[24] X.-J. Wang, X.-M. Xi, and G.-F. Gao, "Reinforcement learning ramp metering without complete information," Journal of Control Science and Engineering, vol. 2012, Article ID 208456, 8 pages, 2012.

[25] A. Fares and W. Gomaa, "Multi-agent reinforcement learning control for ramp metering," in Progress in Systems Engineering, vol. 330 of Advances in Intelligent Systems and Computing, pp. 167–173, Springer, Basel, Switzerland, 2015.

[26] K. Veljanovska, K. M. Bombol, and T. Maher, "Reinforcement learning technique in multiple motorway access control strategy design," PROMET-Traffic & Transportation, vol. 22, no. 2, pp. 117–123, 2010.

[27] C. Lu, H. Chen, and S. Grant-Muller, "An indirect reinforcement learning approach for ramp control under incident-induced congestion," in Proceedings of the 16th International IEEE Conference on Intelligent Transportation Systems (ITSC '13), pp. 979–984, IEEE, The Hague, The Netherlands, October 2013.


[28] L. Busoniu, R. Babuska, and B. De Schutter, "A comprehensive survey of multiagent reinforcement learning," IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 38, no. 2, pp. 156–172, 2008.

[29] C. Lu, H. Chen, and S. Grant-Muller, "Indirect reinforcement learning for incident-responsive ramp control," Procedia - Social and Behavioral Sciences, vol. 111, pp. 1112–1122, 2014.

[30] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, 1998.

[31] C. C. H. Watkins and P. Dayan, "Q-learning," Machine Learning, vol. 8, no. 3-4, pp. 279–292, 1992.

[32] J. R. Kok and N. Vlassis, "Collaborative multiagent reinforcement learning by payoff propagation," Journal of Machine Learning Research, vol. 7, pp. 1789–1828, 2006.

[33] C. Guestrin, M. G. Lagoudakis, and R. Parr, "Coordinated reinforcement learning," in Proceedings of the 19th International Conference on Machine Learning, pp. 227–234, Sydney, Australia, July 2002.

[34] S. El-Tantawy, B. Abdulhai, and H. Abdelgawad, "Multiagent reinforcement learning for integrated network of adaptive traffic signal controllers (MARLIN-ATSC): methodology and large-scale application on downtown Toronto," IEEE Transactions on Intelligent Transportation Systems, vol. 14, no. 3, pp. 1140–1150, 2013.

[35] L. P. Kaelbling, M. L. Littman, and A. W. Moore, "Reinforcement learning: a survey," Journal of Artificial Intelligence Research, vol. 4, pp. 237–285, 1996.

[36] C. F. Daganzo, "The cell transmission model: a dynamic representation of highway traffic consistent with the hydrodynamic theory," Transportation Research Part B: Methodological, vol. 28, no. 4, pp. 269–287, 1994.

[37] J. Haddad, M. Ramezani, and N. Geroliminis, "Cooperative traffic control of a mixed network with two urban regions and a freeway," Transportation Research Part B: Methodological, vol. 54, pp. 17–36, 2013.

[38] H. Mongeot and J.-B. Lesort, "Analytical expressions of incident-induced flow dynamics perturbations using macroscopic theory and extension of Lighthill-Whitham theory," Transportation Research Record, vol. 1710, pp. 58–68, 2000.

[39] Transport Simulation Systems, Aimsun User's Manual 6.1, TTS, Barcelona, Spain, 2010.

[40] Highways England, "Hatris Homepage," 2013, https://www.hatris.co.uk/.

[41] E. Cascetta, "Estimation of trip matrices from traffic counts and survey data: a generalized least squares estimator," Transportation Research Part B, vol. 18, no. 4-5, pp. 289–299, 1984.

[42] L. Zhang and D. Levinson, "Balancing efficiency and equity of ramp meters," Journal of Transportation Engineering, vol. 131, no. 6, pp. 477–481, 2005.

[43] A. Kotsialos and M. Papageorgiou, "Efficiency and equity properties of freeway network-wide ramp metering with AMOC," Transportation Research Part C: Emerging Technologies, vol. 12, no. 6, pp. 401–420, 2004.



This section details these three elements and the relevant algorithm.

5.1. Environment State. Environment states of a motorway section are composed of mainline states and on-ramp states. The same method mentioned in [27, 29] is used here to obtain the state space. Generally, for the mainline of motorway section $i$, the number of vehicles ranges from 0 to the maximum number $q^{\max}_{i,\text{main}}$, which is uniformly divided into $n_i$ intervals. Each interval represents a state of the mainline. Therefore, each mainline section can be represented by a state set $S_{i,\text{main}}$ with $n_i$ states. Similarly, on-ramp traffic is represented by a state set $S_{i,\text{on}}$ with $m_i$ states according to the maximum number of vehicles $q^{\max}_{i,\text{on}}$. $n_i$ and $m_i$ should be adjusted for different motorway sections according to the section length. In this way, if motorway section $i$ is the critical section, the external traffic environment is represented by

$$S_i = S_{i,\text{main}} \times S_{i,\text{on}}, \quad s_i^k \in S_i, \tag{8}$$

which contains $n_i \cdot m_i$ states. At each time step, a state $s_i^k$ will be selected from $S_i$ as the environment state. If motorway section $i$ is a normal section, state sets of its neighbour agent should be incorporated. Thus, the traffic state is represented by

$$S_i = S_{i,\text{main}} \times S_{i,\text{on}} \times S_{i-1,\text{main}} \times S_{i-1,\text{on}}, \quad s_i^k \in S_i, \tag{9}$$

which contains $n_i \cdot m_i \cdot n_{i-1} \cdot m_{i-1}$ states.
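The uniform partition behind (8) and (9) can be illustrated with a short sketch. It is not the authors' implementation: the function names `discretise` and `flatten`, the clamping of out-of-range counts, and the row-major indexing of the product set are all assumptions made for illustration.

```python
def discretise(q, q_max, n_intervals):
    """Map a vehicle count in [0, q_max] to one of n_intervals uniform bins."""
    if q_max <= 0 or n_intervals <= 0:
        raise ValueError("q_max and n_intervals must be positive")
    q = min(max(q, 0.0), q_max)          # assumption: clamp out-of-range counts
    width = q_max / n_intervals          # width of one uniform interval
    # q == q_max would fall just outside the last bin, so cap the index
    return min(int(q // width), n_intervals - 1)


def flatten(bins, sizes):
    """Index of a tuple of bins inside the Cartesian product of the state sets.

    With two components this corresponds to (8); with four components
    (own mainline, own on-ramp, neighbour mainline, neighbour on-ramp) to (9).
    """
    index = 0
    for b, size in zip(bins, sizes):
        index = index * size + b
    return index


# Illustrative numbers only: 10 mainline bins and 4 on-ramp bins
main_bin = discretise(55, 100, 10)       # bin 5
on_bin = discretise(10, 20, 4)           # bin 2
state = flatten((main_bin, on_bin), (10, 4))
```

The flattened index ranges over $n_i \cdot m_i$ values for a critical section and over $n_i \cdot m_i \cdot n_{i-1} \cdot m_{i-1}$ values for a normal section, which makes the growth of the $Q$ table under coordination easy to see.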

5.2. Control Action. In a ramp control problem, the aim of the control action is to regulate the number of vehicles entering the mainline in each control step. Similar to [29], we adopt flow control as the control action, which can be presented by an action set $C = \{4, 6, 8, 10, 12, 14, 16, 18, 20\}$ with 9 flow rates between the minimum (4 veh/min) and maximum (20 veh/min) values.

Exploitation and exploration are two basic behaviours of the RL agent. Exploitation means the agent takes the control action that has obtained the most reward in previous experience. Exploration, instead, means the agent tries new actions with lower rewards. In order to balance these two behaviours, we use the $\varepsilon$-greedy policy to select control actions [30]. Specifically, this policy takes a random action with probability $\varepsilon$ and chooses the greedy action (with the maximum $Q$ value) with probability $1 - \varepsilon$ at each control step.

The action selection probability can be formally expressed as

$$p(c_i^k \mid s_i^k) = \begin{cases} 1 - \varepsilon, & \text{if } c_i^k = \arg\max_{c_i^k} Q^{k-1}(s_i^k, c_i^k), \\ \varepsilon, & \text{otherwise.} \end{cases} \tag{10}$$
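A minimal sketch of $\varepsilon$-greedy selection over the action set $C$ is given below. It follows the textbook form in [30], in which the random branch draws uniformly from all actions; the dictionary-based $Q$ table, the function name `epsilon_greedy`, and the default value of zero for unseen state-action pairs are assumptions for illustration rather than details taken from the paper.

```python
import random

# Metering rates in veh/min, taken from the action set C
ACTIONS = [4, 6, 8, 10, 12, 14, 16, 18, 20]


def epsilon_greedy(q_values, state, epsilon=0.1, rng=random):
    """Return a metering rate for `state`.

    q_values maps (state, action) pairs to Q estimates; unseen pairs are
    treated as 0.0 (an assumption of this sketch).
    """
    if rng.random() < epsilon:
        # exploration: uniform draw over all actions
        return rng.choice(ACTIONS)
    # exploitation: action with the largest current Q estimate
    return max(ACTIONS, key=lambda c: q_values.get((state, c), 0.0))


q_table = {("s0", 12): 1.0, ("s0", 4): -0.5}
greedy_action = epsilon_greedy(q_table, "s0", epsilon=0.0)
```

With `epsilon=0.0` the call is purely greedy, which is a convenient way to inspect the policy an agent has learned.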

5.3. Reward Function. The reward function is used to calculate the immediate reward after executing a specific action at each time step, which guides the agent to achieve its objective. Considering a common objective of traffic control systems (i.e., minimising total travel time), we define our reward to guide the agent to minimise total time spent (TTS) through the learning process.

TTS is defined as the total time spent by vehicles in the network during a period of time. For our case, TTS can be obtained from the following equation:

$$\text{TTS} = \Delta t \cdot \sum_{k=0}^{K} \left(q^k_{i,\text{main}} + q^k_{i,\text{on}}\right). \tag{11}$$

In the above equation, $\Delta t$ is a fixed value; therefore, minimising TTS is equivalent to minimising the number of vehicles on the network, $\sum_{k=0}^{K} (q^k_{i,\text{main}} + q^k_{i,\text{on}})$. To minimise this value, the reward function defined here is composed of two negative rewards used to indicate penalties for vehicles on the mainline and on-ramp. The formal reward function at step $k$ is defined according to two situations.

(1) Motorway Section $i$ Is the Critical Section. Consider

$$R_i^k(s_i^k, c_i^k) = \begin{cases} -\dfrac{q^k_{i,\text{main}} + q^k_{i,\text{on}}}{q^{\max}_{i,\text{main}} + q^{\max}_{i,\text{on}}}, & \text{if } q^k_{i,\text{main}} < q^{\max}_{i,\text{main}},\ q^k_{i,\text{on}} < q^{\max}_{i,\text{on}}, \\ -1, & \text{otherwise,} \end{cases} \tag{12}$$

where $R_i^k(s_i^k, c_i^k)$ is the immediate reward for agent $i$ in state $s_i^k$ when executing action $c_i^k$ at control step $k$. $q^{\max}_{i,\text{main}}$ and $q^{\max}_{i,\text{on}}$ are used to normalise the number of vehicles on the mainline and on-ramp, which guarantees that $R_i^k(s_i^k, c_i^k) \in [-1, 0]$.

(2) Motorway Section $i$ Is Not the Critical Section. Here, a new negative reward is introduced to maintain the system equity, that is, to make sure that the on-ramp queues and related travel times at different on-ramps are close to each other:

$$R_i^k(s_i^k, c_{i-1}^k, c_i^k) = \begin{cases} -\dfrac{q^k_{i,\text{main}} + q^k_{i,\text{on}}}{q^{\max}_{i,\text{main}} + q^{\max}_{i,\text{on}}} - \dfrac{\left|q^k_{i,\text{on}} - q^k_{i-1,\text{on}}\right|}{\max\left(q^{\max}_{i,\text{on}}, q^{\max}_{i-1,\text{on}}\right)}, & \text{if } q^k_{i,\text{main}} < q^{\max}_{i,\text{main}},\ q^k_{i,\text{on}} < q^{\max}_{i,\text{on}}, \\ -2, & \text{otherwise.} \end{cases} \tag{13}$$

For each agent $i$ and episode do
    $L \leftarrow \text{CEIL}(\text{IncidentDuration} / \Delta t)$
    IF $i$ is the critical section
        Initialise $R_i^0(s_i, c_i)$, $Q_i^0(s_i, c_i)$
    ELSE
        Initialise $R_i^0(s_i, c_{i-1}, c_i)$, $Q_i^0(s_i, c_{i-1}, c_i)$, $P_i^0(s_i, c_{i-1})$
    For each control step $k \in K$ do (Loop I)
        (i) get detected data from each cell $j$: $a^k_{j,\text{main}}$, $d^k_{j,\text{main}}$, $d^k_{j,\text{off}}$, $a^k_{j,\text{on}}$, $d^k_{j,\text{on}}$
        (ii) get state $s_i^k$ through (8) and (9)
        (iii) get action $c_i^k$ by $\varepsilon$-greedy policy (10)
        (iv) get $q^k_{j,\text{main}}$, $q^k_{j,\text{on}}$ through (6) and do $q^k_{i,\text{main}} \leftarrow \sum_{j=1}^{J} q^k_{j,\text{main}}$, $q^k_{i,\text{on}} \leftarrow q^k_{j,\text{on}}$
        IF $i$ is the critical section
            update $R_i^k(s_i^k, c_i^k)$, $Q_i^k(s_i^k, c_i^k)$ through (2) and (12)
        ELSE
            update $R_i^k(s_i^k, c_{i-1}^k, c_i^k)$, $Q_i^k(s_i^k, c_{i-1}^k, c_i^k)$, $p(c_{i-1}^k \mid s_i^k)$ through (4) and (13)
        IF $s_i^k = s_{\text{initial}}$ and $k + 1 \ge L$
            end the algorithm
        ELSE
            get $a^{k,k+1}_{i,\text{main}}$, $a^{k,k+1}_{i,\text{on}}$, $d^{k,k+1}_{j,\text{off}}$ by (7) and do $l \leftarrow k$, $s_i^l \leftarrow s_i^k$, $q^l_{j,\text{main}} \leftarrow q^k_{j,\text{main}}$, $q^l_{j,\text{on}} \leftarrow q^k_{j,\text{on}}$, and start loop II
        For each planning step $l \in L$ do (Loop II)
            (i) generate flow rates for each cell $j$: $d^l_{j,\text{main}}$, $d^l_{j,\text{on}}$ through (5)
            (ii) get the state $s_i^l$
            (iii) get $q^l_{j,\text{main}}$, $q^l_{j,\text{on}}$ and do $q^l_{i,\text{main}} \leftarrow \sum_{j=1}^{J} q^l_{j,\text{main}}$, $q^l_{i,\text{on}} \leftarrow q^l_{j,\text{on}}$
            (iv) get action $c_i^l$ by $\varepsilon$-greedy policy
            IF $i$ is the critical section
                update $R_i^l(s_i^l, c_i^l)$, $Q_i^l(s_i^l, c_i^l)$
            ELSE
                update $R_i^l(s_i^l, c_{i-1}^l, c_i^l)$, $Q_i^l(s_i^l, c_{i-1}^l, c_i^l)$, $p(c_{i-1}^k \mid s_i^k)$
            IF ($l = k + 9$) or ($s_i^l = s_{\text{initial}}$ and $l + 1 \ge L$)
                go back to loop I
            ELSE
                repeat loop II
        EndFor
    EndFor
EndFor

Algorithm 1: Algorithm for Dyna-MARL.

Compared to (12), a new term $|q^k_{i,\text{on}} - q^k_{i-1,\text{on}}| / \max(q^{\max}_{i,\text{on}}, q^{\max}_{i-1,\text{on}})$ is added into (13), which is a penalty for the on-ramp queue difference between motorway sections $i$ and $i - 1$. As two adjacent agents cooperate in this situation, $R_i^k(s_i^k, c_{i-1}^k, c_i^k)$ is related to two control actions, $c_{i-1}^k$ and $c_i^k$. $\max(\cdot, \cdot)$ returns the maximum value of two given parameters, which is used for normalisation.
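The quantities in (11), (12), and (13) translate directly into code. The sketch below is illustrative only: the function and argument names are assumptions, with `q_on_prev` and `q_max_on_prev` standing for the on-ramp queue and its maximum in the neighbouring section $i - 1$.

```python
def total_time_spent(dt, q_main_series, q_on_series):
    """TTS as in (11): dt times the summed vehicle counts over all steps."""
    return dt * sum(qm + qo for qm, qo in zip(q_main_series, q_on_series))


def reward_critical(q_main, q_on, q_max_main, q_max_on):
    """Reward (12) for the agent of the critical section, in [-1, 0]."""
    if q_main < q_max_main and q_on < q_max_on:
        return -(q_main + q_on) / (q_max_main + q_max_on)
    return -1.0   # mainline or on-ramp has reached its maximum


def reward_normal(q_main, q_on, q_on_prev, q_max_main, q_max_on, q_max_on_prev):
    """Reward (13) for a non-critical section, in [-2, 0]."""
    if q_main < q_max_main and q_on < q_max_on:
        occupancy = (q_main + q_on) / (q_max_main + q_max_on)
        # equity penalty: normalised queue difference between adjacent on-ramps
        imbalance = abs(q_on - q_on_prev) / max(q_max_on, q_max_on_prev)
        return -occupancy - imbalance
    return -2.0


# Half-full section with balanced queues versus the same section with unbalanced queues
balanced = reward_normal(1440, 45, 45, 2880, 90, 108)
unbalanced = reward_normal(1440, 45, 90, 2880, 90, 108)
```

In this example the occupancy term is identical in both calls, so the difference between `balanced` and `unbalanced` comes entirely from the equity penalty, which is the mechanism the coordinated agents use to keep on-ramp queues close to each other.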

5.4. Description of the Algorithm. Based on the Dyna-$Q$ architecture and the RL elements defined in previous subsections, an algorithm, Dyna-MARL, is developed and described in this subsection. Two main loops, corresponding to direct RL and planning shown in Figure 1, are detailed in Dyna-MARL. Between two real control steps in loop I, 10 planning steps will be run in loop II. The pseudocode of Dyna-MARL can be seen in Algorithm 1.

An episode in Dyna-MARL represents a control cycle, which starts from incident occurrence and terminates when the traffic state returns to the initial state $s_{\text{initial}}$, that is, the traffic state before the incident occurrence. Incident duration is assumed to be known in advance.
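The two-loop structure can be sketched for a single agent as follows. This is a generic, heavily simplified Dyna-style loop and not the Dyna-MARL algorithm itself: there is no coordination between agents, the traffic model used for planning is replaced by a hypothetical `model_step` function, and `toy_step` is an invented toy environment. The values of $\alpha$, $\gamma$, and $\varepsilon$ are the learning parameters reported in Section 6.3, and 10 planning steps are run between real steps as described above.

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.2, 0.8, 0.1   # learning parameters from Section 6.3
PLANNING_STEPS = 10                     # planning steps between two real steps


def q_update(Q, s, a, r, s_next, actions):
    """One tabular Q-learning backup."""
    best_next = max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])


def choose(Q, s, actions, rng):
    """Epsilon-greedy action selection."""
    if rng.random() < EPSILON:
        return rng.choice(actions)
    return max(actions, key=lambda a: Q[(s, a)])


def dyna_episode(real_step, model_step, s0, actions, n_steps, rng):
    Q = defaultdict(float)
    s = s0
    for _ in range(n_steps):                       # Loop I: direct RL
        a = choose(Q, s, actions, rng)
        r, s_next = real_step(s, a)                # experience from the real system
        q_update(Q, s, a, r, s_next, actions)
        s_plan = s                                 # planning starts from the current state
        for _ in range(PLANNING_STEPS):            # Loop II: planning
            a_plan = choose(Q, s_plan, actions, rng)
            r_plan, s_plan_next = model_step(s_plan, a_plan)   # simulated experience
            q_update(Q, s_plan, a_plan, r_plan, s_plan_next, actions)
            s_plan = s_plan_next
        s = s_next
    return Q


def toy_step(level, action):
    """Hypothetical toy dynamics: action 1 lowers a congestion level in 0..5, action 0 raises it."""
    nxt = max(level - 1, 0) if action == 1 else min(level + 1, 5)
    return -nxt / 5, nxt                           # reward in [-1, 0], like (12)


Q = dyna_episode(toy_step, toy_step, s0=3, actions=[0, 1], n_steps=200,
                 rng=random.Random(0))
```

Each real control step therefore produces eleven backups of the $Q$ table, one from real experience and ten from the model, which is the source of the faster learning that motivates the Dyna architecture.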

6. Case Study and Results

One of the metered motorway segments (southbound direction) of the M6 in the UK is chosen for the case study. This segment is between junction 21A (J21A) and junction 25 (J25), with an approximate length of 12.4 km (see Figure 4). Taking the noncontrolled (NC) situation as the baseline, we designed a series of experiments to compare the proposed Dyna-MARL algorithm with Isolated RL ($Q$-learning without coordination). Experiments and relevant results are described as follows.

6.1. Partitions of the Test Segment. The test motorway segment, with a three-lane mainline, three metered on-ramps, and five off-ramps, is simulated by AIMSUN [39], which is a microscopic traffic simulation package. According to the detector locations and road layout, the whole segment is divided into three sections. Each section contains a metered on-ramp. Motorway section 3 is divided into 4 cells and motorway sections 2 and 3 are both divided into 3 cells. The partitions of each section can be seen in Figure 5. According to the section length, the maximum number of vehicles in each mainline section and on-ramp is as follows: $q^{\max}_{1,\text{main}} = 1860$, $q^{\max}_{2,\text{main}} = 2880$, $q^{\max}_{3,\text{main}} = 2880$, $q^{\max}_{1,\text{on}} = 108$, $q^{\max}_{2,\text{on}} = 90$, and $q^{\max}_{3,\text{on}} = 120$.

Figure 4: Test motorway segment of M6 (from origin O to destination D through junctions J25, J24, J23, J22, and J21A; on-ramps O3 (from A49), O2 (from A580), and O1 (from A579); off-ramps D1 (to M62), D2 (to M62), D3 (to A579), D4 (to A580), and D5 (to A58)).

Figure 5: Partitions of test segment (Sections 1 to 3 with section and cell boundaries, road section lengths in km, flow direction, and incident locations a, b, and c).

6.2. Real Data Source. Real detector data collected from 17 loop detectors located in the motorway segment (including both mainline and on-/off-ramps) are used for the case study, which can be extracted from Traffic Information System (HATRIS) [40]. These traffic count data are averaged from April 2012 to March 2013 with 15-minute intervals. Only working day data (from Monday to Friday) are used due to the dramatic reduction of traffic load at weekends. Some of the detector data collected from the mainline and three on-ramps are presented in Figure 6, from which we can see that two peak periods, the AM peak period (around 07:00:00–09:00:00) and the PM peak period (around 16:00:00–18:00:00), exist during the daily traffic operation.

In the test site, ramp metering only works at peak hours. Meanwhile, it is valuable to test the performance of the proposed algorithm in the high demand situation. If it can work under the high traffic load, it should also be useful for common situations. Therefore, the AM peak period with heavy traffic load is considered for the case study. Specifically, we use the averaged traffic data during the AM peak period collected from TRADS to estimate the OD (origins and destinations) matrix for the simulation. A model proposed in [41] is adopted by AIMSUN to do the estimation, where the number of iterations is set as 1000 to get convergence. Table 1 shows the OD matrix estimated from real traffic data.

Figure 6: Real averaged traffic data (15-minute traffic count against time of day for Sections 1 to 3 and On-ramps 1 to 3).

Table 1: OD matrix estimated.

| Origins/destinations | D | D5 | D4 | D3 | D2 | D1 | Totals |
|---|---|---|---|---|---|---|---|
| O | 2089 | 375 | 686 | 728 | 1169 | 771 | 5818 |
| O3 | 875 | 65 | 212 | 193 | 117 | 46 | 1507 |
| O2 | 886 | 0 | 0 | 61 | 315 | 216 | 1477 |
| O1 | 824 | 0 | 0 | 0 | 292 | 226 | 1343 |
| Totals | 4675 | 440 | 898 | 981 | 1893 | 1258 | 10146 |

Table 2: Parameters for ACTM.

| Parameter | $d^{\max}_{\text{main}}$ | $v$ | $w$ | $\theta$ | $\eta$ | $\lambda_1$ | $\lambda_2$ | $\lambda_3$ |
|---|---|---|---|---|---|---|---|---|
| Value | 6300 veh/h | 107 km/h | 116 km/h | 0.5 | 0.16 | 0.55 | 0.9 | 0.6 |

6.3. Incident Scenarios. Considering the difficulty of capturing real incident data, we simulate some incident scenarios in AIMSUN. To make each ramp meter work during the incident, the incident is located near the most downstream motorway section, that is, motorway section 1. Therefore, three incident scenarios, A, B, and C, are designed, corresponding to three different incident locations at a, b, and c (as illustrated in Figure 5), respectively.

The simulation experiment lasts for one and a half hours, from 07:00:00 to 08:30:00 during the AM peak period. After 30 minutes of normal operation (for warm-up), the incident is triggered at 07:30:00 and lasts for 30 minutes. In the preliminary experiments designed in this paper, an incident with one lane blocked is considered. Parameters introduced here can also be regulated for multiple lane-blockage situations. The incident extent is 50 meters, which is assumed to be constant during the incident.

Learning-related parameters are set as typical values [30]; that is, $\alpha$ is 0.2, $\gamma$ is 0.8, and $\varepsilon$ is 0.1. Other parameters related to ACTM are calibrated and summarised in Table 2. All the cells have the same $\theta$ and $\eta$.

6.4. Results. The comparison of Dyna-MARL, Isolated RL, and NC is conducted from three aspects: density evolution, some general indicators, and the system equity. The experimental results are described as follows.

(1) Density Evolution. We can see from Figure 7 that four dense areas exist during the traffic operation. Three of them, near on-ramp entrances (motorway length around 0.5 km, 5 km, and 10 km), are caused by heavy traffic loads from on-ramps. The dense area close to the segment end forms due to the incident.

In scenario A, the incident location is close to on-ramp 1 (O1). Without control, this incident leads to severe congestion, which blocks on-ramp 1 and propagates to motorway section 2 (around 9 km in Figure 7(a)). Under this scenario, Isolated RL cannot alleviate incident-induced congestion effectively (see Figure 7(b)). At the beginning of congestion formation, without coordination, only the nearest ramp controller reacts to the congestion. Because of the space limit of the on-ramp, one ramp controller is insufficient to dissolve this congestion, which still propagates to motorway section 2. Dyna-MARL, on the other hand, coordinates all three ramp controllers and makes full use of the storage space of three on-ramps to deal with incident-induced congestion. In this way, mainline congestion can be restricted to a smaller area and will not propagate to motorway section 2 (see Figure 7(c)).

For scenarios B and C, incidents are near the motorway end and far from on-ramp 1. Without blocking on-ramp 1, incidents do not lead to severe congestion. Under such circumstances, both Isolated RL and Dyna-MARL work well on easing congestion in the mainline. As shown in Figures 7(e)–7(i), compared with the NC situation, both Isolated RL and Dyna-MARL can restrict the congestion to a small range near the on-ramp entrances.

(2) General Indicators. In this comparison, some general indicators, including total travel time (should be reduced), total throughput (should be improved), and total CO2 emission (should be reduced), are used to show how the proposed system can benefit road users. These indicators are widely used in the transport community to test the performance of newly developed traffic control systems.

As shown in Figure 8(a), compared with the NC situation, both Isolated RL and Dyna-MARL can reduce the total travel time of road users in all three scenarios. Specifically, Isolated RL decreases total travel time by up to 6.2%, while Dyna-MARL achieves a maximum reduction of 12.2% (see Figure 8(d)). The comparison of total throughput is presented in Figure 8(b). Dyna-MARL can improve the total throughput by up to 2.3% (see Figure 8(d)), which outperforms Isolated RL in all three scenarios. In scenario B, Isolated RL even fails to improve the total throughput. For the comparison of total CO2 emission (shown in Figure 8(c)), both Isolated RL and Dyna-MARL achieve their best performance in scenario B, with a reduction of 4.7% and 4.6%, respectively. In scenarios A and C, Dyna-MARL has a much better performance than Isolated RL.

Through the above comparison, we can see that Dyna-MARL outperforms Isolated RL for almost all the scenarios and indicators.

Figure 7: Density profiles for (a) NC in scenario A, (b) Isolated RL in scenario A, (c) Dyna-MARL in scenario A, (d) NC in scenario B, (e) Isolated RL in scenario B, (f) Dyna-MARL in scenario B, (g) NC in scenario C, (h) Isolated RL in scenario C, and (i) Dyna-MARL in scenario C. Each panel plots motorway density (veh/km/lane) against motorway length (km) and absolute time (min).

(3) System Equity. Although the general indicators presented in comparison (2) have shown their effectiveness in testing the performance of different systems, they cannot measure the issue of system equity, which is also an important aspect of the system performance. In this paper, we only consider the spatial equity issue, which is defined as a measurement of equity of user delays on different on-ramps [42]. In this study, we assume the road users from all three on-ramps have the same importance. If all users from different on-ramps can experience similar travel times, the control system is defined as an equitable system. This term is used to measure the system equity; that is, a large queue difference leads to a highly inequitable system. In [43], the variance of travel time on different on-ramps is used as an indicator to measure system equity. Similar to [43], for the sake of comparison, the standard deviation is considered in our case. This indicator is defined as

$$\text{SD}(k) = \sqrt{\frac{\sum_{i=1}^{n} \left[\bar{t}^k - t_i^k\right]^2}{n}}, \tag{14}$$

where $\text{SD}(k)$ is the standard deviation of travel time of different on-ramps at time step $k$, $t_i^k$ is the estimated total travel time of on-ramp $i$ at step $k$, and $\bar{t}^k$ is the averaged total travel time of $n$ on-ramps at step $k$.

Results of the comparison of system equity can be seen in Figure 9. For the NC situation, good equity can be maintained in scenarios B and C because entering vehicles are not restricted (as shown in Figures 9(b) and 9(c)). However, when one of the on-ramp entrances is blocked by the congestion in scenario A, a long queue forms and leads to imbalance and resultant inequity for users on different on-ramps (see Figure 9(a)). For controlled cases, Isolated RL performs poorly in all scenarios. This is because the ramp controller near the congestion restricts the traffic it controls much more strongly than the other controllers do. Because of the coordination strategy, Dyna-MARL outperforms Isolated RL on maintaining system equity in all scenarios, especially during the incident (from 07:30:00 to 08:00:00).

Figure 8: Comparison of general measures for different scenarios (NC, Isolated RL, and Dyna-MARL in scenarios A, B, and C): (a) total travel time (h), (b) total throughput (veh), (c) total CO2 emission (×10^4 kg), and (d) reduction from NC (%) for total throughput, total travel time, and total CO2 emission.

7. Conclusions and Future Work

A Dyna-$Q$ based multiagent reinforcement learning method, referred to as Dyna-MARL, for motorway ramp control has been developed in this paper. Dyna-MARL is compared with Isolated RL ($Q$-learning without coordination) and the noncontrolled situation under the simulation environment. Real traffic data collected from a metered motorway segment in the UK are used to form the simulation.

Through a series of simulation-based experiments, we can conclude the following: (1) Isolated RL can improve the motorway performance in terms of increasing total throughput and reducing total travel time and CO2 emission, but this improvement is at the expense of poor system equity on different on-ramps; (2) with a suitable coordination strategy, much higher system equity can be achieved by Dyna-MARL; (3) in addition to the system equity, Dyna-MARL outperforms Isolated RL in almost all scenarios regarding all indicators, which means Dyna-MARL can deal with the network-wide problems effectively.

Although the simulation tests have shown some positive results regarding the performance of Dyna-MARL, a simplified incident scenario with fixed duration is considered in the current work. In the practical situation, incident duration is highly unstable and affected by a number of factors, such as weather conditions, road conditions, and arrival time of the incident management team. Therefore, incident duration should be considered as an uncertainty, which will be investigated in our future work.

Figure 9: Standard deviation for different scenarios: (a) scenario A, (b) scenario B, and (c) scenario C. Each panel plots the standard deviation (h) against time of day (07:00:00 to 08:30:00) for NC, Isolated RL, and Dyna-MARL.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This paper is supported by China Scholarship Council and University of Leeds (CSC-University of Leeds scholarship) and partially supported by the National Natural Science Foundation of China (Grant nos. 91420203 and 61271376). The authors would like to thank the institutions that support this study.

References

[1] M. Papageorgiou and A. Kotsialos, "Freeway ramp metering: an overview," IEEE Transactions on Intelligent Transportation Systems, vol. 3, no. 4, pp. 271–281, 2002.

[2] A. Skabardonis, P. Varaiya, and K. F. Petty, "Measuring recurrent and nonrecurrent traffic congestion," Transportation Research Record, vol. 1856, pp. 118–124, 2003.

[3] M. Papageorgiou, H. Hadj-Salem, and J.-M. Blosseville, "ALINEA: a local feedback control law for on-ramp metering," Journal of the Transportation Research Board, vol. 1320, pp. 58–64, 1991.

[4] E. Smaragdis and M. Papageorgiou, "Series of new local ramp metering strategies," Transportation Research Record, vol. 1856, pp. 74–86, 2003.

[5] L. N. Jacobson, K. C. Henry, and O. Mehyar, "Real-time metering algorithm for centralized control," Transportation Research Record, vol. 1232, pp. 17–26, 1989.

[6] G. Paesani, J. Kerr, P. Perovich, and F. Khosravi, "System wide adaptive ramp metering (SWARM)," in Proceedings of the 7th ITS America Annual Meeting and Exposition: Merging the Transportation and Communications Revolutions, Washington, DC, USA, June 1997.

[7] R. Lau, Ramp Metering by Zone: The Minnesota Algorithm, Minnesota Department of Transportation, 1997.

[8] H. M. Zhang and W. W. Recker, "On optimal freeway ramp control policies for congested traffic corridors," Transportation Research Part B: Methodological, vol. 33, no. 6, pp. 417–436, 1999.

[9] A. Kotsialos, M. Papageorgiou, and F. Middelham, "Optimal coordinated ramp metering with advanced motorway optimal control," Transportation Research Record, no. 1748, pp. 55–65, 2001.

[10] A. Hegyi, B. De Schutter, and H. Hellendoorn, "Model predictive control for optimal coordination of ramp metering and variable speed limits," Transportation Research Part C: Emerging Technologies, vol. 13, no. 3, pp. 185–209, 2005.

[11] G. Gomes and R. Horowitz, "Optimal freeway ramp metering using the asymmetric cell transmission model," Transportation Research Part C: Emerging Technologies, vol. 14, no. 4, pp. 244–262, 2006.

[12] A. H. F. Chow and Y. Li, "Robust optimization of dynamic motorway traffic via ramp metering," IEEE Transactions on Intelligent Transportation Systems, vol. 15, no. 3, pp. 1374–1380, 2014.

[13] R. W. Hall, "Non-recurrent congestion: how big is the problem? Are traveler information systems the solution?" Transportation Research Part C, vol. 1, no. 1, pp. 89–103, 1993.

[14] P. Prevedouros, B. Halkias, K. Papandreou, and P. Kopelias, "Freeway incidents in the United States, United Kingdom, and Attica Tollway, Greece: characteristics, available capacity, and models," Transportation Research Record, vol. 2047, pp. 57–65, 2008.

[15] T. L. Greenlee and H. J. Payne, "Freeway ramp metering strategies for responding to incidents," in Proceedings of the IEEE Conference on Decision and Control including the 16th Symposium on Adaptive Processes and a Special Symposium on Fuzzy Set Theory and Applications, pp. 987–992, New Orleans, LA, USA, December 1977.

[16] M. H. Wang, Optimal ramp metering policies for nonrecurring congestion with uncertain incident duration [Ph.D. thesis], Purdue University, West Lafayette, Ind, USA, 1994.

[17] J.-B. Sheu, "Stochastic modeling of the dynamics of incident-induced lane traffic states for incident-responsive local ramp control," Physica A: Statistical Mechanics and its Applications, vol. 386, no. 1, pp. 365–380, 2007.

[18] J.-B. Sheu and M.-S. Chang, "Stochastic optimal-control approach to automatic incident-responsive coordinated ramp control," IEEE Transactions on Intelligent Transportation Systems, vol. 8, no. 2, pp. 359–367, 2007.

[19] C. Jacob and B. Abdulhai, "Machine learning for multi-jurisdictional optimal traffic corridor control," Transportation Research Part A: Policy and Practice, vol. 44, no. 2, pp. 53–64, 2010.

[20] M. Davarynejad, A. Hegyi, J. Vrancken, and J. van den Berg, "Motorway ramp-metering control with queuing consideration using Q-learning," in Proceedings of the 14th International IEEE Conference on Intelligent Transportation Systems (ITSC '11), pp. 1652–1658, IEEE, Washington, DC, USA, October 2011.

[21] K. Rezaee, B. Abdulhai, and H. Abdelgawad, "Application of reinforcement learning with continuous state space to ramp metering in real-world conditions," in Proceedings of the 15th International IEEE Conference on Intelligent Transportation Systems (ITSC '12), pp. 1590–1595, IEEE, Anchorage, Alaska, USA, September 2012.

[22] C. Jacob and B. Abdulhai, "Automated adaptive traffic corridor control using reinforcement learning: approach and case studies," Transportation Research Record, vol. 1959, pp. 1–8, 2006.

[23] K. Rezaee, B. Abdulhai, and H. Abdelgawad, "Self-learning adaptive ramp metering: analysis of design parameters on a test case in Toronto, Canada," Transportation Research Record, vol. 2396, pp. 10–18, 2013.

[24] X.-J. Wang, X.-M. Xi, and G.-F. Gao, "Reinforcement learning ramp metering without complete information," Journal of Control Science and Engineering, vol. 2012, Article ID 208456, 8 pages, 2012.

[25] A. Fares and W. Gomaa, "Multi-agent reinforcement learning control for ramp metering," in Progress in Systems Engineering, vol. 330 of Advances in Intelligent Systems and Computing, pp. 167–173, Springer, Basel, Switzerland, 2015.

[26] K. Veljanovska, K. M. Bombol, and T. Maher, "Reinforcement learning technique in multiple motorway access control strategy design," PROMET-Traffic & Transportation, vol. 22, no. 2, pp. 117–123, 2010.

[27] C. Lu, H. Chen, and S. Grant-Muller, "An indirect reinforcement learning approach for ramp control under incident-induced congestion," in Proceedings of the 16th International IEEE Conference on Intelligent Transportation Systems (ITSC '13), pp. 979–984, IEEE, The Hague, The Netherlands, October 2013.

[28] L. Busoniu, R. Babuska, and B. De Schutter, "A comprehensive survey of multiagent reinforcement learning," IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 38, no. 2, pp. 156–172, 2008.

[29] C. Lu, H. Chen, and S. Grant-Muller, "Indirect reinforcement learning for incident-responsive ramp control," Procedia - Social and Behavioral Sciences, vol. 111, pp. 1112–1122, 2014.

[30] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, 1998.

[31] C. C. H. Watkins and P. Dayan, "Q-learning," Machine Learning, vol. 8, no. 3-4, pp. 279–292, 1992.

[32] J. R. Kok and N. Vlassis, "Collaborative multiagent reinforcement learning by payoff propagation," Journal of Machine Learning Research, vol. 7, pp. 1789–1828, 2006.

[33] C. Guestrin, M. G. Lagoudakis, and R. Parr, "Coordinated reinforcement learning," in Proceedings of the 19th International Conference on Machine Learning, pp. 227–234, Sydney, Australia, July 2002.

[34] S. El-Tantawy, B. Abdulhai, and H. Abdelgawad, "Multiagent reinforcement learning for integrated network of adaptive traffic signal controllers (MARLIN-ATSC): methodology and large-scale application on downtown Toronto," IEEE Transactions on Intelligent Transportation Systems, vol. 14, no. 3, pp. 1140–1150, 2013.

[35] L. P. Kaelbling, M. L. Littman, and A. W. Moore, "Reinforcement learning: a survey," Journal of Artificial Intelligence Research, vol. 4, pp. 237–285, 1996.

[36] C. F. Daganzo, "The cell transmission model: a dynamic representation of highway traffic consistent with the hydrodynamic theory," Transportation Research Part B: Methodological, vol. 28, no. 4, pp. 269–287, 1994.

[37] J. Haddad, M. Ramezani, and N. Geroliminis, "Cooperative traffic control of a mixed network with two urban regions and a freeway," Transportation Research Part B: Methodological, vol. 54, pp. 17–36, 2013.

[38] H. Mongeot and J.-B. Lesort, "Analytical expressions of incident-induced flow dynamics perturbations using macroscopic theory and extension of Lighthill-Whitham theory," Transportation Research Record, vol. 1710, pp. 58–68, 2000.

[39] Transport Simulation Systems, Aimsun User's Manual 6.1, TTS, Barcelona, Spain, 2010.

[40] Highways England, "Hatris Homepage," 2013, https://www.hatris.co.uk.

[41] E. Cascetta, "Estimation of trip matrices from traffic counts and survey data: a generalized least squares estimator," Transportation Research Part B, vol. 18, no. 4-5, pp. 289–299, 1984.

[42] L. Zhang and D. Levinson, "Balancing efficiency and equity of ramp meters," Journal of Transportation Engineering, vol. 131, no. 6, pp. 477–481, 2005.

[43] A. Kotsialos and M. Papageorgiou, "Efficiency and equity properties of freeway network-wide ramp metering with AMOC," Transportation Research Part C: Emerging Technologies, vol. 12, no. 6, pp. 401–420, 2004.


For each agent $i$ and episode do
    $L \leftarrow \mathrm{CEIL}(\text{IncidentDuration} / \Delta t)$
    IF $i$ is the critical section
        Initialise $R^{0}_{i}(s_i, c_i)$, $Q^{0}_{i}(s_i, c_i)$
    ELSE
        Initialise $R^{0}_{i}(s_i, c_{i-1}, c_i)$, $Q^{0}_{i}(s_i, c_{i-1}, c_i)$, $P^{0}_{i}(s_i, c_{i-1})$
    For each control step $k \in K$ do (Loop I)
        (i) get detected data from each cell $j$: $a^{k}_{j,\text{main}}$, $d^{k}_{j,\text{main}}$, $d^{k}_{j,\text{off}}$, $a^{k}_{j,\text{on}}$, $d^{k}_{j,\text{on}}$
        (ii) get state $s^{k}_{i}$ through (8) and (9)
        (iii) get action $c^{k}_{i}$ by $\varepsilon$-greedy policy (10)
        (iv) get $q^{k}_{j,\text{main}}$, $q^{k}_{j,\text{on}}$ through (6) and do $q^{k}_{i,\text{main}} \leftarrow \sum_{j=1}^{J} q^{k}_{j,\text{main}}$, $q^{k}_{i,\text{on}} \leftarrow q^{k}_{j,\text{on}}$
        IF $i$ is the critical section
            update $R^{k}_{i}(s^{k}_{i}, c^{k}_{i})$, $Q^{k}_{i}(s^{k}_{i}, c^{k}_{i})$ through (2) and (12)
        ELSE
            update $R^{k}_{i}(s^{k}_{i}, c^{k}_{i-1}, c^{k}_{i})$, $Q^{k}_{i}(s^{k}_{i}, c^{k}_{i-1}, c^{k}_{i})$, $p(c^{k}_{i-1} \mid s^{k}_{i})$ through (4) and (13)
        IF $s^{k}_{i} = s_{\text{initial}}$ and $k + 1 \ge L$
            end the algorithm
        ELSE
            get $a^{k,k+1}_{i,\text{main}}$, $a^{k,k+1}_{i,\text{on}}$, $d^{k,k+1}_{j,\text{off}}$ by (7) and do $l \leftarrow k$, $s^{l}_{i} \leftarrow s^{k}_{i}$, $q^{l}_{j,\text{main}} \leftarrow q^{k}_{j,\text{main}}$, $q^{l}_{j,\text{on}} \leftarrow q^{k}_{j,\text{on}}$, and start loop II
            For each planning step $l \in L$ do (Loop II)
                (i) generate flow rates for each cell $j$: $d^{l}_{j,\text{main}}$, $d^{l}_{j,\text{on}}$ through (5)
                (ii) get the state $s^{l}_{i}$
                (iii) get $q^{l}_{j,\text{main}}$, $q^{l}_{j,\text{on}}$ and do $q^{l}_{i,\text{main}} \leftarrow \sum_{j=1}^{J} q^{l}_{j,\text{main}}$, $q^{l}_{i,\text{on}} \leftarrow q^{l}_{j,\text{on}}$
                (iv) get action $c^{l}_{i}$ by $\varepsilon$-greedy policy
                IF $i$ is the critical section
                    update $R^{l}_{i}(s^{l}_{i}, c^{l}_{i})$, $Q^{l}_{i}(s^{l}_{i}, c^{l}_{i})$
                ELSE
                    update $R^{l}_{i}(s^{l}_{i}, c^{l}_{i-1}, c^{l}_{i})$, $Q^{l}_{i}(s^{l}_{i}, c^{l}_{i-1}, c^{l}_{i})$, $p(c^{k}_{i-1} \mid s^{k}_{i})$
                IF ($l = k + 9$) or ($s^{l}_{i} = s_{\text{initial}}$ and $l + 1 \ge L$)
                    go back to loop I
                ELSE
                    repeat loop II
            EndFor
    EndFor
EndFor

Algorithm 1: Algorithm for Dyna-MARL.

Compared to (12), a new term $|q^{k}_{i,\text{on}} - q^{k}_{i-1,\text{on}}| / \max(q^{\max}_{i,\text{on}}, q^{\max}_{i-1,\text{on}})$ is added into (13), which is a penalty for the on-ramp queue difference between motorway sections $i$ and $i-1$. As two adjacent agents cooperate in this situation, $R^{k}_{i}(s^{k}_{i}, c^{k}_{i-1}, c^{k}_{i})$ is related to two control actions, $c^{k}_{i-1}$ and $c^{k}_{i}$. $\max(\cdot, \cdot)$ returns the maximum of its two arguments and is used for normalisation.
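As a minimal sketch of the normalised queue-difference penalty described above, the following Python function evaluates the added term on its own. The function name and argument names are illustrative assumptions; how the term is weighted and combined with the other terms of (13) is not shown here.

```python
def queue_difference_penalty(q_on_i, q_on_prev, q_max_on_i, q_max_on_prev):
    """Normalised on-ramp queue difference between sections i and i-1.

    q_on_i, q_on_prev: current on-ramp queues (vehicles) of sections i and i-1.
    q_max_on_i, q_max_on_prev: maximum on-ramp queues (vehicles) of the two sections.
    All names are hypothetical; only the formula itself comes from the text.
    """
    if q_max_on_i <= 0 or q_max_on_prev <= 0:
        raise ValueError("maximum queue lengths must be positive")
    # Dividing by the larger of the two capacities keeps the value in [0, 1]
    # whenever each queue stays within its own capacity.
    return abs(q_on_i - q_on_prev) / max(q_max_on_i, q_max_on_prev)
```

For example, with the on-ramp capacities 108 and 90 given in Section 6.1 and hypothetical queues of 54 and 18 vehicles, the penalty is 36/108, that is, one third.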

5.4. Description of the Algorithm. Based on the Dyna-$Q$ architecture and the RL elements defined in previous subsections, an algorithm, Dyna-MARL, is developed and described in this subsection. The two main loops, corresponding to direct RL and planning as shown in Figure 1, are detailed in Dyna-MARL. Between two real control steps in loop I, 10 planning steps are run in loop II. The pseudocode of Dyna-MARL is given in Algorithm 1.

An episode in Dyna-MARL represents a control cycle, which starts at incident occurrence and terminates when the traffic state returns to the initial state $s_{\text{initial}}$, that is, the traffic state before the incident occurrence. Incident duration is assumed to be known in advance.
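The interplay of the two loops can be illustrated with a generic tabular Dyna-$Q$ loop in the sense of [30]. This is only a sketch under stated assumptions: the five-state chain environment, the action set, and all function names are hypothetical, and planning here replays remembered transitions rather than running the ACTM-based model of Algorithm 1. The learning parameters are the values quoted in Section 6.3.

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.2, 0.8, 0.1   # learning parameters quoted in Section 6.3
PLANNING_STEPS = 10                      # planning updates per real control step
ACTIONS = (0, 1)                         # hypothetical: 0 = move left, 1 = move right
GOAL = 4                                 # hypothetical terminal state of a 5-state chain


def step(state, action):
    """Toy environment: reward 1 on reaching GOAL, 0 otherwise."""
    nxt = min(state + 1, GOAL) if action == 1 else max(state - 1, 0)
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL


def epsilon_greedy(q, state, rng):
    """Random action with probability EPSILON, otherwise a greedy one."""
    if rng.random() < EPSILON:
        return rng.choice(ACTIONS)
    best = max(q[(state, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if q[(state, a)] == best])


def q_update(q, s, a, r, s2, done):
    """One-step Q-learning update toward the bootstrapped target."""
    target = r if done else r + GAMMA * max(q[(s2, b)] for b in ACTIONS)
    q[(s, a)] += ALPHA * (target - q[(s, a)])


def dyna_q(episodes=50, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)
    model = {}                                   # learned model: (s, a) -> (r, s', done)
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            a = epsilon_greedy(q, s, rng)
            s2, r, done = step(s, a)
            q_update(q, s, a, r, s2, done)       # direct RL (analogue of loop I)
            model[(s, a)] = (r, s2, done)        # remember the observed transition
            for _ in range(PLANNING_STEPS):      # planning (analogue of loop II)
                ps, pa = rng.choice(list(model))
                pr, ps2, pdone = model[(ps, pa)]
                q_update(q, ps, pa, pr, ps2, pdone)
            s = s2
    return q
```

The design point the sketch shares with Dyna-MARL is that every real interaction is followed by a fixed budget of simulated updates, so value estimates improve faster than with direct RL alone.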

6. Case Study and Results

One of the metered motorway segments (southbound direction) of the M6 in the UK is chosen for the case study. This segment is between junction 21A (J21A) and junction 25 (J25), with an approximate length of 12.4 km (see Figure 4). Taking the noncontrolled (NC) situation as the baseline, we designed a series of experiments to compare the proposed Dyna-MARL algorithm with Isolated RL ($Q$-learning without coordination). The experiments and their results are described as follows.

6.1. Partitions of the Test Segment. The test motorway segment, with a three-lane mainline, three metered on-ramps, and five off-ramps, is simulated in AIMSUN [39], a microscopic traffic simulation package. According to the detector locations and road layout, the whole segment is divided into three sections, each containing a metered on-ramp. Motorway section 3 is divided into 4 cells, and motorway sections 1 and 2 are both divided into 3 cells. The partitions of each section can be seen in Figure 5. According to the section lengths, the maximum numbers of vehicles in each mainline section and on-ramp are as follows: $q^{\max}_{1,\text{main}} = 1860$, $q^{\max}_{2,\text{main}} = 2880$, $q^{\max}_{3,\text{main}} = 2880$, $q^{\max}_{1,\text{on}} = 108$, $q^{\max}_{2,\text{on}} = 90$, and $q^{\max}_{3,\text{on}} = 120$.

Figure 4: Test motorway segment of M6. [Map of the segment between J25 and J21A, via J24, J23, and J22, with origin O, destination D, on-ramps O3 (from A49), O2 (from A580), and O1 (from A579), and off-ramps D5 (to A58), D4 (to A580), D3 (to A579), D2 (to M62), and D1 (to M62).]

Figure 5: Partitions of test segment. [Schematic of Sections 1 to 3 between J21A (D) and J25 (O), showing section boundaries, cell boundaries, incident locations a, b, and c, on-ramps O1 to O3, off-ramps D1 to D5, road section lengths (unit: km), and the flow direction.]

6.2. Real Data Source. Real detector data collected from 17 loop detectors located in the motorway segment (including both the mainline and the on- and off-ramps) are used for the case study; these data can be extracted from the Traffic Information System (HATRIS) [40]. The traffic count data are averaged from April 2012 to March 2013 at 15-minute intervals. Only working-day data (Monday to Friday) are used, because the traffic load drops dramatically at weekends. Some of the detector data collected from the mainline and the three on-ramps are presented in Figure 6, from which we can see that two peak periods exist in daily traffic operation: the AM peak period (around 07:00:00–09:00:00) and the PM peak period (around 16:00:00–18:00:00).

In the test site, ramp metering only works at peak hours. Meanwhile, it is valuable to test the performance of the proposed algorithm in a high-demand situation: if it can work under a high traffic load, it should also be useful for common situations. Therefore, the AM peak period, with its heavy traffic load, is considered for the case study. Specifically, we use the averaged traffic data during the AM peak period collected from TRADS to estimate the OD (origins and destinations) matrix for the simulation. A model proposed in [41] is adopted by AIMSUN to do the estimation, where the number of iterations is set to 1000 to reach convergence. Table 1 shows the OD matrix estimated from real traffic data.

Figure 6: Real averaged traffic data. [Plot of 15-minute traffic count (0 to 1200) against time of day (00:00:00 to 00:00:00) for Sections 1 to 3 and On-ramps 1 to 3.]

Table 1: OD matrix estimated.

Origins/destinations    D      D5    D4    D3    D2     D1     Totals
O                       2089   375   686   728   1169   771    5818
O3                      875    65    212   193   117    46     1507
O2                      886    0     0     61    315    216    1477
O1                      824    0     0     0     292    226    1343
Totals                  4675   440   898   981   1893   1258   10146

Table 2: Parameters for ACTM.

Parameter    $d^{\max}_{\text{main}}$    $v$         $w$         $\theta$    $\eta$    $\lambda_1$    $\lambda_2$    $\lambda_3$
Value        6300 veh/h                  107 km/h    116 km/h    0.5         0.16      0.55           0.9            0.6
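Because the entries of Table 1 are rounded to whole vehicles, the printed totals need not equal the sums of the printed cells exactly. The short check below uses only the values printed in Table 1 (the variable names are illustrative) and compares each printed total with the recomputed sum.

```python
# Rows: origins O, O3, O2, O1; columns: destinations D, D5, D4, D3, D2, D1.
od = {
    "O":  [2089, 375, 686, 728, 1169, 771],
    "O3": [875, 65, 212, 193, 117, 46],
    "O2": [886, 0, 0, 61, 315, 216],
    "O1": [824, 0, 0, 0, 292, 226],
}
printed_row_totals = {"O": 5818, "O3": 1507, "O2": 1477, "O1": 1343}
printed_col_totals = [4675, 440, 898, 981, 1893, 1258]

# Difference between the recomputed sum and the printed total, per origin.
row_diffs = {o: sum(v) - printed_row_totals[o] for o, v in od.items()}

# Same comparison per destination (zip(*rows) transposes the matrix).
col_sums = [sum(col) for col in zip(*od.values())]
col_diffs = [s - p for s, p in zip(col_sums, printed_col_totals)]
```

Every difference is at most one vehicle in magnitude, which is consistent with rounding of the underlying estimates rather than with a transcription error.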

6.3. Incident Scenarios. Considering the difficulty of capturing real incident data, we simulate several incident scenarios in AIMSUN. To make every ramp meter work during the incident, the incident is located in the most downstream motorway section, that is, motorway section 1. Three incident scenarios, A, B, and C, are designed, corresponding to three different incident locations a, b, and c (as illustrated in Figure 5), respectively.

The simulation experiment lasts for one and a half hours, from 07:00:00 to 08:30:00, during the AM peak period. After 30 minutes of normal operation (for warm-up), the incident is triggered at 07:30:00 and lasts for 30 minutes. In the preliminary experiments designed in this paper, an incident with one lane blocked is considered. The parameters introduced here can also be adjusted for multiple-lane blockages. The incident extent is 50 meters, which is assumed to be constant during the incident.

Learning-related parameters are set to typical values [30], that is, $\alpha = 0.2$, $\gamma = 0.8$, and $\varepsilon = 0.1$. Other parameters related to ACTM are calibrated and summarised in Table 2. All the cells have the same $\theta$ and $\eta$.
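For orientation, substituting these values into the standard one-step $Q$-learning update of [31] gives the expression below. It is written in generic notation with an immediate reward $r$ and next state $s'$, which are assumptions of this illustration; the update rules actually used by Dyna-MARL are (2) and (4), which may differ in form, for example in the arguments of $Q$ for cooperating agents.

```latex
Q(s, c) \leftarrow Q(s, c) + 0.2 \left[ r + 0.8 \max_{c'} Q(s', c') - Q(s, c) \right]
```

Under the $\varepsilon$-greedy policy with $\varepsilon = 0.1$, the agent takes a uniformly random action with probability 0.1 and a greedy action otherwise.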

6.4. Results. The comparison of Dyna-MARL, Isolated RL, and NC is conducted from three aspects: density evolution, some general indicators, and system equity. The experimental results are described as follows.

(1) Density Evolution. We can see from Figure 7 that four dense areas exist during the traffic operation. Three of them, near the on-ramp entrances (motorway length around 0.5 km, 5 km, and 10 km), are caused by heavy traffic loads from the on-ramps. The dense area close to the segment end forms because of the incident.

In scenario A, the incident location is close to on-ramp 1 (O1). Without control, this incident leads to severe congestion, which blocks on-ramp 1 and propagates to motorway section 2 (around 9 km in Figure 7(a)). Under this scenario, Isolated RL cannot alleviate incident-induced congestion effectively (see Figure 7(b)). At the beginning of congestion formation, without coordination, only the nearest ramp controller reacts to the congestion. Because of the limited storage space of the on-ramp, one ramp controller is insufficient to dissolve this congestion, which still propagates to motorway section 2. Dyna-MARL, on the other hand, coordinates all three ramp controllers and makes full use of the storage space of the three on-ramps to deal with incident-induced congestion. In this way, mainline congestion can be restricted to a smaller area and does not propagate to motorway section 2 (see Figure 7(c)).

For scenarios B and C, the incidents are near the motorway end and far from on-ramp 1. Because on-ramp 1 is not blocked, the incidents do not lead to severe congestion. Under such circumstances, both Isolated RL and Dyna-MARL work well in easing congestion on the mainline. As shown in Figures 7(e)–7(i), compared with the NC situation, both Isolated RL and Dyna-MARL can restrict the congestion to a small range near the on-ramp entrances.

(2) General Indicators. In this comparison, some general indicators, including total travel time (should be reduced), total throughput (should be improved), and total CO$_2$ emission (should be reduced), are used to show how the proposed system can benefit road users. These indicators are widely used in the transport community to test the performance of newly developed traffic control systems.

As shown in Figure 8(a), compared with the NC situation, both Isolated RL and Dyna-MARL can reduce the total travel time of road users in all three scenarios. Specifically, Isolated RL decreases total travel time by up to 6.2%, while Dyna-MARL achieves a maximum reduction of 12.2% (see Figure 8(d)). The comparison of total throughput is presented in Figure 8(b). Dyna-MARL can improve the total throughput by up to 2.3% (see Figure 8(d)) and outperforms Isolated RL in all three scenarios. In scenario B, Isolated RL even fails to improve the total throughput. For the comparison of total CO$_2$ emission (shown in Figure 8(c)), both Isolated RL and Dyna-MARL achieve their best performance in scenario B, with reductions of 4.7% and 4.6%, respectively. In scenarios A and C, Dyna-MARL performs much better than Isolated RL.

Through the above comparison, we can see that Dyna-MARL outperforms Isolated RL for almost all the scenarios and indicators.

Figure 7: Density profiles for (a) NC in scenario A, (b) Isolated RL in scenario A, (c) Dyna-MARL in scenario A, (d) NC in scenario B, (e) Isolated RL in scenario B, (f) Dyna-MARL in scenario B, (g) NC in scenario C, (h) Isolated RL in scenario C, and (i) Dyna-MARL in scenario C. [Each panel plots motorway density (veh/km/lane) against motorway length (km) and absolute time (min).]

(3) System Equity. Although the general indicators presented in comparison (2) are effective for testing the performance of different systems, they cannot measure system equity, which is also an important aspect of system performance. In this paper, we only consider spatial equity, which is defined as a measurement of the equity of user delays on different on-ramps [42]. In this study, we assume that road users from all three on-ramps have the same importance. If users from different on-ramps experience similar travel times, the control system is defined as an equitable system. Under this definition, a large queue difference between on-ramps leads to a highly inequitable system. In [43], the variance of travel time on different on-ramps is used as an indicator to measure system equity. Similar to [43], for the sake of comparison, the standard deviation is considered in our case. This indicator is defined as

$$\mathrm{SD}(k) = \sqrt{\frac{\sum_{i=1}^{n} \left[ \bar{t}^{k} - t^{k}_{i} \right]^{2}}{n}}, \tag{14}$$

where $\mathrm{SD}(k)$ is the standard deviation of the travel time of different on-ramps at time step $k$, $t^{k}_{i}$ is the estimated total travel time of on-ramp $i$ at step $k$, and $\bar{t}^{k}$ is the averaged total travel time of the $n$ on-ramps at step $k$.
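The indicator in (14) is the population standard deviation (division by $n$, not $n - 1$) of the on-ramp travel times at one time step. A minimal Python sketch follows; the function name and the sample values in the usage note are hypothetical.

```python
import math


def travel_time_sd(travel_times):
    """Population standard deviation of on-ramp total travel times, as in (14).

    travel_times: one estimated total travel time per on-ramp at a given step k.
    """
    n = len(travel_times)
    if n == 0:
        raise ValueError("at least one on-ramp travel time is required")
    mean = sum(travel_times) / n                      # averaged total travel time
    return math.sqrt(sum((mean - t) ** 2 for t in travel_times) / n)
```

For three on-ramps with hypothetical travel times of 2, 4, and 6 hours the result is $\sqrt{8/3}$, about 1.63 hours, and it is zero when all on-ramps experience the same travel time. The standard library function `statistics.pstdev` computes the same quantity.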

The results of the system equity comparison can be seen in Figure 9. For the NC situation, good equity is maintained in scenarios B and C because entering vehicles are not restricted (as shown in Figures 9(b) and 9(c)). However, when one of the on-ramp entrances is blocked by congestion in scenario A, a long queue forms and leads to imbalance and resultant inequity for users on different on-ramps (see Figure 9(a)). For the controlled cases, Isolated RL performs poorly in all scenarios. This is because the ramp controller near the congestion imposes much more restrictive measures on the controlled traffic than the other controllers do. Because of the coordination strategy, Dyna-MARL outperforms Isolated RL in maintaining system equity in all scenarios, especially during the incident (from 07:30:00 to 08:00:00).

Figure 8: Comparison of general measures for different scenarios. [Bar charts for NC, Isolated RL, and Dyna-MARL in scenarios A, B, and C: (a) total travel time (h), (b) total throughput (veh), (c) total CO$_2$ emission (kg, $\times 10^{4}$), and (d) reduction from NC (%) for total throughput, total travel time, and total CO$_2$ emission.]

7. Conclusions and Future Work

A Dyna-$Q$ based multiagent reinforcement learning method, referred to as Dyna-MARL, for motorway ramp control has been developed in this paper. Dyna-MARL is compared with Isolated RL ($Q$-learning without coordination) and the noncontrolled situation in a simulation environment. Real traffic data collected from a metered motorway segment in the UK are used to build the simulation.

Through a series of simulation-based experiments, we can conclude the following: (1) Isolated RL can improve the motorway performance in terms of increasing total throughput and reducing total travel time and CO$_2$ emission, but this improvement comes at the expense of poor system equity across on-ramps; (2) with a suitable coordination strategy, much higher system equity can be achieved by Dyna-MARL; (3) in addition to the system equity, Dyna-MARL outperforms Isolated RL in almost all scenarios regarding all indicators, which means that Dyna-MARL can deal with network-wide problems effectively.

Although the simulation tests have shown some positive results regarding the performance of Dyna-MARL, only a simplified incident scenario with fixed duration is considered in the current work. In practice, incident duration is highly variable and is affected by a number of factors, such as weather conditions, road conditions, and the arrival time of the incident management team. Therefore, incident duration should be treated as an uncertainty, which will be investigated in our future work.

Figure 9: Standard deviation for different scenarios: (a) scenario A, (b) scenario B, and (c) scenario C. [Each panel plots standard deviation (h) against time of day (07:00:00 to 08:30:00) for NC, Isolated RL, and Dyna-MARL.]

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This paper is supported by China Scholarship Council and University of Leeds (CSC-University of Leeds scholarship) and partially supported by the National Natural Science Foundation of China (Grant nos. 91420203 and 61271376). The authors would like to thank the institutions that support this study.

References

[1] M. Papageorgiou and A. Kotsialos, “Freeway ramp metering: an overview,” IEEE Transactions on Intelligent Transportation Systems, vol. 3, no. 4, pp. 271–281, 2002.

[2] A. Skabardonis, P. Varaiya, and K. F. Petty, “Measuring recurrent and nonrecurrent traffic congestion,” Transportation Research Record, vol. 1856, pp. 118–124, 2003.

[3] M. Papageorgiou, H. Hadj-Salem, and J.-M. Blosseville, “ALINEA: a local feedback control law for on-ramp metering,” Journal of the Transportation Research Board, vol. 1320, pp. 58–64, 1991.

[4] E. Smaragdis and M. Papageorgiou, “Series of new local ramp metering strategies,” Transportation Research Record, vol. 1856, pp. 74–86, 2003.

[5] L. N. Jacobson, K. C. Henry, and O. Mehyar, “Real-time metering algorithm for centralized control,” Transportation Research Record, vol. 1232, pp. 17–26, 1989.

[6] G. Paesani, J. Kerr, P. Perovich, and F. Khosravi, “System wide adaptive ramp metering (SWARM),” in Proceedings of the 7th ITS America Annual Meeting and Exposition: Merging the Transportation and Communications Revolutions, Washington, DC, USA, June 1997.

[7] R. Lau, Ramp Metering by Zone - The Minnesota Algorithm, Minnesota Department of Transportation, 1997.

[8] H. M. Zhang and W. W. Recker, “On optimal freeway ramp control policies for congested traffic corridors,” Transportation Research Part B: Methodological, vol. 33, no. 6, pp. 417–436, 1999.

[9] A. Kotsialos, M. Papageorgiou, and F. Middelham, “Optimal coordinated ramp metering with advanced motorway optimal control,” Transportation Research Record, no. 1748, pp. 55–65, 2001.

[10] A. Hegyi, B. De Schutter, and H. Hellendoorn, “Model predictive control for optimal coordination of ramp metering and variable speed limits,” Transportation Research C: Emerging Technologies, vol. 13, no. 3, pp. 185–209, 2005.

[11] G. Gomes and R. Horowitz, “Optimal freeway ramp metering using the asymmetric cell transmission model,” Transportation Research Part C: Emerging Technologies, vol. 14, no. 4, pp. 244–262, 2006.

[12] A. H. F. Chow and Y. Li, “Robust optimization of dynamic motorway traffic via ramp metering,” IEEE Transactions on Intelligent Transportation Systems, vol. 15, no. 3, pp. 1374–1380, 2014.

[13] R. W. Hall, “Non-recurrent congestion: how big is the problem? Are traveler information systems the solution?” Transportation Research Part C, vol. 1, no. 1, pp. 89–103, 1993.

[14] P. Prevedouros, B. Halkias, K. Papandreou, and P. Kopelias, “Freeway incidents in the United States, United Kingdom, and Attica Tollway, Greece: characteristics, available capacity, and models,” Transportation Research Record, vol. 2047, pp. 57–65, 2008.

[15] T. L. Greenlee and H. J. Payne, “Freeway ramp metering strategies for responding to incidents,” in Proceedings of the IEEE Conference on Decision and Control including the 16th Symposium on Adaptive Processes and a Special Symposium on Fuzzy Set Theory and Applications, pp. 987–992, New Orleans, LA, USA, December 1977.

[16] M. H. Wang, Optimal ramp metering policies for nonrecurring congestion with uncertain incident duration [Ph.D. thesis], Purdue University, West Lafayette, Ind, USA, 1994.

[17] J.-B. Sheu, “Stochastic modeling of the dynamics of incident-induced lane traffic states for incident-responsive local ramp control,” Physica A: Statistical Mechanics and its Applications, vol. 386, no. 1, pp. 365–380, 2007.

[18] J.-B. Sheu and M.-S. Chang, “Stochastic optimal-control approach to automatic incident-responsive coordinated ramp control,” IEEE Transactions on Intelligent Transportation Systems, vol. 8, no. 2, pp. 359–367, 2007.

[19] C. Jacob and B. Abdulhai, “Machine learning for multi-jurisdictional optimal traffic corridor control,” Transportation Research Part A: Policy and Practice, vol. 44, no. 2, pp. 53–64, 2010.

[20] M. Davarynejad, A. Hegyi, J. Vrancken, and J. van den Berg, “Motorway ramp-metering control with queuing consideration using Q-learning,” in Proceedings of the 14th International IEEE Conference on Intelligent Transportation Systems (ITSC ’11), pp. 1652–1658, IEEE, Washington, DC, USA, October 2011.

[21] K. Rezaee, B. Abdulhai, and H. Abdelgawad, “Application of reinforcement learning with continuous state space to ramp metering in real-world conditions,” in Proceedings of the 15th International IEEE Conference on Intelligent Transportation Systems (ITSC ’12), pp. 1590–1595, IEEE, Anchorage, Alaska, USA, September 2012.

[22] C. Jacob and B. Abdulhai, “Automated adaptive traffic corridor control using reinforcement learning: approach and case studies,” Transportation Research Record, vol. 1959, pp. 1–8, 2006.

[23] K. Rezaee, B. Abdulhai, and H. Abdelgawad, “Self-learning adaptive ramp metering: analysis of design parameters on a test case in Toronto, Canada,” Transportation Research Record, vol. 2396, pp. 10–18, 2013.

[24] X.-J. Wang, X.-M. Xi, and G.-F. Gao, “Reinforcement learning ramp metering without complete information,” Journal of Control Science and Engineering, vol. 2012, Article ID 208456, 8 pages, 2012.

[25] A. Fares and W. Gomaa, “Multi-agent reinforcement learning control for ramp metering,” in Progress in Systems Engineering, vol. 330 of Advances in Intelligent Systems and Computing, pp. 167–173, Springer, Basel, Switzerland, 2015.

[26] K. Veljanovska, K. M. Bombol, and T. Maher, “Reinforcement learning technique in multiple motorway access control strategy design,” PROMET-Traffic & Transportation, vol. 22, no. 2, pp. 117–123, 2010.

[27] C. Lu, H. Chen, and S. Grant-Muller, “An indirect reinforcement learning approach for ramp control under incident-induced congestion,” in Proceedings of the 16th International IEEE Conference on Intelligent Transportation Systems (ITSC ’13), pp. 979–984, IEEE, The Hague, The Netherlands, October 2013.

[28] L. Busoniu, R. Babuska, and B. De Schutter, “A comprehensive survey of multiagent reinforcement learning,” IEEE Transactions on Systems, Man and Cybernetics Part C: Applications and Reviews, vol. 38, no. 2, pp. 156–172, 2008.

[29] C. Lu, H. Chen, and S. Grant-Muller, “Indirect reinforcement learning for incident-responsive ramp control,” Procedia - Social and Behavioral Sciences, vol. 111, pp. 1112–1122, 2014.

[30] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, 1998.

[31] C. C. H. Watkins and P. Dayan, “Q-learning,” Machine Learning, vol. 8, no. 3-4, pp. 279–292, 1992.

[32] J. R. Kok and N. Vlassis, “Collaborative multiagent reinforcement learning by payoff propagation,” Journal of Machine Learning Research, vol. 7, pp. 1789–1828, 2006.

[33] C. Guestrin, M. G. Lagoudakis, and R. Parr, “Coordinated reinforcement learning,” in Proceedings of the 19th International Conference on Machine Learning, pp. 227–234, Sydney, Australia, July 2002.

[34] S. El-Tantawy, B. Abdulhai, and H. Abdelgawad, “Multiagent reinforcement learning for integrated network of adaptive traffic signal controllers (MARLIN-ATSC): methodology and large-scale application on downtown Toronto,” IEEE Transactions on Intelligent Transportation Systems, vol. 14, no. 3, pp. 1140–1150, 2013.

[35] L. P. Kaelbling, M. L. Littman, and A. W. Moore, “Reinforcement learning: a survey,” Journal of Artificial Intelligence Research, vol. 4, pp. 237–285, 1996.

[36] C. F. Daganzo, “The cell transmission model: a dynamic representation of highway traffic consistent with the hydrodynamic theory,” Transportation Research Part B: Methodological, vol. 28, no. 4, pp. 269–287, 1994.

[37] J. Haddad, M. Ramezani, and N. Geroliminis, “Cooperative traffic control of a mixed network with two urban regions and a freeway,” Transportation Research Part B: Methodological, vol. 54, pp. 17–36, 2013.

[38] H. Mongeot and J.-B. Lesort, “Analytical expressions of incident-induced flow dynamics perturbations: using macroscopic theory and extension of Lighthill-Whitham theory,” Transportation Research Record, vol. 1710, pp. 58–68, 2000.

[39] Transport Simulation Systems, Aimsun User’s Manual 6.1, TTS, Barcelona, Spain, 2010.

[40] Highways England, “Hatris Homepage,” 2013, https://www.hatris.co.uk.

[41] E. Cascetta, “Estimation of trip matrices from traffic counts and survey data: a generalized least squares estimator,” Transportation Research B, vol. 18, no. 4-5, pp. 289–299, 1984.

[42] L. Zhang and D. Levinson, “Balancing efficiency and equity of ramp meters,” Journal of Transportation Engineering, vol. 131, no. 6, pp. 477–481, 2005.

[43] A. Kotsialos and M. Papageorgiou, “Efficiency and equity properties of freeway network-wide ramp metering with AMOC,” Transportation Research Part C: Emerging Technologies, vol. 12, no. 6, pp. 401–420, 2004.


Page 9: Research Article Intelligent Ramp Control for Incident Response …downloads.hindawi.com/journals/mpe/2015/896943.pdf · 2019. 7. 31. · Intelligent Ramp Control for Incident Response

Mathematical Problems in Engineering 9

Figure 4: Test motorway segment of M6, covering junctions J21A, J22, J23, J24, and J25, with mainline origin O and destination D, on-ramp origins O1 (from A579), O2 (from A580), and O3 (from A49), and off-ramp destinations D1 (to M62), D2 (to M62), D3 (to A579), D4 (to A580), and D5 (to A58).

Figure 5: Partitions of test segment into Sections 1 to 3, showing section boundaries, cell boundaries, road section lengths (unit: km), flow direction, and the incident locations a, b, and c.

vehicles in each mainline section and on-ramps is as follows: $q^{\max}_{1,\mathrm{main}} = 1860$, $q^{\max}_{2,\mathrm{main}} = 2880$, $q^{\max}_{3,\mathrm{main}} = 2880$, $q^{\max}_{1,\mathrm{on}} = 108$, $q^{\max}_{2,\mathrm{on}} = 90$, and $q^{\max}_{3,\mathrm{on}} = 120$.

6.2. Real Data Source. Real detector data collected from 17 loop detectors located in the motorway segment (including both the mainline and the on- and off-ramps) are used for the case study; these data can be extracted from the Traffic Information System (HATRIS) [40]. The traffic count data are averaged from April 2012 to March 2013 at 15-minute intervals. Only working-day data (Monday to Friday) are used because of the dramatic reduction of traffic load at weekends. Some of the detector data collected from the mainline and the three on-ramps are presented in Figure 6, from which we can see that two peak periods, an AM peak (around 07:00:00–09:00:00) and a PM peak (around 16:00:00–18:00:00), exist during daily traffic operation.

In the test site, ramp metering only works at peak hours. Meanwhile, it is valuable to test the performance of the proposed algorithm in a high-demand situation: if it works under a high traffic load, it should also be useful for common situations. Therefore, the AM peak period with heavy traffic load is considered for the case study. Specifically, we use the averaged traffic data during the AM peak period collected from TRADS to estimate the OD (origins and destinations) matrix for the simulation. A model proposed in [41] is adopted by AIMSUN to do the estimation, where the number of iterations is set to 1000 to reach convergence. Table 1 shows the OD matrix estimated from real traffic data.

Figure 6: Real averaged traffic data (15-minute traffic count against time of day for mainline Sections 1 to 3 and On-ramps 1 to 3).

Table 1: OD matrix estimated.

| Origins/destinations | D | D5 | D4 | D3 | D2 | D1 | Totals |
|---|---|---|---|---|---|---|---|
| O | 2089 | 375 | 686 | 728 | 1169 | 771 | 5818 |
| O3 | 875 | 65 | 212 | 193 | 117 | 46 | 1507 |
| O2 | 886 | 0 | 0 | 61 | 315 | 216 | 1477 |
| O1 | 824 | 0 | 0 | 0 | 292 | 226 | 1343 |
| Totals | 4675 | 440 | 898 | 981 | 1893 | 1258 | 10146 |

Table 2: Parameters for ACTM.

| Parameter | $d^{\max}_{\mathrm{main}}$ | $v$ | $w$ | $\theta$ | $\eta$ | $\lambda_1$ | $\lambda_2$ | $\lambda_3$ |
|---|---|---|---|---|---|---|---|---|
| Value | 6300 veh/h | 107 km/h | 11.6 km/h | 0.5 | 0.16 | 0.55 | 0.9 | 0.6 |
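As a quick consistency check on Table 1, the following minimal Python sketch recomputes the row and column sums from the cell values and compares them with the stated totals. The variable and function names are illustrative choices, not part of the paper. Any one-vehicle gaps it reports are presumably due to rounding of the estimated flows, which is an assumption rather than something the paper states.

```python
# Cell values of Table 1 (rows: origins; columns: destinations D, D5, D4, D3, D2, D1).
OD = {
    "O":  [2089, 375, 686, 728, 1169, 771],
    "O3": [875, 65, 212, 193, 117, 46],
    "O2": [886, 0, 0, 61, 315, 216],
    "O1": [824, 0, 0, 0, 292, 226],
}
# Totals exactly as printed in Table 1.
STATED_ROW_TOTALS = {"O": 5818, "O3": 1507, "O2": 1477, "O1": 1343}
STATED_COL_TOTALS = [4675, 440, 898, 981, 1893, 1258]
STATED_GRAND_TOTAL = 10146


def row_sums(od):
    """Sum of trips leaving each origin."""
    return {origin: sum(trips) for origin, trips in od.items()}


def col_sums(od):
    """Sum of trips arriving at each destination."""
    return [sum(column) for column in zip(*od.values())]


def max_gap(od):
    """Largest absolute difference between recomputed and stated totals."""
    rows = row_sums(od)
    row_gaps = [abs(rows[o] - STATED_ROW_TOTALS[o]) for o in od]
    col_gaps = [abs(c - s) for c, s in zip(col_sums(od), STATED_COL_TOTALS)]
    return max(row_gaps + col_gaps)


if __name__ == "__main__":
    print("row sums:", row_sums(OD))
    print("column sums:", col_sums(OD))
    print("largest gap to stated totals:", max_gap(OD))
    print("grand total:", sum(row_sums(OD).values()))
```

A check of this kind is useful before loading the matrix into a simulator, because a transcription error in a single cell shows up as a gap much larger than one vehicle.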

6.3. Incident Scenarios. Considering the difficulty of capturing real incident data, we simulate some incident scenarios in AIMSUN. To make each ramp meter work during the incident, the incident is located near the most downstream motorway section, that is, motorway section 1. Therefore, three incident scenarios, A, B, and C, are designed, corresponding to three different incident locations a, b, and c (as illustrated in Figure 5), respectively.

The simulation experiment lasts for one and a half hours, from 07:00:00 to 08:30:00, during the AM peak period. After 30 minutes of normal operation (for warm-up), the incident is triggered at 07:30:00 and lasts for 30 minutes. In the preliminary experiments designed in this paper, an incident with one lane blocked is considered. The parameters introduced here can also be adjusted for multiple lane-blockage situations. The incident extent is 50 meters, which is assumed to be constant during the incident.
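The scenario timing described above can be captured in a small configuration object. The sketch below is only an illustration in Python: the class name, field names, and helper functions are hypothetical and are not taken from AIMSUN or from the paper's implementation. Minutes are counted from the simulation start at 07:00.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IncidentScenario:
    """One incident scenario; all field names are illustrative."""
    name: str                 # "A", "B" or "C"
    location: str             # incident location "a", "b" or "c" in Figure 5
    start_min: int = 30       # incident triggered after the 30-minute warm-up
    duration_min: int = 30    # incident lasts 30 minutes
    lanes_blocked: int = 1    # one blocked lane in the preliminary experiments
    extent_m: float = 50.0    # incident extent, assumed constant

    def is_active(self, minute: int) -> bool:
        """True while the incident blocks the lane (start inclusive, end exclusive)."""
        return self.start_min <= minute < self.start_min + self.duration_min


SIM_LENGTH_MIN = 90  # 07:00 to 08:30

SCENARIOS = [
    IncidentScenario("A", "a"),
    IncidentScenario("B", "b"),
    IncidentScenario("C", "c"),
]


def clock(minute: int) -> str:
    """Convert minutes since 07:00 to an HH:MM clock string."""
    hours, minutes = divmod(7 * 60 + minute, 60)
    return f"{hours:02d}:{minutes:02d}"


if __name__ == "__main__":
    scenario = SCENARIOS[0]
    for minute in range(0, SIM_LENGTH_MIN + 1, 15):
        print(clock(minute), "incident active:", scenario.is_active(minute))
```

Treating the end of the incident as exclusive is a design choice of this sketch; the paper does not say how the boundary time step is handled.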

Learning-related parameters are set to typical values [30], that is, $\alpha = 0.2$, $\gamma = 0.8$, and $\varepsilon = 0.1$. Other parameters related to ACTM are calibrated and summarised in Table 2. All the cells have the same $\theta$ and $\eta$.
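To show where the three learning parameters enter, the sketch below gives a generic single-agent tabular Dyna-Q step in Python, in the textbook form of [30], using the step size 0.2, discount rate 0.8, and exploration rate 0.1 quoted above. It is a sketch under stated assumptions and not the paper's Dyna-MARL: the class and method names are hypothetical, the reward is maximised, coordination between agents is omitted, and the learned model is a plain lookup table standing in for the ACTM-based model whose parameters are listed in Table 2.

```python
import random
from collections import defaultdict

ALPHA = 0.2    # learning rate
GAMMA = 0.8    # discount rate
EPSILON = 0.1  # exploration probability for epsilon-greedy action selection


class DynaQAgent:
    """Generic tabular Dyna-Q agent (illustrative, single agent)."""

    def __init__(self, actions, planning_steps=5, seed=0):
        self.actions = list(actions)
        self.planning_steps = planning_steps
        self.q = defaultdict(float)   # Q-values keyed by (state, action)
        self.model = {}               # (state, action) -> (reward, next_state)
        self.rng = random.Random(seed)

    def act(self, state):
        """Epsilon-greedy choice over the action set."""
        if self.rng.random() < EPSILON:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def _update(self, s, a, r, s_next):
        """One Q-learning backup."""
        best_next = max(self.q[(s_next, b)] for b in self.actions)
        self.q[(s, a)] += ALPHA * (r + GAMMA * best_next - self.q[(s, a)])

    def learn(self, s, a, r, s_next):
        """Direct update from real experience, then planning from the model."""
        self._update(s, a, r, s_next)
        self.model[(s, a)] = (r, s_next)
        for _ in range(self.planning_steps):
            (ps, pa), (pr, ps_next) = self.rng.choice(list(self.model.items()))
            self._update(ps, pa, pr, ps_next)


if __name__ == "__main__":
    agent = DynaQAgent(actions=["low_rate", "high_rate"], planning_steps=1)
    agent.learn("congested", "low_rate", 1.0, "free_flow")
    print(agent.q[("congested", "low_rate")])
```

The planning loop is what distinguishes Dyna-Q from plain Q-learning: each real transition is followed by extra backups replayed from the stored model, so the value estimates improve faster per unit of real experience.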

6.4. Results. The comparison of Dyna-MARL, Isolated RL, and NC is conducted from three aspects: density evolution, some general indicators, and the system equity. The experimental results are described as follows.

(1) Density Evolution. We can see from Figure 7 that four dense areas exist during the traffic operation. Three of them, near the on-ramp entrances (motorway length around 0.5 km, 5 km, and 10 km), are caused by heavy traffic loads from the on-ramps. The dense area close to the segment end forms due to the incident.

In scenario A, the incident location is close to on-ramp 1 (O1). Without control, this incident leads to severe congestion, which blocks on-ramp 1 and propagates to motorway section 2 (around 9 km in Figure 7(a)). Under this scenario, Isolated RL cannot alleviate incident-induced congestion effectively (see Figure 7(b)). At the beginning of congestion formation, without coordination, only the nearest ramp controller reacts to the congestion. Because of the limited storage space of the on-ramp, one ramp controller is insufficient to dissolve this congestion, which still propagates to motorway section 2. Dyna-MARL, on the other hand, coordinates all three ramp controllers and makes full use of the storage space of the three on-ramps to deal with incident-induced congestion. In this way, mainline congestion is restricted to a smaller area and does not propagate to motorway section 2 (see Figure 7(c)).

For scenarios B and C, the incidents are near the motorway end and far from on-ramp 1. Without blocking on-ramp 1, the incidents do not lead to severe congestion. Under such circumstances, both Isolated RL and Dyna-MARL work well in easing congestion on the mainline. As shown in Figures 7(e)–7(i), compared with the NC situation, both Isolated RL and Dyna-MARL can restrict the congestion to a small range near the on-ramp entrances.

(2) General Indicators. In this comparison, some general indicators, including total travel time (should be reduced), total throughput (should be improved), and total CO₂ emission (should be reduced), are used to show how the proposed system can benefit road users. These indicators are widely used in the transport community to test the performance of newly developed traffic control systems.

As shown in Figure 8(a), compared with the NC situation, both Isolated RL and Dyna-MARL can reduce the total travel time of road users in all three scenarios. Specifically, Isolated RL decreases total travel time by up to 6.2%, while Dyna-MARL achieves a maximum reduction of 12.2% (see Figure 8(d)). The comparison of total throughput is presented in Figure 8(b). Dyna-MARL can improve the total throughput by up to 2.3% (see Figure 8(d)), outperforming Isolated RL in all three scenarios. In scenario B, Isolated RL even fails to improve the total throughput. For the comparison of total CO₂ emission (shown in Figure 8(c)), both Isolated RL and Dyna-MARL achieve their best performance in scenario B, with reductions of 4.7% and 4.6%, respectively. In scenarios A and C, Dyna-MARL performs much better than Isolated RL.

Through the above comparison, we can see that Dyna-MARL outperforms Isolated RL for almost all the scenarios and indicators.
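The percentages in Figure 8(d) are expressed relative to the NC baseline. A minimal Python sketch of that calculation is given below; the function names are hypothetical, and the numbers in the usage example are made up for illustration and are not results from the paper.

```python
def reduction_pct(nc_value: float, controlled_value: float) -> float:
    """Percentage reduction from the NC baseline (positive means lower than NC)."""
    if nc_value == 0:
        raise ValueError("NC baseline must be non-zero")
    return 100.0 * (nc_value - controlled_value) / nc_value


def improvement_pct(nc_value: float, controlled_value: float) -> float:
    """Percentage increase over the NC baseline (positive means higher than NC)."""
    return -reduction_pct(nc_value, controlled_value)


if __name__ == "__main__":
    # Illustrative values only: total travel time in hours, throughput in vehicles.
    print(reduction_pct(1500.0, 1350.0))    # travel time falls by 10% of NC
    print(improvement_pct(9000.0, 9180.0))  # throughput rises by 2% of NC
```

Travel time and CO₂ emission are read with the first function, since lower is better, while throughput is read with the second, since higher is better.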

Figure 7: Density profiles (motorway density in veh/km/lane against motorway length in km and absolute time in min) for (a) NC in scenario A, (b) Isolated RL in scenario A, (c) Dyna-MARL in scenario A, (d) NC in scenario B, (e) Isolated RL in scenario B, (f) Dyna-MARL in scenario B, (g) NC in scenario C, (h) Isolated RL in scenario C, and (i) Dyna-MARL in scenario C.

(3) System Equity. Although the general indicators presented in comparison (2) have shown their effectiveness in testing the performance of different systems, they cannot measure system equity, which is also an important aspect of system performance. In this paper, we only consider spatial equity, which is defined as a measurement of the equity of user delays on different on-ramps [42]. In this study, we assume that road users from all three on-ramps have the same importance. If all users from different on-ramps experience similar travel times, the control system is defined as an equitable system. This term is used to measure the system equity; that is, a large queue difference leads to a highly inequitable system. In [43], the variance of travel time on different on-ramps is used as an indicator to measure system equity. Similar to [43], for the sake of comparison, the standard deviation is considered in our case. This indicator is defined as

$$\mathrm{SD}(k) = \sqrt{\frac{\sum_{i=1}^{n}\left[\bar{t}^{k} - t_{i}^{k}\right]^{2}}{n}}, \tag{14}$$

where $\mathrm{SD}(k)$ is the standard deviation of the travel time of different on-ramps at time step $k$, $t_{i}^{k}$ is the estimated total travel time of on-ramp $i$ at step $k$, and $\bar{t}^{k}$ is the averaged total travel time of the $n$ on-ramps at step $k$.
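The equity indicator in (14) is the population standard deviation of the per-ramp total travel times at one time step. The Python sketch below computes it directly from that definition; the function name and the sample values are illustrative and are not data from the paper.

```python
import math


def travel_time_sd(travel_times):
    """Population standard deviation of on-ramp total travel times at one step.

    travel_times holds one estimated total travel time per on-ramp
    (the n values of on-ramps i = 1..n at a given step k).
    """
    n = len(travel_times)
    if n == 0:
        raise ValueError("at least one on-ramp is required")
    mean = sum(travel_times) / n  # averaged total travel time of the n on-ramps
    return math.sqrt(sum((mean - t) ** 2 for t in travel_times) / n)


if __name__ == "__main__":
    # Hypothetical travel times (h) for three on-ramps at one time step.
    print(travel_time_sd([2.0, 2.0, 2.0]))  # perfectly equitable case
    print(travel_time_sd([1.0, 2.0, 3.0]))  # unequal delays across on-ramps
```

Dividing by n rather than n - 1 follows (14); a value of zero means every on-ramp experiences the same total travel time, and larger values indicate a less equitable system.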

Results of the comparison of system equity can be seen in Figure 9. For the NC situation, good equity can be maintained in scenarios B and C because entering vehicles are not restricted (as shown in Figures 9(b) and 9(c)). However, when one of the on-ramp entrances is blocked by congestion in scenario A, a long queue forms and leads to imbalance and resultant inequity for users on different on-ramps (see Figure 9(a)). For controlled cases, Isolated RL performs poorly in all scenarios. This is because the ramp controller near the congestion applies much more restrictive measures to the controlled traffic than the other controllers do. Because of the coordination strategy, Dyna-MARL outperforms Isolated RL in maintaining system equity in all scenarios, especially during the incident (from 07:30:00 to 08:00:00).

Figure 8: Comparison of general measures for different scenarios (NC, Isolated RL, and Dyna-MARL in scenarios A, B, and C): (a) total travel time (h), (b) total throughput (veh), (c) total CO₂ emission (×10⁴ kg), and (d) reduction from NC (%) in total throughput, total travel time, and total CO₂ emission.

7. Conclusions and Future Work

A Dyna-$Q$ based multiagent reinforcement learning method, referred to as Dyna-MARL, for motorway ramp control has been developed in this paper. Dyna-MARL is compared with Isolated RL ($Q$-learning without coordination) and the noncontrolled situation in a simulation environment. Real traffic data collected from a metered motorway segment in the UK are used to build the simulation.

Through a series of simulation-based experiments, we can conclude the following: (1) Isolated RL can improve motorway performance in terms of increasing total throughput and reducing total travel time and CO₂ emission, but this improvement comes at the expense of poor system equity on different on-ramps; (2) with a suitable coordination strategy, much higher system equity can be achieved by Dyna-MARL; (3) in addition to the system equity, Dyna-MARL outperforms Isolated RL in almost all scenarios regarding all indicators, which means that Dyna-MARL can deal with network-wide problems effectively.

Although the simulation tests have shown some positive results regarding the performance of Dyna-MARL, only a simplified incident scenario with fixed duration is considered in the current work. In practice, incident duration is highly variable and affected by a number of factors, such as weather conditions, road conditions, and the arrival time of the incident management team. Therefore, incident duration should be treated as an uncertainty, which will be investigated in our future work.

Figure 9: Standard deviation for different scenarios: (a) scenario A, (b) scenario B, and (c) scenario C (standard deviation in h against time of day, 07:00:00 to 08:30:00, for NC, Isolated RL, and Dyna-MARL).

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This paper is supported by the China Scholarship Council and the University of Leeds (CSC-University of Leeds scholarship) and partially supported by the National Natural Science Foundation of China (Grant nos. 91420203 and 61271376). The authors would like to thank the institutions that support this study.

References

[1] M. Papageorgiou and A. Kotsialos, “Freeway ramp metering: an overview,” IEEE Transactions on Intelligent Transportation Systems, vol. 3, no. 4, pp. 271–281, 2002.

[2] A. Skabardonis, P. Varaiya, and K. F. Petty, “Measuring recurrent and nonrecurrent traffic congestion,” Transportation Research Record, vol. 1856, pp. 118–124, 2003.

[3] M. Papageorgiou, H. Hadj-Salem, and J.-M. Blosseville, “ALINEA: a local feedback control law for on-ramp metering,” Journal of the Transportation Research Board, vol. 1320, pp. 58–64, 1991.

[4] E. Smaragdis and M. Papageorgiou, “Series of new local ramp metering strategies,” Transportation Research Record, vol. 1856, pp. 74–86, 2003.

[5] L. N. Jacobson, K. C. Henry, and O. Mehyar, “Real-time metering algorithm for centralized control,” Transportation Research Record, vol. 1232, pp. 17–26, 1989.

[6] G. Paesani, J. Kerr, P. Perovich, and F. Khosravi, “System wide adaptive ramp metering (SWARM),” in Proceedings of the 7th ITS America Annual Meeting and Exposition: Merging the Transportation and Communications Revolutions, Washington, DC, USA, June 1997.

[7] R. Lau, Ramp Metering by Zone - The Minnesota Algorithm, Minnesota Department of Transportation, 1997.

[8] H. M. Zhang and W. W. Recker, “On optimal freeway ramp control policies for congested traffic corridors,” Transportation Research Part B: Methodological, vol. 33, no. 6, pp. 417–436, 1999.

[9] A. Kotsialos, M. Papageorgiou, and F. Middelham, “Optimal coordinated ramp metering with advanced motorway optimal control,” Transportation Research Record, no. 1748, pp. 55–65, 2001.

[10] A. Hegyi, B. De Schutter, and H. Hellendoorn, “Model predictive control for optimal coordination of ramp metering and variable speed limits,” Transportation Research C: Emerging Technologies, vol. 13, no. 3, pp. 185–209, 2005.

[11] G. Gomes and R. Horowitz, “Optimal freeway ramp metering using the asymmetric cell transmission model,” Transportation Research Part C: Emerging Technologies, vol. 14, no. 4, pp. 244–262, 2006.

[12] A. H. F. Chow and Y. Li, “Robust optimization of dynamic motorway traffic via ramp metering,” IEEE Transactions on Intelligent Transportation Systems, vol. 15, no. 3, pp. 1374–1380, 2014.

[13] R. W. Hall, “Non-recurrent congestion: how big is the problem? Are traveler information systems the solution?” Transportation Research Part C, vol. 1, no. 1, pp. 89–103, 1993.

[14] P. Prevedouros, B. Halkias, K. Papandreou, and P. Kopelias, “Freeway incidents in the United States, United Kingdom, and Attica Tollway, Greece: characteristics, available capacity, and models,” Transportation Research Record, vol. 2047, pp. 57–65, 2008.

[15] T. L. Greenlee and H. J. Payne, “Freeway ramp metering strategies for responding to incidents,” in Proceedings of the IEEE Conference on Decision and Control including the 16th Symposium on Adaptive Processes and a Special Symposium on Fuzzy Set Theory and Applications, pp. 987–992, New Orleans, LA, USA, December 1977.

[16] M. H. Wang, Optimal ramp metering policies for nonrecurring congestion with uncertain incident duration [Ph.D. thesis], Purdue University, West Lafayette, Ind, USA, 1994.

[17] J.-B. Sheu, “Stochastic modeling of the dynamics of incident-induced lane traffic states for incident-responsive local ramp control,” Physica A: Statistical Mechanics and its Applications, vol. 386, no. 1, pp. 365–380, 2007.

[18] J.-B. Sheu and M.-S. Chang, “Stochastic optimal-control approach to automatic incident-responsive coordinated ramp control,” IEEE Transactions on Intelligent Transportation Systems, vol. 8, no. 2, pp. 359–367, 2007.

[19] C. Jacob and B. Abdulhai, “Machine learning for multi-jurisdictional optimal traffic corridor control,” Transportation Research Part A: Policy and Practice, vol. 44, no. 2, pp. 53–64, 2010.

[20] M. Davarynejad, A. Hegyi, J. Vrancken, and J. van den Berg, “Motorway ramp-metering control with queuing consideration using Q-learning,” in Proceedings of the 14th International IEEE Conference on Intelligent Transportation Systems (ITSC ’11), pp. 1652–1658, IEEE, Washington, DC, USA, October 2011.

[21] K. Rezaee, B. Abdulhai, and H. Abdelgawad, “Application of reinforcement learning with continuous state space to ramp metering in real-world conditions,” in Proceedings of the 15th International IEEE Conference on Intelligent Transportation Systems (ITSC ’12), pp. 1590–1595, IEEE, Anchorage, Alaska, USA, September 2012.

[22] C. Jacob and B. Abdulhai, “Automated adaptive traffic corridor control using reinforcement learning: approach and case studies,” Transportation Research Record, vol. 1959, pp. 1–8, 2006.

[23] K. Rezaee, B. Abdulhai, and H. Abdelgawad, “Self-Learning adaptive ramp metering: analysis of design parameters on a test case in Toronto, Canada,” Transportation Research Record, vol. 2396, pp. 10–18, 2013.

[24] X.-J. Wang, X.-M. Xi, and G.-F. Gao, “Reinforcement learning ramp metering without complete information,” Journal of Control Science and Engineering, vol. 2012, Article ID 208456, 8 pages, 2012.

[25] A. Fares and W. Gomaa, “Multi-agent reinforcement learning control for ramp metering,” in Progress in Systems Engineering, vol. 330 of Advances in Intelligent Systems and Computing, pp. 167–173, Springer, Basel, Switzerland, 2015.

[26] K. Veljanovska, K. M. Bombol, and T. Maher, “Reinforcement learning technique in multiple motorway access control strategy design,” PROMET-Traffic & Transportation, vol. 22, no. 2, pp. 117–123, 2010.

[27] C. Lu, H. Chen, and S. Grant-Muller, “An indirect reinforcement learning approach for ramp control under incident-induced congestion,” in Proceedings of the 16th International IEEE Conference on Intelligent Transportation Systems (ITSC ’13), pp. 979–984, IEEE, The Hague, The Netherlands, October 2013.

[28] L. Busoniu, R. Babuska, and B. De Schutter, “A comprehensive survey of multiagent reinforcement learning,” IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 38, no. 2, pp. 156–172, 2008.

[29] C. Lu, H. Chen, and S. Grant-Muller, “Indirect Reinforcement Learning for Incident-responsive ramp control,” Procedia - Social and Behavioral Sciences, vol. 111, pp. 1112–1122, 2014.

[30] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, 1998.

[31] C. C. H. Watkins and P. Dayan, “Q-learning,” Machine Learning, vol. 8, no. 3-4, pp. 279–292, 1992.

[32] J. R. Kok and N. Vlassis, “Collaborative multiagent reinforcement learning by payoff propagation,” Journal of Machine Learning Research, vol. 7, pp. 1789–1828, 2006.

[33] C. Guestrin, M. G. Lagoudakis, and R. Parr, “Coordinated reinforcement learning,” in Proceedings of the 19th International Conference on Machine Learning, pp. 227–234, Sydney, Australia, July 2002.

[34] S. El-Tantawy, B. Abdulhai, and H. Abdelgawad, “Multiagent reinforcement learning for integrated network of adaptive traffic signal controllers (marlin-atsc): methodology and large-scale application on downtown toronto,” IEEE Transactions on Intelligent Transportation Systems, vol. 14, no. 3, pp. 1140–1150, 2013.

[35] L. P. Kaelbling, M. L. Littman, and A. W. Moore, “Reinforcement learning: a survey,” Journal of Artificial Intelligence Research, vol. 4, pp. 237–285, 1996.

[36] C. F. Daganzo, “The cell transmission model: a dynamic representation of highway traffic consistent with the hydrodynamic theory,” Transportation Research Part B: Methodological, vol. 28, no. 4, pp. 269–287, 1994.

[37] J. Haddad, M. Ramezani, and N. Geroliminis, “Cooperative traffic control of a mixed network with two urban regions and a freeway,” Transportation Research Part B: Methodological, vol. 54, pp. 17–36, 2013.

[38] H. Mongeot and J.-B. Lesort, “Analytical expressions of incident-induced flow dynamics perturbations using macroscopic theory and extension of Lighthill-Whitham theory,” Transportation Research Record, vol. 1710, pp. 58–68, 2000.

[39] Transport Simulation Systems, Aimsun User’s Manual 6.1, TTS, Barcelona, Spain, 2010.

[40] Highways England, “Hatris Homepage,” 2013, https://www.hatris.co.uk.

[41] E. Cascetta, “Estimation of trip matrices from traffic counts and survey data: a generalized least squares estimator,” Transportation Research B, vol. 18, no. 4-5, pp. 289–299, 1984.

[42] L. Zhang and D. Levinson, “Balancing efficiency and equity of ramp meters,” Journal of Transportation Engineering, vol. 131, no. 6, pp. 477–481, 2005.

[43] A. Kotsialos and M. Papageorgiou, “Efficiency and equity properties of freeway network-wide ramp metering with AMOC,” Transportation Research Part C: Emerging Technologies, vol. 12, no. 6, pp. 401–420, 2004.

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 10: Research Article Intelligent Ramp Control for Incident Response …downloads.hindawi.com/journals/mpe/2015/896943.pdf · 2019. 7. 31. · Intelligent Ramp Control for Incident Response

10 Mathematical Problems in Engineering

Table 1 ODmatrix estimated

Originsdestinations D D5 D4 D3 D2 D1 TotalsO 2089 375 686 728 1169 771 5818O3 875 65 212 193 117 46 1507O2

886 0 0 61 315 216 1477O1

824 0 0 0 292 226 1343Totals 4675 440 898 981 1893 1258 10146

Table 2 Parameters for ACTM

Parameter 119889maxmain V 119908 120579 120578 120582

112058221205823

Value 6300 vehh 107 kmh 116 kmh 05 016 055 09 06

is set as 1000 to get convergence Table 1 shows the ODmatrix estimated from real traffic data

63 Incident Scenarios Considering the difficulty of captur-ing real incident data we simulate some incident scenariosin AIMSUN To make each ramp meter work during theincident the incident is located near the most downstreammotorway section that is motorway section 1 Thereforethree incident scenarios A B andC are designed correspond-ing to three different incident locations in a b and c (asillustrated in Figure 5) respectively

The simulation experiment lasts for one and a half hoursfrom 070000 to 083000 during AM peak period After 30-minute normal operation (for warm-up) the incident is trig-gered at 073000 and lasts for 30 minutes In the preliminaryexperiments designed in this paper the incident with onelane blocked is considered Parameters introduced here canalso be regulated for multiple lane-blockage situations Theincident extent is 50 meters which is assumed to be constantduring the incident

Learning-related parameters are set as typical values [30]that is 120572 is 02 120574 is 08 and 120576 is 01 Other parameters relatedto ACTM are calibrated and summarised in Table 2 All thecells have the same 120579 and 120578

64 Results The comparison of Dyna-MARL Isolated RLand NC is conducted from three aspects density evolutionsome general indicators and the system equity The experi-mental results are described as follows

(1) Density Evolution We can see from Figure 7 that fourdense areas exist during the traffic operation Three of themnear on-ramp entrances (motorway length around 05 km5 km and 10 km) are caused by heavy traffic loads from on-ramps The dense area close to the segment end forms due tothe incident

In scenario A incident location is close to on-ramp 1(O1) Without control this incident leads to sever congestion

which blocks on-ramp 1 and propagates to motorway section2 (around 9 km in Figure 7(a)) Under this scenario IsolatedRL cannot alleviate incident-induced congestion effectively

(see Figure 7(b)) In the beginning of congestion formulationwithout coordination only the nearest ramp controller reactsto the congestion Because of the space limit of on-rampone ramp controller is insufficient to dissolve this congestionthat still propagates to motorway section 2 Dyna-MARL onthe other hand coordinates all three ramp controllers andmakes full use of the storage space of three on-ramps todeal with incident-induced congestion In this way mainlinecongestion can be restricted in a smaller area and will notpropagate to motorway section 2 (see Figure 7(c))

For scenarios B and C incidents are near the motorwayend and far from on-ramp 1 Without blocking on-ramp1 incidents do not lead to sever congestion Under suchcircumstances both Isolated RL and Dyna-MARL work wellon easing congestion in the mainline As shown in Figures7(e)ndash7(i) compared with the NC situation both Isolated RLand Dyna-MARL can restrict the congestion in a small rangenear the on-ramp entrances

(2) General Indicators In this comparison some general indi-cators including total travel time (should be reduced) totalthroughput (should be improved) and total CO

2emission

(should be reduced) are used to show how the proposedsystem can benefit road users These indicators are widelyused in the transport community to test the performance ofnewly developed traffic control systems

As shown in Figure 8(a) comparedwith theNC situationboth Isolated RL and Dyna-MARL can reduce the totaltravel time of road users in all three scenarios SpecificallyIsolated RL decreases total travel time by up to 62 whileDyna-MARL achieves a maximum reduction of 122 (seeFigure 8(d))The comparison of total throughput is presentedin Figure 8(b) Dyna-MARL can improve the total through-put by up to 23 (see Figure 8(d)) which outperforms Iso-lated RL in all three scenarios In scenario B Isolated RL evenfails to improve the total throughput For the comparison oftotal CO

2emission (shown in Figure 8(c)) both Isolated RL

and Dyna-MARL achieve their best performance in scenarioBwith a reduction of 47and46 respectively In scenariosA and C Dyna-MARL has a much better performance thanIsolated RL

Through the above comparison we can see that Dyna-MARL outperforms Isolated RL for almost all the scenariosand indicators

(3) System Equity Although the general indicators presentedin comparison (2) have shown their effectiveness on testingthe performance of different systems they cannot measurethe issue of system equity which is also an important aspectof the system performance In this paper we only considerthe spatial equity issue that is defined as a measurementof equity of user delays on different on-ramps [42] In thisstudy we assume the road users from all three on-ramps havethe same importance If all users from different on-rampscan experience the similar travel time the control system isdefined as an equitable system This term is used to measurethe system equity that is a large queue difference leads toa highly inequitable system In [43] the variance of traveltime on different on-ramps is used as an indicator to measure

Figure 7: Density profiles (motorway density in veh/km/lane, plotted against motorway length in km and absolute time in min) for (a) NC in scenario A, (b) Isolated RL in scenario A, (c) Dyna-MARL in scenario A, (d) NC in scenario B, (e) Isolated RL in scenario B, (f) Dyna-MARL in scenario B, (g) NC in scenario C, (h) Isolated RL in scenario C, and (i) Dyna-MARL in scenario C.

system equity. Similar to [43], for the sake of comparison, the standard deviation is considered in our case. This indicator is defined as

$$\mathrm{SD}(k) = \sqrt{\frac{\sum_{i=1}^{n}\left[\bar{t}^{k} - t_{i}^{k}\right]^{2}}{n}}, \tag{14}$$

where $\mathrm{SD}(k)$ is the standard deviation of the travel time of different on-ramps at time step $k$, $t_{i}^{k}$ is the estimated total travel time of on-ramp $i$ at step $k$, and $\bar{t}^{k}$ is the averaged total travel time of the $n$ on-ramps at step $k$.
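As a minimal sketch of how the indicator in (14) could be evaluated, the following Python function computes the standard deviation of the total travel times of the on-ramps at a single time step. The function name `equity_sd` and the sample travel times are illustrative assumptions and are not taken from the experiments in this paper.

```python
import math
from typing import Sequence


def equity_sd(travel_times: Sequence[float]) -> float:
    """Standard deviation of total travel time across on-ramps at one step k.

    Follows the form of (14): the squared deviations from the mean are
    averaged over the n on-ramps (division by n, not n - 1).
    """
    n = len(travel_times)
    if n == 0:
        raise ValueError("at least one on-ramp travel time is required")
    mean_time = sum(travel_times) / n  # averaged total travel time
    squared_dev = sum((mean_time - t) ** 2 for t in travel_times)
    return math.sqrt(squared_dev / n)


if __name__ == "__main__":
    # Hypothetical total travel times (h) for three on-ramps at one step.
    balanced = [4.0, 4.0, 4.0]    # identical travel times: fully equitable
    unbalanced = [1.0, 2.0, 9.0]  # one ramp heavily delayed: inequitable
    print(equity_sd(balanced))
    print(equity_sd(unbalanced))
```

A value of zero corresponds to identical travel times on all on-ramps, and larger values indicate a less equitable system. Because (14) divides by $n$, the result should agree with the population standard deviation (for example, `statistics.pstdev` in the Python standard library) rather than the sample version.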

The results of the system equity comparison are shown in Figure 9. For the NC situation, good equity can be maintained in scenarios B and C because entering vehicles are not restricted (as shown in Figures 9(b) and 9(c)). However, when one of the on-ramp entrances is blocked by the congestion in scenario A, a long queue forms and leads to imbalance and resultant inequity for users on different on-ramps (see Figure 9(a)). For the controlled cases, Isolated RL performs poorly in all scenarios. This is because the ramp controller near the congestion takes much more restrictive measures on the controlled traffic than the other controllers. Because of the coordination strategy, Dyna-MARL outperforms Isolated RL in maintaining system equity in all scenarios, especially during the incident (from 07:30:00 to 08:00:00).

Figure 8: Comparison of general measures for different scenarios (A, B, and C) under NC, Isolated RL, and Dyna-MARL: (a) total travel time (h), (b) total throughput (veh), (c) total CO2 emission (×10^4 kg), and (d) reduction from NC (%) for total throughput, total travel time, and total CO2 emission.

7. Conclusions and Future Work

A Dyna-Q based multiagent reinforcement learning method, referred to as Dyna-MARL, for motorway ramp control has been developed in this paper. Dyna-MARL is compared with Isolated RL (Q-learning without coordination) and the noncontrolled situation in a simulation environment. Real traffic data collected from a metered motorway segment in the UK are used to build the simulation.

Through a series of simulation-based experiments, we can conclude the following: (1) Isolated RL can improve the motorway performance in terms of increasing total throughput and reducing total travel time and CO2 emission, but this improvement comes at the expense of poor system equity across on-ramps; (2) with a suitable coordination strategy, much higher system equity can be achieved by Dyna-MARL; (3) in addition to system equity, Dyna-MARL outperforms Isolated RL in almost all scenarios regarding all indicators, which means Dyna-MARL can deal with network-wide problems effectively.

Although the simulation tests have shown some positive results regarding the performance of Dyna-MARL, only a simplified incident scenario with fixed duration is considered in the current work. In practice, incident duration is highly variable and is affected by a number of factors, such as weather conditions, road conditions, and the arrival time of the incident management team. Therefore, incident duration should be treated as an uncertainty, which will be investigated in our future work.

Figure 9: Standard deviation of travel time (h) against time of day (07:00:00 to 08:30:00) for NC, Isolated RL, and Dyna-MARL in different scenarios: (a) scenario A, (b) scenario B, and (c) scenario C.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This paper is supported by the China Scholarship Council and the University of Leeds (CSC-University of Leeds scholarship) and partially supported by the National Natural Science Foundation of China (Grant nos. 91420203 and 61271376). The authors would like to thank the institutions that support this study.

References

[1] M. Papageorgiou and A. Kotsialos, "Freeway ramp metering: an overview," IEEE Transactions on Intelligent Transportation Systems, vol. 3, no. 4, pp. 271-281, 2002.

[2] A. Skabardonis, P. Varaiya, and K. F. Petty, "Measuring recurrent and nonrecurrent traffic congestion," Transportation Research Record, vol. 1856, pp. 118-124, 2003.

[3] M. Papageorgiou, H. Hadj-Salem, and J.-M. Blosseville, "ALINEA: a local feedback control law for on-ramp metering," Journal of the Transportation Research Board, vol. 1320, pp. 58-64, 1991.

[4] E. Smaragdis and M. Papageorgiou, "Series of new local ramp metering strategies," Transportation Research Record, vol. 1856, pp. 74-86, 2003.

[5] L. N. Jacobson, K. C. Henry, and O. Mehyar, "Real-time metering algorithm for centralized control," Transportation Research Record, vol. 1232, pp. 17-26, 1989.

[6] G. Paesani, J. Kerr, P. Perovich, and F. Khosravi, "System wide adaptive ramp metering (SWARM)," in Proceedings of the 7th ITS America Annual Meeting and Exposition: Merging the Transportation and Communications Revolutions, Washington, DC, USA, June 1997.

[7] R. Lau, Ramp Metering by Zone - The Minnesota Algorithm, Minnesota Department of Transportation, 1997.

[8] H. M. Zhang and W. W. Recker, "On optimal freeway ramp control policies for congested traffic corridors," Transportation Research Part B: Methodological, vol. 33, no. 6, pp. 417-436, 1999.

[9] A. Kotsialos, M. Papageorgiou, and F. Middelham, "Optimal coordinated ramp metering with advanced motorway optimal control," Transportation Research Record, no. 1748, pp. 55-65, 2001.

[10] A. Hegyi, B. De Schutter, and H. Hellendoorn, "Model predictive control for optimal coordination of ramp metering and variable speed limits," Transportation Research C: Emerging Technologies, vol. 13, no. 3, pp. 185-209, 2005.

[11] G. Gomes and R. Horowitz, "Optimal freeway ramp metering using the asymmetric cell transmission model," Transportation Research Part C: Emerging Technologies, vol. 14, no. 4, pp. 244-262, 2006.

[12] A. H. F. Chow and Y. Li, "Robust optimization of dynamic motorway traffic via ramp metering," IEEE Transactions on Intelligent Transportation Systems, vol. 15, no. 3, pp. 1374-1380, 2014.

[13] R. W. Hall, "Non-recurrent congestion: how big is the problem? Are traveler information systems the solution?" Transportation Research Part C, vol. 1, no. 1, pp. 89-103, 1993.

[14] P. Prevedouros, B. Halkias, K. Papandreou, and P. Kopelias, "Freeway incidents in the United States, United Kingdom, and Attica Tollway, Greece: characteristics, available capacity, and models," Transportation Research Record, vol. 2047, pp. 57-65, 2008.

[15] T. L. Greenlee and H. J. Payne, "Freeway ramp metering strategies for responding to incidents," in Proceedings of the IEEE Conference on Decision and Control including the 16th Symposium on Adaptive Processes and a Special Symposium on Fuzzy Set Theory and Applications, pp. 987-992, New Orleans, LA, USA, December 1977.

[16] M. H. Wang, Optimal ramp metering policies for nonrecurring congestion with uncertain incident duration [Ph.D. thesis], Purdue University, West Lafayette, Ind, USA, 1994.

[17] J.-B. Sheu, "Stochastic modeling of the dynamics of incident-induced lane traffic states for incident-responsive local ramp control," Physica A: Statistical Mechanics and its Applications, vol. 386, no. 1, pp. 365-380, 2007.

[18] J.-B. Sheu and M.-S. Chang, "Stochastic optimal-control approach to automatic incident-responsive coordinated ramp control," IEEE Transactions on Intelligent Transportation Systems, vol. 8, no. 2, pp. 359-367, 2007.

[19] C. Jacob and B. Abdulhai, "Machine learning for multi-jurisdictional optimal traffic corridor control," Transportation Research Part A: Policy and Practice, vol. 44, no. 2, pp. 53-64, 2010.

[20] M. Davarynejad, A. Hegyi, J. Vrancken, and J. van den Berg, "Motorway ramp-metering control with queuing consideration using Q-learning," in Proceedings of the 14th International IEEE Conference on Intelligent Transportation Systems (ITSC '11), pp. 1652-1658, IEEE, Washington, DC, USA, October 2011.

[21] K. Rezaee, B. Abdulhai, and H. Abdelgawad, "Application of reinforcement learning with continuous state space to ramp metering in real-world conditions," in Proceedings of the 15th International IEEE Conference on Intelligent Transportation Systems (ITSC '12), pp. 1590-1595, IEEE, Anchorage, Alaska, USA, September 2012.

[22] C. Jacob and B. Abdulhai, "Automated adaptive traffic corridor control using reinforcement learning: approach and case studies," Transportation Research Record, vol. 1959, pp. 1-8, 2006.

[23] K. Rezaee, B. Abdulhai, and H. Abdelgawad, "Self-learning adaptive ramp metering: analysis of design parameters on a test case in Toronto, Canada," Transportation Research Record, vol. 2396, pp. 10-18, 2013.

[24] X.-J. Wang, X.-M. Xi, and G.-F. Gao, "Reinforcement learning ramp metering without complete information," Journal of Control Science and Engineering, vol. 2012, Article ID 208456, 8 pages, 2012.

[25] A. Fares and W. Gomaa, "Multi-agent reinforcement learning control for ramp metering," in Progress in Systems Engineering, vol. 330 of Advances in Intelligent Systems and Computing, pp. 167-173, Springer, Basel, Switzerland, 2015.

[26] K. Veljanovska, K. M. Bombol, and T. Maher, "Reinforcement learning technique in multiple motorway access control strategy design," PROMET-Traffic & Transportation, vol. 22, no. 2, pp. 117-123, 2010.

[27] C. Lu, H. Chen, and S. Grant-Muller, "An indirect reinforcement learning approach for ramp control under incident-induced congestion," in Proceedings of the 16th International IEEE Conference on Intelligent Transportation Systems (ITSC '13), pp. 979-984, IEEE, The Hague, The Netherlands, October 2013.

[28] L. Busoniu, R. Babuska, and B. De Schutter, "A comprehensive survey of multiagent reinforcement learning," IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 38, no. 2, pp. 156-172, 2008.

[29] C. Lu, H. Chen, and S. Grant-Muller, "Indirect reinforcement learning for incident-responsive ramp control," Procedia - Social and Behavioral Sciences, vol. 111, pp. 1112-1122, 2014.

[30] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, 1998.

[31] C. C. H. Watkins and P. Dayan, "Q-learning," Machine Learning, vol. 8, no. 3-4, pp. 279-292, 1992.

[32] J. R. Kok and N. Vlassis, "Collaborative multiagent reinforcement learning by payoff propagation," Journal of Machine Learning Research, vol. 7, pp. 1789-1828, 2006.

[33] C. Guestrin, M. G. Lagoudakis, and R. Parr, "Coordinated reinforcement learning," in Proceedings of the 19th International Conference on Machine Learning, pp. 227-234, Sydney, Australia, July 2002.

[34] S. El-Tantawy, B. Abdulhai, and H. Abdelgawad, "Multiagent reinforcement learning for integrated network of adaptive traffic signal controllers (MARLIN-ATSC): methodology and large-scale application on downtown Toronto," IEEE Transactions on Intelligent Transportation Systems, vol. 14, no. 3, pp. 1140-1150, 2013.

[35] L. P. Kaelbling, M. L. Littman, and A. W. Moore, "Reinforcement learning: a survey," Journal of Artificial Intelligence Research, vol. 4, pp. 237-285, 1996.

[36] C. F. Daganzo, "The cell transmission model: a dynamic representation of highway traffic consistent with the hydrodynamic theory," Transportation Research Part B: Methodological, vol. 28, no. 4, pp. 269-287, 1994.

[37] J. Haddad, M. Ramezani, and N. Geroliminis, "Cooperative traffic control of a mixed network with two urban regions and a freeway," Transportation Research Part B: Methodological, vol. 54, pp. 17-36, 2013.

[38] H. Mongeot and J.-B. Lesort, "Analytical expressions of incident-induced flow dynamics perturbations: using macroscopic theory and extension of Lighthill-Whitham theory," Transportation Research Record, vol. 1710, pp. 58-68, 2000.

[39] Transport Simulation Systems, Aimsun User's Manual 6.1, TTS, Barcelona, Spain, 2010.

[40] Highways England, "Hatris Homepage," 2013, https://www.hatris.co.uk.

[41] E. Cascetta, "Estimation of trip matrices from traffic counts and survey data: a generalized least squares estimator," Transportation Research B, vol. 18, no. 4-5, pp. 289-299, 1984.

[42] L. Zhang and D. Levinson, "Balancing efficiency and equity of ramp meters," Journal of Transportation Engineering, vol. 131, no. 6, pp. 477-481, 2005.

[43] A. Kotsialos and M. Papageorgiou, "Efficiency and equity properties of freeway network-wide ramp metering with AMOC," Transportation Research Part C: Emerging Technologies, vol. 12, no. 6, pp. 401-420, 2004.


Page 11: Research Article Intelligent Ramp Control for Incident Response …downloads.hindawi.com/journals/mpe/2015/896943.pdf · 2019. 7. 31. · Intelligent Ramp Control for Incident Response

Mathematical Problems in Engineering 11

020

4060

80100

03

69

1215

Motorway length (km) Absolute tim

e (min)

0

50

100

150

Mot

orw

ay d

ensit

y (v

ehk

mla

ne)

20 40 60 80 100

Density (vehkmlane)

(a)

0

50

100

150

020

4060

80100

03

69

1215

Motorway length (km) Absolute tim

e (min)

Mot

orw

ay d

ensit

y (v

ehk

mla

ne)

20 40 60 80 100

Density (vehkmlane)

(b)

0

50

100

150

020

4060

80100

03

69

1215

Motorway length (km) Absolute tim

e (min)

Mot

orw

ay d

ensit

y (v

ehk

mla

ne)

20 40 60 80 100

Density (vehkmlane)

(c)

0

50

100

150

020

4060

80100

03

69

1215

Motorway length (km) Absolute tim

e (min)

Mot

orw

ay d

ensit

y (v

ehk

mla

ne)

20 40 60 80 100

Density (vehkmlane)

(d)

0

50

100

150

020

4060

80100

03

69

1215

Motorway length (km) Absolute tim

e (min)

Mot

orw

ay d

ensit

y (v

ehk

mla

ne)

20 40 60 80 100

Density (vehkmlane)

(e)

0

50

100

150

020

4060

80100

03

69

1215

Motorway length (km) Absolute tim

e (min)

Mot

orw

ay d

ensit

y (v

ehk

mla

ne)

20 40 60 80 100

Density (vehkmlane)

(f)

Figure 7 Continued

12 Mathematical Problems in Engineering

0

50

100

150

020

4060

80100

03

69

1215

Motorway length (km) Absolute tim

e (min)

Mot

orw

ay d

ensit

y (v

ehk

mla

ne)

20 40 60 80 100

Density (vehkmlane)

(g)

0

50

100

150

020

4060

80100

03

69

1215

Motorway length (km) Absolute tim

e (min)

Mot

orw

ay d

ensit

y (v

ehk

mla

ne)

20 40 60 80 100

Density (vehkmlane)

(h)

0

50

100

150

020

4060

80100

03

69

1215

Motorway length (km) Absolute tim

e (min)

Mot

orw

ay d

ensit

y (v

ehk

mla

ne)

20 40 60 80 100

Density (vehkmlane)

(i)

Figure 7 Density profiles for (a) NC in scenario A (b) Isolated RL in scenario A (c) Dyna-MARL in scenario A (d) NC in scenario B (e)Isolated RL in scenario B (f) Dyna-MARL in scenario B (g) NC in scenario C (h) Isolated RL in scenario C and (i) Dyna-MARL in scenarioC

system equity Similar to [43] for the sake of comparison thestandard deviation is considered in our caseThis indicator isdefined as

SD (119896) = radicsum119899

119894=1[119905119896

minus 119905119896

119894]2

119899

(14)

where SD(119896) is the standard deviation of travel time ofdifferent on-ramps at time step 119896 119905119896

119894is the estimated total

travel time of on-ramp 119894 at step 119896 119905119896 is the averaged total traveltime of 119899 on-ramps at step 119896

Results about the comparison of system equity can beseen from Figure 9 For the NC situation good equity can

be maintained due to no restrictions of entering vehiclesin scenarios B and C (as shown in Figures 9(b) and 9(c))However when one of the on-ramp entrances is blocked bythe congestion in scenario A a long queue forms and leadsto imbalance and resultant inequity for users on differenton-ramps (see Figure 9(a)) For controlled cases IsolatedRL performs poorly in all scenarios This is because theramp controller near congestion takes much more restrictedmeasures than other controllers on the controlled trafficBecause of the coordination strategy Dyna-MARL out-performs Isolated RL on maintaining system equity in allscenarios especially during the incident (from 073000 to080000)

Mathematical Problems in Engineering 13

900

1200

1500

1800To

tal t

rave

l tim

e (h)

B CA

NCIsolated RLDyna-MARL

Scenario

(a)

7000

7500

8000

8500

9000

9500

10000

Tota

l thr

ough

put (

veh)

B CA

NCIsolated RLDyna-MARL

Scenario

(b)

NCIsolated RLDyna-MARL

14

15

16

17

18

19

2

Tota

l CO

2em

issio

n (k

g)

B CAScenario

times104

(c)

Totalthroughput

Totaltravel time emission

minus3

0

4

8

12

16Re

duct

ion

from

NC

()

B C A B C A B CA

Total CO2

NCIsolated RLDyna-MARL

Scenario

(d)

Figure 8 Comparison of general measures for different scenarios

7 Conclusions and Future Work

A Dyna-119876 based multiagent reinforcement learning methodreferred to as Dyna-MARL for motorway ramp control hasbeen developed in this paper Dyna-MARL is comparedwith Isolated RL (119876-learning without coordination) andnoncontrolled situation under the simulation environmentReal traffic data collected from a metered motorway segmentin the UK are used to form the simulation

Through a series of simulation-based experiments wecan conclude the following (1) Isolated RL can improvethe motorway performance in terms of increasing totalthroughput reducing total travel time and CO

2emission but

this improvement is at the expense of poor system equity ondifferent on-ramps (2) with a suitable coordination strategy

much higher system equity can be achieved by Dyna-MARL (3) in addition to the system equity Dyna-MARLoutperforms Isolated RL in almost all scenarios regardingall indicators which means Dyna-MARL can deal with thenetwork-wide problems effectively

Although the simulation tests have shown some positiveresults regarding the performance of Dyna-MARL a simpli-fied incident scenario with fixed duration is considered inthe current work In the practical situation incident durationis highly unstable and affected by a number of factorssuch as weather conditions road conditions and arrivingtime of the incident management team Therefore incidentduration should be considered as an uncertainty which willbe investigated in our future work

14 Mathematical Problems in Engineering

NCIsolated RLDyna-MARL

0

2

4

6

8

10

Stan

dard

dev

iatio

n (h

)

073000 080000 083000070000Time of day

(a)

0

2

4

6

8

10

Stan

dard

dev

iatio

n (h

)073000 080000 083000070000

NCIsolated RLDyna-MARL

Time of day

(b)

0

2

4

6

8

10

Stan

dard

dev

iatio

n (h

)

073000 080000 083000070000

NCIsolated RLDyna-MARL

Time of day

(c)

Figure 9 Standard deviation for different scenarios (a) scenario A (b) scenario B and (c) scenario C

Mathematical Problems in Engineering 15

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This paper is supported by China Scholarship Council andUniversity of Leeds (CSC-University of Leeds scholarship)and partially supported by the National Natural ScienceFoundation of China (Grant nos 91420203 and 61271376)The authors would like to thank the institutions that supportthis study

References

[1] M Papageorgiou and A Kotsialos ldquoFreeway ramp meteringan overviewrdquo IEEE Transactions on Intelligent TransportationSystems vol 3 no 4 pp 271ndash281 2002

[2] A Skabardonis P Varaiya andK F Petty ldquoMeasuring recurrentand nonrecurrent traffic congestionrdquo Transportation ResearchRecord vol 1856 pp 118ndash124 2003

[3] M Papageorgiou H Hadj-Salem and J-M BlossevilleldquoALINEA a local feedback control law for on-ramp meteringrdquoJournal of the Transportation Research Board vol 1320 pp58ndash64 1991

[4] E Smaragdis and M Papageorgiou ldquoSeries of new local rampmetering strategiesrdquo Transportation Research Record vol 1856pp 74ndash86 2003

[5] L N Jacobson K C Henry and O Mehyar ldquoReal-time meter-ing algorithm for centralized controlrdquo Transportation ResearchRecord vol 1232 pp 17ndash26 1989

[6] G Paesani J Kerr P Perovich and F Khosravi ldquoSystemwide adaptive ramp metering (SWARM)rdquo in Proceedings of the7th ITS America Annual Meeting and Exposition Merging theTransportation and Communications Revolutions WashingtonDC USA June 1997

[7] R Lau Ramp Metering by ZonemdashThe Minnesota AlgorithmMinnesota Department of Transportation 1997

[8] H M Zhang and W W Recker ldquoOn optimal freeway rampcontrol policies for congested traffic corridorsrdquo TransportationResearch Part BMethodological vol 33 no 6 pp 417ndash436 1999

[9] A Kotsialos M Papageorgiou and F Middelham ldquoOptimalcoordinated ramp metering with advanced motorway optimalcontrolrdquo Transportation Research Record no 1748 pp 55ndash652001

[10] A Hegyi B De Schutter and H Hellendoorn ldquoModel predic-tive control for optimal coordination of ramp metering andvariable speed limitsrdquo Transportation Research C EmergingTechnologies vol 13 no 3 pp 185ndash209 2005

[11] G Gomes and R Horowitz ldquoOptimal freeway ramp meteringusing the asymmetric cell transmission modelrdquo TransportationResearch Part C Emerging Technologies vol 14 no 4 pp 244ndash262 2006

[12] A H F Chow and Y Li ldquoRobust optimization of dynamicmotorway traffic via ramp meteringrdquo IEEE Transactions onIntelligent Transportation Systems vol 15 no 3 pp 1374ndash13802014

[13] RW Hall ldquoNon-recurrent congestion how big is the problemAre traveler information systems the solutionrdquo TransportationResearch Part C vol 1 no 1 pp 89ndash103 1993

[14] P Prevedouros B Halkias K Papandreou and P KopeliasldquoFreeway incidents in the United States United Kingdom andAttica Tollway Greece characteristics available capacity andmodelsrdquo Transportation Research Record vol 2047 pp 57ndash652008

[15] T L Greenlee and H J Payne ldquoFreeway ramp meteringstrategies for responding to incidentsrdquo in Proceedings of theIEEE Conference on Decision and Control including the 16thSymposium on Adaptive Processes and a Special Symposium onFuzzy Set Theory and Applications pp 987ndash992 New OrleansLA USA December 1977

[16] M H Wang Optimal ramp metering policies for nonrecurringcongestion with uncertain incident duration [PhD thesis] Pur-due University West Lafayette Ind USA 1994

[17] J-B Sheu ldquoStochastic modeling of the dynamics of incident-induced lane traffic states for incident-responsive local rampcontrolrdquo Physica A Statistical Mechanics and its Applicationsvol 386 no 1 pp 365ndash380 2007

[18] J-B Sheu and M-S Chang ldquoStochastic optimal-controlapproach to automatic incident-responsive coordinated rampcontrolrdquo IEEE Transactions on Intelligent Transportation Sys-tems vol 8 no 2 pp 359ndash367 2007

[19] C Jacob and B Abdulhai ldquoMachine learning for multi-jurisdictional optimal traffic corridor controlrdquo TransportationResearch Part A Policy and Practice vol 44 no 2 pp 53ndash642010

[20] M Davarynejad A Hegyi J Vrancken and J van den BergldquoMotorway ramp-metering control with queuing considerationusing Q-learningrdquo in Proceedings of the 14th International IEEEConference on Intelligent Transportation Systems (ITSC rsquo11) pp1652ndash1658 IEEE Washington DC USA October 2011

[21] K Rezaee B Abdulhai and H Abdelgawad ldquoApplication ofreinforcement learning with continuous state space to rampmetering in real-world conditionsrdquo in Proceedings of the 15thInternational IEEE Conference on Intelligent TransportationSystems (ITSC rsquo12) pp 1590ndash1595 IEEE Anchorage AlaskaUSA September 2012

[22] C Jacob and B Abdulhai ldquoAutomated adaptive traffic corridorcontrol using reinforcement learning approach and case stud-iesrdquo Transportation Research Record vol 1959 pp 1ndash8 2006

[23] K Rezaee B Abdulhai and H Abdelgawad ldquoSelf-Learningadaptive rampmetering analysis of design parameters on a testcase in Toronto Canadardquo Transportation Research Record vol2396 pp 10ndash18 2013

[24] X-J Wang X-M Xi and G-F Gao ldquoReinforcement learn-ing ramp metering without complete informationrdquo Journal ofControl Science and Engineering vol 2012 Article ID 208456 8pages 2012

[25] A Fares and W Gomaa ldquoMulti-agent reinforcement learningcontrol for ramp meteringrdquo in Progress in Systems Engineeringvol 330 of Advances in Intelligent Systems and Computing pp167ndash173 Springer Basel Switzerland 2015

[26] K Veljanovska K M Bombol and T Maher ldquoReinforcementlearning technique in multiple motorway access control strat-egy designrdquo PROMET-Traffic amp Transportation vol 22 no 2pp 117ndash123 2010

[27] C LuHChen and SGrant-Muller ldquoAn indirect reinforcementlearning approach for ramp control under incident-inducedcongestionrdquo in Proceedings of the 16th International IEEEConference on Intelligent Transportation Systems (ITSC rsquo13) pp979ndash984 IEEE The Hague The Netherlands October 2013

16 Mathematical Problems in Engineering

[28] L Busoniu R Babuska and B De Schutter ldquoA comprehensivesurvey of multiagent reinforcement learningrdquo IEEE Transac-tions on Systems Man and Cybernetics Part C Applications andReviews vol 38 no 2 pp 156ndash172 2008

[29] C Lu H Chen and S Grant-Muller ldquoIndirect ReinforcementLearning for Incident-responsive ramp controlrdquo ProcediamdashSocial and Behavioral Sciences vol 111 pp 1112ndash1122 2014

[30] R S Sutton and A G Barto Reinforcement Learning AnIntroduction MIT Press 1998

[31] C C HWatkins and P Dayan ldquoQ-learningrdquoMachine Learningvol 8 no 3-4 pp 279ndash292 1992

[32] J R Kok and N Vlassis ldquoCollaborative multiagent reinforce-ment learning by payoff propagationrdquo Journal of MachineLearning Research vol 7 pp 1789ndash1828 2006

[33] C Guestrin M G Lagoudakis and R Parr ldquoCoordinatedreinforcement learningrdquo in Proceedings of the 19th InternationalConference on Machine Learning pp 227ndash234 Sydney Aus-tralia July 2002

[34] S El-Tantawy B Abdulhai and H Abdelgawad ldquoMultiagentreinforcement learning for integrated network of adaptivetraffic signal controllers (marlin-atsc) methodology and large-scale application on downtown torontordquo IEEE Transactions onIntelligent Transportation Systems vol 14 no 3 pp 1140ndash11502013

[35] L P KaelblingM L Littman andAWMoore ldquoReinforcementlearning a surveyrdquo Journal of Artificial Intelligence Research vol4 pp 237ndash285 1996

[36] C F Daganzo ldquoThe cell transmission model a dynamic repre-sentation of highway traffic consistent with the hydrodynamictheoryrdquo Transportation Research Part B Methodological vol 28no 4 pp 269ndash287 1994

[37] J Haddad M Ramezani and N Geroliminis ldquoCooperativetraffic control of a mixed network with two urban regions anda freewayrdquo Transportation Research Part B Methodological vol54 pp 17ndash36 2013

[38] H Mongeot and J-B Lesort ldquoAnalytical expressions ofincident-induced flow dynamics perturbations using macro-scopic theory and extension of Lighthill-Whitham theoryrdquoTransportation Research Record vol 1710 pp 58ndash68 2000

[39] Transport Simulation Systems Aimsun Userrsquos Manual 61 TTSBarcelona Spain 2010

[40] Highways England ldquoHatris Homepagerdquo 2013 httpswwwhatriscouk

[41] E Cascetta ldquoEstimation of trip matrices from traffic counts andsurvey data a generalized least squares estimatorrdquo Transporta-tion Research B vol 18 no 4-5 pp 289ndash299 1984

[42] L Zhang and D Levinson ldquoBalancing efficiency and equity oframpmetersrdquo Journal of Transportation Engineering vol 131 no6 pp 477ndash481 2005

[43] A Kotsialos andM Papageorgiou ldquoEfficiency and equity prop-erties of freeway network-wide ramp metering with AMOCrdquoTransportation Research Part C Emerging Technologies vol 12no 6 pp 401ndash420 2004



[Figure 7, panels (g) to (i): motorway density (veh/km/lane) plotted over motorway length (km) and absolute time (min), with a density colour scale from 20 to 100 veh/km/lane.]

Figure 7: Density profiles for (a) NC in scenario A, (b) Isolated RL in scenario A, (c) Dyna-MARL in scenario A, (d) NC in scenario B, (e) Isolated RL in scenario B, (f) Dyna-MARL in scenario B, (g) NC in scenario C, (h) Isolated RL in scenario C, and (i) Dyna-MARL in scenario C.

system equity. Similar to [43], for the sake of comparison, the standard deviation is considered in our case. This indicator is defined as

$$\mathrm{SD}(k) = \sqrt{\frac{\sum_{i=1}^{n} \left[\bar{t}^{k} - t_{i}^{k}\right]^{2}}{n}}, \quad (14)$$

where $\mathrm{SD}(k)$ is the standard deviation of travel time of different on-ramps at time step $k$, $t_{i}^{k}$ is the estimated total travel time of on-ramp $i$ at step $k$, and $\bar{t}^{k}$ is the averaged total travel time of $n$ on-ramps at step $k$.

The comparison of system equity is shown in Figure 9. In the NC situation, good equity is maintained in scenarios B and C because entering vehicles are not restricted (see Figures 9(b) and 9(c)). However, when one of the on-ramp entrances is blocked by the congestion in scenario A, a long queue forms, which leads to imbalance and the resultant inequity for users on different on-ramps (see Figure 9(a)). For the controlled cases, Isolated RL performs poorly in all scenarios. This is because the ramp controller near the congestion takes much more restrictive measures on the traffic it controls than the other controllers do. Because of its coordination strategy, Dyna-MARL outperforms Isolated RL in maintaining system equity in all scenarios, especially during the incident (from 07:30:00 to 08:00:00).

[Figure 8, panels (a) to (d), comparing NC, Isolated RL, and Dyna-MARL in scenarios A, B, and C: (a) total travel time (h); (b) total throughput (veh); (c) total CO2 emission (×10^4 kg); (d) reduction from NC (%) in total throughput, total travel time, and total CO2 emission.]

Figure 8: Comparison of general measures for different scenarios.

7. Conclusions and Future Work

A Dyna-Q based multiagent reinforcement learning method for motorway ramp control, referred to as Dyna-MARL, has been developed in this paper. Dyna-MARL is compared with Isolated RL (Q-learning without coordination) and the noncontrolled situation in a simulation environment. Real traffic data collected from a metered motorway segment in the UK are used to build the simulation.
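For readers unfamiliar with the Dyna-Q idea that Dyna-MARL builds on, the following is a minimal tabular sketch in the textbook form described in [30]. It is not the authors' implementation: the state and action encodings, the deterministic learned model, the parameter values (`alpha`, `gamma`, `n_planning`), and all function names are assumptions made for illustration, and the coordination between ramp agents is not shown.

```python
import random
from collections import defaultdict


def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Apply one Q-learning update to the table Q in place."""
    best_next = max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])


def dyna_q_step(Q, model, s, a, r, s_next, actions,
                n_planning=5, rng=random, alpha=0.1, gamma=0.9):
    """Process one real transition (s, a, r, s_next) in the Dyna-Q manner."""
    # 1. Model-free part: learn directly from the real experience.
    q_update(Q, s, a, r, s_next, actions, alpha, gamma)
    # 2. Model learning: remember the last observed outcome of (s, a).
    #    A deterministic model is assumed here for simplicity.
    model[(s, a)] = (r, s_next)
    # 3. Model-based part: replay simulated experience drawn from the model.
    seen = list(model.keys())
    for _ in range(n_planning):
        ps, pa = rng.choice(seen)
        pr, ps_next = model[(ps, pa)]
        q_update(Q, ps, pa, pr, ps_next, actions, alpha, gamma)


if __name__ == "__main__":
    Q = defaultdict(float)   # unseen state-action pairs default to 0.0
    model = {}
    actions = [0, 1]         # hypothetical action labels
    dyna_q_step(Q, model, "s0", 1, 1.0, "s1", actions,
                n_planning=2, rng=random.Random(0))
    print(Q[("s0", 1)])
```

Setting `n_planning=0` reduces the step to plain Q-learning, which is the learning rule the Isolated RL baseline is described as using. The planning loop is what lets each real observation be reused several times.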

Through a series of simulation-based experiments, we can conclude the following: (1) Isolated RL can improve the motorway performance in terms of increasing total throughput and reducing total travel time and CO2 emission, but this improvement comes at the expense of poor system equity across different on-ramps; (2) with a suitable coordination strategy, much higher system equity can be achieved by Dyna-MARL; (3) in addition to the system equity, Dyna-MARL outperforms Isolated RL in almost all scenarios regarding all indicators, which means Dyna-MARL can deal with network-wide problems effectively.

Although the simulation tests have shown some positive results regarding the performance of Dyna-MARL, only a simplified incident scenario with fixed duration is considered in the current work. In practice, incident duration is highly variable and affected by a number of factors, such as weather conditions, road conditions, and the arrival time of the incident management team. Therefore, incident duration should be treated as an uncertainty, which will be investigated in our future work.

[Figure 9, panels (a) to (c): standard deviation (h) of on-ramp travel time against time of day (07:00:00 to 08:30:00) for NC, Isolated RL, and Dyna-MARL.]

Figure 9: Standard deviation for different scenarios: (a) scenario A, (b) scenario B, and (c) scenario C.


Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This paper is supported by China Scholarship Council and University of Leeds (CSC-University of Leeds scholarship) and partially supported by the National Natural Science Foundation of China (Grant nos. 91420203 and 61271376). The authors would like to thank the institutions that support this study.

References

[1] M. Papageorgiou and A. Kotsialos, “Freeway ramp metering: an overview,” IEEE Transactions on Intelligent Transportation Systems, vol. 3, no. 4, pp. 271–281, 2002.

[2] A. Skabardonis, P. Varaiya, and K. F. Petty, “Measuring recurrent and nonrecurrent traffic congestion,” Transportation Research Record, vol. 1856, pp. 118–124, 2003.

[3] M. Papageorgiou, H. Hadj-Salem, and J.-M. Blosseville, “ALINEA: a local feedback control law for on-ramp metering,” Journal of the Transportation Research Board, vol. 1320, pp. 58–64, 1991.

[4] E. Smaragdis and M. Papageorgiou, “Series of new local ramp metering strategies,” Transportation Research Record, vol. 1856, pp. 74–86, 2003.

[5] L. N. Jacobson, K. C. Henry, and O. Mehyar, “Real-time metering algorithm for centralized control,” Transportation Research Record, vol. 1232, pp. 17–26, 1989.

[6] G. Paesani, J. Kerr, P. Perovich, and F. Khosravi, “System wide adaptive ramp metering (SWARM),” in Proceedings of the 7th ITS America Annual Meeting and Exposition: Merging the Transportation and Communications Revolutions, Washington, DC, USA, June 1997.

[7] R. Lau, Ramp Metering by Zone - The Minnesota Algorithm, Minnesota Department of Transportation, 1997.

[8] H. M. Zhang and W. W. Recker, “On optimal freeway ramp control policies for congested traffic corridors,” Transportation Research Part B: Methodological, vol. 33, no. 6, pp. 417–436, 1999.

[9] A. Kotsialos, M. Papageorgiou, and F. Middelham, “Optimal coordinated ramp metering with advanced motorway optimal control,” Transportation Research Record, no. 1748, pp. 55–65, 2001.

[10] A. Hegyi, B. De Schutter, and H. Hellendoorn, “Model predictive control for optimal coordination of ramp metering and variable speed limits,” Transportation Research C: Emerging Technologies, vol. 13, no. 3, pp. 185–209, 2005.

[11] G. Gomes and R. Horowitz, “Optimal freeway ramp metering using the asymmetric cell transmission model,” Transportation Research Part C: Emerging Technologies, vol. 14, no. 4, pp. 244–262, 2006.

[12] A. H. F. Chow and Y. Li, “Robust optimization of dynamic motorway traffic via ramp metering,” IEEE Transactions on Intelligent Transportation Systems, vol. 15, no. 3, pp. 1374–1380, 2014.

[13] R. W. Hall, “Non-recurrent congestion: how big is the problem? Are traveler information systems the solution?” Transportation Research Part C, vol. 1, no. 1, pp. 89–103, 1993.

[14] P. Prevedouros, B. Halkias, K. Papandreou, and P. Kopelias, “Freeway incidents in the United States, United Kingdom, and Attica Tollway, Greece: characteristics, available capacity, and models,” Transportation Research Record, vol. 2047, pp. 57–65, 2008.

[15] T. L. Greenlee and H. J. Payne, “Freeway ramp metering strategies for responding to incidents,” in Proceedings of the IEEE Conference on Decision and Control including the 16th Symposium on Adaptive Processes and a Special Symposium on Fuzzy Set Theory and Applications, pp. 987–992, New Orleans, LA, USA, December 1977.

[16] M. H. Wang, Optimal ramp metering policies for nonrecurring congestion with uncertain incident duration [Ph.D. thesis], Purdue University, West Lafayette, Ind, USA, 1994.

[17] J.-B. Sheu, “Stochastic modeling of the dynamics of incident-induced lane traffic states for incident-responsive local ramp control,” Physica A: Statistical Mechanics and its Applications, vol. 386, no. 1, pp. 365–380, 2007.

[18] J.-B. Sheu and M.-S. Chang, “Stochastic optimal-control approach to automatic incident-responsive coordinated ramp control,” IEEE Transactions on Intelligent Transportation Systems, vol. 8, no. 2, pp. 359–367, 2007.

[19] C. Jacob and B. Abdulhai, “Machine learning for multi-jurisdictional optimal traffic corridor control,” Transportation Research Part A: Policy and Practice, vol. 44, no. 2, pp. 53–64, 2010.

[20] M. Davarynejad, A. Hegyi, J. Vrancken, and J. van den Berg, “Motorway ramp-metering control with queuing consideration using Q-learning,” in Proceedings of the 14th International IEEE Conference on Intelligent Transportation Systems (ITSC ’11), pp. 1652–1658, IEEE, Washington, DC, USA, October 2011.

[21] K. Rezaee, B. Abdulhai, and H. Abdelgawad, “Application of reinforcement learning with continuous state space to ramp metering in real-world conditions,” in Proceedings of the 15th International IEEE Conference on Intelligent Transportation Systems (ITSC ’12), pp. 1590–1595, IEEE, Anchorage, Alaska, USA, September 2012.

[22] C. Jacob and B. Abdulhai, “Automated adaptive traffic corridor control using reinforcement learning: approach and case studies,” Transportation Research Record, vol. 1959, pp. 1–8, 2006.

[23] K. Rezaee, B. Abdulhai, and H. Abdelgawad, “Self-Learning adaptive ramp metering: analysis of design parameters on a test case in Toronto, Canada,” Transportation Research Record, vol. 2396, pp. 10–18, 2013.

[24] X.-J. Wang, X.-M. Xi, and G.-F. Gao, “Reinforcement learning ramp metering without complete information,” Journal of Control Science and Engineering, vol. 2012, Article ID 208456, 8 pages, 2012.

[25] A. Fares and W. Gomaa, “Multi-agent reinforcement learning control for ramp metering,” in Progress in Systems Engineering, vol. 330 of Advances in Intelligent Systems and Computing, pp. 167–173, Springer, Basel, Switzerland, 2015.

[26] K. Veljanovska, K. M. Bombol, and T. Maher, “Reinforcement learning technique in multiple motorway access control strategy design,” PROMET-Traffic & Transportation, vol. 22, no. 2, pp. 117–123, 2010.

[27] C. Lu, H. Chen, and S. Grant-Muller, “An indirect reinforcement learning approach for ramp control under incident-induced congestion,” in Proceedings of the 16th International IEEE Conference on Intelligent Transportation Systems (ITSC ’13), pp. 979–984, IEEE, The Hague, The Netherlands, October 2013.

[28] L. Busoniu, R. Babuska, and B. De Schutter, “A comprehensive survey of multiagent reinforcement learning,” IEEE Transactions on Systems, Man and Cybernetics Part C: Applications and Reviews, vol. 38, no. 2, pp. 156–172, 2008.

[29] C. Lu, H. Chen, and S. Grant-Muller, “Indirect Reinforcement Learning for Incident-responsive ramp control,” Procedia - Social and Behavioral Sciences, vol. 111, pp. 1112–1122, 2014.

[30] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, 1998.

[31] C. C. H. Watkins and P. Dayan, “Q-learning,” Machine Learning, vol. 8, no. 3-4, pp. 279–292, 1992.

[32] J. R. Kok and N. Vlassis, “Collaborative multiagent reinforcement learning by payoff propagation,” Journal of Machine Learning Research, vol. 7, pp. 1789–1828, 2006.

[33] C. Guestrin, M. G. Lagoudakis, and R. Parr, “Coordinated reinforcement learning,” in Proceedings of the 19th International Conference on Machine Learning, pp. 227–234, Sydney, Australia, July 2002.

[34] S. El-Tantawy, B. Abdulhai, and H. Abdelgawad, “Multiagent reinforcement learning for integrated network of adaptive traffic signal controllers (marlin-atsc): methodology and large-scale application on downtown toronto,” IEEE Transactions on Intelligent Transportation Systems, vol. 14, no. 3, pp. 1140–1150, 2013.

[35] L. P. Kaelbling, M. L. Littman, and A. W. Moore, “Reinforcement learning: a survey,” Journal of Artificial Intelligence Research, vol. 4, pp. 237–285, 1996.

[36] C. F. Daganzo, “The cell transmission model: a dynamic representation of highway traffic consistent with the hydrodynamic theory,” Transportation Research Part B: Methodological, vol. 28, no. 4, pp. 269–287, 1994.

[37] J. Haddad, M. Ramezani, and N. Geroliminis, “Cooperative traffic control of a mixed network with two urban regions and a freeway,” Transportation Research Part B: Methodological, vol. 54, pp. 17–36, 2013.

[38] H. Mongeot and J.-B. Lesort, “Analytical expressions of incident-induced flow dynamics perturbations: using macroscopic theory and extension of Lighthill-Whitham theory,” Transportation Research Record, vol. 1710, pp. 58–68, 2000.

[39] Transport Simulation Systems, Aimsun User’s Manual 6.1, TTS, Barcelona, Spain, 2010.

[40] Highways England, “Hatris Homepage,” 2013, https://www.hatris.co.uk.

[41] E. Cascetta, “Estimation of trip matrices from traffic counts and survey data: a generalized least squares estimator,” Transportation Research B, vol. 18, no. 4-5, pp. 289–299, 1984.

[42] L. Zhang and D. Levinson, “Balancing efficiency and equity of ramp meters,” Journal of Transportation Engineering, vol. 131, no. 6, pp. 477–481, 2005.

[43] A. Kotsialos and M. Papageorgiou, “Efficiency and equity properties of freeway network-wide ramp metering with AMOC,” Transportation Research Part C: Emerging Technologies, vol. 12, no. 6, pp. 401–420, 2004.




14 Mathematical Problems in Engineering

NCIsolated RLDyna-MARL

0

2

4

6

8

10

Stan

dard

dev

iatio

n (h

)

073000 080000 083000070000Time of day

(a)

0

2

4

6

8

10

Stan

dard

dev

iatio

n (h

)073000 080000 083000070000

NCIsolated RLDyna-MARL

Time of day

(b)

0

2

4

6

8

10

Stan

dard

dev

iatio

n (h

)

073000 080000 083000070000

NCIsolated RLDyna-MARL

Time of day

(c)

Figure 9 Standard deviation for different scenarios (a) scenario A (b) scenario B and (c) scenario C

Mathematical Problems in Engineering 15

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This paper is supported by China Scholarship Council andUniversity of Leeds (CSC-University of Leeds scholarship)and partially supported by the National Natural ScienceFoundation of China (Grant nos 91420203 and 61271376)The authors would like to thank the institutions that supportthis study

References

[1] M. Papageorgiou and A. Kotsialos, “Freeway ramp metering: an overview,” IEEE Transactions on Intelligent Transportation Systems, vol. 3, no. 4, pp. 271–281, 2002.

[2] A. Skabardonis, P. Varaiya, and K. F. Petty, “Measuring recurrent and nonrecurrent traffic congestion,” Transportation Research Record, vol. 1856, pp. 118–124, 2003.

[3] M. Papageorgiou, H. Hadj-Salem, and J.-M. Blosseville, “ALINEA: a local feedback control law for on-ramp metering,” Journal of the Transportation Research Board, vol. 1320, pp. 58–64, 1991.

[4] E. Smaragdis and M. Papageorgiou, “Series of new local ramp metering strategies,” Transportation Research Record, vol. 1856, pp. 74–86, 2003.

[5] L. N. Jacobson, K. C. Henry, and O. Mehyar, “Real-time metering algorithm for centralized control,” Transportation Research Record, vol. 1232, pp. 17–26, 1989.

[6] G. Paesani, J. Kerr, P. Perovich, and F. Khosravi, “System wide adaptive ramp metering (SWARM),” in Proceedings of the 7th ITS America Annual Meeting and Exposition: Merging the Transportation and Communications Revolutions, Washington, DC, USA, June 1997.

[7] R. Lau, Ramp Metering by Zone – The Minnesota Algorithm, Minnesota Department of Transportation, 1997.

[8] H. M. Zhang and W. W. Recker, “On optimal freeway ramp control policies for congested traffic corridors,” Transportation Research Part B: Methodological, vol. 33, no. 6, pp. 417–436, 1999.

[9] A. Kotsialos, M. Papageorgiou, and F. Middelham, “Optimal coordinated ramp metering with advanced motorway optimal control,” Transportation Research Record, no. 1748, pp. 55–65, 2001.

[10] A. Hegyi, B. De Schutter, and H. Hellendoorn, “Model predictive control for optimal coordination of ramp metering and variable speed limits,” Transportation Research Part C: Emerging Technologies, vol. 13, no. 3, pp. 185–209, 2005.

[11] G. Gomes and R. Horowitz, “Optimal freeway ramp metering using the asymmetric cell transmission model,” Transportation Research Part C: Emerging Technologies, vol. 14, no. 4, pp. 244–262, 2006.

[12] A. H. F. Chow and Y. Li, “Robust optimization of dynamic motorway traffic via ramp metering,” IEEE Transactions on Intelligent Transportation Systems, vol. 15, no. 3, pp. 1374–1380, 2014.

[13] R. W. Hall, “Non-recurrent congestion: how big is the problem? Are traveler information systems the solution?” Transportation Research Part C, vol. 1, no. 1, pp. 89–103, 1993.

[14] P. Prevedouros, B. Halkias, K. Papandreou, and P. Kopelias, “Freeway incidents in the United States, United Kingdom, and Attica Tollway, Greece: characteristics, available capacity, and models,” Transportation Research Record, vol. 2047, pp. 57–65, 2008.

[15] T. L. Greenlee and H. J. Payne, “Freeway ramp metering strategies for responding to incidents,” in Proceedings of the IEEE Conference on Decision and Control including the 16th Symposium on Adaptive Processes and a Special Symposium on Fuzzy Set Theory and Applications, pp. 987–992, New Orleans, LA, USA, December 1977.

[16] M. H. Wang, Optimal ramp metering policies for nonrecurring congestion with uncertain incident duration [Ph.D. thesis], Purdue University, West Lafayette, Ind, USA, 1994.

[17] J.-B. Sheu, “Stochastic modeling of the dynamics of incident-induced lane traffic states for incident-responsive local ramp control,” Physica A: Statistical Mechanics and its Applications, vol. 386, no. 1, pp. 365–380, 2007.

[18] J.-B. Sheu and M.-S. Chang, “Stochastic optimal-control approach to automatic incident-responsive coordinated ramp control,” IEEE Transactions on Intelligent Transportation Systems, vol. 8, no. 2, pp. 359–367, 2007.

[19] C. Jacob and B. Abdulhai, “Machine learning for multi-jurisdictional optimal traffic corridor control,” Transportation Research Part A: Policy and Practice, vol. 44, no. 2, pp. 53–64, 2010.

[20] M. Davarynejad, A. Hegyi, J. Vrancken, and J. van den Berg, “Motorway ramp-metering control with queuing consideration using Q-learning,” in Proceedings of the 14th International IEEE Conference on Intelligent Transportation Systems (ITSC ’11), pp. 1652–1658, IEEE, Washington, DC, USA, October 2011.

[21] K. Rezaee, B. Abdulhai, and H. Abdelgawad, “Application of reinforcement learning with continuous state space to ramp metering in real-world conditions,” in Proceedings of the 15th International IEEE Conference on Intelligent Transportation Systems (ITSC ’12), pp. 1590–1595, IEEE, Anchorage, Alaska, USA, September 2012.

[22] C. Jacob and B. Abdulhai, “Automated adaptive traffic corridor control using reinforcement learning: approach and case studies,” Transportation Research Record, vol. 1959, pp. 1–8, 2006.

[23] K. Rezaee, B. Abdulhai, and H. Abdelgawad, “Self-learning adaptive ramp metering: analysis of design parameters on a test case in Toronto, Canada,” Transportation Research Record, vol. 2396, pp. 10–18, 2013.

[24] X.-J. Wang, X.-M. Xi, and G.-F. Gao, “Reinforcement learning ramp metering without complete information,” Journal of Control Science and Engineering, vol. 2012, Article ID 208456, 8 pages, 2012.

[25] A. Fares and W. Gomaa, “Multi-agent reinforcement learning control for ramp metering,” in Progress in Systems Engineering, vol. 330 of Advances in Intelligent Systems and Computing, pp. 167–173, Springer, Basel, Switzerland, 2015.

[26] K. Veljanovska, K. M. Bombol, and T. Maher, “Reinforcement learning technique in multiple motorway access control strategy design,” PROMET-Traffic & Transportation, vol. 22, no. 2, pp. 117–123, 2010.

[27] C. Lu, H. Chen, and S. Grant-Muller, “An indirect reinforcement learning approach for ramp control under incident-induced congestion,” in Proceedings of the 16th International IEEE Conference on Intelligent Transportation Systems (ITSC ’13), pp. 979–984, IEEE, The Hague, The Netherlands, October 2013.


[28] L. Busoniu, R. Babuska, and B. De Schutter, “A comprehensive survey of multiagent reinforcement learning,” IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 38, no. 2, pp. 156–172, 2008.

[29] C. Lu, H. Chen, and S. Grant-Muller, “Indirect reinforcement learning for incident-responsive ramp control,” Procedia - Social and Behavioral Sciences, vol. 111, pp. 1112–1122, 2014.

[30] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, 1998.

[31] C. C. H. Watkins and P. Dayan, “Q-learning,” Machine Learning, vol. 8, no. 3-4, pp. 279–292, 1992.

[32] J. R. Kok and N. Vlassis, “Collaborative multiagent reinforcement learning by payoff propagation,” Journal of Machine Learning Research, vol. 7, pp. 1789–1828, 2006.

[33] C. Guestrin, M. G. Lagoudakis, and R. Parr, “Coordinated reinforcement learning,” in Proceedings of the 19th International Conference on Machine Learning, pp. 227–234, Sydney, Australia, July 2002.

[34] S. El-Tantawy, B. Abdulhai, and H. Abdelgawad, “Multiagent reinforcement learning for integrated network of adaptive traffic signal controllers (MARLIN-ATSC): methodology and large-scale application on downtown Toronto,” IEEE Transactions on Intelligent Transportation Systems, vol. 14, no. 3, pp. 1140–1150, 2013.

[35] L. P. Kaelbling, M. L. Littman, and A. W. Moore, “Reinforcement learning: a survey,” Journal of Artificial Intelligence Research, vol. 4, pp. 237–285, 1996.

[36] C. F. Daganzo, “The cell transmission model: a dynamic representation of highway traffic consistent with the hydrodynamic theory,” Transportation Research Part B: Methodological, vol. 28, no. 4, pp. 269–287, 1994.

[37] J. Haddad, M. Ramezani, and N. Geroliminis, “Cooperative traffic control of a mixed network with two urban regions and a freeway,” Transportation Research Part B: Methodological, vol. 54, pp. 17–36, 2013.

[38] H. Mongeot and J.-B. Lesort, “Analytical expressions of incident-induced flow dynamics perturbations: using macroscopic theory and extension of Lighthill-Whitham theory,” Transportation Research Record, vol. 1710, pp. 58–68, 2000.

[39] Transport Simulation Systems, Aimsun User’s Manual 6.1, TTS, Barcelona, Spain, 2010.

[40] Highways England, “Hatris Homepage,” 2013, https://www.hatris.co.uk.

[41] E. Cascetta, “Estimation of trip matrices from traffic counts and survey data: a generalized least squares estimator,” Transportation Research Part B, vol. 18, no. 4-5, pp. 289–299, 1984.

[42] L. Zhang and D. Levinson, “Balancing efficiency and equity of ramp meters,” Journal of Transportation Engineering, vol. 131, no. 6, pp. 477–481, 2005.

[43] A. Kotsialos and M. Papageorgiou, “Efficiency and equity properties of freeway network-wide ramp metering with AMOC,” Transportation Research Part C: Emerging Technologies, vol. 12, no. 6, pp. 401–420, 2004.
