Reinforcement learning in neuro fuzzy traffic signal control


By: Abhishek Vishnoi (112501)

NITTTR Chandigarh

INTRODUCTION

A fuzzy traffic signal controller uses simple “if–then” rules which involve linguistic concepts such as medium or long, presented as membership functions.

In neuro-fuzzy traffic signal control, a neural network adjusts the fuzzy controller by fine-tuning the form and location of the membership functions.

The learning algorithm of the neural network is reinforcement learning, which gives credit for successful system behaviour and punishes poor behaviour.

Basics of Neuro-Fuzzy

A combination of a neural network and a fuzzy system is called a neurofuzzy system. In neurofuzzy control, the parameters of the fuzzy controller are adjusted using a neural network.

Neurofuzzy systems utilize both the linguistic, human-like reasoning of fuzzy systems and the powerful computing ability of neural networks.

They can avoid some of the drawbacks of solely fuzzy or neural systems.

Fuzzy system

Fuzzy logic was brought to public attention by Zadeh. Fuzzy sets provide a mathematical interpretation for natural language terms.

A fuzzy set is a set without a crisp, clearly defined boundary. It can contain elements with only a partial degree of membership.

Fuzzy control uses a rule base, where the rules are propositions of the form “if X is S, then Y is T”. Here X and Y are linguistic variables, and S and T are linguistic terms (fuzzy sets) of those variables.

A membership function (MF) is a curve that defines how each point in the input space is mapped to a membership value (or degree of membership) between 0 and 1.

Each linguistic term is associated with a fuzzy set defined by a corresponding membership function.
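As a minimal illustration of a membership function, the sketch below defines a triangular MF in Python. The triangular shape and the breakpoints (2, 5, 8) chosen for the term "medium" are assumptions made for illustration only; the slides do not give the actual curves.

```python
def triangular_mf(x, left, peak, right):
    """Degree of membership (0..1) of x in a triangular fuzzy set.

    left, peak and right are hypothetical breakpoints: membership is 0 at
    or outside left/right and rises linearly to 1 at peak.
    """
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def medium(x):
    """Hypothetical fuzzy set "medium" for a vehicle count x."""
    return triangular_mf(x, 2.0, 5.0, 8.0)


if __name__ == "__main__":
    for vehicles in (2, 3.5, 5, 6.5, 9):
        print(vehicles, medium(vehicles))
```

A count of 5 vehicles belongs fully to "medium" under these assumed breakpoints, while 3.5 vehicles belongs to it only to degree 0.5.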

Fuzzy traffic signal control

The controller receives measurements of incoming traffic and chooses the length of the green signal accordingly.

Advantages of fuzzy control systems over traditional ones:

• Their ability to use expert knowledge in the form of fuzzy rules.

• The small number of parameters needed.


The traffic simulation environment used in this work is a two-phase controlled intersection of two-lane streets.

In each approaching lane there are two traffic detectors, the first one before the stop line and the other at the stop line.

These detectors send input measurements of traffic to the fuzzy controller: APP, the number of approaching vehicles in the green direction, and QUE, the number of queuing vehicles in the red direction. Depending on the traffic situation, the green phase can be extended by one or several seconds.

The output of the fuzzy controller is EXT, the green time extension (in seconds).

The linguistic values are:

• APP: zero, a few, medium and many
• QUE: a few, medium and too long
• EXT: zero, short, medium and long

Fuzzy rules

The rule base consists of five rule sets. The choice of the rule set depends on how many green extensions have already been given.

The objective of the rules is to split the green time and find the right moment of green termination so that the delay of vehicles is minimized.

After minimum green (5 seconds)

• if APP is zero, then EXT is zero
• if APP is a few and if QUE is less than medium, then EXT is short
• if APP is more than a few, then EXT is medium
• if APP is medium, then EXT is long

After the first extension

• if APP is zero, then EXT is zero
• if APP is a few and if QUE is less than medium, then EXT is short
• if APP is medium, then EXT is medium
• if APP is many, then EXT is long

After the second extension

• if APP is zero, then EXT is zero
• if APP is a few and if QUE is less than medium, then EXT is short
• if APP is medium and if QUE is less than medium, then EXT is medium
• if APP is many and if QUE is less than medium, then EXT is long

After the third extension

• if APP is zero, then EXT is zero
• if QUE is too long, then EXT is zero
• if APP is more than a few and if QUE is less than medium, then EXT is short
• if APP is medium and if QUE is less than medium, then EXT is medium
• if APP is many and if QUE is less than a few, then EXT is long

After the fourth extension

• if APP is zero, then EXT is zero
• if QUE is too long, then EXT is zero
• if APP is more than a few and if QUE is a few, then EXT is short
• if APP is medium and if QUE is less than a few, then EXT is medium
• if APP is many and if QUE is less than a few, then EXT is long
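To make the rule evaluation concrete, the following Python sketch evaluates the rule set "after the first extension" for given APP and QUE values. Everything numeric here is an assumption for illustration: the membership function shapes and breakpoints, the reading of "QUE is less than medium" as a falling shoulder, and the use of a weighted average over single representative EXT values (0, 3, 6 and 9 seconds) in place of whatever defuzzification the original controller uses.

```python
def tri(x, left, peak, right):
    """Triangular membership function; returns a degree in [0, 1]."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def falling(x, full, zero):
    """Left shoulder: 1 up to `full`, falling linearly to 0 at `zero`."""
    if x <= full:
        return 1.0
    if x >= zero:
        return 0.0
    return (zero - x) / (zero - full)


def rising(x, zero, full):
    """Right shoulder: 0 up to `zero`, rising linearly to 1 at `full`."""
    return 1.0 - falling(x, zero, full)


# Hypothetical representative extension (seconds) for each EXT term.
EXT_SECONDS = {"zero": 0.0, "short": 3.0, "medium": 6.0, "long": 9.0}


def extension_after_first(app, que):
    """Green extension in seconds for the 'after the first extension' rules."""
    # Firing strength of each rule; "and" is modelled with min (an assumption).
    strengths = {
        "zero": tri(app, -1.0, 0.0, 1.0),                 # APP is zero
        "short": min(tri(app, 0.0, 2.0, 4.0),             # APP is a few
                     falling(que, 2.0, 5.0)),             # QUE is less than medium
        "medium": tri(app, 2.0, 4.0, 6.0),                # APP is medium
        "long": rising(app, 4.0, 6.0),                    # APP is many
    }
    total = sum(strengths.values())
    if total == 0.0:
        return 0.0  # no rule fires: give no extension
    return sum(w * EXT_SECONDS[term] for term, w in strengths.items()) / total


if __name__ == "__main__":
    print(extension_after_first(app=3, que=0))
```

With three approaching vehicles and no queue, the "short" and "medium" rules both fire with strength 0.5 under these assumed curves, so the sketch returns an extension halfway between the two representative values.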

Reinforcement learning

Why is reinforcement learning needed?

The parameters of the fuzzy controller could be updated using the backpropagation algorithm common in supervised learning in neural networks, if the desired controller outputs were known.

In traffic signal control no such target outputs are available, so the backpropagation algorithm cannot be used; instead, a learning algorithm called reinforcement learning is used.

In reinforcement learning, the system evaluates whether the previous control action was good or not. If the action had good consequences, the tendency to produce that action is strengthened, that is, reinforced.
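The idea of strengthening the tendency to produce an action can be shown with a toy Python sketch. This is not the learning algorithm of the neurofuzzy controller; the action names, the preference table, the learning rate and the floor value are all hypothetical and serve only to illustrate the principle.

```python
import random


def choose(preferences, rng):
    """Pick an action with probability proportional to its preference."""
    actions = list(preferences)
    weights = [preferences[a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]


def reinforce(preferences, action, reward, rate=0.1, floor=0.01):
    """Strengthen (reward > 0) or weaken (reward < 0) the chosen action.

    The floor keeps every preference positive so the action can still be tried.
    """
    preferences[action] = max(floor, preferences[action] + rate * reward)


if __name__ == "__main__":
    rng = random.Random(0)
    prefs = {"extend": 1.0, "terminate": 1.0}  # hypothetical actions
    action = choose(prefs, rng)
    reinforce(prefs, action, reward=1.0)       # pretend the action worked well
    print(action, prefs)
```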

Structure of the neurofuzzy control system

The evaluation network gathers information about the decisions of the fuzzy controller and the delays of the vehicles.

This reinforcement information is used in fine-tuning the membership functions of the fuzzy controller, which is also presented as a neural network.

Thus there are actually two neural networks in the system: an evaluation network and a fuzzy controller network.

Evaluation network

The evaluation network evaluates the goodness of the actions of the fuzzy controller based on information it has gathered by observing the process.

The network fine-tunes the membership functions of the fuzzy controller by updating the parameters of the membership functions.

It is a feedforward, multilayer perceptron-type network.

The input variables of the network are APP and QUE, measurements of incoming traffic in green and red directions, respectively.

The hidden layer activation function is the sigmoidal function $z_j(x_j) = 1/(1+\exp(-x_j))$, where $x_j$ is the input to hidden cell $j$.

The size $h$ of the hidden layer may vary, and there are no precise rules for determining how many cells it should contain. Increasing the size gives a more powerful and flexible network but requires a longer learning time. In our work a size of $h = 10$ was found suitable.

The network output v is a measure of the goodness of the state of the network, a prediction of future reinforcement.

$$v = b_1\,\mathrm{APP} + b_2\,\mathrm{QUE} + \sum_{j=1}^{h} c_j z_j$$

The gradient descent algorithm is used in the learning phase of the evaluation network. If a positive reinforcement signal is received, the network weights are rewarded by being changed in the direction which increases their contribution to the total sum.

If a negative signal is received, the weights are punished by being changed in the direction which decreases their contribution.
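A minimal Python sketch of the evaluation network is given below: the output v is the sum of b1 times APP, b2 times QUE and the hidden outputs z_j weighted by c_j, with sigmoidal hidden cells and h = 10. Several details are assumptions, not taken from the slides: the input x_j of each hidden cell is taken to be a weighted sum of APP and QUE with hypothetical weights a, the weights are initialised randomly, and the update is a simplified step on the output weights b and c only, scaled by an assumed learning rate.

```python
import math
import random

H = 10  # hidden layer size reported in the text


def sigmoid(x):
    """Sigmoidal activation z(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


class EvaluationNetwork:
    def __init__(self, h=H, seed=0):
        rng = random.Random(seed)
        # Assumed input weights: a[j] = (weight of APP, weight of QUE) for cell j.
        self.a = [(rng.uniform(-0.1, 0.1), rng.uniform(-0.1, 0.1)) for _ in range(h)]
        self.b = [rng.uniform(-0.1, 0.1), rng.uniform(-0.1, 0.1)]  # b1, b2
        self.c = [rng.uniform(-0.1, 0.1) for _ in range(h)]        # c_j

    def hidden(self, app, que):
        """Hidden outputs z_j for the current traffic measurements."""
        return [sigmoid(a1 * app + a2 * que) for a1, a2 in self.a]

    def value(self, app, que):
        """v = b1*APP + b2*QUE + sum_j c_j*z_j."""
        z = self.hidden(app, que)
        return (self.b[0] * app + self.b[1] * que
                + sum(cj * zj for cj, zj in zip(self.c, z)))

    def update(self, app, que, reinforcement, rate=0.01):
        """Move b and c so their contribution to v grows when the
        reinforcement is positive and shrinks when it is negative."""
        z = self.hidden(app, que)
        self.b[0] += rate * reinforcement * app
        self.b[1] += rate * reinforcement * que
        self.c = [cj + rate * reinforcement * zj for cj, zj in zip(self.c, z)]


if __name__ == "__main__":
    net = EvaluationNetwork()
    before = net.value(app=3, que=2)
    net.update(app=3, que=2, reinforcement=1.0)
    print(before, net.value(app=3, que=2))
```

The hidden weights a are left fixed here to keep the sketch short; a full gradient descent scheme would adjust them as well.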

Fuzzy controller network

Consider the first rule set, whose neural network presentation is shown in the figure.

Experimental results

Fig. Average vehicular delays (in seconds) before (dashed line) and after (solid line) the learning at traffic volumes of 300, 500 and 1000 vehicles per hour. The location of the first traffic detector is 50 m from the stop line.

Fig. Average vehicular delays (in seconds) before (dashed line) and after (solid line) the learning at traffic volumes of 300, 500 and 1000 vehicles per hour. The location of the first traffic detector is 100 m from the stop line.

Here dini and dnew are the average vehicular delays (in seconds) using the initial and the new membership functions, respectively, and D = dini − dnew is the difference of individual observations.

Volume (veh/h)   Det. (m)   dini    dnew    D
500              50         9.99    9.42    0.57
1000             50         14.78   13.96   0.81
500              100        9.11    8.82    0.81
1000             100        15.18   14.52   0.66

Fig. Membership functions zero, a few, medium and many of APP before (dotted line) and after (solid line) the learning at a traffic volume of 500 vehicles per hour. The location of the first traffic detector is 50 m from the stop line. Horizontal axis: number of approaching vehicles. Vertical axis: value of membership function.

Fig. Membership functions a few, medium and too long of QUE before (dotted line) and after (solid line) the learning at a traffic volume of 500 vehicles per hour. The location of the first traffic detector is 50 m from the stop line. Horizontal axis: number of queuing vehicles. Vertical axis: value of membership function.

Fig. Membership functions zero, short, medium and long of EXT before (dotted line) and after (solid line) the learning at a traffic volume of 500 vehicles per hour. The location of the first traffic detector is 50 m from the stop line. Horizontal axis: green signal extension in seconds. Vertical axis: value of membership function.
