Gravity model parameter calibration for large scale

Gravity model parameter calibration for large scale

strategic transport models.

Mark Pots

Company supervisors: Bastiaan Possel, Luuk Brederode and Nico Aardoom

University supervisor: Peter J.C. Dickinson

Graduation supervisor: Marc Uetz

Chair: Discrete Mathematics and Mathematical Programming (DMMP)Applied Mathematics

October 29, 2018

Abstract

This study considers the calibration of the lognormal cost function parameters within the simultaneous gravitymodel for large scale strategic transport models. The parameters are calibrated based on observed trip lengthdistributions and modal split fractions. First the full trip distribution used at Goudappel Coffeng that consistsof a stratification in trip purposes and sub purposes is described in detail. A new calibration approach is in-vestigated that proposes a triproportional fitting procedure over the current biproportional fitting procedure.The triproportional fitting procedure solves the gravity model equations while simultaneously fitting the tripdistribution on modal split. This simplifies the calibration eliminating the modal split constraint and its associ-ated lognormal parameters. Under the new approach the calibration algorithms’ running times are considerablyreduced.

The calibration problem for this new approach then is formulated as a bilevel optimization problem. In thisformulation the higher-level optimization task is the choice of parameters and the lower-level optimization taskis to maximize entropy subject to trip end and modal split constraints, given the choice of (behavioural) pa-rameters. This lower-level optimization task is solved by the triproportional fitting procedure. A BFGS QuasiNewton method was developed to solve the higher-level optimization. It is compared with the currently usedhillclimbing algorithm at Goudappel Coffeng. This new method converges more reliably to the local minimumalbeit much slower than the hillclimbing algorithm. Further two methods to compute the gradient required inthe BFGS method are considered. The first is a simple finite differences approach. The second method is ananalytical adjoint method. The second method is observed to be significantly faster for smaller to medium scalemodels (≤ 3300 zones), however occasionally suffers from a singular linear system.

1

Contents

1 Introduction 51.1 Trip numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51.2 The four-step traffic model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

2 The simultaneous gravity model 82.1 Doubly constrained simultaneous gravity model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

2.1.1 Trip-end constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82.1.2 Balancing trip-end values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82.1.3 Generalized costs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92.1.4 The gravity equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92.1.5 Resemblance with Newton’s law . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

2.2 Deterrence functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102.2.1 Exponential deterrence function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102.2.2 Lognormal deterrence function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102.2.3 Top-lognormal deterrence function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112.2.4 Discrete deterrence function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

2.3 Gravity model derivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122.4 Solving the gravity model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

2.4.1 OD matrices and skim matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142.4.2 Solution method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142.4.3 Solution properties and convergence of the biproportional fitting procedure: . . . . . . . . 18

3 Calibration of the simultaneous gravity model 213.1 Full trip distribution model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

3.1.1 Trip purposes and sub purposes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213.1.2 User classes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

3.2 Calibration of deterrence function behavioral parameters . . . . . . . . . . . . . . . . . . . . . . . 243.3 Calibration criteria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

3.3.1 Empirical trip length distributions and modal splits . . . . . . . . . . . . . . . . . . . . . 253.3.2 Approximate confidence intervals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253.3.3 Normalization of trip distributions by binwidth . . . . . . . . . . . . . . . . . . . . . . . . 26

3.4 Modelled trip length distributions and modal splits . . . . . . . . . . . . . . . . . . . . . . . . . . 273.5 Original calibration approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

3.5.1 Change α’s step . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 303.5.2 Change β’s step . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

3.6 New calibration approach: modal splits as gravity model constraints . . . . . . . . . . . . . . . . 303.6.1 Triproportional fitting procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 303.6.2 Discussion: Comparison with old approach . . . . . . . . . . . . . . . . . . . . . . . . . . 323.6.3 Modal split aggregateness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

4 Mathematical problem formulation 334.1 Calibration objective function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 334.2 Bilevel optimization problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 344.3 Gravity model convergence criterion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 354.4 Solution method performance criteria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

2

5 Potential solution methods 375.1 Hillclimbing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 375.2 Gradient based methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

5.2.1 Approximating the gradient: finite differences . . . . . . . . . . . . . . . . . . . . . . . . . 395.2.2 Calculating the gradient analytically . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 405.2.3 Adjoint gradient method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

5.3 Simultaneous perturbation stochastic approximation (SPSA) . . . . . . . . . . . . . . . . . . . . 425.4 Global optimization techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

5.4.1 Multi-start methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 435.4.2 Simulated annealing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

5.5 Comparison of potential solution methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

6 BFGS method 466.1 Line search approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 466.2 BFGS method implementation details . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

6.2.1 Initial Hessian approximation choice . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 486.2.2 Choice of step size rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 486.2.3 Negativity constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49

7 Results 517.1 Calculating the gradient . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

7.1.1 Accuracy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 517.1.2 Computational effort . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

7.2 Calibration results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 557.2.1 Calibration of the medium scale BBMA strategic traffic model . . . . . . . . . . . . . . . 557.2.2 Calibration of the large scale MRDH strategic traffic model . . . . . . . . . . . . . . . . . 61

8 Conclusions and recommendations 66

3

Notation

Globally used indices:

i a zone of origin

j a zone of destination

m a mode, e.g. m ∈ {car, bike, public transit}u a user class, e.g. u ∈ {co (car owners),nco (non car owners)}k a cost bin

Globally used Variables:

tijmu number of trips made from zone i to zone j using mode m by tripmakers in user class u

Tm OD matrix for mode m

T either denotes a full trip distribution i.e. a distribution of T overall the tijmu’s

or the OD matrix aggregated over all modes: T =∑m T

m

Pi(u) production value of zone i i.e. the observed number of trips de-parting from zone i (made by user class u)

Aj(u) attraction value of zone j i.e. the observed number of trips arrivingin zone j

Oi(u) production balancing factor of zone i

Dj(u) attraction balancing factor of zone j

Fmu(cijm) cost function to be calibrated, models the willingness to travelusing mode m at generalized costs c by a trip maker from userclass u

cijm (generalized) cost to travel from zone i to zone j using mode m

αmu (lognormal) cost function parameter for mode m and user class uto be estimated in the calibration or balancing factor for mode mand user class u

βmu (lognormal) cost function parameter for mode m and user class uto be estimated in the calibration

dmuk observed number of trips made by mode m by user class u of acost in bin k

dmuk modelled number of trips made by mode m by user class u of acost in bin k

MSmu the observed fraction of trips made using mode m by user class u

MSmu the modelled fraction of trips made using mode m by user class u

4

1 Introduction

This thesis is the final result of my graduation research at Goudappel Coffeng. The goal of this research wasto improve the automatic parameter calibration of the gravity model that exists within the transport planningsoftware package Omnitrans. The emphasis is on gravity models within large scale strategic traffic models.The gravity model for trip distribution is based upon Newton’s gravity model. It generates a trip distributionover the zonal grid of the strategic traffic model. The most important element that determines this distributionof trips is a cost function inside the gravity model. The calibration considers the parameters of this cost function.

This section gives an introduction to the framework and context in which this research is done. Section 1.1describes some basic trip properties which form the dimensions of the trip distribution resulting from the gravitymodel. Section 1.2 describes the four-step traffic model, which is the larger transport modelling context withinwhich the gravity model is an important component.

1.1 Trip numbers

The strategic traffic model we deal with considers a study area which is assumed to be partitioned into a setof n zones i.e. a zonal grid. In the trip distribution model we are concerned with the origin and destinationof trips as well as the mode by which the trip is made. We use i to denote a zone of origin from which a tripstarts and j to denote a zone at which a trip ends. We use the index m to denote a mode of transport by whicha trip is made. Examples of modes of transports include car, bike or public transit.

We have described three properties a trip has in the trip distribution model so far: a zone of origin, a zone ofdestination and the mode of transport by which it is made. Another trip property we are interested in is theperson who makes the trip. We can partition the population of trip makers into groups so that in a group allthe trip makers share one or more of the same properties. These groups we will call user classes and we use uto denote a particular user class. We can for example divide trip makers in the user classes of trip makers whoown a car and trip makers who do not own a car: u ∈ {co, nco}.

With these indices we can specify the number of trips with certain properties. By the trip number vari-able tijmu we denote the number of trips that are made from zone i to zone j by a trip maker from user class uusing m as a mode of transport for each pair of zones (i, j), mode m and user class u. Together all the tijmu’sform the trip distribution.

1.2 The four-step traffic model

The traffic and transportation modelling and forecasting done at Goudappel Coffeng, is usually based on theclassical four-step traffic forecasting model. This model describes in four steps the process of modelling themovements of traffic on a network. The modelling is done for a particular real world study area. The studyarea itself is modelled by a zonal grid and a network representing roads and traffic infrastructure connectingthe zones of the grid. The four steps are illustrated in figure 1 and are:

5

i

i j

i j

i j

j

tijm1

tijm2

tijm3

tijm

Trip generation:

Destination choice:

Mode choice:

Route choice:

Figure 1: Steps of the four-step traffic model.

One - Trip generation:The objective of the first step is to determine for each zone how many trips should have their origin (departures)and destination (arrivals) in that zone. The number of trips that have their origin in a particular zone is calledthe production of that zone while the number of trips that have their destination in that zone is called theattraction.

Two - Destination choice:The second step matches the production and attraction values to get a trip distribution specifying the numberof trips that go from each origin zone to each destination zone. Other than the production and attraction valueof an origin-destination pair the main factor determining this trip distribution is travel impedance meaningroughly speaking the time and distance it takes to travel the trip.

Three - Mode choice:The third step of mode choice, also called modal, split further refines the trip distribution resulting from thesecond step by determining for each trip which mode of transport is used.

Four - Route choice:Having specified a origin-destination pair and mode of transport for each trip, the last step of the model isconcerned with the route each trip takes in the network. Another name for this step commonly used is thetraffic assignment step since the step assigns traffic to the network. The traffic assignment is an important stepin the four-step model. As it calculates the load on the network which gives an estimation of the number oftravelers and travelling time on real traffic facilities such as roads and railways. These are important to parties

6

such as traffic engineers and municipalities.

Usually at Goudappel Coffeng, and so is the case in this thesis, steps two and three, destination choice andmode choice, are combined in one trip distribution model which does both steps simultaneously. For this reasonthe specific trip distribution model we will be considering, which is a gravity model, is called a simultaneousgravity model.

This thesis will not explain the first step of trip generation other than mentioning that both production andattraction values are generated in an exogenous model based on socio-economic data. Therefore we assume theproduction and attraction values from this step to be readily available data in the next step of trip distribu-tion. For a more thorough explanation on the trip generation and traffic assignment phase and a more throughtreatment of the four step model in general we refer to Ortur and Willumsen[5].

The goal of this research is to improve the calibration of cost function parameters of the simultaneous gravitymodel for large-scale (thousands of zones) strategic traffic models. Performance measures were established onwhich the developed methods were tested on. In particular we implemented gradient based methods and com-pared their performance with the original hillclimbing algorithm.

Structure of the remaining thesis

Chapter 2 introduces the reader to the gravity model of traffic by describing a basic doubly constrained si-multaneous gravity model. In chapter 3 the full trip distribution model is described after which the actual actof calibrating it is discussed. An alternative calibration approach is discussed and chosen instead of the currentlyused approach within Goudappel Coffeng. In chapter 4 the problem of calibrating the trip distribution modelby the new approach is cast into a mathematical optimization formulation. Further performance measuresare identified by which potential methods can be judged. Chapter 5 discusses the currently used hillclimbingalgorithm after which various potential alternative optimization algorithms are considered. A choice is madefor gradient based methods. The details of a particular gradient based method which is the BFGS method, aso called Quasi Newton method, are further discussed in chapter 6. Chapter 7 discusses the results obtainedfrom tests on these methods as well as results from comparing two different routines to calculate the gradient.Finally chapter 8 states conclusions and recommendations that follow from this research.

7

2 The simultaneous gravity model

The purpose of this chapter is to review the simultaneous gravity model within the four-step model. Thesimultaneous gravity model is the key building stone of the full trip distribution models that we review inchapter 3. Section 2.1 introduces a doubly constrained simultaneous gravity model for uniform trip makers.Section 2.2 discusses the choice of deterrence function that can be made inside the gravity model. The calibrationof this deterrence function is the main research subject. Section 2.3 presents a mathematical derivation of thegravity model. Section 2.4 describes the solution method for the gravity model which is the biproportionalfitting procedure and reviews some of its properties.

2.1 Doubly constrained simultaneous gravity model

This section describes the doubly constrained simultaneous gravity model. Wilson[15] founded the gravity modelapproach to trip distribution modelling. To simplify matters and to cater for the reader with no backgroundknowledge on gravity models we first introduce a simultaneous gravity model in which the population of tripmakers is assumed to be uniform. Therefore the index u for user classes is omitted for now and the modeldiscussed here computes the trip distribution tijm. The model computes a trip distribution based on trip-endvalues, generalized costs and a definition of the deterrence functions.

2.1.1 Trip-end constraints

The trip-ends consist of production and attraction values obtained from the trip generation step. Denote byPi the production value of zone i and denote by Ai its attraction value. As mentioned before: the productionvalue of a zone represents the number of trips that should originate in that zone while the attraction valuerepresents the number of trips that should have their destination in that zone. This is encapsulated by thetrip-end constraints: ∑

m,j

tijm = Pi for each i (1)

∑m,i

tijm = Aj for each j

The gravity model is called doubly constrained if both production and attraction constraints are requiredto be met. Singly constrained models only require one constraint type to be met. The trip-end data can alsobe omitted or simply unavailable in which case the model is called unconstrained. So the choice of gravitymodel in terms of it being singly (origin or destination based), doubly or unconstrained depends on the level ofknowledge in trip-ends.

2.1.2 Balancing trip-end values

Denote by T the total number of trips modelled i.e. the sum of all the tijm. It follows from the trip-endconstraints (1) that: ∑

i

Pi =∑i

∑m,j

tijm = T =∑j

∑m,i

tijm =∑j

Aj

However, the left- and right-hand side can be unequal in case the production model and attraction model(together forming the step of trip generation) are inconsistent1. This inconsistency can be corrected by a simplepreprocessing step called balancing. Balancing simply means to either scale the production values such that

1In practice this depends on the time period that is modelled for. For a time period of a full day the productions and attractionsshould be approximately equal. This is for example not the case for the period of rush hour.

8

their sum matches the sum of attraction values or to scale the attraction values to match the total productionsum. Which sum we hold as valid is determined by comparing the reliability of the models for productions andattractions. For example in case we trust the production values more we scale the attraction values towardsadjusted values A

′

j by:

A′

j =

(∑i Pi∑iAi

)·Aj

2.1.3 Generalized costs

Intuitively one would in general expect more people to travel between two locations given it is easier to makethe trip. This is where the second main input to the gravity model of generalized travelling costs come in toplay. Denote by cijm the generalized cost to travel from zone i to zone j using m as a mode of transport.Practically speaking the units of these costs can be anything from time, distance, energy or money hence theyare called generalized costs. The generalized costs can also be a linear combination of these numbers by applyingappropriate conversion coefficients. For example it is possible to express generalized costs in purely monetaryunits from travel time and travel distance in case we have knowledge of the value of time and fuel costs.

2.1.4 The gravity equation

The gravity equation can now be introduced by:

tijm = piajPiAjFm(cijm) (2)

Here pi ≥ 0 and aj ≥ 0 are called balancing factors which need to be determined so that the trip-endconstraints hold. The cost function Fm(.) is mode specific and acts on the generalized costs to model thewillingness to travel at a certain (generalized) cost. The cost functions are also called deterrence functions.

We can rewrite the gravity equation (2) in a simpler form by a change of variables: Oi = pi · Pi, Dj =aj ·Aj :

tijm = OiDjFm(cijm) (3)

We will actually use this equation as the gravity equation and by balancing factors we will mean theseOi’s and Dj ’s. However the form of equation (2) is useful to observe three assumptions that are built into thegravity model. Namely that the number of trips between a zone of origin i and a destination zone j made bymode m is proportional to the observed number of departures Pi from the origin zone i, the observed numberof arrivals Aj to the destination zone j and also is proportional to a cost function factor Fm(cijm) representingthe willingness to travel at the cost to travel from zone i to zone j by mode of transport m.

i

i j

i j

i j

j

tijm1

tijm2

tijm3

tijm

Trip generation:

Destination choice:

Mode choice:

Route choice:

Figure 2: The simultaneous gravity model determines both destination choice and mode choice.

9

2.1.5 Resemblance with Newton’s law

The gravity model of traffic is similar to Newton’s law of universal gravitation therefore lending its name fromit. Newton’s gravitational law states that the gravitational pull between two objects is proportional to the massof the first and second object and also is proportional to the inverse square of the distance between the objects,see figure 3. The gravity model of traffic usually uses a more general cost function however Fm(cijm) = c−2ijmhas been used in some models as a cost function. One way in which the analogy between Newton’s law and thegravity model of traffic breaks down is that in Newton’s law the calculated force works bidirectional i.e. theforce object one exerts on object two F2 is equal to the force object two exerts on object one F1 while we donot have tijm = tjim necessarily. Since the costs to travel on the way in cijm and on the way out cjim do nothave to be equal and neither do the production and attraction value pairs.

Figure 3: Newton’s law of universal gravitation.

2.2 Deterrence functions

Different choices of cost functions Fm(cijm) are available. The choice of the cost function has a large influenceon the trip distribution. The deterrence function values Fm(cijm) directly determine the relative distribution ofthe total trips over the different modes of transport i.e. the modal split between a zone-pair (i, j). To illustratethis, suppose we have a model with three modes m1, m2 and m3. Then by the gravity equation (3) we haveafter cancelling out common factors Oi and Dj :

tijmtijm1

+ tijm2+ tijm3

=Fm(cijm)

Fm1(cijm1) + Fm2(cijm2

) + Fm3(cijm3), for m ∈ {m1,m2,m3}

2.2.1 Exponential deterrence function

The most standard deterrence function is the exponential deterrence function:

Fm(cijm) = eβmcijm (4)

Here for each mode m the βm < 0 is a calibration parameter that can be used to influence the trip distributionoutcome. Picking a more negative βm has the effect of modelling shorter trips for mode m (more trips of smallergeneralized costs).

2.2.2 Lognormal deterrence function

In the case of a lognormal deterrence function two calibration parameters need to be specified: for each modem we have again a βm < 0 but now additionally we require for each mode an αm > 0:

Fm(cijm) = αm · eβmln2(cijm+1) (5)

The βm parameters are similar to those of the exponential function. The newly introduced parameters αm’sallow for scaling of the deterrence functions.

10

The only difference between the lognormal and the exponential deterrence function, other than this scalingproperty, is that the costs are transformed by cijm → ln2(cijm + 1). Figure 4 illustrates the shape of thelognormal distribution function for different parameter value choices.

1

2

Generalized traveling costs, c

Wil

lin

gnes

sto

trav

el,F

(c)

α = 1, β = −0.5α = 1, β = −2α = 2, β = −2

Figure 4: Lognormal deterrence functions for different values of parameters α and β.

0.5 1 4

0.5

1


Wil

lin

gnes

sto

trav

el,F

(c) α = 1, β = −2, γ = 4

α = 1, β = −2, γ = 0.5α = 0.5, β = −0.5, γ = 1

Figure 5: Top-lognormal deterrence functions for different values of parameters α, β and γ.

2.2.3 Top-lognormal deterrence function

A top-lognormal deterrence function is specified by an additional parameter γm > 0:

Fm(cijm) = αm · eβmln2(cijmγ )

The maximum willingness to travel for the top-lognormal function type then occurs at costs γm, as illustratedin figure 5. A top-lognormal distribution function is mostly applied in unimodal strategic traffic models to modelpeak attractiveness of a mode of transport at costs other than zero (the peak for lognormal functions). Formultimodal models, which this research focuses on, the simpler lognormal deterrence functions are sufficient in

11

ensuring peak attractiveness of a mode is achieved at a target cost interval due to the competitiveness, regulatedby the βm parameters, between different modes.

2.2.4 Discrete deterrence function

Another possibility for a deterrence function is the discrete deterrence function. A discrete cost function isspecified by nc cost bin intervals ck = [ck, ck+1], k = 1, . . . , nc and constant values Fmk that the function takeson over these intervals:

Fm(cijm) =

Fm1 , if cijm ∈ c1Fm2 , if cijm ∈ c2...

Fmnc , if cijm ∈ cnc

An advantage of using a discrete cost function is that it allows for more flexibility in terms of generatinga trip distribution. Too much flexibility can however lead to bad modelling practice. In the real world oneexpects the willingness to travel to always decrease for increasing travelling costs. As the deterrence functionmodels this willingness to travel one should restrict the choice to monotonically decreasing discrete functions2.Figure 6 shows two examples of such a cost function.

0 2 4 6 8


Wil

lin

gnes

sto

trav

el,F

(c)

Figure 6: Two discrete distribution functions defined for the same set of cost bins

2.3 Gravity model derivation

In this section we present the mathematical basis for the choice of a gravity model as the trip distributionmodel. Gravity models have the property that their solution maximizes a quantity called entropy. This wasfirst shown by Wilson[15]. Wilson derives the entropy maximizing property for a unimodal doubly constrainedgravity model. Here we apply his approach for the multimodal gravity model described in section 2.1 that hasan exponential cost function (4).

2For a unimodal traffic model one can, for the same reason a top-lognormal function is used, allow for unimodal discrete functionsi.e. a discrete function F (c) nondecreasing for c ≤ m and nonincreasing for c ≥ m for some m ∈ R .

12

The gravity model aims at finding a trip distribution that is the optimum of a certain optimization prob-lem. Namely it aims to find a trip distribution subject to the trip-end constraints (1) and an additional set ofcost constraints that maximizes a quantity called entropy. The entropy entropy(T ) of a trip distribution T isa measure for the probability of the trip distribution occurring and is defined as:

entropy(T ) =T !∏

i,j,m

tijm!

Where T is again the total number of trips modelled. To make the maximization easier we apply Stirling’sapproximation ln(N !) ≈ N · ln(N)−N , we instead maximize the following function:

e(T ) = −∑i,j,m

tijm · ln(tijm)

The additional set of cost constraints is:∑i,j

tijmcijm = Cm, ∀m

This constraint states that the total amount of capital or budget spent in the region of interest on tripsmade by mode m is a fixed amount Cm. However these constants are assumed to be unknown.

Summarizing, the optimization problem is:

maxT−∑i,j,m

tijm · ln(tijm) (6)

such that: ∑j,m

tijm = Pi, ∀i (7)

∑i,m

tijm = Aj , ∀j

∑i,j

tijmcijm = Cm, ∀m

We now apply the method of Lagrange multipliers, with multipliers λ(1)i , λ

(2)i and βm corresponding to these

constraints, to obtain necessary conditions for local optima:

L = e(T ) +∑i

λ(1)i (Pi −

∑j,m

tijm)

+∑j

λ(2)j (Aj −

∑i,m

tijm)

+∑m

βm(Cm −∑i,j

tijmcijm)

Now we look for stationary points of the Lagrangian. The partial derivatives of the Lagrangian with respectto the trip numbers satisfy:

13

∂L(T )

∂tijm= ln(tijm)− λ(1)i − λ

(2)j − βmcijm, ∀i, j,m

This partial derivative should vanish which happens exactly if:

tijm = e−λ(1)i −λ

(2)j −βmcijm , ∀i, j,m

To obtain the form of the gravity equation (3) we only have to apply two changes in variables:

Oi = e−λ(1)i , ∀i

Dj = e−λ(2)j , ∀j

Giving us:

tijm = OiDje−βmcijm , ∀i, j,m

Note that the partial derivatives of the Lagrangian with respect to the multipliers give us back the con-straints (7).

As the objective function (6) is a concave function and the equality constraints (7) are affine functions thenecessary conditions are also sufficient for a local optimum. In fact the concavity of the objective function alsoimplies that any local maximum must be a global maximum as well. It follows that a trip distribution satisfyingthe gravity equation (3) maximizes entropy.

2.4 Solving the gravity model

2.4.1 OD matrices and skim matrices

Before presenting the solution method to the gravity model we first introduce some convenient matrix notation.It is convenient to think of a trip distribution as a set of matrices, specifically origin-destination matrices (ODmatrices) or trip tables. For each mode m we define its OD matrix as the square matrix Tm ∈ Rnxn≥0 where tijmis its (i, j)th entry.

In the same way define generalized cost matrices Cm ∈ Rnxn≥0 for each mode m with cijm as the (i, j)th entry ofCm. Generalized costs matrices are also called skim matrices.

2.4.2 Solution method

The basic doubly constrained simultaneous gravity model described so far can be formulated as:

tijm = OiDjFm(cijm)

∑m,j

tijm = Pi for each i

∑m,i

tijm = Aj for each j

Oi, Di ≥ 0 for each i (8)

14

To find a trip distribution that satisfies the gravity model the Furness method is used. In mathematics,the Furness method is better known as IPF (iterative proportional fitting) or matrix raking. The Furnessmethod is described as a series of iterations in which in the odd iterations we scale for each zone the currentnumber of departures towards the target production value and in the even iterations we scale for each zonethe current number of arrivals towards the target attraction value. This fitting procedure is started from aninitial trip distribution where the trip numbers are equal to the product of the three proportionality factors:t0ijm = PiAjF

m(cijm) so the starting solution is Oi = Pi and Di = Ai for each zone i.

In terms of the OD matrices Tm the method can be described as in the odd iterations scaling the row sums ofthe aggregated matrix T =

∑m T

m towards the target production values Pi and in the even iterations scalingthe column sums of T towards the attraction values Aj . Note that the row sums always match the productionvalues after an odd iteration, and similarly the column sums precisely match the attraction values after an eveniteration. Algorithm 1 shows the pseudocode of this biproportional fitting procedure.

15

Algorithm 1 Biproportional fitting

1: for each zone i do2: Oi ← Pi3: Di ← Ai4: end for5: for each mode m do6: for each od pair (i, j) do7: tijm ← Oi ·Dj · Fm(cijm)8: end for9: end for

10: while not converged do11: for each zone i do

12: if∑j,m tijm > 0 then f ←

(Pi∑

j,m tijm

)else f ← 0

13: Oi ← f ·Oi14: for each zone j do15: for each mode m do16: tijm ← f · tijm17: end for18: end for19: end for20: for each zone j do

21: if∑i,m tijm > 0 then f ←

(Aj∑

i,m tijm

)else f ← 0

22: Dj ← f ·Dj

23: for each zone i do24: for each mode m do25: tijm ← f · tijm26: end for27: end for28: end for29: end while

To illustrate the concepts discussed so far we now introduce a simple gravity model instance3 and show howIPF solves it.

Example 2.1. We consider a small instance with two modes and three zones. The production and attractionvalues for the three zones are assumed to be given by:

P =

80

50

20

, A =[20 30 100

]

Given generalized costs defined by the following skim matrices:

3By gravity model instance we mean a combination of trip-end values, skim matrices and deterrence function choice.

16

Ccar =

5 1 2

1 8 2

1 4 2

, Cbike =

2 6 3

6 5 5

2 1 4

Assuming lognormal deterrence functions with parameters specified by:

(αcar, αbike) = (2, 1), and

(βcar, βbike) = (−0.5,−1)

Then the initial trip distribution can be calculated via t0ijm = PiFm(cijm)Aj (this corresponds with initial

balancing factors Oi = Pi and Di = Ai , see algorithm 1):

T 0car =

642.7 3775 8750.5

1572.9 268.4 5469.1

629.2 328.6 2187.6

, T 0bike =

64.5 1484.4 2392.9

618.5 12.0 1495.5

247.4 45.0 598.2

The first balancing step we scale the rows towards the target production values, this is done by multiplying

row factors f

f =

0.0047

0.0053

0.0050

to the rows of the OD matrices sucht that the resulting row sums match the production values:

T 1car =

3.005 17.650 40.914

8.334 1.422 28.979

3.118 1.629 10.841

, T 1bike =

0.302 6.941 11.188

3.277 0.064 7.924

1.226 0.223 2.964

The row factors of f are also multiplied with the production balancing factors to update them:

O =

0.3740

0.2649

0.0991

The next step consists of scaling the columns towards the target attraction values, we get column factors

f =[1.0383 1.0742 0.9727

]Which update the trip distribution such that the target attraction values are met:

17

T 2car =

3.120 18.960 39.796

8.654 1.5276 28.187

3.237 1.7493 10.544

, T 2bike =

0.3133 7.4554 10.882

3.4028 0.0683 7.7077

1.2729 0.2395 2.8834

The attraction balancing factors become:

D =[20.766 32.226 97.267

]After just two balancing steps the resulting distribution is already matching the trip-ends quite closely.

Continuing the balancing process, the distribution would get closer and closer towards the exact solution.

�

2.4.3 Solution properties and convergence of the biproportional fitting procedure:

Uniqueness

Note that given feasible balancing factors exist, they are not unique. Assuming (O,D) is a solution to thetrip end constraints in (8) we clearly have for any λ > 0 that ( 1

λ ·O, λ ·D) satisfies the trip-end constraints aswell. However given a solution exists the balancing factors are unique up to this constant factor i.e. we onlyhave λ > 0 as a degree of freedom. This is proved in theorem 2.1. In what follows the balancingfactors O, Dare represented either by a column and row vector or O, D represent diagonal matrices with diagonal entriese.g. Oii = Oi, depending on context.

Theorem 2.1. Uniqueness of trip distribution and balancingfactors.

Let M ∈ Rmxn≥0 be a matrix containing no zero rows or columns. Suppose we have production and attrac-

tion values P ∈ Rm>0, AT ∈ Rn>0. Suppose we have two sets of production and attraction balancing factors

O, O ∈ Rmxm>0 and D, D ∈ Rnxn>0 such that both the matrices T = OMD and T = OMD satisfy the produc-tions P and attractions A. Then the following two statements hold:

i): The balanced matrices are equal i.e. T = Tii): There exists a constant λ > 0 s.t. O = 1

λ ·O and D = λ ·D

Proof:Statement (i) follows directly from theorem 4 in Rothblum[12]. Then with (i) holding (ii) can be proved quiteeasily: Since OMD = OMD we must have OD = OD (here O, O represent columns and D, D rows). Thenthe jth columns of these must be equal i.e. Dj · O = Dj · O. Since the balancingfactors are positive we can

safely take λ =DjDj

> 0. Comparing the ith rows of OD = OD confirms that D = λ ·D holds as well for this

λ.

�To apply the theorem to the initial matrix in the biproportional fitting algorithm one should set M =∑m F

m(Cm). Theorem 2.1 allows the matrix M to contain zero entries however its rows and columns shouldnot vanish. The other restriction is that the production and attraction values should be strictly positive. Forthe gravity models considered these restrictions were usually violated. However the theorem is still applicableas the instances can be reduced by eliminating the zero rows and columns as well as the rows and columns ofzero production and attraction.

18

Existence of solution and convergence of biproportional fitting procedure

In example 2.1 it was shown the biproportional fitting procedure converged for the given instance. An im-portant question that arises is whether this is the case for every instance. The answer is no as the followingcounterexample taken from Pukelsheim[11] proves.

Example 2.2. Consider the following unimodal (of one mode) instance with two zones and the followingtrip-ends:

P =

4

2

, A =[2 4

]Suppose the costs and cost function lead to the initial OD matrix:

T 0 =

30 0

10 20

The fact that t12 = 0 could have as an explanation that zone 2 is unreachable from zone 1 which would have

been modelled by setting c12 = ∞. One can show that the biproportional fitting procedure does not convergein this case but ”oscillates between two distinct accumulation points”[11]:

limt=1,3,...

T t =

4 0

0 2

, and limt=2,4,...

T t =

2 0

0 4

�

Pukelsheim[11] also establishes necessary and sufficient conditions for convergence of the biproportionalfitting procedure. His analysis and theorem make use of the so called L1-error. The L1-error for the OD matrixin the kth iteration T k denoted by f(k) is calculated by:

f(k) =1

2

∑i

∣∣∣∣∣∣∑j

tkij − Pi

∣∣∣∣∣∣+1

2

∑j

∣∣∣∣∣∣∑i

tkij −Aj

∣∣∣∣∣∣A column j of the matrix T 0 is said to be connected to a row i of that matrix if t0ij > 0. For a subset of

rows I define J(I) to be the subset of columns connected to I in the initial OD matrix T 0 i.e. J(I) contains allcolumns containing a positive entry in some row of I. Pukelsheim’s main result can now be stated:

Theorem 2.2. Convergence of the biproportional fitting procedure (Pukelsheim[11], 2009)

For an initial matrix T 0 ∈ Rmxn≥0 with no zero rows or columns, production and attraction values P ∈ Rm>0,

AT ∈ Rn>0 we have for the limit of the L1-error during the biproportional fitting procedure:

limk→∞

f(k) = maxI⊆{1,...,m}

∑i∈I

Pi −∑j∈J(I)

Aj

(9)

Moreover, the biproportional fitting procedure converges if and only if this limit is zero.

19

Again we can simply reduce gravity model instances by eliminating zero rows and columns as well as therows and columns of zero production and attraction so that the theorem essentially applies to all instances ofinterest to us.

Some expressions in the maximum of (9) can be computed to possibly detect non-feasibility without actu-ally running the biproportional fitting procedure. However, for a general instance, theorem 2.2 is of little usein proving the biproportional fitting procedure will converge. As determining whether the maximum expres-sion in (9) is greater than zero has a running time of O(2m). Therefore deciding in general whether feasiblebalancing factors exist is likely best done by just running the biproportional fitting procedure. In this researchwe found that the fitting procedures converged for each gravity model instance encountered. This suggests thatnonconvergence is rare at least in our specific setting.

Purpose Sub purpose Mode-user class Productions Attractions

combinations

Work Home → Work: All modes P co, P nco A

Work → Home: All modes P Aco, Anco

Business Home → Business: All modes P co, P nco A

Business → Home: All modes P Aco, Anco

Business (home unrelated) - co: co modes P co Aco

Business (home unrelated) - nco: nco modes P nco Anco

Education Home → Education: All modes P co, P nco A

Education → Home: All modes P Aco, Anco

Stores Home → Stores: All modes P co, P nco A

Stores → Home: All modes P Aco, Anco

Other Home → Other - co: co modes P co Aco

Home → Other - nco: nco modes P nco Anco

Table 1: Summary of full trip distribution model for the strategic traffic models considered. Most multi-modal modelsused within Goudappel Coffeng have a similar structure.

20

3 Calibration of the simultaneous gravity model

This chapter gives a detailed explanation of the trip distribution model and the process of calibrating it whichis the focus of this thesis. Section 3.1 first describes the trip distribution model for all purposes and user classes.After that we turn to the main research problem of this thesis, namely the calibration of the gravity modelparameters. Section 3.2 discusses within which context the calibration is done and its purpose. Section 3.3discusses the criteria upon which the calibration process is based. Namely observed trip length distributionsand from these derived observed modal splits. Section 3.4 discusses how the modelled counterparts to theseobserved distributions are calculated. Then section 3.5 first explains the earlier approach to calibrating thelognormal distribution function at Goudappel Coffeng after which a new approach is presented and discussedin section 3.6.

3.1 Full trip distribution model

This subsection describes the full trip distribution model used within Goudappel Coffeng which includes userclasses and a stratification in trip purposes.

3.1.1 Trip purposes and sub purposes

In the real world trips are made for different purposes. For example trips made by people commuting canbe labeled with the Work purpose. A trip of a purpose here is further divided into sub purposes which oftenclassify the direction of the trip. In the literature Abdel-Aal[2] also uses purposes in the context of gravitymodel calibration. The first and second column of table 1 (page 20) show for the models we will be concernedwith the stratification in purposes and sub purposes. Each sub purpose is modelled by a separate gravity model.

3.1.2 User classes

User classes add another dimension to the trip distribution model and add another index u to the trip numbervariables: tijmu. Instead of estimating an OD matrix Tm for each mode m the gravity model now estimates anOD matrix Tmu for each mode-user class pair (m,u). The way in which user classes are exactly embedded intothe gravity model depends on the sub purpose data. We distinguish two cases: In case of symmetric productionsand attractions, by which we mean the number of production and attraction values are equal, the sub purpose issimply modelled by the gravity model already encountered. In case of asymmetric productions and attractionsthe gravity model is extended in an appropriate way. We now discuss these two cases in more detail and inrelation to table 1.

Asymmetric productions and attractions

We illustrate the two different gravity models with asymmetric productions and attractions by consideringthe Work purpose. As shown in table 1 the Work purpose has two sub purposes which represent the directions’Home →Work’ and ’Work → Home’. First we consider the ’Home →Work’ direction. For this direction morerefined data is given regarding its production values: they are distributed over car owners and non car owners,however the attraction side is aggregated. Trips with the ’Home → Work’ direction are modelled by a gravitymodel of the form:

tijmu = OiuDjFmu(cijm), ∀i, j,m, u

s.t. ∑j,m

tijmu = Piu, ∀i, u

21

∑i,m,u

tijmu = Aj , ∀j (10)

Trips of the reverse ’Work → Home’ direction are modelled by a separate gravity model of the form:

tijmu = OiDjuFmu(cijm), ∀i, j,m, u

s.t. ∑j,m,u

tijmu = Pi, ∀i

∑i,m

tijmu = Aju, ∀j, u (11)

The Education and Stores purpose as well as the two sub purposes taken together of the Business purposeare modelled the same way as the Work purpose just described. Each of these have a sub purpose for homebased trips, meaning the origins of the trips represent homes while the destinations represent places related tothe trips purpose e.g. offices, stores or schools, and each has a reversed direction sub purpose for which therole of origins and destinations are interchanged. The home based sub purpose has distinct production valuesfor both user classes P coi , Pncoi but single aggregated attraction values Aj . For the sub purpose of the reversedirection the reverse statement holds.

The solution procedure to these gravity models with asymmetric production and attraction values is describedin algorithm 2 and is essentially still the biproportional fitting procedure of algorithm 1. In case of user classspecific production values we just get double the rows for the aggregate matrix T if we define the OD matricesTm as stacking the matrices Tmu vertically i.e.

Tm =

Tm,co

Tm,nco

Similarly in case of user class specific attraction values the aggregate matrix T has double the columns with

OD matrices Tm = [Tm,co 999 Tm,nco]. So the problem algorithm 2 solves is still the doubly constrained gravity

model of chapter 2 and it is still the biproportional fitting algorithm. Therefore the solution and convergenceproperties discussed in section 2.4.3 also hold for these extensions to user classes.

22

Algorithm 2 Biproportional fitting with user class specific production valuesand aggregate attraction values for two user classes u ∈ {co, nco}.

1: for each zone i do2: Oiu ← Piu3: Di ← Ai4: end for5: for each mode-user class pair (m,u) do6: for each od pair (i, j) do7: tijmu ← Oiu ·Dj · Fmu(cijm)8: end for9: end for

10: while not converged do11: for each zone j do

12: if∑i,m,u tijmu > 0 then f ←

(Aj∑

i,m,u tijm

)else f ← 0

13: Dj ← f ·Dj

14: for each zone i do15: for each mode-user class pair (m,u) do16: tijmu ← f · tijmu17: end for18: end for19: end for20: for each user class u ∈ {co, nco} do21: for each zone i do

22: if∑j,m tijmu > 0 then f ←

(Piu∑

j,m tijmu

)else f ← 0

23: Oiu ← f ·Oiu24: for each zone j do25: for each mode m do26: tijmu ← f · tijmu27: end for28: end for29: end for30: end for31: end while

Here the production balancing steps are done last in the while loop. This is done because we are moreconfident in the accuracy of the production values (as the productions are known separately per user class4).Balancing the productions last has the result that the production constraints are met exactly. For the same rea-son in the balancing preprocessing step the attraction values should be balanced towards the total of productions.

Symmetric productions and attractions

For the Other purpose only home based trips are modelled. However both user class specific production valuesand user class specific attraction values are available for these trips and so two doubly constrained gravity

4Also because the production data for the sub purposes ’Home → Work/Business/..’ are more reliable in itself. In the otherdirection the attraction data for the sub purposes ’Work/Business/.. → Home’ are more reliable itself. In general the home-basedtrip-ends are the most reliable.

23

models are used. One for the car owner user class and one for the non car owner user class. These gravitymodels are therefore entirely equivalent to the gravity model discussed in chapter 2. For the Business purposewe have two extra sub purposes that model nonhome related trips i.e. business trips for which neither the originor destination represents a home. Each of these sub purposes models a user class separately similar to the Otherpurpose.

OD matrix aggregation

We have described how each sub purpose within a trip purpose is modelled by a separate gravity model.However in the rest of this thesis we are interested in the aggregate trip numbers for the trip purposes and sothe outcomes of the separate gravity models are aggregated again. Thus for each purpose the OD matrices Tmu

are equal to the sum of the OD matrices of its respective sub purposes.

3.2 Calibration of deterrence function behavioral parameters

The calibration of the trip distribution model deals with the choice of parameters βmu which appear in theexponential and lognormal cost functions inside the gravity model and additionally αmu parameters in case oflognormal cost functions. These parameters are called behavioral parameters since they specify behavior of tripmakers. The βmu’s model the propensity to make less costly trips for all mode-user class combinations whilethe αmu’s influence for each user class what percentage of its trip makers use a certain mode of transport i.e.the modal split of the user class. The behavior we try to encapsulate in these parameters is dependent on trippurpose. For example we generally expect trips to school or the shop to be shorter than the commute to work.Therefore the trip distribution model of each purpose is calibrated separately. However the gravity models ofthe sub purposes that constitute the trip distribution of a purpose share the same set of behavioral parameters.At Goudappel Coffeng the cost function used for the strategic traffic models is usually the lognormal one. Thelognormal deterrence function in the case of user classes is of the form:

Fmu(cijm) = αmu · eβmuln2(cijm+1) (12)

In this thesis we also assume this lognormal deterrence function so the calibration focuses on both the βmu’sand αmu’s parameters. Note that in (12) the user class index u is omitted in cijm as generalized costs are thesame for different trip makers.

Transferability of parameters

The production and attractions balancing factors Oi and Dj act upon a subset of cells (rows and columns)of the OD matrices while the behavioral parameters αmu and βmu act upon the whole matrices. The latterparameters can be called transferrable parameters. To explain why: suppose we have successfully calibrated thebehavioural parameters. The main use of these calibrated behavioural parameters then is to serve as an inputfor four-step models that consider the same study area but consider different trip-end values or skim matricesin the gravity model. This is done to for example simulate the effect of changes in infrastructure and or zonalcharacteristics on traffic movement patterns. Changes in infrastructure are modelled by adjusting the road net-work or cost skim matrices and changes in zonal characteristics are modelled by adjusting the socio-economicdata which changes the trip-end values of zones.

The production and attraction balancing factors are not transferable to other modelling contexts, as theyare naturally sensitive to their associated production and attraction values, while the behavioral parameters inthe calibration are mainly sensitive to empirical data outside the gravity model such as the observed trip lengthdistributions discussed in the next section.

24

3.3 Calibration criteria

The calibration of the behavioral parameters is based on both empirical trip length distributions and modalsplits. First we describe what these observed trip length distributions are and how observed modal splits arederived from them. Then we show how the count numbers on which the observed trip length distributions arebased can be used to construct confidence intervals.

3.3.1 Empirical trip length distributions and modal splits

For each of the traveling purposes we have for each mode-user class combination (m,u) an empirical trip length

distribution dmu available. To obtain such distributions surveys are done for the study area in which respondentswere asked to keep a travel diary, for a randomly selected week of the year, in which they keep track of all thetrips they made that week. From this data, the number of trips per purpose, mode, user class and distance classwas derived and scaled to account for the sampling ratio. In figure 7 we see, in blue, such a resulting distribution.

From the observed trip length distribution we can, given a user class u, derive for a mode m its observedmodal split percentage MSmu within that user class by:

Figure 7: An observed (blue) and modelled (red) trip length distribution for the purpose ’Work’, mode car and user classcar owners.

MSmu =

∑k dmuk∑m,k dmuk

Both modelled trip length distributions and modelled modal splits are compared to their observed counter-parts. In section 3.4 we describe how modelled trip length distributions and modelled modal splits are computedfrom the OD matrices.

3.3.2 Approximate confidence intervals

This section describes how approximate confidence intervals can be constructed for the observed trip lengthdistributions in case count numbers for these are available.

25

Let N be the total number of counted trips in a survey for some mode-user class combination (m,u). Of

these N counts denote by ki the number of trips with a length in the ith distance bin. Denote by pi the true

fraction of trips made with a length in the ith distance bin. Then for each bin k the fraction pi = kiN is a point

estimate of pi. We now provide a measure of the accuracy of these point estimates in the form of approximateconfidence intervals for the pk. For more information regarding the construction of binomial confidence intervalswe refer to Wallis[14].

From the theory of confidence intervals it follows that an approximate 100(1− α)% - confidence interval for piis:

pi ∈

[pi − zα/2

√pi(1− pi)

N, pi + zα/2

√pi(1− pi)

N

]Where α is the desired significance level and zα/2 is the (1 - α/2)-percentile of the standard normal distri-

bution. In case ki = 0 the so called rule of three from statistics can be used which assigns the interval [0, 3N ] to

pi. The confidence intervals can be interpreted in the following way: For a significance level of α we can expectthe constructed intervals to contain the true fraction of trips pi approximately 100(1 − α)% of the time. Therelative error ri can then be calculated by:

ri =zα/2

√pi(1−pi)

N

pi= zα/2

√1

ki− 1

N≈zα/2√ki

, for large N (13)

Thus the width of the confidence intervals is inversely proportional to the square root of the number ofobservations. The confidence intervals can be plotted as error bars around the observed trip length distribu-tions and form a visual guideline when inspecting the fit of the model to the observed distributions. Anotheruse of these confidence intervals is to incorporate them directly in the objective function by defining weightsfor the objective function for each distance bin that are inversely proportional to the width of the correspond-ing confidence interval. So in fact these weights are proportional to the square root of the number of observations.

Overestimation of counts

In general there will be a dependency between the trips a survey respondent makes in the same day. Themost important dependency is the one resulting from tours i.e. back-and-forth displacements. In the worst caseall observations are tours and then the number of observations is overestimated by a factor of 2, leading to anoverestimation in confidence by a factor of

√2. However the extent to which this is a problem depends upon

the time period for which the strategic traffic model is used. For a complete day many tours can be expectedto be in the data while one would expect less tours for survey data specific to the morning commute.

3.3.3 Normalization of trip distributions by binwidth

The bins on which the trip distributions are defined are of varying widths. This leads to biased trip lengthdistributions as wider bins are more likely to have more trip counts because a given trip is more likely to fallinto a wider bin. To interpret the trip distributions more fairly we can normalize them by bin width. In figure8 an observed and modelled normalized trip distribution are shown together with confidence bounds, in dottedmarker style, as constructed in section 3.3.2. Here the normalized trip numbers are plotted against the binmidpoints to give a better sense of how many trips are made of a certain length.

26

Figure 8: Bin width normalized observed (blue) and modelled (red) trip length distribution with confidence bounds (dotted)

3.4 Modelled trip length distributions and modal splits

Since in reality the study area is not a closed system in terms of traffic displacements the strategic traffic modelalso considers zones representing the area surrounding the study-area instead of only those making up the studyarea. To illustrate this figure 9 shows the zonal grid of the study area and surrounding area for the The Haguestrategic traffic city model or MRDH model. To save computational effort, generally the size of zones outsidethe study area increases as the distance to the study area increases.

(a) The Hague city region inside the Netherlands (b) Subdivision of the region into zones

Figure 9

27

The classification of zones in study-area and surrounding-area zones also brings about a classification intrips. Namely we distinguish study-area related trips which are trips that either start or end in a study areazone and non study-area related or external trips that both start and end in a zone outside the study area.We need to go one step further by partitioning the study-area related trips into internal, ingoing- and outgoingtrips for which we use figure 10 as a definition of these.

The survey counts only include observations of study-area related trips. However these are not counted equally.The number of times a study-area related trip is counted, is equal to the number of times its zone of departure orarrival is a study area zone. So internal trips, in- and outgoing trips, and external trips are counted respectivelytwice, once and never in the construction of the observed trip length distributions. Now for the purpose ofhaving a fair calibration criterium the modelled trip length distributions (resulting from the gravity models)are computed using the same weights.

Let sijm denote the length of a trip from zone i to zone j made by mode m and the k’s denote length bins(the same bins as those used for the empirical trip length distributions). The zone-to-zone distances logicallydepend on mode and not on user class. For easier notation in what follows let Smk denote the set of zone-pairssuch that the distance between them is in bin k, so Smk = {(i, j) | sijm ∈ sk}. Then the modelled trip lengthdistribution dmuk is given by:

dmuk = 2 ·∑

(i,j)∈Internal∩Smk

tijmu +∑

(i,j)∈Ingoing∩Smk

tijmu +∑

(i,j)∈Outgoing∩Smk

tijmu

Figure 10: Trips related to the study area: ingoing, outgoing and internal trips versus nonrelated external trips

The modelled modal splits can then be calculated from these trip length distributions by:

MSmu =

∑k dmuk∑m,k dmuk

(14)

Here we should mention that the trip numbers tijmu actually denote totals of some trip purpose i.e. theyare trip numbers aggregated over the sub purposes of that purpose.

28

Fitting on relative distributions

For the purpose of calibrating the βmu’s we will be interested in approaching the observed relative frequencytrip length distributions. The reason for this is that the total number of trips of the observed and modelledtrip length distributions are otherwise not necessarily consistent. As during the calibration the proportions ofinternal, in- and outgoing and external trips are dependent on the current choice of the calibration parametersand these proportions impact the total numbers of trips in the modelled trip length distributions. This normal-ization step makes it so we compare the shapes of the observed and modelled trip length distributions.

Feedback loop

It is perhaps interesting to mention now that usually there is an iterative feedback loop between the trip distri-bution model and the traffic assignment phase in the four-step modelling process. We can imagine costs cijm andtrip lengths sijm to be arising from the traffic assignment phase as congested traffic routes between zone-pairswould increase the travelling time and possibly eliminate routes. By iterating between the trip distributionand traffic assignment steps the impacts of congestion can be taken into account. This feedback mechanism ishowever not considered in the calibration as this would be too complicated and too computationally expensiveso the calibration is done under static costs and trip lengths.

3.5 Original calibration approach

We now describe the approach that has been used in practice until now at Goudappel Coffeng to automaticallycalibrate both the α’s and β’s parameters of the lognormal distribution function (12). Fransen[7] devised thisapproach illustrated in figure 11. It starts with initial parameters α0

mu’s and β0mu. The initial α0

mu’s arefirst adjusted in an initial change α’s step. Then the algorithm iterates with each iteration consisting of twosteps: the change β’s step and the change α’s step, respectively. The algorithm stops iterating if either a presetmaximum number of iterations is reached or the objective function is smaller than a predefined goal value. Nextwe describe these two steps.

Figure 11: Flowchart of the original automatic calibration approach, taken from Fransen[7]

29

3.5.1 Change α’s step

The goal of this step is to update the α’s such that the observed modal split is approximately realized. Fromthe gravity equations in (10) and (11) and the lognormal cost function (12) we see that the α’s have a linearinfluence on the number of trips for each mode. Therefore for each mode-user class pair (m,u) the current αmuis updated to the new α∗mu by multiplying by a scale factor:

α∗mu =MSmuMSmu

αmu

Where MSmu and MSmu are again the modelled and observed modal split for mode-user class pair (m,u),respectively.

Then a new modal split MS∗mu is computed after running the gravity model with the new α∗’s. This pro-

cess is iterated so that new α∗∗’s are determined from the older α∗’s and factors MSmuMS∗mu

until over all the

modes the average relative error between the observed and modelled modal split is smaller than a certain presetpercentage.

3.5.2 Change β’s step

A hillclimbing algorithm is used in the old approach to update the β’s. This original hillclimbing algorithm isdescribed in Fransen[7] and also in section 5.1. Each time a new set of β’s are determined the determinationof the α’s (or change α’s step) that follows has to be done multiple times until the resulting modal split issufficiently close to the observed modal split. This means that for a single Change β’s step multiple gravitymodel runs need to be done (hence the double arrow between the ’Determine α’ and ’Run Gravity Model’ blocksin figure 11).

3.6 New calibration approach: modal splits as gravity model constraints

3.6.1 Triproportional fitting procedure

For the following approach we discuss credit is due to Brethouwer[3] who suggested the idea to incorporate themodal split constraints directly into the gravity model when calibrating a lognormal deterrence function. Asan example the gravity model for the ’Home → Work’ direction in this new approach is of the form:

tijmu = OiuDjFmu(cijm), ∀i, j,m, u

s.t. ∑j,m

tijmu = Piu, ∀i, u

∑i,m,u

tijmu = Aj , ∀j

MSmu = MSmu, ∀m,u

The solution procedure to this gravity model is then to apply a triproportional fitting procedure whichincludes an extra balancing step for achieving the modal split constraints. This is described in algorithm 3.

30

Algorithm 3 Triproportional fitting procedure: balancing respectively towards target attraction, production(user class specific, for two user classes u ∈ {co, nco}) and modal split values.

1: for each zone i do2: Oiu ← Piu3: Di ← Ai4: end for5: for each mode-user class pair (m,u) do

6: αmu ← MSmu7: end for8: for each mode-user class pair (m,u) do9: for each od pair (i, j) do

10: tijmu ← αmu ·Oiu ·Dj · Fmu(cijm)11: end for12: end for13: while not converged do14: for each zone j do

15: if∑i,m,u tijmu > 0 then f ←

(Aj∑

i,m,u tijm

)else f ← 0

16: Dj ← f ·Dj

17: for each zone i do18: for each mode-user class pair (m,u) do19: tijmu ← f · tijmu20: end for21: end for22: end for23: for each user class u ∈ {co, nco} do24: for each zone i do

25: if∑j,m tijmu > 0 then f ←

(Piu∑

j,m tijmu

)else f ← 0

26: Oiu ← f ·Oiu27: for each zone j do28: for each mode m do29: tijmu ← f · tijmu30: end for31: end for32: end for33: end for34: for each mode-user class pair (m,u) do35: Calculate MSmu using equation (14).

36: if MSmu > 0 then f ←(MSmuMSmu

)else f ← 0

37: αmu ← f · αmu38: for each od pair (i, j) do39: tijmu = f · tijmu40: end for41: end for42: end while

31

The advantage of this new approach is that it requires only a method to calibrate the βmu’s parameter set,as the αmu’s are now a function of the βmu’s. In the original calibration approach αmu’s are arrived at onlyafter multiple change α’s steps in between the change β’s steps. The new calibration approach only requiresabout 20% of the running time of the old calibration approach and achieves the target modal splits exactly.

3.6.2 Discussion: Comparison with old approach

Convergence of the triproportional fitting procedure is like the biproportional fitting procedure not guaranteedbut in this research we observed convergence for each of the encountered problem instances. It was also observedthat the triproportional fitting procedure converges to the same trip distribution as the biproportional fittingprocedure i.e. algorithm 2 for which the lognormal parameters are initially set to the modal split balancingfactors obtained in the triproportional fitting procedure. This is also predicted by theorem 2.1. For example,in case of user specific production values, set M in the theorem according to:

M =∑m

Fm,co(Cm,co, αm,co)

Fm,nco(Cm,nco, αm,nco)

Assuming both procedures converged i.e. balancing factors Obi,Dbi and Otri,Dtri were found such that theconverged aggregated OD matrices Tbi = ObiMDbi and Ttri = OtriMDtri satisfy the productions and attrac-tions. Then by theorem 2.1 these aggregated OD matrices are equal i.e. Tbi = Ttri and the balancing factorsObi,Dbi and Otri,Dtri are equal up to constant scaling. Then clearly the disaggregate OD matrices per modeand user class must be the same too.

The importance of these distributions being equal is that the nature of the biproportional gravity model isstill intact with the extra balancing step. The modal split balancing step gives a trip distribution satisfyingmodal split while still maximizing entropy subject to the trip end constraints for the chosen βmu and (resulting)αmu. The αmu’s resulting from the triproportional procedure can therefore still be interpreted as behavioralparameters of the lognormal deterrence function and are still transferable between different models.

3.6.3 Modal split aggregateness

The new calibration approach introduces the modal split constraints directly into the gravity model of a subpurpose. However, in our trip distribution framework the modal split is looked at aggregated over all subpurposes within a purpose. This is why in the old calibration approach the αmu’s for the different sub purposeswithin a purpose are the same. Fortunately for the models we considered, the disaggregated observed modalsplits per sub purpose were close to the observed modal splits of the overarching purpose. Therefore we keptto the approach as described. But we believe that the triproportional approach could be adjusted to work for asingle aggregate modal split constraint per purpose by iterating between the following two steps: 1): Balancingthe OD matrices on trip end constraints (productions and attractions) for each sub purpose seperately and2): Jointly balancing (aggregated over all sub purposes) the OD matrices on the modal split constraint. Onedrawback to this is the larger memory requirement as the OD matrices of all sub purposes would need to be inmemory at the same time.

32

4 Mathematical problem formulation

In this section we give the mathematical problem formulation that we will focus on solving. Specifically weformulate the new triproportional calibration approach discussed in section 3.6 as a bilevel optimization prob-lem. Section 4.1 discusses some potential choices for the calibration objective function after which in section4.2 the bilevel problem is formulated. Section 4.3 gives the convergence criterion used for the gravity modelsand explains its importance. Finally section 4.4 gives a list of performance criteria to be taken in mind in thesearch for potential solution methods.

Some notation:

We first introduce some vector notation for some of the already encountered variables of the trip distributionmodel. Denote by β the full set of parameters βmu of the trip purpose being calibrated at hand. Rememberthat within a purpose the parameter set of each sub purpose is this set (or a proper subset of this set as logicallya user class specific sub purpose inherits only the parameters of its user class). Denote by O, D, α the full setof production, attraction and modal split balancing factors that relate to the purpose. So each of these sets arethe union of the subsets of balancing factors resulting from the IPF procedures for the different sub purposegravity models. Similarly here by T we denote a full purpose trip distribution i.e. the set of all (disaggregated)trip numbers tijmu related to the sub purposes of a specific purpose.

4.1 Calibration objective function

In section 3.4 we showed how to derive the modelled trip length distribution from a trip distribution. In thecalibration one wants to choose parameters βmu such that the modelled trip length distributions in some senseapproach the observed trip length distributions. An objective function that is useful to measure the similaritybetween two distributions is to take the sum of squared differences:

F (β) =∑m,u,k

(d relmuk − d relmuk)2 (15)

Here d relmuk and d relmuk represent respectively modelled and observed relative trip length distributions bothexpressed in percentages. They can be calculated by:

d relmuk = 100 ·

(dmuk∑k′ dmuk′

)(%)

d relmuk = 100 ·

(dmuk∑k′ dmuk′

)(%)

The relative numbers are preferred in (15) since the total number of modelled trips can vary during the

calibration as reasoned in section 3.4. The trip length distribution numbers dmuk and dmuk can either denoteabsolute trip numbers or these normalized by distance bin width as described in section 3.3.3. Note that theobjective function in (15) is written as a function of the parameter set β because the modelled trip lengthdistributions are a function of this parameter set. This fact will be made more clear in the next section.If one is more interested in mileage than trip numbers one can apply a weighting resulting in the followingobjective function:

F (β) =∑m,u,k

(mk(d relmuk − d relmuk))2

33

Here the bin midpoints mk serve as weights with the goal of better fitting on distance bins of higher length.A reason it can be desirable to better fit in this way on the total amount of mobility measured in mileage isdue to the trip assignment phase of the four step model. OD pairs further apart have longer routes in the roadnetwork which naturally then consist of more links compared to routes of shorter distanced OD pairs. Theresulting link flows of the trip assignment phase is compared based on a subset of links that have observedtraffic counts attached to them. Since the trip numbers of the higher distance bins impact more links, and solikely more links with counts attached, a better fit in the trip distribution phase likely translates to a bettergeneral fit in the trip assignment phase.

Another alternative objective function is:

F (β) =∑m,u

(1− r2mu)

with rmu being the Pearson correlation coefficient between the (absolute) observed and modelled trip lengthdistributions. This objective function was used for the original calibration approach before this research. Thecorrelation coefficient is invariant to scaling of the series and therefore automatically compares relative distribu-tions. However we have chosen to use the objective function of equation (15) as it is easier to find the derivativeof it as well as the measure itself is easier to explain.

4.2 Bilevel optimization problem

This subsection gives an mathematical optimization problem formulation for the problem of calibrating a tripdistribution model of a trip purpose. The optimization problem is of a special kind. Namely it is a bileveloptimization problem where the problem of maximizing entropy or solving the gravity model is embeddedwithin the calibration of the gravity model parameters β. We refer to solving the gravity model as the inneroptimization problem and the problem of selecting the optimal β during the calibration as the outer optimizationproblem.The bilevel optimization problem is then formulated as:

minβ<0,(O,D,α)≥0

F (β,O,D,α) s.t. : (16)

(O,D,α) ∈ arg max(O,D,α)≥0

{entropy(T ) | T = T (β, O, D, α) satisfies the relevant trip end and modal split constraints}

The solution method for the inner optimization problem is the triproportional fitting procedure of algorithm3. Note that the inner optimization problem actually consists of solving multiple gravity models for each subpurpose in series. Assuming convergence of the IPF procedure(s) we can regard the set of balancing factorsas a function of the choice of the β parameter set. To emphasize this we write O = O(β), D = D(β) andα = α(β). The objective value is calculated from the trip distribution which is a function of the balancingfactors and the parameter set. Therefore we can write F (β,O(β),D(β),α(β)) = F (β) i.e. the objectivefunction can essentially be regarded as a function of the parameter set β only. The flowchart in figure 12 helpsto illustrate this. However, due to the iterative nature of the fitting procedure it is especially difficult to givesome sort of closed-form expressions for F (β), O(β), D(β) or α(β). Note that the outer optimization problemof calibration is constrained as βmu < 0 is required.

34

IPF procedure Calculate objectivefunction

objective function value, F(β)

modelled trip length distributions,

dβ

observed trip length distributions,

d balancing factors O,D,α

Trip-end constraints sufficiently satisfied ?

Figure 12: Flowchart of the bilevel optimization problem

4.3 Gravity model convergence criterion

Assuming the IPF-procedures converges, its solution is still inexact, meaning generally there will be some residualin the trip end constraints after any finite number of IPF iterations. Therefore a gravity model convergencecriterion (the diamond shaped box in figure 12) needs to be established. The choice of a threshold used in theconvergence criterion will have an impact on the bilevel optimization problem as these determine the number ofIPF iterations required which naturally impacts the balancing factors and objective function. It does not makemuch sense to base the convergence criterion on absolute residual errors as different strategic traffic models giverise to different orders of absolute trip numbers. Therefore the convergence criterion we will be using is basedon relative residuals. For example in the case of the gravity model for the ’Home → Work’ sub purpose themaximum relative residuals of production and attraction constraints respectively Rrelprod and Rrelattr as defined by:

Rrelprod = maxi,u

(∑j,m tijmu − Piu)

Piu(17)

Rrelattr = maxj

(∑i,m,u tijmu −Aj)

Aj

The convergence criterion is then to enforce that both of these maximum relative residuals should be smallerthan a preset percentage. Note that the modal split constraints do not need to be included in the convergencecriterion as they are satisfied exactly as the IPF procedure terminates on a modal split balancing step, seealgorithm 3. Indeed a different order could be chosen putting e.g. the production and attraction balancingsteps last which would imply the production and attraction constraints are deemed more important than themodal split constraints. However it was observed in this research that the order favoring modal split constraintswas only behind at most 1-2 iterations in terms of convergence in Rrelprod and Rrelattr, suggesting the order wechoose only matters slightly.

4.4 Solution method performance criteria

For a given optimization problem it is often the case that there are numerous potential approaches and methodstowards solving it. Therefore it is useful to identify and formulate performance criteria by which the larger setof options available can be narrowed down and the selected few methods can be tested on. Below we explainthe five performance criteria that are deemed most important.

35

i): Reliability in finding a quality solution:

By a quality solution we mean first and foremost its objective function value. Ideally the method shouldconverge to a global minimum. However the task of reliably finding a global minimum to an optimizationproblem can prove very difficult. The optimization problem formulated in this chapter is generally not convex.Therefore it can be the case that the solution space has multiple local minima of which some of them can besuboptimal. So given that a method finds a local minimum it is not necessarily a globally optimal solution.Ensuring the solution found cannot be further improved at least locally i.e. finding a local minimum seemslike a reasonable aim given the difficulty of global optimization. Despite this difficulty we will at least paysome attention to exploring global optimization techniques in the next chapter. It can also be meaningful tojudge the quality of a solution in terms other than its objective function value. Two distinct methods couldfind solutions with a seemingly significant difference in objective values. Plots of the solutions modelled triplength distribution versus the observed trip length distributions with confidence bounds as defined in section3.3.2 could reveal the visual difference is actually meaningless to an end user (consultant) at Goudappel Coffeng.

ii): Number of solution method parameters:

The number of solution method parameters should be minimal. A minimal number of solution method pa-rameters makes use of the method by the end user easier as it requires less knowledge on how to pick the rightvalues. Here by solution method parameters we do not mean empirical parameters that could perhaps still be es-timated from data, but heuristic parameters. The act of choosing the right values for these heuristic parametersotherwise can itself become a tedious and demanding calibration process. This calibration of solution parametersmay be required as solution method parameters might not be transferable between different projects. Since aright choice of parameters might for example depend on the strategic traffic model or the objective function used.

iii): Robustness under initial solution guess:

One solution method parameter that most solution methods for optimization problems have in common isthat of an initial solution guess. It is desirable to have a solution method that finds a quality solution (in termsof (i)) independent of the initial solution guess.

iv): Running time:

The time it takes for the method to converge to its final solution. With the hillclimbing method of the oldcalibration approach the largest scale strategic traffic model calibrated within Goudappel Coffeng takes aroundthree days to calibrate all purposes. As the new calibration that we will be using significantly reduces therequired number of IPF iterations it is reasonable to require the new solution method to stay within this timewindow for this reference model. The running time is, except from staying within this reasonable time limit,deemed less important than criteria (i) to (iii).

v): Difficulty of implementation:

Here important aspects are the work it takes to implement the method in general but also specifically withinOmnitrans, as well as whether an implementation of the method would be flexible to changes in the calibrationenvironment such as changes in the trip distribution model (e.g. additional balancing steps) or the objectivefunction.

36

5 Potential solution methods

This chapter describes potential candidate methods for solving the optimization problem of (16). First in section5.1 the original hillclimbing algorithm is described which was used in the old calibration approach. This is seenas the reference algorithm to which selected potential solution methods will be compared and potentially testedagainst on the performance criteria of section 4.4. Section 5.2 describes the general gradient based update ruleand two methods to approximate the gradient used in this rule. Section 5.3 describes an alternative gradientbased method, the so called SPSA method, which approximates the gradient stochastically. Section 5.4 discussestwo potential techniques for global optimization. Finally section 5.5 compares the discussed potential solutionmethods after which an alternative candidate method to hillclimbing is selected to be further explored andtested in the next chapters.

5.1 Hillclimbing

The hillclimbing algorithm we now describe is largely the same as the hillclimbing algorithm used for the so-lution method of the original calibration approach by Fransen[7]. The variant we describe here is adjusted forthe new calibration approach in which the α set are balancing factors. It is a trimmed down version in whichonly the β parameter set is updated. The rule used has not changed but we describe it shortly for the sake ofcompletion and since it is one of the solution methods that was implemented and tested.

The hillclimbing algorithm iteratively updates the β parameter set starting from an initial set β0. In whatfollows n denotes the current iteration number. The algorithm stops doing iterations when either a predefinedmaximum number of iterations is reached or if either the current objective value F (βn) or change in objectivevalue F (βn)−F (βn−1) is smaller than a preset threshold value. For each iteration n the algorithm keeps trackof the individual objective values fmu(βn) per mode-user class pair (m,u).

Updating the β set

In addition to tracking individual objective values, each mode-user class pair (m,u) has a variable signmuwhich always equal either 1 or -1, each of which are initially set at either of these values. The signmu variabledetermines whether the next update increases or decreases βmu. The algorithm also keeps track of a step sizevariables stepmu which then determine the absolute value by which βmu is changed in the next update, initiallypreset to some maximum step size. The final parameter that requires a value is the step size reduction factorγ > 1. Now the hillclimbing algorithm we used can be described by algorithm 4.

37

Algorithm 4 Adjusted hillclimbing algorithm

1: for each mode-user class pair (m,u) do2: signmu ← initialsignmu3: stepmu ← maxstep4: end for5: Run gravity model and calculate individual objective values fmu(β0)6: for each mode-user class pair (m,u) do7: β1

mu ← P<0(β0mu + signmu · stepmu)

8: end for9: Run gravity model and calculate individual objective values fmu(β1)

10: n← 111: while n ≤ maxiterations & F(βn) ≥ thresholdobj & F(βn)− F (βn−1) ≥ thresholdchange do12: for each mode-user class pair (m,u) do13: if fmu(βn) ≥ fmu(βn−1) then14: signmu ← −signmu15: stepmu ← 1

γ · stepmu16: end if17: βn+1

mu ← P<0(βnmu + signmu · stepmu)18: end for19: Run gravity model and calculate individual objective values fmu(βn+1)20: n← n+ 121: end while

In algorithm 4 the function P<0 projects for a solution βk each nonnegative βkm into a feasible negativecoordinate by:

(P<0(β))i =

{βi, βi ≤ βmaxβmax, βi > βmax

Here βmax should be chosen as a negative number close but not equal to zero, as zero itself is not allowed.

The intuition behind hillclimbing is that it checks each iteration whether the last movement of βmu (increas-ing or decreasing) has been profitable in terms of its individual objective value fmu(βn). Each time the lastmovement of a parameter was not profitable it reverses its direction and it reduces its step size geometricallyby applying 1

γ as a reduction factor.

One special case in the hillclimbing algorithm which algorithm 4 does not include, is the following: If theinitial movement for a parameter βmu was not profitable it is then set to the value it would be at now given theinitial sign was reversed. In addition its sign is negated while the step size is not changed from the maximuminitial step size. The reason for this step is simply to quickly correct a perhaps badly chosen initial sign value.

Before this research a heuristic rule of setting the initial signmu values was used by basing them on themodelled trip length distributions of the initial β0 guess. The observed and modelled proportion of short tripswere compared. Here short bins are defined as the trips of the first nshort bins. This rule was omitted inthis research as it did not seem to make a difference in earlier application and it requires a choice for nshort,another method parameter, which is not entirely obvious. Also it seems like the special case rule mentionedabove should correct already for a wrong starting course. This insensitivity to the initial sign values was alsoobserved in earlier application. The initial step size maxstep was observed to be a more sensitive parameter.

The step size reduction factor γ was set to the golden ratio 1+√5

2 in this thesis, as was the case before this

38

research. This seems like a reasonable choice, as it is often found in optimization such as in golden-section search.

An advantage of the hillclimbing method is that the computational time it takes to do one hillclimbing it-eration is practically speaking equal to the time it takes to evaluate one objective function. So in a sense thetime hillclimbing spends outside of exploring the solution space is negligible (even for a model of small sizethe time spent on running the IPF procedures is much larger than the time spent on the remaining lines inalgorithm 4).

A disadvantage of the method is that as there is no proof of convergence of the iterates to a local minimum.In fact in practice it was frequently observed for the old calibration approach that the objective value increasessometimes multiple iterations before improving. The method seems heuristic in nature. It changes the βmu bylooking at changes in individual objective values fmu(β) as if only the change in an individual parameter βmueffects its individual objective value. So hillclimbing does not take into account the cross correlation that mightexist between a change in parameters and the resulting change in total objective value.

5.2 Gradient based methods

The steepest descent method aims to find a local minimum by updating the parameters in the direction ofsteepest descent, which is the negative of the gradient of the objective function F (β) : Rn → R. The gradientof F with respect to β is a vector containing the partial derivatives of F with respect to the βmu’s:

∇F (β) =

(∂F (β)

∂βmu, . . . , ∀m,u

)TThe gradient descent method then is characterized by the following update rule:

βk+1 = βk − ak∇F (βk)

Where ak > 0 is the step size taken in negative gradient direction for which a suitable value has to be chosen.A step size that is too small would mean a slow convergence towards the local minimum while a constant stepsize that is too large can potentially lead to constantly overshooting the local optimum. The ak does not haveto constant but could for example become smaller over the iterations e.g.:

a = ak =a0k

Another possibility to determine the step size is to use a line search algorithm. A line search algorithmtries for a calculated gradient different solutions by taking different step sizes along the negative gradient searchdirection until sufficient improvement is found. However depending on the type of line search this can makea descent iteration a lot more costly in terms of computational time. Next we describe two approaches tocalculating the gradient ∇F (β). We now discuss two seemingly suitable methods to calculate an approximationto the gradient. We refer to Martins[10] for a more through discussion on these and other methods to calculategradients.

5.2.1 Approximating the gradient: finite differences

Here for a more convenient notation we represent the parameter set now as a vector β ∈ Rn<0 with its componentsassumed to be n in number individual parameters denoting βi, i = 1, . . . , n. The method of finite differences isa popular and easily implemented method for approximating the gradient of a function. One option is to usethe so called central difference approximation:

dF (β)

dβi≈ F (β + ∆ei)− F (β −∆ei)

2∆, ∀i = 1, . . . , n (18)

39

For ∆ > 0 sufficiently small. Here ei denotes the ith unit vector in Rn. Using forward differences givesa cheaper (n + 1 objective function evaluations compared to 2n) but less precise method to approximate thegradient:

dF (β)

dβi≈ F (β + ∆ei)− F (β)

∆, ∀i = 1, . . . , n (19)

For a sufficiently smooth function the forward difference approximation error is O(∆). The central dif-ference is more precise with an approximation error of O(∆2). Both central and forward methods require asuitable finite difference value ∆ to be chosen. The difference interval must be chosen small enough to give agood approximation to the derivative, however on the other hand a value that is too small would give a badapproximation due to rounding errors (as a computer has finite numerical precision).

5.2.2 Calculating the gradient analytically

The following technique of calculating an analytical gradient of the objective function was introduced to theauthor by his supervisor Dickinson[6]. For the sake of clarity we describe the technique here for a trip distributionmodel consisting of a single gravity model (sub purpose) with no distinction in user classes.As was already shown in section 4.2, we can regard the balancing factors as a function of β i.e. we can writeO(β), D(β) and α(β). We too saw that the objective function can be regarded as a function of β and thebalancing factors:

F (β,O(β),D(β),α(β))

By applying the multivariable chain rule, the total derivative of the objective function F with respect to βmcan be expressed as:

dF

dβm=

∂F

∂βm+∑i

∂F

∂Oi

dOidβm

+∑j

∂F

∂Dj

dDj

dβm+∑m

∂F

∂αm

dαmdβm

(20)

Here and in what follows the partial derivatives should be interpreted as being evaluated in the balancingfactors and parameter set corresponding to the current iteration. In (20) the partial derivatives of the objectivefunction F with respect to βm i.e. ∂F

∂βmas well as the partial derivatives of the objective function F with

respect to the balancing factors ∂F∂Oi

, ∂F∂Dj

and ∂F∂αm

can be computed directly. In what follows we show how the

remaining factors in (20) which are the derivatives of the balancing factors with respect to βmu can be foundby solving a linear system.

First we define residual functions for each of the trip-end constraints gprodi , gattrj and gMSm for respectively

the production, attraction and modal split constraints by:

gprodi =∑j,m

tijm − Pi ∀i

gattrj =∑i,m

tijm −Aj ∀j

gMSm = MSm − MSm ∀m

Note that these residual functions are similar to the objective function F functions of the calibration pa-rameters as well as the balancing factors. Given the IPF procedure has converged sufficiently the residuals areapproximately zero, i.e. the following system of equations holds approximately:

40

gprodi (β,O(β),D(β),α(β)) = 0, ∀i

gattrj (β,O(β),D(β),α(β)) = 0, ∀j

gMSm (β,O(β),D(β),α(β)) = 0, ∀m

Differentiating each of these equations with respect to βm by again using the multivariable chain rule weobtain:

dgprodi

dβm=∂gprodi

∂βm+∑i

∂gprodi

∂Oi

dOidβm

+∑j

∂gprodi

∂Dj

dDj

dβm+∑m

∂gprodi

∂αm

dαmdβm

= 0, ∀i (21)

dgattrj

dβm=∂gattrj

∂βm+∑i

∂gattrj

∂Oi

dOidβm

+∑j

∂gattrj

∂Dj

dDj

dβm+∑m

∂gattrj

∂αm

dαmdβm

= 0, ∀j

dgMSm

dβm=∂gMS

m

∂βm+∑i

∂gMSm

∂Oi

dOidβm

+∑j

∂gMSm

∂Dj

dDj

dβm+∑m

∂gMSm

∂αm

dαmdβm

= 0, ∀m

Defining vectors x and b by:

x =

(dOidβm

, . . . ,dDj

dβm, . . . ,

dαmdβm

, . . .

)T

b = −

(∂gprodi

∂βm, . . . ,

∂gattrj

∂βm, . . . ,

∂gMSm

∂βm, . . .

)TThen the equations in (21) are equivalent to the linear system Ax = b. The matrix A then is the Jacobian

of the vector valued function g with respect to the balancing factors (O,D,α). Note that since the numberof trip-end constraints is equal to the number of balancing factors, A is a square matrix. The matrix A andvector b can be computed directly. The solution to the linear system x gives us the desired derivatives of thebalancing factors with respect to βm which can then be plugged into (20) to compute the total derivative dF

dβm.

By repeating this technique for each parameter βm the objective function gradient ∇F (β) can be calculated.However like the finite difference method this analytical approach also at best approximates the actual gradient.This is due to the fact that practically speaking the IPF procedure has inexact convergence and so the residualfunctions only approximate zero. Therefore it seems helpful to use the gradients calculated by the finite differ-ence methods as a reference when testing this analytical gradient method. The running time of these gradientcalculation methods is also an interesting aspect to be investigated. Should both methods give similar gradientsthe method that calculates it in a shorter time period will be preferred. The ranking of the methods in terms ofrunning time might vary depending on the size of the trip distribution model (in terms of the number of zones).

5.2.3 Adjoint gradient method

This section discusses the adjoint gradient method which is a more efficient variant of the analytical gradientmethod of section 5.2.2. Again for sake of clarity we assume the trip distribution model consists of a single subpurpose and we omit user classes. Now denote by c the vector of partials of the objective function with respectto the balancing factors i.e.:

41

cT =

(∂F

∂Oi, . . . ,

∂F

∂Dj, . . . ,

∂F

∂αm, . . .

)TAlso let d(m) = ∂F

∂βmand let A, b(m) and x be defined again as in section 5.2.2. Then the method described

there for calculating a parameter sensitivity dFdβm

can be summarized by:

dF

dβm= d(m) + cTx(m)

Where each x(m) solves the linear system:

Ax(m) = b(m)

Or put in one equation:

dF

dβm= d(m) + cTA−1b(m) (22)

In particular calculating the full gradient by this method requires solving the system Ax(m) = b(m) foreach mode m. In our case it is more efficient to use the so called adjoint method to calculate the gradient.The adjoint gradient method is also described in Martins[10]. Instead of solving multiple linear systems it isalso sufficient for calculating the gradient to solve just one linear system: letting y = cTA−1 equation (22) isequivalent to:

dF

dβm= d(m) + yT b(m) (23)

Then to compute y requires solving the linear system with adjoint matrix AT and cT as a right-hand sidevector, so y solves the linear system:

ATyT = cT

5.3 Simultaneous perturbation stochastic approximation (SPSA)

Here we give a short overview of the method of Simultaneous perturbation stochastic approximation, abbreviatedSPSA, in which the descent vectors stochastically approximate the gradient. For a more in depth presentationof the method we refer to Spall[13].

The update rule of SPSA is similar to the gradient descent method:

βk+1 = βk − ak∇F (βk)

The vector ∇F (β) approximates the gradient and is obtained from the average of a set of m vectors gl(β,∆l):

∇F (β) =

∑ml=1 g

l(β,∆l)

m(24)

These vectors gl(β,∆l), l = 1, . . . ,m are calculated component wise by:

gli(β,∆l) =

F (β + ck∆l)− F (β − ck∆l)

2ck∆li

, i = 1, . . . , n (25)

42

For each l, ∆l is a so called perturbation vector each consisting of components ∆li that are drawn from a

Bernoulli ± distribution:

∆li =

{1, with probability 1

2

−1, with probability 12

∀i = 1, . . . , n

Each ∆l perturbs β once along its direction and once along its opposite direction in (25). The magnitudeof each perturbation is ck in iteration k. The magnitudes are of the form:

ck =C

kγ

The step sizes ak are of the form:

ak =a

(A+ k)α

Note that so far we require a specification of the all positive constants C, γ, a, A, α and m mentioned sofar. The following conditions are necessary by Spall:

α− 2γ > 3γ − α

2≥ 0

The advantage that comes with the SPSA method is that it needs less function evaluations compared tothe finite difference methods to get an approximation to the gradient that is at least as good. To clarify this:note that for m = 1 each iteration we would according to (24) need only two function evaluations compared to2n+1 and n+1 respectively for the central and forward finite difference methods equations (18) and (19). ThatSPSA gives at least as good of an approximation to the gradient as using central difference approximations weget by the following lemma with proof given by Spall[13]:

For F three times continuously differentiable in some neighborhood of β, we have for the expectation of gl(β,∆l)for c → 0 :

E[gl(β,∆l)−∇F (β)] = O(c2)

5.4 Global optimization techniques

The solution methods described so far are at best local minimizers. In this subsection we shortly describe twooptimization techniques that can be used to approximate the global optimum. The first one is built from agiven local search method. The second technique is called simulated annealing.

5.4.1 Multi-start methods

Given an algorithm that can find a local minimum one can generate different starting parameter vectors βj0, oneach of which a local search is performed to obtain different local minima. If we indefinitely do local searchesfrom new starting β0, we approximate the global minimum. Techniques of this kind are called multi-startmethods. Palomares et. al[8] mentions multi-start methods one of which partitions the search space into boxeswithin each a initial guess is randomly selected and from which then local search is started.

5.4.2 Simulated annealing

Simulated annealing is a probabilistic technique for approximating the global minimum of a function. Themethod of simulated annealing was first introduced in combinatorial optimization but algorithms have beenderived from simulated annealing for optimizing functions of continuous variables. For an example of such an

43

algorithm and a more comprehensive discussion on simulated annealing in the context of continuous optimiza-tion we refer to Corana et al[4]. Here we briefly discuss the essential idea in simulated annealing which isthat simulated annealing will accept a solution with a worse objective value with a certain positive probabilityallowing for a more explorative search for the global optimum in the solution space.

In simulated annealing a new trial solution β′

is constructed by perturbing the current solution βk at iter-ation k in some random fashion. If the trial is an improvement i.e. F (β

′) < F (βk) we will immediately accept

it as our new solution: βk+1 = β′. In case the trial is worse we will accept it with a probability P which

decreases as the worsening gets larger or the temperature at iteration k denoted T k decreases:

P (βk,β′, T k) = e

−(F (β′)−F (βk))

Tk

The algorithm starts at some high temperature T0 and during the algorithm it is cooled down so that overtime the probability to accept a worsening solution decreases.

5.5 Comparison of potential solution methods

We decided to select both the finite differences and analytical gradient descent methods from the potentialsolution methods described so far as a method to be actually implemented and tested against the hillclimbingmethod which serves as a reference. In terms of the performance criteria gradient descent based methods seem tocomplement hillclimbing: gradient descent seems slower computationally but trading this for being potentiallymore reliable in improving per iteration (reliability of finding a quality solution) compared to hillclimbing.

The main reason Stochastic simultaneous perturbation approximation (SPSA) and Simulated annealing arenot further investigated is because they have a lot of solution method parameters and seem much more compli-cated and heuristic compared to gradient descent. Another reason we did not choose SPSA is that we do nothave a lot parameters to calibrate (only 6) and SPSA usually is applied in case one wants to calibrate manyparameters as then SPSA becomes especially advantageous relative to a finite differences approach. Overallgradient descent seems to be a more logical choice to go with as a first alternative method to hillclimbing.

Table 2 summarizes the preliminary assessment of the proposed solution methods in terms of the performancecriteria established in section 4.4. Note that columns are missing for both the Reliability of finding a qualitysolution and Robustness under initial solution guess criteria because it is difficult to assess the methods on theseat this stage. The selected methods, so both hillclimbing and gradient descent, will need to be implementedand tested on actual models before coming back to these criteria. The Minimizer type column in table 2 didnot appear as a criterium in section 4.4, but ties in with the Reliability of finding a quality solution criteriumsince it aims to classify what type of solution each algorithm at best finds.

44

Per

form

ance

crit

eria

Sol

uti

onm

ethod

Min

imiz

erty

pe

#so

luti

on

met

hod

Runn

ing

tim

e(e

xp

ress

edD

ifficu

lty

of

par

am

eter

sin

#of

IPF

runs

per

iter

ati

on)

imple

men

tati

on

Hillc

lim

bin

g:

Loca

lm

inim

izer

Moder

ate

1E

asy

grad

ient

des

cent

(finit

ediff

eren

ces)

:L

oca

lm

inim

izer

Few

2n+

1(c

entr

al)

orn

+1

(forw

ard

)E

asy

grad

ient

des

cent

(an

alyti

cal)

:L

oca

lm

inim

izer

Few

dep

enden

ton

#of

zones

Hard

SP

SA

:Sto

chas

tic

loca

lm

inim

izer

Many

2*(

#of

per

turb

ati

ons

(m))

Easy

Sim

ula

ted

ann

ealing:

Sto

chas

tic

glo

bal

min

imiz

erM

any

1E

asy

Ta

ble

2:

Pre

lim

ina

rya

sses

smen

to

fth

epo

ten

tia

lso

luti

on

met

hod

so

nso

me

of

the

perf

orm

an

cecr

iter

iao

fse

ctio

n4

.4

45

6 BFGS method

This chapter describes a BFGS method that we implemented and tested for the calibration problem. Section6.1 gives a short discussion of line search methods. Then in section 6.2 the particular Quasi Newton methodthat was implemented for the calibration problem, which is a BFGS method, is described.

6.1 Line search approach

In optimization, line search is a basic iterative approaches to find a local minimum of an objective function.The line search approach first finds a descent direction along which the objective function will be reduced andthen computes a step size determining how far the solution should move along that direction to find the nextiterate. The descent direction can be computed by various methods, such as the method of steepest descent,Newton’s method and the Quasi-Newton method. In section 5.2 we mentioned the steepest descent methodwhich uses only gradients to update the solution. Newton’s method also uses second order derivatives of theobjective function in the form of the Hessian matrix H. The Hessian matrix Hk for the kth iterate βk is defined

in terms of the second order derivatives by Hij = ∂2F (βk)∂βi∂βj

. The update rule for Newton’s method then is given

by:βk+1 = βk − stepk · [Hk]−1∇F (βk) (26)

Newton’s method usually leads to a much faster convergence to a local minimum (quadratic instead of alinear rate of convergence) but requires calculation of the Hessian and solving a linear system which can beexpensive. In our case calculating the Hessian by a finite difference approach would likely require too manyfunction evaluations, however solving the linear system part would be effortless due to low dimensionality.

Quasi-Newton methods use the same update rule as (26). However instead of calculating the actual Hessian thematrix H in the update rule is an approximation to the Hessian that is updated from the sampled gradientsand function values throughout the iterations. The Quasi-Newton methods are in between steepest descent andNewton’s method in terms of convergence rate with a superlinear convergence rate (in case of convergence).The extra time required for Quasi-Newton compared to steepest descent is mainly in solving a linear systemin (26) and is negligible again due to low dimensionality of the calibration problem. The promise of betterconvergence properties with no added running time increase motivated us to also implement a Quasi-Newtonline method. Unlike in the case of convex problems, proofs of convergence for Quasi-Newton methods are notknown for general nonlinear problems. However Quasi-Newton methods are still a popular choice for nonlinearoptimization problems.

46

6.2 BFGS method implementation details

Algorithm 5 Projected bound-constrained BFGS with damped update.

Solution method parameters:Relative convergence treshold: ε ∈ (0, 1),Step size reduction factor: µ ∈ (0, 1),Sufficient decrease parameter: c1 ∈ (0, 1),Maximum number of iterations: maxit ∈ N,Maximum value of β: βmax < 0

1: F 0 ← F (β0)2: g0 ← ∇F (β0)3: ε0 ← ε · ‖g0‖24: H0 ← ‖g0‖2 · I5: Converged ← FALSE6: k ← 0

7: while Converged = FALSE do8: Sk ←H−1k9: Calculate the first index set Ik1 by 29

10: Calculate Sk by 3011: Calculate the second index set Ik2 by 3112: Ik = Ik1 ∪ Ik213: Calculate Sk by 32

14: pk ← −Sk · gk15: step← 116: βk+1 ← P<0(βk + step · pk)17: F k+1 ← F (βk+1)18: while F k+1 > F k − c1 · step · pTk gk do19: step← µ · step20: βk+1 ← P<0(βk + step · pk)21: F k+1 ← F (βk+1)22: end while23: gk+1 ← ∇F (βk+1)24: s← βk+1 − βk25: y ← gk+1 − gk26: if sTy ≥ 0.2 · sTHks then θ ← 1 else θ ← 0.8 · sTHks

sTHks−sTy27: y ← θ · y + (1− θ) ·Hks

28: Hk+1 ←Hk + yyT

yT s− HkssTHk

sTHks

29: if ‖gk+1‖2 < ε0 OR k > maxit then (with gk+1 defined by (33))30: Converged ← TRUE31: end if32: k ← k + 133: end while

The method described in algorithm 5 for the most part is the basic BFGS (named after its inventors Broyden,Fletcher, Goldfarb and Shanno) Quasi-Newton method which is described in the classic optimization book of

47

Nocedal and Wright[16]. It is characterized by the Hessian approximation update rule given by:

Hk+1 ←Hk +yyT

yTs− H

kssTHk

sTHks(27)

Other Quasi-Newton methods exist and are characterized by different update rules. We have chosen theBFGS update rule as it is the most popular one and is believed to be the most efficient among the existingQuasi-Newton methods.

6.2.1 Initial Hessian approximation choice

All Quasi-Newton methods require an initial Hessian approximation H0. The most simple choice is the identitymatrix I which meets the only requirement of being positive definite. In this case the first iteration of BFGSwould be a steepest descent iteration, as the search direction p0 would equal the negative initial gradient −g0,by line 8 of algorithm 5. Nocedal and Wright[16] presents two other suggestions that scale the identity matrix

by a positive constant. The first option scales the identity matrix by ‖g0‖2σ . Here σ is a solution parameter and

has the effect of scaling the norm of the initial step to σ. The other option uses the scaling factor yT syTy

after thefirst step is computed but before the BFGS update. This option has, compared with the first option, a moresound mathematical motivation as it tries to make the size of H0 equal to the true Hessian in a sense explainedby Nocedal and Wright. However we tested both scaling options for small synthetic models and the first optionwith σ set to 1 was observed to be slightly favourable in terms of the time till convergence, and was thereforeused in the final implementation.

6.2.2 Choice of step size rule

Given we have computed a search direction pk an appropriate step size to take along this take direction has tobe computed. The Armijo step size rule makes sure the Armijo condition is satisfied. The Armijo condition isalso called the sufficient decrease condition and requires:

F k+1 ≤ F k + c1 · step · pTk gk (28)

Here c1 ∈ (0, 1) is a solution parameter. The Armijo rule is also called backtracking line search. From aninitial step size it iteratively reduces the step size by a reduction factor µ ∈ (0, 1) until the Armijo condition isenforced. A search direction pk is defined to be a descent direction if there exists a step size λ > 0 such thatthe current solution is improved with this step size along the direction i.e. F (βk + λ · pk) < F (βk). Given thatthe search direction pk is a descent direction clearly the backtracking procedure will terminate at some step sizesatisfying the sufficient decrease condition as the step size step in (28) will decrease and make the condition easierto be satisfied. For pk in the BFGS method to be a descent direction requires that the Hessian approximationHk remains positive definite i.e. the update formula (27) ensures positive definiteness. In case the so calledcurvature condition holds, which is sTy > 0, positive definiteness is implied. This curvature condition holds fora strictly convex objective function automatically, however our problem is nonconvex. Therefore the curvaturecondition needs to be enforced by the step size algorithm. A popular choice is to enforce the so called Wolfeconditions which consists of both the sufficient decrease condition in (28) and the following curvature condition,which implies that sTy > 0:

∇F (βk + stepk · pk)Tpk ≥ c2 · gTk pkWhere c2 satisfying 0 < c1 < c2 < 1 is a solution parameter. A potential disadvantage of a line search

enforcing this condition is that multiple gradients ∇F (βk + stepk · pk) might be required to be calculated forthe different step size candidates during one iteration. Because the Armijo rule with its 1 gradient calculationper line search seemed cheap computationally we implemented the Armijo step size rule. Another reason is

48

that the Armijo step size rule was also used in Kim et al.[9] which we used as a reference for dealing with thenegativity constraints. However a drawback is that the Armijo rule does not enforce the curvature condition soadditional measures have to be taken.

Damped Hessian update rule

The first option discussed by Nocedal and Wright[16] to prevent the Hessian approximation from becomingnon positive definite is to skip the update in case sTy ≤ 0 is observed i.e. set Hk+1 = Hk. However skippinghas the risk of not including important curvature information. Therefore they suggest a more effective alter-native which is to use a damped update. The damped update is done via the normal BFGS formula (27) butensures positive definiteness of Hk+1 by modifying the definition of y in case sTy is not sufficiently large. Theexact implementation of the damping rule is described in lines 26 and 27 of algorithm 5.

6.2.3 Negativity constraints

The calibration problem we are trying to solve is a nonlinear bound-constrained optimization problem. Thebound-constraints here are the negativity constraints on the β’s. To deal with the bound-constraints we haveused the projected quasi-Newton approach discussed in Kim et al.[9]. We now describe the implementationdetails related to the bound-constraints of algorithm 5.

Before each line search, the variables are partitioned into fixed and free variables. These fixed variables areleft unchanged in the optimization, while only the free ones are updated and are considered in the (reduced)Hessian approximation for scaling the gradient. The fixed variables are a subset of the active variables whichare variables at which the corresponding bound-constraint hold with equality, in our case: the variable βki isactive at iteration k if βki = βmax. For an active variable to be fixed requires either the corresponding gradiententry to be negative or a scaled gradient based on second order information to be negative. In particular, wecompute two sets Ik1 and Ik2 at iteration k. The first index set, also called the binding set, is calculated as:

Ik1 = {i | βki = βmax and gki < 0} (29)

Then the matrix Sk is obtained from Sk (which is the inverse of the current iterations Hessian approximationi.e. Sk = H−1k , see line 8) by vanishing all rows and columns corresponding with the first index set or bindingset Ik1 :

Skij =

{Skij , i, j /∈ Ik10, else

(30)

The second index set Ik2 consists of variables for which the corresponding entry of the gradient scaled by Sk

is negative:

Ik2 = {i | βki = βmax and (Skgk)i < 0} (31)

Then the matrix Sk is obtained from Sk by vanishing all rows and columns of indices in the fixed setIk = Ik1 ∪ Ik2 :

Skij =

{Skij , i, j /∈ Ik

0, else(32)

The search direction is obtained by scaling the (negative) gradient by this matrix Sk in line 14 of algorithm 5.

49

Optimization convergence criterion

The final detail in which the projected BFGS method is different from the basic non projected frameworkis in its convergence criterion. The convergence criterion considers a reduced gradient gk in line 29 of algorithm5 so that only the free parameters are counted in calculating the gradients norm:

gki =

{gki , i /∈ Ik

0, else(33)

The convergence criterion we used considers the relative improvement from the initial gradient norm. Thethreshold ε ∈ (0, 1) parameter sets the fraction of the initial gradient norm below which the gradient shouldvanish. The relative criterion gives an uniform threshold for different choices of weighting in the objectivefunction.

50

7 Results

In this chapter we describe the results of the tests that were performed. The trip distribution models as describedin chapter 2 and 3, the calibration methods of hillclimbing, steepest descent described in chapter 5 and theBFGS method described in chapter 6, including the gradient calculation routines of finite differences and theadjoint method were all implemented in MATLAB. Test results pertaining to the calculation of the gradientare discussed in section 7.1. In section 7.2 results of the calibration using the different methods on a mediumscale and large scale strategic traffic model are presented.

7.1 Calculating the gradient

We calculated gradients using both the finite difference approach of section 5.2.1 and the adjoint analyticalapproach of section 5.2.3. It is interesting to consider how accurate both methods are and whether theycompute approximately the same gradient. Also it is interesting to look at the methods’ running times. Weexplore these aspects in section 7.1.1 which is on the accuracy of the methods and section 7.1.2 which on thecomputational effort of the methods.

7.1.1 Accuracy

We compared the accuracy of the finite difference approach and analytical approach to calculating gradientson the BBMA Work trip distribution model. The number of iterations that is done inside the gravity modelinfluences the gradient that is computed for both approaches. For the finite differences approach choosing alower number of iterations will alter the trip distribution obtained and so the objective values computed. Thisimpacts the gradient calculated. For the analytical approach the trip-end residuals are assumed to be zero (toobtain the linear system in (21)). This brings about an inaccuracy in the calculation of the gradient too. Itseems reasonable that given the number of iterations is high enough both gradient approaches should convergeto the same gradient vector i.e. the relative norm between them should become small. The difference betweenforward and central differences was observed to be negligible. Therefore we chose to use the forward differencesapproach requiring n+ 1 objective function evaluations instead of central difference’s 2n. For the perturbationsize we set ∆ = 10−8 in 19. The results are shown in Figure 13. Table 3 shows the data points for the lowernumber of iterations of figure 13. After 25 iterations the relative error between the two gradients is 1%.

51

0 100 200 300 400 500 600 700 800 900 1000# of iterations inside gravity model

10-7

10-6

10-5

10-4

10-3

10-2

10-1

100

Figure 13

# of iterations gravity model: 5 10 15 20 25 30 45

Relative norm: 0.15 0.07 0.03 0.02 0.01 0.007 0.001

Table 3: Relative norm for low number of iterations in figure 13.

Singular Jacobians

It was observed occasionally that the linear the system of (21) was singular. This is equivalent to the cor-responding Jacobian matrix A not having an inverse. In case a matrix is singular either the correspondingsystem admits zero solutions or infinitely many. This first option was observed to be the case. A possibleexplanation is that the coefficients and equations that are involved in the analytical gradient calculation areinconsistent with the coefficients and equations of the gravity model due to an error or inaccuracy in the MAT-LAB implementation. Because of this difficulty we decided to compute gradients by the forward finite differenceapproach instead of using the analytical method in the steepest descent and BFGS method. We showed howboth gradient routines compute the same gradient. So using finite differences should not make a difference inthe test results other than different running times.

52

7.1.2 Computational effort

We observed for the two methods for different model sizes n how much time it takes to compute the gradient.For each number of zones n a trip distribution model with n zones was obtained by resizing and interpolatingthe cost matrices, production- and attraction vectors of the BBMA Work purpose model of 1425 zones forthe various n. Specifically we used MATLAB’s imresize function to rescale the matrices and vectors and used’bilinear interpolation’ as the method parameter.

The gravity convergence criterion used in this test but also in what follows for the rest of this thesis is torequire Rrelprod and Rrelattr in (17) to be both less than 5%. Let tfd(n) and tal(n) be the time needed to computea gradient with the finite difference approach and the analytical approach respectively. Then we have for theapproach of finite differences (assuming 6 parameters βmu, which is the case for our models of interest in thisthesis) that:

tfd(n) = 6 · tgravityrun(n) = 6 ·#it(n) · tit(n)

With tgravityrun(n) the observed running time to calculate one gravity run, which is simply the number ofgravity model iterations #it(n) multiplied by the time to compute one iteration tit(n). The convergence ofthe gravity model i.e. #it(n) was observed to vary quite significantly with n. Here we wanted to study therelationship between model size n and the running times of both methods. To get this relationship more clearlywithout the gravity model convergence impact we normalized the iteration numbers #it(n) to the same number15. The results after this normalization step are shown in figure 14. The figure shows the factor by which the

analytical gradient approach is faster than the finite difference approach i.e.tfd(n)tal(n)

. The constant red line of

factor 1 is drawn to indicate the turnover point (which is at the intersection of the blue and red line) at which theanalytical method becomes more advantageous than the finite differences approach. The analytical approachis significantly faster for models of size smaller than 3300 zones. If one wants to compare the competitivenessof the methods based on these results in case convergence of the gravity model requires a different number ofiterations say nx, one can simply multiply the factors by nx

15 .

Running time analysis: big O notation

The results on running time presented in figure 14 are not unexpected from an Big O notation analysis stand-point. The finite difference approach boils down to running the IPF procedure multiple times which is essentiallyrepeated scaling of all rows and columns. Therefore the finite difference approach has a running time of O(n2).The analytical gradient approach at least requires us to solve a linear system of size 3n+ 6. In general solvinga linear system of such a size takes O(n3). This confirms again that the finite differences approach should befaster than the analytical approach for large n.

53

0 500 1000 1500 2000 2500 3000 3500 4000 4500

# of zones

0

1

2

3

4

5

6

7

8

9#

of ti

mes

fast

er# of times it is faster to calculate the gradient analytically rather than by finite differences

Figure 14: Observed factor by which calculating the gradient analytically is faster than computing it by finite differencesfor varying number of zones (assuming gravity model convergence in 15 iterations per sub purpose)

54

7.2 Calibration results

Calibration runs were done using the hillclimbing algorithm, a steepest descent and the BFGS method of chapter6 on two models: the medium scale BBMA strategic traffic model and the large scale MRDH strategic trafficmodel. The main goal of these tests was to investigate the performance of the hillclimbing algorithm and theBFGS method in terms of the (i) reliability in finding a quality solution, (iii) robustness under initial solutionguess and (iv): Running time criteria, as discussed in section 4.4. For the reliability in finding a quality solutioncriterium we have the goal to answer the following questions: What is the nature of the solution each of themethods converge to? Is it a global minimum, local minimum or perhaps not even a local minimum and thussuboptimal. An important question that ties in with the robustness under initial solution guess is whether ornot there exist multiple local minima in the solution space. Finally the running times of both methods are ofinterest.

7.2.1 Calibration of the medium scale BBMA strategic traffic model

The first model we calibrated is the medium scale BBMA model of 1425 zones. The trip distribution modelfor the BBMA model is similar in structure to the MRDH model which is discussed in the next section. Bothhave a stratification into purposes and sub purposes as summarized in table 1. The advantage of testing on theBBMA model is that we are able to test more initial solution guesses in the same time period due to its smallersize (1425 versus MRDH model’s 7786 zones) the running time of both our methods is much faster. This isuseful in obtaining data to answer the questions surrounding the reliability in finding a quality solution androbustness under initial solution guess criteria.

As no count numbers were available for the BBMA model, no confidence intervals could be constructed, there-fore the calibration objective function was taken to be (15), so neither normalization by bin width or confidenceinterval width was applied here.

For each purpose 16 different starting solutions were randomly selected, where β0mu ∈ [−1, 0] for each of the 6

mode-user class combinations (m,u). We believe that the initial guesses can be constrained to this cube afterobserving many bad objective values corresponding to solutions outside the cube in an experiment preliminaryto this test. Both the hillclimbing and a steepest descent algorithm were, for each purpose, started from thesame set of 16 β’s.

Solution method setup

For the hillclimbing algorithm, the initial step size maxstep, was set to 0.5, as this value was used for theBBMA model calibration before this research. The initial signs signmu were all set to 1 and the step size

reduction factor γ was set to the golden ratio 1+√5

2 . For the hillclimbing convergence criterium we simplyiterated for maxiterations = 40 iterations. Each run this number was observed to be more than necessary forconvergence.

We mentioned that we used a steepest descent algorithm instead of the BFGS method for the BBMA model.The reason for this is that initially we started by implementing the steepest descent for the sake of it being themost simple and straightforward type of line search algorithm. The BFGS method was implemented only ina later phase in hopes of improving the convergence rate. However the results of the BBMA tests employingsteepest descent were sufficient to answer the earlier mentioned questions about whether there are multiple localminima and whether the hillclimbing algorithm finds a local minimum.

The details of this steepest descent method then are as follows: we used a simple backtracking line searchwith the initial step size set to 0.25 and the step size reduction factor set to 1

2 . The sufficient decrease param-

55

eter (see (28)) was chosen to be c1 = 10−4, which is often used in backtracking line search in practice[16]. Forthe convergence criterium the maximum number of iterations was set to maxit = 40 iterations, which each timeproved to be more than necessary to reduce the gradients norm below 0.01, the absolute convergence tresholdused.

Results on the BBMA model

For each calibration run we observed one of two outcomes: either hillclimbing converges to the same solu-tion as steepest descent or it finds a worse solution in terms of objective function value. The frequencies ofthese outcomes are summarized in table 4. For each purpose it was observed that for each of the 16 differentinitial solutions steepest descent converged to the same solution. Figure 15 and 16 respectively compare theconvergence in objective function value of the methods in case hillclimbing converges to the same solution assteepest descent and in case it does not.

In our discussion of the reliability in finding a quality solution criterium in section 4.4 we mentioned thatplots of observed versus modelled trip length distributions are often more meaningful to a consultant than anobjective function value. Figure 17 and figure 18 compare the modelled trip length distributions of the hill-climbing and steepest descent solution with the observed trip length distribution for respectively a run for theEducation purpose and Work purpose in both of which hillclimbing ended in a suboptimal solution.

56

Purp

ose

#of

tim

eshillc

lim

bin

g#

ofti

mes

hillc

lim

bin

g#

of

diff

eren

thillc

lim

bin

gob

jst

eep

est

des

cent

finds

aw

orse

solu

tion

finds

the

sam

eso

luti

on

start

ing

solu

tions

valu

eav

g.:

ob

jva

lue

avg.

Wor

k:

115

16

0.1

24

0.0

82

Busi

nes

s:0

16

16

0.1

15

0.1

15

Educa

tion

:3

13

16

0.1

18

0.0

84

Sto

res:

115

16

0.1

36

0.1

21

Oth

er:

016

16

0.0

40

0.0

40

Tot

al:

5(6

.25%

)75

(93.7

5)

%80

(100

%)

Ta

ble

4:

Su

mm

ary

of

resu

lts

com

pari

ng

hil

lcli

mbi

ng

an

dst

eepe

std

esce

nt

for

the

BB

MA

stra

tegi

ctr

affi

cm

odel

.

57

0 10 20 30 40 50 60 70 80 90

# of gravity iterations

0

0.5

1

1.5

2

2.5

3ob

ject

ive

valu

e

gradient descenthillclimbing

# of gravity runs

Figure 15: Typical objective function convergence for a good initial solution for hillclimbing, Purpose: Work

0 10 20 30 40 50 60

# of gravity iterations

0

0.5

1

1.5

2

2.5

obje

ctiv

e va

lue

gradient descenthillclimbing

# of gravity runs

Figure 16: Typical objective function convergence for a bad initial solution for hillclimbing, Purpose: Work

58

(m,u

) =

(Car

, Nco

),

m

u =

-1.

13,

-0.6

7

02.

55.

512

.517

.527

.537

.552

.562

.582

.50

0.1

0.2

0.3

0.4

0.5

0.6

(m,u

) =

(Pu

blic

Tra

nsi

t, N

co),

mu =

-0.

28,

-0.3

0

02.

55.

512

.517

.527

.537

.552

.562

.582

.50

0.050.1

0.150.2

0.25

(m,u

) =

(Bik

e, N

co),

mu =

-0.

58,

-0.6

3

02.

55.

512

.517

.527

.537

.552

.562

.582

.50

0.1

0.2

0.3

0.4

0.5

0.6

(m,u

) =

(Car

, Co

),

m

u =

-0.

51,

-0.6

3

02.

55.

512

.517

.527

.537

.552

.562

.582

.50

0.050.1

0.150.2

0.250.3

0.35

obse

rved

(O

ViN

)hi

llclim

bing

grad

ient

des

cent

(m,u

) =

(Pu

blic

Tra

nsi

t, C

o),

mu =

-0.

30,

-0.3

4

02.

55.

512

.517

.527

.537

.552

.562

.582

.50

0.050.1

0.150.2

0.25

(m,u

) =

(Bik

e, C

o),

mu =

-0.

50,

-0.5

5

02.

55.

512

.517

.527

.537

.552

.562

.582

.50

0.1

0.2

0.3

0.4

0.5

0.6

Fig

ure

17

:B

ad

solu

tio

nfo

rh

illc

lim

bin

gfo

rth

eE

du

cati

on

pu

rpo

seo

fth

eB

BM

Am

odel

59

(m,u

) =

(Car

, Nco

),

m

u =

-1.

29,

-0.4

7

02.

55.

512

.517

.527

.537

.552

.562

.582

.50

0.1

0.2

0.3

0.4

0.5

0.6

0.7

(m,u

) =

(Pu

blic

Tra

nsi

t, N

co),

mu =

-0.

51,

-0.3

6

02.

55.

512

.517

.527

.537

.552

.562

.582

.50

0.050.1

0.150.2

(m,u

) =

(Bik

e, N

co),

mu =

-0.

69,

-0.5

5

02.

55.

512

.517

.527

.537

.552

.562

.582

.50

0.1

0.2

0.3

0.4

0.5

0.6

(m,u

) =

(Car

, Co

),

m

u =

-0.

38,

-0.3

8

02.

55.

512

.517

.527

.537

.552

.562

.582

.50

0.050.1

0.150.2

0.25

obse

rved

(O

ViN

)hi

llclim

bing

grad

ient

des

cent

(m,u

) =

(Pu

blic

Tra

nsi

t, C

o),

mu =

-0.

09,

-0.2

3

02.

55.

512

.517

.527

.537

.552

.562

.582

.50

0.050.1

0.150.2

0.250.3

0.35

(m,u

) =

(Bik

e, C

o),

mu =

-0.

42,

-0.4

0

02.

55.

512

.517

.527

.537

.552

.562

.582

.50

0.1

0.2

0.3

0.4

0.5

0.6

Fig

ure

18

:B

ad

solu

tio

nfo

rh

illc

lim

bin

gfo

rth

eW

ork

pu

rpo

seo

fth

eB

BM

Am

odel

60

7.2.2 Calibration of the large scale MRDH strategic traffic model

Robustness under initial starting solution

The first test comparing hillclimbing and the BFGS method for the MRDH model was done for the Otherpurpose. Initially we intentionally focused on this single purpose and did multiple runs from different initialsolutions as the calibration running times for the MRDH model compared to the BBMA are significantly longer(a single BFGS calibration run can take up to twenty hours compared to 25 minutes for BBMA). For the Otherpurpose 4 runs were done with different starting solutions randomly drawn from the [−1, 0]6 cube in which weexpect the minimum to be in, given now our experience with the BBMA model.

The objective function of the calibration is again the unweighted objective function of (15) with no binwidthnormalization or incorporation of confidence intervals.

Solution method setup

For the hillclimbing algorithm, the initial step size maxstep, was finally set to 0.1, as it was observed to besomewhat more favourable than some larger step sizes of 0.25, 0.5 and 1 that were also tried. The initial signs

signmu were all set to 1 and the step size reduction factor γ was set to the golden ratio 1+√5

2 as was the casefor the BBMA model. Again iterating for maxiterations = 40 iterations proved to be sufficient for practicalconvergence.

The parameter values used for the BFGS method i.e. algorithm 5 are as follows: the step size reductionfactor was set to µ = 1

2 and the sufficient decrease parameter was set to c1 = 10−4. For the convergencecriterium the maximum number of iterations was set to maxit = 40 iterations, however convergence each timewas achieved in the relative gradient norm criterion with relative convergence threshold value ε = 10−4 beforethis number of iterations.

Results Other purpose MRDH model

It was observed for each starting solution that BFGS found a solution with smaller objective value than hill-climbing. The converged hillclimbing solutions also were dependent on the initial solution guess, while for BFGSmethod independence was observed i.e. each run converged to the same solution. A visual comparison of thesolution in terms of the dashboard a consultant would judge the solution on for one of these runs, is shown infigure 19.

Change of bin division

The solution obtained from the BFGS method of the Other purpose was still found to be unsatisfying. Themodelled trip length distributions did not approach the observed trip length distributions sufficiently. Thereforeit was decided for the final tests on the MRDH model to use the BBMA bin division, instead of the MRDH bindivision. The BBMA bin division differs in that it is more aggregate for the smallest trip lengths. The fact wechanged the bin division does not necessarily reflect poorly on the BFGS method itself. Although it is possiblethat the solution obtained is a local minimum that is significantly worse than the global solution. This seemsunlikely as the BBMA and MRDH test results suggest the BFGS method converges to the same local minimumindependent from its starting solution, therefore suggesting it converges to the global minimum. The measureof using the BBMA bin division can be thought of as changing the model’s data.

61

(m,u

) =

(Car

, Nco

),

m

u =

-1.

77,

-0.6

4

00.

751.

251.

752.

753.

755.

57.

512

.517

.527

.537

.547

.562

.582

.50

0.050.1

0.150.2

0.25

(m,u

) =

(Pu

blic

Tra

nsi

t, N

co),

mu =

-0.

75,

-0.8

8

00.

751.

251.

752.

753.

755.

57.

512

.517

.527

.537

.547

.562

.582

.50

0.050.1

0.150.2

0.250.3

(m,u

) =

(Bik

e, N

co),

mu =

-0.

86,

-0.7

0

00.

751.

251.

752.

753.

755.

57.

512

.517

.527

.537

.547

.562

.582

.50

0.050.1

0.150.2

0.250.3

0.35

(m,u

) =

(Car

, Co

),

m

u =

-1.

26,

-0.4

6

00.

751.

251.

752.

753.

755.

57.

512

.517

.527

.537

.547

.562

.582

.50

0.050.1

0.150.2

0.25

obse

rved

(O

ViN

)hi

llclim

bing

bfgs

met

hod

(m,u

) =

(Pu

blic

Tra

nsi

t, C

o),

mu =

-0.

80,

-0.4

6

00.

751.

251.

752.

753.

755.

57.

512

.517

.527

.537

.547

.562

.582

.50

0.050.1

0.150.2

0.250.3

0.350.4

0.450.5

(m,u

) =

(Bik

e, C

o),

mu =

-0.

51,

-0.4

7

00.

751.

251.

752.

753.

755.

57.

512

.517

.527

.537

.547

.562

.582

.50

0.050.1

0.150.2

0.250.3

0.350.4

0.45

Fig

ure

19

:B

ad

solu

tio

nfo

rh

illc

lim

bin

gfo

rth

eO

ther

pu

rpo

seo

fth

eM

RD

Hm

odel

62

Final test on the MRDH model: calibrating the remaining purposes

In the final test on the MRDH model we calibrated also the Work, Shops and Business purposes. from thestarting solution βmu = −0.5 for each of the 6 mode-user class combinations (m,u) using hillclimbing and theBFGS method. As the running time for calibrating the MRDH model is so much higher than for the BBMAmodel, it was unpractical to test multiple initial solution for each purpose. This initial guess of βmu = −0.5was picked for each purpose as it is the midpoint of the [−1, 0]6 cube in which we expect the minimum to be in.

We used a weighted objective function with weights based on the confidence interval widths as defined in3.3.2: The original trip distribution numbers dmuk and dmuk are first divided by their binwidth and subse-quently normalized so that the resulting distributions sum to 1 for each mode-user class combinations (m,u).

Denote by d relmuk and d relmuk these modelled and observed resulting distributions respectively. Then with therelative error rk calculated by (13) the objective function is given by:

F (β) =∑m,u,k

(d relmuk − d relmuk

rk · d relmuk

)2

For the construction of the confidence intervals we used survey count data from OViN (dutch: ’OnderzoekVerplaatsing in Nederland’)[1]. Runs for the Education purpose were not done in this final test, as OViN countswere missing for it. For both hillclimbing and the BFGS method the same parameter values and general solutionmethod setup were used as in the earlier test runs on the Other purpose of MRDH.

Results of the final test on the MRDH model

It was observed that hillclimbing converged to the same solution as the BFGS method for all purposes ex-cept again for the Other purpose for which hillclimbing converged to a suboptimal solution. Figure 20 comparesthe solutions of the methods for this test run. Table 5 shows the running time of BFGS for each of the purposescalibrated. Also the number of line search steps5, the total number of extra gravity runs required i.e. the totalnumber of times the step size is reduced in the line search steps and the total number of gravity runs done.Assuming the uncalibrated Education purpose would take 10 hours which is the mean of the running timesof the calibrated purposes. The total time of calibrating the MRDH strategic traffic model using the BFGSmethod would take approximately 60 hours. Therefore the requirement of calibrating the MRDH model in 3days discussed in the performance criteria section 4.4 (criterium (iv): Running time) is well met. Hillclimbinghowever has a much shorter running time, approximately 1

6 of BFGS method’s running time. The test resultsoverall suggest that the BFGS method although much slower in convergence is more reliable in terms of findinga quality solution. We also checked for this test and the BBMA tests whether the bad solutions of hillclimbingare not just suboptimal local minima by calculating the gradient norm. These norms were observed to berelatively large excluding this possibility.

5By a line search step we mean the computation of the gradient followed by computing a step size that satisfies the sufficientdecrease condition.

63

Purp

ose

tim

e(h

ours

)#

ofline-

#of

extr

agra

vit

yto

tal

#of

sear

chit

erat

ions

runs

inline

searc

hgra

vit

yru

ns

Wor

k:

6.8

1510

115

Busi

nes

s:16

202

142

Sto

res:

20.9

312

219

Oth

er:

7.3

171

120

Ta

ble

5:

Ru

nn

ing

tim

eo

fB

FG

Sm

eth

oda

nd

nu

mbe

ro

fli

ne

sea

rch

step

sa

nd

gra

vity

run

sn

eed

edfo

rth

efo

ur

pu

rpo

ses

of

the

MR

DH

mod

elth

at

wer

eca

libr

ate

d.

64

02.

55.

512

.517

.527

.537

.552

.562

.582

.50

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

(m,u

) =

(Car

, Nco

),

m

u =

-2.

50,

-2.7

6

2.5

5.5

12.5

17.5

27.5

37.5

52.5

62.5

82.5

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7(m

,u)

= (P

ub

lic T

ran

sit,

Nco

),

m

u =

-0.

76,

-2.8

7

02.

55.

512

.517

.527

.537

.552

.562

.582

.50

0.2

0.4

0.6

0.81

(m,u

) =

(Bik

e, N

co),

mu =

-1.

06,

-1.3

2

02.

55.

512

.517

.527

.537

.552

.562

.582

.50

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

(m,u

) =

(Car

, Co

),

m

u =

-1.

42,

-1.9

5

obse

rved

(O

ViN

)95

%co

nfid

ence

bou

nds

hillc

limbi

ngbf

gs m

etho

d

2.5

5.5

12.5

17.5

27.5

37.5

52.5

62.5

82.5

0

0.2

0.4

0.6

0.81

1.2(m

,u)

= (P

ub

lic T

ran

sit,

Co

),

m

u =

-0.

40,

-0.5

5

02.

55.

512

.517

.527

.537

.552

.562

.582

.50

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.91

(m,u

) =

(Bik

e, C

o),

mu =

-0.

69,

-1.3

8

Fig

ure

20

:C

om

pari

ng

the

solu

tio

ns

for

the

Oth

erp

urp

ose

of

the

MR

DH

mod

el

65

8 Conclusions and recommendations

The first recommendation following from this thesis is to use a triply constrained gravity model in the cal-ibration. Advantages are that the α parameters become a function of the β parameters choice simplifyingthe calibration process and reducing its running time significantly by about 80% independent of the actual βcalibration algorithm used. Also the modal split constraints are met exactly and the original biproportionalgravity model structure is still retained.

The results from the calibration test runs done on the BBMA and MRDH strategic traffic model show that theBFGS method is more reliable in finding a quality solution than the hillclimbing algorithm that is currentlyused. The method is approximately 6 times slower approximately canceling the running time reduction from thenew triproportional fitting approach. Though slow it would already fit the time budget of 3 days for calibratingthe entire largest strategic traffic model of MRDH. On the other hand only in about 5% of the time hillclimbingfinds a worse solution than the BFGS method. In the other cases hillclimbing works just fine and would beprefered. It would therefore be interesting in further research to investigate whether the methods and theirbenefits, speed and reliability, can be combined. Hillclimbing could be used initially and hopefully converge tothe optimal solution quickly. Whether the found solution is optimal can be checked by calculating the gradientin the converged solution. Then a switch can be made to the BFGS method in case the solution is not locallyoptimal.

Another area of study for further research could be with the analytical gradient method. It was observedto be significantly faster than using finite differences up to model sizes of 3300 zones. It might be possibleto speed up the analytical method even further by considering alternative methods to solve the linear system.Right now it is however not ready to be used as an alternative to finite differences as occasionally the matrix ofthe linear system becomes singular.

66

References

[1] https : / / www . cbs . nl / nl - nl / onze - diensten / methoden / onderzoeksomschrijvingen / korte -

onderzoeksbeschrijvingen/onderzoek-verplaatsingen-in-nederland--ovin--.

[2] Mounir Mahmoud Moghazy Abdel-Aal. “Calibrating a trip distribution gravity model stratified by thetrip purposes for the city of Alexandria”. In: Alexandria Engineering Journal 53.3 (2014), pp. 677–689.

[3] J. Brethouwer. The multi-constrained gravity model, And how to solve it using multi-proportional fitting.Report of internship at DAT.Mobility. 2018.

[4] Angelo Corana et al. “Minimizing multimodal functions of continuous variables with the simulated an-nealing algorithm”. In: ACM Transactions on Mathematical Software (TOMS) 13.3 (1987), pp. 262–280.

[5] J. de D. Ortuzar and L.G. Willumsen. Modelling transport. John Wiley & Sons, 1990.

[6] P.J.C Dickinson. Personal communication. April, 2018.

[7] R. Fransen. Automated parameter calibration for a gravity model. Report of internship at DAT.Mobility.2015.

[8] Ubaldo M Garcıa-Palomares, Francisco J Gonzalez-Castano, and Juan C Burguillo-Rial. “A combinedglobal & local search (CGLS) approach to global optimization”. In: Journal of Global Optimization 34.3(2006), pp. 409–426.

[9] Dongmin Kim, Suvrit Sra, and Inderjit S Dhillon. “Tackling box-constrained optimization via a newprojected quasi-newton approach”. In: SIAM Journal on Scientific Computing 32.6 (2010), pp. 3548–3563.

[10] R. R. A. Martins. Sensitivity Analysis AA222 - Multidisciplinary Design Optimization. aero-comlab.stanford.edu/jmartins/aa222/aa222sa.pdf. Accessed: 07-09-2018.

[11] Friedrich Pukelsheim and Bruno Simeone. On the iterative proportional fitting procedure: Structure ofaccumulation points and L1-error analysis. https://opus.bibliothek.uni- augsburg.de/opus4/

frontdoor/deliver/index/docId/1229/file/mpreprint_09_005.pdf. Accessed: 07-09-2018. 2009.

[12] Uriel G Rothblum and Hans Schneider. “Scalings of matrices which have prespecified row sums and columnsums via optimization”. In: Linear Algebra and its Applications 114 (1989), pp. 737–764.

[13] James C Spall. “Multivariate stochastic approximation using a simultaneous perturbation gradient ap-proximation”. In: IEEE transactions on automatic control 37.3 (1992), pp. 332–341.

[14] Sean Wallis. “Binomial confidence intervals and contingency tests: mathematical fundamentals and theevaluation of alternative methods”. In: Journal of Quantitative Linguistics 20.3 (2013), pp. 178–208.

[15] Alan Geoffrey Wilson. “The use of entropy maximising models, in the theory of trip distribution, modesplit and route split”. In: Journal of Transport Economics and Policy (1969), pp. 108–126.

[16] Stephen Wright and Jorge Nocedal. Numerical optimization. Vol. 35. 67-68. Springer, 1999.

67

https://www.cbs.nl/nl-nl/onze-diensten/methoden/onderzoeksomschrijvingen/korte-onderzoeksbeschrijvingen/onderzoek-verplaatsingen-in-nederland--ovin--

https://www.cbs.nl/nl-nl/onze-diensten/methoden/onderzoeksomschrijvingen/korte-onderzoeksbeschrijvingen/onderzoek-verplaatsingen-in-nederland--ovin--

aero-comlab.stanford.edu/jmartins/aa222/aa222sa.pdf

aero-comlab.stanford.edu/jmartins/aa222/aa222sa.pdf

https://opus.bibliothek.uni-augsburg.de/opus4/frontdoor/deliver/index/docId/1229/file/mpreprint_09_005.pdf

https://opus.bibliothek.uni-augsburg.de/opus4/frontdoor/deliver/index/docId/1229/file/mpreprint_09_005.pdf

Documents

Gravity model parameter calibration for large scale