Lecture 3
Mixed Logit Cinzia Cirillo
Overview
1. Choice Probabilities
2. Random Coefficients
3. Error Components
4. Substitution Patterns
5. Approximation to Any RUM
6. Panel Data
7. Case Study
1. Choice Probabilities
• Mixed logit (ML) is a highly flexible model that can approximate any random utility model (RUM).
• ML overcomes the three major limitations of standard logit by allowing for: – Random taste variation. – Unrestricted substitution patterns. – Correlation in unobserved factors over time.
• Unlike probit, ML is not restricted to normal distributions. Like probit, ML has been known for many years but has only recently come into wide use, thanks to advances in simulation.
1. Choice Probabilities (cont.)
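• The ML choice probability is the integral of standard logit probabilities over a density of coefficients:
$$P_{ni} = \int \frac{e^{\beta' x_{ni}}}{\sum_j e^{\beta' x_{nj}}} \, f(\beta) \, d\beta$$
Where: – f(β) is a density function. – β is a vector of coefficients. – x_ni is the vector of observed variables for decision maker n and alternative i.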
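The integral above generally has no closed form, so it is usually approximated by averaging logit probabilities over random draws of β. The following is a minimal sketch of that idea, not the lecture's own code: it assumes independent normal coefficients as an illustrative choice of f(β), and the data, the function names `logit_probs` and `simulated_mixed_logit`, and all parameter values are hypothetical.

```python
import math
import random

def logit_probs(beta, x):
    """Standard logit probabilities; x[j] is the attribute vector of alternative j."""
    v = [sum(b * a for b, a in zip(beta, xj)) for xj in x]
    vmax = max(v)  # subtract the maximum for numerical stability
    expv = [math.exp(vj - vmax) for vj in v]
    total = sum(expv)
    return [e / total for e in expv]

def simulated_mixed_logit(x, mean, sd, n_draws=2000, seed=0):
    """Average the logit formula over draws of beta (assumed independent normals)."""
    rng = random.Random(seed)
    acc = [0.0] * len(x)
    for _ in range(n_draws):
        beta = [rng.gauss(m, s) for m, s in zip(mean, sd)]
        for j, p in enumerate(logit_probs(beta, x)):
            acc[j] += p
    return [a / n_draws for a in acc]

# Hypothetical data: three alternatives described by (cost, time).
x = [[1.0, 30.0], [2.5, 20.0], [4.0, 10.0]]
probs = simulated_mixed_logit(x, mean=[-1.0, -0.1], sd=[0.5, 0.05])
print(probs)
```

Setting every standard deviation to zero collapses the simulated probability to the standard logit formula, which is a convenient sanity check.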
2. Random Coefficients
• ML probabilities can be derived from utility-maximizing behavior in several ways; the random-coefficients derivation is the most popular.
• Random coefficients represent variation across people, even within a given group (e.g., by income level, drivers, bikers, the elderly), in the weight they place on an attribute (e.g., cost).
$$U_{nj} = \beta_n' x_{nj} + \varepsilon_{nj}$$ – Where: U_nj is the utility of person n for alternative j. β_n is the coefficient vector of person n. ε_nj is a random term that is iid extreme value.
• The decision maker knows the value of his own β_n and ε_nj for all j and chooses alternative i if and only if U_ni > U_nj ∀ j ≠ i.
3. Error Components
• ML can be used without a random-coefficients interpretation, as simply representing error components that create correlations among the utilities of different alternatives:
$$U_{nj} = \alpha' x_{nj} + \mu_n' z_{nj} + \varepsilon_{nj}$$ – Where: x_nj and z_nj are vectors of observed variables relating to alternative j. α is a vector of fixed coefficients. μ_n is a vector of random terms with zero mean. ε_nj is a random term that is iid extreme value.
3. Error Components (cont.)
• The terms in z_nj are error components that, along with ε_nj, define the stochastic portion of utility.
• The random part of utility is η_nj = μ_n' z_nj + ε_nj, which can be correlated over alternatives depending on the specification of z_nj.
• For standard logit, z_nj is identically zero, so utilities are uncorrelated over alternatives. With non-zero error components, Cov(η_ni, η_nj) = z_ni' W z_nj for i ≠ j, where W is the covariance matrix of μ_n.
• Random coefficients and error components are formally equivalent. However, the interpretation a researcher adopts affects how the ML model is specified.
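As a small worked illustration of the covariance expression z_ni' W z_nj, the sketch below uses nest-style dummy variables for z, a common way to induce correlation between alternatives that share a group. The three alternatives, the nest structure, and the values in W are all assumptions made for illustration; `error_component_cov` is a hypothetical helper name.

```python
def error_component_cov(z_i, z_j, W):
    """Cov(eta_ni, eta_nj) = z_ni' W z_nj for i != j (the iid term adds nothing off-diagonal)."""
    Wz = [sum(W[r][c] * z_j[c] for c in range(len(z_j))) for r in range(len(W))]
    return sum(z_i[r] * Wz[r] for r in range(len(z_i)))

# Hypothetical nest dummies: alternatives 0 and 1 share nest A, alternative 2 is alone in nest B.
z = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
# Assumed covariance matrix of the two error components (independent, variances 2.0 and 0.5).
W = [[2.0, 0.0], [0.0, 0.5]]

cov_01 = error_component_cov(z[0], z[1], W)  # alternatives in the same nest
cov_02 = error_component_cov(z[0], z[2], W)  # alternatives in different nests
print(cov_01, cov_02)
```

With this specification the two alternatives in the same nest are correlated while alternatives in different nests are not, which is the kind of pattern the choice of z_nj controls.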
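4. Substitution Patterns
• ML does not exhibit independence from irrelevant alternatives (IIA).
• A ten percent reduction in the probability of one alternative need not imply a ten percent reduction in the probability of each other alternative.
• The percentage change in the probability of alternative i given a unit change in the mth attribute of another alternative j is:
$$E_{ni x^m_{nj}} = -\frac{1}{P_{ni}} \int \beta^m L_{ni}(\beta) \, L_{nj}(\beta) \, f(\beta) \, d\beta$$
– Where: β^m is the mth element of β, and L_ni(β) is the logit probability of alternative i evaluated at β.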
5. Approximation to any RUM
• Any RUM can be approximated to any degree of accuracy through ML.
• Suppose the true model is U_nj = α_n' z_nj, where z_nj are variables related to alternative j and α follows any distribution f(α).
• The conditional probability is:
$$q_{ni}(\alpha_n) = I(\alpha_n' z_{ni} > \alpha_n' z_{nj} \;\; \forall j \neq i)$$
– Where: I(.) is the 1-0 indicator of whether the event in parentheses occurs.
• The unconditional probability is:
$$Q_{ni} = \int I(\alpha' z_{ni} > \alpha' z_{nj} \;\; \forall j \neq i) \, f(\alpha) \, d\alpha$$
5. Approximation to any RUM (cont.)
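• We can approximate the previous probabilities with an ML model by scaling utility by 1/λ, so that
$$U_{nj}^* = (\alpha_n / \lambda)' z_{nj}$$
• This scaling does not change the model, since behavior is unaffected by the scale of utility.
• Then we add an iid extreme value term ε_nj to obtain an ML model. This addition does change the model, since it changes the utility of each alternative, but its influence vanishes as λ shrinks.
• The ML probability then is
$$P_{ni} = \int \frac{e^{(\alpha/\lambda)' z_{ni}}}{\sum_j e^{(\alpha/\lambda)' z_{nj}}} \, f(\alpha) \, d\alpha$$
• As λ approaches zero, the ML probability P_ni approaches the true probability Q_ni.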
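The limiting argument can be checked numerically. The sketch below simulates Q_ni with the indicator function and P_ni with the scaled logit formula on the same draws of α, for a few values of λ. Everything concrete here is an assumption for illustration: two alternatives, two attributes, independent normal α, and the helper names `indicator` and `scaled_logit`.

```python
import math
import random

def indicator(alpha, z, i):
    """q_ni: 1 if alternative i has the strictly highest utility alpha'z, else 0."""
    v = [sum(a * zk for a, zk in zip(alpha, zj)) for zj in z]
    return 1.0 if all(v[i] > v[j] for j in range(len(z)) if j != i) else 0.0

def scaled_logit(alpha, z, i, lam):
    """Logit probability of alternative i when utility is alpha'z / lam."""
    v = [sum(a * zk for a, zk in zip(alpha, zj)) / lam for zj in z]
    vmax = max(v)  # subtract the maximum for numerical stability
    expv = [math.exp(vj - vmax) for vj in v]
    return expv[i] / sum(expv)

rng = random.Random(0)
z = [[1.0, 2.0], [2.0, 0.5]]  # hypothetical attributes of two alternatives
draws = [[rng.gauss(1.0, 1.0), rng.gauss(1.0, 1.0)] for _ in range(2000)]

Q = sum(indicator(a, z, 0) for a in draws) / len(draws)
P, gap = {}, {}
for lam in (1.0, 0.1, 0.01):
    probs = [scaled_logit(a, z, 0, lam) for a in draws]
    P[lam] = sum(probs) / len(draws)
    # mean absolute distance between the logit value and the indicator, draw by draw
    gap[lam] = sum(abs(p - indicator(a, z, 0)) for p, a in zip(probs, draws)) / len(draws)
    print(lam, round(P[lam], 4), round(Q, 4), round(gap[lam], 4))
```

The printed gap shrinks as λ decreases, which is the behavior the slide describes; in practice λ cannot be pushed arbitrarily low without numerical difficulty.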
6. Panel Data
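• When using panel data, the integrand involves a product of logit formulas, one for each time period. For a sequence of choices i = {i_1, …, i_T}:
$$P_{ni} = \int L_{ni}(\beta) \, f(\beta) \, d\beta$$
– Where:
$$L_{ni}(\beta) = \prod_{t=1}^{T} \frac{e^{\beta_n' x_{n i_t t}}}{\sum_j e^{\beta_n' x_{njt}}}$$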
• Lagged dependent variables can be added to ML without adjusting the probability formula or simulation method.
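A minimal simulation sketch of the panel probability follows: one draw of β per person is held fixed across all periods, the logit formula is multiplied over periods, and the product is averaged over draws. The normal mixing distribution, the tiny two-period data set, and the names `logit_prob` and `simulated_sequence_prob` are assumptions for illustration only.

```python
import math
import random

def logit_prob(beta, x_t, chosen):
    """Logit probability of the chosen alternative in one period."""
    v = [sum(b * a for b, a in zip(beta, xj)) for xj in x_t]
    vmax = max(v)  # subtract the maximum for numerical stability
    denom = sum(math.exp(vj - vmax) for vj in v)
    return math.exp(v[chosen] - vmax) / denom

def simulated_sequence_prob(x, choices, mean, sd, n_draws=1000, seed=0):
    """Average over draws of beta the product over periods of logit formulas."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        # one draw per person, shared by every period of the sequence
        beta = [rng.gauss(m, s) for m, s in zip(mean, sd)]
        prod = 1.0
        for x_t, i_t in zip(x, choices):
            prod *= logit_prob(beta, x_t, i_t)
        total += prod
    return total / n_draws

# Hypothetical panel: T = 2 periods, 2 alternatives, 2 attributes (cost, time).
x = [
    [[1.0, 30.0], [2.5, 20.0]],
    [[1.5, 25.0], [2.0, 15.0]],
]
p_seq = simulated_sequence_prob(x, choices=[0, 1], mean=[-1.0, -0.1], sd=[0.5, 0.05])
print(p_seq)
```

Because the same β multiplies every period, the simulated probabilities of all possible sequences sum to one, and choices in different periods are correlated through the shared draw.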
6. Panel Data (cont.)
• So far, we have assumed that β_n is constant over choice situations for a given decision maker, which holds only if the person’s tastes do not vary over the period.
• The coefficients can instead be specified to vary over time. One way to do this is to let each person’s tastes be serially correlated over choice situations:
$$U_{njt} = \beta_{nt}' x_{njt} + \varepsilon_{njt}$$
$$\beta_{nt} = b + \beta_{nt}^*$$
$$\beta_{nt}^* = \rho \, \beta_{n,t-1}^* + \mu_{nt}$$
– Where: b is fixed and μ_nt is iid over n and t.
7. Case Study 1
• Consider a mixed logit of anglers’ choices of fishing sites.
• Utility is U_njt = β_n' x_njt + ε_njt, with coefficients β_n varying over anglers but not over trips for each angler.
• The sample consists of 962 river trips taken in Montana by 258 anglers during the period of July 1992 through August 1993. A total of 59 possible river sites were defined.
• Simulation was performed using one thousand random draws for each sampled angler.
7. Case Study 1 (cont.)
• The following variables enter as elements of x for each site:
1. Fish stock, measured in units of 100 fish per 1000 feet of river.
2. Aesthetics rating, measured on a scale of 0 to 3, with 3 being the highest.
3. Trip cost: cost of traveling from the angler’s home to the site, including the variable cost of driving (gas, maintenance, tires, oil) and the value of time spent driving (with time valued at one-third the angler’s wage).
4. Indicator that the Angler’s Guide to Montana lists the site as a major fishing site.
5. Number of campgrounds per U.S. Geological Survey (USGS) block in the site.
6. Number of state recreation access areas per USGS block in the site.
7. Number of restricted species at the site.
8. Log of the size of the site, in USGS blocks.
TM Leuven, April 13 2005
Application 1: Mode choice model
• The data set (called Mobidrive) used in this research work was collected in 1999 in two cities of Germany: Karlsruhe and Halle.
• The Mobidrive study, whose main objective was to observe the variability and rhythms of daily life, involved 160 households and 360 individuals.
• Each individual was observed during six continuous weeks.
Activity patterns
• Worker’s daily activity chain
» Morning commute
» Midday tour
» Evening commute
» After tour
• Non worker’s daily activity chain
» Before tour
» Main tour (main activity)
» After tour
Example of patterns and tours for two persons

Example of a worker
| Trip | Trip chain | Dep. time | Arr. time |
|---|---|---|---|
| 1 | Home - Work | 8:00am | 8:20am |
| 2 | Work - Shopping | 5:30pm | 5:45pm |
| 3 | Shopping - Home | 6:40pm | 6:55pm |

Example of a non-worker
| Trip | Trip chain | Dep. time | Arr. time |
|---|---|---|---|
| 1 | Home - Shopping | 11:10am | 11:25am |
| 2 | Shopping - Home | 12:45pm | 1:05pm |
| 3 | Home - Leisure | 7:40pm | 8:05pm |
| 4 | Leisure - Home | 10:00pm | 10:15pm |
Mode choice model: Mobidrive data
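Number of tours by type of tour and main mode

| Type of tour | Walking | Cycling | Vehicle driver | Vehicle passenger | Public transport | Total all modes | % shares |
|---|---|---|---|---|---|---|---|
| Non-workers: Morning tour | 286 | 328 | 638 | 182 | 138 | 1572 | 27.1% |
| Non-workers: Principal tour | 250 | 203 | 541 | 264 | 207 | 1465 | 25.3% |
| Non-workers: Evening tour | 51 | 31 | 89 | 25 | 25 | 221 | 3.8% |
| Workers: Morning tour | 9 | 10 | 31 | 1 | 4 | 55 | 0.9% |
| Workers: Midday tour | 20 | 53 | 33 | 3 | 24 | 133 | 2.3% |
| Workers: Work tour | 213 | 474 | 561 | 76 | 379 | 1703 | 29.4% |
| Workers: Evening tour | 112 | 144 | 181 | 170 | 39 | 646 | 11.2% |
| All tour types | 941 | 1243 | 2074 | 721 | 816 | 5795 | 100.0% |
| (Share in %) | 16.3% | 21.4% | 35.8% | 12.4% | 14.1% | | |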
Mode choice model: variables

| Level | Variable | Categories / definition |
|---|---|---|
| Household | Household location | Urban; Suburban |
| Individual | Age | Age 18-25; Age 26-35; Age 51-65 |
| Individual | Marital status | Married with children |
| Individual | Professional status | Full-time worker; Female and employed part-time |
| Individual | Use of car | Main car user; Total annual mileage by car |
| Individual | Use of public transport | Number of season tickets |
| Pattern | Time budget [min/100] | 24 hours minus the time spent on previous activities (home stay included) and previous travel |
| Pattern | Sum of travel time [min] | Sum of time spent traveling |
| Pattern | Tour duration [min] | Sum of tour travel time and activity duration |
| Pattern | Number of stops | Number of secondary activities observed within each tour |
| LOS | Time [min] | |
| LOS | Cost [DM] | Including any parking fees |
Goodness of fit
| Model | n. of obs. | L(0) | L(C) | L(β) | K | ρ² adjusted |
|---|---|---|---|---|---|---|
| Multinomial Logit (MNL) | 5795 | -8179.88 | -7503.82 | -6465.11 | 21 | 0.2070 |
| MNL with interactions with socio-economic parameters | 5795 | -8179.88 | -7503.82 | -6559.23 | 21 | 0.1955 |
| Mixed Logit | 5795 | -8179.88 | -7503.82 | -6446.88 | 26 | 0.2086 |
| Mixed logit on panel data (day) | 5795 | -8179.88 | -7503.82 | -6039.21 | 26 | 0.2585 |
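The slide does not state how the adjusted ρ² is computed. The sketch below assumes the common definition ρ² = 1 − (L(β) − K) / L(0), which reproduces the reported values to within rounding in the fourth decimal; the function name `adjusted_rho_squared` is hypothetical.

```python
def adjusted_rho_squared(ll_final, ll_zero, k):
    """Adjusted rho-squared, assuming the definition 1 - (L(beta) - K) / L(0)."""
    return 1.0 - (ll_final - k) / ll_zero

LL_ZERO = -8179.88  # L(0) from the goodness-of-fit table

# (final log-likelihood, number of parameters K, reported adjusted rho-squared)
models = {
    "MNL": (-6465.11, 21, 0.2070),
    "MNL with interactions": (-6559.23, 21, 0.1955),
    "Mixed logit": (-6446.88, 26, 0.2086),
    "Mixed logit on panel data (day)": (-6039.21, 26, 0.2585),
}

computed = {}
for name, (ll_final, k, reported) in models.items():
    computed[name] = adjusted_rho_squared(ll_final, LL_ZERO, k)
    print(f"{name}: computed {computed[name]:.4f}, reported {reported:.4f}")
```

Subtracting K penalizes the log-likelihood for each additional parameter, so the mixed logit models are credited only for fit gained beyond their five extra parameters.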
VOT: Value Of Time study
Confidence interval (Armstrong et al., 2001)
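$$V_{I,S} = \frac{\theta_t}{\theta_c} \, \frac{t_c}{t_t} \cdot \frac{(t_t t_c - \rho t^2) \pm \sqrt{(\rho t^2 - t_t t_c)^2 - (t_t^2 - t^2)(t_c^2 - t^2)}}{t_c^2 - t^2}$$
Where: θ_t and θ_c are the time and cost coefficients, t_t and t_c are their t-statistics, ρ is the correlation between the two estimates, and t is the critical value for the chosen confidence level.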
| Variable | Upper limit | Lower limit | VTTS point estimate | Correlation ρ | T-stat | Number of observations |
|---|---|---|---|---|---|---|
| All purposes | 14.47 | 8.60 | 11.06 | 0.093 | -4.76 | 5795 |
| Work or education | 13.57 | 4.50 | 8.23 | 0.061 | -4.65 | 1866 |
| Shopping | 757.09 | 13.02 | 26.99 | 0.077 | -6.03 | 1252 |
| Leisure | 80.61 | 7.07 | 15.30 | 0.09 | -3.24 | 1384 |
| Other | 50.74 | 9.09 | 17.94 | 0.057 | -6.78 | 1293 |
VOT by socio-economic characteristics

| Parameter | Alternative | β | t-stat. | VOT |
|---|---|---|---|---|
| ASC Car passenger | CP | -1.379 | -22.6 | |
| ASC Public Transport | PT | -1.495 | -11.6 | |
| ASC Walk | W | 0.239 | 1.7 | |
| ASC Bike | B | -0.491 | -3.8 | |
| Annual mileage by car | CD | 0.046 | 12.5 | |
| Time Budget | CD, CP | -0.043 | -2.9 | |
| Tour Duration | PT | 0.003 | 17.2 | |
| Sum of Travel Time | B | -0.005 | -2.8 | |
| Time | All | -0.023 | -9.4 | 12.04 |
| Cost | All | -0.113 | -10.5 | |
| Interaction variables | | | | |
| Time * urban household location | All | 0.018 | 8.0 | 2.35 |
| Time * Age18-25 | All | 0.021 | 5.7 | 0.82 |
| Time * Age26-50 | All | -0.001 | -0.1 | 12.48 |
| Time * Age51-65 | All | 0.003 | 1.3 | 10.35 |
| Time * Full-time worker | All | -0.001 | -0.2 | 12.31 |
| Time * Female part-time | All | -0.032 | -8.4 | 29.19 |
| Time * Married with children | All | -0.027 | -9.2 | 26.52 |
| Time * Number of stops | All | 0.015 | 6.3 | 4.19 |
| Time * Season Ticket | All | 0.002 | 0.9 | 10.79 |
| Time * Main Car user | All | -0.006 | -2.5 | 14.98 |
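The slide does not give the formula behind the VOT column. The values are consistent with the usual ratio of the time coefficient (plus the relevant interaction) to the cost coefficient, converted from minutes to hours. The sketch below assumes that formula and units of DM per hour; because it uses the rounded coefficients from the table, it only approximately reproduces the reported values. The function name `value_of_time` is hypothetical.

```python
def value_of_time(beta_time, beta_cost, beta_interaction=0.0, minutes_per_hour=60.0):
    """VOT as a coefficient ratio, assuming time in minutes and cost in DM."""
    return minutes_per_hour * (beta_time + beta_interaction) / beta_cost

BETA_TIME = -0.023  # Time coefficient from the table
BETA_COST = -0.113  # Cost coefficient from the table

base = value_of_time(BETA_TIME, BETA_COST)
female_part_time = value_of_time(BETA_TIME, BETA_COST, beta_interaction=-0.032)
print(round(base, 2), round(female_part_time, 2))
```

A negative interaction term makes travel time more onerous for that group and therefore raises its value of time, while a positive one lowers it.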
VOT per tour type

VOT distribution for non-workers per tour type
| Tour type | 95th percentile VTTS [DM] | 75th percentile VTTS [DM] | Average VTTS [DM] | Share of negative VTTS |
|---|---|---|---|---|
| Morning pattern | 32.2 | 18.8 | 13.5 | 6% |
| Principal pattern | 55.25 | 28.4 | 15.4 | 25% |
| Evening pattern | 14.6 | 8.2 | 5.4 | 14% |

VOT distribution for workers per tour type
| Tour type | 95th percentile VTTS [DM] | 75th percentile VTTS [DM] | Average VTTS [DM] | Share of negative VTTS |
|---|---|---|---|---|
| Morning pattern | n.a. | n.a. | n.a. | n.a. |
| Commute pattern | 15.3 | 9.3 | 7.6 | 0% |
| Evening pattern | 64.7 | 35.0 | 21.4 | 19% |
Travel time
[Figure: cumulative distribution of the travel time parameter. x-axis: parameter value, travel time (-0.200 to 0.050); y-axis: cumulative probability (0 to 1).]
Travel cost
[Figure: cumulative distribution of the travel cost parameter. x-axis: parameter value, travel cost (-0.200 to 0.200); y-axis: cumulative probability (0 to 1).]
VOT
[Figure: cumulative distribution of the value of time. x-axis: value of time in DM (0 to 40); y-axis: cumulative probability (0 to 1).]
Travel time: Before principal activity NW
[Figure: cumulative distribution of the travel time parameter, before principal activity, non-workers. x-axis: parameter value (-0.150 to 0.150); y-axis: cumulative probability (0 to 1).]
Travel time: Principal pattern NW
[Figure: cumulative distribution of the travel time parameter, principal pattern, non-workers. x-axis: parameter value (-0.150 to 0.150); y-axis: cumulative probability (0 to 1).]
Travel Time: Evening pattern NW
[Figure: cumulative distribution of the travel time parameter, evening pattern, non-workers. x-axis: parameter value (-0.150 to 0.150); y-axis: cumulative probability (0 to 1).]
Travel time: Commute pattern W
[Figure: cumulative distribution of the travel time parameter, commute pattern, workers. x-axis: parameter value (-0.150 to 0.150); y-axis: cumulative probability (0 to 1).]
Flexible discrete choice models: accommodating heterogeneity across population.
Heterogeneity across the population can be estimated with flexible forms of discrete choice models.
In many practical cases, parametric distributions are specified a priori and the parameters of these distributions are estimated.
This approach can, however, lead to several practical problems:
1. It is difficult to assess which analytical distribution is the most appropriate.
2. Unbounded distributions often produce ranges of values that are difficult to interpret behaviorally.
3. Little is known about the tails and their effects on the mean of the estimates.
Alternative non-parametric approaches in the literature
Less restrictive non-parametric or semi-parametric approaches to the problem.
Bounded distributions: often obtained as simple transformations of normals (Train and Sonnier).
Mass point approach: Dong and Koppelman assume that distributions are represented by a finite number of points and use the Bayesian method to recover their mass.
Non-parametric: Fosgerau employs various non-parametric techniques to investigate the distribution of the travel-time savings from a stated choice experiment. This method does not account for repeated observations and applies only to binomial choices.
… more
Global sign conditions: Hensher resolves the problem of behaviorally incoherent sign changes by imposing a global sign condition on the marginal disutility. He adopts a globally constrained Rayleigh distribution for total travel time.
Discrete Mixture of GEV: Hess, Bierlaire and Polak propose discrete mixture of GEV models over a finite set of distinctive support points. The major advantage of this approach is the lack of need for simulations.
Testing distributions: Recently, Fosgerau and Bierlaire have proposed a semi-nonparametric (SNP) specification, based on Legendre polynomials, to test whether a random parameter of a discrete choice model follows a given distribution.
Willingness to pay space: Train and Weeks place distributional assumptions on the willingness to pay and derive the distribution of the coefficients.
CONCLUSIONS!!!
“It is not possible to identify the distribution to use in all situations; the best distribution-fit is highly situation dependent.” Train and Weeks, 2005
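Mixed Logit model estimation
Discrete set of alternatives available for individual i: $A(i)$.
Each alternative $A_j \in A(i)$ has some utility: $U_{ij} = V_{ij}(\beta) + \varepsilon_{ij}$.
Utility maximization principle: choose $A_j$ if $U_{ij} \geq U_{in} \;\; \forall A_n \in A(i)$.
Probability of choice of $A_j$ by individual i:
$$P_{ij} = L_{ij}(\beta) = P\left[ V_{ij}(\beta) + \varepsilon_{ij} \geq V_{in}(\beta) + \varepsilon_{in} \;\; \forall A_n \in A(i) \right]$$
Heterogeneity in parameters inside the population: $\beta = \beta(\gamma, \theta)$, where $\gamma$ is a random vector and $\theta$ a vector of parameters, so that
$$P_{ij}(\theta) = E_\gamma\left[ L_{ij}(\gamma, \theta) \right] = \int L_{ij}(\gamma, \theta) \, f(\gamma) \, d\gamma$$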
Drawing from random variable
How to draw from a univariate continuous random variable X?
Fact: the cumulative distribution function (CDF) evaluated at X is uniform: $F_X(X) \sim U[0, 1]$.
This suggests drawing from a U[0, 1] and applying the inverse operation: $X = F_X^{-1}(U[0, 1])$.
If $F_X$ is unknown, construct $F_X^{-1}$ as a vector in some functional space (a linear combination of basis elements).
Constraint: $F_X$ (and $F_X^{-1}$) must be monotonically increasing.
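The inverse CDF method is easiest to see on a distribution whose inverse is known in closed form. The sketch below uses the exponential distribution as an illustrative example (it is not a distribution used in the lecture), for which $F^{-1}(u) = -\ln(1 - u) / \text{rate}$; the function name is hypothetical.

```python
import math
import random

def exponential_inverse_cdf(u, rate):
    """Inverse CDF of the exponential distribution: F^{-1}(u) = -ln(1 - u) / rate."""
    return -math.log(1.0 - u) / rate

rng = random.Random(0)
rate = 2.0
# random() returns values in [0, 1), so 1 - u stays strictly positive
draws = [exponential_inverse_cdf(rng.random(), rate) for _ in range(20000)]
sample_mean = sum(draws) / len(draws)
print(sample_mean)  # should be close to the theoretical mean 1 / rate
```

The same two steps, a uniform draw followed by a monotone inverse map, carry over unchanged when the inverse CDF is not known and has to be approximated.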
Multivariate distributions
Problem: the inverse CDF method cannot be applied directly to multivariate distributions.
(In theory) decompose the CDF of the multivariate random variable $(X_1, X_2, \ldots, X_d)$ as
$$P_{X_1 X_2 \ldots X_d}(x_1, x_2, \ldots, x_d) = P_{X_1}(x_1) \, P_{X_2 | X_1}(x_2 | x_1) \cdots P_{X_d | X_1 X_2 \ldots X_{d-1}}(x_d | x_1, x_2, \ldots, x_{d-1})$$
The inverse CDF method may then be applied sequentially.
Independent distributions: => all random variables are considered independently; => we can then focus on unidimensional distributions.
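To illustrate the sequential idea, the sketch below draws from a bivariate standard normal with correlation ρ: first X1 from its marginal, then X2 from its conditional distribution given X1, which is normal with mean ρ·x1 and standard deviation sqrt(1 − ρ²). The bivariate normal is chosen only because its conditionals are known in closed form; the helper names are hypothetical.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def uniform_open(rng):
    """Uniform draw kept strictly inside (0, 1), where inv_cdf is defined."""
    return min(max(rng.random(), 1e-12), 1.0 - 1e-12)

def draw_bivariate_normal(rho, rng):
    """Sequential inverse-CDF draw: marginal of X1, then conditional of X2 given X1."""
    x1 = STD_NORMAL.inv_cdf(uniform_open(rng))
    conditional = NormalDist(mu=rho * x1, sigma=math.sqrt(1.0 - rho ** 2))
    x2 = conditional.inv_cdf(uniform_open(rng))
    return x1, x2

rng = random.Random(0)
rho = 0.6
pairs = [draw_bivariate_normal(rho, rng) for _ in range(20000)]

# Sample correlation, computed by hand to avoid version-specific helpers.
n = len(pairs)
m1 = sum(a for a, _ in pairs) / n
m2 = sum(b for _, b in pairs) / n
cov = sum((a - m1) * (b - m2) for a, b in pairs) / n
v1 = sum((a - m1) ** 2 for a, _ in pairs) / n
v2 = sum((b - m2) ** 2 for _, b in pairs) / n
sample_corr = cov / math.sqrt(v1 * v2)
print(sample_corr)  # should be close to rho
```

When the conditionals are not available in closed form this decomposition becomes hard to use, which is one reason the slides move on to independent univariate distributions.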
Univariate distributions
Assumption: each r.v. $X_i$ has a bounded support.
One possibility to approximate $F_X^{-1}$: B-splines of degree p.
Knots vector: $U = \{a, \ldots, a, u_{p+1}, \ldots, u_{m-p-1}, b, \ldots, b\}$
$n = |U| - p - 1$ basis functions $N_{i,p}$, with B-spline curve:
$$C(u) = \sum_{i=0}^{n} N_{i,p}(u) \, P_i$$
where $P_i$ are the control points.
$N_{i,p}$ can be chosen such that $C(u)$ is monotonically increasing if $P_0 \leq P_1 \leq \ldots \leq P_n$.
Example:
$U = \{0, 0, 0, 0, 1/3, 2/3, 1, 1, 1, 1\}$, $p = 3$
$P = \{-15, -3, -0.5, 0.5, 3, 15\}$
[Figure: basis functions and resulting splines.]
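The example can be reproduced with a short implementation of the Cox-de Boor recursion for the basis functions. This is a sketch, not the authors' code: it uses the knot vector, degree, and control points from the slide, indexes the six basis functions from 0 to 5, and treats the right end point u = 1 as part of the last non-empty knot span. The names `basis` and `spline` are hypothetical.

```python
import random

def basis(i, p, u, knots):
    """B-spline basis function N_{i,p}(u) by the Cox-de Boor recursion."""
    if p == 0:
        if knots[i] <= u < knots[i + 1]:
            return 1.0
        # close the last non-empty span so that u == knots[-1] is covered
        if u == knots[-1] and knots[i] < knots[i + 1] == knots[-1]:
            return 1.0
        return 0.0
    left = 0.0
    d1 = knots[i + p] - knots[i]
    if d1 > 0:
        left = (u - knots[i]) / d1 * basis(i, p - 1, u, knots)
    right = 0.0
    d2 = knots[i + p + 1] - knots[i + 1]
    if d2 > 0:
        right = (knots[i + p + 1] - u) / d2 * basis(i + 1, p - 1, u, knots)
    return left + right

def spline(u, knots, control, p):
    """C(u): sum of basis functions weighted by the control points."""
    return sum(basis(i, p, u, knots) * control[i] for i in range(len(control)))

KNOTS = [0, 0, 0, 0, 1 / 3, 2 / 3, 1, 1, 1, 1]
DEGREE = 3
CONTROL = [-15, -3, -0.5, 0.5, 3, 15]

grid = [k / 50 for k in range(51)]
curve = [spline(u, KNOTS, CONTROL, DEGREE) for u in grid]

# Using C as an approximate inverse CDF: map uniform draws through the spline.
rng = random.Random(0)
draws = [spline(rng.random(), KNOTS, CONTROL, DEGREE) for _ in range(5)]
print(curve[0], curve[-1], draws)
```

Because the control points are increasing, the curve is increasing on [0, 1] and runs from the first control point to the last, so the draws are bounded between -15 and 15.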
Log-likelihood maximization
How to maximize the log-likelihood under the monotonicity constraints?
Use of trust-region methods:
– model (difficult) objective functions...
– ... inside a trust region
– sufficient descent on the model
– contract/expand region
– good theory + good practice
• The constraints can be easily dealt with using projections.
Congestion pricing
There is a consensus among economists that congestion pricing represents the single most viable and sustainable approach to reducing traffic congestion.
Congestion pricing works by shifting purely discretionary rush hour highway travel to other transportation modes or to off-peak periods, taking advantage of the fact that the majority of rush hour drivers on a typical urban highway are not commuters.
Although concerns are often expressed, surveys show that drivers support it because it offers them a reliable trip time. Transit and ridesharing advocates appreciate the ability of congestion pricing to generate both funding and incentives to make transit and ridesharing more attractive.
Pricing strategies
There are four main types of pricing strategies, each of which is discussed in more detail later in this section:
• Variably priced lanes, involving variable tolls on separated lanes within a highway, such as Express Toll Lanes or HOT Lanes, i.e. High Occupancy Toll lanes
• Variable tolls on entire roadways – both on toll roads and bridges, as well as on existing toll-free facilities during rush hours
• Cordon charges – either variable or fixed charges to drive within or into a congested area within a city
• Area-wide charges – per-mile charges on all roads within an area that may vary by level of congestion
Real life example - US
• Priced lanes on State Route 91 in Orange County, California.
• HOT lanes on San Diego’s I-15.
• HOV lanes converted to HOT lanes on I-25 in Denver.
• The State of Oregon is currently testing a pricing scheme involving per-mile charges, which it will consider using as a replacement for fuel taxes in the future.
• HOT lanes on the Beltway DC ???
Real life example - Europe
• In 2003, a cordon pricing scheme was introduced in central London.
• A similar scheme functioned in central Stockholm on a trial basis in 2006 from January through July.
• Area pricing was introduced in Milan (Italy) in Fall 2007. The first results on congestion reduction, increase of transit speed and, more importantly, decrease of pollutants in the atmosphere are very encouraging.
Real case study: Estimating WTP on managed lanes
This dataset (called IRIS) is derived from a survey conducted in the region of Brussels (Belgium) in 2002.
The respondents are car users, intercepted during morning peak hours on the ring that gives access to the city from the suburban areas.
They were presented with up to three scenarios, each containing four choice options: 1. car, 2. car with delayed departure time, 3. public transport and 4. car on a High Occupancy Vehicle lane (only prospective).
The original specification contained 18 exogenous variables, of which seven are randomly distributed.
2602 observations belonging to 871 individuals are available.
Model Specification
In total six model specifications have been estimated:
1. times normally distributed, cost constant (T N);
2. times and cost normally distributed (T-C N);
3. times log-normally distributed, cost constant (T L);
4. times and cost log-normally distributed (T-C L);
5. times B-Spline distributed, cost constant (T BS);
6. times and cost B-Spline distributed (T-C BS).
For the B-Spline coefficients, seven coefficients (P1, P2, ..., P7) have been estimated, where P1 and P7 give the bounds of the distribution, and the knot vector is defined on the percentiles 0, 0.25, 0.5, 0.75, and 1.
Goodness of Fit
| Distr. | T N | T-C N | T L | T-C L | T BS | T-C BS |
|---|---|---|---|---|---|---|
| Final Log-Lik | -3.1460 | -3.1399 | -3.1604 | -3.1511 | -3.1453 | -3.1339 |
Congested Travel Time
Free-Flow Travel Time
Cost
Willingness to Pay
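| Attribute | Quantile | T N | T-C N | T L | T-C L | T BS | T-C BS |
|---|---|---|---|---|---|---|---|
| Cong Time | 25% | 4.75 | 3.36 | 6.41 | 16.10 | 7.20 | 2.42 |
| Cong Time | 50% | 15.31 | 4.52 | 12.64 | 37.61 | 11.05 | 4.36 |
| Cong Time | 75% | 25.89 | 10.49 | 24.88 | 87.77 | 26.12 | 16.68 |
| Cong Time | mean | 15.32 | 12.96 | 20.90 | 13.17 | 7.90 | 4.74 |
| Free Flow Time | 25% | 2.56 | 2.10 | 5.33 | 14.03 | 0.89 | 0.70 |
| Free Flow Time | 50% | 13.72 | 6.24 | 10.84 | 32.65 | 13.53 | 3.07 |
| Free Flow Time | 75% | 24.91 | 11.35 | 22.02 | 75.74 | 22.18 | 14.35 |
| Free Flow Time | mean | 13.74 | 11.61 | 18.80 | 12.35 | 12.64 | 7.47 |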
Willingness to pay (time and cost truncated at zero)
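| Attribute | Quantile | T N | T-C N | T BS | T-C BS |
|---|---|---|---|---|---|
| Cong Time | 25% | 10.25 | 4.65 | 7.84 | 4.56 |
| Cong Time | 50% | 18.54 | 5.59 | 15.03 | 6.95 |
| Cong Time | 75% | 27.98 | 12.54 | 28.47 | 17.01 |
| Cong Time | mean | 19.93 | 7.94 | 18.35 | 7.83 |
| Free Flow Time | 25% | 9.64 | 4.40 | 9.55 | 3.96 |
| Free Flow Time | 50% | 17.97 | 4.87 | 17.11 | 5.60 |
| Free Flow Time | 75% | 27.68 | 12.62 | 25.67 | 15.23 |
| Free Flow Time | mean | 19.58 | 7.92 | 18.59 | 7.65 |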
Comments on WTP
To summarize, we found that non-parametric B-Splines
a) on both time and cost coefficients provide the best fit,
b) significantly reduce the percentage of the population showing positive values for cost but leave the proportion of positive time values unchanged,
c) give VTTS ranges that do not suffer from fat-tail effects.
This suggests that the lognormal assumption, even if more coherent with econometric theory, is not reasonable here, and that the non-parametric approach has the advantage over the normal of bounding the distributions.
Computational time does increase with model flexibility; when three parameters are specified as non-parametric, optimization time is about 74 minutes on a MacBook Pro, which is 6 times the time required to estimate three normal distributions instead.
Extensions
Application to Finance
The algorithm developed has been applied to a financial problem concerning central bank interventions and dynamics in the foreign exchange market.
The data used for our analysis have been collected from the Japanese Ministry of Finance’s website (where they are publicly available) for the period April 1991 to September 2004.
There are four possible outcomes of the central bank decision:
1. no intervention (W),
2. public intervention (Z),
3. secret intervention detected by the market (X),
4. secret intervention not detected by the market.
| Process | Variable | Description |
|---|---|---|
| W (no intervention decision) | deviation short | Absolute level of short-term exchange rate deviation (%) |
| W | deviation med | Absolute level of medium-term exchange rate deviation (%) |
| W | deviation long | Absolute level of long-term exchange rate deviation (%) |
| W | misalignment | Absolute level of exchange rate misalignment (%) |
| W | statement | 1 if authorities made a statement expressing some discomfort with the exchange rate or confirming/discussing the intervention on the day of the operation |
| W | intervention_{t-1} | 1 if there was an official intervention the day before |
| W | RV_{t-1} | Exchange rate realized volatility of preceding day, estimated at the end of the day |
| Z (public process) | Leaning | 1 if the intervention tries to reverse recent exchange rate trend |
| Z | previous reported success | 1 if the last detected intervention was a success |
| Z | inconsistence | 1 if the intervention direction is inconsistent with the reduction of the exchange |
| Z | sum statement | Number of verbal interventions from the authorities signaling a discomfort with the exchange rate in the 5 days before the intervention |
| X (detection process) | Amount | Amount invested in the daily intervention |
| X | coord | 1 if intervention is concerted |
| X | success | 1 if the intervention moves the exchange rate in the desired direction |
| X | cluster | 1 if there is at least one detected intervention over the last 5 preceding days |
“Amount” distribution
Dynamic model of activity-type choice and scheduling
Objectives
• This paper presents a dynamic model for activity-type traveler choice and scheduling estimated on a six-week travel diary.
• The main focus of the study is the inclusion of past history of activity involvement and its influence on current activity choice.
• The econometric formulation adopted explicitly accounts for both correlation across alternatives and state dependency.
• This is intended to be a first contribution to the evolution of demand models into a dynamic activity-based framework.
State of art
Activity-based models
The need for a more behaviorally sound framework has led to a new generation of transport model systems called activity-based models: Bowman and Ben-Akiva, 2000; Nagel and Rickert, 2001; Bhat et al., 2004; Arentze and Timmermans, 2004; Pendyala et al., 2004.
Day-to-day variability
Jones and Clark study the policy implication of variability analysis and encourage the collection of multi-day travel surveys.
Eric Pas distinguishes long-term patterns from daily behavior and finds that the latter is independent of individual characteristics.
Hanson and Huff, by using a specific measure of repetition and variability, conclude that a one-week record of travel does not capture longer-term travel behavior.
Dynamic models
Hirsh et al. (1986) estimate a parametric model of dynamic decision-making process for weekly shopping activity behavior.
Mahmassani and Chang (1986) investigate the dynamics of departure time of urban commuters in a series of simulation experiments.
Dynamic models have been applied more extensively to explain car ownership behavior.
European Transport Conference, Strasbourg, 18-20 September 2006
Framework and Data
4952 activity episodes, 3212 daily schedules, 773 weekly schedules, 144 individual schedules.

| Working day | Non-working day |
|---|---|
| Morning tour (H) | Morning tour (H) |
| Morning commute (H, W) | Principal out-leg (H, P) |
| Midday tour (W) | Principal tour (H) |
| Evening commute (H, W) | Principal ret-leg (H, P) |
| Evening tour (H) | Evening tour (H) |
Mobidrive Number of activity episodes
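| Day type | Activity scheduling | Shopping | Leisure | Personal business | Pick up / drop off | No extra activities | Home for lunch | All tour types |
|---|---|---|---|---|---|---|---|---|
| Working day | Morning tour | 41 | 14 | 41 | 5 | | | 101 |
| Working day | Morning commute | 44 | 24 | 65 | 26 | | | 159 |
| Working day | Evening commute | 207 | 103 | 105 | 40 | | | 455 |
| Working day | Evening tour | 143 | 376 | 90 | 54 | | | 663 |
| Working day | Work as the only activity | | | | | 547 | | 547 |
| Working day | Midday pattern | | | | | | 159 | 159 |
| Non working day | Morning tour | 372 | 126 | 163 | 72 | | | 733 |
| Non working day | Principal tour outbound leg | 88 | 35 | 90 | 54 | | | 267 |
| Non working day | Principal tour | 404 | 658 | 251 | 64 | | | 1377 |
| Non working day | Principal tour return leg | 73 | 59 | 40 | 14 | | | 186 |
| Non working day | Evening tour | 97 | 93 | 62 | 53 | | | 305 |
| Total | | 1469 | 1488 | 907 | 382 | | | 4952 |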
Model formulation: Mixed logit with repeated observations
• We develop this dynamic activity type choice model under the discrete choice analysis theory.
• The focus of the attention is on mixed logit models.
• The formulation adopted here is able to deal with correlation over alternatives in the stochastic portion of utility, and to allow efficient estimation in the presence of repeated choices by the same respondent.
Mixed logit with error components…
$$U_{njt} = \beta_n' x_{njt} + \mu_n' z_{njt} + \varepsilon_{njt}$$
where:
$x_{njt}$ and $z_{njt}$ are vectors of observed variables relating to alternative j, $\beta_n$ is a vector of unobserved fixed coefficients, $\mu_n$ is a vector of random terms with zero mean, $\varepsilon_{njt}$ is an unobserved random term i.i.d. extreme value distributed.
…and repeated observations
Considering a sequence of alternatives $i = \{i_1, \ldots, i_T\}$, one for each time period $t$, the probability that the person makes this sequence of choices is the product of logit formulas:
$$L_{ni}(\beta, \mu) = \prod_{t=1}^{T} \frac{e^{\beta_n' x_{n i_t t} + \mu_n' z_{n i_t t}}}{\sum_j e^{\beta_n' x_{njt} + \mu_n' z_{njt}}}$$
We consider here four groups k:
• 1. an individual activity episode,
• 2. the activity episodes of a person day,
• 3. the activity episodes of a person week,
• 4. the activity episodes of an individual.
The choice set structure

Working day
| Activity-type | Scheduling |
|---|---|
| | 1. Just work |
| | 2. Home for lunch |
| Shopping | 3. Morning tour; 4. Morning commute; 5. Evening commute; 6. Evening tour |
| Leisure | 7. Morning tour; 8. Morning commute; 9. Evening commute; 10. Evening tour |
| Personal Business | 11. Morning tour; 12. Morning commute; 13. Evening commute; 14. Evening tour |
| Pick up Drop off | 15. Morning tour; 16. Morning commute; 17. Evening commute; 18. Evening tour |

Non working day
| Activity-type | Scheduling |
|---|---|
| Shopping | 19. Morning tour; 20. Principal tour outbound leg; 21. Principal tour; 22. Principal tour return leg; 23. Evening tour |
| Leisure | 24. Morning tour; 25. Principal tour outbound leg; 26. Principal tour; 27. Principal tour return leg; 28. Evening tour |
| Personal Business | 29. Morning tour; 30. Principal tour outbound leg; 31. Principal tour; 32. Principal tour return leg; 33. Evening tour |
| Pick up Drop off | 34. Morning tour; 35. Principal tour outbound leg; 36. Principal tour; 37. Principal tour return leg; 38. Evening tour |
List of independent variables - I

| Level | Variable | Description / Categories | Type |
|---|---|---|---|
| Day | Activity duration (min) | It refers to the activity to be undertaken and is randomly drawn from the vector of durations of activities with the same purpose, reported by the same individual over the entire survey period. | continuous |
| Day | Time budget (min) | It is calculated as 24 hours minus the time spent on previous activities (home stay included) and previous travel. | continuous |
| Day | Available time before work (min) | It is the time available between the shops’ opening hour (8:00 am) and the arrival time at work. | continuous |
| Day | Available time after work (min) | It is the time available between the departure from work and the shops’ closing hour (6:00 pm). | continuous |
List of independent variables - II

| Level | Variable | Description / Categories | Type |
|---|---|---|---|
| Week | High week episode | It is a dummy variable, which is 1 if, for the week considered, the number of activity episodes with a specific purpose is greater than two. | dummy |
| Six-week | Last time | It is the number of days elapsed between the day considered and the day when the same activity was undertaken. | discrete |
| Six-week | Immobile days | It is the number of days that the individual spent at home between the first day of the reported period and the day considered. | discrete |
| Level Of Service | Logsum | It is the logsum of the mode choice model. | continuous |
| Individual Socio-Economic Variables | Age | Age 6-18 / Age 26-35 / Age 51-65 | dummy |
| Individual Socio-Economic Variables | Presence of children | Number of children under 12 | dummy |
| Individual Socio-Economic Variables | Professional Status | Number of working hours per week | continuous |
| Individual Socio-Economic Variables | Professional Status | Full Time worker | dummy |
| Individual Socio-Economic Variables | Professional Status | Female and employed part-time | dummy |
Model Results I

| Variable | Alt. | Logit: Estimates | t-stat | Mixed – err comps: Estimates | t-stat | Mixed - Day: Estimates | t-stat | Mixed - Week: Estimates | t-stat | Mixed - Individuals: Estimates | t-stat |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Activity duration | S | -0.0007 | 1.04 | -0.0007 | 0.97 | -0.0007 | 1.03 | -0.0007 | 1.03 | -0.0007 | 0.78 |
| Activity duration | L | -0.0011 | 2.94 | -0.0012 | 2.88 | -0.0011 | 3.11 | -0.0011 | 2.81 | -0.0012 | 1.84 |
| Activity duration | PB | -0.0011 | 1.73 | -0.0014 | 1.91 | -0.0011 | 1.78 | -0.0011 | 1.75 | -0.0011 | 1.28 |
| Activity duration | PD | -0.0012 | 1.02 | -0.0020 | 1.39 | -0.0013 | 1.13 | -0.0012 | 0.94 | -0.0015 | 0.70 |
| Activity duration | LH | -0.0058 | 5.18 | -0.0049 | 4.27 | -0.0056 | 5.07 | -0.0059 | 5.50 | -0.0063 | 6.34 |
| Time budget | S | -0.2830 | 14.00 | -0.3969 | 11.40 | -0.2959 | 14.40 | -0.2857 | 14.50 | -0.2930 | 16.48 |
| Time budget | L | -0.6703 | 27.74 | -0.8900 | 17.81 | -0.6931 | 26.00 | -0.6828 | 28.82 | -0.7166 | 36.77 |
| Time budget | PB | -0.3185 | 16.53 | -0.4618 | 12.43 | -0.3283 | 16.75 | -0.3201 | 16.75 | -0.3283 | 18.67 |
| Time budget | PD | -0.4410 | 18.97 | -0.7109 | 11.92 | -0.4712 | 17.84 | -0.4650 | 19.83 | -0.4854 | 23.28 |
| Available time before work | S | -0.0016 | 2.55 | -0.0034 | 3.11 | -0.0020 | 3.09 | -0.0017 | 2.24 | -0.0019 | 2.17 |
| Available time post work | S | 0.0040 | 7.58 | 0.0073 | 6.58 | 0.0044 | 7.68 | 0.0041 | 8.06 | 0.0043 | 6.21 |
| High week episode | S | 1.5222 | 18.94 | 2.5051 | 11.64 | 1.6169 | 17.08 | 1.5326 | 15.45 | 1.4756 | 12.07 |
| High week episode | L | 0.4899 | 5.20 | 0.7163 | 5.40 | 0.5114 | 4.99 | 0.4720 | 4.67 | 0.4656 | 3.07 |
| High week episode | PB | 1.1490 | 11.41 | 1.7147 | 8.36 | 1.1856 | 10.35 | 1.1556 | 9.14 | 1.1393 | 7.40 |
| High week episode | PD | 2.0734 | 15.29 | 3.4511 | 10.13 | 2.1929 | 13.96 | 2.0999 | 11.48 | 1.8209 | 8.72 |
| Last time | S | 0.5598 | 7.65 | 0.9304 | 6.33 | 0.6214 | 7.29 | 0.5647 | 7.45 | 0.5683 | 6.79 |
| Last time | L | 4.0883 | 23.70 | 5.9394 | 14.58 | 4.2972 | 21.67 | 4.2187 | 24.56 | 4.4828 | 23.82 |
| Last time | PB | 0.2022 | 9.51 | 0.3738 | 8.43 | 0.2220 | 9.54 | 0.2023 | 8.30 | 0.1978 | 7.32 |
| Last time | PD | 0.0020 | 5.47 | 0.0039 | 5.14 | 0.0025 | 6.38 | 0.0026 | 7.24 | 0.0030 | 7.75 |
| Last time leisure | W | -0.2789 | 1.90 | 0.1271 | 0.72 | -0.2345 | 1.59 | -0.2791 | 1.98 | -0.2766 | 2.05 |
| Last time PB | W | -0.1242 | 2.36 | -0.1064 | 1.75 | -0.1234 | 2.24 | -0.1252 | 2.20 | -0.1346 | 2.43 |
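Model Results II

| Variable | Alt. | Logit: Estimates | t-stat | Mixed logit err comp: Estimates | t-stat | Mixed logit - Day: Estimates | t-stat | Mixed logit - Week: Estimates | t-stat | Mixed logit - Ind: Estimates | t-stat |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Immobile days | S | 0.0056 | 2.65 | 0.0085 | 3.24 | 0.0058 | 2.92 | 0.0058 | 2.73 | 0.0060 | 2.16 |
| Immobile days | L | 0.0056 | 2.80 | 0.0088 | 3.99 | 0.0058 | 3.26 | 0.0057 | 3.21 | 0.0061 | 2.77 |
| Immobile days | PB | 0.0059 | 2.94 | 0.0093 | 4.06 | 0.0062 | 3.36 | 0.0061 | 2.81 | 0.0063 | 1.98 |
| Immobile days | PD | 0.0056 | 2.75 | 0.0086 | 3.69 | 0.0058 | 3.15 | 0.0058 | 3.16 | 0.0062 | 2.94 |
| Logsum | All | 0.1099 | 1.90 | 0.1738 | 1.92 | 0.1207 | 1.90 | 0.1460 | 2.38 | 0.2074 | 3.04 |
| Age | L | -0.3866 | 1.77 | -0.5175 | 1.71 | -0.3920 | 1.35 | -0.4364 | 1.46 | -0.6407 | 0.91 |
| Age | PD | 0.3956 | 2.97 | 0.6629 | 2.74 | 0.4111 | 2.70 | 0.4673 | 2.41 | 0.5624 | 1.76 |
| Age | LH | -0.4198 | 2.51 | -0.1579 | 0.91 | -0.3929 | 2.42 | -0.4054 | 2.33 | -0.3111 | 2.16 |
| Number of children under 12 | L | 0.1805 | 2.42 | 0.3369 | 3.28 | 0.1929 | 2.48 | 0.1814 | 2.27 | 0.2095 | 1.29 |
| Number of children under 12 | PD | 0.4654 | 5.49 | 0.8233 | 5.13 | 0.5052 | 4.97 | 0.5302 | 4.57 | 0.7093 | 4.20 |
| N of working hours per week | W | 0.0273 | 6.81 | 0.0336 | 6.72 | 0.0281 | 5.92 | 0.0272 | 7.04 | 0.0262 | 7.07 |
| Full time worker | S | -0.7160 | 3.27 | -1.4044 | 5.10 | -0.8085 | 3.87 | -0.7294 | 3.91 | -0.5953 | 3.38 |
| Full time worker | L | -1.0030 | 4.55 | -1.6624 | 6.62 | -1.0914 | 5.29 | -1.0212 | 5.41 | -0.9009 | 4.29 |
| Full time worker | PB | -0.5727 | 2.61 | -1.0877 | 4.08 | -0.6433 | 3.13 | -0.5896 | 3.27 | -0.4504 | 2.91 |
| Full time worker | PD | -0.4398 | 1.83 | -0.9277 | 2.87 | -0.4908 | 2.05 | -0.4840 | 2.02 | -0.3161 | 0.91 |
| Full time worker | W | -1.7173 | 6.91 | -2.5418 | 8.63 | -1.8169 | 6.81 | -1.7321 | 7.95 | -1.6430 | 9.43 |
| Female and part time | S | 0.5083 | 2.42 | 0.5245 | 1.82 | 0.5113 | 2.33 | 0.5108 | 2.26 | 0.5929 | 2.74 |
| Female and part time | L | 0.3888 | 1.79 | 0.4928 | 1.85 | 0.3848 | 1.70 | 0.3745 | 1.81 | 0.3512 | 1.27 |
| Female and part time | PB | 0.5978 | 2.86 | 0.7036 | 2.52 | 0.5969 | 2.72 | 0.5999 | 2.57 | 0.6916 | 3.44 |
| Female and part time | PD | 0.6523 | 2.43 | 0.7898 | 1.99 | 0.6956 | 2.36 | 0.6908 | 1.96 | 0.9444 | 1.27 |
| Error components | S | - | - | 2.2158 | 8.48 | 0.6295 | 5.78 | 0.0146 | 0.02 | 0.1801 | 1.41 |
| Error components | L | - | - | 0.0118 | 0.01 | 0.1337 | 0.31 | 0.2444 | 2.10 | 0.5274 | 6.02 |
| Error components | PB | - | - | 2.0645 | 7.93 | -0.4249 | 2.53 | -0.0012 | 0.00 | -0.0402 | 0.08 |
| Error components | PD | - | - | 2.5141 | 7.88 | 0.6895 | 3.66 | 0.5677 | 5.16 | 0.5688 | 2.69 |

| Statistic | Logit | Mixed logit err comp | Mixed logit - Day | Mixed logit - Week | Mixed logit - Ind |
|---|---|---|---|---|---|
| Number of observations | 4952 | 4952 | 4952 | 4952 | 4952 |
| N. of obs with repetitions | 4952 | 4952 | 3212 | 773 | 144 |
| Log likelihood at zero | -12268.27 | -12268.27 | -12268.27 | -12268.27 | -12268.27 |
| Log likelihood final | -9902.19 | -9859.29 | -9891.00 | -9895.56 | -9857.51 |
| rho squared adjusted | 0.18951979 | 0.19269057 | 0.19010586 | 0.18973417 | 0.19283566 |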
Model fit (Final Log-likelihood / Rho-squared adjusted)
Each cell gives the final log-likelihood / adjusted rho-squared.

| Model | Degrees of freedom | Logit | Mixed – err comps | Mixed - Day | Mixed - Week | Mixed - Individuals |
|---|---|---|---|---|---|---|
| Socio-economic variables | 21 | -11952.60 / 0.02434 | -11845.09 / 0.03280 | -11850.82 / 0.03231 | -11685.33 / 0.04580 | -11351.77 / 0.07299 |
| Day behavioral variables | 32 | -11393.82 / 0.06899 | convergence not achieved | -11310.30 / 0.07548 | -11162.79 / 0.08750 | -10871.28 / 0.11126 |
| Week behavioral variables | 36 | -10528.83 / 0.13917 | -10503.48 / 0.14092 | -10516.96 / 0.13982 | -10525.89 / 0.13909 | -10451.30 / 0.14517 |
| Long-term behavioral variables | 45 | -9902.19 / 0.18952 | -9859.29 / 0.19270 | -9891.06 / 0.19010 | -9895.56 / 0.18973 | -9857.51 / 0.19283 |
Application: Week DAY
Application: Weekend DAY
[Figure: two bar charts, "Day 6 - Saturday" and "Day 7 - Sunday", comparing forecast (F) and real (R) numbers of activity episodes (vertical axis 0 to 800) by category: Shopping, Leisure, PUDO, PersBus, No extra work act, Lunch Home and Total for working days (WD); Shopping, Leisure, PUDO, PersBus and Total for non-working days (NonWD).]
Application: Week of survey
Conclusions
• The paper has implemented a model of activity type and timing choice.
• The formulation includes variables describing the dynamics of the day.
• The socio-demographic variables, dominant in other models, lose their prominence.
• The personal preferences, here captured exclusively by the activity-type-specific error components, are associated with leisure and picking up and dropping off people.