Conditional inference trees (CTREEs) in dynamic microsimulation


DESCRIPTION

The CTREE algorithm uses statistical tests to group observations with similar outcomes according to their explanatory variables. This data mining approach is found to be a useful tool for quantifying a discrete response variable conditional on multiple individual characteristics, and it is generally believed to capture covariate interactions better than traditional parametric discrete choice models, i.e. logit and probit models.


Conditional inference trees in dynamic microsimulation - modelling transition probabilities in the SMILE model

4th General Conference of the International Microsimulation Association

Canberra, Wednesday 11th to Friday 13th December 2013

Niels Erik Kaaber Rasmussen

DREAM

SMILE

• Microsimulation model

• Simulating household and person level events

• Using stochastic drawing (Monte Carlo)
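As a minimal sketch of what stochastic drawing means here: an event is simulated by comparing a uniform random number with the event's transition probability. The function name `simulate_event`, the probability 0.15 and the seed are illustrative assumptions, not taken from SMILE.

```python
import random


def simulate_event(probability, rng):
    """Return True if the event occurs, given its transition probability."""
    # rng.random() is uniform on [0, 1), so the event occurs with the given probability.
    return rng.random() < probability


rng = random.Random(42)  # fixed seed so the sketch is reproducible
n_persons = 100_000
# Hypothetical moving probability of 0.15 applied to every simulated person.
moves = sum(simulate_event(0.15, rng) for _ in range(n_persons))
share = moves / n_persons  # close to 0.15 when many persons are simulated
```

Across many simulated persons the share of events approaches the transition probability, while each individual outcome stays random.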

Transition probabilities in SMILE

• Used for demographic, socioeconomic and housing-related events

• Transition probabilities based on rich historical data

Raw transition probabilities

• Transition probability = historical frequency

• Behaviour depends on many characteristics

• Data is too sparse

• Too much noise
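The raw approach can be sketched as a frequency count per combination of characteristics. The records below are made-up illustrations, and `raw_transition_probabilities` is a hypothetical helper name.

```python
from collections import Counter


def raw_transition_probabilities(records):
    """records: iterable of (cell, event_occurred) pairs.

    Returns the historical event frequency for each cell, where a cell is one
    combination of characteristics.
    """
    totals, events = Counter(), Counter()
    for cell, occurred in records:
        totals[cell] += 1
        events[cell] += int(occurred)
    return {cell: events[cell] / totals[cell] for cell in totals}


# Hypothetical observations: (characteristics, did the person move?)
records = [
    (("age 25", "rented flat"), True),
    (("age 25", "rented flat"), False),
    (("age 25", "rented flat"), False),
    (("age 25", "rented flat"), True),
    (("age 87", "farmhouse"), True),  # a cell with a single observation
]
probs = raw_transition_probabilities(records)
```

The cell with a single observation gets a probability of 1.0, which illustrates the noise problem: with many characteristics, most cells hold few or no observations.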

Moving probabilities

• The probability of moving depends on:
– Age (109)
– Family type and gender (3)
– Region (11)
– Education (6 * 2)
– Origin (3 * 4 * 2 * 3)
– Employed (2 * 2)
– Children in family (2)
– Dwelling type (5)
– Dwelling kind (9)
– Dwelling area (8)
– Dwelling est. (12)
– Town size (5)

• Total of 537 billion combinations

• 532,655 combinations in data
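The 537 billion figure is the product of the category counts listed on the slide. The short check below reproduces it; the dictionary keys are shorthand labels chosen for this sketch.

```python
import math

# Number of categories per characteristic, as listed on the slide.
categories = {
    "age": 109,
    "family type and gender": 3,
    "region": 11,
    "education": 6 * 2,
    "origin": 3 * 4 * 2 * 3,
    "employed": 2 * 2,
    "children in family": 2,
    "dwelling type": 5,
    "dwelling kind": 9,
    "dwelling area": 8,
    "dwelling est.": 12,
    "town size": 5,
}

total = math.prod(categories.values())  # all possible combinations
observed = 532_655                      # combinations present in the data
share = observed / total                # fraction of cells with any observation
print(f"{total:,} combinations, {share:.2e} observed")
```

Only about one in a million of the possible combinations occurs in the data, which is why raw frequencies cannot be used cell by cell.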

Alternatives

• Ignore (possibly important) background variables

• Use logit or probit models

• Detailed econometric analysis

• Conditional inference trees (CTREEs)

Conditional inference trees

• Decision tree

• Groups observations in a way so that there’s a:
– minimum of variation within a group
– maximum variation across groups

• Data mining approach

• Based on statistical tests

Example tree, probability of moving

CTREE algorithm

1. Test for independence between any of the explanatory variables and the response

a) Stop if p > 0.05

2. Select the input variable with strongest association to the response.

3. Find best binary split point for the selected input variable.

4. Recursively repeat from step 1 until a stop criterion is reached.
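The four steps can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the implementation used in SMILE: it handles only numeric covariates and a 0/1 response, uses the asymptotic chi-square(1) approximation of the standardized linear statistic, (n - 1) r², instead of exact permutation tests, and applies no multiplicity adjustment. The names `assoc_stat`, `p_value`, `best_split` and `grow`, the `min_size` rule and the synthetic data are all hypothetical.

```python
import math
import random


def assoc_stat(x, y):
    """Standardized association statistic (n - 1) * r^2 between x and y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant variable carries no evidence of association
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return (n - 1) * sxy ** 2 / (sxx * syy)


def p_value(stat):
    """Upper tail of the chi-square distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))


def best_split(x, y):
    """Step 3: cut point whose two groups differ most in the response."""
    cuts = sorted(set(x))[:-1]
    return max(cuts, key=lambda c: assoc_stat([float(a <= c) for a in x], y))


def grow(rows, y, alpha=0.05, min_size=20):
    """rows: list of dicts of numeric covariates; y: list of 0/1 responses."""
    leaf = {"n": len(y), "prob": sum(y) / len(y)}
    # Step 1: test each covariate for independence from the response.
    stats = {v: assoc_stat([r[v] for r in rows], y) for v in rows[0]}
    # Step 2: select the covariate with the strongest association.
    var = max(stats, key=stats.get)
    if p_value(stats[var]) > alpha:
        return leaf  # step 1a: stop, independence cannot be rejected
    cut = best_split([r[var] for r in rows], y)
    left = [i for i, r in enumerate(rows) if r[var] <= cut]
    right = [i for i, r in enumerate(rows) if r[var] > cut]
    if min(len(left), len(right)) < min_size:
        return leaf  # extra stop criterion (an assumption of this sketch)
    # Step 4: recurse on both groups.
    return {
        "var": var,
        "cut": cut,
        "left": grow([rows[i] for i in left], [y[i] for i in left], alpha, min_size),
        "right": grow([rows[i] for i in right], [y[i] for i in right], alpha, min_size),
    }


# Synthetic data: the event probability depends on age only.
rng = random.Random(0)
rows = [{"age": rng.randint(18, 80), "noise": rng.randint(0, 9)} for _ in range(2000)]
y = [int(rng.random() < (0.4 if r["age"] < 30 else 0.1)) for r in rows]
tree = grow(rows, y)
```

Each terminal group stores the observed event frequency in `prob`, which is the quantity a microsimulation would use as the transition probability for persons falling into that group.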


Moving probabilities

• The probability of moving depends on the 12 characteristics listed above

• Total of 537 billion combinations

• 532,655 combinations in data

• CTREE contains 2,180 terminal groups

Probability of not starting to study


Age-dependent employment


CTREEs

• Implements a general framework for conditional inference trees

• Works for continuous, censored, ordered, nominal and multivariate response variables

• Uses permutation tests

Curse of Dimensionality

• The computational complexity of constructing a CTREE multiplies when adding additional explanatory variables

• Number of possible permutations is too big

• Draw random permutations using Monte Carlo sampling
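A Monte Carlo permutation test can be sketched as follows: instead of enumerating all n! orderings of the response, a fixed number of random shuffles is drawn and the share of shuffles with an association at least as strong as the observed one is used as the p-value. The function name, the covariance-based statistic and the default of 2,000 draws are assumptions made for this illustration.

```python
import random


def mc_permutation_p_value(x, y, n_perm=2000, seed=0):
    """Approximate permutation p-value for independence of x and y."""
    rng = random.Random(seed)

    def stat(a, b):
        # Absolute (unnormalized) covariance as the association measure.
        mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
        return abs(sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b)))

    observed = stat(x, y)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # one random permutation of the response
        if stat(x, y_perm) >= observed:
            hits += 1
    # The +1 terms count the observed ordering itself as one permutation.
    return (hits + 1) / (n_perm + 1)


x = list(range(30))
y_dependent = [1 if v >= 15 else 0 for v in x]  # response fully determined by x
p_dependent = mc_permutation_p_value(x, y_dependent)
p_constant = mc_permutation_p_value(x, [1] * 30)  # constant response, no association
```

With 30 observations there are already 30! (about 2.7 × 10^32) orderings, so sampling a few thousand permutations is the practical alternative to full enumeration.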
