NEW FEATURES OF VIM - VISUALIZATION AND IMPUTATION OF MISSING VALUES

Alexander Kowarik, Matthias Templ
Statistik Austria (www.statistik.at) - Die Informationsmanager
April 2014


Implemented Imputation Methods

- Hot-deck Imputation
- k Nearest Neighbour Imputation
- Regression Imputation
- Iterative Robust Model-Based Imputation


Hot-deck Imputation

Old-fashioned but fast ⇒ a new implementation based on data.table is coming soon

hotdeck(data, variable = NULL, ord_var = NULL,
        domain_var = NULL, makeNA = NULL, NAcond = NULL,
        impNA = TRUE, donorcond = NULL, imp_var = TRUE,
        imp_suffix = "imp")

- data: data.frame
- variable: variables to be imputed
- ord_var: variables used for sorting
- domain_var: variables used for building imputation classes
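To make the roles of ord_var and domain_var concrete, here is a minimal base-R sketch of a sequential hot-deck. It is an illustration under stated assumptions, not the VIM implementation: hotdeck_sketch() and the toy columns are hypothetical, only one variable is imputed, and the donor is simply the nearest preceding observed record after sorting within the imputation class.

```r
# Sequential hot-deck sketch in base R (illustrative; not VIM's implementation).
# hotdeck_sketch() and the toy columns below are hypothetical names.
hotdeck_sketch <- function(data, variable, ord_var, domain_var) {
  data$.row <- seq_len(nrow(data))            # remember the original row order
  parts <- split(data, data[[domain_var]])    # one imputation class per domain
  parts <- lapply(parts, function(d) {
    d <- d[order(d[[ord_var]]), ]             # sort within the class
    x <- d[[variable]]
    obs <- !is.na(x)
    # position of the most recent observed value (0 if none seen yet)
    last_obs <- cummax(ifelse(obs, seq_along(x), 0L))
    # leading NAs have no preceding donor: fall back to the first observed value
    first <- if (any(obs)) which(obs)[1] else NA_integer_
    last_obs[last_obs == 0L] <- first
    d[[variable]] <- x[last_obs]
    d
  })
  out <- do.call(rbind, parts)
  out <- out[order(out$.row), ]               # restore the original row order
  out$.row <- NULL
  rownames(out) <- NULL
  out
}

toy <- data.frame(
  region = c("A", "A", "A", "B", "B", "B"),
  age    = c(30, 20, 40, 25, 35, 45),
  income = c(NA, 1000, 3000, 1500, NA, 2500)
)
imputed <- hotdeck_sketch(toy, "income", ord_var = "age", domain_var = "region")
```

In this toy example each missing income is filled from the record with the next-lower age in the same region, so donors never cross class boundaries.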


kNN

kNN(data, variable = colnames(data), metric = NULL,
    k = 5, dist_var = colnames(data), weights = NULL,
    numFun = median, catFun = maxCat, makeNA = NULL,
    NAcond = NULL, impNA = TRUE, donorcond = NULL,
    mixed = vector(), mixed.constant = NULL,
    trace = FALSE, imp_var = TRUE, imp_suffix = "imp",
    addRandom = FALSE)

- dist_var: variables used for the distance computation
- weights: weights for the distance computation
- numFun, catFun: functions for aggregating numeric or categorical variables (for categorical variables: sampleCat, maxCat)
- addRandom: adds a random value, with very low weight, to the distance computation
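The interplay of dist_var, k and numFun can be sketched in a few lines of base R. This is a simplified illustration, not VIM's code: knn_sketch() is a hypothetical helper that imputes one numeric variable, assumes the distance variables are complete, numeric and non-constant, and uses a range-scaled absolute difference (the numeric part of a Gower-type distance).

```r
# kNN imputation sketch for one numeric variable (illustrative; not VIM's code).
# knn_sketch() is a hypothetical helper; dist_var columns are assumed to be
# complete, numeric and non-constant.
knn_sketch <- function(data, variable, dist_var, k = 5, numFun = median) {
  X <- as.matrix(data[, dist_var, drop = FALSE])
  rng <- apply(X, 2, function(col) diff(range(col)))
  X <- sweep(X, 2, rng, "/")                  # scale each column by its range
  y <- data[[variable]]
  donors <- which(!is.na(y))                  # only observed records donate
  for (i in which(is.na(y))) {
    # mean scaled absolute difference between record i and every donor
    d <- rowMeans(abs(sweep(X[donors, , drop = FALSE], 2, X[i, ], "-")))
    nearest <- donors[order(d)[seq_len(min(k, length(donors)))]]
    y[i] <- numFun(data[[variable]][nearest]) # aggregate the k donor values
  }
  data[[variable]] <- y
  data
}

toy <- data.frame(
  x1 = c(1, 2, 3, 10, 11, 12),
  x2 = c(1, 1, 2, 9, 10, 10),
  y  = c(5, 6, NA, 50, NA, 54)
)
imputed <- knn_sketch(toy, "y", dist_var = c("x1", "x2"), k = 2)
```

With k = 2 and the median as numFun, each imputed value is the midpoint of its two nearest donors, so a missing record in the low cluster is never filled from the high cluster.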


Regression Imputation

Model definition per variable

regressionImp(formula, data, family = "AUTO",
              robust = FALSE, imp_var = TRUE, imp_suffix = "imp",
              mod_cat = FALSE)

- formula, data: model formula and data frame
- robust: robust or classical regression
- family: the default value "AUTO" tries to guess the right family
- mod_cat: TRUE = category with the highest probability, FALSE = sample a category
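The basic idea behind regression imputation for a numeric response can be sketched in base R as follows. This is an assumption-laden illustration rather than the regressionImp() code: regression_sketch() is a hypothetical helper that always fits a classical lm(), handles a single numeric response, and assumes the predictors are observed in the rows to be imputed.

```r
# Regression imputation sketch in base R (illustrative; not VIM's implementation).
# regression_sketch() is a hypothetical helper for a numeric response only.
regression_sketch <- function(formula, data) {
  response <- all.vars(formula)[1]
  fit <- lm(formula, data = data)             # lm() drops rows with NA by default
  miss <- is.na(data[[response]])
  # flag imputed cells in a logical column, similar in spirit to imp_var
  data[[paste0(response, "_imp")]] <- miss
  data[[response]][miss] <- predict(fit, newdata = data[miss, , drop = FALSE])
  data
}

toy <- data.frame(x = 1:6, y = c(2, 4, NA, 8, NA, 12))
imputed <- regression_sketch(y ~ x, toy)
```

The model is estimated on the complete cases only, and the fitted values replace the missing responses; the extra logical column records which cells were imputed.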


Iterative Robust Model-Based Imputation

1. Initialization (default: kNN)

2. Cycle through all variables with missing values and impute each one using a model with all other variables as predictors

3. Repeat step 2 until convergence is reached
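The three steps above can be sketched in base R. This is a deliberately simplified illustration and not the irmi() implementation: iterative_sketch() is a hypothetical helper, all columns are assumed numeric, the initialization uses column medians instead of kNN, the models are ordinary lm() fits rather than robust ones, and the convergence criterion (sum of squared changes below eps) is an assumption.

```r
# Iterative model-based imputation sketch (illustrative; not the irmi() code).
# Assumptions: all columns numeric, median initialisation instead of kNN,
# ordinary lm() instead of robust regression. iterative_sketch() is hypothetical.
iterative_sketch <- function(x, eps = 1e-6, maxit = 100) {
  miss <- is.na(x)                            # remember which cells were missing
  vars <- names(x)[colSums(miss) > 0]
  # step 1: initialise every missing cell with the column median
  for (v in vars) x[[v]][miss[, v]] <- median(x[[v]], na.rm = TRUE)
  for (it in seq_len(maxit)) {
    previous <- x
    for (v in vars) {
      # step 2: model v on all other variables, fitted where v was observed
      fit <- lm(reformulate(setdiff(names(x), v), response = v),
                data = x[!miss[, v], , drop = FALSE])
      x[[v]][miss[, v]] <- predict(fit, newdata = x[miss[, v], , drop = FALSE])
    }
    # step 3: stop once the imputed values no longer change noticeably
    change <- sum((as.matrix(x) - as.matrix(previous))^2)
    if (change < eps) break
  }
  x
}

toy <- data.frame(
  a = c(1, 2, 3, 4, 5, 6, 7, 8),
  b = c(2.1, 3.9, NA, 8.2, 9.8, 12.1, NA, 16.0),
  c = c(1.0, 0.5, 1.7, NA, 2.4, 2.2, 3.1, 3.0)
)
completed <- iterative_sketch(toy)
```

Only the cells that were originally missing are ever updated; the observed values stay fixed and serve as the training data in every cycle.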


Iterative Robust Model-Based Imputation

- Robust models
- Variable type ⇒ automatic model selection
- Semi-continuous variables are modeled in two steps
- Stepwise variable selection possible
- Multiple imputation possible
- ⇒ Fully automatic approach
- Future improvement ⇒ define a model formula per variable


Iterative Robust Model-Based Imputation

irmi(x, eps = 5, maxit = 100, mixed = NULL,
     mixed.constant = NULL, count = NULL, step = FALSE,
     robust = FALSE, takeAll = TRUE, noise = TRUE,
     noise.factor = 1, force = FALSE, robMethod = "MM",
     force.mixed = TRUE, mi = 1, addMixedFactors = FALSE,
     trace = FALSE, init.method = "kNN")

- robust: robust or classical regression
- step: stepAIC in each iteration
- mixed: column indices of semi-continuous variables
- count: column indices of count variables (Poisson)


Visualization of Missing Values

- Simple plots to visualize the distribution of missingness

[Figure: two panels, "Proportion of missings" per variable (axis from 0.00 to 0.10) and "Combinations" of missing values, for the variables py010n, py035n, py050n, py070n, py080n, py090n, py100n, py110n, py120n, py130n, py140n]
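The two quantities shown in such a plot, the proportion of missing values per variable and the frequency of each combination of missing variables, can be computed in base R as sketched below. The toy data and object names are hypothetical; in VIM itself, plots of this kind are drawn by the aggr() function.

```r
# Missingness summaries in base R (illustrative; toy data, hypothetical names).
toy <- data.frame(
  v1 = c(1, NA, 3, NA, 5),
  v2 = c(NA, 2, 3, NA, 5),
  v3 = c(1, 2, 3, 4, 5)
)
miss <- is.na(toy)                            # logical matrix, TRUE = missing

# proportion of missing values per variable
prop_missing <- colMeans(miss)

# each row's missingness pattern as a string such as "1:0:0" (1 = missing)
pattern <- apply(miss, 1, function(r) paste(as.integer(r), collapse = ":"))

# frequency of every observed combination, most frequent first
combinations <- sort(table(pattern), decreasing = TRUE)
```

Here prop_missing holds one value per column, and combinations counts how many records share each missingness pattern, which is the information encoded in the two panels.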


Visualization of Missing Values II

- Multivariate plots

[Figure: multivariate plot of the variables age, P033000, py010n and py035n]


VIMGUI

- Graphical user interface for imputation and visualization
- Compatible with survey objects
- R script (reproducibility)
