MELJUN CORTES IBM SPSS Regression


  • 7/30/2019 MELJUN CORTES IBM SPSS Regression


    Linear Regression Analysis

    These are my personal notes only and are therefore not complete. Most of the content was taken from the IBM SPSS Modeler training manual. Please refer to the training manual for a complete discussion.


    Simple Linear Regression Model

    Consider the scatterplot:

    It shows the relationship between the mother's weight and the baby's birth weight.


    Regression analysis finds the straight line that best summarizes the relationship between the two variables, in the sense that the sum of squared vertical distances of the points from the line is minimized.

    Mathematically, the line can be expressed as an equation:

    bweight = B0 + B1*mweight + E

    where:

    E ~ N(0, σ²)
    B0 = constant
    B1 = effect on bweight for every one-pound increase in mweight
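
    As an illustration of how B0 and B1 can be estimated, here is a minimal sketch of the closed-form least-squares fit for one predictor. The function name and the four data points are hypothetical (they are not the birth weight sample), and SPSS may compute the estimates differently internally.

```python
def fit_simple_ols(x, y):
    """Return (b0, b1) for the least-squares line y = b0 + b1*x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Sum of cross-products and sum of squares around the means
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    b1 = sxy / sxx               # slope
    b0 = y_mean - b1 * x_mean    # intercept
    return b0, b1


# Hypothetical toy data, made up for illustration only
mweight = [100, 120, 140, 160]
bweight = [2800, 2950, 2900, 3150]

b0, b1 = fit_simple_ols(mweight, bweight)
print("B0 =", b0, "B1 =", b1)
```

    The slope is the ratio of the cross-product sum to the sum of squares of the predictor, and the intercept forces the line through the point of means.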


    Using sample data, SPSS can generate the table below.

    Coefficients (a)

    Model            Unstandardized B   Std. Error   Standardized Beta   t        Sig.
    1  (Constant)    2426.719           162.194                          14.962   .000
       mweight       3.977              1.214        .167                3.276    .001

    a. Dependent Variable: bweight
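
    The t column in the table is each unstandardized coefficient divided by its standard error. A short sketch that checks this against the reported values (the dictionary layout is my own, the numbers are copied from the table):

```python
# Values copied from the coefficients table
rows = {
    "(Constant)": {"B": 2426.719, "se": 162.194, "t_reported": 14.962},
    "mweight": {"B": 3.977, "se": 1.214, "t_reported": 3.276},
}

t_values = {}
for name, r in rows.items():
    # t statistic for testing whether the coefficient differs from zero
    t_values[name] = r["B"] / r["se"]
    print(name, "t =", round(t_values[name], 3), "reported:", r["t_reported"])
```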

    Mathematically,

    Estimated bweight = 2426.719 + 3.977*mweight

    Note: This equation can be used to predict bweight if information about mweight is available.
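
    A minimal sketch of using the estimated equation for prediction. The coefficients come from the table; the function name and the mother's weight of 130 pounds are hypothetical.

```python
B0 = 2426.719  # constant from the coefficients table
B1 = 3.977     # coefficient of mweight from the coefficients table


def predict_bweight(mweight):
    """Estimated bweight for a given mother's weight in pounds."""
    return B0 + B1 * mweight


# Hypothetical input value, chosen only for illustration
print(predict_bweight(130))
```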


    Multiple Linear Regression Model

    Consider the framework:

    Predictors:

    Age (age)

    Weight at last menstrual period (mweight)

    History of hypertension (ht)

    Presence of uterine irritability (ui)

    Outcome:

    Baby's birth weight (bweight)

    Mathematically,

    bweight = constant + B1*age + B2*mweight + B3*ht + B4*ui + E

    where:

    E ~ N(0, σ²)

    B1, B2, B3 and B4, called regression coefficients, can be estimated if sample data are available.


    SPSS generates the regression table as follows:

    Mathematically,

    Estimated bweight = 2429.007 + 3.656*age + 4.203*mweight - 645.545*ht - 530.065*ui

    Note: This equation can be used to predict bweight if information about the mother's weight, age, ht and ui is available.
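
    A sketch of prediction with the multiple regression equation. Two things are assumptions here: that ht and ui are coded 0 (absent) and 1 (present), and that the ui coefficient is negative. The function name and the input values are hypothetical.

```python
# Coefficients from the estimated equation.
# Assumption: the ui coefficient is negative; ht and ui are 0/1 indicators.
COEF = {
    "constant": 2429.007,
    "age": 3.656,
    "mweight": 4.203,
    "ht": -645.545,
    "ui": -530.065,
}


def predict_bweight(age, mweight, ht, ui):
    """Estimated bweight from the four predictors."""
    return (COEF["constant"]
            + COEF["age"] * age
            + COEF["mweight"] * mweight
            + COEF["ht"] * ht
            + COEF["ui"] * ui)


# Hypothetical mothers: same age and weight, without and with both conditions
print(predict_bweight(age=25, mweight=130, ht=0, ui=0))
print(predict_bweight(age=25, mweight=130, ht=1, ui=1))
```

    With 0/1 coding, each indicator coefficient is the estimated shift in bweight when the condition is present, holding the other predictors fixed.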


    Uses of Linear Regression Analysis

    Regression analysis can be used (in applied research) to test the relationships between an outcome variable and a set of predictor variables.

    Regression analysis can also be used to predict the value of the outcome variable given the values of the predictor variables.


    Fraud Detection in Insurance Claims (A Regression Analysis Example)

    The following data on patients in a U.S. hospital are available:

    CLAIM - total insurance claim for a single medical treatment performed in a hospital

    Age - age of patient

    LOS - length of hospital stay

    ASG - severity of illness category. This is based on several health measures, and higher scores indicate greater severity of the illness.

    n = 293

    Goals:

    1) Build a predictive model for the insurance claim amount;

    2) Use the model to identify outliers (patients with claim values far from what the model predicts), which might be instances of errors or fraud made in the claims.


    Dataset: InsClaim.dat



    Diagram in Modeler 15.0

    Generated output:

    In equation format:

    Predicted Claims = $3026.754 + $1105.646*length of stay + $417.194*severity code - $33.406*age


    Detecting cases that deviate substantially from the model (points poorly fit by the model).

    **Just compute the residual (DIFF = actual claim - predicted claim)

    Generated outputs:

    Examine carefully whether fraud is possible.
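
    A sketch of the residual computation outside Modeler. The coefficients are taken from the predicted claims equation, with the age coefficient treated as negative (an assumption). The three patient records and the field names are hypothetical and are not from InsClaim.dat.

```python
def predicted_claim(los, asg, age):
    """Predicted claim in dollars from length of stay, severity code, and age."""
    # Assumption: the age term enters with a negative sign
    return 3026.754 + 1105.646 * los + 417.194 * asg - 33.406 * age


# Hypothetical records, made up for illustration only
patients = [
    {"id": "A", "claim": 5500.0, "los": 3, "asg": 1, "age": 40},
    {"id": "B", "claim": 12000.0, "los": 4, "asg": 2, "age": 55},
    {"id": "C", "claim": 4000.0, "los": 2, "asg": 0, "age": 30},
]

for p in patients:
    p["predicted"] = predicted_claim(p["los"], p["asg"], p["age"])
    p["diff"] = p["claim"] - p["predicted"]  # DIFF = actual - predicted

# Largest absolute residuals first: these are the cases to examine
ranked = sorted(patients, key=lambda p: abs(p["diff"]), reverse=True)
for p in ranked:
    print(p["id"], round(p["predicted"], 2), round(p["diff"], 2))
```

    A large positive DIFF means the actual claim is much higher than the model predicts for a patient with that length of stay, severity, and age. How large counts as suspicious is a judgment call that this sketch does not settle.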


    Using the Linear Models Node to Perform Regression

    It has more features than the Regression node, including:

    the ability to create the best subset model,

    several criteria for model selection,

    the option to limit the number of predictors,

    and the use of bagging and boosting.


    Additional Features of Linear Models Node

    It automatically prepares the data for modeling by transforming the target and predictors in order to maximize the predictive power of the model.

    This includes outlier handling, adjusting the measurement level of predictors, and merging similar categories.

    It automatically creates dummy variables from categorical fields (those that have a nominal or ordinal measurement level).
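
    To show what dummy variables look like, here is a generic sketch of dummy coding with one reference category. It is not a description of how the Linear Models node encodes fields internally; the function name, the `is_` prefix, and the example categories are hypothetical.

```python
def make_dummies(values, reference=None):
    """Turn a categorical field into 0/1 indicator columns.

    One category (the reference) gets no column of its own, so a field
    with k categories yields k - 1 dummy variables.
    """
    categories = sorted(set(values))
    if reference is None:
        reference = categories[0]
    levels = [c for c in categories if c != reference]
    return [{f"is_{lvl}": int(v == lvl) for lvl in levels} for v in values]


# Hypothetical categorical field with three levels
severity = ["low", "high", "medium", "low"]
rows = make_dummies(severity, reference="low")
for value, row in zip(severity, rows):
    print(value, row)
```

    Leaving out the reference category avoids perfect collinearity with the constant term; each remaining coefficient is then read as a difference from the reference level.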