


Do not turn toward pain, but do not run from it either. Pain is your guide. Pain is what guides a person in every serious undertaking.

Sufi mystic poet, Jalal-ud-Din Rumi

Celebration of the 13th Century Poet Rumi

Improving the Prediction of Suicide and Suicidal Behavior:

Data-Mining Approaches*

RK Price and NK Risk

Washington University School of Medicine, St. Louis MO, USA

E-mail: [email protected]

* Prepared for a lecture at University of Pittsburgh School of Medicine, Western Psychiatric Institute and Clinic, February 2003.

Acknowledgments*

National Institute on Drug Abuse (K02DA00221, R01DA09281).
Longer Life Foundations, Washington University School of Medicine and the Reinsurance Group of America.
National Institute of Mental Health (R01MH60691).

* No financial or other conflicts exist with any of the above funding agencies or with the University of Pittsburgh School of Medicine, Western Psychiatric Institute and Clinic.

What am I talking about when I say "Suicide"?

• Suicide: an act of voluntarily and intentionally taking one's own life. Committed suicide = Suicide.
• Suicidal attempt: a physical act to accomplish suicide, including parasuicide.
• Suicidal behavior: suicide attempt, suicide plan, suicide threat, suicidal ideation.
• Suicidal behavior is included when "suicide" is used generically.


What Have People Been Complaining about in the Epidemiology of Suicide?

• Prediction not precise.
• Difficult to predict when.
• A lot of risk factors known, but we don't understand how protective factors work.
• Interaction of risk and protective factors little understood.


The Surgeon General's Call to Action to Prevent Suicide, 1999

The Public Health Approach Applied to Suicide Prevention

1. Defining the problem.
2. Identifying causes and protective factors.

The second step focuses on why. It addresses risk factors such as depression, alcohol and other drug use, bereavement or job loss. This step may be used to define groups of people at higher risk for suicide. Many questions remain, however, about the interactive matrix of risk and protective factors in suicide and suicidal behavior and, more importantly, how this interaction can be modified.

What Have We Been Asking Ourselves about Epidemiologic Results on Suicide?

Do we really have good tools to get the data? (Assessment)


If we do have good data: Do we really have good tools to analyze the information in the data? (Analysis)

What Have People Been Complaining about the Statistical Tools Used in Epidemiology?

• Too much reliance on statistical significance, leading to loss of information.
• Multicollinear measures often not well sorted out, producing erroneous results.
• Interaction effects not well captured, reducing the level of understanding.


Statistical vs. Substantive Significance (Source: Achen CH, 1982)

[Figure: six sampling-distribution panels for an estimated coefficient.]
(a) Both substantive and statistical significance
(b) Neither substantive nor statistical significance
(c) Statistical, but not substantive significance
(d) No statistical significance; likely substantive significance
(e) Statistical significance; substantive uncertainty
(f) No statistical significance; substantive uncertainty

[Figure: a bi-modal sampling distribution mistaken for a single large effect with a large standard deviation.]

Variable                          OR      p
Demographics
  African American (race)        3.16    .001
Pre-Vietnam
  Heroin use                     0.07    .007
  Narcotic injection             26.2    .0002
  Too much drinking              2.23    .05
In-Vietnam
  Narcotic withdrawal            4.32    .0004
Post-Vietnam
  Wanted to take narcotics       4.30    .001
  Know where to buy heroin       2.06    .04
  Drug use to help depression    2.16    .05

Prediction of Premature Death: Estimates of Logistic Regression Odds Ratios.

Source: Price et al, 2000.

Why Are We Still Using Linear Models in Epidemiology in 2003?

• Historically, inferential statistics have been one of the most efficient ways to assess the generality of a finding.
• How to validate a finding without inferential statistics is still not an easy task.
• Linear models are easier to run and interpret.


Ideas behind "Data-Mining" Approaches

• Do not rely on statistical inferences.
• Use high computational power that affords extensive iterative processes.
• Gold standards exist within the data.
• Validation is assessed from prediction of "future" data.

BUT, ALSO . . .

A sound evaluation method is even more critical.


Data-Mining Approaches to Suicide and Suicidal Behavior: Purposes and Techniques

Purpose: Variable Selection (select the most predictive measures of risk and protective factors)
  Data-mining technique: Genetic Algorithms (GA)
  Comparison biostatistical method: Forward Selection (FS)

Purpose: Identify Subgroups at Risk (delineate patterns of interaction among predictive measures)
  Data-mining technique: Tree-Based Regression (TBR)
  Comparison biostatistical method: Not Examined*

Purpose: Overall Prediction (maximize the predictive power of the selected measures)
  Data-mining technique: Artificial Neural Networks (ANN)
  Comparison biostatistical method: Logistic Regression, Quadratic Discriminant Analysis (QDA)

Analysis Steps Comparing Traditional Statistical vs. Data-Mining Techniques

Variable Selection:
  Traditional: Forward Selection, by individual variables.
  Data-mining: Genetic Algorithm, by set of variables.

Evaluation Criterion:
  Traditional: Logistic Regression (P2 statistic).
  Data-mining: Quadratic Discriminant Analysis (percent incorrect).

Model (using previously selected variables):
  Traditional: Logistic Regression.
  Data-mining: ANN-MLP (cross-entropy error); estimates included 5, 10, 15, 20, 25, 30, 35, 40 of both Gaussian and sigmoid neurons.

Generalization: Validation method.

Cross-Validation Mimicking Prediction of "Future" Data

Divide the data: Original Data → Training Set + Testing Set
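A minimal sketch of this division in Python with scikit-learn (the array shapes and names are hypothetical, not the authors' setup):

```python
# Divide the data into a training set and a testing set, holding part
# of the sample out to mimic prediction of "future" data.
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))    # hypothetical predictor matrix
y = rng.integers(0, 2, size=1000)  # hypothetical binary outcome

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)
```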

Evaluation Methods 101

• An evaluation method is a method that enables the researcher to assess how well a model predicted the "future" data.
• Simple error rates are insufficient for epidemiologic data.
• When trying to predict phenomena with low prevalence, error rates would produce a model that is "all negative."
• Error rates give no insight as to why mistakes are made.

Evaluation Methods (1)

Receiver Operating Characteristics (ROC) Curve as an Evaluation Method

• ROC combines the values of Sensitivity and Specificity; a simple idea.
• Originally developed as a radar screen technique in engineering.
• Subsequently applied to medical decision theory.

Evaluation Methods (2)

Sensitivity

Sensitivity is the fraction of observed positive outcome cases that are correctly classified.

Sensitivity = C / (C + D)

Evaluation Methods (3)

Observed    Number Correct    Number Incorrect
No          A                 B
Yes         C                 D

Specificity

Specificity is the fraction of observed negative outcome cases that are correctly classified.

Specificity = A / (A + B)

Evaluation Methods (4)

Observed    Number Correct    Number Incorrect
No          A                 B
Yes         C                 D
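Both fractions can be read straight off the 2x2 table; a minimal Python sketch (function names and example counts are mine, not the talk's):

```python
# Sensitivity and specificity from the table above:
# A = negatives correct, B = negatives incorrect,
# C = positives correct, D = positives incorrect.
def sensitivity(C, D):
    """Fraction of observed positive cases correctly classified."""
    return C / (C + D)

def specificity(A, B):
    """Fraction of observed negative cases correctly classified."""
    return A / (A + B)

print(sensitivity(20, 5))    # 0.8   (hypothetical counts)
print(specificity(126, 5))   # ~0.96 (hypothetical counts)
```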

Sensitivity vs. Specificity

• Sensitivity can always be 100% if all observations are considered to be positive.
• Specificity can always be 100% if all observations are considered to be negative.
• We can vary sensitivity or specificity by changing the threshold at which we make a positive classification.
• Graphing sensitivity vs. (1 - specificity) yields the ROC curve.

Evaluation Methods (5)

ROC: Area Under Curve (AUC)

• The Area Under Curve (AUC) is a measure of the predictive power of a model.
• The AUC varies from .5 to 1.0.
• AUC beyond .8 is rather rare when using epidemiologic measures.
• You can do some neat stuff with ROC by changing AUCs to other measures, if you are an advanced user.

Evaluation Methods (6)
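A small sketch of tracing the ROC curve and computing the AUC by sweeping the classification threshold; the scores and labels are toy data, and scikit-learn stands in for whatever software the authors used:

```python
# Sweep the threshold over predicted scores to get the ROC curve,
# then compute the area under it.
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

y_true  = np.array([0, 0, 0, 1, 0, 1, 0, 1, 1, 1])
y_score = np.array([.05, .2, .3, .35, .4, .55, .6, .7, .8, .9])

fpr, tpr, thresholds = roc_curve(y_true, y_score)  # fpr = 1 - specificity
print(roc_auc_score(y_true, y_score))              # 0.88 here; .5 = chance
```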

Evaluation Methods (7)

Example of ROC

[Figure: ROC curves plotting Sensitivity against 1-Specificity on the unit square; one curve with AUC = .8 rises well above the chance diagonal (AUC = .5).]

National Comorbidity Study (NCS, 1990-2)

• Designed to study the comorbidity of substance use and psychiatric disorders (PI: Ronald C. Kessler, Ph.D.).
• A representative living sample of the civilian non-institutionalized US population, age 15-54.
• A large sample size (N=8,098) with a sufficient number reporting suicidal behaviors (suicide attempt n=319).
• Measures include suicidal behavior, demographics, SES, life experience, mental health, substance use, violence, access to care.
• Total of 77 variables developed after preliminary analyses.

Dataset (1)

National Mortality Follow-Back Survey (NMFS, 1993)

• One of a series of mortality surveys done by the National Center for Health Statistics; arguably the largest incidence study of deaths in the US.
• 1% of deaths in 1993 from a 10% drawing of death certificates (with oversampling of deaths from external causes).
• A large sample size of the deceased (N=22,957) with a sufficient number of suicide deaths (n=2,043).
• Used as "replication" of NCS results: if suicide represents above-threshold values of the underlying distribution of suicidal behavior liability, most predictors should be similar.
• Measures obtained from death certificates and follow-back interviews with next-of-kin.

Dataset (2)

Measures from NCS and NMFS (domains/measures; availability in NCS vs. NMFS marked on the original slide)

• Demographic/SES: Sex, Age, Country born, Race, Income/assets, Education.
• Life Experience: Marital status, Living alone, Nursing home, Home assistance, Mobility outside of home, Level of daily activities, Recreational activities, Family connectedness, Support from friends, Employment, Religion, Money problems, Trauma, Physical characteristics, Physical health.
• Mental Health: Disease (Alz., dementia), Cognitive impairment, Resolve/hope, Depression, Paranoia, Post-traumatic stress disorder, Anti-social personality, Alcohol dependence/use, Drug dependence/use, Tobacco dependence/use.
• Access to care: Insurance, Desire to seek help, Belief in efficacy, Received treatment (any medical, any psych/mental, any alc/drug, psych/mental facility, alc/drug facility), Other barriers to care.
• Violence: Combat, Rape, Physical attack, Held captive/kidnapped, Murdered.
• Circumstances of/at death: Suicide-specific (Thought, Attempt, Completion), Availability of method.

National Comorbidity Study (NCS)

Improving the Prediction of Suicidal Behavior

• Males (n=3,835) and females (n=4,263) separately run.
• Outcome variable = suicidal behavior (thought, plan or attempt), past year.

Data-Mining Approach Number One

Genetic Algorithms (GA)

What are Genetic Algorithms (GA)?

• Originally, GA was used to select traits that maximize an outcome. This is based on the idea of selecting traits of an animal that maximize survivability.
• For us, GA is used as a method of variable selection.

Genetic Algorithms (1)

Why and When Use GA?

• The independent variables are often collinear.
• The model relies on interactions between independent variables.
• There are enough variables that trying all subsets would be computationally prohibitive.

Genetic Algorithms (2)

Principles of GA

Genetic Algorithms (3)

Genetic Algorithm as a Variable Selector

• The population is a collection of sets of variables.
• Each set of variables will have a goodness-of-fit value associated with it.
• We choose a subset of variables as though they are the "traits" we wish to select.
• A subset of variables that maximizes the goodness-of-fit function is the "survival of the fittest."

Genetic Algorithms (4)

GA Variable Selection

• Sets of variables are selected by their goodness of fit, probabilistically.
• In this case, duplicate sets of variables are not allowed.
• Parent variables are allowed to compete with mutant and offspring variables in the next generation.

Genetic Algorithms (5)

Crossover in Variable Selection

• Common variables are retained by each offspring set of variables.
• A variable that appears in one parent only is randomly assigned to an offspring.
• The remaining offspring receives a variable at random from the other parent.
• This is repeated until both offspring have the correct number of variables.
• Crossover is only meaningful between parents with at least two variables not in common.

Genetic Algorithms (6)

Cross-Over Example

1. Two sets of variables are selected probabilistically based on their scores.
2. Common variables are passed to each offspring.
3. Variables unique to each parent are distributed among the offspring.
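A minimal sketch of this crossover rule, assuming equal-size parents represented as sets of column indices; the shuffle-and-split here approximates the one-variable-at-a-time assignment described above:

```python
# Crossover for fixed-size variable subsets: keep the common
# variables, then redistribute the rest at random.
import random

def crossover(parent1, parent2, rng=random):
    """Produce two offspring sets the same size as the parents."""
    p1, p2 = set(parent1), set(parent2)
    common = p1 & p2
    unique = list((p1 | p2) - common)  # variables in exactly one parent
    rng.shuffle(unique)
    k = len(p1) - len(common)          # slots each offspring must fill
    child1 = common | set(unique[:k])
    child2 = common | set(unique[k:])
    return child1, child2

# Example: parents share variables 1 and 2; 3, 4, 5, 6 are redistributed.
print(crossover({1, 2, 3, 4}, {1, 2, 5, 6}, random.Random(0)))
```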

Principles of GA

Genetic Algorithms (7)

Mutation in Variable Selection

• The user supplies the probability that a mutation occurs.
• A mutation randomly substitutes one variable for another.
• Mutation helps to provide a global maximum.

Genetic Algorithms (8)

What GA Needs to Know

• How many variables are we going to select? 15
• What model are we using? QDA
• What goodness-of-fit function are we maximizing? % incorrect, ROC
• How many subsets are in the initial population? 200
• What is the mutation rate? .05
• When does the algorithm stop? 100 generations

Genetic Algorithms (9)
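A condensed sketch of the whole GA loop with the settings above (15 variables, QDA fitness, population 200, mutation rate .05, 100 generations). It reuses crossover() from the earlier sketch; as simplifications, it uses truncation selection rather than the probabilistic selection described in the talk, and plain training accuracy as the fitness, so treat it as illustrative (and slow: fitness re-fits QDA at every evaluation):

```python
import random
import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

def fitness(subset, X, y):
    """QDA training accuracy on the chosen columns (1 - percent incorrect)."""
    cols = sorted(subset)
    qda = QuadraticDiscriminantAnalysis().fit(X[:, cols], y)
    return qda.score(X[:, cols], y)

def mutate(subset, n_vars, rate, rng):
    """With probability `rate`, swap one variable for an unused one."""
    subset = set(subset)
    if rng.random() < rate:
        out = rng.choice(sorted(subset))
        pool = [v for v in range(n_vars) if v not in subset]
        subset.remove(out)
        subset.add(rng.choice(pool))
    return frozenset(subset)

def ga_select(X, y, k=15, pop_size=200, rate=0.05, generations=100, seed=0):
    rng = random.Random(seed)
    n = X.shape[1]
    pop = {frozenset(rng.sample(range(n), k)) for _ in range(pop_size)}
    for _ in range(generations):
        ranked = sorted(pop, key=lambda s: fitness(s, X, y), reverse=True)
        parents = ranked[: pop_size // 2]       # fittest half survive
        children = set()
        while len(parents) + len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            for child in crossover(p1, p2, rng):
                children.add(mutate(child, n, rate, rng))
        pop = set(parents) | children           # duplicate sets collapse
    return max(pop, key=lambda s: fitness(s, X, y))
```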

Analysis Steps Comparing Traditional Statistical vs. Data-Mining Techniques

Variable Selection:
  Traditional: Forward Selection, by individual variables.
  Data-mining: Genetic Algorithm, by set of variables.

Evaluation Criterion:
  Traditional: Logistic Regression (P2 statistic).
  Data-mining: Quadratic Discriminant Analysis (percent incorrect).

Comparison Analysis: Quadratic Discriminant Analysis

• Quadratic Discriminant Analysis (QDA) was first introduced by Fisher in 1936.
• QDA classifies outputs based on quadratic surfaces of the input space.

Quadratic Discriminant Analysis (1)

QDA - An Example

[Figure: scatter of class "a" and class "b" points in the (x, y) plane; a quadratic surface separates the two classes.]

Quadratic Discriminant Analysis (2)
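A minimal QDA sketch with scikit-learn on synthetic data (my example, not the authors'): fitting class-conditional Gaussians yields a quadratic decision boundary like the one in the figure.

```python
import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

rng = np.random.default_rng(0)
Xa = rng.normal(0, 1, size=(50, 2))   # class "a": tight inner cluster
Xb = rng.normal(0, 3, size=(50, 2))   # class "b": broad outer cluster
X = np.vstack([Xa, Xb])
y = np.array([0] * 50 + [1] * 50)

qda = QuadraticDiscriminantAnalysis().fit(X, y)
print(qda.predict([[0.2, -0.1], [4.0, 4.0]]))  # likely inner vs. outer class
```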

Variable Selection by Logistic with Forward Selection and GA-QDA, NCS-Males.

Variable                                 FS   GA   P
Major expenses                           D    D    <.01
Income problems                               D    .25
Education level                               O    .07
Spouse supportive                        O    O    <.01
Closeness to relatives                        O    .09
Support from friends                          O    .75
Daily activities stopped by drug use     D
Daily activities stopped or reduced           O    .36
Daily activities stopped or reduced           O    .09
Drug abuse, past year                    D
Drug dependence, past year               D
Marijuana abuse/dep. lifetime                 O    .07
Depression, past year                    D    D    <.01
Depression, lifetime symptom             O    O    <.01
PTSD, past year                          D    D    <.01
Has insurance                                 D    .20
Comfortable with treatment                    D    .68
Belief in efficacy of treatment               D    .31
Sexual molestation, lifetime             D

FS: Forward selection with logistic regression.
GA: Genetic algorithms with quadratic discriminant analysis.
D: Dichotomous variable. O: Ordinal variable.
P: p-value of the variable, from a logistic regression, for the variables chosen by GA with QDA.

Variable Selection by Logistic with Forward Selection and GA-QDA, NCS-Females.

Variable                                     FS   GA   P
Income                                            O    .52
Income problems                                   D    .08
Education level                              O    O    .01
Spouse supportive                            O    O    <.01
Closeness to relatives                            O    .17
Relatives supportive                              O    .17
Daily activities reduced                          D    .41
Daily activities stopped or reduced by ...   O
Adult anti-social behavior, symptom          O    O    <.01
Alcohol abuse, past year                     D
Alcohol dependence, past year                     D    .63
Childhood anti-social behavior, symptom           O    .25
Depression, past year                        D    D    <.01
Depression, lifetime symptom count           O    O    <.01
PTSD, past year                              D
Non-affective psychosis, lifetime            D
Bi-polar, past year                               D    .03
Desire to seek help                               D    .57
Belief in efficacy of treatment                   D    .28
Rape, past year                              D

FS: Forward selection with logistic regression.
GA: Genetic algorithms with quadratic discriminant analysis.
D: Dichotomous variable. O: Ordinal variable.
P: p-value of the variable, from a logistic regression, for the variables chosen by GA with QDA.

Variable Selection: Summary

• Data-mining techniques are not a magic bullet. Humans must work hard to create informative variables.
• GA is more suitable than linear forward selection when many additional variables are available.
• GA-selected variables would often be thrown out as uninformative if a linear selection method were used.

Data-Mining Approach Number Two

Tree-Based Regressions (TBR)

Basics of Tree-Based Regression (TBR)

At each node, one predictor variable is chosen to divide the set into two nodes that minimize the deviance.

[Diagram: a root node A1/B1 splits on a Yes/No question into child nodes A2/B2 and A3/B3.]

Deviance of node $i$, containing $A_i$ cases of one class and $B_i$ of the other:

$$D_i = -2\left[A_i \log\frac{A_i}{A_i+B_i} + B_i \log\frac{B_i}{A_i+B_i}\right]$$

Tree-Based Regression (1)

[Tree: a root node of 131/25 splits Yes/No into child nodes of 126/5 and 5/20.]

$$D_{131/25} = -2\left(25\log\frac{25}{156} + 131\log\frac{131}{156}\right) = 85.52$$

$$D_{126/5} = -2\left(5\log\frac{5}{131} + 126\log\frac{126}{131}\right) = 18.44$$

$$D_{5/20} = -2\left(5\log\frac{5}{25} + 20\log\frac{20}{25}\right) = 10.86$$

Tree-Based Regression (2)

$$D_{126/5} + D_{5/20} = 18.44 + 10.86 = 29.30$$
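As a check on the arithmetic, a minimal sketch (the function and its name are mine, not the talk's); base-10 logarithms reproduce the child-node values printed above:

```python
from math import log10

def node_deviance(a, b):
    """Deviance of a node holding a cases of one class and b of the other."""
    n = a + b
    return -2 * (a * log10(a / n) + b * log10(b / n))

print(node_deviance(126, 5))  # 18.44
print(node_deviance(20, 5))   # 10.87 (slide prints 10.86; split total ~29.30)
```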

Analysis Scheme for TBR Using RPART

• Variable Selection: the full set, and the fifteen variables selected by GA-QDA, were used.
• Model Building: deviance was used to build a standard classification tree.
• Model Verification: cross-validation is used to determine at which level of splits the tree is over-fitting.
• Final Pruning: the tree is pruned back to the level indicated by cross-validation.

Tree-Based Regression (3)

RPART: Model Building

• The user specifies the minimum size of a split that RPART will consider making.
• The user specifies the minimum size of a node that RPART will consider splitting.
• The minimum node size is at least twice the minimum split size.

Tree-Based Regression (4)

RPART: Model Verification

• RPART divides the data into training and testing sets.
• Trees of varying complexity are grown with the training set and evaluated with the testing set.
• The error rate of these trees is used to select the proper level of complexity for the entire data set.
• The tree is then pruned back to the proper complexity.

Tree-Based Regression (5)
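A sketch of this grow-then-prune scheme using scikit-learn rather than RPART/S-Plus (the data are synthetic and the parameter values are mine): grow an entropy (deviance-like) tree, then pick the pruning level by cross-validated error.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 8))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=600) > 0).astype(int)

tree = DecisionTreeClassifier(criterion="entropy",
                              min_samples_split=20,  # minimum split size
                              random_state=0)
grid = GridSearchCV(tree, {"ccp_alpha": [0.0, 0.001, 0.005, 0.01, 0.05]},
                    cv=10)
grid.fit(X, y)
print(grid.best_params_)  # cross-validated pruning level
```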

TBR Predicting Past-Year Suicidal Thought. NCS Males* (N=4,251)

[Tree diagram, %negative/%positive at each node (No/Yes branches): splits on Depression Scale >6 (10), Daily Activities Impaired by Drugs, Depression Scale >7 (10), Money Problems, and Daily Activities Stopped; node compositions include 99/1, 90/10, 89/11, 83/17, 60/40, and 12/88.]

* Based on the GA-selected 15 variables.

TBR Predicting Past-Year Suicidal Thought. NCS Females* (N=3,047)

[Tree diagram, %negative/%positive at each node (No/Yes branches): splits on Past Year Depression, Relatives Supportive <2 (6), Closeness to Relatives >2 (6), Adult ASP Scale >1 (4), No College Education, Past Year Bi-Polar, Income > 25th Percentile, Closeness to Relatives <4 (6), Relatives Supportive <1 (6), and No Income Problems; node compositions include 99/1, 90/10, 89/11, 87/13, 83/17, 82/18, 62/32, 38/62, 37/63, and 31/69.]

* Based on the GA-selected 15 variables.

Tree-Based Regression: Summary

• Cross-validated trees were more informative when GA-selected variables were input, compared to all available variables in the dataset.
• Shows major gender differences:
  - For males, depression was a very powerful and recurring predictor.
  - For females, more intricate interactions with the support environment, as well as other psychopathology, were shown.
• Clearly showed the patterns of interaction among strong predictors.

Tree-Based Regression is Great, But . . .

• TBR is recursive: you can't reshape the trunk of a tree (currently, anyway), so mis-specification can be a problem.
• Results may differ with other measures of "homogeneity." (Deviance is one such measure.)
• The predictive power of TBR is never great: as good as logistic at best, because branches are not chosen to maximize the values at the leaves.

Jalal-ud-Din Rumi

The beast you ride is your various appetites.
When you prune weak branches,
the remaining fruit gets tastier.

- New Moon, Hilal

Data-Mining Approach Number Three

Artificial Neural Networks (ANN)

What are Artificial Neural Networks (ANN)?

• A network of interconnected neurons, as an attempt to mimic the human brain.
• An ANN consists of connections that adapt to "learn" patterns, just as human synapses adapt to learn.
• An invention from the artificial intelligence and pattern recognition community.
• For our data-mining purposes, an ANN can be considered a very flexible regression method.

Artificial Neural Networks (1)

Flow of the ANN: a Multilayer Perceptron (MLP) Model*

[Diagram: inputs $x_{pi}$ with input bias $B^{\mathrm{inp}}_j$ feed hidden units through first-layer weights $w_{ij}$; hidden activations $f(v_{pj})$ with output bias $B^{\mathrm{out}}_j$ feed outputs $y_{pk}$ through second-layer weights $w_{jk}$; weights are recalculated based on predicted and actual outputs.]

* Source: Price et al, 2000

The Machinery of ANN-MLP

Hidden-unit input: $v_{pj} = \sum_{i=1}^{N} w_{ij}\, x_{pi} + B^{\mathrm{inp}}_j$

Individual error rate: $E_{pj} = \tfrac{1}{2}\,(t_{pj} - y_{pj})^2$

Total error rate ($W$ = sample weights): $E = \sum_p \sum_j W_p\, E_{pj}$

Two Common Activation Functions

• Sigmoid (logistic, "squashing"): turns on at about a certain value.
• Gaussian: produces a signal when the input is near a certain value.

Artificial Neural Networks (3)
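The two functions as a small sketch; the exact parameterizations used by the authors are not given in the talk, so the forms below are standard ones:

```python
import numpy as np

def sigmoid(v):
    """Logistic "squashing" unit: turns on as v passes a threshold near 0."""
    return 1.0 / (1.0 + np.exp(-v))

def gaussian(v, center=0.0, width=1.0):
    """Radial unit: responds most strongly when v is near `center`."""
    return np.exp(-((v - center) / width) ** 2)
```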

ANN is Not a Good Tool, Unless You Pay Attention to:

• Data normalization, data scaling.
• Tuning many parameters.
• Avoiding over-fitting.
• Selection of training set, testing set and validation set.
• Optimal network architecture.
• Choice of activation functions.
• Interpreting results (the most difficult task for ANN users).

Artificial Neural Networks (4)
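A rough modern stand-in for the ANN-MLP step, sketched with scikit-learn's MLPClassifier; the original work mixed Gaussian and sigmoid neurons (tuning 5-40 of each), which sklearn does not offer directly, so this is illustrative only. It touches several of the concerns above (scaling, over-fitting, architecture):

```python
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

mlp = make_pipeline(
    StandardScaler(),                        # data normalization/scaling
    MLPClassifier(hidden_layer_sizes=(15,),  # architecture: try 5..40 units
                  activation="logistic",     # sigmoid neurons
                  early_stopping=True,       # held-out set guards over-fit
                  max_iter=2000,
                  random_state=0))
# Hypothetical usage on GA-selected variables from the earlier sketches:
# mlp.fit(X_train[:, sorted(best_subset)], y_train)
```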

Analysis Steps Comparing Traditional Statistical vs. Data-Mining Techniques

Variable Selection:
  Traditional: Forward Selection, by individual variables.
  Data-mining: Genetic Algorithm, by set of variables.

Evaluation Criterion:
  Traditional: Logistic Regression (P2 statistic).
  Data-mining: Quadratic Discriminant Analysis (percent incorrect).

Model (using previously selected variables):
  Traditional: Logistic Regression.
  Data-mining: ANN-MLP (cross-entropy error); estimates included 5, 10, 15, 20, 25, 30, 35, 40 of both Gaussian and sigmoid neurons.

Predictive Power for Past Year Suicidal Thought Evaluated by ROC: NCS-Males

[Figure: ROC curves (Sensitivity vs. 1-Specificity). Logistic AUC = .87; QDA AUC = .89; ANN-MLP AUC = .98.]

Artificial Neural Networks (6)

Predictive Power for Past Year Suicidal Thought Evaluated by ROC: NCS-Females

[Figure: ROC curves (Sensitivity vs. 1-Specificity). Logistic AUC = .89; QDA AUC = .89; ANN-MLP AUC = .93.]

Artificial Neural Networks (7)

Improving Prediction: Summary

• For the NCS male dataset, ROC showed a 22% increase in predictive power compared to logistic.
• The improvement comes from better prediction of positives; this is more helpful for precise prediction of past-year suicidal thought.
• The improvement is "gradual": both GA and ANN contributed, when humans tended them with care.

Artificial Neural Networks (8)

Data-Mining Approach

Artificial Neural Networks (ANN)

Opening the “Black Box”

How Can We Learn What GA-ANN Did?

• Freeze the ANN after the training phase(s).
• Weight Approach: examine the patterns of weights to and from neurons and input and output variables (analogous to parametric path analysis).
• Neuron Approach: examine the neuron vectors themselves; this makes more sense, since the "decision process" is stored in neurons.

ANN - Opening the Black Box (1)

Weight Approach

• Weights in an ANN do not have inherent meanings associated with their values.
• Visual inspection is possible for a simple model.
• Genetic application (Lucek & Ott, 1997): sum of the products of weights along each path.
• Sum of the products of absolute values, to take into account rotations of the weight space (Saccone et al., 1999).
• The weight space can be condensed for simpler patterns (e.g., cluster analysis, multi-dimensional scaling).

ANN - Opening the Black Box (2)
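A sketch of the path-product idea for a single-hidden-layer MLP (my implementation, not Lucek & Ott's or the authors' code): score each input by the sum, over hidden units, of the product of its input-to-hidden weight and the hidden-to-output weight (P), and by the same sum of absolute values (PA), as in the table that follows.

```python
import numpy as np

def path_products(W1, w2):
    """W1: (n_inputs, n_hidden) first-layer weights;
    w2: (n_hidden,) weights from hidden units to the output neuron."""
    p  = W1 @ w2                   # signed sum of path products
    pa = np.abs(W1) @ np.abs(w2)   # absolute-value version
    return p, pa

# Hypothetical usage with the fitted scikit-learn pipeline from earlier:
# net = mlp.named_steps["mlpclassifier"]
# P, PA = path_products(net.coefs_[0], net.coefs_[1][:, 0])
```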

ANN - Opening the Black Box (3)

Males                                PA       P
Activities stopped or impaired       62.1    19.0
Comfortable with treatment           62.7    12.1
Major expenses                       63.9     3.9
Has insurance                        68.6     2.4
PTSD, past year                      69.3    -3.2
Depression, past year                69.5   -14.5
Spouse supportive                    70.6   -24.5
Act. stopped or imp. by drugs        76.6   -15.0
Income problems                      79.3     0.9
Education level                      87.8   -18.1
Marijuana abuse/dep lifetime         91.8    10.4
Belief in efficacy of treatment      97.0     1.7
Support from friends                103.1    18.4
Closeness to relatives              110.5   -18.4
Depression, lifetime symptoms       110.5   -51.7

Females                              PA       P
Desire to seek help                  55.9    -4.3
Bi-polar, past year                  57.7     2.7
Alcohol dependence, past year        60.5   -21.8
Depression, past year                62.7    24.8
Income problems                      66.5     4.3
Daily activities reduced             69.0    -8.9
Belief in efficacy of treatment      76.2    29.8
Closeness to relatives               79.1    -4.1
Education level                      79.2     0.2
Income                               81.7    -7.0
Relatives supportive                 82.9   -33.1
Spouse supportive                    88.3    -3.5
Adult ASP, symptom count             96.2   -47.2
Child. ASP, symptom count            97.8    21.1
Depression, lifetime symptoms       116.3     9.9

PA: product of the absolute values of the weights on the "Yes" neuron.
P: product of the weights on the "Yes" neuron.

Weight Approach: What Do the Results Tell You?

• Depression is, of course, important.
• Gender differences are inconsistent with the TBR results:
  - Male results show relationships, work, and education are also important, but to a lesser extent;
  - Female results show antisocial personality, relationships, education, and work are also important, but to a lesser extent (similar to the TBR results).
• Possibly TBR did not pick up more subtle effects other than depression?

ANN - Opening the Black Box (4)

Neuron Approach

• Elman (1990): trained an ANN to predict orders of 29 words, then examined the patterns of 150 neurons.
• Cottrell & Tsung (1993): trained an ANN to learn how to "carry ones" in base-4 arithmetic, then examined 16 neuron vectors.
• These analyses are very abstract; it is difficult to understand and communicate what the results mean.

ANN - Opening the Black Box (5)

National Mortality Follow-Back Survey (NMFS-93)

Improving the Prediction of Suicide

• Males (n=2,684) and females (n=1,074) separately run.
• Outcome variable = completed suicide.
• Comparison group = accidental death.

NMFS "Replicating" NCS Results

• Are predictors of suicidal behavior similar to those of suicide?
• Can the prediction of suicide be improved, like we did with suicidal behavior using NCS?
• Caveat: NCS and NMFS are two different datasets: NCS for the living only; NMFS for the deceased only.

NMFS (1)

Variable Selection for Predicting Suicide vs. Accident. NMFS-Males

• Depression, and depression-related variables such as "taking antidepressants in the last year," were selected.
• Some variables, such as "high blood pressure" and dementia symptom count, were selected but related to prediction of accidents rather than suicide.
• GA-selected variables did not vary greatly from the variables found with forward selection.

NMFS (2)

TBR Predicting Suicide vs. Accident. NMFS Males (N=2,682)

[Tree diagram, %negative/%positive at each node (No/Yes branches): splits include Depression Symptom >3, Withdrawn, Feelings of Worthlessness, Gun Around, Not Heroin, Not Memory, Anti-depressant, Blood Pressure, and Not Religious; node compositions include 94/06, 86/14, 85/15, 73/27, 71/28, 69/30, 61/39, 59/41, 42/58, 39/61, 38/62, 34/66, 20/80, and 13/87.]

Tree-Based Regression: Suicide vs. Accident. NMFS-Males

• The male tree is well balanced and shows a number of interesting interaction patterns (e.g., gun & worthlessness).
• Measures not available in the NCS dataset were shown to be predictive (e.g., gun, antidepressant).
• Some measures are predictors not of suicide but of accidents (e.g., blood pressure)?

NMFS (3)

Evaluation of Prediction: Suicide vs. Accident. NMFS-Males

• ANN-MLP improved prediction over QDA by a large margin (12%).
• ANN-MLP improved prediction over logistic with forward selection by a smaller amount (5%).
• Most of the improvement from logistic regression to ANN was due to ANN-MLP itself and not the method of variable selection.

NMFS (4)

NMFS (5)

Not as Spectacular an Improvement for NMFS Compared to NCS

• Not as many ordinal variables could be created.
• Proxy measures by next of kin are not as good as probands' self-reports. But we can't talk to the dead!
• The accident comparison group may be problematic. But we can't talk to the living!

What Have We Been Asking Ourselves about Epidemiologic Results on Suicide?

If we do have good data: Do we really have good tools to analyze the information in the data? (Analysis)


Do we really have good tools to get the data? (Assessment)

-- YES, SOMETIMES

-- YES, IF WE UNDERSTAND THE DATA


Data-Mining Approaches: Conclusions

• Data-mining techniques are not a magic bullet.
• Only detailed analyses of data can make these techniques work well.
• In the future, clinicians may be able to use automated data-mining techniques for clinical prediction.

This presentation is brought to you by:

Nathan K. Risk
Krista L. Russell
Rumi Kato Price

Never live without love, or you will be dead.
Die with love and you will remain alive.

-- Jalal-ud-Din Rumi

Make money, Not war -- Rumi

Supplemental Slides

RPART Cross-Validation

Let $T$ be any tree. Define $R(T)$ as the sum of error-rate over the leaves. If $T$ is a tree, $|T|$ is the number of leaves. If $\alpha$ is a scalar, define $R_\alpha(T)$ as

$$R_\alpha(T) = R(T) + \alpha\,|T|$$

$R_\alpha(T)$ is the cost of the tree.

TBR- RPART (2)

RPART Cross-Validation (Cont'd)

Define $T_\alpha$ as the subtree of $T$ that minimizes $R_\alpha(T)$. $T_0$ = the "full" tree; $T_\infty$ = no splits at all. If $\alpha > \beta$, then either $T_\alpha = T_\beta$ or $T_\alpha$ is a subtree of $T_\beta$.

The scalar $\alpha$ produces a set of nested subtrees that range between $T_0$ (the full tree) and $T_\infty$ (a tree with only one node).

TBR- RPART (3)

RPART Cross-Validation (Cont'd)

RPART uses cross-validation to choose a best value for $\alpha$; that is, RPART uses cross-validation to choose the best $T_\alpha$.

All possible values of $\alpha$ can be grouped into $m$ intervals that correspond to nested subtrees:

$$I_1 = [0, \alpha_1],\quad I_2 = (\alpha_1, \alpha_2],\quad \ldots,\quad I_m = (\alpha_{m-1}, \infty]$$

TBR- RPART (4)

RPART Cross-Validation (Cont'd)

Select a $\beta_i$ from each interval by computing the geometric mean of the interval:

$$\beta_1 = 0,\quad \beta_2 = \sqrt{\alpha_1\,\alpha_2},\quad \ldots,\quad \beta_{m-1} = \sqrt{\alpha_{m-2}\,\alpha_{m-1}},\quad \beta_m = \infty$$

Each $\beta_i$ is a "typical" value in $I_i$.

TBR- RPART (5)

RPART Cross-Validation (Cont'd)

• Randomly divide the data into $s$ groups $G_1, G_2, \ldots, G_s$ of size $n/s$.
• Fit a full model on the data set "everyone except $G_i$" and determine $T_{\beta_1}, T_{\beta_2}, \ldots, T_{\beta_m}$.
• Compute $R(T_{\beta_j})$ for all $1 \le j \le m$, using only $G_i$.
• Sum $R(T_{\beta_j})$ over all $G_i$ to estimate $R(T_{\beta_j})$ for the entire data set.
• Choose the $\beta_j$ with the lowest error-rate; $T_{\beta_j}$ is the tree selected by RPART.

TBR- RPART (6)
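A sketch of the nested-subtree sequence behind $R_\alpha(T) = R(T) + \alpha|T|$, using scikit-learn's cost-complexity pruning path as a stand-in for RPART (data synthetic, names mine):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(400, 5))
y = (X[:, 0] - X[:, 2] + rng.normal(size=400) > 0).astype(int)

path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)
# Each alpha in ccp_alphas is an interval endpoint; the corresponding
# subtrees T_alpha are nested, from the full tree down to the root alone.
print(path.ccp_alphas)
```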

Neuron Approach

• Elman (1990): trained an ANN to predict orders of 29 words using a "context" layer, then examined the patterns of 150 neurons (150 x (29+2)). Cluster analysis showed the ANN was able to recognize word patterns.
• Cottrell & Tsung (1993): trained an ANN to learn how to "carry ones" in base-4 arithmetic, then examined the results of principal component analyses of 16 neuron vectors for a 30-step addition.

ANN - Opening the Black Box (5)

ANN - Opening the Black Box (6)

Elman's ANN Architecture

[Diagram: INPUT UNITS and CONTEXT UNITS feed HIDDEN UNITS, which feed OUTPUT UNITS.]

ANN - Opening the Black Box (7)

Cluster Analysis Results of Hidden/Context Layers (Source: Elman, 1990)

[Figure: hierarchical clustering dendrogram (scale 2.0 to -0.5) of hidden/context activations. Verbs (break, sleep, see, smell, move, think, exist, smash, eat, like, chase) cluster apart from nouns; among the nouns, animates (dog, mouse, cat, lion, monster, dragon, boy, woman, girl, and the dummy word ZOG "man") separate from inanimates (glass, cookie, car, book, rock, sandwich, bread, plate).]

Neuron Approach: How to Make Sense Out of Them?

• Elman and others' pieces are far-out! Really clever, and more direct than the results of the weight approach for complex data.
• BUT . . . you really have to know what you are doing.
• It's a difficult task to communicate this abstract level of analysis to a lay audience.

ANN - Opening the Black Box (8)

For More Information (1)

Where Can I Learn More about Genetic Algorithms (GA)?

Books:
• Dagli (1994) Intelligent Engineering Systems through Artificial Neural Networks. A chapter by Tom Downey covers GA.
• Koza (1996) Genetic Programming.

For More Information (2)

Where Can I Learn More about Tree-Based Regression (TBR)?

Books:
• Breiman (1984) Classification and Regression Trees.
• Venables (1999) Modern Applied Statistics with S-Plus. Covers the creation and cross-validation of trees with the S-Plus program "rpart."

Where Can I Learn More about Artificial Neural Networks (ANNs)?

IEEE Neural Networks Council Home Page: http://www.ewh.ieee.org/tc/nnc/

Books:
• Bishop (1995) Neural Networks for Pattern Recognition. Covers MLP, RBF, Bayesian; not SOM.
• Ripley (1996) Pattern Recognition and Neural Networks. Covers MLP, RBF, some Bayesian perspectives, SOM.
• Kohonen (2001) Self-Organizing Maps.
• Rumelhart and McClelland (1986) Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1.

For More Information (3)