

Paper 085-2007

Text Mining and PROC KDE to Rank Nominal Data Patricia B. Cerrito, University of Louisville, Louisville, KY

ABSTRACT By definition, nominal data cannot be ranked. However, there are circumstances where it is essential to rank nominal data. Examples of such ranking include ranking hospitals and colleges, defining the “most livable cities”, and ranking conference paper submissions. In this project, we consider ranking patient severity. The purpose is to determine how patient severity can be used to rank the quality of hospital performance. There are thousands of patient diagnoses and co-morbidities that make such a ranking very difficult. Generally, nominal variables have been ranked by using quantitative outcome variables. Currently, hospital quality measures use stepwise logistic regression to reduce the number of patient diagnoses considered to define a measure of patient severity. More recently, a weight-of-evidence method has been developed for predictive modeling such that nominal data are compressed and ranked using a target variable. However, there are now methods available that allow for ranking nominal data that do not require outcome variables; instead, outcome variables can be used to validate the ranking. Ranking can be done using SAS Text Miner to compress nominal data fields containing information on patient diagnoses, combined with PROC KDE to define and validate the patient severity ranking. It will be demonstrated that SAS Text Miner can define an implied ranking of nominal fields that is identified through the application of PROC KDE. Once the patient severity rank has been defined, it will be used to examine patient outcomes and physician variability in patient outcomes.

INTRODUCTION More and more, healthcare providers will be judged and reimbursed based upon their performance on quality measures. Since patient outcomes depend very much on patient conditions, any performance measure will need to use a patient severity index. Because of the subjective nature of defining a patient’s condition, attempts have been made to define an objective index so that healthcare providers can be compared. Even though some measures are generally accepted, they are still problematic. While the example discussed in this paper is focused on the development of a patient severity index, the methodology can be used to compress and to rank levels in any categorical variable. These applications include inventory codes and customer purchases. For our example, we use data available from AHRQ (Agency for Healthcare Research and Quality), the National Inpatient Sample. This database contains all inpatient events for 37 different participating states, approximately 8 million events per year. We use data from the year 2003. Up to 15 columns in the dataset are used to define the patient condition and another 15 columns are used to define patient treatments. These columns are defined using a coding system developed by the World Health Organization. These codes are available online at http://icd9cm.chrisendres.com. In addition to the patient condition, several patient severity indices are included in the dataset; all such indices use some form of logistic regression in their definition. In this paper, we propose a new methodology for defining patient severity indices; one that does not depend upon patient outcomes for its definition, so that outcomes can instead be used for validation. In sections 2 and 3, we discuss the more standard methodology, presenting the new methodology in section 4. We will compare results using the different methods of data compression.

STANDARD LOGISTIC REGRESSION METHODOLOGY TO DEFINE A PATIENT SEVERITY INDEX The standard statistical method used to define a patient severity index for ranking the quality of care of healthcare providers is that of regression. Logistic regression is used when the outcome variable considered is mortality (or a severe adverse event); linear regression is used when the outcome variable is cost or length of stay. Given the number of ICD9 codes that are available, the set of codes must be reduced to be used in a regression equation. At some point, a stepwise procedure is used to find the codes that are the most related to the outcome variable. The Agency for Healthcare Research and Quality (AHRQ) has developed a draft report on a number of quality indicators.1 All of the indicators are risk-adjusted using a measure of patient severity. One such measure is the all patient refined diagnosis related group (APRDRG). This measure can be used to adjust patient severity, or to adjust the risk of patient mortality. The formula used to define quality is equal to:

logit(Pr(Y_ijk = 1)) = α_k0 + Σ_{p=1..P} β_kp (age/gender_p)_ij + Σ_{q=1..Q} β_kq (APRDRG_q)_ijk

where Yijk is the response for the jth patient in the ith hospital for the kth quality indicator. The value (age/genderp)ij is equal to the pth age by gender zero-one indicator associated with the jth patient in the ith hospital and (APRDRGq)ijk is equal to the qth APRDRG zero-one indicator variable associated with the jth patient in the ith hospital for the kth quality

SAS Global Forum 2007 Data Mining and Predictive Modeling


indicator. Then the risk-adjusted rate is equal to:

Risk-adjusted rate = (Observed rate / Expected rate) × National average rate

AHRQ suggests using a generalized model to account for correlations across patient responses in the same hospital. The risk-adjusted rate is also based upon a logistic regression.2 However, many organizations use standardized risk adjustment measures even though these measures have questionable validity when extrapolated to fresh data. For example, the Tennessee Hospital Discharge Data System uses the expected mortality defined by the equation:3
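As a minimal sketch of this ratio adjustment (the rates in the example call are hypothetical, not AHRQ values):

```python
def risk_adjusted_rate(observed: float, expected: float, national_avg: float) -> float:
    """Risk-adjusted rate = (observed rate / expected rate) * national average rate."""
    return (observed / expected) * national_avg

# A provider whose observed mortality equals its expected mortality
# is assigned exactly the national average rate.
print(risk_adjusted_rate(0.02, 0.02, 0.015))  # hypothetical rates
```

A provider with observed mortality above its risk-adjusted expectation ends up with a rate above the national average, and vice versa.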

Expected mortality = 1/(1 + e^(−z)), where

z = −9.566 + 1.542(risk of death mortality weight) + 3.819(severity of illness mortality weight) − 18.07(gender mortality weight) + 0.04(age) − 0.045(length of stay in days) + 6.937(APRDRG mortality weight) + 0.332(severity of illness class) + 0.994(risk of death class)

and

• Risk of death mortality weight=average mortality rate for the risk of death classification the patient was assigned to

• Severity of illness mortality weight=average mortality rate for the severity of illness classification the patient was assigned to

• Gender mortality weight=average mortality rate for the patient’s gender

• APRDRG mortality weight=average mortality rate for the APRDRG the patient was grouped to

• Severity of illness class=severity of illness class assigned to the patient by the APRDRG system

• Risk of death class=risk of death class assigned to the patient by the APRDRG system
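The Tennessee formula above can be sketched in Python; the coefficients are those quoted in the text, while the weight and class values in any example call are hypothetical inputs:

```python
import math

def expected_mortality(risk_wt, sev_wt, gender_wt, age, los_days,
                       aprdrg_wt, sev_class, risk_class):
    """Expected mortality via the logistic form 1/(1 + e^-z), using the
    coefficients quoted for the Tennessee Hospital Discharge Data System."""
    z = (-9.566
         + 1.542 * risk_wt
         + 3.819 * sev_wt
         - 18.07 * gender_wt
         + 0.04 * age
         - 0.045 * los_days
         + 6.937 * aprdrg_wt
         + 0.332 * sev_class
         + 0.994 * risk_class)
    return 1.0 / (1.0 + math.exp(-z))
```

Whatever the inputs, the logistic transform keeps the expected mortality strictly between 0 and 1, and it increases with age because the age coefficient is positive.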

For example, suppose there are five codes (A,B,C,D,E) used in a logistic regression with the outcome variable of mortality. Then the regression equation can be written as

P = α0 + α1(if A is present) + α2(if B is present) + α3(if C is present) + α4(if D is present) + α5(if E is present), where P is the predicted probability of mortality and α0 + α1 + α2 + α3 + α4 + α5 = 1. The predicted probability increases as the number of codes increases. One assumption of regression is that the 5 codes are independent (and uncorrelated). For example, if A=diabetes and B=congestive heart failure (CHF), then the likelihood of someone with diabetes having CHF is no greater than the likelihood of someone without diabetes having CHF. However, since diabetes can lead to heart disease, this assumption of independence is clearly false. Another problem with logistic regression is that it gives very questionable results with very disparate group sizes. A value p is chosen such that if P<p, the model predicts no mortality, but if P>p, the model predicts mortality. Suppose we are looking at a condition where the mortality is 1%. Then if p=1, the accuracy level of the model is 99%, although the model has no value predictively. However, the false negative rate (predicting no mortality when mortality occurs) will be 100% while the false positive rate (predicting mortality when no mortality occurs) will be 0%. As p decreases, the accuracy of the model will decrease slightly, as will the false negative rate; the false positive rate, however, will increase, and the false negative rate will remain over 90%. Unfortunately, disparate group sizes are almost never accounted for in the development of the logistic regression model. Consider, for example, that αi=0.20 for i=1,2,3,4,5 and α0=0. If p=1, then all 5 codes must be present in order to predict mortality. If p=.8, then an individual must have 4 out of the 5 codes to predict mortality, and so on. The possible threshold values are 0, .2, .4, .6, .8, and 1. The equal coefficients imply that each code has the same probability, 0.2, of being recorded for any one patient.
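Under these assumptions, the number of documented codes follows a Binomial(5, 0.2) distribution, so the threshold probabilities can be checked directly; a short Python sketch:

```python
from math import comb

def prob_at_least(k: int, n: int = 5, p: float = 0.2) -> float:
    """P(at least k of n codes documented), codes independent with rate p."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

print(round(prob_at_least(5), 5))  # all five codes:  0.00032
print(round(prob_at_least(3), 5))  # three or more:   0.05792
```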
Therefore, the probability of having all five codes is (.2)^5 = 0.00032. The probability of having 4 out of 5 is equal to 5(.2)^4(.8) = 0.0064. The remaining probabilities are 0.0512 (for 3), 0.2048 (for 2), 0.4096 (for 1), and 0.32768 (for 0). These give the possible threshold values for predicting mortality. Regardless of what the codes then represent, those are the only possible threshold values. If the threshold value requires all 5 codes, then 0.99968 of all patients will not have a predicted value of mortality. If only 4 of the 5 codes are required, then 0.99328 will not have a predicted value of mortality. Once the number of codes required drops to 3, the predicted mortality rate climbs to 0.05792. The differential between the predicted mortality rate and the actual mortality rate is used to rank healthcare providers.4 If the predicted mortality is much higher than the actual mortality rate, then the provider will have a higher


ranking. One way to improve the ranking is to make the predicted mortality as high as possible; that is, by reporting all patients (or many patients) above the threshold value. If the 5 codes are known to the provider, extra diligence can be placed on the documentation of the 5 regardless of how well other codes are documented; it becomes more difficult if the 5 codes are not known to the provider. Another problem with the use of regression is the requirement that the codes are uniformly entered by all providers. Entry of codes depends upon the accuracy of documentation. Consider, for example, all of the 4-digit codes associated with diabetes:

• 250.0 Diabetes mellitus without mention of complication
• 250.1 Diabetes with ketoacidosis
• 250.2 Diabetes with hyperosmolarity
• 250.3 Diabetes with other coma
• 250.4 Diabetes with renal manifestations
• 250.5 Diabetes with ophthalmic manifestations
• 250.6 Diabetes with neurological manifestations
• 250.7 Diabetes with peripheral circulatory disorders
• 250.8 Diabetes with other specified manifestations
• 250.9 Diabetes with unspecified complication

The fifth digit for diabetes represents the following:

• 0 type II or unspecified type, not stated as uncontrolled
• 1 type I [juvenile type], not stated as uncontrolled
• 2 type II or unspecified type, uncontrolled

For a complete listing of the ICD9 codes, the interested reader is referred to http://www.disabilitydurations.com/icd9top.htm. The term “uncontrolled” is left undefined. It is questionable whether every physician who admits patients to a hospital will document “uncontrolled” on the patient’s chart. If providers document differently, those who document the codes used in the regression equation will rank higher compared to those who do not. Suppose one provider documents the 5 codes at 25% rather than 20% while another documents at 15%. Then the probability of having 3 or more codes is equal to 0.1035 for the first provider but 0.0266 for the second. If they have the same actual mortality rate, provider 1 will rank considerably higher compared to provider 2. In practice, the number of ICD9 codes used is more than 5 and the probabilities of each code occurring will not be equally likely. However, the result is basically the same. If the provider knows what codes are used in the regression equation, the only codes that must be carefully documented are the ones in the regression; the rest have no importance. For example, the Healthcare Financing Administration (HCFA) first uses cluster analysis to group the ICD9 codes, with each code variable defined as 1 if present and 0 if absent, followed by stepwise logistic regression to identify the most important cluster.5 The result will be extremely biased if some providers have access to the regression equation while others do not; those who do not are penalized. Also, different regression equations can result in different rankings. While many patients will have similar severity ranks across different formulas, in as much as 20% of the patient base, different measures can vary considerably.6 Iezzoni, Ash, et al. state, “Detailed evaluation of severity measures appears to be a narrow methodologic pursuit, far removed from daily medical practice. Nevertheless, severity-adjusted death rates are widely used as putative quality indicators in health care provider ‘report cards’.” Monte Carlo simulation also indicates problems in the model; these logistic models tend to have low positive predictive value, indicating issues with disparate group sizes for rare occurrences.7 Other studies show even worse agreement, indicating that the measures can only identify outlier providers that perform very poorly and that the measures are not valid for ranking all providers.8-10
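The documentation-rate comparison above (25% versus 15%) is the same binomial tail computation; a quick Python check:

```python
from math import comb

def prob_at_least(k, n=5, p=0.2):
    """P(at least k of n codes documented), codes independent with rate p."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

# Two providers with identical patients but different documentation rates:
print(round(prob_at_least(3, p=0.25), 4))  # 0.1035
print(round(prob_at_least(3, p=0.15), 4))  # 0.0266
```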

To define a patient severity model using the National Inpatient Sample (NIS) database, we use the patient condition codes, labeled DX1-DX15 in the dataset. We create a series of indicator functions for each ICD9 code that the user wants to investigate for a severity index. While time consuming, the user can use all of the codes and then reduce them by other means. The DATA step code for creating these indicator functions is as follows:

Data sasuser.indicatorcodes;
   Set sasuser.NISdata2003;
   Array dx[15] DX1-DX15;
   n25000=0;
   Do i=1 to 15;
      If dx[i]='25000' then n25000=1;
   End;
Run;
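A plain-Python sketch of the same indicator logic (the records are toy stand-ins for NIS rows; note that the flag is initialized before the loop so a later non-matching column cannot reset it):

```python
# Toy stand-in for three NIS records, each with (up to) 15 DX columns.
records = [
    {"DX1": "25000", "DX2": "4019"},
    {"DX1": "4280",  "DX2": "25000"},
    {"DX1": "4019",  "DX2": None},
]

def indicator(rec, code="25000", n_dx=15):
    """1 if the ICD9 code appears in any DX1..DXn field, else 0."""
    return int(any(rec.get(f"DX{i}") == code for i in range(1, n_dx + 1)))

print([indicator(r) for r in records])  # [1, 1, 0]
```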


The above code defines just one indicator function; others can be added. Once defined, the indicator functions are used in a logistic regression with the Regression Node in Enterprise Miner. The NIS database provides a number of severity measures that are defined using regression. Therefore, we can compare the results from defining a severity measure using text analysis to the results using the more traditional regression. We use the data from the year 2003. The measures that we compare are

• APRDRG mortality risk, using the all patient refined DRG as developed by AHRQ
• APRDRG severity, using the all patient refined DRG as developed by AHRQ
• Disease staging: mortality level as developed by Medstat
• Disease staging: resource demand level as developed by Medstat

All four measures were developed using a logistic regression or linear regression process (depending upon whether the outcome variable was discrete as in mortality level or continuous as in resource demand level). We will also compare these different measures to each other.

WEIGHT OF EVIDENCE METHOD There are some simple methods to reduce a complex categorical variable. Probably the easiest is to define a level called ‘other’. All but the most populated levels can be rolled into the ‘other’ level. This method has the advantage of allowing the investigator to define the number of levels, and to immediately reduce the number to a manageable set. For example, in a study of medications for diabetes, there were a total of 358 different medications prescribed. It is possible to use the ten most popular medications and then combine the remaining medications into ‘other’. However, the ‘other’ category should consist of fairly homogeneous values. This can cause a problem when examining patient condition codes. Some patient conditions that occur rarely can require extraordinary costs and should not be rolled into a general ‘other’ category. An example of this is a patient who requires a heart transplant and who has a ventricular assist device inserted. The condition is rare, but the cost is extraordinary. Another method is called target-based enumeration. In this method, the levels are quantified by using the average of the outcome variable within each level. The level with the smallest outcome is recoded with a 1, the next smallest with a 2, and so on. A modification of this technique is to use the actual outcome average for each level. Levels with identical expected outcomes are merged. This modification is called weight-of-evidence recoding.11 The weight-of-evidence (WOE) technique works well when the number of observations per level is sufficiently large to get a stable outcome variable average.11 It does not generalize well to fresh data if the number of levels is large and the number of observations per level is small. In addition, there must be one clearly defined target variable, since the levels are defined in terms of that target. In a situation where there are multiple targets, the recoding of categories is not stable.

In addition, the target variable is assumed to be interval so that the average value has meaning. If there are only two outcome levels (for example, mortality), the weight of evidence would be reduced to defining a level by the number of observations in that level with an outcome value of 1 (death). We use a macro to define the weights for this process.12 The WOE macro yields a total of 10 different levels. These levels are compared to results defined using logistic regression as provided in the database (Tables 1 and 2).

libname woe '.';
options mstored sasmstore=woe;
%macro smooth_weight_of_evidence(data=, out=, input=, target=, n_prior=, n0n1=, fname=) /store source;
   %if &fname= %then %let fname=$w%substr(&input,1,6);
   %else %let fname=$w%substr(&fname,1,6);
   proc sql;
      create table f as
      select "&fname" as FMTNAME,
             &input as START,
             sum(&target=1) as N1,
             sum(&target=0) as N0
      from &data


      group by &input;
   quit;
   data f;
      set f end=last;
      LABEL=log((N1+&n_prior/(1+&n0n1))/(N0+&n_prior*&n0n1/(1+&n0n1)));
      output;
      if last then do;
         START='OTHER';
         LABEL=-log(&n0n1);
         output;
      end;
   run;
   proc format cntlin=f;
   run;
   data &out;
      set &data;
      w_&input=put(&input,&fname..)+0;
   run;
%mend;

libname ed 'c:\ed';
%smooth_weight_of_evidence(data=sasuser.niswoe, out=nis.niswoe, input=dx1, target=totchg, n_prior=50, n0n1=2);
run;
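A Python sketch of the smoothing performed by the macro (the data pairs are illustrative; n_prior and n0n1 follow the example call): each observed level gets the weight log((N1 + prior pseudo-count)/(N0 + prior pseudo-count)), and unseen levels fall back to -log(n0n1).

```python
from collections import defaultdict
from math import log

def smooth_woe(pairs, n_prior=50.0, n0n1=2.0):
    """pairs: (level, binary target) tuples; returns level -> weight, mirroring
    LABEL = log((N1 + n_prior/(1+n0n1)) / (N0 + n_prior*n0n1/(1+n0n1)))."""
    n1 = defaultdict(float)
    n0 = defaultdict(float)
    for level, y in pairs:
        if y == 1:
            n1[level] += 1
        else:
            n0[level] += 1
    prior1 = n_prior / (1 + n0n1)          # prior pseudo-count for target=1
    prior0 = n_prior * n0n1 / (1 + n0n1)   # prior pseudo-count for target=0
    weights = {lvl: log((n1[lvl] + prior1) / (n0[lvl] + prior0))
               for lvl in set(n1) | set(n0)}
    weights["OTHER"] = -log(n0n1)          # fallback for unseen levels
    return weights

w = smooth_woe([("25000", 1), ("25000", 0), ("4280", 1)])
```

The smoothing keeps sparse levels from getting extreme weights: with only a handful of observations, a level's weight stays close to the prior.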

Table 1. WOE Levels Compared to APRDRG Mortality Risk From Logistic Regression
(cell frequencies shown; the row and column percentages of the original crosstab are omitted)

APRDRG              WOE Level
Mortality Risk        0       1       2       3        4       5        6       7       8       9     Total
0                     9       1      56      39       13       4        7      29       1       7       166
1                  6457    8548   48716   13774    78701   52241   111932   63531   17255   14410    415565
2                  4189    9147   16765    3630    66001   28615     4466   16555    6839    7242    163449
3                  3254    4492    4864     937    22873   13361      764    3399    1998    3055     58997
4                  1982     978     983     428     7348    4181      159     718     898    1233     18908
Total             15891   23166   71384   18808   174936   98402   117328   84232   26991   25947    657085


Table 2. WOE Levels Compared to APRDRG Severity
(cell frequencies shown; the row and column percentages of the original crosstab are omitted)

APRDRG              WOE Level
Severity              0       1       2       3        4       5        6       7       8       9     Total
0                     9       1      56      39       13       4        7      29       1       7       166
1                  3685    6215   27032    7945    49857   29745    75564   40402    9360    7390    257195
2                  5143    9305   32028    7877    79251   38446    34710   32676   11481   10801    261718
3                  4721    6354   10410    2504    37360   23356     6662    9830    4877    5966    112040
4                  2333    1291    1858     443     8455    6851      385    1295    1272    1783     25966
Total             15891   23166   71384   18808   174936   98402   117328   84232   26991   25947    657085

The different patient severity indices have no obvious relationship. The WOE levels are scattered across both the mortality and the severity indices. Since the different methods yield such different results, we need to find alternative methods to determine a “good” severity index. Tables 3 and 4 compare WOE results to the two other disease staging measures.

Table 3. WOE Levels Compared to Disease Staging: Mortality
(cell frequencies shown; the row and column percentages of the original crosstab are omitted)

Disease Staging:    WOE Level
Mortality             0       1       2       3        4       5        6       7       8       9     Total
0                   672    2218    6028    2965     1109   11791    84135   10518    2262     844    122542
1                   475     245    2333    1211     3707    5157    18233    3895    3151    1042     39449
2                  3091    3109   29531    7452    42989   20109     8247   41240    5579    5072    166419
3                  4359    6752   26447    5649    87958   38966     6326   25707   11937   15404    229505
4                  4628    8456    6078    1251    32880   16424      352    2505    3161    3121     78856
5                  2563    2365     829     260     6143    5723       26     292     738     390     19329
Total             15788   23145   71246   18788   174786   98170   117319   84157   26828   25873    656100

There seems to be no relationship between the two measures as shown in Table 3; the same lack of relationship is shown in Table 4. In Table 4, there are only 5 observations in level 1 because it represents the unknown codes, and is not part of disease staging.


Table 4. WOE Levels Compared to Disease Staging: Resource Demand
(cell frequencies shown; the row and column percentages of the original crosstab are omitted)

Disease Staging:    WOE Level
Resource Demand       0       1       2       3        4       5        6       7       8       9     Total
1                     0       0       0       0        0       1        0       2       0       2         5
2                  2143     210    7611    5724    11428    5488    62208    9330    1290    2237    107669
3                  6002    8460   48785    9492    93090   58085    52752   48034   12773   11571    349044
4                  6209   11179   12912    2755    53299   27629     2177   23698   10670    9504    160032
5                  1489    3317    2027     824    17101    7191      186    3152    2256    2626     40169

In addition, we use kernel density estimation to compare the WOE levels in terms of outcomes. Figure 1 gives the relationship of WOE level to total charges, which were used to define the WOE levels. Figure 2 gives the relationship to Length of Stay. A natural ordering is established with total charges, with the exception of levels 1 and 3 which have several cross-over points with the other levels. Level 6 has the highest probability of a lower cost compared to all other levels. The highest probability of high cost is shared by levels 1, 2, and 3. Figure 1. WOE Level by Total Charges

When we change the target value to length of stay, the ordering changes as well (Figure 2). Level 6 has the highest probability of a short stay compared to all other levels with level 7, the next lowest. However, crossover does occur between levels 6 and 7 early on for the shortest stay. Level 1 now has the highest probability of a lengthy stay. WOE finds costs up to about $40,000 and lengths of stay up to day 12.


Figure 2. WOE by Length of Stay

TEXT MINING METHODOLOGY In order to perform text analysis on the NIS data file, the data must first be pre-processed. We need to merge the 15 columns containing patient conditions into one text string. We do this using a data step with the following statement, separating each code with a space:

String=catx(' ',DX1,DX2,DX3,DX4,DX5,DX6,DX7,DX8,DX9,DX10,DX11,DX12,DX13,DX14,DX15);

Because of machine limitations, we used a 10% sample. Once the text string is defined, we use the Text Miner node in Enterprise Miner (Figure 3).

Figure 3. Use of the Text Miner Node to Create Clusters of Patient Severity
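The same concatenation step in plain Python (the record dictionary is a toy stand-in for one NIS row):

```python
# Mirror of the catx(' ', DX1..DX15) step: join the non-missing
# diagnosis codes of one record into a single space-separated string.
record = {"DX1": "25000", "DX2": "4280", "DX3": None}  # toy record

string = " ".join(
    str(record[f"DX{i}"])
    for i in range(1, 16)
    if record.get(f"DX{i}") not in (None, "")
)
print(string)  # "25000 4280"
```

Like CATX, this skips missing values rather than inserting extra delimiters.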

Figure 4. Options in Text Miner

Figure 4 gives the available options in the Text Miner node. Since the ICD9 codes are numeric, the default for Numbers must be changed from ‘no’ to ‘yes’.



After running Text Miner, the interactive results are shown in Figure 5. Note the interactive button in Figure 4. Clicking on the button gives Figure 5. Figure 5. Interactive Results for the Text Miner Node

To define text clusters, go to the Tools menu (Figure 6). It gives an option to cluster the text field. Figure 6. Tools Menu to Cluster Documents

Figure 7 provides options for clustering. Because we are defining a patient severity index, we specified that exactly 5 clusters should be defined. The number can be changed by the user. The number of terms (ICD9 codes) used to describe the cluster can also be specified by the user. In this example, we limit the number to ten. We use the standard defaults of Expectation Maximization and Singular Value Decomposition. It is recommended that these not be changed. The results as shown in the interactive window are given in Figure 8.



Figure 7. Clustering Options in Text Miner

Figure 8. Interactive Window After Clustering

Figure 9. Cluster Window

Note that a third window in Figure 8 is shown giving the clusters. This window is enlarged in Figure 9.


Figure 10. Finding the Datasets

Figure 11. Datasets Defined in Text Miner

Once we have the datasets, we use the following code in the SAS Code node. Its purpose is to put both the cluster descriptions and the cluster numbers in the original dataset.

data sasuser.clusternis (keep=_cluster_ _freq_ _rmsstd_ clus_desc);
   set emws.text7_cluster;
run;

data sasuser.desccopynis (drop=_svd_1-_svd_500 prob1-prob500);
   set emws.text7_documents;
run;

proc sort data=sasuser.clusternis;
   by _cluster_;
run;

proc sort data=sasuser.desccopynis;
   by _cluster_;
run;

data sasuser.nistextranks;
   merge sasuser.clusternis sasuser.desccopynis;
   by _cluster_;
run;

To make a comparison of outpatient cost by cluster, we use kernel density estimation with the following code:

proc kde data=sasuser.totalcostandclusters;
   univar opxp03x_sum / gridl=0 gridu=10000 out=meps.kdecostbycluster;
   by _cluster_;
run;
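PROC KDE's UNIVAR statement computes a Gaussian kernel density estimate on a grid for each BY group. The following stdlib Python sketch illustrates the same computation in simplified form (PROC KDE selects a bandwidth automatically by default; here the bandwidth is fixed by hand, and the cost samples are hypothetical):

```python
import math

def kde(values, grid, bandwidth):
    """Gaussian kernel density estimate of `values` evaluated at each grid point."""
    const = 1.0 / (len(values) * bandwidth * math.sqrt(2 * math.pi))
    return [const * sum(math.exp(-0.5 * ((g - v) / bandwidth) ** 2) for v in values)
            for g in grid]

# hypothetical inpatient cost samples (in $1000s) for two severity clusters
low, high = [3, 4, 4, 5, 6], [9, 10, 11, 12, 14]
grid = [g / 2 for g in range(0, 31)]        # $0 to $15,000 in $500 steps
f_low, f_high = kde(low, grid, 1.0), kde(high, grid, 1.0)

# a "cutpoint" in the sense used below: the grid value where the
# higher-severity density first rises above the lower-severity density
cut = next(g for g, a, b in zip(grid, f_low, f_high) if b > a)
```

Comparing the estimated curves by group on a common grid, and reading off where one curve crosses another, is exactly how the cutpoints discussed in the figures below are identified.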

Figure 12 gives the text clusters with the corresponding text translation given in Table 5.

In order to compare outcomes by text clusters, we first need to know what datasets are defined. This is done by highlighting the connection between nodes (Figure 10). The resulting datasets are given in Figure 11.



Figure 12. Text Clusters for the NIS Data Sample

Table 5. Translation of Clusters in Figure 12

Cluster 1: Feeding problems in newborn; unspecified fetal and neonatal jaundice; other "heavy-for-dates" infants; neonatal hypoglycemia; 33-34 weeks of gestation; cardiorespiratory distress syndrome of newborn; viral hepatitis; transitory tachypnea of newborn; other respiratory problems after birth; single liveborn, born in the hospital by Cesarean delivery

Cluster 2: Bipolar I disorder, most recent episode (or current) unspecified; anemia; alcohol abuse; extrinsic asthma, unspecified; other convulsions; other and unspecified alcohol dependence; tobacco use disorder; single liveborn; unspecified hypothyroidism; other specified personal history presenting hazards to health

Cluster 3: Unspecified essential hypertension; aortocoronary bypass status; coronary atherosclerosis of native coronary artery; esophageal reflux; urinary tract infection, site not specified; atrial fibrillation; chronic airway obstruction, not elsewhere classified; volume depletion; pure hypercholesterolemia; other and unspecified hyperlipidemia

Cluster 4: Asthma, unspecified; hyposmolality and/or hyponatremia; other specified cardiac dysrhythmias; obesity, unspecified; osteoarthrosis, unspecified whether generalized or localized; morbid obesity; hypopotassemia; anxiety state, unspecified; chest pain; osteoporosis, unspecified

Cluster 5: Abnormal glucose tolerance in pregnancy; abnormality in fetal heart rate or rhythm; elderly multigravida; other injury to pelvic organs; post term pregnancy; cord around neck, with compression; normal delivery; transient hypertension of pregnancy; breech presentation, buttocks version; hypertension secondary to renal disease, complicating pregnancy, childbirth, and the puerperium

Note that cluster 1 contains codes primarily related to newborns while cluster 5 contains codes primarily related to pregnancy and delivery. Because these two clusters are so dominated by specific patient conditions (there are very many admissions for childbirth), it is perhaps better to define text clusters of patient conditions within major MDC categories, using the more specific DRG codes, to define severity measures. We will demonstrate this approach in a later section. The two outcome measures considered are total charges and length of stay. We first show the relationship between the different measures with a series of table analyses (Tables 6-12). Table 6 compares the APRDRG mortality risk by defined text clusters. The APRDRG value of 0 indicates that no measure was calculated for these patients.

Table 6. Comparison of APRDRG Risk of Mortality and Defined Text Clusters

(cell entries: frequency, row percent, column percent)

APRDRG Risk of Mortality | Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 | Cluster 5 | Total
0 | 23 (37.70, 0.28) | 13 (21.31, 0.15) | 17 (27.87, 0.06) | 6 (9.84, 0.03) | 2 (3.28, 0.03) | 61
1 | 8014 (15.84, 98.44) | 7497 (14.81, 84.87) | 12428 (24.56, 42.86) | 15553 (30.73, 68.85) | 7113 (14.06, 99.68) | 50605
2 | 73 (0.43, 0.90) | 1039 (6.09, 11.76) | 10972 (64.35, 37.83) | 4947 (29.01, 21.90) | 20 (0.12, 0.28) | 17051
3 | 10 (0.16, 0.12) | 216 (3.55, 2.45) | 4155 (68.37, 14.33) | 1695 (27.89, 7.50) | 1 (0.02, 0.01) | 6077
4 | 21 (1.10, 0.26) | 68 (3.57, 0.77) | 1428 (74.92, 4.92) | 389 (20.41, 1.72) | 0 (0.00, 0.00) | 1906
Total | 8141 | 8833 | 29000 | 22590 | 7136 | 75700

There is clearly no real measure of agreement between these two risk measures. Also, for APRDRG, approximately two thirds of all patients fall in group 1. For this reason, providers who can shift their patients from group 1 to group 2 through improved coding will receive more favorable quality rankings.

Table 7. Comparison of APRDRG Severity Rank by Defined Text Cluster

(cell entries: frequency, row percent, column percent)

APRDRG Severity | Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 | Cluster 5 | Total
0 | 23 (37.70, 0.28) | 13 (21.31, 0.15) | 17 (27.87, 0.06) | 6 (9.84, 0.03) | 2 (3.28, 0.03) | 61
1 | 6691 (20.37, 82.19) | 4105 (12.49, 46.47) | 7444 (22.66, 25.67) | 9481 (28.86, 41.97) | 5134 (15.63, 71.95) | 32855
2 | 1078 (3.83, 13.24) | 3811 (13.53, 43.15) | 12603 (44.73, 43.46) | 8946 (31.75, 39.60) | 1739 (6.17, 24.37) | 28177
3 | 315 (2.63, 3.87) | 813 (6.80, 9.20) | 7030 (58.76, 24.24) | 3546 (29.64, 15.70) | 260 (2.17, 3.64) | 11964
4 | 34 (1.29, 0.42) | 91 (3.44, 1.03) | 1906 (72.12, 6.57) | 611 (23.12, 2.70) | 1 (0.04, 0.01) | 2643
Total | 8141 | 8833 | 29000 | 22590 | 7136 | 75700

Table 8. Comparison of APRDRG Risk of Mortality to APRDRG Severity

(cell entries: frequency, row percent, column percent)

APRDRG Severity | Risk 0 | Risk 1 | Risk 2 | Risk 3 | Risk 4 | Total
0 | 61 (100.00, 100.00) | 0 (0.00, 0.00) | 0 (0.00, 0.00) | 0 (0.00, 0.00) | 0 (0.00, 0.00) | 61
1 | 0 (0.00, 0.00) | 31233 (95.06, 61.72) | 1538 (4.68, 9.02) | 67 (0.20, 1.10) | 17 (0.05, 0.89) | 32855
2 | 0 (0.00, 0.00) | 17423 (61.83, 34.43) | 9751 (34.61, 57.19) | 987 (3.50, 16.24) | 16 (0.06, 0.84) | 28177
3 | 0 (0.00, 0.00) | 1911 (15.97, 3.78) | 5600 (46.81, 32.84) | 4078 (34.09, 67.11) | 375 (3.13, 19.67) | 11964
4 | 0 (0.00, 0.00) | 38 (1.44, 0.08) | 162 (6.13, 0.95) | 945 (35.75, 15.55) | 1498 (56.68, 78.59) | 2643
Total | 61 | 50605 | 17051 | 6077 | 1906 | 75700

Table 9. Comparison of Disease Staging: Mortality Level to Text Clusters

(cell entries: frequency, row percent, column percent)

Mortality Level | Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 | Cluster 5 | Total
0 | 396 (3.14, 4.87) | 1649 (13.07, 18.72) | 927 (7.34, 3.20) | 3078 (24.39, 13.66) | 6571 (52.06, 92.08) | 12621
1 | 176 (4.17, 2.17) | 1237 (29.28, 14.04) | 849 (20.09, 2.93) | 1420 (33.61, 6.30) | 543 (12.85, 7.61) | 4225
2 | 6680 (27.89, 82.19) | 2980 (12.44, 33.83) | 6858 (28.63, 23.69) | 7431 (31.02, 32.97) | 3 (0.01, 0.04) | 23952
3 | 802 (3.25, 9.87) | 2493 (10.09, 28.30) | 13568 (54.93, 46.86) | 7818 (31.65, 34.69) | 19 (0.08, 0.27) | 24700
4 | 45 (0.56, 0.55) | 392 (4.84, 4.45) | 5353 (66.04, 18.49) | 2316 (28.57, 10.28) | 0 (0.00, 0.00) | 8106
5 | 29 (1.48, 0.36) | 58 (2.96, 0.66) | 1397 (71.28, 4.83) | 476 (24.29, 2.11) | 0 (0.00, 0.00) | 1960
Total | 8128 | 8809 | 28952 | 22539 | 7136 | 75564

Table 7 compares the APRDRG severity rank to the text clusters; again, there is not a pattern of agreement. The APRDRG severity rank also concentrates most patients in the lower ranks; only about 20% of all patients are in groups 3 and 4.

Table 8 compares the two APRDRG measures to each other. While there is major agreement for groups 1 and 4, there are few similarities for groups 2 and 3.

Table 9 examines the disease staging: mortality level against the text clusters. As with the previous tables, there is little agreement between the two.

Table 10. Comparison of Disease Staging: Resource Demand Level to Text Clusters

(cell entries: frequency, row percent, column percent)

Resource Demand Level | Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 | Cluster 5 | Total
1 | 3717 (99.73, 45.74) | 0 (0.00, 0.00) | 9 (0.24, 0.03) | 1 (0.03, 0.00) | 0 (0.00, 0.00) | 3727
2 | 3487 (24.41, 42.91) | 2006 (14.04, 22.74) | 1532 (10.72, 5.29) | 2503 (17.52, 11.08) | 4758 (33.31, 66.68) | 14286
3 | 583 (1.61, 7.17) | 5403 (14.91, 61.24) | 14829 (40.92, 51.16) | 13046 (36.00, 57.77) | 2377 (6.56, 33.31) | 36238
4 | 229 (1.33, 2.82) | 1246 (7.26, 14.12) | 9852 (57.43, 33.99) | 5828 (33.97, 25.81) | 1 (0.01, 0.01) | 17156
5 | 110 (2.59, 1.35) | 168 (3.95, 1.90) | 2765 (65.07, 9.54) | 1206 (28.38, 5.34) | 0 (0.00, 0.00) | 4249
Total | 8126 | 8823 | 28987 | 22584 | 7136 | 75656

Table 11. Comparison of Disease Staging: Mortality Level to Disease Staging: Resource Demand Level

(cell entries: frequency, row percent, column percent)

Resource Demand Level | Mortality 0 | Mortality 1 | Mortality 2 | Mortality 3 | Mortality 4 | Mortality 5 | Total
1 | 178 (4.78, 1.41) | 4 (0.11, 0.09) | 3542 (95.04, 14.79) | 3 (0.08, 0.01) | 0 (0.00, 0.00) | 0 (0.00, 0.00) | 3727
2 | 6363 (44.56, 50.42) | 1405 (9.84, 33.25) | 5045 (35.33, 21.07) | 1432 (10.03, 5.80) | 18 (0.13, 0.22) | 17 (0.12, 0.87) | 14280
3 | 5461 (15.09, 43.27) | 2639 (7.29, 62.46) | 11339 (31.34, 47.35) | 13830 (38.22, 55.99) | 2670 (7.38, 32.94) | 246 (0.68, 12.55) | 36185
4 | 607 (3.54, 4.81) | 170 (0.99, 4.02) | 3844 (22.45, 16.05) | 7638 (44.60, 30.92) | 3899 (22.77, 48.10) | 967 (5.65, 49.34) | 17125
5 | 12 (0.28, 0.10) | 7 (0.16, 0.17) | 179 (4.22, 0.75) | 1797 (42.34, 7.28) | 1519 (35.79, 18.74) | 730 (17.20, 37.24) | 4244
Total | 12621 | 4225 | 23949 | 24700 | 8106 | 1960 | 75561

Table 10 compares the disease staging: resource level to the text clusters. In this case, there is some agreement with cluster 1 to resource level 1. Table 11 compares the two disease staging measures to each other. While there is some agreement between demand level 1 and mortality level 2, there is little agreement elsewhere between the two measures.
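Each of the comparison tables above is a frequency cross-tabulation in which every cell carries a count, a row percent, and a column percent (PROC FREQ-style output). A stdlib Python sketch of that computation, using hypothetical (cluster, severity) pairs:

```python
from collections import Counter

def crosstab(pairs):
    """pairs: iterable of (row_label, col_label).
    Returns {(row, col): (count, row_pct, col_pct)}."""
    cells = Counter(pairs)
    row_tot, col_tot = Counter(), Counter()
    for (r, c), n in cells.items():
        row_tot[r] += n
        col_tot[c] += n
    return {(r, c): (n, 100 * n / row_tot[r], 100 * n / col_tot[c])
            for (r, c), n in cells.items()}

# hypothetical (text cluster, severity level) pairs
pairs = [(1, "low")] * 30 + [(1, "high")] * 10 + [(2, "low")] * 5 + [(2, "high")] * 55
table = crosstab(pairs)
table[(1, "low")]   # 30 patients: 75% of cluster 1, about 85.7% of the "low" column
```

Agreement between two severity measures would show up as large row and column percents concentrated on the diagonal cells, which is exactly what these tables fail to show.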


Table 12. Comparison of Disease Staging: Mortality Level to APRDRG Risk of Mortality
(cell entries: frequency, row percent, column percent)

Mortality Level | Risk 0 | Risk 1 | Risk 2 | Risk 3 | Risk 4 | Total
0 | 7 (0.06, 29.17) | 12482 (98.90, 24.69) | 119 (0.94, 0.70) | 12 (0.10, 0.20) | 1 (0.01, 0.05) | 12621
1 | 0 (0.00, 0.00) | 4084 (96.66, 8.08) | 125 (2.96, 0.73) | 13 (0.31, 0.21) | 3 (0.07, 0.16) | 4225
2 | 8 (0.03, 33.33) | 21168 (88.38, 41.87) | 2489 (10.39, 14.62) | 260 (1.09, 4.28) | 27 (0.11, 1.42) | 23952
3 | 6 (0.02, 25.00) | 11910 (48.22, 23.56) | 10287 (41.65, 60.44) | 2217 (8.98, 36.54) | 280 (1.13, 14.75) | 24700
4 | 2 (0.02, 8.33) | 868 (10.71, 1.72) | 3695 (45.58, 21.71) | 2831 (34.92, 46.65) | 710 (8.76, 37.41) | 8106
5 | 1 (0.05, 4.17) | 43 (2.19, 0.09) | 304 (15.51, 1.79) | 735 (37.50, 12.11) | 877 (44.74, 46.21) | 1960
Total | 24 | 50555 | 17019 | 6068 | 1898 | 75564

Table 13 compares the levels defined using weight of evidence to the five clusters defined by Text Miner. While there is variability, there is also some consistency. For example, 55% of WOE level 1 is in text cluster 4. Similarly, almost 100% of cluster 5 is in WOE level 6. This indicates that both methods provide better results than those of logistic regression.

Table 13. WOE Levels Compared to Text Clusters

(cell entries: frequency, row percent, column percent)

Text Cluster | WOE 0 | WOE 1 | WOE 2 | WOE 3 | WOE 4 | WOE 5 | WOE 6 | WOE 7 | WOE 8 | WOE 9 | Total
1 | 2 (0.86, 0.12) | 2 (0.86, 0.08) | 2 (0.86, 0.03) | 3 (1.29, 0.15) | 1 (0.43, 0.01) | 1 (0.43, 0.01) | 7 (3.00, 0.06) | 200 (85.84, 2.21) | 12 (5.15, 0.41) | 3 (1.29, 0.11) | 233
2 | 192 (2.08, 11.42) | 185 (2.00, 7.53) | 1917 (20.73, 25.27) | 941 (10.18, 45.81) | 900 (9.73, 4.86) | 962 (10.40, 9.06) | 1961 (21.20, 16.36) | 1144 (12.37, 12.66) | 598 (6.36, 19.98) | 458 (4.95, 16.40) | 9248
3 | 963 (3.21, 57.29) | 916 (3.05, 37.27) | 2761 (9.20, 36.39) | 551 (1.84, 26.83) | 13718 (45.71, 74.12) | 4849 (16.16, 45.68) | 661 (2.20, 5.52) | 3459 (11.53, 38.27) | 940 (3.13, 31.94) | 1192 (3.97, 42.68) | 30010
4 | 524 (2.29, 31.17) | 1355 (5.93, 55.13) | 2907 (12.72, 38.32) | 559 (2.45, 27.22) | 3890 (17.02, 21.02) | 4802 (21.01, 45.23) | 2040 (8.93, 17.02) | 4236 (18.53, 46.86) | 1403 (6.14, 47.67) | 1140 (4.99, 40.82) | 22856
5 | 0 (0.00, 0.00) | 0 (0.00, 0.00) | 0 (0.00, 0.00) | 0 (0.00, 0.00) | 0 (0.00, 0.00) | 2 (0.03, 0.02) | 7314 (99.97, 61.04) | 0 (0.00, 0.00) | 0 (0.00, 0.00) | 0 (0.00, 0.00) | 7316

Table 12 compares the two measures of mortality, APRDRG and Disease Staging. There is little agreement here as well.


Figure 13. Kernel Density Estimators of Inpatient Costs by Text Cluster

From the shape of the graphs, the clusters order as 5<2<1<4<3 in terms of cost. There are two cutpoints in the figure. The first occurs at $11,000, where clusters 1 and 5 transition to lower probability compared to clusters 3 and 4; the second occurs at $16,000, where cluster 2 transitions to a lower cost. Figure 14 examines the APRDRG severity measure by inpatient cost. As should happen, group 1<2<3<4, with group 4 having considerable variability. The major cutpoint occurs at $12,000.

Figure 14. Kernel Density Estimate of APRDRG Severity by Inpatient Cost

Figure 15 examines the relationship of inpatient costs to the APRDRG mortality risk measure. Group 1<2<3<4 in terms of costs with groups 3 and 4 having considerable variability compared to groups 1 and 2. The cutpoint between groups 1 and 2 occurs at $10,000; the cutpoint between groups 3 and 4 occurs at $55,000.

We next examine the kernel density estimators of the outcomes. Figure 13 gives the relationship of text cluster to inpatient costs.


Figure 15. Kernel Density Estimate of Inpatient Costs by APRDRG Mortality Risk

Figure 16 gives the estimates using the disease staging: resource demand level measure. Again, group 1<2<3<4<5. However, the graph for group 1 is extremely narrow, with very small variability and almost zero probability of costs higher than $5000. The probability of cost for any group beyond $25,000 is also almost zero. Thus while the groups separate well, they seem to be poor predictors of actual costs. Figure 16. Kernel Density Estimates of Inpatient Costs by Resource Demand Level

Figure 17 examines the disease staging: mortality level. Again, it can only predict inpatient costs up to about $35,000 with positive probability, indicating that it is a poor predictor of actual patient costs. It has multiple cutpoints: $12,000 for groups 1 and 2, $10,000 for groups 2 and 3, and $14,000 for groups 1 and 3. In that interval, 1<3<2. The cutpoint for groups 3 and 4 occurs at $25,000. Because of this shift, with 1<3<2, there is a strong probability that some of the hospitals in the sample are shifting patients into a higher group through over-coding of secondary patient conditions.

Figure 17. Kernel Density Estimate of Disease Staging: Mortality Level

Figure 18. Kernel Density Estimate of LOS by Text Cluster

We also look at the length of stay (LOS) as an outcome variable by the different patient severity measures. Figure 18 gives the kernel density estimate for the LOS by the text clusters. The pattern is very regular, with cluster 5<1<2<4<3; clusters 2 and 4 are almost identical in terms of distribution. The cutpoint occurs at about day 4, with cluster 5 having a very low probability of exceeding a 4-day inpatient stay. The text clusters have positive probability out to day 12.


Figure 19 gives the estimate of LOS by the APRDRG severity measure. It has positive probability out to day 20. However, between days 4 and 5, the ordering becomes 1<3<2, indicating a shift of patients from class 2 to class 3 caused by over-coding at some hospitals.

Figure 19. Kernel Density Estimate of LOS by APRDRG Severity Measure

Figure 20 shows the comparison of LOS to APRDRG mortality index. The pattern looks almost the same as in Figure 19, with 1<3<2 for values between 4 and 6.5 days. Figure 20. Kernel Density Estimate of LOS for APRDRG Mortality Index

Figure 21 shows the resource demand level with respect to LOS. For the most part, the LOS for patients in group 1 is equal to 2 days, with much smaller probabilities of 1 and 3 days. Group 2 also peaks at 2 days LOS, with just a slightly higher probability of 4 or more days compared to group 1. These two curves strongly suggest shifting of patients from group 1 to group 2. Groups 3, 4, and 5 behave as expected for this estimate. Also, the positive probability ends at about day 10, compared to day 12 or day 20 for the previous measures.

Figure 21. Kernel Density Estimate of LOS for Disease Staging: Resource Demand Level

In contrast, as shown in Figure 22, the Disease Staging: Mortality Level has a much more regular pattern with 1<2<3<4<5. Figure 22. Kernel Density Estimate of LOS for Disease Staging: Mortality Level


CONCLUSION

It is possible to develop a model to rank the quality of care such that the model does not assume uniformity of data entry. The model can also be validated by examination of additional outcome values in the data. The means of developing the model is to use the stemming properties of the ICD9 codes, where the first three digits of the code represent the primary category while the remaining two digits represent a refinement of the diagnosis. The model compares well to those developed through the standard logistic regression technique, and the outcome variables in the data can be used to validate the defined cluster levels.
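The stemming property described above — the first three digits of an ICD9 code name the primary category, the remaining digits a refinement — is straightforward to exploit in code. A minimal sketch, using standard ICD9 values for illustration:

```python
def icd9_stem(code):
    """Return the primary-category stem of an ICD9 code (its first three characters)."""
    return code[:3]

# 4019 and 4011 both stem to 401 (essential hypertension);
# 25000 stems to 250 (diabetes mellitus)
stems = {icd9_stem(c) for c in ["4019", "4011", "25000"]}
print(sorted(stems))  # -> ['250', '401']
```

Grouping codes by stem is what lets thousands of distinct diagnosis codes collapse into a manageable set of categories before clustering.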

REFERENCES

1. Anonymous. AHRQ Quality Indicators: Patient Safety Indicators (PSI) Composite Measure. AHRQ; 2006. 34 pages.
2. Anonymous. STS evidence based guidelines: cardiac surgery risk models; 2004.
3. Coulter SL, Cecil WT. Risk adjustment for in-patient mortality: coronary artery bypass surgery. Nashville, TN: Tennessee Department of Health; February 2004.
4. Gilligan R, Gilanelli D, Hughes R, et al. Coronary artery bypass graft surgery in New Jersey, 1994-1995. New Jersey: Cardiovascular Health Advisory Panel; November 1997.
5. Krakauer H, Bailey RC, Skellan KJ, et al. Evaluation of the HCFA model for the analysis of mortality following hospitalization. Health Services Research. 1992;27(3):317-335.
6. Iezzoni LI, Ash AS, Shwartz M, Daley J, Hughes JS, Mackleman YD. Predicting who dies depends on how severity is measured: implications for evaluating patient outcomes. Annals of Internal Medicine. 1995;123(10):763-770.
7. Austin PC, Alter DA, Tu JV. The use of fixed- and random-effects models for classifying hospitals as mortality outliers: a Monte Carlo assessment. Medical Decision Making. 2003;23:526-539.
8. Poses RM, McClish DK, Smith WR, et al. Results of report cards for patients with congestive heart failure depend on the method used to adjust for severity. Annals of Internal Medicine. 2000;133:10-20.
9. Thomas JW. Research evidence on the validity of adjusted mortality rate as a measure of hospital quality of care. Medical Care Research and Review. 1998;55(4):371-404.
10. Flanders DW, Tucker G, Krishnadasan A, Honig E, McClellan WM. Validation of the pneumonia severity index: importance of study-specific recalibration. Journal of General Internal Medicine. 1999;14(6):333-340.
11. Smith EP, Lipkovich I, Ye K. Weight of Evidence (WOE): quantitative estimation of probability of impact. Blacksburg, VA: Virginia Tech, Department of Statistics; 2002.
12. Anonymous. Advanced Predictive Modeling Using SAS Enterprise Miner 5.1 Course Notes. Cary, NC: SAS Education; 200.

ACKNOWLEDGMENTS

I appreciate the valuable input from my co-investigator and domain expert, John C. Cerrito, PharmD. The work in this paper was supported by NIH grant #1R15RR017285-01A1, Data Mining to Enhance Medical Research of Clinical Data.

CONTACT INFORMATION

Patricia B. Cerrito
Department of Mathematics
University of Louisville
Louisville, KY 40292
Work Phone: 502-852-6010
Fax: 502-852-7132
Email: [email protected]

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies.
