FATAL OR INJURY- A CASE OF DECIDING ON PRIORITIZING RESPONDER RESOURCES

By Piyush Lohana

Data Mining Project-Predicting Injury or Fatality in case of an accident



In 2007, motor vehicles were involved in the largest share of accidents.


WHY THIS PROJECT

• “Every 12 minutes someone dies in a car crash in the United States due to a car accident or a collision between two motor vehicles.” (NCIPC)

• Many accidents are fatal or involve serious injuries, and by the time help arrives at the crash site, much of the damage has already been done.

• We attempt to build a model that predicts the severity of an accident (i.e., whether it is fatal or results in injury) from predictors such as rush hour versus non-rush hour, work zone, weather conditions, speed limit, and whether the road is an interstate.

• This helps to prioritize situations and allocate resources where there is a high likelihood that an accident will result in fatalities or serious injury.

• This will enable emergency care providers to focus on the measures and resources needed when they arrive at the scene. The accuracy of pre-hospital crash-scene details and crash-victim assessment has important implications for the care that can be provided at the scene.


WHAT ARE WE CONSIDERING

• We will be looking at the characteristics of the environment in which the accident occurred (weather, road condition, type of road, time of day, the day of the week, and month of the year) and the characteristics of the crash (direction of accident, speed limit on the road, work zone area, and how many vehicles were involved).

• All of these variables can affect the type of accident that occurs (no injury, injury, or fatal). This can further help the medical team come prepared for the actions that need to be taken at the scene.


DATA SOURCE

• http://www.transtats.bts.gov/DL_SelectFields.asp?Table_ID=1158

• The dataset has 24 attributes and 42,183 records.

• We identified the predictor and outcome variables.


CLEAR DESCRIPTION OF DATA SET

Sl. No | Variable | Description
1 | HOUR_I_R | 1 = rush hour, 0 = not (rush = 6-9 am, 4-7 pm)
2 | ALIGN_I | 1 = straight, 2 = curve
3 | STRATUM_R | 1 = NASS crashes involving at least one passenger vehicle towed from the crash scene due to damage, with no medium or heavy trucks involved; 0 = not
4 | WRK_ZONE | 1 = yes, 0 = no
5 | WKDY_I_R | 1 = weekday, 0 = weekend
6 | INT_HWY | Interstate? 1 = yes, 0 = no
7 | LGTCON_I_R | Light conditions: 1 = day, 2 = dark (including dawn/dusk), 3 = dark but lighted, 4 = dawn or dusk
8 | MAN_COL_I | 0 = no collision, 1 = head-on, 2 = other form of collision
9 | PED_ACC_R | 1 = pedestrian/cyclist involved, 0 = not
10 | REL_JCT_I_R | 1 = accident at intersection/interchange, 0 = not at intersection


CLEAR DESCRIPTION OF DATA SET (CONTINUED)

Sl. No | Variable | Description
11 | SPD_LIM | Speed limit, miles per hour
12 | SUR_CON | Surface conditions (1 = dry, 2 = wet, 3 = snow/slush, 4 = ice, 5 = sand/dirt/oil, 8 = other, 9 = unknown)
13 | TRAF_WAY | 1 = two-way traffic, 2 = divided hwy, 3 = one-way road
14 | VEH_INVL | Number of vehicles involved
15 | WEATHER_R | 1 = no adverse conditions, 2 = rain, snow or other adverse condition
16 | INJURY_CRASH | 1 = yes, 0 = no
17 | NO_INJ_I | Number of injuries
18 | FATALITIES | 1 = yes, 0 = no
19 | MAX_SEV_IR | 0 = no injury, 1 = non-fatal inj., 2 = fatal inj.


FILTERING DATA

• The filtering method used is "Standard Deviations from the Mean".

• It eliminates observations whose values lie more than three standard deviations from the mean of the variable.
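The three-standard-deviation rule can be sketched in a few lines of Python. This is a minimal illustration, not the data mining tool's own implementation: it assumes each record is a dict of numeric fields, uses the population standard deviation, and drops a record if any listed column is an outlier. The function name `filter_by_std` is hypothetical.

```python
from statistics import mean, pstdev

def filter_by_std(rows, columns, k=3.0):
    """Drop rows where any listed numeric column lies more than k standard deviations from its mean."""
    # Mean and population standard deviation of each column, computed once over all rows.
    stats = {}
    for col in columns:
        values = [row[col] for row in rows]
        stats[col] = (mean(values), pstdev(values))

    def within_limits(row):
        # A zero standard deviation means every value equals the mean, so nothing is an outlier.
        return all(sd == 0 or abs(row[col] - mu) <= k * sd
                   for col, (mu, sd) in stats.items())

    return [row for row in rows if within_limits(row)]

# Toy data: twenty two-vehicle crashes and one extreme 20-vehicle record.
rows = [{"VEH_INVL": 2} for _ in range(20)] + [{"VEH_INVL": 20}]
kept = filter_by_std(rows, ["VEH_INVL"])
print(len(kept))  # 20
```

Note that the mean and standard deviation are themselves pulled toward extreme values, so with very few records a single outlier may never reach three standard deviations.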


DATA PARTITIONING

• We build the models on the training data.

• We use the validation data to compare the models and guard against overfitting.

• We assess the final model's performance on the test data.
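A three-way random partition of this kind can be sketched as below. The 50/30/20 fractions, the seed, and the function name `partition` are hypothetical choices for illustration; the deck does not state the proportions that were used.

```python
import random

def partition(rows, fractions=(0.5, 0.3, 0.2), seed=12345):
    """Randomly split rows into training, validation and test sets.

    The default 50/30/20 fractions and the seed are hypothetical; the deck does not state them.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # a fixed seed makes the split reproducible
    n_train = int(len(shuffled) * fractions[0])
    n_valid = int(len(shuffled) * fractions[1])
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]  # the remainder goes to the test set
    return train, valid, test

train, valid, test = partition(list(range(100)))
print(len(train), len(valid), len(test))  # 50 30 20
```

Because fatal accidents are rare in this data, a stratified split (partitioning each MAX_SEV_IR class separately) may be worth considering so that every partition contains some fatal cases.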


PREDICT, CLASSIFY OR CLUSTER ?

As we are trying to predict the categorical class label MAX_SEV_IR, our analysis is a supervised classification task.

Our model aims to discover relationships between the predictors and the outcome variable that make it possible to predict the outcome.


MODEL

The following three models are used in our analysis:

• Memory-Based Reasoning (MBR)

• Decision Trees

• Logistic Regression
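Memory-based reasoning is commonly implemented as k-nearest-neighbour classification: a new accident is assigned the majority class of the k most similar past accidents. The sketch below assumes Euclidean distance on raw numeric predictors and k = 3; the function name `knn_predict` and the toy records and labels are hypothetical, not taken from the project data.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Predict the majority class among the k training records closest to `query` (Euclidean distance)."""
    by_distance = sorted(
        ((math.dist(x, query), y) for x, y in zip(train_X, train_y)),
        key=lambda pair: pair[0],
    )
    nearest_labels = [y for _, y in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Toy records of (SPD_LIM, VEH_INVL) with made-up labels (0 = no injury, 1 = injury).
train_X = [(25, 1), (30, 2), (35, 1), (65, 2), (70, 3), (65, 1)]
train_y = [0, 0, 0, 1, 1, 1]
print(knn_predict(train_X, train_y, (68, 2)))  # 1
print(knn_predict(train_X, train_y, (28, 1)))  # 0
```

In practice the predictors would usually be standardized first; otherwise a wide-range variable such as SPD_LIM dominates the distance.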


FINAL MODEL


RESULTS AND DISCUSSION


BASELINE MISCLASSIFICATION

• MAX_SEV_IR: 0 = no injury, 1 = non-fatal injury, 2 = fatal injury

• Class 0 (No injury): 4949

• Class 1 (Non-fatal injury): 4900

• Class 2 (Fatal injury): 150

• The majority class is 0 (No injury).

• The majority class makes up 49.49% of the dataset (4949/9999).

• The baseline misclassification rate is therefore 50.51% (1 - 4949/9999): a naive model that always predicts "No injury" is wrong about half the time.

• Any model we build is only useful if its misclassification rate is lower than this baseline.
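The baseline arithmetic can be checked directly from the class counts above. The helper name `baseline_misclassification` is hypothetical; it simply measures how often the "always predict the majority class" rule would be wrong.

```python
def baseline_misclassification(class_counts):
    """Return the majority class and the error rate of always predicting it."""
    total = sum(class_counts.values())
    majority_class = max(class_counts, key=class_counts.get)
    return majority_class, 1 - class_counts[majority_class] / total

# Class counts of MAX_SEV_IR reported above.
counts = {0: 4949, 1: 4900, 2: 150}
cls, rate = baseline_misclassification(counts)
print(cls, round(rate * 100, 2))  # 0 50.51
```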


OUR DEFINITION OF BEST MODEL AS PER BUSINESS REQUIREMENT

• Decision tree: a supervised, data-driven method for classification.

• It separates observations into increasingly homogeneous subgroups by creating splits on the predictors.

• As per our business requirement, this model is the best at classifying accidents into the three severity classes in order to prioritize resources.
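The deck does not say which splitting criterion the tree used. As one common choice, the sketch below scores a candidate split by Gini impurity: a split is good when the weighted impurity of the child nodes is low, i.e. the subgroups are more homogeneous. The function names and the toy rows and labels are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions (0 = perfectly homogeneous)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def split_impurity(rows, labels, predicate):
    """Weighted Gini impurity of the two child nodes produced by a yes/no split."""
    left = [y for x, y in zip(rows, labels) if predicate(x)]
    right = [y for x, y in zip(rows, labels) if not predicate(x)]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

# Toy example: splitting on PED_ACC_R separates the made-up labels perfectly.
rows = [{"PED_ACC_R": 1}, {"PED_ACC_R": 1}, {"PED_ACC_R": 0}, {"PED_ACC_R": 0}]
labels = [1, 1, 0, 0]
print(gini(labels))  # 0.5
print(split_impurity(rows, labels, lambda r: r["PED_ACC_R"] == 1))  # 0.0
```

A tree-growing algorithm repeats this comparison over every candidate predictor and cut point and keeps the split with the lowest child impurity.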


RESULTS

Misclassification rate (_MISC_):

• Training: 0.40945

• Validation: 0.4113

• Test: 0.42305
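_MISC_ is the share of records whose predicted class differs from the actual class. The sketch below defines that rate and compares the reported values with the 50.51% baseline; all three are below it. The helper name `misclassification_rate` is hypothetical, and the percentage reduction is simply (baseline - rate) / baseline.

```python
def misclassification_rate(actual, predicted):
    """Fraction of records whose predicted class differs from the actual class."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    errors = sum(a != p for a, p in zip(actual, predicted))
    return errors / len(actual)

BASELINE = 0.5051  # baseline misclassification rate from the class counts of MAX_SEV_IR
reported = {"Training": 0.40945, "Validation": 0.4113, "Test": 0.42305}
for name, rate in reported.items():
    reduction = (BASELINE - rate) / BASELINE
    print(f"{name}: {rate:.4f}, {reduction:.1%} below baseline")
```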


NODE RULES


INTERPRETATION AND IMPLEMENTATION

• Based on these rules, an application or website can be created that, given the five most important factors (predictors), estimates the probability that an accident results in a fatality, an injury, or no injury.

• The emergency service provider can then decide which response team to send to the accident site.
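The core of such an application is a lookup: find the leaf (node rule) whose conditions the reported accident satisfies and return that leaf's class proportions. The sketch below assumes each node rule is stored as a condition plus probabilities. The rules, thresholds, and probabilities shown are HYPOTHETICAL placeholders, not the node rules found in this project, and `NodeRule` and `predict_severity` are illustrative names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Accident = Dict[str, int]

@dataclass
class NodeRule:
    """One leaf of the tree: a condition on the inputs and the class proportions observed in that leaf."""
    condition: Callable[[Accident], bool]
    probabilities: Dict[str, float]

def predict_severity(rules: List[NodeRule], accident: Accident) -> Dict[str, float]:
    """Return the class probabilities of the first leaf whose rule the reported accident satisfies."""
    for rule in rules:
        if rule.condition(accident):
            return rule.probabilities
    raise ValueError("no node rule matches this accident")

# HYPOTHETICAL rules and probabilities for illustration only; not the node rules found in this project.
RULES = [
    NodeRule(lambda a: a["PED_ACC_R"] == 1 and a["SPD_LIM"] >= 45,
             {"Injury": 0.80, "Fatality": 0.10, "No injury": 0.10}),
    NodeRule(lambda a: True,  # catch-all leaf for every other accident
             {"Injury": 0.45, "Fatality": 0.01, "No injury": 0.54}),
]

report = {"PED_ACC_R": 1, "SPD_LIM": 55}
probs = predict_severity(RULES, report)
print(max(probs, key=probs.get))  # Injury
```

Because the leaves of a decision tree are mutually exclusive and together cover every case, a real rule set exported from the tree would match exactly one leaf per accident, and no catch-all would be needed.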


BLUEPRINT OF IMPLEMENTATION


OUTCOME

• Depending on the node rule that the entered values satisfy, the application predicts the outcome:

• Red Cross predicts an 80% chance of injury

• Red Cross predicts a 10% chance of fatality

• Red Cross predicts a 10% chance of no injury


SCOPE FOR IMPROVEMENT

• To build a more focused and rigorous model, we are working on identifying more predictors that can help determine the severity of an accident, and on a cleaner model with a lower misclassification rate.

• To achieve this, we intend to try a neural network data mining algorithm.


THANK YOU