Modeling and Analytics in a UBI Setting CAS Ratemaking and Product Management (RPM) Seminar March 16, 2015 Orlando, FL

Page 1: Modeling and Analytics in a UBI Setting

Modeling and Analytics in a UBI Setting

CAS Ratemaking and Product Management (RPM) Seminar
March 16, 2015 ● Orlando, FL

Page 2: Modeling and Analytics in a UBI Setting

Coming soon

Volume II: Case Studies in Insurance
Chapter 11: Predictive Modeling for Usage-Based Insurance (Makov, Weiss)

Page 3: Modeling and Analytics in a UBI Setting

Is UBI data ‘big’?

• Several observations per second
• Large number of collected metrics
• Continuously refreshing dataset
• Relationships to other dynamic databases

Page 4: Modeling and Analytics in a UBI Setting

Potential data sources

Page 5: Modeling and Analytics in a UBI Setting

Little big data

Sample telematics data point:

Data Element                               Data Value
Vehicle Identification Number              1234567890ABCDEFG
Date                                       11/01/2015
Time (UTC)                                 02:06:34
Global Positioning System Latitude         41.773312800000000000
Global Positioning System Longitude        -87.715253500000020000
Avg. Speed (MPH, since prior obs.)         21.30
Accelerometer Axis X Readings (g-force)    { 0.00, 0.11, 0.21, 0.11, 0.05 }
Accelerometer Axis Y Readings (g-force)    { -0.21, -0.23, -0.26, -0.14, -0.01 }
Accelerometer Axis Z Readings (g-force)    { -1.00, -1.00, -0.99, -0.98, -0.99 }
Odometer (miles)                           18,246

Example is purely illustrative and not intended to suggest, for instance, that 5 Hz is optimal accelerometer sample rate.
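
One way to hold a record like the sample data point in code is a small data class. The sketch below is illustrative only: the class name `TelematicsPoint`, its field names and the `accel_magnitudes` helper are assumptions and do not come from the presentation. The helper computes the total g-force per accelerometer sample, a quantity that does not depend on which axis is treated as forward.

```python
from dataclasses import dataclass
from math import sqrt
from typing import List


@dataclass
class TelematicsPoint:
    """One telematics observation (hypothetical field names)."""
    vin: str
    date: str             # kept as delivered, e.g. "11/01/2015"
    time_utc: str         # HH:MM:SS
    latitude: float
    longitude: float
    avg_speed_mph: float  # average since the prior observation
    accel_x_g: List[float]
    accel_y_g: List[float]
    accel_z_g: List[float]
    odometer_miles: int

    def accel_magnitudes(self) -> List[float]:
        """Total g-force per sample, independent of axis orientation."""
        return [
            sqrt(x * x + y * y + z * z)
            for x, y, z in zip(self.accel_x_g, self.accel_y_g, self.accel_z_g)
        ]


# Values copied from the sample data point
point = TelematicsPoint(
    vin="1234567890ABCDEFG",
    date="11/01/2015",
    time_utc="02:06:34",
    latitude=41.7733128,
    longitude=-87.71525350000002,
    avg_speed_mph=21.30,
    accel_x_g=[0.00, 0.11, 0.21, 0.11, 0.05],
    accel_y_g=[-0.21, -0.23, -0.26, -0.14, -0.01],
    accel_z_g=[-1.00, -1.00, -0.99, -0.98, -0.99],
    odometer_miles=18246,
)

magnitudes = point.accel_magnitudes()
print([round(m, 3) for m in magnitudes])
```

Every magnitude stays close to 1 g because gravity dominates the Z axis, which is one reason orientation and calibration matter before any event is flagged.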

Page 6: Modeling and Analytics in a UBI Setting

Technology considerations

• Orientation – how do we determine what’s forward, backward, up and down?
• Calibration – how do we avoid hills, etc. registering as backward g-force?
• Device movements – how do we stop jostling the hardware from registering events?
• Event identification – how do we ensure different technologies record events in the same manner?

Page 7: Modeling and Analytics in a UBI Setting

Give me a brake

If we define ‘harsh braking’ as 0.4 (backward) g-force exceedance, then these three braking events would be treated similarly
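
The point can be illustrated with a minimal sketch. The three traces below are hypothetical (they are not the events pictured on the slide), and the function name `summarize_braking` is an assumption. Each trace crosses the 0.4 g threshold exactly once, so a simple exceedance count treats all three alike even though their peaks and durations differ.

```python
from typing import Dict, List

HARSH_BRAKE_THRESHOLD_G = 0.4  # threshold quoted on the slide


def summarize_braking(backward_g: List[float],
                      threshold: float = HARSH_BRAKE_THRESHOLD_G) -> Dict[str, float]:
    """Count threshold crossings and describe how severe the braking was."""
    events = 0
    in_event = False
    for g in backward_g:
        if g > threshold and not in_event:
            events += 1              # a new exceedance begins
        in_event = g > threshold
    above = [g for g in backward_g if g > threshold]
    return {
        "events": events,
        "peak_g": max(above) if above else 0.0,
        "samples_above": len(above),
    }


# Hypothetical backward g-force traces, one reading per sample
traces = {
    "brief spike": [0.10, 0.45, 0.10],
    "sustained, moderate": [0.10, 0.41, 0.42, 0.43, 0.42, 0.41, 0.10],
    "severe": [0.20, 0.60, 0.90, 0.80, 0.50, 0.20],
}

summaries = {name: summarize_braking(trace) for name, trace in traces.items()}
for name, summary in summaries.items():
    print(name, summary)
```

Keeping peak and duration alongside the count is one possible way to preserve the differences that a single threshold hides.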

Page 8: Modeling and Analytics in a UBI Setting

Modeling challenges

• Complexity – how do we transform 0s and 1s into something more predictive for insurance?
• Depth – 100,000s of rows per risk … how do we compress with minimal loss of predictive power?
• Dimensionality – what do we do when the number of columns is in tens of thousands?
• Overlap – some classes are riskier … how do we avoid double-counting known effects?

Page 9: Modeling and Analytics in a UBI Setting

Examples of context

• Time of day (telematics data feed)
• Speed vehicle traveling (telematics data feed)
• Visibility and traction (weather database)
• Number of lanes (road atlas database)
• Speed vehicle supposed to be traveling (traffic database)

Page 10: Modeling and Analytics in a UBI Setting

How to create ~10,000 variables in seconds

• As a first step, determine true vs. false
  – Exceedance of thresholds (…0.39, 0.4, 0.41…)
  – Presence of various condition sets e.g.
    • Morning rush hour?
    • Four lane road?
    • Visibility < 1 mile?
    • Traveling between 46-50 MPH?
• Sum exceedance counts and exposure over every possible condition set
• Aggregate exposure/counts to vehicle level
• Determine incidence rates of exceedance (per exposure unit) for each condition set
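
The steps above can be sketched in a few lines. Everything specific here is an assumption made for illustration: the four condition flags, the use of miles as the exposure unit, the three thresholds and the toy observations. With 4 flags there are 15 non-empty condition sets, and 3 thresholds give 45 candidate variables per vehicle; finer threshold grids and more conditions multiply quickly.

```python
from collections import defaultdict
from itertools import combinations

THRESHOLDS_G = [0.39, 0.40, 0.41]
CONDITIONS = ["rush_hour", "four_lane", "low_visibility", "speed_46_50"]


def obs(vehicle, backward_g, miles, *flags):
    """Build one observation; flags follow the order of CONDITIONS."""
    record = {"vehicle": vehicle, "backward_g": backward_g, "miles": miles}
    record.update(zip(CONDITIONS, flags))
    return record


# Hypothetical observations
observations = [
    obs("A", 0.450, 0.5, True, True, False, True),
    obs("A", 0.100, 1.5, True, True, False, False),
    obs("A", 0.405, 1.0, False, True, True, False),
    obs("B", 0.300, 2.0, True, False, False, True),
]

# Every non-empty combination of condition flags is one condition set
condition_sets = [
    combo
    for size in range(1, len(CONDITIONS) + 1)
    for combo in combinations(CONDITIONS, size)
]

counts = defaultdict(int)      # (vehicle, condition set, threshold) -> exceedances
exposure = defaultdict(float)  # (vehicle, condition set) -> miles

for o in observations:
    for cs in condition_sets:
        if not all(o[c] for c in cs):
            continue           # observation is outside this condition set
        exposure[(o["vehicle"], cs)] += o["miles"]
        for t in THRESHOLDS_G:
            if o["backward_g"] > t:
                counts[(o["vehicle"], cs, t)] += 1

# Incidence rate of exceedance per mile, by vehicle, condition set and threshold
incidence = {
    (vehicle, cs, t): counts[(vehicle, cs, t)] / miles
    for (vehicle, cs), miles in exposure.items()
    for t in THRESHOLDS_G
}

print(len(condition_sets) * len(THRESHOLDS_G), "candidate variables per vehicle")
print(incidence[("A", ("rush_hour",), 0.40)])
```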

Page 11: Modeling and Analytics in a UBI Setting

Variable selection approach

• Group variables thematically
• Software (HPGENSELECT, ‘step’)
• Staged stepwise --> 535 candidates
• Final stepwise --> 57 variables

Page 12: Modeling and Analytics in a UBI Setting

Poisson model

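$$
E(\text{claims}) = \exp\left\{ \theta + \sum_{j=1}^{J} \alpha_{j}\,\mathrm{DDV1}_{j} + \sum_{k=1}^{K} \beta_{k}\,\mathrm{DDV2}_{k} + \sum_{m=1}^{M} \vartheta_{m}\,\mathrm{DDV3}_{m} + \sum_{n=1}^{N} \varphi_{n}\,\mathrm{DDV4}_{n} \right\}
$$

where:

• $\theta$: intercept term
• DDV1 to DDV4: thematic groupings (families) of DDVs, e.g. braking, cornering, etc.
• $j, k, m, n$: indices of the individual DDVs within each family, i.e. incidence rates for condition sets
• $J, K, M, N$: number of significant variables in each family
• $\mathrm{DDV1}_{j}, \mathrm{DDV2}_{k}, \mathrm{DDV3}_{m}, \mathrm{DDV4}_{n}$: normalized incidence rate of the indexed variable in its DDV family
• $\alpha_{j}, \beta_{k}, \vartheta_{m}, \varphi_{n}$: coefficient applicable to the corresponding DDV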
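
As a minimal sketch of how the model form above turns DDVs into an expected claim count, the snippet below evaluates the exponential of the linear predictor for one vehicle. The intercept, coefficients and DDV values are made up for illustration, and the function name `expected_claims` is an assumption; no fitted values from the presentation are used.

```python
from math import exp
from typing import Dict, List


def expected_claims(intercept: float,
                    coefficients: Dict[str, List[float]],
                    ddvs: Dict[str, List[float]]) -> float:
    """E(claims) = exp(intercept + sum of coefficient * DDV over all families)."""
    linear = intercept
    for family, coefs in coefficients.items():
        values = ddvs[family]
        if len(values) != len(coefs):
            raise ValueError(f"{family}: expected {len(coefs)} DDVs, got {len(values)}")
        linear += sum(c * x for c, x in zip(coefs, values))
    return exp(linear)


# Hypothetical fitted parameters, one list of coefficients per DDV family
theta = -3.0
coefficients = {
    "DDV1": [0.20, 0.05],
    "DDV2": [0.10],
    "DDV3": [0.15],
    "DDV4": [0.30, -0.05],
}

# Hypothetical normalized incidence rates for one vehicle
vehicle = {
    "DDV1": [1.2, -0.4],
    "DDV2": [0.0],
    "DDV3": [2.0],
    "DDV4": [0.5, 1.0],
}

# A vehicle with every normalized DDV equal to zero
average_vehicle = {family: [0.0] * len(c) for family, c in coefficients.items()}

prediction = expected_claims(theta, coefficients, vehicle)
baseline = expected_claims(theta, coefficients, average_vehicle)
print(round(prediction, 4), round(baseline, 4), round(prediction / baseline, 3))
```

Because the DDVs are normalized, a vehicle with all DDVs at zero gets `exp(theta)`, and every other vehicle is a multiplicative relativity to that baseline.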

Page 13: Modeling and Analytics in a UBI Setting

Overlap possibilities

• Driver classification
• Accidents and violations
• Credit history
• Territory
• Annual mileage

Page 14: Modeling and Analytics in a UBI Setting

Approaches to overlap

Possibilities include:

• Assume independence of UBI v. trad’l
• Use existing variables as offsets/control
• Separate models by class
• Holistic model / machine learning
• UBI only approach
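
The offset option can be illustrated with a toy case. For a Poisson model with a log link, an offset and a single categorical predictor, the maximum-likelihood frequency for each group is total claims divided by total offset, so no fitting library is needed. The data, the group names and the helper `group_frequencies` are hypothetical; the offset is assumed to be exposure times the existing traditional relativity.

```python
from collections import defaultdict

# (UBI group, vehicle-years, traditional relativity, observed claims), all hypothetical
policies = [
    ("low_braking", 1.0, 0.8, 0),
    ("low_braking", 1.0, 1.5, 1),
    ("low_braking", 0.5, 1.0, 0),
    ("high_braking", 1.0, 0.8, 1),
    ("high_braking", 1.0, 1.5, 2),
    ("high_braking", 0.5, 1.4, 1),
]


def group_frequencies(rows, use_offset):
    """Poisson MLE frequency per group: claims / sum of offsets."""
    claims = defaultdict(float)
    base = defaultdict(float)
    for group, exposure, trad_relativity, n_claims in rows:
        claims[group] += n_claims
        # With the offset, exposure is scaled by what traditional rating already explains
        base[group] += exposure * (trad_relativity if use_offset else 1.0)
    return {group: claims[group] / base[group] for group in base}


raw = group_frequencies(policies, use_offset=False)
net = group_frequencies(policies, use_offset=True)

raw_relativity = raw["high_braking"] / raw["low_braking"]
net_relativity = net["high_braking"] / net["low_braking"]
print(round(raw_relativity, 3), round(net_relativity, 3))
```

In this toy data the high-braking group also carries higher traditional relativities, so the UBI relativity shrinks once the offset is applied; that shrinkage is the double-counting the offset is meant to remove.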

Page 15: Modeling and Analytics in a UBI Setting

Loss ratio ‘lift chart’

Analysis performed using the same vehicles used to train the model, but a separate period of 90 driving days to produce estimates. The chart suggests that, ‘all other things being equal,’ the model identifies one in five vehicles that are >10x as risky.
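
A lift chart of this kind can be built roughly as follows. This is a sketch under assumptions, since the slide does not spell out the construction: vehicles are ranked by model score, split into five equal-count groups, and each group's loss ratio is divided by the overall loss ratio. The records and the function name `loss_ratio_indices` are hypothetical.

```python
from typing import List, Tuple


def loss_ratio_indices(records: List[Tuple[float, float, float]],
                       n_groups: int = 5) -> List[float]:
    """records: (model score, premium, losses). Returns loss ratio index per group,
    ordered from lowest-scored to highest-scored group."""
    ranked = sorted(records, key=lambda r: r[0])
    overall = sum(r[2] for r in ranked) / sum(r[1] for r in ranked)
    n = len(ranked)
    indices = []
    for g in range(n_groups):
        group = ranked[g * n // n_groups:(g + 1) * n // n_groups]
        loss_ratio = sum(r[2] for r in group) / sum(r[1] for r in group)
        indices.append(loss_ratio / overall)
    return indices


# Hypothetical holdout results: (score, premium, losses)
losses = [0, 0, 0, 200, 0, 300, 500, 500, 1500, 2000]
records = [(0.01 * (i + 1), 1000.0, float(loss)) for i, loss in enumerate(losses)]

quintile_indices = loss_ratio_indices(records)
print([round(x, 2) for x in quintile_indices])
```

A model with useful lift shows indices that rise from the first group to the last; a flat profile would mean the score does not separate risks.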

Page 16: Modeling and Analytics in a UBI Setting

ROC and ‘area under curve’

• Thresholds for binary classification
• True vs. false positives and negatives
• FPR = FP ÷ (FP + TN)
• TPR = TP ÷ (TP + FN)
• AUC = 0.62

Graph and AUC relate to target variable of claims from holdout sample.
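
The FPR and TPR definitions above translate directly into code. The sketch below sweeps every distinct score as a classification threshold and integrates the resulting curve with the trapezoidal rule. The scores and claim indicators are hypothetical, and the function names are assumptions; it assumes at least one positive and one negative label.

```python
from typing import List, Tuple


def roc_points(scores: List[float], labels: List[int]) -> List[Tuple[float, float]]:
    """Return (FPR, TPR) pairs, from the strictest threshold to the loosest."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        # FPR = FP / (FP + TN), TPR = TP / (TP + FN)
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points: List[Tuple[float, float]]) -> float:
    """Trapezoidal area under the ROC curve."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Hypothetical model scores and claim indicators (1 = had a claim)
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
labels = [1, 0, 1, 0, 0, 1, 0, 0]

curve = roc_points(scores, labels)
auc = area_under_curve(curve)
print(round(auc, 4))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives some context for the 0.62 reported for the holdout sample.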

Page 17: Modeling and Analytics in a UBI Setting

Alternative models

We tested whether the following alternative approaches to variable selection and modeling produced more parsimonious results:

Variable Selection          Model Form
Stepwise*                   Poisson Regression*
Tree                        Tree
Tree                        Poisson Regression
Tree then Stepwise          Poisson Regression
Stepwise, Tree, Stepwise    Tree

* Previously outlined approach

Page 18: Modeling and Analytics in a UBI Setting

Data properties that may warrant using decision trees

• Dataset described by fixed set of attributes
• Target function has discrete set of values
• ‘Disjunctive descriptions’ potentially required
• Noisy training data (sparse or variant)

Page 19: Modeling and Analytics in a UBI Setting

Classification trees for UBI

[Figure: classification tree with splits on DDV_4_20, DDV_6_11, DDV_4_18, DDV_4_17, DDV_3_3, DDV_3_24, DDV_1_25, DDV_1_14 and DDV_6_4; split thresholds range from 2.539 to 22.16]

Page 20: Modeling and Analytics in a UBI Setting

Revisiting variable selection

[Figure: bar chart of Area Under Curve for Different Approaches, vertical axis 56% to 68%, labelled with the number of DDVs used by each approach]

Approach                       Number of DDVs
Stepwise --> GLM               57
Tree                           23
Tree --> GLM                   11
Tree --> SW --> GLM            8
SW --> Tree --> SW --> Tree    24

Page 21: Modeling and Analytics in a UBI Setting

Combining trees, GLMs may yield strongest UBI results

[Figure: bar chart of Tertile Loss Ratio Indices for Different Approaches, vertical axis 0 to 2.5]

Approach                       Number of DDVs    Tertile loss ratio index
Stepwise --> GLM               57                3.35x
Tree                           23                2.40x
Tree --> GLM                   11                5.55x
Tree --> SW --> GLM            8                 1.96x
SW --> Tree --> SW --> Tree    24                7.09x

Page 22: Modeling and Analytics in a UBI Setting

Back with the old

[Figure: bar chart of Tertile Loss Ratio Indices for Different Approaches, vertical axis 0 to 3]

Approach        Tertile loss ratio index
UBI only        3.35x
Trad'l only     1.22x
UBI + Trad'l    6.29x

Poor performance of traditional predictors may result in large part from relatively small data volumes.

Page 23: Modeling and Analytics in a UBI Setting

Regulatory considerations

• Familiarity of approach
• Discounts vs. surcharges
• Confidentiality
• Observation period
• Support and policyholder challenges
• Privacy

Page 24: Modeling and Analytics in a UBI Setting

Observation period

Considerations:

• Stability
• Predictive power
• Technology deployment
• Renewal management
• Behavioral modification

Page 25: Modeling and Analytics in a UBI Setting

Areas for future exploration

• Pay per mile, trip, etc.
• Evolving data collection options
• Commercial lines / heavy trucks
• Changing ownership patterns
• Autonomous vehicles
