Big Data and Insurance Industry - SOA

Session 3 – Big Data and Insurance Industry

Prof. Jack Ching‐Syang Yue, ASA

6/30/2015


2015/06/25


Big Data and Insurance Industry

Speaker: Jack C. Yue

National Chengchi Univ.

Date: June 25, 2015

Email: [email protected]


Summary

What is Big Data?

Attributes of Big Data

Limitations of Big Data

Big Data and Insurance

Future of Big Data


What is Big Data?

The term “Big Data” was widely promoted by IBM around 2010 (the term itself is older), and it involves three V’s:

Volume: data sizes of at least terabytes (TB) or petabytes (PB)

Variety: including visual and GIS data

Velocity: real-time (instant) analysis

Note: According to IDC (International Data Corporation), the capacity of data storage grows 50% annually.


The Growing Size of a Computer File!


Information Explosion!

Source: The Expanding Digital Universe: A Forecast of Worldwide Information Growth Through 2010, March 2007, an IDC white paper sponsored by EMC. http://www.emc.com/collateral/analyst-reports/expanding-digital-idc-white-paper.pdf

IDC predicted in 2007 that storage would grow sixfold, from 161 EB in 2006 to 988 EB in 2010 (predicted). Note: IDC is the International Data Corporation.


Source: The Diverse and Exploding Digital Universe: An Updated Forecast of Worldwide Information Growth Through 2011, March 2008, an IDC white paper sponsored by EMC. http://www.emc.com/collateral/analyst-reports/diverse-exploding-digital-universe.pdf

IDC updated the prediction (in 2008) to a roughly tenfold increase from 2006 to 2011:

2006: 161 EB

2007: 281 EB

2010: 988 EB (pred.)

2011: 1773 EB (pred.)
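As a quick check of the IDC figures above, the implied compound annual growth rate (CAGR) is (end / start)^(1 / years) - 1. The sketch below uses only the numbers on the slides; the function name `cagr` is our own choice.

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate implied by growing from `start` to `end` in `years` years."""
    return (end / start) ** (1 / years) - 1


# IDC figures quoted on the slides, in exabytes (EB)
forecasts = {
    "2007 forecast, 2006 -> 2010": (161, 988, 4),
    "2008 forecast, 2006 -> 2011": (161, 1773, 5),
}

for label, (start, end, years) in forecasts.items():
    multiple = end / start
    rate = cagr(start, end, years)
    print(f"{label}: {multiple:.1f}x overall, about {rate:.0%} per year")
```

Both forecasts imply growth of roughly 57% and 62% per year, somewhat above the 50% annual figure in the earlier IDC note.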

The Explosion Starting in 2007 …

The Era of Big Data! Data now accumulate much faster. The Human Genome Project took more than 10 years (1990-2004) to sequence 3 billion letters (base pairs); now the same task takes only one day.

More than 70% of the shares traded on the NYSE and NASDAQ in 2010 were generated by automatic trading systems (or statistical trading systems), with about 30,000 trades every second on the NYSE.

Facebook users upload more than 10 million pictures every hour and click “Like” more than 3 billion times every day (user data size: 300 PB).



http://www.abc.net.au/reslib/201204/r926508_9670000.jpg

Value of Information

In the late 1990s, Amazon hired more than 10 book critics to compile recommended reading lists. However, recommendations generated automatically from readers’ data sold much better (at least 1/3 of sales came from the automatic system).
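The slides do not describe how the automatic system works. One common approach is item-to-item co-occurrence: count how often two items appear in the same order, then recommend each item's most frequent partners. The sketch below assumes that approach and uses hypothetical order data; it is not Amazon's actual implementation.

```python
from collections import Counter
from itertools import permutations


def co_purchase_counts(orders):
    """Count how often each ordered pair of distinct items appears in the same order."""
    counts = Counter()
    for order in orders:
        for a, b in permutations(set(order), 2):
            counts[(a, b)] += 1
    return counts


def recommend(item, counts, k=2):
    """Return up to k items most often bought with `item`; ties are broken alphabetically."""
    partners = [(n, b) for (a, b), n in counts.items() if a == item]
    partners.sort(key=lambda pair: (-pair[0], pair[1]))
    return [b for _, b in partners[:k]]


# Hypothetical order histories (one list of book titles per order)
orders = [
    ["statistics", "probability", "r_programming"],
    ["statistics", "probability"],
    ["statistics", "actuarial_math"],
    ["probability", "actuarial_math"],
]

counts = co_purchase_counts(orders)
print(recommend("statistics", counts))  # ['probability', 'actuarial_math']
```

No critic is involved: the list comes entirely from what readers bought together, which is the point of the Amazon example.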

In 2004, Wal-Mart analyzed historical sales data and found that sales of flashlights and Pop-Tarts increase before a hurricane. (Note: “Diaper and beer” is another well-known example.)


About Wal-Mart

Wal-Mart was one of the first companies to collect and analyze customer data and use it to increase sales.

During the weekend, people who purchase diapers tend to purchase beer as well.

Question: Why are they purchased together, and how can this information be used to stimulate sales?

Adding Value to the Data

A famous example in data mining: $$$

Marketing and warehouse strategies

Customer 1: milk, eggs, sugar, bread

Customer 2: milk, eggs, cereal, bread

Customer 3: eggs, sugar
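Association in baskets like these is usually measured with support, confidence, and lift. The slides do not say which measures were used, so this is a sketch under that assumption, applied to the three customers above. `Fraction` keeps the results exact.

```python
from fractions import Fraction

# The three baskets from the slide
baskets = [
    {"milk", "eggs", "sugar", "bread"},   # Customer 1
    {"milk", "eggs", "cereal", "bread"},  # Customer 2
    {"eggs", "sugar"},                    # Customer 3
]


def support(itemset, baskets):
    """Share of baskets that contain every item in `itemset`."""
    return Fraction(sum(itemset <= basket for basket in baskets), len(baskets))


def confidence(lhs, rhs, baskets):
    """Estimated P(rhs in basket | lhs in basket)."""
    return support(lhs | rhs, baskets) / support(lhs, baskets)


def lift(lhs, rhs, baskets):
    """Confidence relative to the base rate of rhs; 1 means no association."""
    return confidence(lhs, rhs, baskets) / support(rhs, baskets)


for lhs, rhs in [({"milk"}, {"bread"}), ({"eggs"}, {"sugar"})]:
    print(sorted(lhs), "->", sorted(rhs),
          "support", support(lhs | rhs, baskets),
          "confidence", confidence(lhs, rhs, baskets),
          "lift", lift(lhs, rhs, baskets))
```

Milk -> bread has lift 3/2, so the two co-occur more often than chance would suggest. Eggs -> sugar has confidence 2/3 but lift 1, because every basket contains eggs. In other words, buying eggs tells us nothing about sugar.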


Value of “Diaper and Beer”

“Diaper and beer” is a good example of association rather than causality.

Association can create value from data without necessarily establishing causality. In the case of diapers and beer, we can plan:

Pricing and marketing;

Goods layout and floor design;

Warehouse and inventory.


In addition to size, big data also have the following attributes:

Sample = Population

Non-uniform Data with Noise

Association vs. Causality

http://mmdays.com/wp-content/uploads/2013/03/Bigdata.jpg


“Sample = Population” (n = all?)

We can use the whole population and do not need to infer from sample data. (Bias?)

Advantages: more up-to-date information (cf. the census); possible sampling bias is removed; insufficient sample size is avoided. But we need to consider the problems of storage, updating, and analysis.
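The simulation below compares a simple random sample with a self-selected sample drawn from the same population. This shows what is gained when sampling bias is removed. The population of claim amounts, the exponential distribution, and the response mechanism are all hypothetical assumptions chosen for illustration.

```python
import random
import statistics

rng = random.Random(2015)  # fixed seed so the run is reproducible

# Hypothetical population: annual claim amounts of 100,000 insureds (mean about 1,000)
population = [rng.expovariate(1 / 1000) for _ in range(100_000)]
population_mean = statistics.fmean(population)  # "n = all": no inference needed

# Simple random sample: unbiased, but subject to sampling error
srs = rng.sample(population, 500)

# Self-selected sample (assumption): the chance of responding grows with claim size
self_selected = [x for x in population if rng.random() < min(1.0, x / 5000)]

print(f"population mean:    {population_mean:8.1f}")
print(f"random sample mean: {statistics.fmean(srs):8.1f}")
print(f"self-selected mean: {statistics.fmean(self_selected):8.1f}")
```

The self-selected mean comes out at roughly twice the population mean, while the random-sample mean lands close to it. Using the full population avoids both problems, although the “(Bias?)” remark still applies: a database that looks like the whole population may itself cover only a selected group.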

http://www.smartercomputingblog.com/wp-content/uploads/2012/06/where-is-your-data.png

Sample Representative


Sampling Bias!!

Non-uniform Data with Noise

In addition to being voluminous and non-uniform, big data tend to contain “noise.”

Different sources and formats create data compatibility problems. (Meta-analysis!)

Inhomogeneity needs to be adjusted for. Volume is more important than quality.

Official CPI statistics vs. commodity prices collected from the web (deflation after the Lehman Brothers bankruptcy)

http://blog.backupify.com/2013/04/01/the-5-but-really-the-6-mistakes-made-in-big-data/00009-08_sept253-jpg/


Association vs. Causality

Quantitative analysis alone cannot establish causality, and latent variables may exist (e.g., foot size and spelling ability of grade-school students, both of which increase with age).
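The foot-size example can be simulated to show how a latent variable creates an association. In the hypothetical model below, age drives both foot length and spelling score, and the two are otherwise unrelated, each with its own independent noise. All coefficients are assumptions. The correlation is strong overall but disappears within each age group.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(1)
records = []  # (age, foot length in cm, spelling score), all hypothetical
for _ in range(3000):
    age = rng.randint(7, 12)                   # latent variable
    foot = 15 + 0.8 * age + rng.gauss(0, 0.8)  # grows with age, independent noise
    spelling = 10 * age + rng.gauss(0, 8)      # improves with age, independent noise
    records.append((age, foot, spelling))

overall = pearson([r[1] for r in records], [r[2] for r in records])
within_age = {
    a: pearson([r[1] for r in records if r[0] == a], [r[2] for r in records if r[0] == a])
    for a in range(7, 13)
}
print(f"overall correlation: {overall:.2f}")
for a, r in within_age.items():
    print(f"age {a}: {r:+.2f}")
```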

Complete data and powerful tools make it possible to find associations (precaution).

UPS started preventive maintenance in 2000.

New York City used manhole records to predict (defective) manhole explosions. (2010 Wired)

Photographer: David Paul Morris/Bloomberg

United Parcel Service (UPS) was one of the earliest adopters of business analytics in Big Data.


Limitations of Big Data

http://www.tnooz.com/2012/01/04/how-to/big-data-and-the-infinite-possibilities-for-the-travel-industry/

Statistics and Knowledge

Statistics (and quantitative tools) can be classified as induction, and they help us identify:

Regular

Extreme

Irregular


Induction

Deduction

Information or Junk Explosion

Information is everywhere, but what are the really important factors?

What information is essential?

Data quality? (Garbage in, garbage out!)

How is the information used for decisions? (e.g., avoiding high-risk decisions, reducing possible risk)


Data

Information

Fact

Knowledge

Trial and Error

The size of big data makes it difficult to create a standard operating procedure (SOP) for analysis, so the traditional trial-and-error approach can be used.

Data scientists at Kaggle found that the best bet is an orange used car. (Odd color vs. self-expression of the car owner?)

In predicting defective NYC manholes (using 2008 data to predict 2009), the correct rate was 44%. (Key factor: the year the cable was made.)


Data are usually collected via:

1. Experimental design

2. Observational study

The major difference is whether the collection process influences the outcomes. (e.g., observing the stock and housing markets usually does not distort the results.)

Note: We shall focus on “association.”

About Data Collection


In pricing insurance products, we always look for risk factors.

Age and gender are two well-known factors, and theories have been proposed to explain them.

Marriage is also a potential factor; possible reasons include selection at marriage, responsibility, living arrangements (and reciprocal caregiving), and social interaction.

Causality is not obvious!

Taiwan’s Age-specific Mortality Rates (2009-11)


Taiwan’s Marital Mortality Rates (Female): log(qx) by age group (15-19 to 90-94) for married, single, and divorced women; panels for 1994-96 and 2009-11.
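The slides plot log(qx) by five-year age group but do not say how qx was computed. A common actuarial approximation converts the central death rate m = deaths / exposure into an n-year death probability. It assumes deaths are spread uniformly over the interval: nqx = n·m / (1 + n·m / 2). The sketch below applies this formula to hypothetical counts, not Taiwan's actual data. It uses the natural log, which is also an assumption about the plotted scale.

```python
import math


def q_from_m(m: float, n: int = 5) -> float:
    """n-year death probability from central death rate m, assuming deaths are
    uniformly distributed over the interval (decedents live n/2 years on average)."""
    return n * m / (1 + n * m / 2)


# Hypothetical deaths and exposures (person-years) for one female age group
data = {
    "married": (120, 200_000),
    "single": (90, 60_000),
    "divorced": (40, 30_000),
}

q = {status: q_from_m(deaths / exposure) for status, (deaths, exposure) in data.items()}
for status, qx in q.items():
    print(f"{status:9s} q = {qx:.5f}  log(q) = {math.log(qx):.2f}")
print(f"single / married ratio: {q['single'] / q['married']:.2f}")
```

On a log scale, a constant ratio between marital groups appears as a constant vertical gap between the curves. This makes it easier to compare the 1994-96 and 2009-11 panels.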


Big Data and Insurance

http://h30507.www3.hp.com/t5/Journey-through-Enterprise-IT/Analyze-This-Big-Data-is-insurance-against-losing-a-competitive/ba-p/143577#.UgZmpLQVEqQ


Big Data and Insurance Industry

In 2009, Google analyzed keywords entered in its search engine (3 billion records daily). Compared with the CDC’s 2003-08 records, Google could detect the spread of H1N1 at least one week earlier. (User feedback!)

John Snow (1854) studied London’s water supply system and found that the spread of cholera was related to polluted water. (Spatial statistics)

New Territory of Insurance Industry?

Big Data is insurance against losing a competitive edge...

Insurance companies face more diverse risks in addition to interest rate risk. Longer lives and more information create new possibilities as well as new risks (e.g., longevity risk and moral hazard).

Note: We use health insurance products as a demonstration.


Insurance-Related Big Databases

In addition to the experience data from individual insurance companies, public data are also available:

Mortality studies: Human Mortality Database (HMD) and Ministry of the Interior (Taiwan).

Health Data: National Health Insurance (Taiwan), Society of Actuaries (SOA), and United States Renal Data System (USRDS).

Taiwan’s National Health Insurance

Taiwan started National Health Insurance (NHI) in 1995, and more than 99% of the population is covered (excluding those working overseas).

Copayment waiver is one of the important policies in NHI. In addition to veterans, pregnant women, and people in remote areas, Catastrophic Illness (CI) patients are also exempt from copayments.

In 2014, CI patients (4% of the population) accounted for 30% of total costs.
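The 4% / 30% figures can be turned into a per-person comparison. CI patients cost 0.30 / 0.04 = 7.5 times the overall average per person. Everyone else costs 0.70 / 0.96 ≈ 0.73 times the average. The function name below is our own.

```python
def relative_per_capita_cost(pop_share: float, cost_share: float) -> float:
    """Average cost per person in a group divided by the average cost per person
    of everyone outside the group."""
    inside = cost_share / pop_share
    outside = (1 - cost_share) / (1 - pop_share)
    return inside / outside


# Figures from the slide: CI patients are 4% of the population and 30% of NHI costs (2014)
ratio = relative_per_capita_cost(0.04, 0.30)
print(f"A CI patient costs about {ratio:.1f} times as much as a non-CI insured")
```

On average, then, a CI patient costs roughly ten times as much as other insureds. This is why the copayment waiver for CI matters so much to total NHI cost.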


Handling Big Data

The size and quality of the NHI database make data analysis difficult.

Analysis needs to rely on database software and data scientists (e.g., IT experts).

Data cleaning is a big issue, especially since the health care data come from different hospitals. Data discrepancy?

The death records in the NHI database are incomplete, and many are even wrong!


It is difficult to handle big data with regular software, so database software (e.g., SQL) is required.

Data cleaning and exploratory data analysis (EDA) are the keys to success.

For example, more than one database is available, and discrepancies exist between them.

Note: ID (incidence); HV_CD (mortality)

Cleaning the Data
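A minimal sketch of cross-checking death records from two sources with SQL follows. It uses Python's built-in `sqlite3` module. The table and column names (`nhi_death`, `registry_death`, `person_id`, `death_date`) and the records are hypothetical; they are not the actual NHI file layout. The sketch flags deaths missing from one source and deaths whose dates disagree.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical tables: death records from two sources keyed by a de-identified person ID
cur.executescript("""
CREATE TABLE nhi_death (person_id TEXT PRIMARY KEY, death_date TEXT);
CREATE TABLE registry_death (person_id TEXT PRIMARY KEY, death_date TEXT);
INSERT INTO nhi_death VALUES ('A01', '2012-03-04'), ('A02', '2013-07-19');
INSERT INTO registry_death VALUES
    ('A01', '2012-03-04'), ('A02', '2013-07-21'), ('A03', '2014-01-02');
""")

# Deaths in the registry that are missing from the NHI file
missing = cur.execute("""
    SELECT r.person_id FROM registry_death AS r
    LEFT JOIN nhi_death AS n ON n.person_id = r.person_id
    WHERE n.person_id IS NULL
    ORDER BY r.person_id
""").fetchall()

# Deaths recorded in both sources but with different dates
mismatched = cur.execute("""
    SELECT r.person_id, n.death_date, r.death_date FROM registry_death AS r
    JOIN nhi_death AS n ON n.person_id = r.person_id
    WHERE n.death_date <> r.death_date
    ORDER BY r.person_id
""").fetchall()

print("missing from NHI:", missing)
print("date mismatches:", mismatched)
conn.close()
```

The queries only locate the discrepancies. Deciding which source is correct still requires the kind of data cleaning and EDA described above.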


The Future of Big Data

http://cdn.marketingtechblog.com/wp-content/uploads/2013/05/Screen-Shot-2013-05-28-at-11.22.05-AM.png


Things to be Considered……

To incorporate big data, the insurance industry needs to consider:

Obtaining and updating data: confidentiality within a company; data sharing between companies (property rights)

Maintenance and usage: safety and privacy (data cloud?); Institutional Review Board (IRB)

Data and meta-analysis: industry-academic cooperation; R&D

The purpose of IRB review is to assure, both in advance and by periodic review, that appropriate steps are taken to protect the rights and welfare of humans participating as subjects in the research.

IRBs use a group process to review research protocols and related materials (e.g., informed consent documents and investigator brochures) to ensure protection of the rights and welfare of human subjects of research.

Human Protection in Human Research: What, Why, How, When and Where


Suggestions from Other Users

Barry Ralston (Assistant VP of Data Management at Infinity Insurance) suggests:

Analyze data that matters.

As you store, so you retrieve.

Improve performance at the origin.

Time-to-decision matters.

Get ahead of the game.

Note: Ralston says, “Data is key to what we do.”


My Suggestions for using Big Data

In addition to actuaries, insurance companies also need experts in big data (and information technology), such as data scientists and statisticians.

Data are an important asset, and regulating data trading would become necessary.

Let the users of big data bear the burden of privacy issues.


Big Data = Opportunity?

Profitability is always an issue. Take the sales of long-term care (LTC) insurance as an example.


LTC Insurance in U.S.A.

The main reasons why sales of LTC insurance in the U.S.A. are weak: low consumer demand, pricing, and managing the risk.

LTC insurance is associated with long-term risk, which makes asset-liability management (ALM) difficult:

Long-tail liability risk;

Cash flow (asset management).

Note: Low interest rates and low withdrawal rates would increase premiums (by 10%~40%).
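A stylized level-premium calculation shows why low interest and low withdrawal (lapse) rates push premiums up. Premiums are collected early and benefits are paid late (the long tail). Policyholders who lapse stop paying premiums and forfeit their benefits. The product design below is a hypothetical simplification: 20 premium years, then 10 benefit years, with no mortality, morbidity, or expenses. It is not an LTC pricing model, and it is not meant to reproduce the 10%~40% range on the slide.

```python
def net_level_premium(interest: float, lapse: float,
                      pay_years: int = 20, benefit_years: int = 10) -> float:
    """Net level annual premium for a stylized long-tail product (hypothetical):
    a premium is due at times 0..pay_years-1 and a benefit of 1 is paid at times
    pay_years..pay_years+benefit_years-1, each only if the policy is still in force.
    Policies lapse at a constant annual rate; mortality, morbidity, and expenses are ignored."""
    discount = 1 / (1 + interest)
    persist = 1 - lapse
    factor = discount * persist  # one-year discount-and-persistency factor
    pv_premiums = sum(factor ** t for t in range(pay_years))
    pv_benefits = sum(factor ** t for t in range(pay_years, pay_years + benefit_years))
    return pv_benefits / pv_premiums


base = net_level_premium(interest=0.05, lapse=0.03)
for interest, lapse in [(0.05, 0.03), (0.03, 0.03), (0.05, 0.01), (0.03, 0.01)]:
    premium = net_level_premium(interest, lapse)
    print(f"interest {interest:.0%}, lapse {lapse:.0%}: "
          f"premium {premium:.4f} ({premium / base - 1:+.0%} vs. base)")
```

Lowering either rate raises the weight of the distant benefit payments relative to the early premiums, so the required premium goes up.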


Factors Influencing LTC Sales

Asset investment strategies, e.g., convertible bonds, derivatives, collateralized loan obligations, private placements

Liability risk transfer, e.g., claim securitization, commission securitization, offshore reinsurance, product redesign

Regulatory impacts on LTC risk management, e.g., principle-based approach in the U.S. and Solvency II regulations in Europe

Source: Long Term Care Insurance Section (SOA)

Thank you for your Attention!

Q & A

