Session 3 – Big Data and Insurance Industry
Prof. Jack Ching‐Syang Yue, ASA
6/30/2015
2015/06/25
Big Data and Insurance Industry
Speaker: Jack C. Yue
National Chengchi Univ.
Date: June 25, 2015
Email: [email protected]
Summary
What is Big Data?
Attributes of Big Data
Limitations of Big Data
Big Data and Insurance
Future of Big Data
What is Big Data?
The term “Big Data” was first proposed by IBM in 2010, and it involves three V’s:
Volume: data sizes of at least terabytes (TB) or petabytes (PB)
Variety: including visual and GIS data
Velocity: real time (instant) analysis
Note: According to IDC (International Data Corporation), the capacity of data storage grows 50% annually.
The Growing Size of a Computer File!
Information Explosion!
Source: The Expanding Digital Universe, A Forecast of Worldwide Information Growth Through 2010, March 2007, An IDC White Paper sponsored by EMC
http://www.emc.com/collateral/analyst-reports/expanding-digital-idc-white-paper.pdf
In 2007, IDC predicted that storage would grow sixfold:
2006: 161 EB
2010: 988 EB (pred.)
Note: IDC = International Data Corporation
Source: The Diverse and Exploding Digital Universe, An Updated Forecast of Worldwide Information Growth Through 2011, March 2008, An IDC White Paper sponsored by EMC
http://www.emc.com/collateral/analyst-reports/diverse-exploding-digital-universe.pdf
IDC updated (in 2008) the prediction to tenfold growth from 2006 to 2011:
2006: 161 EB
2007: 281 EB
2010: 988 EB (pred.)
2011: 1773 EB (pred.)
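The IDC figures above can be turned into an implied annual growth rate. The sketch below is only a back-of-the-envelope check, assuming steady compound growth between the quoted years; the function name `implied_annual_growth` is made up for this illustration.

```python
# Digital-universe sizes in exabytes (EB), as quoted from the IDC white papers.
idc_sizes_eb = {2006: 161, 2007: 281, 2010: 988, 2011: 1773}


def implied_annual_growth(sizes, start_year, end_year):
    """Compound annual growth rate between two years (0.5 means 50% per year).

    Assumption: growth is treated as a constant yearly multiplier.
    """
    ratio = sizes[end_year] / sizes[start_year]
    return ratio ** (1 / (end_year - start_year)) - 1


growth_2006_2010 = implied_annual_growth(idc_sizes_eb, 2006, 2010)
growth_2006_2011 = implied_annual_growth(idc_sizes_eb, 2006, 2011)

print(f"2006-2010: {growth_2006_2010:.0%} per year")
print(f"2006-2011: {growth_2006_2011:.0%} per year")
```

Under this assumption, both forecasts imply growth somewhat above the 50% per year quoted earlier for storage capacity.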
The Explosion Starting in 2007 …
The Era of Big Data! Data accumulation speed is much faster…
It took 10+ years (1990-2004) to complete the Human Genome Project, sequencing 3 billion letters (base pairs); now it takes only one day.
More than 70% of the shares traded in 2010 (NYSE + NASDAQ) were generated by automatic trading systems (or statistical trading systems). (30,000 trades on the NYSE every second.)
Facebook has more than 10 million pictures uploaded every hour and more than 3 billion “likes” every day. (User data size: 300 PB)
Attributes of Big Data
http://www.abc.net.au/reslib/201204/r926508_9670000.jpg
Value of Information
In the late 1990s, Amazon hired more than 10 book critics to compile recommendation lists, but sales driven by readers’ suggestions were much better. (At least 1/3 of sales come from the automatic system.)
Wal-Mart analyzed historical data in 2004 and found that sales of flashlights and Pop-Tarts increase before a hurricane. (Note: “Diaper and Beer” is another well-known example.)
About Wal-Mart
Wal-Mart was one of the first companies to collect and analyze customers’ data and use them to increase sales.
During the weekend, people who purchase diapers tend to purchase beer as well.
Question: Why are they purchased together, and how can we use this information to stimulate sales?
Adding Value to the Data
A famous example in Data Mining:$$$
Strategies of marketing and warehouse
Customer 1: Milk, eggs, sugar, bread
Customer 2: Milk, eggs, cereal, bread
Customer 3: Eggs, sugar
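The three baskets above are enough to illustrate the usual association-rule measures. The following is a minimal sketch, not a full data-mining implementation; the helper names `support`, `confidence`, and `lift` are chosen for this illustration, and the rule "milk implies bread" is just one example.

```python
# The three customer baskets from the slide, written as sets of items.
baskets = [
    {"milk", "eggs", "sugar", "bread"},   # Customer 1
    {"milk", "eggs", "cereal", "bread"},  # Customer 2
    {"eggs", "sugar"},                    # Customer 3
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Share of baskets with `antecedent` that also contain `consequent`."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)


def lift(antecedent, consequent, baskets):
    """Confidence relative to how common `consequent` is overall."""
    return confidence(antecedent, consequent, baskets) / support(consequent, baskets)


milk_bread_support = support({"milk", "bread"}, baskets)
milk_to_bread_confidence = confidence({"milk"}, {"bread"}, baskets)
milk_to_bread_lift = lift({"milk"}, {"bread"}, baskets)

print(milk_bread_support, milk_to_bread_confidence, milk_to_bread_lift)
```

A lift above 1 means the two items appear together more often than their individual frequencies would suggest, which is the kind of association a retailer can use for layout or promotion decisions. It says nothing about why they are bought together.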
Value of “Diaper and Beer”
“Diaper and beer” is a good example of association rather than causality.
Association can create value from the data without necessarily finding the causality. In the case of diaper and beer, we can plan:
Pricing and marketing;
Goods layout and floor design;
Warehouse and inventory.
Attributes of Big Data
In addition to the size, big data also have the following attributes:
Sample = Population
Non-uniform Data with Noise
Association vs. Causality
http://mmdays.com/wp-content/uploads/2013/03/Bigdata.jpg
“Sample = Population” (n = all?)
We can use the whole population and don’t need to infer from sample data. (Bias?)
More up-to-date information (cf. Census);
Remove possible sampling bias;
Prevent insufficient samples.
But we need to consider the problems of storage, updating, and analysis.
http://www.smartercomputingblog.com/wp-content/uploads/2012/06/where-is-your-data.png
Sample Representative
Sampling Bias!!
Non-uniform Data with Noise
In addition to being voluminous and non-uniform, big data tend to contain noise.
Different sources and formats create the problem of data compatibility. (Meta Analysis!)
Need to adjust for inhomogeneity. Volume is more important (cf. quality).
Official CPI statistics vs. commodity prices from the web (deflation after Lehman Brothers' bankruptcy)
http://blog.backupify.com/2013/04/01/the-5-but-really-the-6-mistakes-made-in-big-data/00009-08_sept253-jpg/
Association vs. Causality
Quantitative analysis alone cannot decide the causality, and there may exist latent variables. (e.g., foot size and spelling ability of grade school students)
Complete data and powerful tools make finding associations possible (precaution).
UPS started Preventive Maintenance in 2000.
New York City used manhole records to predict (defective) manhole explosions. (2010 Wired)
Photographer: David Paul Morris/Bloomberg
United Parcel Service (UPS) was one of the earliest adopters of business analytics in Big Data.
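The foot size and spelling ability example can be illustrated with a small simulation in which age acts as the latent variable. This is only a sketch with made-up parameters (age range, slopes, and noise levels are all assumptions for illustration, not estimates from real students).

```python
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(2015)  # fixed seed so the run is repeatable
ages, foot_sizes, spelling_scores = [], [], []
for _ in range(5000):
    age = rng.randint(6, 12)  # hypothetical grade-school ages
    ages.append(age)
    # Both variables depend on age only; neither depends on the other.
    foot_sizes.append(15 + 1.0 * age + rng.gauss(0, 1))
    spelling_scores.append(10 * age + rng.gauss(0, 10))

# Correlation over all students: strong, because age drives both variables.
overall_corr = pearson(foot_sizes, spelling_scores)

# Correlation among 9-year-olds only: the latent variable is held fixed.
foot_9 = [f for f, a in zip(foot_sizes, ages) if a == 9]
spell_9 = [s for s, a in zip(spelling_scores, ages) if a == 9]
within_age_corr = pearson(foot_9, spell_9)

print(f"all students: {overall_corr:.2f}, age 9 only: {within_age_corr:.2f}")
```

The overall correlation is strong while the within-age correlation is close to zero, which is the pattern one expects when a latent variable, rather than a causal link, produces the association.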
Limitations of Big Data
http://www.tnooz.com/2012/01/04/how-to/big-data-and-the-infinite-possibilities-for-the-travel-industry/
Statistics and Knowledge
Statistics (& quantitative tools) can be classified as Induction, and help us to identify:
Regular
Extreme
Irregular
Induction
Deduction
Information or Junk Explosion
Information is everywhere but what are the really important factors?
What information is essential?
Data quality? (Garbage in, garbage out!)
How is the information used for decisions? (e.g., avoid high-risk decisions, reduce the possible risk.)
Data
Information
Fact
Knowledge
Trial and Error
The size of big data makes it difficult to create an SOP for analysis, so traditional trial and error can be used.
Data scientists at Kaggle found that the best bet for a used car is an orange one. (odd color vs. self-expression of car owner??)
In the prediction of defective NYC manholes (using 2008 data to predict 2009), the rate of correct predictions is 44%. (Key factor: year the cable was made)
About Data Collection
The data usually are collected via:
1. Experimental Design
2. Observational Study
The major difference is whether the collection would influence the outcomes. (e.g., observing the stock and housing markets usually won’t distort the results.)
Note: We shall focus on “association.”
In pricing insurance products, we always look for risk factors.
Age and gender are two well-known factors, and theories are proposed for explanation.
Marriage is also a potential factor, and possible reasons are selection-at-marriage, responsibility, living arrangement (& reciprocal care giving), and social interaction.
Causality is not obvious!
Taiwan’s Age-specific Mortality Rates (2009-11)
Taiwan’s Marital Mortality Rates (Female)
[Figure: two panels, 1994-96 and 2009-11; x-axis: Age (15-19 to 90-94); y-axis: log(qx); curves: Married, Single, Divorced]
Big Data and Insurance
http://h30507.www3.hp.com/t5/Journey-through-Enterprise-IT/Analyze-This-Big-Data-is-insurance-against-losing-a-competitive/ba-p/143577#.UgZmpLQVEqQ
Big Data and Insurance Industry
In 2009, Google analyzed keywords used in its search engine (3 billion records daily). Compared with the 2003-08 CDC records, Google could detect the spread of H1N1 at least one week earlier. (User feedback!)
John Snow (1854) studied the water supply system of London and found that the spread of cholera was related to polluted water. (Spatial Statistics)
New Territory of Insurance Industry?
Big Data is insurance against losing a competitive edge…..
Insurance companies face more diverse risks, in addition to interest rate risk. Living longer and more information create new possibilities and new risks. (e.g., longevity risk & moral hazard)
Note: We shall use the health insurance products as a demonstration.
Insurance related Big Database
In addition to the experience data from individual insurance companies, public data are also available:
Mortality Study: Human Mortality Database (HMD) and Ministry of Interior (Taiwan).
Health Data: National Health Insurance (Taiwan), Society of Actuaries (SOA), and United States Renal Data System (USRDS).
Taiwan’s National Health Insurance
Taiwan started its national health insurance (NHI) in 1995, and more than 99% of the population is covered (excluding overseas workers).
Waiver of copayment is one of the important policies in NHI. In addition to veterans, pregnant women, and people in remote areas, Catastrophic Illness (CI) patients also enjoy the copayment waiver.
CI patients (4% of the population) accounted for 30% of total cost in 2014.
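A quick arithmetic check shows what the 4% and 30% figures imply per person. The sketch assumes the remaining 96% of the population accounts for the remaining 70% of cost; the variable names are chosen for this illustration.

```python
ci_population_share = 0.04  # CI patients as a share of the covered population
ci_cost_share = 0.30        # CI patients' share of total NHI cost (2014)

# Per-capita cost relative to the overall average (average = 1.0).
ci_relative_cost = ci_cost_share / ci_population_share
other_relative_cost = (1 - ci_cost_share) / (1 - ci_population_share)

# How many times more a CI patient costs than a non-CI insured person.
ci_to_other_ratio = ci_relative_cost / other_relative_cost

print(f"CI patient vs. average insured: {ci_relative_cost:.1f}x")
print(f"CI patient vs. non-CI insured:  {ci_to_other_ratio:.1f}x")
```

On these figures a CI patient costs roughly 7.5 times the overall average and about 10 times a non-CI insured person, which is why the copayment waiver matters so much for total cost.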
Handling Big Data
The size and quality of NHI database make data analysis difficult.
Need to rely on database software and data scientists (e.g., IT experts).
Data cleaning is a big issue, especially because the health care data come from different hospitals. Data discrepancy?
The death records in the NHI database are not complete, and many are even wrong!
Cleaning the Data
It is difficult to handle big data using regular software, so database software (e.g., SQL) is required.
Data cleaning and exploratory data analysis (EDA) are the keys to success.
For example, more than one database is available, and discrepancies exist between them.
Note: ID Incidence; HV_CD Mortality
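One simple cleaning step is to cross-check the same field in two databases and flag where they disagree. The sketch below uses entirely hypothetical records and identifiers (`claims_db`, `registry_db`, and the `P00x` keys are made up) and is not the procedure used on the actual NHI data.

```python
# Hypothetical death dates keyed by a made-up person identifier.
# None means the person appears in the database without a death date.
claims_db = {
    "P001": "2010-03-02",
    "P002": None,
    "P003": "2011-07-15",
    "P005": "2012-01-09",
}
registry_db = {
    "P001": "2010-03-02",
    "P002": "2010-11-20",
    "P003": "2011-07-18",
    "P004": "2009-05-30",
}


def compare_death_records(db_a, db_b):
    """Classify each person by how the two databases agree on the death date."""
    report = {"match": [], "missing_in_a": [], "missing_in_b": [], "conflict": []}
    for person_id in sorted(set(db_a) | set(db_b)):
        date_a, date_b = db_a.get(person_id), db_b.get(person_id)
        if date_a is None and date_b is None:
            continue  # no death recorded in either source
        if date_a == date_b:
            report["match"].append(person_id)
        elif date_a is None:
            report["missing_in_a"].append(person_id)
        elif date_b is None:
            report["missing_in_b"].append(person_id)
        else:
            report["conflict"].append(person_id)
    return report


report = compare_death_records(claims_db, registry_db)
for category, person_ids in report.items():
    print(category, person_ids)
```

Records in the `conflict` and `missing_in_*` groups are the ones that need manual review or a rule for deciding which source to trust.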
The Future of Big Data
http://cdn.marketingtechblog.com/wp-content/uploads/2013/05/Screen-Shot-2013-05-28-at-11.22.05-AM.png
Things to be Considered……
To incorporate big data, the insurance industry needs to consider:
Obtaining and updating data: confidentiality issues within a company; data sharing between companies (property rights)
Maintenance and usage: Safety and privacy (data cloud?); Institutional Review Board
Data and meta analysis: Industry-academic cooperation; R&D
The purpose of IRB review is to assure, both in advance and by periodic review, that appropriate steps are taken to protect the rights and welfare of humans participating as subjects in the research.
IRBs use a group process to review research protocols and related materials (e.g., informed consent documents and investigator brochures) to ensure protection of the rights and welfare of human subjects of research.
Human Protection in Human Research: What, Why, How, When and Where
Suggestions from Other Users
Barry Ralston (Assistant VP of Data Management at Infinity Insurance) suggests:
Analyze data that matters.
As you store, so you retrieve.
Improve performance at the origin.
Time-to-decision matters.
Get ahead of the game.
Note: Ralston says “Data is key to what we do.”
My Suggestions for using Big Data
In addition to actuaries, insurance companies also need experts in big data (& information technology), such as data scientist/statistician.
Data are an important asset, and regulating data trading will become necessary.
Let the users of big data bear the burden of privacy issues.
Big Data = Opportunity?
Profitability is always an issue. (Use the sale of LTC as an example.)
LTC Insurance in U.S.A.
The main reasons why sales of LTC insurance in the U.S.A. are not good: Low Consumer Demand, Pricing, and Managing the Risk.
LTC insurance is associated with long-term risk, and ALM is not easy to handle: Long-tail Liability Risk; Cash Flow (Asset Management)
Note: Low interest rates and low withdrawal rates would increase the premiums (10%~40%).
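The direction of the effect in the note can be seen in a deliberately simplified pricing sketch. It assumes a single benefit paid at the end of the term to policies still in force, level premiums paid at the start of each year, and no mortality, morbidity, or expenses; the function `level_premium` and all numbers are hypothetical and are not an actual LTC pricing model.

```python
def level_premium(benefit, years, interest, lapse):
    """Level annual premium under a toy equivalence-principle model.

    Assumptions (illustration only): premiums are paid at the start of each
    year by policies still in force, and the benefit is paid at the end of
    `years` to policies that never lapsed (withdrew).
    """
    # One-year factor: probability of staying in force times the discount factor.
    x = (1 - lapse) / (1 + interest)
    pv_benefit = benefit * x ** years
    pv_premium_annuity = sum(x ** t for t in range(years))
    return pv_benefit / pv_premium_annuity


base = level_premium(100_000, 20, interest=0.05, lapse=0.04)
low_interest = level_premium(100_000, 20, interest=0.03, lapse=0.04)
low_lapse = level_premium(100_000, 20, interest=0.05, lapse=0.01)

print(f"base premium:        {base:,.0f}")
print(f"lower interest rate: {low_interest:,.0f} ({low_interest / base - 1:+.0%})")
print(f"lower lapse rate:    {low_lapse:,.0f} ({low_lapse / base - 1:+.0%})")
```

Both a lower interest rate and a lower withdrawal rate raise the premium in this toy model, because future claims are discounted less and more policies survive to claim. The size of the increase here depends entirely on the made-up inputs and should not be compared with the 10%~40% range quoted above.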
Factors Influencing LTC Sales
Asset Investment Strategies
e.g., Convertible Bonds, Derivatives, Collateralized Loan Obligations, Private Placements
Liability Risk Transfer
e.g., Claim Securitization, Commission Securitization, Offshore Reinsurance, Product Redesign
Regulatory Impacts on LTC Risk Management
e.g., Principle-based Approach in U.S. & Solvency II Regulations in Europe
Source: Long Term Care Insurance Section (SOA)
Thank you for your Attention!
Q & A