Session 30 Powerful Ways to Use Hadoop in your Healthcare Big Data Strategy Bryan Hinton Senior Vice President, Platform Engineering Health Catalyst Sean Stohl Senior Vice President, Product Development Health Catalyst

Poll Question #1

What brought you here?

a) Everyone is talking about Big Data/Hadoop – What is it?
b) Searching for use cases – What is the value proposition?
c) Need help implementing it
d) Want to hear others’ experiences
e) I got lost

Learning Objectives

Be able to explain

• What is Big Data and Hadoop

• Why do we need Big Data and Hadoop in Healthcare

• What are the challenges to adoption

• How do I get started

• See it in action

Scaling Up Limits

What does it take to reach the Big Data threshold?

3 V’s of Big Data

We Are Not “Big Data” in Healthcare Yet

Volume, Velocity, and Variety aren’t the only reasons to move

Dear Data…

History of Hadoop

• Created by Doug Cutting and Mike Cafarella at Yahoo in 2005.

• Hadoop is named after Cutting’s son’s toy elephant.

• “The name my kid gave a stuffed yellow elephant. Short, relatively easy to spell and pronounce, meaningless, and not used elsewhere: those are my naming criteria. Kids are good at generating such. Googol is a kid’s term.” - Doug Cutting

• Open-source software framework that supports storing and processing large data sets distributed across clusters of commodity hardware.

• Map Reduce – Parcels out work to the nodes in the cluster (map), then organizes and combines the results from each node into a cohesive answer to a query (reduce).

• HDFS – Hadoop Distributed File System. A file system that distributes data across a cluster to take advantage of the parallel processing of Map Reduce.

Map Reduce Example
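
The map, shuffle, and reduce steps can be sketched in a few lines of plain Python. This is only an in-memory illustration of the pattern, not Hadoop's API: the function names and the sample text chunks are made up for this sketch, and each string merely stands in for a block of a file that would live on a different node in a real cluster.

```python
from collections import defaultdict
from itertools import chain


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Shuffle: group emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(chunks):
    """Run map over every chunk, group by word, then reduce each group."""
    mapped = chain.from_iterable(map_phase(chunk) for chunk in chunks)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical input: three chunks standing in for blocks on three nodes.
chunks = ["sepsis alert sepsis", "alert discharge", "sepsis discharge discharge"]
counts = word_count(chunks)
print(counts)
```

In a real cluster the map calls would run in parallel on the nodes that hold each block, and the shuffle would move data across the network; here everything runs in one process so the flow is easy to follow.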

Poll Question #2

How would you categorize your organization’s involvement with Hadoop?

1) Not interested
2) Interested but no plans to implement
3) Planning implementation
4) Piloting Hadoop
5) Heavily using Hadoop
6) Unsure or not applicable

Why Big Data and Hadoop in Healthcare

• Data Growth
• Different Types of Workload
  • Semi Structured
  • Archiving
  • Streaming
  • Machine Learning

Just Beginning: Digitization of Health

“EMR data represents ~8% of the data we need for population health and precision medicine.” — Alberta Secondary Use Data Project

The Growing Ecosystem of Human Health Data

• Healthcare Encounter Data
• 7x24 Biometric Data
• Consumer Data
• Genomic & Familial Data
• Social Data
• Outcomes Data

Types of Data

• Structured
  • Data that can be stored relationally in an RDBMS
• Semi Structured
  • Data that has some organizational properties but isn’t in a relational database format
  • CSV, XML, X12 (835/837), HL7, JSON
  • Doctor Notes – Template Generated Sections
• Unstructured
  • E-mails, text messages, Word documents, videos, and pictures
  • Doctor Notes – Free Form Sections
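
To make the semi-structured category concrete, here is a minimal Python sketch that reads two such formats: a pipe-delimited HL7 v2 segment and a JSON document. The sample values (patient ID, name, hospital code) are invented for illustration, and the parser is a deliberate simplification: it ignores HL7 escape sequences, repetitions, and the special handling the MSH segment needs.

```python
import json


def parse_hl7_segment(segment):
    """Split one HL7 v2 segment into its name and fields.

    Fields are separated by '|'; a field containing '^' is split further
    into its components. Escapes and repetitions are not handled.
    """
    name, *fields = segment.split("|")
    parsed = [field.split("^") if "^" in field else field for field in fields]
    return name, parsed


# Made-up sample data: organized, but not in rows and columns.
hl7_pid = "PID|1||12345^^^HOSP^MR||DOE^JANE||19800101|F"
json_doc = '{"patient_id": "12345", "vitals": {"heart_rate": 72}}'

segment_name, fields = parse_hl7_segment(hl7_pid)
record = json.loads(json_doc)

print(segment_name, fields[2][0], fields[4])   # segment name, patient ID, name parts
print(record["vitals"]["heart_rate"])          # nested value from the JSON document
```

Both inputs carry structure (delimiters, nesting, named keys), yet neither fits a fixed relational schema without extra mapping work, which is what separates them from the structured category.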

Archiving

Streaming

Implementation

Challenges to Adoption and How to Overcome Them

Poll Question #3

Which challenge has been or would be the greatest barrier for your organization to adopt Hadoop?

a) People with the right skill sets
b) Funding hardware costs
c) Defining the business value
d) Security concerns
e) Unsure or not applicable

Challenges to adoption

• Organizational
• Buying
• Administering
• Using

Organizational

Stuck in the Mud

Buying

Cloud

Administering

• Fewer experienced people
• Lack of best practices
• Myriad of tools
• Open Source yes – but lots of assembly required
• Security?

Packaged Solutions

Administering

Invest in your people

Using

Using

• Which SQL on Hadoop?
  • Hive
  • Impala
  • Spark SQL
  • Apache Drill

Meeting in the middle

RDBMS Vendors

• Oracle
• SQL Server
• Teradata
• …

Hadoop Solutions

• Hortonworks
• Cloudera
• MapR
• Cloud
• …

Convergence

Don’t Rip and Replace

Additive Approach

Data Operating System

Demos

Lessons Learned

1. Let use cases drive the need to implement Hadoop. (Be pragmatic.)
2. Think additive.
3. Invest in people now.
4. In general, the Cloud will give you the most flexibility in deploying Hadoop.

Analytic Insights

Questions & Answers

What You Learned…

Write down the key things you’ve learned related to each of the learning objectives after attending this session.

Thank You
