Capabilities Overview

Lake Hill Analytics General Capabilities Overview




About

• Accomplished professional with experience that includes startups, mature businesses, and Fortune 500 enterprises.

• Regardless of company size, data needs and challenges are often similar.

• Over 15 years of experience with large datasets, real-time data, business management, and project management.

• Every project starts and ends with an assessment of business objectives. In between, data and analysis play a central role, but always in support of those objectives.

• A typical engagement focuses on fixing problems such as siloed data, integration of disparate data sets, over-dependence on spreadsheets, unclear segmentation, and unmet data visualization needs.

Areas of Expertise

• Software Development, including but not limited to:

• Python for data gathering and analytics

• R for rapid prototyping

• Amazon Web Services for infrastructure

• Data Management:

• Identify data sources

• Integrate multiple sources into single database

• Data cleansing

• Marketing:

• Market Analytics

• Market Research

• New Product Development

• Analytics:

• Segmentation

• Regression

• Simulation

• Project Management

• Experience in varied verticals:

• Energy

• Media

• E-Commerce

• Computer Hardware / Manufacturing

Pragmatic Data Analysis

• Identify strategic goals, tactical needs, and existing pain points

• “How do these goals / needs / pains affect the balance sheet?”

• Data comes next:

• “What data is available today?”

• “How is it stored?”

• “What additional data would be useful to you?”

• Data collection and preparation are the most time-intensive steps in the process.

• The process is iterative: models are tested and refined before the project is complete.

Process based on the CRISP-DM (Cross-Industry Standard Process for Data Mining) model

Working With Lake Hill

• Initial Discussions: Identify the goals and vision for data and analytics in your business.

• Identify Key Project Elements: What are the data sources, analyses required, and project deliverables?

• Data Review: Sample data and a data dictionary, if available. A commitment to start with raw data is preferred.

• External Data Source Identification: Format and availability (CSV, JSON, API access, web scrape) and cost.

• Project Planning: A pilot to test assumptions and relieve pain points.

• Execution: Full execution after a successful pilot.
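Sources often arrive in different formats, and integrating them into one database is a recurring first step. As a minimal sketch, assuming two small hypothetical extracts (one CSV, one JSON) that carry the same kind of measure under different field names, Python's standard `csv`, `json`, and `sqlite3` modules are enough to normalize both into a single table. The field names, values, and the in-memory SQLite database are illustrative assumptions, not a description of any client system.

```python
import csv
import io
import json
import sqlite3

# Hypothetical extracts: the same kind of measure delivered as CSV by one
# source and as JSON by another, with different field names and formatting.
CSV_TEXT = "region,value\nA,10\nB,20\n"
JSON_TEXT = '[{"Region_Name": " c ", "amount": "5"}]'


def from_csv(text):
    """Yield normalized (region, value) pairs from the CSV extract."""
    for row in csv.DictReader(io.StringIO(text)):
        yield row["region"].strip().upper(), int(row["value"])


def from_json(text):
    """Yield normalized (region, value) pairs from the JSON extract."""
    for item in json.loads(text):
        yield item["Region_Name"].strip().upper(), int(item["amount"])


def load(conn, pairs, source):
    """Insert normalized pairs into one table, tagged with their source."""
    conn.executemany(
        "INSERT INTO measures (region, value, source) VALUES (?, ?, ?)",
        [(region, value, source) for region, value in pairs],
    )


conn = sqlite3.connect(":memory:")  # stand-in for the project database
conn.execute("CREATE TABLE measures (region TEXT, value INTEGER, source TEXT)")
load(conn, from_csv(CSV_TEXT), "csv")
load(conn, from_json(JSON_TEXT), "json")
rows = conn.execute(
    "SELECT region, value, source FROM measures ORDER BY region"
).fetchall()
print(rows)
```

Keeping a `source` column makes it easy to trace any integrated record back to the extract it came from.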

Case Studies

• EveryWomanNC

• Mining Music Intelligence

• Increasing Physician Referrals

Case Study: EveryWomanNC

• EveryWomanNC is a nonprofit whose mission is to improve preconception health, and thus birth outcomes, in North Carolina.

• Problem:

• Identify specific NC regions on which to focus public health outreach, and present the results simply.

• Source Data:

• PDF tables provided by NC State Center for Health Statistics.

• Useful for human consumption, but not machine readable or easily analyzed.

• Solution:

• Create county-level choropleth maps of NC that highlight key areas of need based on public health indicators.

• Technology Used:

• A Unix command-line utility to convert the PDFs to a machine-readable format.

• Python to clean and extract the data, as well as to create the maps.

• Illustrator to clean up the maps (fonts, etc.).

• Keynote for final presentation formatting.
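The deck does not show the extraction code. As a sketch, assuming the PDF-to-text conversion emits records run together on one line (for example `ALAMANCE 253 44.5ALEXANDER 39 34.9`), where each record is a county name, a count, and then either a one-decimal rate or `*` for a suppressed rate, a single regular expression can split them back into rows. The pattern is an assumption tuned to that layout and would need adjusting for other tables.

```python
import re

# Hypothetical converter output: records run together, with "*" in place
# of a rate that was suppressed because it rests on fewer than 20 cases.
RAW = "ALAMANCE 253 44.5ALEXANDER 39 34.9ALLEGHANY 17 *ANSON 59 66.3"

# County name (capitals, spaces, periods; matched lazily so it stops at the
# first number), a count with optional thousands separators, then the rate.
ROW = re.compile(r"([A-Z][A-Z .]*?)\s+([\d,]+)\s+(\d+\.\d|\*)")


def parse_rows(text):
    """Return (county, pregnancies, rate_or_None) tuples found in text."""
    rows = []
    for name, count, rate in ROW.findall(text):
        rows.append((
            name.strip(),
            int(count.replace(",", "")),
            None if rate == "*" else float(rate),
        ))
    return rows


for row in parse_rows(RAW):
    print(row)
```

Mapping `*` to `None` rather than zero keeps suppressed counties distinguishable from counties with genuinely low rates.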

EveryWomanNC Source Data

2010 North Carolina Teen Pregnancies, Ages 15-19

County          Total Pregnancies  Rate per 1,000
NORTH CAROLINA  15,957             49.7
ALAMANCE        253                44.5
ALEXANDER       39                 34.9
ALLEGHANY       17                 *
ANSON           59                 66.3
ASHE            31                 47.4
AVERY           19                 *
BEAUFORT        94                 67.4
BERTIE          37                 56.8
BLADEN          60                 53.5
BRUNSWICK       118                46.7
BUNCOMBE        275                40.0
BURKE           122                42.3
CABARRUS        259                44.2
CALDWELL        145                53.1
CAMDEN          11                 *
CARTERET        77                 43.2
CASWELL         28                 38.7
CATAWBA         245                49.1
CHATHAM         85                 53.8
CHEROKEE        37                 49.1
CHOWAN          28                 65.6
CLAY            10                 *
CLEVELAND       201                56.2
COLUMBUS        111                57.8
CRAVEN          210                67.3
CUMBERLAND      768                67.7
CURRITUCK       38                 50.2
DARE            33                 40.0
DAVIDSON        275                53.9
DAVIE           47                 35.9
DUPLIN          120                66.1
DURHAM          478                53.4
EDGECOMBE       150                73.7
FORSYTH         636                50.2
FRANKLIN        88                 48.1
GASTON          405                59.9
GATES           15                 *
GRAHAM          17                 *
GRANVILLE       94                 51.0
GREENE          34                 53.5
GUILFORD        792                41.7
HALIFAX         137                72.5
HARNETT         226                53.4
HAYWOOD         68                 41.9
HENDERSON       138                52.0
HERTFORD        51                 54.0
HOKE            92                 62.4
HYDE            5                  *
IREDELL         240                44.6
JACKSON         61                 31.0

*Technical Note: Rates based on small numbers (fewer than 20 cases) are unstable and are not reported.

Source: NC Department of Health & Human Services State Center for Health Statistics, 06OCT2011
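For the maps, each county needs a shading class, and suppressed rates need their own category rather than being treated as zero. The deck does not say how classes were chosen; this sketch assumes quartile (equal-count) classes computed with `statistics.quantiles` over twelve counties taken from the table above. With the full set of counties the break points would differ.

```python
import bisect
import statistics

# Twelve counties from the 2010 table; None marks a suppressed rate
# (fewer than 20 cases), which gets its own "no data" class.
RATES = {
    "ALAMANCE": 44.5, "ALEXANDER": 34.9, "ALLEGHANY": None, "ANSON": 66.3,
    "ASHE": 47.4, "BEAUFORT": 67.4, "BUNCOMBE": 40.0, "CUMBERLAND": 67.7,
    "DURHAM": 53.4, "EDGECOMBE": 73.7, "GUILFORD": 41.7, "JACKSON": 31.0,
}


def classify(rates, n_classes=4):
    """Assign each county a class 1..n_classes by quantile breaks, or 'no data'."""
    reported = [r for r in rates.values() if r is not None]
    # n_classes - 1 cut points that split the reported rates into equal-count groups.
    breaks = statistics.quantiles(reported, n=n_classes)
    return {
        county: "no data" if rate is None else bisect.bisect_right(breaks, rate) + 1
        for county, rate in rates.items()
    }


for county, shade in sorted(classify(RATES).items()):
    print(county, shade)
```

Equal-count classes guarantee that every shade appears on the map; equal-interval or natural-breaks schemes are common alternatives when the absolute rate matters more than rank.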

Table 1. Durham County Resident Births for 2010

By Age of Mother and Birth Order, for All Women

Source: NC Department of Health & Human Services State Center for Health Statistics, 06OCT2011


Age of Mother  1st  2nd  3rd  4th  5th  6th  7th  8th 9th or More Not Stated  Total
13               1    0    0    0    0    0    0    0           0          0      1
14               5    0    0    0    0    0    0    0           0          0      5
15              17    0    0    0    0    0    0    0           0          0     17
16              33    4    0    0    0    0    0    0           0          0     37
17              50    6    0    0    0    0    0    0           0          0     56
18              63   19    6    0    0    0    0    0           0          0     88
19              76   26    9    0    0    0    0    0           0          0    111
20              91   50   26    6    5    0    0    0           0          0    178
21              69   43   26    2    3    4    0    0           0          1    148
22              50   49   24    8    5    1    0    1           1          2    141
23              68   50   25   14    8    0    1    1           0          0    167
24              45   51   29   24    6    2    0    0           2          2    161
25 thru 29     448  289  209  113   63   27    8    6           5         10   1178
30 thru 34     436  331  217  133   58   33   13   11          13          9   1254
35 thru 39     145  140  117   86   38   22    7    6          14          4    579
40 thru 44      26   32   20   16   12    3    4    5           2          2    122
45 or Older      1    2    5    1    1    1    0    1           2          0     14
Total         1624 1092  713  403  199   93   33   31          39         30   4257
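Tables recovered from PDFs are worth checking before analysis. Because this table reports totals, every row's birth-order counts should sum to its Total, and every column should sum to the Total row. A sketch of that check on the extracted lines, assuming the 11 numeric fields (ten birth-order columns plus Total) always come last on a line so that multi-word age labels such as "25 thru 29" survive:

```python
# The Durham County table as extracted text: age group, counts for birth
# orders 1st..8th, "9th or More", "Not Stated", then the reported Total.
LINES = """\
13 1 0 0 0 0 0 0 0 0 0 1
14 5 0 0 0 0 0 0 0 0 0 5
15 17 0 0 0 0 0 0 0 0 0 17
16 33 4 0 0 0 0 0 0 0 0 37
17 50 6 0 0 0 0 0 0 0 0 56
18 63 19 6 0 0 0 0 0 0 0 88
19 76 26 9 0 0 0 0 0 0 0 111
20 91 50 26 6 5 0 0 0 0 0 178
21 69 43 26 2 3 4 0 0 0 1 148
22 50 49 24 8 5 1 0 1 1 2 141
23 68 50 25 14 8 0 1 1 0 0 167
24 45 51 29 24 6 2 0 0 2 2 161
25 thru 29 448 289 209 113 63 27 8 6 5 10 1178
30 thru 34 436 331 217 133 58 33 13 11 13 9 1254
35 thru 39 145 140 117 86 38 22 7 6 14 4 579
40 thru 44 26 32 20 16 12 3 4 5 2 2 122
45 or Older 1 2 5 1 1 1 0 1 2 0 14
Total 1624 1092 713 403 199 93 33 31 39 30 4257"""

N_NUMBERS = 11  # 10 birth-order columns plus the Total column


def parse_line(line):
    """Split a line into its age-group label and its 11 numbers."""
    tokens = line.split()
    return " ".join(tokens[:-N_NUMBERS]), [int(t) for t in tokens[-N_NUMBERS:]]


def validate(text):
    """Return a list of inconsistencies; an empty list means the table adds up."""
    *body, (_, total_row) = [parse_line(l) for l in text.splitlines()]
    problems = []
    # Each row's birth-order counts must sum to that row's Total.
    for label, nums in body + [("Total", total_row)]:
        if sum(nums[:-1]) != nums[-1]:
            problems.append(f"row {label}: {sum(nums[:-1])} != {nums[-1]}")
    # Each column must sum to the value in the Total row.
    for col, expected in enumerate(total_row):
        got = sum(nums[col] for _, nums in body)
        if got != expected:
            problems.append(f"column {col}: {got} != {expected}")
    return problems


print(validate(LINES) or "all row and column totals agree")
```

A single mistyped count makes both its row and the Total column disagree, which helps pinpoint transcription errors.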

EveryWomanNC Maps

Case Study: Mining Music from the Web

• KEXP is a dynamic arts organization that provides rich music experiences on the air, online, and on the streets. KEXP’s unique services benefit three distinct groups: Music Lovers, Artists, and the Arts Community. (1)

• Challenge:

• What information can be learned from 12 years of playlist data?

• Is early airplay on KEXP a predictor of popularity?

• Can this dataset serve as training data for a predictive algorithm?

• Source Data:

• Playlist data scraped from web pages provided track titles, DJs, and playlist order.

• Metadata from the Gracenote and Echo Nest APIs augments the data with further track information.

• Technology Used:

• Python for web scraping, parsing into a database, and interacting with third-party APIs

• Amazon S3 to store web pages

• Amazon EC2 instances to run all data gathering, parsing, and analysis

• Amazon RDS to host MySQL database

• Status:

• Proof of concept: data collection and hosting on Amazon are feasible and cost-effective.

• In progress: data models are assembled, and work on answering the challenge questions has begun.

(1): http://kexp.org/about
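A minimal sketch of the scrape, parse, and load steps using only the Python standard library. The playlist markup, class names, and dates are hypothetical placeholders (the deck does not describe KEXP's page structure), and an in-memory SQLite database stands in for the MySQL database on RDS. The final query finds each artist's first airplay date, one possible starting point for the early-airplay question.

```python
import sqlite3
from html.parser import HTMLParser

# Hypothetical markup and dates: placeholders, not KEXP's real page layout.
PAGES = {
    "2001-01-01": '<tr class="play"><td class="artist">Band A</td>'
                  '<td class="title">Song 1</td></tr>'
                  '<tr class="play"><td class="artist">Band B</td>'
                  '<td class="title">Song 2</td></tr>',
    "2001-01-08": '<tr class="play"><td class="artist">Band A</td>'
                  '<td class="title">Song 3</td></tr>',
}


class PlaylistParser(HTMLParser):
    """Collect (artist, title) pairs, in page order, from play rows."""

    def __init__(self):
        super().__init__()
        self.plays = []
        self._row = None     # fields of the play row being read
        self._field = None   # "artist" or "title" while inside that cell

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "tr" and cls == "play":
            self._row = {}
        elif tag == "td" and self._row is not None and cls in ("artist", "title"):
            self._field = cls

    def handle_data(self, data):
        if self._field is not None:
            self._row[self._field] = self._row.get(self._field, "") + data

    def handle_endtag(self, tag):
        if tag == "td":
            self._field = None
        elif tag == "tr" and self._row is not None:
            if "artist" in self._row and "title" in self._row:
                self.plays.append((self._row["artist"].strip(), self._row["title"].strip()))
            self._row = None


def load_pages(conn, pages):
    """Parse each dated page and insert its plays with their playlist position."""
    for date, html in pages.items():
        parser = PlaylistParser()
        parser.feed(html)
        parser.close()
        conn.executemany(
            "INSERT INTO plays (play_date, position, artist, title) VALUES (?, ?, ?, ?)",
            [(date, pos, artist, title)
             for pos, (artist, title) in enumerate(parser.plays, start=1)],
        )


conn = sqlite3.connect(":memory:")  # stand-in for the MySQL database on RDS
conn.execute("CREATE TABLE plays (play_date TEXT, position INTEGER, artist TEXT, title TEXT)")
load_pages(conn, PAGES)

# ISO dates stored as text sort correctly, so MIN gives the first airplay.
first_play = conn.execute(
    "SELECT artist, MIN(play_date), COUNT(*) FROM plays GROUP BY artist ORDER BY artist"
).fetchall()
print(first_play)
```

In the setup the deck describes, the raw pages would be fetched once and kept on S3, so parsing can be re-run without scraping again.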

Case Study: Increasing Physician Referrals (1)

• The client is a marketing agency in the healthcare space.

• Problem:

• Referrals from high-dollar specialties (oncology, cardiology, bariatrics, etc.) are coveted by hospital systems.

• The client has developed effective tactical marketing plans for hospitals, but these plans involve significant outreach and collateral in physician offices.

• These expensive tactical marketing programs generate ROI through referrals into the hospital system.

• By identifying and targeting the best referring physicians more effectively, hospitals can focus on improving referral rates, which brings more high-value business to the hospital.

• Source Data:

• 2011 Medicaid and Medicare referral data (1.3 million rows)

• National Provider Identifier (NPI) registry for detailed practice information

• Additional Data Desired:

• Disease prevalence rates by county, from the CDC

• Medicaid / Medicare enrollment rates by county
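Referral records become useful for targeting once each provider is joined to registry details such as specialty. As a sketch, assuming referral rows of the form (referring NPI, receiving NPI, patient count) and a provider lookup keyed by NPI, both hypothetical here, the following totals how many patients each physician sends into one high-dollar specialty, cardiology in this example. The ten-digit identifiers are placeholders, not real NPIs.

```python
from collections import Counter

# Hypothetical referral rows: (referring NPI, receiving NPI, patients).
# The actual file layout is not described in the deck.
REFERRALS = [
    ("1000000001", "2000000001", 40),
    ("1000000001", "2000000002", 15),
    ("1000000002", "2000000001", 25),
    ("1000000003", "2000000003", 30),
]

# Hypothetical extract of provider registry data keyed by NPI.
PROVIDERS = {
    "1000000001": {"specialty": "Family Medicine"},
    "1000000002": {"specialty": "Internal Medicine"},
    "1000000003": {"specialty": "Family Medicine"},
    "2000000001": {"specialty": "Cardiology"},
    "2000000002": {"specialty": "Oncology"},
    "2000000003": {"specialty": "Cardiology"},
}


def referrals_into(specialty, referrals, providers):
    """Total patients each referring NPI sends to providers of one specialty."""
    totals = Counter()
    for src, dst, patients in referrals:
        receiver = providers.get(dst)  # join on the receiving provider's NPI
        if receiver is not None and receiver["specialty"] == specialty:
            totals[src] += patients
    return totals


cardio = referrals_into("Cardiology", REFERRALS, PROVIDERS)
for npi, n in cardio.most_common():
    print(npi, PROVIDERS[npi]["specialty"], n)
```

At 1.3 million rows the same join and aggregation fit comfortably in memory or in a database `GROUP BY`; the county-level CDC and enrollment data could later be joined the same way through a provider's practice location.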

Case Study: Increasing Physician Referrals (2)

• Proposed phased project:

• Pilot:

• Identify referring physicians in a single specialty within a single hospital system.

• Segment them by patients referred to that hospital system as well as to all other hospitals

• Ongoing:

• Expand to all specialties and healthcare systems within a state

• Provide the client with segmentation, as well as baseline referral rates for physicians by specialty in a market

• Integrate additional data such as disease rates and Medicare / Medicaid enrollment rates.

• Technology Used:

• Python for data collection, API queries, database construction, and reporting

• Amazon Web Services for infrastructure and hosting.
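The pilot's segmentation can be sketched as a share calculation: for each referring physician, the fraction of referred patients who go to the client's hospital system versus all others, crossed with referral volume. The data, segment names, and thresholds below are hypothetical choices for illustration, not the client's actual criteria.

```python
from collections import defaultdict

# Hypothetical pilot data for one specialty in one market:
# (referring NPI, destination hospital system, patients referred).
ROWS = [
    ("1000000001", "Client", 45), ("1000000001", "Other", 5),
    ("1000000002", "Client", 10), ("1000000002", "Other", 30),
    ("1000000003", "Other", 12),
    ("1000000004", "Client", 7), ("1000000004", "Other", 3),
]

# Hypothetical thresholds; real cut-offs would come from the client's market.
HIGH_VOLUME = 20     # patients per period to count as high volume
LOYAL_SHARE = 0.6    # share sent to the client system to count as loyal
SPLIT_SHARE = 0.2    # below this share, the physician mostly refers elsewhere


def segment(rows, client="Client"):
    """Return {npi: (total referred, share to client, segment label)}."""
    totals = defaultdict(int)
    to_client = defaultdict(int)
    for npi, system, patients in rows:
        totals[npi] += patients
        if system == client:
            to_client[npi] += patients
    result = {}
    for npi, total in totals.items():
        share = to_client[npi] / total
        volume = "high-volume" if total >= HIGH_VOLUME else "low-volume"
        if share >= LOYAL_SHARE:
            loyalty = "loyal"
        elif share >= SPLIT_SHARE:
            loyalty = "split"
        else:
            loyalty = "competitor"
        result[npi] = (total, round(share, 2), f"{volume} {loyalty}")
    return result


for npi, (total, share, label) in sorted(segment(ROWS).items()):
    print(npi, total, share, label)
```

High-volume physicians with a low or split share would be one candidate group for outreach, and averaging the per-physician shares within a specialty would give one possible baseline referral rate for a market.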

Contact

Damian Herrick

Lake Hill Analytics, LLC

[email protected] 919-627-7051