Data Science Strategies for Real-Time Analytics

Kirk Borne (@KirkDBorne)
Principal Data Scientist, Booz | Allen | Hamilton

http://www.boozallen.com/datascience
https://careers.boozallen.com/en-US/search?keywords=data
http://www.mif.pg.gda.pl/homepages/jasiu/stud/EiM/pdf/22-ind-zastosowania.pdf



Can our electric grid be more resilient?

“Real Time Data Analytics for the Resilient Electric Grid”

So, what is Resilience?

1. The capacity to recover quickly (bounce back) from difficulties; toughness.

2. The ability of a substance or object to spring back into shape; elasticity.

http://www.starservice.org.uk/curriculumpage.php?subject=57
http://signsofpolitics.blogspot.com/2009/03/around-and-about-resilience.html

e.g., Resilient Communities have the sustained ability to utilize available resources to respond to, withstand, and recover from adverse situations = resources + data + analytics = insights + decisions!

“Enhancing Resilience in Infrastructure”

https://my.vanderbilt.edu/universityfundingprograms/2017/02/enhancing-safety-and-resilience-of-civil-infrastructure-through-interdisciplinary-research-vanderbilts-iris-initiative/

Smart Electric Grid Use Cases

• Spatiotemporal insight
• Situational / Context awareness
• Fast diagnosis & response
• Anomaly / Fraud / Loss detection
• Predictive maintenance
• Digital twins / Prescriptive action
• System performance optimization
• Resiliency
• Load balancing
• Predictive demand forecasting
• Real-time pricing
• New products
• Customer ‘nudge’
• Targeted offerings
• Smart contracts (Blockchain)
• Regulatory compliance

http://smartgridcenter.tamu.edu/sgc/web/wp-content/uploads/2016/03/grey-border-BigData_illustration_v5.jpg

https://www.slideshare.net/ImpetusInfo/realtime-streaming-analytics-business-value-use-cases-and-architectural-considerations-impetus-webinar

Emerging and Disruptive Digital Technologies in the Energy Industry:

This is what Digital Disruption looks like!

http://www.leadingpractice.com/industry-standards/energy/oil-gas/

The Data Science Revolution = Moving from data to insight to action!

Data Science enables the art of the possible: The easy button for real-time analytics!

Manage the Digital Disruption with a Data Science Strategy

Data needs a Transformer (like Electricity) to make it accessible to all.

Massive data collections unlock deeper insights into hard problems and complex systems

…Be careful what you wish for!!!!

Adding more data doesn’t necessarily help…

https://paulmead.com.au/blog/understand-perceptions/

Unless we can combine and integrate the different signals into a “single view” of the thing, there will continue to be many possible interpretations of what the source is!

Combining, connecting, and linking diverse data makes data “smart”!

Think of data not as information, but as facts that encode knowledge.

Environmental Analytics example

Transforming Data to Information to Knowledge to Understanding

4 Types of Discovery from Data Science: What is your data analytics use case?

1) Class Discovery: Finding the categories of objects (population segments), events, and behaviors in your data.
+ Learning the rules that constrain the class boundaries (that uniquely distinguish them).

2) Correlation (Predictive and Prescriptive Power) Discovery: Finding trends, patterns, dependencies in data, which reveal the governing principles or behavioral patterns (the object’s “DNA”).

3) Novelty (Surprise!) Discovery: Finding new, rare, one-in-a-[million / billion / trillion] objects, events, or behaviors.

4) Association (or Link) Discovery (Graph and Network Analytics): Finding the unexpected (unusual) co-occurring associations / links / connections among the entities in your domain.

(Graphic by S. G. Djorgovski, Caltech)

5 Levels of Analytics Maturity in Data-Driven Applications

1) Descriptive Analytics
– Hindsight (What happened?)

2) Diagnostic Analytics
– Oversight (real-time / What is happening? Why did it happen?)

3) Predictive Analytics
– Foresight (What will happen?)

4) Prescriptive Analytics
– Insight (How can we optimize what happens?) (Follow the dots / connections in the graph!)

5) Cognitive Analytics
– Right Sight (the 360 view, what is the right question to ask for this set of data in this context = Game of Jeopardy)
– Finds the right insight, the right action, the right decision,… right now!
– Moves beyond simply providing answers, to generating new questions and hypotheses.

3 Examples of Analytics

1) Descriptive

2) Predictive

3) Cognitive

All of the features in the data histogram convey valuable (actionable) information (the long tail, outliers, multi-modal peaks, …)

[Figure: data histogram; horizontal axis from -8 to 8, vertical axis (counts) from 0 to 14000]

Mixture Models = Statistical Clustering

Each of these data histograms can be represented by the mixture (i.e., sum) of several Gaussian normal distributions, such as the 3 Gaussian distributions shown in the lower right.

Each Gaussian statistically represents (characterizes) one “cluster” of data values within the full set of data values.

Comprehensive web resource for Mixture Models for clustering and unsupervised learning in Data Mining: http://www.csse.monash.edu.au/~dld/mixture.modelling.page.html
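
As a rough illustration of the mixture idea, the sketch below fits a sum of three Gaussians to a one-dimensional sample with a plain expectation-maximization (EM) loop. Everything in it is an assumption made for the example: the data are synthetic, the function names (`normal_pdf`, `fit_gmm_1d`) are hypothetical, and the quantile-based starting guess is only one of several reasonable choices. It is not the method behind the figures on these slides.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(data, k, n_iter=200):
    """Fit a k-component 1-D Gaussian mixture with plain EM.

    Returns (weight, mean, variance) tuples sorted by mean.
    """
    n = len(data)
    ordered = sorted(data)
    grand_mean = sum(data) / n
    grand_var = sum((x - grand_mean) ** 2 for x in data) / n
    # Starting guess (an assumption of this sketch): means at evenly spaced
    # quantiles, equal weights, and the overall variance for every component.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    variances = [grand_var] * k
    weights = [1.0 / k] * k

    for _ in range(n_iter):
        # E-step: how strongly each component "claims" each data value.
        resp = []
        for x in data:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p)
            resp.append([pj / total for pj in p] if total > 0 else [1.0 / k] * k)
        # M-step: re-estimate amplitude, mean, and variance of each component.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # floor keeps a component from collapsing
    return sorted(zip(weights, means, variances), key=lambda c: c[1])


# Synthetic sample drawn from three known Gaussians (made-up values).
rng = random.Random(42)
data = (
    [rng.gauss(-4.0, 0.7) for _ in range(300)]
    + [rng.gauss(0.0, 1.0) for _ in range(500)]
    + [rng.gauss(3.0, 0.5) for _ in range(200)]
)

components = fit_gmm_1d(data, k=3)
for weight, mean, var in components:
    print(f"amplitude={weight:.2f}  mean={mean:+.2f}  std={math.sqrt(var):.2f}")
```

Each fitted tuple is one "cluster": the weight plays the role of the amplitude, and the mean and variance locate and size that component. In practice the number of components is itself unknown and has to be chosen, for example by comparing fits with different `k`.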

Statistical Clustering tags (characterizes) the data, enabling discovery: making the data “smart”!

Each Gaussian in the mixture can be characterized by various parameters, such as the mean, variance (standard deviation), and amplitude (i.e., the strength of that particular Gaussian component within the mixture).

These parameters can be plotted as a function of some independent (treatment) variable, to discover trends and correlations in the effects across the different segments of the population.

https://www.researchgate.net/publication/6200224_Conformational_entropy_in_molecular_recognition_by_proteins

Association Discovery Example #1

Classic Textbook Example of Data Mining (Legend?): Data mining of grocery store logs indicated that men who buy diapers also tend to buy beer at the same time.
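
A minimal sketch of how such a co-occurrence can be scored, using the three standard association-rule measures: support, confidence, and lift. The baskets below are invented for illustration (they are not grocery store logs), and the helper names `support` and `rule_metrics` are hypothetical.

```python
from itertools import permutations

# Hypothetical market baskets, made up for this example only.
baskets = [
    {"diapers", "beer", "chips"},
    {"diapers", "beer"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"milk", "bread"},
    {"diapers", "beer", "milk"},
    {"bread", "chips"},
    {"milk", "bread", "diapers"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    return sum(1 for b in baskets if itemset <= b) / len(baskets)


def rule_metrics(antecedent, consequent, baskets):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    s_a = support(antecedent, baskets)
    s_c = support(consequent, baskets)
    s_both = support(antecedent | consequent, baskets)
    confidence = s_both / s_a if s_a else 0.0
    lift = confidence / s_c if s_c else 0.0
    return s_both, confidence, lift


diapers_beer = rule_metrics({"diapers"}, {"beer"}, baskets)
print("diapers -> beer: support=%.3f confidence=%.3f lift=%.3f" % diapers_beer)

# Score every single-item rule and list the strongest ones first.
items = sorted(set().union(*baskets))
rules = []
for a, c in permutations(items, 2):
    s, conf, lift = rule_metrics({a}, {c}, baskets)
    if s >= 0.25:  # ignore pairs that rarely occur together
        rules.append((lift, conf, s, a, c))
rules.sort(reverse=True)
for lift, conf, s, a, c in rules:
    print(f"{a} -> {c}: support={s:.3f} confidence={conf:.3f} lift={lift:.3f}")
```

A lift above 1 means the two items appear together more often than their individual frequencies would suggest; a lift near 1 means the "association" is just two popular items.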

Association Discovery Example #2

Wal-Mart studied product sales in their Florida stores in 2004 when several hurricanes passed through Florida. Wal-Mart found that, before the hurricanes arrived, people purchased 7 times as many strawberry pop tarts compared to everything else.

Strawberry pop tarts???

http://www.nytimes.com/2004/11/14/business/yourmoney/14wal.html
http://www.hurricaneville.com/pop_tarts.html
http://bit.ly/1gHZddA

Association Rule Discovery for Hurricane Intensification Forecasting

• Research by GMU geoscientists

• Predict the final strength of hurricane at landfall.

• Find co-occurrence of final hurricane strength with specific values of measured physical properties of the hurricane while it is still over the ocean.

• Result: the association rule discovery prediction is better than National Hurricane Center prediction!

• Research Paper by GMU scientists: https://ams.confex.com/ams/pdfpapers/84949.pdf

“You can see a lot by just looking”

(and you can see around corners!)

Cognitive, Contextual, Insightful, Forecastful

https://www.speedcafe.com/2017/07/12/f1-demo-take-place-london-streets/

Final Thoughts

Data Science Strategies for Real-time Analytics

1) Design Patterns for Streaming Data Analytics:
• Detecting POI (Pattern, Product, Process, Person, or any Point Of Interest)
• Detecting BOI (Behavior Of Interest from any “dynamic actor”)
• Precomputed scenarios and their responses (to speed up “best action”)
• Design Thinking: UX, CX, EX (User / Customer / Employee eXperience)

2) Edge Analytics (move the algorithms to the sensor: intelligence at the point of data collection)
• Locality in Time

3) Near-field Analytics (what else is local to my asset?)
• Locality in Geospace

4) Related-entity Analytics (what else is similar to this event / entity?)
• Locality in Feature Space

5) Agile Analytics
• DataOps
• Culture of Experimentation
• Fail-fast / Learn-fast
• Build and deploy Learning Systems / Resilient Systems
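
One way to picture edge analytics and POI detection on a stream is a detector small enough to run at the sensor: it keeps only a running mean and variance (Welford's online update) and flags any reading that sits many standard deviations away. This is a sketch under stated assumptions, not a production design: the class name `StreamingZScoreDetector`, the threshold, the warm-up length, and the readings are all hypothetical.

```python
import math


class StreamingZScoreDetector:
    """Flags readings far from the running mean, using Welford's online update."""

    def __init__(self, threshold=4.0, warmup=30):
        self.threshold = threshold  # z-score above which a reading is flagged
        self.warmup = warmup        # readings to absorb before flagging starts
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0               # running sum of squared deviations

    def update(self, x):
        """Score x against the statistics seen so far; return True if flagged."""
        flagged = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))
            if std > 0 and abs(x - self.mean) / std > self.threshold:
                flagged = True
        if not flagged:
            # Only unflagged readings update the baseline, so a spike
            # does not inflate the variance used to judge later readings.
            self.n += 1
            delta = x - self.mean
            self.mean += delta / self.n
            self.m2 += delta * (x - self.mean)
        return flagged


# Made-up sensor stream: a slow oscillation with one injected spike.
readings = [60.0 + 0.5 * math.sin(i / 5.0) for i in range(200)]
readings[150] = 75.0

detector = StreamingZScoreDetector()
flagged = [i for i, value in enumerate(readings) if detector.update(value)]
print("points of interest at indices:", flagged)
```

The detector uses constant memory and one pass over the data, which is what makes it plausible at the point of data collection. A design caveat: because flagged readings never update the baseline, a genuine lasting shift in level would keep being flagged until the detector is reset.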

Big Data + the IoT + Citizen Data Scientists = Partners in Sustainability

Big Data: In the Big Data era, Everything is Quantified and Monitored:
– Populations & Persons
– Smart Cities, Energy, Grids, Farms, Highways
– Environmental Sensors
– IoE = Internet of Everything!

The Internet of Things (IoT): Knowing the knowable via deep, wide, and fast data from ubiquitous sensors!

Discovery through Machine Learning and Data Science:
– Class Discovery, Correlation Discovery, Novelty Discovery, and
– Association Discovery: Find interesting cases where condition X is associated with event Y with time shift Z.

Sustainable Development Goals: 17 SDGs are KPIs for the World!
(currently, the SDGs have 229 Key Performance Indicators)
(SDG: Sustainable Development Goal)

http://www.unglobalpulse.org
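
The "condition X is associated with event Y with time shift Z" pattern can be sketched by scanning candidate time shifts and scoring each with lift. The series below are synthetic and deterministic (Y is constructed to follow X by 3 steps, plus unrelated events), and the function names `lagged_lift` and `best_time_shift` are hypothetical; real sensor data would need noise handling and a significance check.

```python
def lagged_lift(x, y, lag):
    """Lift of the rule 'X at time t -> Y at time t + lag'.

    x and y are equal-length sequences of 0/1 flags, one per time step.
    Lift > 1 means Y follows X at this lag more often than Y's base rate.
    """
    n = len(x) - lag
    if n <= 0:
        raise ValueError("lag must be shorter than the series")
    x_count = sum(x[:n])
    y_count = sum(y[lag:])
    if x_count == 0 or y_count == 0:
        return 0.0
    both = sum(1 for t in range(n) if x[t] and y[t + lag])
    confidence = both / x_count   # P(Y at t + lag | X at t)
    base_rate = y_count / n       # P(Y) over the same window
    return confidence / base_rate


def best_time_shift(x, y, max_lag):
    """Return (lag, lift) for the time shift with the highest lift."""
    scores = {lag: lagged_lift(x, y, lag) for lag in range(max_lag + 1)}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Synthetic flags: X fires every 17 steps; Y fires 3 steps after each X,
# plus some events unrelated to X (every 23 steps, offset 5).
steps = 200
x = [1 if t % 17 == 0 else 0 for t in range(steps)]
y = [1 if (t >= 3 and x[t - 3]) or t % 23 == 5 else 0 for t in range(steps)]

best_lag, best_score = best_time_shift(x, y, max_lag=10)
print(f"strongest association at time shift Z={best_lag} (lift={best_score:.2f})")
```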