@ODSC OPEN DATA SCIENCE CONFERENCE Boston | May 1 - 4 2018

OPEN DATA @ODSC SCIENCE CONFERENCE · Some Quick Definitions •Machine Learning (ML) = mathematical algorithms that learn from experience (= pattern recognition from previous evidence)

Page 1:

@ODSC OPEN DATA SCIENCE CONFERENCE

Boston | May 1 - 4 2018

Page 2:

A Tour of Machine Learning Algorithms: The Usual Suspects in Some Unusual Applications

Kirk Borne (@KirkDBorne)

Principal Data Scientist

Booz Allen Hamilton

http://www.boozallen.com/datascience

ODSC Conference – Boston – May 2018

Page 3:

OUTLINE

• Goals of Machine Learning Applications

• Some Machine Learning Algorithms

• Some Atypical Applications

• Final Thoughts

Page 5:

The Goal of Machine Learning

“…is to use algorithms to learn from data, in order to build generalizable models that give accurate classifications or predictions, or to find patterns, particularly with new and previously unseen data.”

(the key is GENERALIZATION!)

https://www.innoarchitech.com/machine-learning-an-in-depth-non-technical-guide/

Page 6:

How does a Data Scientist build a model of a complex dynamic system?

Page 7:

We might start by modeling a complex system like this…

Page 8:

We can add more features to model the system with higher fidelity …

Page 9:

Overfitting = bad machine learning!

Generalization is key! (The Goldilocks model)

http://www.holehouse.org/mlclass/10_Advice_for_applying_machine_learning.html

Page 10:

Overfitting = bad machine learning!

g(x) is a poor fit (a single straight line through all the points) = Underfitting (bias)

h(x) is a good fit (takes into account the natural variance in the data)

d(x) is a very poor fit (passes through every point) = Overfitting (variance: it fits the natural variance in the data)
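The three fits on this slide can be imitated with polynomials of increasing degree. The sketch below is only an illustration under stated assumptions: the sine-shaped curve, the noise level, the sample sizes, and the degrees 1, 3, and 11 are all hypothetical choices, not taken from the slides. It compares the error on the training points with the error on previously unseen points.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_curve(x):
    # Hypothetical underlying pattern (an assumption for this sketch).
    return np.sin(2 * np.pi * x)

# Training data: the pattern plus natural variance (noise).
x_train = np.linspace(0.0, 1.0, 12)
y_train = true_curve(x_train) + rng.normal(0.0, 0.2, x_train.size)

# Previously unseen data drawn from the same process.
x_new = np.linspace(0.02, 0.98, 200)
y_new = true_curve(x_new) + rng.normal(0.0, 0.2, x_new.size)

def fit_and_score(degree):
    """Fit a polynomial; return (training MSE, unseen-data MSE)."""
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    new_mse = np.mean((np.polyval(coeffs, x_new) - y_new) ** 2)
    return train_mse, new_mse

# degree 1 ~ g(x) (underfit), degree 3 ~ h(x), degree 11 ~ d(x) (hits every point)
results = {degree: fit_and_score(degree) for degree in (1, 3, 11)}
for degree, (train_mse, new_mse) in results.items():
    print(f"degree {degree:2d}: training MSE = {train_mse:.4f}, "
          f"unseen-data MSE = {new_mse:.4f}")
```

The training error can only shrink as the degree grows, so it cannot by itself reveal overfitting; the comparison on unseen data is what separates the three fits.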

Page 11:

Bias–Variance Tradeoff in Model Complexity

http://scott.fortmann-roe.com/docs/BiasVariance.html

Generalization Error

precisely right vs. generally right

precisely wrong vs. generally wrong
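For squared-error loss, the tradeoff named on this slide is usually written as the standard decomposition below. The notation is an assumption introduced here, not taken from the slides: the data follow y = f(x) + ε with zero-mean noise of variance σ², and f̂ is the model fitted on a random training set.

```latex
\mathbb{E}\!\left[\left(y - \hat{f}(x)\right)^{2}\right]
  = \underbrace{\left(\mathbb{E}[\hat{f}(x)] - f(x)\right)^{2}}_{\text{bias}^{2}}
  + \underbrace{\mathbb{E}\!\left[\left(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\right)^{2}\right]}_{\text{variance}}
  + \underbrace{\sigma^{2}}_{\text{irreducible noise}}
```

Simple models tend to have a large bias term, while very flexible models tend to have a large variance term.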

Page 12:

Schematic Approach to Avoiding Overfitting

[Figure: Error vs. Training Epoch, showing the Training Set error and Validation Set error curves, with the marker "STOP Training HERE!"]

To avoid overfitting, you need to know when to stop training the model. Although the Training Set error may continue to decrease, you may simply be overfitting the Training Data. Test this by applying the model to a Validation Data Set (not part of the Training Set). If the Validation Data Set error starts to increase, then you know that you are overfitting the Training Set and it is time to stop!

Generalization is key!
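The stopping rule on this slide can be sketched as a small training loop. This is a minimal illustration, not a reference implementation: the `patience` parameter, the function names, and the synthetic validation curve (with its minimum placed at epoch 20) are all assumptions made for the example.

```python
def train_with_early_stopping(train_step, validation_error,
                              max_epochs=1000, patience=5):
    """Run train_step() once per epoch and stop once the validation error
    has not improved for `patience` consecutive epochs.
    Returns (best_epoch, best_validation_error)."""
    best_error = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch in range(1, max_epochs + 1):
        train_step()
        error = validation_error()
        if error < best_error:
            best_error, best_epoch = error, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # validation error keeps rising: stop training here
    return best_epoch, best_error

# Hypothetical stand-ins for a real model: the "training step" only counts
# epochs, and the validation error follows a made-up U-shaped curve.
state = {"epoch": 0}

def fake_train_step():
    state["epoch"] += 1

def fake_validation_error():
    return 0.1 + (state["epoch"] - 20) ** 2 / 400.0

best_epoch, best_error = train_with_early_stopping(
    fake_train_step, fake_validation_error)
print(best_epoch, best_error, state["epoch"])
```

In practice one would also keep a copy of the model parameters from the best epoch, since training runs a few epochs past the minimum before the rule triggers.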

Page 13:

Some Quick Definitions

• Machine Learning (ML) = mathematical algorithms that learn from experience (= pattern recognition from previous evidence).

• Data Mining = application of ML algorithms to data.

• Artificial Intelligence (AI) = application of ML algorithms to robotics and machines = taking actions based on data ( #bots ).

• Data Science = application of scientific method to discovery from data (including statistics, ML, and more: visual analytics, machine vision, computational modeling, semantics, graphs, network analysis, NLU, data indexing schemes [Google!], …).

• Analytics = the products of machine learning & data science.

“To back-propagate is A.I., but to know is Intelligence!”

Page 14:

It’s all about taking “Data to Action”

Example: a Decision Tree …

https://eight2late.wordpress.com/2016/02/16/a-gentle-introduction-to-decision-trees-using-r/

Page 15:

An “Easy Button” for Extracting Value from Data through Machine Learning

Exploiting the Data Value Chain: transform Digital Data to Information to Knowledge to Insights (and Action)

✓ From Sensors (Measurement & Data Collection) …

… Big Data (Deep, Fast, Wide)

✓ to Sentinels (Monitoring & Alerts = Information) …

… Machine Learning

✓ to Sense-making (Knowledge & Insight Discovery) …

… Data Science

✓ to Cents-making (Your Applications of Data = Action!)

… Analytics

… Productizing / Monetizing your Big Data

Page 16:

Typical Machine Learning Applications in our lives:

PREDICT: Your Purchase Preferences, Recommender Systems, Credit Scoring, Smart Phone auto-complete …

OPTIMIZE: Your Thermostat, Your Commute Time and Routing, Personalized Learning …

DISCOVER: Your Health Issues (wearables), Your Best Deal (Bed & Breakfast or Restaurant) …

DETECT: Your Social Sentiment, Identity Theft, Credit Card Fraud …

Page 17:

Typical Machine Learning Applications in the Enterprise:

PREDICT: outcomes, events, prices, costs, risks, product demand …

OPTIMIZE: processes, products, and people (delivery of services, supplies, personnel) …

DISCOVER: insights in social media, documents, quarterly business reports, customer call records …

DETECT: fraud, anomalies in safety events, behaviors, outbreaks, data usage (GDPR), systems (cybersecurity breaches) …

Page 18:

4 Types of Machine Learning Discovery from Data:

(Graphic by S. G. Djorgovski, Caltech)

1) Class Discovery: Find the categories of objects (population segments), events, and behaviors in your data. + Learn the rules that constrain the class boundaries (that uniquely distinguish them).

2) Correlation (Predictive and Prescriptive Power) Discovery: (insights discovery) – Find trends, patterns, dependencies in data that reveal the governing principles or behavioral patterns (the object’s “DNA”).

3) Outlier / Anomaly / Novelty / Surprise Discovery: Find the new, surprising, unexpected one-in-a-[million / billion / trillion] object, event, or behavior.

4) Association (or Link) Discovery: (Graph and Network Analytics) – Find both the typical (usual) and the atypical (unusual, interesting) data associations / links / connections in your domain.

Page 20:

5 Levels of Analytics Maturity in Data-Driven Applications

1) Descriptive Analytics

– Hindsight (What happened?)

2) Diagnostic Analytics

– Oversight (real-time / What is happening? Why did it happen?)

3) Predictive Analytics

– Foresight (What will happen?)

4) Prescriptive Analytics

– Insight (How can we optimize what happens?) (Follow the dots / connections in the graph!)

5) Cognitive Analytics

– Right Sight (the 360 view; what is the right question to ask for this set of data in this context? = Game of Jeopardy)

– Finds the right insight, the right action, the right decision, … right now!

– Moves beyond simply providing answers, to generating new questions and hypotheses.

Page 23:

Predictive vs Prescriptive: What’s the Difference?

PREDICTIVE Analytics

Find a function (i.e., the model) f(d,t) that predicts the value of some predictive variable y = f(d,t) at a future time t, given the set of conditions found in the training data {d}.

=> Given {d}, find y.

PRESCRIPTIVE Analytics

Find the conditions {d’} that will produce a prescribed (desired, optimum) value y at a future time t, using the previously learned conditional dependencies among the variables in the predictive function f(d,t).

=> Given y, find {d’}.

Confucius says… “Study your past to know your future”

Baseball philosopher Yogi Berra says… “The future ain’t what it used to be.”
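The two directions ("given {d}, find y" and "given y, find {d’}") can be sketched with the simplest possible model. Everything below is an illustrative assumption: a single condition d, a linear f, no explicit time variable t, and made-up training values. Real prescriptive analytics typically needs an optimization or search over many conditions rather than a closed-form inversion.

```python
import numpy as np

# Hypothetical training data {d} with observed outcomes y (here y = 2d + 1).
d_train = np.array([1.0, 2.0, 3.0, 4.0])
y_train = np.array([3.0, 5.0, 7.0, 9.0])

# Learn the predictive function f(d) = slope * d + intercept.
slope, intercept = np.polyfit(d_train, y_train, 1)

def predict(d):
    """Predictive: given a condition d, find the outcome y."""
    return slope * d + intercept

def prescribe(y_target):
    """Prescriptive: given a desired outcome y, find the condition d'
    that the learned model says will produce it."""
    return (y_target - intercept) / slope

print(predict(5.0))     # outcome expected under condition d = 5
print(prescribe(15.0))  # condition needed to reach the outcome y = 15
```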

Page 24:

Source for graphic: https://data-flair.training/blogs/machine-learning-applications/

Predictive Analytics is currently the most significant application of Machine Learning (*)

(*) The set of mathematical algorithms that learn (patterns) from experience (data)

Page 25:

Source for graphic: https://www.altexsoft.com/blog/datascience/machine-learning-strategy-7-steps/

Predictive Analytics is everywhere in Business Data and Machine Learning (AI) Strategy Discussions

Page 26:

Traditional Time Series Forecasting: Prediction based on historical patterns

Source: https://medium.com/99xtechnology/time-series-forecasting-in-machine-learning-3972f7a7a467

Page 27:

Traditional Time Series Forecasting: Autoregressive (uncertainty in prediction can be large)

Source: https://peltiertech.com/excel-fan-chart-showing-uncertainty-in-projections/

Uncertainty!

Page 28:

Traditional Time Series Forecasting: Autoregressive (assumes future time series values depend on the past values from the same series)

Source: http://ucanalytics.com/blogs/step-by-step-graphic-guide-to-forecasting-through-arima-modeling-in-r-manufacturing-case-study-example/
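The autoregressive assumption can be made concrete with a small least-squares fit. This is a sketch under assumptions, not the ARIMA procedure from the linked article: it fits a plain AR(p) model x_t = a_1·x_(t-1) + … + a_p·x_(t-p) with no intercept, differencing, or moving-average terms, and the example series is generated from made-up coefficients so that the fit can be checked.

```python
import numpy as np

def fit_ar(series, p):
    """Least-squares estimate of AR(p) coefficients (most recent lag first)."""
    series = np.asarray(series, dtype=float)
    # Each row holds the p values preceding one target value, newest first.
    lagged = np.array([series[t - p:t][::-1] for t in range(p, len(series))])
    targets = series[p:]
    coeffs, *_ = np.linalg.lstsq(lagged, targets, rcond=None)
    return coeffs

def forecast_next(series, coeffs):
    """One-step-ahead forecast from the last len(coeffs) values."""
    p = len(coeffs)
    recent = np.asarray(series[-p:], dtype=float)[::-1]
    return float(recent @ coeffs)

# Hypothetical noise-free series obeying x_t = 0.6*x_(t-1) + 0.3*x_(t-2).
series = [1.0, 2.0]
for _ in range(28):
    series.append(0.6 * series[-1] + 0.3 * series[-2])

coeffs = fit_ar(series, p=2)
next_value = forecast_next(series, coeffs)
print(coeffs, next_value)
```

With real, noisy data the recovered coefficients are only estimates, and the forecast uncertainty grows with each step ahead, as the fan chart on the previous slide illustrates.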

Page 29:

Traditional Time Series Forecasting: Even with very high-fidelity physics-based models, uncertainty in prediction can be large!

Source: https://www.reddit.com/r/weather/comments/6xecax/tracking_hurricane_irma/

Page 30:

Advances in Predictive, Prescriptive, and Cognitive Analytics provide us with More Ways to See Around Corners

Source for image: https://www.hausmanmarketingletter.com/translating-analytics-to-action/

Page 31:

“You can see a lot by just looking”

(and you can see around corners!)

Cognitive, Contextual, Insightful, Forecastful

https://www.speedcafe.com/2017/07/12/f1-demo-take-place-london-streets/

Page 32:

The best forecasting methodology is not autoregressive and univariate y = f(t) …

Rather, it is context-based (using rich contextual data)

The Internet of Things (IoT) will provide rich contextual metadata (“other data about your data”) for seeing around corners and for better forecasting!

IoT will power The Internet of Context!

Source for graphic: https://www.iotcentral.io/blog/direct-integration-between-the-physical-world-computer-system

Page 34:

Machine Learning Algorithms

1) Principal Component Analysis (PCA)

2) Regression Analysis

3) Graph Mining (Network Analysis)

4) Clustering and Validation

5) Association Mining (Link Analysis)

6) Statistical Clustering

7) Bayesian Belief Networks

8) My Data Science Career ‘Aha!’ moment

Page 36:

Machine Learning Algorithms

0) Counting

Page 37:

0) Counting!

Remember…

“All models should be as simple as possible, but no simpler!” – A. Einstein

Page 38:

Machine Learning Algorithms

1) Principal Component Analysis (PCA)

Page 41:

PCA vs ICA

Initial impression is that the data are extended in only one direction (one principal component).

But, there are 2 independent correlations here … hence there are 2 signal sources!

Cocktail Party Problem: example of Class Discovery using ICA (Independent Component Analysis: Blind Source Separation)
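The "initial impression" described here can be reproduced with a few lines of PCA. The sketch below is only an illustration: the two signal directions, the noise level, and the sample sizes are hypothetical, and it covers just the PCA half of the comparison (computed with a singular value decomposition), not ICA itself.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two hypothetical, non-orthogonal signal directions in a 2-D data space.
direction_a = np.array([1.0, 1.0]) / np.sqrt(2.0)
direction_b = np.array([1.0, 0.2]) / np.sqrt(1.04)

# Each point lies along one of the two directions, plus a little noise.
points_a = rng.normal(0.0, 3.0, (200, 1)) * direction_a
points_b = rng.normal(0.0, 3.0, (200, 1)) * direction_b
data = np.vstack([points_a, points_b]) + rng.normal(0.0, 0.05, (400, 2))

def pca(data):
    """Return (principal axes as rows, fraction of variance per axis)."""
    centered = data - data.mean(axis=0)
    _, singular_values, components = np.linalg.svd(centered, full_matrices=False)
    variance = singular_values ** 2
    return components, variance / variance.sum()

components, explained = pca(data)
print("explained variance ratio:", explained)
print("principal axes:", components)
```

The first component carries most of the variance, which matches the one-direction first impression. The principal axes are orthogonal by construction, so they cannot both line up with two non-orthogonal sources; that limitation is the motivation for ICA on this slide.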

Page 42:

Machine Learning Algorithms

2) Regression Analysis

Page 43:

Trend Lines in big data sets: Descriptive Analytics! It is tempting to over-fit every wiggle in the data.

[Figure: Boiling Point vs. Melting Point for the 92 Naturally Occurring Elements]

Page 44:

This is a better fit to the trend line… (generalization!) for use in Predictive Analytics!

[Figure: Boiling Point vs. Melting Point for the 92 Naturally Occurring Elements]

Page 45:

Trend Line and Outliers: where is the real discovery?

Sometimes we are tempted to think that outliers are just noise or natural variance.

[Figure: Boiling Points and Melting Points of the 92 Chemical Elements (Boiling Point vs. Melting Point)]

Page 48:

Trend Line and Outliers: Add some context to the data!

…that diagonal line in the plot (where melting point = boiling point) … this provides some context (related to your prior knowledge)!

[Figure: Boiling Points and Melting Points of the 92 Chemical Elements (Boiling Point vs. Melting Point)]

Page 50:

Trend Line and Outliers: What is that point below the line?

…that diagonal line in the plot (where melting point = boiling point) … this provides some context (related to your prior knowledge)!

[Figure: Boiling Points and Melting Points of the 92 Chemical Elements (Boiling Point vs. Melting Point)]

Page 52:

Trend Line and Outliers: there’s the real discovery!

Arsenic: Melts @ 1089 K, Boils @ 889 K

[Figure: Boiling Points and Melting Points of the 92 Chemical Elements (Boiling Point vs. Melting Point), with the Arsenic point labeled]

Novelty Discovery!
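The sequence of slides above (trend line, then context, then the outlier) can be sketched in code. The element values below are rounded, approximate figures assumed for illustration only; the arsenic pair is the one quoted on the slide. The "context" is the prior knowledge that a boiling point normally lies above the melting point.

```python
import numpy as np

# (melting point in K, boiling point in K). Approximate, rounded values
# assumed for this sketch; only the arsenic pair comes from the slide.
elements = {
    "Mercury": (234.0, 630.0),
    "Lead": (601.0, 2022.0),
    "Aluminium": (933.0, 2743.0),
    "Copper": (1358.0, 2835.0),
    "Iron": (1811.0, 3134.0),
    "Tungsten": (3695.0, 5828.0),
    "Arsenic": (1089.0, 889.0),
}

names = list(elements)
melting = np.array([elements[name][0] for name in names])
boiling = np.array([elements[name][1] for name in names])

# Trend line: ordinary least-squares fit of boiling point on melting point.
slope, intercept = np.polyfit(melting, boiling, 1)
residuals = boiling - (slope * melting + intercept)
for name, residual in zip(names, residuals):
    print(f"{name:10s} residual from trend line: {residual:8.1f} K")

# Context line: melting point = boiling point. Anything below it is surprising.
flagged = [name for name in names if elements[name][1] < elements[name][0]]
print("below the melting = boiling line:", flagged)
```

A residual from the trend line measures only distance from the fit; the contextual rule is what singles out the physically surprising point rather than just a noisy one.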

Page 53:

Machine Learning Algorithms

3) Graph Mining (Network Analysis)

Page 54:

“All the World is a Graph” – Shakespeare?

The natural data structure of the world is not rows and columns, but a Graph!

(Graphic by Cray, for Cray Graph Engine CGE) http://www.cray.com/products/analytics/cray-graph-engine

The Human Connectome Project: mapping and linking the major pathways in the brain. http://www.humanconnectomeproject.org

Page 55:

Producing A Smarter Data Narrative

• It is best when we understand our data’s context and meaning…

• … the Semantics! This is based on Ontologies.

• My students memorized the definition of an Ontology…

– “is_a formal, explicit specification of a shared conceptualization.” from Tom Gruber (Stanford)

• Semantic “facts” can be expressed in a database as RDF triples:

{subject, predicate, object} = {noun, verb, noun}
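The {subject, predicate, object} idea can be sketched with plain tuples. This is a toy illustration, not real RDF (which uses IRIs and a triple store with a query language); the predicate names and the `match` helper are hypothetical, and the arsenic values are the ones quoted earlier in the deck.

```python
# Each semantic "fact" is one (subject, predicate, object) triple.
triples = [
    ("Arsenic", "is_a", "Chemical Element"),
    ("Arsenic", "melts_at_K", 1089),
    ("Arsenic", "boils_at_K", 889),
    ("Ontology", "is_a", "Specification"),
]

def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that agree with every field that is not None."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]

# "What kinds of things do we know about?"
print(match(triples, predicate="is_a"))
# "At what temperature does arsenic boil?"
print(match(triples, subject="Arsenic", predicate="boils_at_K"))
```

Because every fact has the same three-part shape, a set of triples is already a graph: subjects and objects are nodes, and predicates are labeled edges.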

Page 57:

Simple Example of the Power of Graph: Semi-Metric Space

[Figure: graph with nodes {1}, {2}, {3}]

• Entity {1} is linked to Entity {2} (small distance A)

• Entity {2} is linked to Entity {3} (small distance B)

• Entity {1} is *not* linked directly to Entity {3} (Similarity Distance C = infinite)

• Similarity Distances between A, B, and C violate the triangle inequality!

• The connection between black hat entities {1} and {3} never appears explicitly within a transactional database.

• Examples: (a) Medical Research Discoveries across disconnected journals, through linked semantic assertions; (b) Customer Journey modeling; (c) Safety Incident Causal Factor Analysis; (d) Marketing Attribution Analysis; (e) Fraud networks, Illegal goods trafficking networks, Money-Laundering networks.
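The three-entity example can be checked directly. In this sketch the distances A = 1.0 and B = 1.5 are made-up values, and the shortest-path search (Dijkstra's algorithm) is one common way to surface the indirect link, not necessarily the method used in the research described here.

```python
import heapq
import math

# Direct similarity distances recorded in the data (hypothetical values).
A, B = 1.0, 1.5
direct = {(1, 2): A, (2, 3): B}   # there is no direct record linking 1 and 3

def direct_distance(u, v):
    """Recorded distance between two entities; infinite if never linked."""
    return direct.get((u, v), direct.get((v, u), math.inf))

def shortest_path_distance(source, target, nodes=(1, 2, 3)):
    """Dijkstra's algorithm over the recorded links."""
    best = {node: math.inf for node in nodes}
    best[source] = 0.0
    queue = [(0.0, source)]
    while queue:
        dist, node = heapq.heappop(queue)
        if dist > best[node]:
            continue  # stale queue entry
        for other in nodes:
            step = direct_distance(node, other)
            if other != node and dist + step < best[other]:
                best[other] = dist + step
                heapq.heappush(queue, (best[other], other))
    return best[target]

C = direct_distance(1, 3)
indirect = shortest_path_distance(1, 3)
print("direct C =", C, "| indirect via {2} =", indirect)
print("triangle inequality violated:", C > A + B)
```

A large gap between the direct distance and the path distance is the signal that a hidden connection between {1} and {3} runs through {2}.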

Page 58:

Research Example: Literature-Based Discovery (LBD)

References:

• https://www.sciencedirect.com/science/article/pii/S0950705116303860

• https://summerofhpc.prace-ri.eu/introducing-lbdream-and-literature-based-discovery/

analytics.gmu.edu – CDDA Spring 2014 Workshop

Page 59:

Research Example: Discovery in the NIH-NLM Semantic MEDLINE Database

Project Description: Conduct semantic graph mining of the NIH-NLM metadata repository from ~26 million medical research articles.

Graph Database: ~90 million RDF triples (predications; semantic assertions).

Research Project: (PhD dissertation at GMU) Novel subgraph discovery; Context-based discovery; New concept emergence in medical research; Story discovery in linked graph network; and Hidden knowledge discovery through semi-metrics.

https://skr3.nlm.nih.gov/SemMedDB/

Page 60:

Machine Learning Algorithms

4) Clustering and Validation

Page 61: OPEN DATA @ODSC SCIENCE CONFERENCE · Some Quick Definitions •Machine Learning (ML) = mathematical algorithms that learn from experience (= pattern recognition from previous evidence)

Clustering = the process of partitioning a set of data into subsets

(segments or clusters) such that a data element belonging to any

chosen cluster is more similar to data elements belonging to

that cluster than to data elements belonging to other clusters.

= Group together similar items + separate the dissimilar items

= Identify similar characteristics, patterns, or behaviors

among subsets of the data elements.

Challenge #1) No prior knowledge of the number of clusters.

#2) No prior knowledge of semantic meaning of the clusters.

#3) Different clusters are possible from the same data set!

#4) Different clusters are possible using different similarity metrics.

Page 62

Types of Clustering

In general terms, there are two approaches to clustering:

Partitional – One set of clusters is created (e.g., K-Means clustering – choose K, the number

of clusters)

Hierarchical – Nested sets of clusters are created sequentially.

Page 63

Example of Hierarchical Clustering

Starting with (a), then going to (e): Bottom-up, Agglomerative Clustering

Starting with (e), then going to (a): Top-down, Divisive Clustering

The “Google Maps” view for your Data Space

Page 64

Hierarchical Clustering Methods

Clusters are created at multiple levels, creating a new set of clusters at each level.

There are 2 types of hierarchical clustering:

Agglomerative Clustering (Bottom-Up)

Initially, each item is in its own cluster.

Then, clusters are merged together iteratively, based upon similarity of data items.

Divisive Clustering (Top-Down)

Initially, all items are in one cluster.

Then, large clusters are successively divided, based upon distance between data items.

Segmentation of One = ‘SegOne’ Marketing

Marketing Campaign Segments = Customer Personas
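The bottom-up procedure described on this slide can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function name `agglomerative`, the `target_clusters` stopping rule, the 1-D toy points, and the choice of single linkage (distance between the two nearest members) are all hypothetical and are not taken from the talk.

```python
def agglomerative(points, target_clusters):
    """Bottom-up clustering of 1-D points with single linkage (illustrative)."""
    # Initially, each item is in its own cluster.
    clusters = [[p] for p in points]
    while len(clusters) > target_clusters:
        best = None  # (distance, i, j) for the closest pair of clusters so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the two nearest members.
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        # Merge the closest pair; j > i, so popping j keeps index i valid.
        clusters[i] = clusters[i] + clusters.pop(j)
    return [sorted(c) for c in clusters]


# Hypothetical toy data: three visually obvious groups.
points = [1.0, 1.2, 1.1, 5.0, 5.3, 9.8, 10.0]
result = sorted(agglomerative(points, 3))
print(result)
```

Recording the state of `clusters` after every merge, instead of stopping at `target_clusters`, would give the nested levels (the "zoom levels") of the hierarchy.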

Page 65

Cluster Evaluation

Page 66

Remember that Clustering has lots of issues!

The number of clusters is not known

There might not exist a “correct” number of clusters

Results depend on which attributes are selected

Results depend on the choice of distance/similarity metric

Therefore, there is no “correct” set of clusters.

So, how do you know what is a good set of clusters?

Page 67

How to know if your clusters are good enough

Reference: http://www.biomedcentral.com/content/supplementary/1471-2105-9-90-S2.pdf

You know the clusters are good …

… if the clusters are compact relative to their separation

… if the clusters are well separated from one another

… if the “within cluster” errors are small (low variance within)

… if the number of clusters is small relative to the number of data points

Various measures of cluster compactness exist, including the Dunn index, C-index, and the DBI (Davies-Bouldin Index)
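For reference, the standard textbook definition of the Davies-Bouldin Index is given below. The symbols are notation introduced here, not taken from the slides: K is the number of clusters, S_i is the average distance from the members of cluster i to its centroid, and M_ij is the distance between the centroids of clusters i and j.

```latex
\mathrm{DBI} = \frac{1}{K} \sum_{i=1}^{K} \max_{j \neq i} \frac{S_i + S_j}{M_{ij}}
```

Small S values (compact clusters) and large M values (well-separated clusters) both push the index down, so a lower DBI indicates a better set of clusters.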

Page 68

Application of Davies-Bouldin Index

Assume K (the number of clusters) and assume other things (choice of clustering algorithm; the

choice of clustering feature attributes; etc.)

Measure DBI

Test another set of values for the cluster input parameters (K, feature attributes, etc.)

Measure DBI

… continue iterating like this until you find the set of

cluster input parameters that yields the best (minimum) value for DBI.
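The iteration above can be sketched as a small parameter scan. The code below is a minimal, illustrative version in plain Python, assuming K-means as the clustering algorithm and Euclidean distance as the metric. The helper names `kmeans` and `davies_bouldin`, the deterministic initialization, and the toy 2-D points are all hypothetical choices made for this example.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain K-means; initial centers are evenly spaced picks from the data."""
    step = len(points) // k
    centers = [points[i * step] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centers[c]))
            groups[nearest].append(p)
        # New center = mean of the members (keep the old center if a group is empty).
        centers = [
            tuple(sum(coord) / len(g) for coord in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return centers, groups


def davies_bouldin(centers, groups):
    """Average, over clusters, of the worst (S_i + S_j) / M_ij ratio."""
    pairs = [(c, g) for c, g in zip(centers, groups) if g]
    scatter = [sum(math.dist(p, c) for p in g) / len(g) for c, g in pairs]
    total = 0.0
    for i, (ci, _) in enumerate(pairs):
        total += max(
            (scatter[i] + scatter[j]) / math.dist(ci, cj)
            for j, (cj, _) in enumerate(pairs)
            if j != i
        )
    return total / len(pairs)


# Hypothetical data: three small, well-separated square blobs.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 0), (10, 1), (11, 0), (11, 1),
          (5, 8), (5, 9), (6, 8), (6, 9)]

scores = {}
for k in (2, 3, 4):  # test several values of the input parameter K
    centers, groups = kmeans(points, k)
    scores[k] = davies_bouldin(centers, groups)

best_k = min(scores, key=scores.get)  # the best DBI is the minimum
print(scores, best_k)
```

The same loop could also vary the feature attributes or the clustering algorithm; only K is scanned here to keep the sketch short.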

Page 69

Scientific Discovery from

Cluster Analysis of data

parameters from events on

the Sun and around the Earth

Page 70

Cluster Analysis: Find the clusters, then Evaluate them

[Figure: “DBI for Dst_Vsw_Bz”. D-B Index (y-axis, 0.8 to 1.2) versus Time Shift (x-axis, 0 to 12), i.e., the delay (hr) of Dst from Vsw and Bz. Series: 2C DBI, 3C DBI, 4C DBI, Average.]

Figure 10. Davies-Bouldin index for various time delays of Dst from Vsw and Bz for cases of 2 (blue), 3 (red), 4 (yellow) clusters, and the overall average (purple), indicating an optimal delay of ~2-3 hours for Dst.

Good Clusters = Small Size relative to Cluster Separation.

DISCOVERY! ... Solar wind events have the strongest association (i.e., the tightest clusters) with the space plasma events within the Earth’s magnetosphere about 2-4 hours after a major plasma outburst occurs on the Sun.

Page 71

Machine Learning Algorithms

1) Principal Component Analysis (PCA)

2) Regression Analysis

3) Graph Mining (Network Analysis)

4) Clustering and Validation

5) Association Mining (Link Analysis)

6) Statistical Clustering

7) Bayesian Belief Networks

8) My Data Science Career ‘Aha!’ moment

5

Page 72

4 Examples of Big Data Association Mining:

The goal of Rec Sys algorithms is Diversity!

Page 73

Classic Textbook Example of Data Mining (Legend?): Data

mining of grocery store logs indicated that men who buy

diapers also tend to buy beer at the same time.

Example #1

Page 74

Amazon.com mines its customers’ purchase logs to

recommend books to you: “People who bought this book also

bought this other one.”

Example #2

Page 75

Netflix mines its video rental history database to recommend

rentals to you based upon other customers who rented movies

similar to the ones you rented.

Example #3

Page 76

Wal-Mart studied product sales in their Florida stores in 2004

when several hurricanes passed through Florida.

Wal-Mart found that, before the hurricanes arrived, people

purchased 7 times as many of {one particular product}

compared to everything else.

Example #4

Page 77

Wal-Mart studied product sales in their Florida stores in 2004

when several hurricanes passed through Florida.

Wal-Mart found that, before the hurricanes arrived, people

purchased 7 times as many strawberry pop tarts compared

to everything else.

Example #4

Page 78

Strawberry pop tarts???

http://www.nytimes.com/2004/11/14/business/yourmoney/14wal.html

http://www.hurricaneville.com/pop_tarts.html

http://bit.ly/1gHZddA

Page 79

Association Rule Mining for Hurricane Intensification Prediction

• Research by GMU geoscientists

• Predict the final strength of hurricane at landfall.

• Find co-occurrence of final hurricane strength with specific values of measured physical properties of the hurricane while it is still over the ocean.

• Result: the association mining model prediction is better than National Hurricane Center prediction!

• Research Paper by GMU scientists: https://ams.confex.com/ams/pdfpapers/84949.pdf

Page 80

A Statistical Data Puzzle

Page 82

Solution to the Island of Games Puzzle

Page 83

Island of Games: Color-coding the player ratings data distribution

Green and Red = High Rubik’s Cube ranking (> 0.5); Blue and Yellow = Low Rubik’s Cube ranking (< 0.5)

The intrinsic patterns in the player ratings data are NOT

revealed in 2-D scatter plots or by using traditional statistical

methods.

Page 84

Island of Games: Color-coding the player ratings data distribution

Green and Red = High Rubik’s Cube ranking (> 0.5); Blue and Yellow = Low Rubik’s Cube ranking (< 0.5)

The intrinsic patterns in the player ratings data are not revealed

in 2-D scatter plots or by using traditional statistical methods.

Finally, exploration in the 3-D input parameter space (of player

ratings for 3 games) reveals the actual player groupings…

Page 85

3-D view of the Player Ratings Data

The true 3-D data distribution = 4 separable groups!

http://www.datasciencecentral.com/profiles/blogs/island-of-games-puzzle-problem-statement

Data Visualization Revelations

Page 86

Solving the Island of Games Puzzle

with a sequence of cluster models

Reference: Dr. Joseph Marr, GMU

Page 87

1-Dimensional Exploration – statistical dependence tests: all outcomes are equally likely (50%)

               x        y        z
Count (=H)     495      524      521
P(=H)          0.495    0.524    0.521
Count (=L)     505      476      479
P(=L)          0.505    0.476    0.479

Page 88

2-Dimensional Exploration – statistical dependence tests: pair-wise associations are also random

PAIRS

 #    Joint probability      Product of marginals      LIFT
 1    P(x=H,y=H) = 0.270     P(x=H)P(y=H) = 0.259      1.041
 2    P(x=H,y=L) = 0.225     P(x=H)P(y=L) = 0.236      0.955
 3    P(x=L,y=H) = 0.254     P(x=L)P(y=H) = 0.265      0.960
 4    P(x=L,y=L) = 0.251     P(x=L)P(y=L) = 0.240      1.044
 5    P(x=H,z=H) = 0.270     P(x=H)P(z=H) = 0.258      1.047
 6    P(x=H,z=L) = 0.225     P(x=H)P(z=L) = 0.237      0.949
 7    P(x=L,z=H) = 0.251     P(x=L)P(z=H) = 0.263      0.954
 8    P(x=L,z=L) = 0.254     P(x=L)P(z=L) = 0.242      1.050
 9    P(y=H,z=H) = 0.270     P(y=H)P(z=H) = 0.273      0.989
10    P(y=H,z=L) = 0.254     P(y=H)P(z=L) = 0.251      1.012
11    P(y=L,z=H) = 0.251     P(y=L)P(z=H) = 0.248      1.012
12    P(y=L,z=L) = 0.225     P(y=L)P(z=L) = 0.228      0.987

Page 89

3-Dimensional Exploration – statistical dependence tests: 3-way combinations are gold!!!

TRIPLES

 #    Joint probability          Product of marginals            LIFT
 1    P(x=H,y=H,z=H) = 0.270     P(x=H)P(y=H)P(z=H) = 0.135      1.998
 2    P(x=H,y=H,z=L) = 0.000     P(x=H)P(y=H)P(z=L) = 0.124      0.000
 3    P(x=H,y=L,z=H) = 0.000     P(x=H)P(y=L)P(z=H) = 0.123      0.000
 4    P(x=H,y=L,z=L) = 0.225     P(x=H)P(y=L)P(z=L) = 0.113      1.994
 5    P(x=L,y=H,z=H) = 0.000     P(x=L)P(y=H)P(z=H) = 0.138      0.000
 6    P(x=L,y=H,z=L) = 0.254     P(x=L)P(y=H)P(z=L) = 0.127      2.004
 7    P(x=L,y=L,z=H) = 0.251     P(x=L)P(y=L)P(z=H) = 0.125      2.004
 8    P(x=L,y=L,z=L) = 0.000     P(x=L)P(y=L)P(z=L) = 0.115      0.000

Association Mining also discovers voids (anti-Associations = XOR relationships)
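The lift values in these tables are the joint probability divided by the product of the marginal probabilities. The sketch below reproduces the pattern on a hypothetical, perfectly balanced toy data set (not the talk's player ratings data) in which z is High exactly when x and y agree. The helper names `prob` and `lift` are made up for this example.

```python
from itertools import product

# Hypothetical toy ratings: only 4 of the 8 (x, y, z) combinations ever occur,
# each equally often, mimicking the voids seen in the TRIPLES table.
records = [("H", "H", "H"), ("H", "L", "L"), ("L", "H", "L"), ("L", "L", "H")] * 250

INDEX = {"x": 0, "y": 1, "z": 2}  # column position of each attribute


def prob(records, **conditions):
    """Fraction of records matching all conditions, e.g. prob(records, x='H', z='L')."""
    hits = sum(
        all(r[INDEX[name]] == value for name, value in conditions.items())
        for r in records
    )
    return hits / len(records)


def lift(records, **conditions):
    """Joint probability divided by the product of the marginal probabilities."""
    expected = 1.0
    for name, value in conditions.items():
        expected *= prob(records, **{name: value})
    return prob(records, **conditions) / expected


# Pairs look random (lift = 1), triples are either doubled or void.
pair_lifts = {(a, b): lift(records, x=a, y=b) for a, b in product("HL", repeat=2)}
triple_lifts = {t: lift(records, x=t[0], y=t[1], z=t[2]) for t in product("HL", repeat=3)}

print(pair_lifts)
print(triple_lifts)
```

A lift near 1 means the attributes look independent, a lift near 2 marks a strong association, and a lift of 0 marks a void.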

Page 90

Machine Learning Algorithms

1) Principal Component Analysis (PCA)

2) Regression Analysis

3) Graph Mining (Network Analysis)

4) Clustering and Validation

5) Association Mining (Link Analysis)

6) Statistical Clustering

7) Bayesian Belief Networks

8) My Data Science Career ‘Aha!’ moment

6

Page 91

All of the features in the data histogram convey valuable (actionable) information (the long tail, outliers, multi-modal peaks, …)

[Figure: data histogram; x-axis from -8 to 8, y-axis from 0 to 14000.]

Page 92

Mixture Models = Statistical Clustering

Each of these data histograms can be represented by the mixture (i.e., sum) of several Gaussian (normal) distributions, such as the 3 Gaussian distributions shown in the lower right.

Each Gaussian statistically represents (characterizes) one “cluster” of data values within the full set of data values.

Comprehensive web resource for Mixture Models for clustering and unsupervised learning in Data Mining:

http://www.csse.monash.edu.au/~dld/mixture.modelling.page.html
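One common way to fit such a mixture is the expectation-maximization (EM) algorithm. The talk does not say which fitting method was used, so the following is only a minimal 1-D sketch under that assumption. The function names `gaussian` and `fit_mixture`, the starting guesses, and the synthetic data values are hypothetical.

```python
import math


def gaussian(x, mean, var):
    """Normal probability density with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_mixture(data, means, iterations=50):
    """EM for a 1-D Gaussian mixture; `means` holds the initial guesses."""
    k = len(means)
    means = list(means)
    variances = [1.0] * k
    weights = [1.0 / k] * k  # amplitude of each component
    for _ in range(iterations):
        # E-step: responsibility of each component for each data value.
        resp = []
        for x in data:
            dens = [w * gaussian(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate amplitude, mean, and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj,
                1e-6,  # floor that keeps a component from collapsing to zero width
            )
    return weights, means, variances


# Hypothetical data: a small cluster near -3 and a larger one near +2.
cluster_a = [-3.0 + 0.2 * i for i in range(-3, 4)]   # 7 values
cluster_b = [2.0 + 0.2 * i for i in range(-7, 8)]    # 15 values
data = cluster_a + cluster_b

weights, means, variances = fit_mixture(data, means=(-1.0, 1.0))
print(weights, means, variances)
```

Each fitted component is described by its weight (amplitude), mean, and variance, which are the per-cluster parameters a statistical clustering would report.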

Page 93

Statistical Clustering tags (characterizes) the data, enabling discovery: making the data “smart”!


Each Gaussian in the mixture can be characterized by various parameters, such as the mean, variance (or standard deviation), and amplitude (i.e., the strength of that particular Gaussian component within the mixture).

These parameters can be plotted as a function of some independent (treatment) variable, to discover trends and correlations in the effects across the different segments of the population.

https://www.researchgate.net/publication/6200224_Conformational_entropy_in_molecular_recognition_by_proteins

Page 94

Machine Learning Algorithms

1) Principal Component Analysis (PCA)

2) Regression Analysis

3) Graph Mining (Network Analysis)

4) Clustering and Validation

5) Association Mining (Link Analysis)

6) Statistical Clustering

7) Bayesian Belief Networks

8) My Data Science Career ‘Aha!’ moment

7

Page 95

Bayes Theorem

• Bayes Theorem…

• Naïve Bayes assumption

http://www.datasciencecentral.com/profiles/blogs/6-easy-steps-to-learn-naive-bayes-algorithm-with-code-in-python
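The formulas on this slide did not survive extraction. For reference, the standard statements are given below in generic notation introduced here: A and B are events, C is a class label, and x_1 … x_n are observed attributes. The second expression is the Naïve Bayes assumption that the attributes are conditionally independent given the class.

```latex
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}
\qquad
P(C \mid x_1, \dots, x_n) \propto P(C) \prod_{i=1}^{n} P(x_i \mid C)
```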

Page 96

Bayes Theorem

• Bayes Theorem… now with Legos


https://www.countbayesie.com/blog/2015/2/18/bayes-theorem-with-lego

Page 97

Bayes Theorem

• … for missing value imputation

• Bad idea: inserting estimated values of missing data elements = Data Creation!

• Better idea: predicting a value that is not knowable in advance = Predictive Analytics!

Page 98

Bayes Belief Networks

• … for missing value imputation … Example:

• Use all conditional probabilities across all database attributes to predict the missing value.
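As a rough illustration of predicting a missing value from conditional probabilities, the sketch below uses a Naïve Bayes model, which is a much simpler stand-in for a full Bayesian belief network (it assumes the attributes are independent given the target). The function name `impute`, the attribute names, and the tiny table are all hypothetical.

```python
from collections import Counter


def impute(rows, target, query):
    """Return P(target = value | attributes in `query`) for each candidate value."""
    complete = [r for r in rows if r.get(target) is not None]
    prior = Counter(r[target] for r in complete)
    scores = {}
    for value, count in prior.items():
        score = count / len(complete)  # prior P(target = value)
        subset = [r for r in complete if r[target] == value]
        for attr, observed in query.items():
            matches = sum(1 for r in subset if r.get(attr) == observed)
            # Add-one (Laplace) smoothing, assuming two possible values per attribute.
            score *= (matches + 1) / (len(subset) + 2)
        scores[value] = score
    total = sum(scores.values())
    return {value: s / total for value, s in scores.items()}


# Hypothetical database: the last record is missing its "distance" value.
rows = [
    {"color": "red", "shape": "round", "distance": "far"},
    {"color": "red", "shape": "round", "distance": "far"},
    {"color": "red", "shape": "spiral", "distance": "far"},
    {"color": "blue", "shape": "spiral", "distance": "near"},
    {"color": "blue", "shape": "spiral", "distance": "near"},
    {"color": "blue", "shape": "round", "distance": "near"},
    {"color": "red", "shape": "round", "distance": None},
]

posterior = impute(rows, target="distance", query={"color": "red", "shape": "round"})
prediction = max(posterior, key=posterior.get)
print(posterior, prediction)
```

The result is a probability for each candidate value rather than a single filled-in number, so the uncertainty of the prediction stays visible.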

Page 99

Bayes Belief Networks for Cosmology

• … for missing value imputation: Galaxy Redshifts

• The problem: less than 0.1% of catalogued galaxies have a measured redshift = Distance!

• Bigger sky surveys are coming in the next 10 yrs

• Less than 0.001% of galaxies will have a distance estimate!!

• Traditional method: use colors of galaxies (red-shift…)

• BBN method: use all properties of galaxies (shape, size, color, texture, concentration,…)

• Result: generate a probability distribution curve of the redshift (distance estimate) for each and every galaxy.

• Consequence: build a mass distribution map of the Universe!

Page 100

Machine Learning Algorithms

1) Principal Component Analysis (PCA)

2) Regression Analysis

3) Graph Mining (Network Analysis)

4) Clustering and Validation

5) Association Mining (Link Analysis)

6) Statistical Clustering

7) Bayesian Belief Networks

8) My Data Science Career ‘Aha!’ moment

8

Page 101

My Data Science Career “Aha!” moment

Data Mining what?

http://dssresources.com/cases/DSScatalog.html#ADSCOUT

http://www.moneyballmarketer.com/blog/2017/12/28/data-analytics-lessons-from-the-nbas-first-data-mining-product-from-ibm-1

Page 102

OUTLINE

• Goals of Machine Learning Applications

• Some Machine Learning Algorithms

• Some Atypical Applications

• Final Thoughts


@KirkDBorne

Page 103

In the Big Data era, Everything is Quantified and Monitored:

– Populations & Persons

– Smart Cities, Energy, Grids, Farms, Highways

– Environmental Sensors

– IoE = Internet of Everything!

Discovery through Machine Learning and Data Science:

– Class Discovery, Correlation Discovery, Novelty Discovery, and

– Association Discovery: Find interesting cases where condition X is associated with event Y with time shift Z.

17 SDGs are KPIs for the World!

(currently, the SDGs have 229 Key Performance Indicators)

(SDG: Sustainable Development Goal)

Big Data + the IoT + Citizen Data Scientists = Partners in Sustainability

The Internet of Things (IoT): Knowing the knowable via deep, wide, and fast data from ubiquitous sensors!

Big Data: Sustainable Development Goals

http://www.unglobalpulse.org

Page 104

Booz | Allen | Hamilton

@KirkDBorne

@BoozDataScience

LISTEN

READ, BUILD, and EXPLORE: www.boozallen.com/datascience

Tips for Building a Data Science Capability

The Mathematical Corporation

10 Signs of Data Science Maturity

The Field Guide to Data Science

The Data and Analytics Catalyst

Explore: sailfish.boozallen.com

PARTICIPATE: datasciencebowl.com

…Learn how AI and Machine Intelligence empower The Mathematical Corporation

in Machine Intelligence

These slides are here: http://www.kirkborne.net/ODSC2018/

THANK YOU! Check out some of these resources…

Page 105

Thank you!

Contact information, for further questions or inquiries:

Dr. Kirk Borne

Principal Data Scientist, Booz Allen Hamilton

Twitter: @KirkDBorne or Email: [email protected]


ODSC Conference – Boston – May 2018