#RoadToBigData


Copyright © 2016 Oracle and/or its affiliates. All rights reserved. |

Roberto Falcinelli, Business Analytics Sales Consulting Senior Manager

Raffaele Corti, Business Analytics Principal Sales Consultant

How to Become a Data-Driven Company: Maximizing Data Capital through an End-to-End Analytics Journey

Nino Guarnacci, Paolo Piccioni, Cristian Spigariol, Roberto Falcinelli

Road to Big Data: From Analytics Big Bang to Cloud Revolution

Meet the Oracle Experts


How to Become a Data-Driven Company

1. Mindset Shift: data is not just an issue to manage but the cornerstone on which to build your strategy

2. Organize Data: a data strategy relying on data silos is no longer affordable

3. Everyone Is a Data Analyst: everyone in the company should contribute their own insights

4. Experiment with Data: innovation in products and services starts with Data Labs


Nino Guarnacci


Digital Contamination


Digital Trends

Internet of Things, Big Data, Machine Learning, Chatbot


Oracle Cloud Technology Enablers

[Figure: end-to-end flow across Big Data, Machine Learning, Storage Cloud, Chatbot and Mobile. Stages: connect, provision, secure; correlate, aggregate, geo-fence, act; cleansing, wrangling, store; discover, understand, training models; use insights, apply models in real time, share common knowledge, interact.]


Produce Tons of Data

Understand failure events

Machine operating values checked against a single threshold

Monitor goods location

Unlock the value of digitalization through stream and advanced analytics:

• More efficiency: analyze merged data streams to discover anomalies

• More productivity: minimize line downtime by predicting events

• More flexibility: track assets and processes for higher utilization and control

Industry 4.0 with IoT & Advanced Analytics


Big Data

ERP, MES

Grande Roberta

Real-Time

• Are there patterns of events that cause the equipment to fail?

• What are the top factors / influencers that affect product yield?

• What’s the downstream impact of yield change or defective parts?

• Is there a correlation between machine parameters and product quantity?

• Can we predict the likelihood of certain product defects?

• Are any assets being used improperly or in caution areas?


Grande Roberta

Real-Time

• Is there a correlation between machine parameters and product quantity?

{ "PLANT": "A23", "MACHINE": 34, "PMIN": 110, "ODEV": 0.7, "TEMP": 38, "PRES": 3.4, "STATUS": "PAUSE", "LAT": 14.453, "LON": 42.673 }

Telemetry: 120 messages/sec × 6 machines; each message carries 7 parameters, one of which is a covariance indicator.

Real-time clustering of events by covariance and quantity, targeting low-density clusters. Real-time data is then merged with ERP data.
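The slide does not say which clustering method is used. The sketch below is minimal and rests on a labeled assumption: it hypothetically treats ODEV as the covariance indicator and PMIN as the quantity. Under that assumption, low-density events in a window of telemetry can be flagged in a way similar to how DBSCAN identifies non-core points. For each point, it counts the other points within a radius `eps` and flags the point when that count is below `min_neighbors`. The field mapping, the sample values and both thresholds are illustrative, not taken from the deck.

```python
import json
import math

# Hypothetical mapping: the deck does not say which fields carry
# "covariance" and "quantity"; ODEV and PMIN are assumed here.
FEATURES = ("ODEV", "PMIN")


def make_message(machine, pmin, odev):
    """Build a telemetry message shaped like the sample above (illustrative values)."""
    return json.dumps({
        "PLANT": "A23", "MACHINE": machine, "PMIN": pmin, "ODEV": odev,
        "TEMP": 38, "PRES": 3.4, "STATUS": "RUN", "LAT": 14.453, "LON": 42.673,
    })


def parse(messages):
    """Turn raw JSON strings into (machine id, feature tuple) pairs."""
    points = []
    for raw in messages:
        msg = json.loads(raw)
        points.append((msg["MACHINE"], tuple(float(msg[f]) for f in FEATURES)))
    return points


def low_density(points, eps, min_neighbors):
    """Indices of points with fewer than min_neighbors other points within eps."""
    flagged = []
    for i, (_, p) in enumerate(points):
        neighbors = sum(
            1 for j, (_, q) in enumerate(points)
            if j != i and math.dist(p, q) <= eps
        )
        if neighbors < min_neighbors:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    window = [make_message(34, 110, 0.7), make_message(35, 111, 0.8),
              make_message(36, 109, 0.6), make_message(37, 112, 0.7),
              make_message(38, 60, 2.5)]
    pts = parse(window)
    for i in low_density(pts, eps=5.0, min_neighbors=2):
        print("low-density event from machine", pts[i][0])
```

In practice the two features should first be put on comparable scales, for example by standardizing them; here PMIN dominates the Euclidean distance. The pairwise loop is O(n²), which is acceptable for a short window but not for the whole stream.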


Paolo Piccioni


Data Lab: where you can even find things you were not looking for


Data Lab To Find Savings and Cost Reductions in Health Care Budget

• United Kingdom’s National Health Service

• Identify billing and identity fraud

• Optimize treatment by reducing use of less effective medical procedures

Potential savings identified: $156M

“With one vendor providing the whole solution, it’s very easy for us.” - Nina Monckton, NHS BSA


Data Labs Fuel Enterprise Innovation

• Identify new customer segments

• Predict maintenance activities

• Detect fraudulent activity

• Better manage risk

• Create innovative products and services

• Optimize pricing

The data lab enables organizations to think and act like startups


Data Lab Foundation

• Data Lake: stores all data

• Data Visualization: easier, visual, faster, powerful, self-service

• Machine Learning: provides a broad range of ML algorithms based on open-source, market-leading technologies

• Data Discovery: the lab core; explores available data and their relationships, transforms data on the fly, discovers hidden patterns

• Security: secures data, provides access control, profiles users according to their roles


Data Lab Demo Story: Maintenance Insight

Starring: Operations Analyst, Data Scientist


Failure Likelihood

Assets on map

On this dashboard, an Operations Analyst can monitor assets on a map and keep “Failure Likelihood” under control. Spotting problems before they happen means identifying revenue at risk. How can this be done? Who provided this information, and how?

Operations Analyst


Search for data

The analyst starts talking to the data. He has access to the full catalog of available datasets and can browse and search it as easily as shopping online. He searches for sensor data and finds a good starting point: a recent sensor-readings dataset.

Operations Analyst

Start with the data. Here are the datasets.


Sensor readings: power, temperature, pressure

1.2M-record dataset

In the dataset he sees all the attributes laid out graphically. He notices that, in this 1.2M-record dataset, power, temperature, and pressure can be zero. That alone does not mean an ACTUAL failure occurred; a zero reading may be just a point in time.

Operations Analyst


Maintenance records file

Safety incident records file

Work orders file

Sensor readings

Using the sensor dataset as a starting point is good, but to really figure out when failures happened, we’re going to use several other datasets. To build and train a predictive model that looks for similar patterns in the data, we first need to know which past sensor readings corresponded to actual failures. That’s why we need to read and sift through the maintenance records, safety incident records, and work order datasets.

Operations Analyst


Group continuous data into 4 buckets for easier searching and filtering

Looking at the raw datasets, we realize that we need to transform them to make them more useful. For instance, we group the different amperage readings into buckets, because that’s a better way to filter and find bad equipment: equipment that fails shows no power, or very low power. Now we can start building a Discovery Dashboard...

Operations Analyst
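The bucketing transform can be sketched in a few lines. This is a minimal sketch and the cut points (in amperes) are hypothetical, because the deck does not give them. The bucket names are also hypothetical, apart from “power_off”, which the demo filters on.

```python
from bisect import bisect_right

# Hypothetical cut points (amperes) and bucket names; only "power_off"
# comes from the demo. Readings below 0.5 A count as power off.
EDGES = [0.5, 5.0, 15.0]
LABELS = ["power_off", "low", "normal", "high"]


def power_bucket(amps):
    """Map a continuous amperage reading to one of 4 labeled buckets."""
    return LABELS[bisect_right(EDGES, amps)]


def add_power_bucket(readings):
    """Return copies of the readings enriched with a 'power_bucket' field."""
    return [{**r, "power_bucket": power_bucket(r["amps"])} for r in readings]


if __name__ == "__main__":
    sample = [{"equipment": "PUMP-1", "amps": 0.0},
              {"equipment": "PUMP-1", "amps": 3.2},
              {"equipment": "PUMP-2", "amps": 12.0},
              {"equipment": "PUMP-2", "amps": 20.0}]
    for r in add_power_bucket(sample):
        print(r["equipment"], r["amps"], r["power_bucket"])
```

`bisect_right` places a value equal to an edge in the upper bucket, so exactly 0.5 A counts as “low” rather than “power_off”.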


Need to filter down 39K notifications

Refine the search using the new “power_off” bucket

…the enriched data is immediately available in my discovery dashboard, and because I have this new bucket for power ranges, I can go from 39K notifications down to a more reasonable (and readable) number.

Operations Analyst


I’m down to 8 records.

Select the equipment with the most power-off readings

The first filter is on “power_off”. The second picks the equipment with the most “power_off” readings, which I selected from the guided navigation. Now I can read the maintenance records and pick out which were the real failures for this equipment. Then I move on to the next one, and so on.

Operations Analyst
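The demo applies two filters through the dashboard's guided navigation. Outside the tool, the same two filters can be expressed over plain dict records, as in this minimal sketch. The field names, equipment IDs and timestamps are hypothetical. The sketch assumes each reading already carries a bucket field like the one produced by the transform.

```python
from collections import Counter


def top_power_off(readings):
    """Apply the demo's two filters to bucketed sensor readings.

    1. keep only readings in the "power_off" bucket;
    2. pick the equipment with the most such readings.
    Returns (equipment_id, its power_off readings), or (None, []) if none.
    """
    off = [r for r in readings if r["power_bucket"] == "power_off"]
    counts = Counter(r["equipment"] for r in off)
    if not counts:
        return None, []
    equipment, _ = counts.most_common(1)[0]
    return equipment, [r for r in off if r["equipment"] == equipment]


if __name__ == "__main__":
    # Hypothetical readings; IDs and timestamps are illustrative only.
    readings = [
        {"equipment": "PUMP-1", "ts": "2016-03-01T10:00", "power_bucket": "power_off"},
        {"equipment": "PUMP-2", "ts": "2016-03-01T10:00", "power_bucket": "power_off"},
        {"equipment": "PUMP-2", "ts": "2016-03-02T10:00", "power_bucket": "power_off"},
        {"equipment": "PUMP-2", "ts": "2016-03-03T10:00", "power_bucket": "normal"},
        {"equipment": "PUMP-2", "ts": "2016-03-05T10:00", "power_bucket": "power_off"},
        {"equipment": "PUMP-3", "ts": "2016-03-05T10:00", "power_bucket": "low"},
    ]
    equipment, selected = top_power_off(readings)
    print(equipment, len(selected))  # PUMP-2 3
```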


Read the text of the notes

See the related incidents

I can see the related incidents, read the text of the notes and logs, and use my experience in the field to determine whether something was really a failure. Finally, I flag the failures in the sensor data using the transform capabilities, and share the new, cleaned data with my Data Scientist friend.

Data Scientist

Operations Analyst
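Flagging the failures amounts to adding a label column that the data scientist can later train on. This is a minimal sketch with two stated assumptions. First, the analyst's review produces a set of confirmed (equipment, date) pairs. Second, readings are matched to those pairs by equipment ID and calendar day; this join granularity, and all the names used, are hypothetical.

```python
from datetime import date, datetime


def flag_failures(readings, confirmed_failures):
    """Add a boolean 'failure' label to each sensor reading.

    confirmed_failures is a set of (equipment_id, date) pairs the analyst
    confirmed as real failures after reading the maintenance records,
    incident notes and logs. Other power_off readings stay unflagged.
    """
    return [
        {**r, "failure": (r["equipment"], r["ts"].date()) in confirmed_failures}
        for r in readings
    ]


if __name__ == "__main__":
    # Hypothetical data: PUMP-2 was off twice, but only the first was a failure
    # (the second could, for instance, have been a planned stop).
    readings = [
        {"equipment": "PUMP-2", "ts": datetime(2016, 3, 1, 10), "power_bucket": "power_off"},
        {"equipment": "PUMP-2", "ts": datetime(2016, 3, 5, 10), "power_bucket": "power_off"},
        {"equipment": "PUMP-1", "ts": datetime(2016, 3, 1, 10), "power_bucket": "normal"},
    ]
    confirmed = {("PUMP-2", date(2016, 3, 1))}
    for r in flag_failures(readings, confirmed):
        print(r["equipment"], r["ts"].date(), r["failure"])
```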


Here’s where the data scientist jumps in with his own set of tools: a Jupyter notebook. Our Data Scientist, who is part of the team, writes the model in Python or R, both popular programming languages for statistics. The point here is how easily analysts and data scientists can collaborate and share data while working together.

Data Scientists

Of the 15 months of historical data, the data scientist used 12 months to “train” the model and then the last 3 months of data to test it.
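A split like this has to be chronological rather than random, so that the model is tested only on months it never saw. In this minimal sketch the record layout (a `day` field per record) and the generated history are hypothetical. The sketch keeps the most recent 15 months, trains on the first 12 and tests on the last 3.

```python
from datetime import date


def month_key(d):
    return (d.year, d.month)


def split_by_months(records, train_months=12, test_months=3):
    """Chronological train/test split over the most recent months of history.

    The last (train_months + test_months) calendar months are kept; the
    earlier train_months go to training and the final test_months to testing.
    Nothing is shuffled, so every test record is later than every training one.
    """
    months = sorted({month_key(r["day"]) for r in records})
    window = months[-(train_months + test_months):]
    if len(window) < train_months + test_months:
        raise ValueError("not enough months of history")
    train_set, test_set = set(window[:train_months]), set(window[train_months:])
    train = [r for r in records if month_key(r["day"]) in train_set]
    test = [r for r in records if month_key(r["day"]) in test_set]
    return train, test


if __name__ == "__main__":
    # Hypothetical history: one labeled record per month, Jan 2015 to Mar 2016.
    history, y, m = [], 2015, 1
    for _ in range(15):
        history.append({"day": date(y, m, 15), "failure": False})
        y, m = (y + 1, 1) if m == 12 else (y, m + 1)
    train, test = split_by_months(history)
    print(len(train), len(test), test[0]["day"])  # 12 3 2016-01-15
```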


Non-data scientists will find it much simpler to understand the model’s results by looking at its output on a chart, which also gives them confidence in those results. The chart compares historical data with what the model says would have happened. The two match well, so we can trust the model.

Operations Analyst

Data Scientist

Actual Equipment Failure Data

Predicted Equipment Failure Data
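The chart is a visual check. A numeric complement counts hits, false alarms and misses over the test window and derives precision and recall from them. This is a minimal sketch using hypothetical per-period failure flags; the deck does not say which metrics were actually used.

```python
def compare(actual, predicted):
    """Confusion counts plus precision and recall for binary failure labels."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)          # predicted and happened
    fp = sum(1 for a, p in pairs if not a and p)      # false alarm
    fn = sum(1 for a, p in pairs if a and not p)      # missed failure
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "precision": precision, "recall": recall}


if __name__ == "__main__":
    # Hypothetical per-period failure flags over a test window.
    actual = [True, False, False, True, False, True, False, False]
    predicted = [True, False, True, True, False, False, False, False]
    print(compare(actual, predicted))
```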


Then the model is operationalized and your dashboards are updated. The right people get alerts in real time and can take action. Dashboards show predictions in reports, and the data is there for operational discovery.

Operations Analyst

Meanwhile, the data scientist can keep working in the Data Lab on more and better models: predicting inventory, worker scheduling, weather impacts, and performance. The Data Lab becomes essential to your work.

Data Scientist


Oracle Big Data Discovery: A Modern Data Lab for Everyone

Catalog, transform, discover, collaborate, predict, explore


Big Data + Big Data Discovery = Enabled Data Lab

BDA Cluster

• BDD Node(s): Big Data Discovery Web Studio, Big Data Discovery Dgraph

• Hadoop Data Nodes: BDD Data Processing on YARN (sampling, profiling, auto-enrichment, cataloging, data set transforms, data set enrichments)

• Hadoop Nodes (Oracle Big Data): Cloudera Enterprise, Oracle Big Data Connectors (includes Oracle RAAH), Oracle Big Data Spatial and Graph, Oracle Data Integrator Enterprise Edition, Oracle Big Data SQL (add-on)


From Data Lab to Process

Lab

• Collect source data and explore their contents

• Select and prepare data for exploitation

• Experiment on data through advanced analytics

Mainstream

• Bring the value into production

• Transform workplace and workforce through insights

Roles involved: Experts, Data Scientists, Consumers


Advanced Analytics Approach

“Data-Driven Research”: reasoning from the data to the general theory (induction)

Machine Learning on the Process

Data Discovery in the Lab (Data Scientists, Experts)

Source data are first explored to find hidden relationships. This is the basis for picking the relevant features to feed prediction models (“feature engineering”).

Machine Learning in the Lab (Data Scientists)

Once the data context has been outlined and the most relevant features identified, ML models can be built and evaluated over historical and new (lab) data.

Advanced Analytics in the Mainstream (Consumers)

The final step is to run the ML models, as well as the newly discovered patterns, in the mainstream. Their outcome is then made available to the broad user community through Data Visualization and Business Intelligence (i.e., historical or current data is “scored” for predictions).


Cristian Spigariol


Machine Learning Across Industries: Typical Use Case Scenarios

Classification (predict among a set of options)

• Find and prevent customer churn

• Target the right customer with the right offer

• Predict customer response to an affinity card program

Regression (estimate a missing value)

• Predict how much a customer will spend

Clustering (find unknown patterns)

• Detect anomalous or suspicious activities

Association Rules (find correlations)

• Predict correlations among items

Graph Analysis (understand interactions)

• Understand influencers in social networks


Machine Learning through Graphs

• Graph systems focus on relationships rather than entities

– They are key to understanding highly connected systems and their behaviors (i.e., areas of strong/weak interaction), by examining how relationships spread throughout the graph

• Graph algorithms are self-consistent

– The answer to complex problems lies in how entities (nodes) interact, not in the entities themselves or in external resources

– Graph algorithms are effective even on graphs whose entities have few properties

• They cover a broad range of applications

– Their simple and flexible data model can describe a broad range of use cases, from financial systems and human neural networks to infrastructure networks (transportation, telco, electricity)


Marketing Analyses using Graphs

Graph algorithms can strongly improve the effectiveness of marketing analyses.

In customer profiling, we can extend the individual profile of a given customer by considering their ability to influence their circle of friends.

In marketing campaigns, identifying influencers can amplify the echo of the related promotional activities and increase the conversion rate.

In marketing campaigns, identifying strongly connected communities (people who interact on the basis of shared behaviors) can be the basis for customer segmentation.
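The deck does not say which community-detection algorithm it uses for segmentation. Label Propagation is one of the algorithms it lists, so this is a minimal sketch of a deterministic, synchronous variant on a hypothetical interaction graph. Each node repeatedly adopts the label that is most frequent among its neighbors until no label changes.

```python
from collections import Counter


def label_propagation(adj, max_iter=20):
    """Deterministic, synchronous label propagation.

    Every node starts with its own label; in each round every node adopts the
    label most frequent among its neighbours (keeping its current label on a
    tie when possible, otherwise the smallest tied label). Synchronous updates
    can oscillate on some graphs, hence the max_iter cap.
    """
    labels = {n: n for n in adj}
    for _ in range(max_iter):
        new = {}
        for n, nbrs in adj.items():
            if not nbrs:
                new[n] = labels[n]
                continue
            counts = Counter(labels[m] for m in nbrs)
            best = max(counts.values())
            tied = [lab for lab, c in counts.items() if c == best]
            new[n] = labels[n] if labels[n] in tied else min(tied)
        if new == labels:
            break
        labels = new
    return labels


def communities(labels):
    """Group nodes by final label, sorted by smallest member."""
    groups = {}
    for n, lab in labels.items():
        groups.setdefault(lab, set()).add(n)
    return sorted(groups.values(), key=min)


def build_adj(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


if __name__ == "__main__":
    # Hypothetical interaction graph: customers 1-4 and 5-8 interact heavily
    # within their group; a single edge (4, 5) links the two groups.
    edges = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4),
             (5, 6), (5, 7), (5, 8), (6, 7), (6, 8), (7, 8), (4, 5)]
    print(communities(label_propagation(build_adj(edges))))
```

The common randomized version of the algorithm visits nodes in random order and breaks ties at random. The fixed tie rule used here only makes the sketch reproducible.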


Network Analysis using Graphs

Weak-link detection aims to identify nodes in the transportation/telco/energy network that carry a high number of flows without a matching number of alternative paths (betweenness centrality).
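Betweenness centrality counts how many shortest paths between other pairs of nodes pass through a given node. A high score on a node with no alternative routes marks a weak link. This is a minimal sketch of Brandes' algorithm for unweighted, undirected graphs. The network is hypothetical: two small clusters joined only through the trunk X to Y.

```python
from collections import deque


def betweenness(adj):
    """Brandes' algorithm: unnormalized betweenness on an unweighted, undirected graph."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest nodes back towards s.
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # In an undirected graph each pair was counted from both endpoints.
    return {v: c / 2 for v, c in bc.items()}


def build_adj(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


if __name__ == "__main__":
    # Hypothetical network: A, B hang off X; C, D hang off Y; X-Y is the only trunk.
    net = build_adj([("A", "X"), ("B", "X"), ("X", "Y"), ("Y", "C"), ("Y", "D")])
    for node, score in sorted(betweenness(net).items(), key=lambda kv: -kv[1]):
        print(node, score)
```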

Graph algorithms are extremely useful for optimizing networks.

Network flow analysis consists of assigning a capacity to each connection (i.e., the link between two nodes) and evaluating the total flow that passes through it. The flow on an edge cannot exceed the capacity of that edge.
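A standard way to compute the largest total flow that respects every edge capacity is a maximum-flow algorithm. This is a minimal sketch of Edmonds-Karp, which repeatedly pushes flow along the shortest augmenting path. The deck does not say which flow algorithm it uses. The network and its capacities are hypothetical, and the sketch assumes no pair of opposite edges is present.

```python
from collections import deque


def max_flow(capacity, source, sink):
    """Edmonds-Karp maximum flow.

    capacity maps a directed edge (u, v) to its capacity. Assumes no pair of
    opposite edges (u, v) and (v, u) are both present. Returns the total flow
    and the flow on each edge; no edge ever carries more than its capacity.
    """
    if source == sink:
        raise ValueError("source and sink must differ")
    residual = dict(capacity)
    neighbors = {}
    for u, v in capacity:
        residual.setdefault((v, u), 0)  # reverse edge, used to cancel flow
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    total = 0
    while True:
        # Shortest augmenting path in the residual graph (BFS).
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v in neighbors.get(u, ()):
                if v not in parent and residual[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            break
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(residual[e] for e in path)  # bottleneck capacity
        for u, v in path:
            residual[(u, v)] -= push
            residual[(v, u)] += push
        total += push
    flow = {e: cap - residual[e] for e, cap in capacity.items()}
    return total, flow


if __name__ == "__main__":
    # Hypothetical network with capacities per link.
    caps = {("source", "A"): 10, ("source", "B"): 5, ("A", "B"): 15,
            ("A", "sink"): 10, ("B", "sink"): 10}
    total, flow = max_flow(caps, "source", "sink")
    print(total, flow)
```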


Oracle Confidential – Internal

Collaborative Filtering Pattern

If a person A has the same opinion as a person B on an issue, A is more likely to have B's opinion on a different issue than that of a randomly chosen person (Wikipedia).

1. Find similarities: find the people who behave similarly to person A.

2. Select potential targets: select the items chosen by the people similar to person A.

3. Rank output by relevance: among the potential targets, weight the items and keep those with the highest relevance rank.
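The three steps map directly onto a small user-based collaborative filter. This is a minimal sketch with two assumptions that are not in the deck. Similarity is measured with Jaccard overlap of purchased item sets, one common choice among several. Relevance is the sum of the similarities of the users who chose an item. The users and items are hypothetical.

```python
def jaccard(a, b):
    """Similarity of two item sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(target, purchases, k=2):
    """User-based collaborative filtering in three steps."""
    mine = purchases[target]
    # 1. Find similarities: the k users whose choices overlap most with target's.
    similar = sorted(
        ((jaccard(mine, items), user) for user, items in purchases.items() if user != target),
        reverse=True,
    )[:k]
    # 2. Select potential targets: items those users chose that target has not.
    # 3. Rank by relevance: sum the similarity of every user who chose the item.
    scores = {}
    for sim, user in similar:
        if sim == 0:
            continue
        for item in purchases[user] - mine:
            scores[item] = scores.get(item, 0.0) + sim
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    # Hypothetical purchase history.
    purchases = {"A": {"p1", "p2", "p3"}, "B": {"p1", "p2", "p4", "p5"},
                 "C": {"p2", "p3", "p5"}, "D": {"p6"}}
    print(recommend("A", purchases))
```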


Recommendation Using Graph Analytical Algorithms

By using a centrality algorithm (Personalized PageRank), we can determine the most influential people in the circle of connections originating from Alice. We move from similarity to trust!

Circle of trust (relative to Alice): Ricky 0.4, Simon 0.3, Lucia 0.1, John ..., Maria ..., ...
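Personalized PageRank is ordinary PageRank with one change: the random walker always teleports back to a single seed node, here Alice. The resulting scores therefore measure closeness and influence relative to her. This is a minimal power-iteration sketch on a hypothetical follow graph. It does not reproduce the slide's 0.4/0.3/0.1 values. The damping factor 0.85 is only a common default.

```python
def personalized_pagerank(out_links, seed, alpha=0.85, iters=200, tol=1e-12):
    """Personalized PageRank by power iteration.

    With probability alpha the random walker follows one of the current node's
    out-links; otherwise, and always from nodes without out-links, it jumps
    back to the seed. Scores sum to 1.
    """
    nodes = set(out_links) | {v for vs in out_links.values() for v in vs}
    rank = {n: 1.0 if n == seed else 0.0 for n in nodes}
    for _ in range(iters):
        new = dict.fromkeys(nodes, 0.0)
        for u in nodes:
            targets = out_links.get(u, [])
            if targets:
                share = alpha * rank[u] / len(targets)
                for v in targets:
                    new[v] += share
                new[seed] += (1 - alpha) * rank[u]
            else:
                new[seed] += rank[u]
        converged = sum(abs(new[n] - rank[n]) for n in nodes) < tol
        rank = new
        if converged:
            break
    return rank


if __name__ == "__main__":
    # Hypothetical "follows" graph around Alice.
    follows = {"Alice": ["Ricky", "Simon", "Lucia"], "Ricky": ["Simon", "Alice"],
               "Simon": ["Ricky"], "Lucia": ["Ricky", "John"], "John": ["Maria"]}
    scores = personalized_pagerank(follows, "Alice")
    for name in sorted((n for n in scores if n != "Alice"), key=lambda n: -scores[n]):
        print(name, round(scores[name], 3))
```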


Circle of trust: Ricky, Simon, Lucia. Targets: Prod#3, Prod#4, Prod#7, Prod#9

We determine the potential targets by selecting the products already bought by the trusted people (a bipartite graph).


We start the relevance algorithm (SALSA) by measuring each product’s relevance score, that is, the sum of the preferences it received from the circle of trust:

Targets: Prod#3 (2), Prod#4 (1), Prod#7 (2), Prod#9 (1)


We then walk the connections back to front to measure each trusted person’s hub score as the sum of the relevance scores of the products they bought. The hub score measures the ability of each trusted person to intercept the tastes of the circle.

Circle of trust: Ricky (2), Simon (4), Lucia (4)

Targets: Prod#3 (2), Prod#4 (1), Prod#7 (2), Prod#9 (1)


The new relevance score is the weighted sum of the preferences received, using the hub scores as weights.

Circle of trust: Ricky (2), Simon (4), Lucia (4)

Targets: Prod#3 (6), Prod#4 (4), Prod#7 (8), Prod#9 (4)

Prod#7 has the highest likelihood of being well accepted by Alice, since it has been chosen by the most “knowledgeable” trusted people.
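The slides give the scores but not the individual purchases. The worked example below uses one assignment of purchases that is consistent with every number shown: Ricky bought Prod#3; Simon bought Prod#3 and Prod#7; Lucia bought Prod#4, Prod#7 and Prod#9. That assignment is an assumption. With it, a single hub/relevance round reproduces the figures above.

```python
def hub_relevance_round(bought, relevance):
    """One round of the hub/relevance update described on the slides.

    hub[person]   = sum of the relevance of the products the person bought
    new_rel[item] = sum of the hub scores of the people who bought it
    """
    hub = {person: sum(relevance[i] for i in items) for person, items in bought.items()}
    new_rel = {}
    for person, items in bought.items():
        for item in items:
            new_rel[item] = new_rel.get(item, 0) + hub[person]
    return hub, new_rel


# Assumed purchases: one assignment consistent with every number on the slides.
bought = {
    "Ricky": ["Prod#3"],
    "Simon": ["Prod#3", "Prod#7"],
    "Lucia": ["Prod#4", "Prod#7", "Prod#9"],
}
# Initial relevance: number of preferences each product received.
initial = {}
for items in bought.values():
    for item in items:
        initial[item] = initial.get(item, 0) + 1

hub, relevance = hub_relevance_round(bought, initial)
print(initial)    # {'Prod#3': 2, 'Prod#7': 2, 'Prod#4': 1, 'Prod#9': 1}
print(hub)        # {'Ricky': 2, 'Simon': 4, 'Lucia': 4}
print(relevance)  # {'Prod#3': 6, 'Prod#7': 8, 'Prod#4': 4, 'Prod#9': 4}
print(max(relevance, key=relevance.get))  # Prod#7
```

This is the unnormalized, HITS-style hub/authority update that the slides walk through. SALSA as originally defined normalizes each step by node degree, which amounts to random walks on the bipartite graph. When iterating, rescale both score vectors after every round, for example by dividing by their sum; otherwise the raw sums keep growing.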


The algorithm can be iterated many times. Each iteration reinforces the hub scores and the relevance scores.

The more iterations, the more refined the scores, until they stabilize.


Graph Analytical Algorithms: Possible Use Cases

This recommendation approach is the basis of Twitter’s WTF (“Who to Follow”) service. It can be effectively applied to other industries, for example:

• to recommend insurance policies based on the most relevant opinions of “trusted” people

• to up-sell telco services with the same trust + expertise approach.


Do it yourself? If so, please consider...

• Complexity: the algorithms are not trivial and need domain-specific knowledge

• Productivity: bug fixing, tuning for precision and performance, support

• Architecture: graph algorithms need in-memory parallel execution as well as low-latency NoSQL storage

• Integration: you need to integrate your solution with the Big Data cluster to feed your graph database


Oracle Big Data Spatial and Graph: Memory Analyst

A rich set of built-in, parallel algorithms

• Detecting components and communities: Tarjan’s, Kosaraju’s, Weakly Connected Components, Label Propagation (w/ variants), Sparsification

• Ranking and walking: PageRank, Personalized PageRank, Betweenness Centrality (w/ variants), Closeness Centrality, Degree Centrality, Eigenvector Centrality, HITS, random walking and sampling (w/ variants)

• Evaluating community structures: Conductance, Modularity, Clustering Coefficient (Triangle Counting)

• Path-finding: Hop-Distance (BFS), Dijkstra’s, Bi-directional Dijkstra’s, Bellman-Ford’s

• Link prediction: SALSA (Twitter’s Who-to-Follow)

• Other classics: Vertex Cover

Parallel graph mutation operations

[Figure: an original graph on nodes a to i transformed into an undirected graph, a simplified graph, a bipartite graph (left set “a,b,e”), a degree-sorted graph (renumbering order: g e b d i a f c h), and a filtered subgraph (nodes d, b, g, i, e).]


Machine Learning is Data Driven

• Data is the fuel of ML algorithms

• The effectiveness of ML algorithms is strictly tied to the amount of available data

• To translate ML results into a competitive advantage, we need a paradigm shift in the way information management solutions are designed and managed.


A Data Driven Strategy with Machine Learning

Adopt Standards

Don’t Move data

Broadcast ML results

Experiment and Act

New Paradigm

• Data is heavy – don’t move data

• Move elaboration to data instead

• Reduce the complexity

• Facilitate integration

• Speak the language of Data Scientists (R, Python, Scala, Spark, Gremlin)

• Take advantage of new ML package releases (e.g. CRAN, MLlib)

• Define your models (Lab) and then move them into the mainstream (Prod)

• Score your models continuously (both in batch and in streaming)

• Keep them up to date

• Spread ML results throughout the user communities

• Predictions are new inputs for in-place processes or analyses (additional KPIs, properties, etc.)
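The "move elaboration to data" and "score continuously" points can be sketched with Python’s built-in `sqlite3` module: the model is registered as a SQL function so batch scoring runs inside the database query and the predictions land in a new column, while a generator scores a stream of records one at a time. The table, columns, coefficients and function names are all hypothetical, and SQLite only stands in for the pattern, not for any specific Oracle feature.

```python
import math
import sqlite3

# Hypothetical logistic-regression coefficients, e.g. fitted in a lab notebook.
COEF = {"intercept": -1.5, "complaints": 0.8, "tenure_years": -0.3}


def churn_probability(complaints, tenure_years):
    """Score one customer with the (hypothetical) logistic model."""
    z = (COEF["intercept"]
         + COEF["complaints"] * complaints
         + COEF["tenure_years"] * tenure_years)
    return 1.0 / (1.0 + math.exp(-z))


def batch_score(conn):
    """Batch scoring: the SQL engine calls the model while it scans the rows."""
    conn.create_function("churn_probability", 2, churn_probability)
    conn.execute(
        "UPDATE customers SET churn_score = churn_probability(complaints, tenure_years)"
    )
    conn.commit()


def stream_score(events):
    """Streaming scoring: enrich each incoming record as it arrives."""
    for event in events:
        score = churn_probability(event["complaints"], event["tenure_years"])
        yield {**event, "churn_score": score}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, complaints INTEGER,"
        " tenure_years REAL, churn_score REAL)"
    )
    conn.executemany(
        "INSERT INTO customers (id, complaints, tenure_years) VALUES (?, ?, ?)",
        [(1, 0, 5.0), (2, 4, 0.5)],
    )
    batch_score(conn)
    # The score is now just another column for reports and KPIs.
    for row in conn.execute("SELECT id, churn_score FROM customers ORDER BY id"):
        print(row)
```

Keeping one scoring function for both paths means the batch and streaming results cannot drift apart when the model is updated.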


Roberto Falcinelli


To us, data is an asset and a heritage:

We just need to find the right way to "look inside",

to see what you normally do not see.


Data Driven Investigation: a powerful story

Oracle & Trenitalia Confidential



Understanding Data Visualization

• It is the study of how to represent data visually rather than through traditional reporting

• It is a visual way of telling a “story”

(*) Antoine de Saint-Exupéry, Le Petit Prince, Chapter 1


Difference between Data Viz and Infographics

• Infographics:

  • are usually static

  • are artful

  • show less data and more conclusions


Difference between Data Viz and Infographics

• Data Viz:

  • gives users the right information

  • is fully interactive and scalable

  • is visually appealing

  • works on any device

  • puts advanced analytics at your fingertips


Telling a Data Driven Analytics Story


«Conversing» with the data:


Phenomenon = Expenses in the UK out of control


Two problems:

1. Travel expenses show a worrying spike

2. Salary expenses are growing steadily


Travel expenses: too many out-of-policy hotel expenses in July / August


Salaries: overtime costs are rising relative to base salary >> but why?


High turnover at the Call Center in July ... mostly for financial reasons


Among the employees who resigned, clusters can be identified, along with correlations between cost centers and the reasons for resigning
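One way to quantify a correlation between cost centers and resignation reasons is Cramér’s V, computed from the chi-square statistic of their contingency table. The slide does not say which technique the tool applied, so treat this as an illustrative choice; the `cramers_v` name and the sample records are hypothetical, not data from the slides.

```python
import math
from collections import Counter


def cramers_v(pairs):
    """Cramér's V between two categorical variables, in [0, 1].

    pairs: (x, y) observations, e.g. (cost_center, resignation_reason).
    0 means no association in the sample; 1 means perfect association.
    """
    pairs = list(pairs)
    n = len(pairs)
    joint = Counter(pairs)
    rows = Counter(x for x, _ in pairs)
    cols = Counter(y for _, y in pairs)
    k = min(len(rows), len(cols)) - 1
    if n == 0 or k <= 0:
        return 0.0  # no data, or one of the variables is constant
    # Pearson chi-square statistic over the full contingency table.
    chi2 = 0.0
    for x in rows:
        for y in cols:
            expected = rows[x] * cols[y] / n
            chi2 += (joint[(x, y)] - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * k))


# Hypothetical records, not data from the slides.
sample = [("CC-North", "pay"), ("CC-North", "pay"), ("CC-South", "hours"), ("CC-South", "pay")]
print(round(cramers_v(sample), 3))
```

A value near 1 means the cost center largely determines the stated reason; on small samples the statistic is biased upward, so it is a pointer for drill-down rather than proof.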


Oracle Data Visualization Capabilities

Advanced Analytics

Any Device

Ask & Search


Data MashUp & Discovery


Introducing

Day by Day

A new app from the BI Mobile labs that will learn what users are interested in, when & where they are interested in it, and who they collaborate with


Introducing

Synopsis

Visual, interactive and intuitive

Works in-line with the apps you know and love

Start analyzing directly from email, and don’t just look at your data… understand it

Go to the App Store or Google Play and search for «Oracle Synopsis»


DATA

SEE & DETECT

CONNECT & PREPARE

MODEL & BUILD

DEPLOY & SCALE

LEARN & SHARE


Everyone can contribute to:

Find Hidden Patterns

Build Collective Intelligence

Liberate All Data

Create Agile Enterprise

Adapt to Your Needs

• Can you do it using just Excel?


ANKI Overdrive Oracle Cloud Demo

Oracle Cloud integrated applications and platform services showcased in a real racing-car demo

Road to Big Data From Analytics Big Bang to Cloud Revolution


#RoadToBigData