#RoadToBigData
Copyright © 2016 Oracle and/or its affiliates. All rights reserved. |
Roberto Falcinelli, Business Analytics Sales Consulting Senior Manager
Raffaele Corti, Business Analytics Principal Sales Consultant
How to become a Data Driven Company: Maximizing the Data Capital through an end-to-end Analytic Journey
Nino Guarnacci, Paolo Piccioni, Cristian Spigariol, Roberto Falcinelli
Road to Big Data: From Analytics Big Bang to Cloud Revolution
Meet the Oracle Experts
How to become a Data Driven Company
1. Mindset Shift: Data is not just an issue to manage but the cornerstone on which to build your strategy
2. Organize Data: A data strategy that relies on data silos is no longer affordable
3. Everyone is a Data Analyst: Everyone in the company should contribute their own insights
4. Experiment with Data: Innovation in products and services starts with Data Labs
Nino Guarnacci
Digital Contamination
Digital Trends
Internet of Things, Big Data, Machine Learning, Chatbot
Oracle Cloud Technology Enablers
[Diagram: five numbered stages spanning Big Data, Machine Learning, Storage Cloud, Chatbot and Mobile]
• Connect, Provision, Secure
• Correlate, Aggregate, Geo-Fence, Act
• Cleansing, Wrangling, Store
• Discover, Understand, Training Models
• Use Insights, Apply Models in Real Time, Share Common Knowledge, Interact
Industry 4.0 with IoT & Advanced Analytics
Produce Tons of Data
• Understand failure events
• Check machine working values against a single threshold
• Monitor goods location
Unlock the value of Digitalization through Stream & Advanced Analytics
• With more Efficiency: analyze merged data streams, discover anomalies
• With more Productivity: minimize line downtime, predict events
• With more Flexibility: track assets and processing, gain higher utilization and control
[Diagram labels: Big Data, ERP, MES, Real-Time]
• Are there patterns of events that cause the equipment to fail?
• What are the top factors / influencers that affect product yield?
• What’s the downstream impact of yield change or defective parts?
• Is there a correlation between machine parameters and product quantity?
• Can we predict the likelihood of certain product defects?
• Are there assets used improperly or in caution areas?
Industry 4.0 with IoT & Advanced Analytics
• Is there a correlation between machine parameters and product quantity?
{ "PLANT": "A23", "MACHINE": 34, "PMIN": 110, "ODEV": 0.7, "TEMP": 38, "PRES": 3.4, "STATUS": "PAUSE", "LAT": 14.453, "LON": 42.673 }
Telemetry events: 120 msg/sec × 6 machines, each sending 7 parameters, one of which is a covariance indicator
Real-time clustering of events by covariance and quantity, to discover the low-density target cluster
Merge Real-Time & ERP Data
Industry 4.0 with IoT & Advanced Analytics
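A minimal sketch of how the correlation question above could be checked on telemetry events of the shape shown on the slide. It assumes that PMIN is a production-rate field and TEMP a machine parameter (the slide does not define the fields), and the four sample events are synthetic values made up for illustration.

```python
import json
from math import sqrt

# One telemetry event in the shape shown on the slide.
raw = ('{"PLANT": "A23", "MACHINE": 34, "PMIN": 110, "ODEV": 0.7, "TEMP": 38, '
       '"PRES": 3.4, "STATUS": "PAUSE", "LAT": 14.453, "LON": 42.673}')
event = json.loads(raw)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Synthetic events (assumption): temperature rises while the rate drops.
events = [
    {"TEMP": 36, "PMIN": 118},
    {"TEMP": 38, "PMIN": 110},
    {"TEMP": 41, "PMIN": 101},
    {"TEMP": 45, "PMIN": 90},
]
temps = [e["TEMP"] for e in events]
rates = [e["PMIN"] for e in events]
r = pearson(temps, rates)
print(f"correlation(TEMP, PMIN) = {r:.3f}")
```

A coefficient close to -1 or +1 would suggest a strong linear relationship worth investigating; it does not by itself establish a cause.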
Paolo Piccioni
Data Lab: where you can even find things you were not looking for
Data Lab to Find Savings and Cost Reductions in Health Care Budget
• United Kingdom’s National Health Service
• Identify billing and identity fraud
• Optimize treatment by reducing use of less effective medical procedures
$156M potential savings identified
“With one vendor providing the whole solution, it’s very easy for us.” - Nina Monckton, NHS BSA
Data Lab: Fuel Enterprise Innovation
• Identify new customer segments
• Predict maintenance activities
• Detect fraudulent activity
• Better manage risk
• Create innovative products and services
• Optimize pricing
The data lab enables organizations to think and act like startups.
Data Lab Foundation
• Data Lake: the store for all data
• Data Visualization: easier, visual, faster, powerful, self-service
• Machine Learning: provides a broad range of ML algorithms based on open source, market-leading technologies
• Data Discovery (the Lab Core): explores available data and their relationships, transforms data on the fly, discovers hidden patterns
• Security: secures data, provides access control, profiles users according to their roles
Data Lab Demo Story: Maintenance Insight
Starring: Operations Analyst, Data Scientist
Failure Likelihood
Assets on map
On this dashboard, an Operations Analyst can monitor assets on a map and keep “Failure Likelihood” under control. Spotting problems before they happen means identifying revenue at risk. How can this be done? Who provided this information, and how?
Operation Analyst
Search for data
The Analyst starts to talk with the data. He has access to the full catalog of Datasets available and can browse and search it as easily as shopping online. He starts searching for Sensor Data, finding a good starting point: a recent sensor readings dataset.
Operation Analyst
Start with the data. Here are the datasets
Sensor reading: power
Sensor reading: temperature
Sensor reading: pressure
1.2M-record dataset
In the dataset he sees all the attributes lined up graphically and notices that, across these 1.2M records, power, temperature and pressure can be zero. A zero reading alone does not prove an actual failure; it may reflect just a single point in time.
Operation Analyst
Maintenance records file
Safety incident records file
Work orders file
Sensor readings
Using the sensor dataset as a starting point is good, but to really figure out when failures happened we are going to use several other datasets. To decide whether a past sensor reading was a real failure, we need to build and train a predictive model that looks for similar patterns in the data. That is why we need to read and sift through the maintenance records, safety incident records, and work orders datasets.
Operation Analyst
Group continuous data into 4 buckets for easier searching and filtering
Looking at the raw datasets, we realize that we need to transform them to make them more useful. For instance, we group the different amperage readings into buckets, because that is a better way to filter and find bad equipment: equipment that fails would show no power, or really low power. Now we can start building a Discovery Dashboard...
Operation Analyst
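The bucketing transform described on this slide can be sketched in a few lines. This is an illustration only: the bucket edges and three of the four labels are hypothetical, since the demo names only the “power_off” bucket and does not give the amperage ranges.

```python
from bisect import bisect_right

# Hypothetical bucket edges (amperes) and labels; only "power_off" is named in the demo.
EDGES = [0.5, 5.0, 15.0]
LABELS = ["power_off", "low", "normal", "high"]


def power_bucket(amps):
    """Map a continuous amperage reading to one of four labeled buckets."""
    return LABELS[bisect_right(EDGES, amps)]


readings = [0.0, 0.3, 4.2, 9.8, 22.1]
buckets = [power_bucket(a) for a in readings]
print(buckets)
```

Filtering on a label such as “power_off” is then a simple equality test, which is what makes the bucketed attribute easier to search than the raw readings.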
Need to filter down 39K notifications
Refine by searching for the new “power_off” bucket
...the enriched data is immediately available in my discovery dashboard, and because I have this new bucket for power ranges, I can go from 39K notifications to a more reasonable (and readable) number.
Operation Analyst
I’m down to 8 records.
Select the equipment with the most “power_off” readings
The first filter is on “power_off”. The second picks the equipment with the most “power_off” readings, which I did from the guided navigation. Now I can read the maintenance records and pick out the real failures for this equipment, then go to the next one, and so on.
Operation Analyst
Read the text of the notes
See the related incidents
I see the related incidents, read the text of the notes and logs, and use my experience in the field to determine whether something was really a failure. Finally, I flag the failures in the sensor data, thanks to the transform capabilities. Then I share the new, cleaned data with my Data Scientist friend.
Data Scientist
Operation Analyst
Here is where the data scientist jumps in with his particular set of tools. This is a Jupyter notebook. Our Data Scientist, who is part of the team, writes the model in Python or R, both popular programming languages for statistics. The point here is the ease of collaboration and data sharing between analysts and data scientists working together.
Data Scientists
Of the 15 months of historical data, the data scientist used 12 months to “train” the model and then the last 3 months of data to test it.
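A chronological split like the one described above might look as follows. This is a sketch under assumptions: the start date, the record layout (`day`, `failed`) and the helper names are hypothetical, and the monthly records are synthetic.

```python
from datetime import date


def add_months(d, months):
    """Return the first day of the month that is `months` after d's month."""
    index = d.year * 12 + (d.month - 1) + months
    return date(index // 12, index % 12 + 1, 1)


def split_by_time(records, start, train_months=12):
    """Chronological split: records before the cutoff train, the rest test."""
    cutoff = add_months(start, train_months)
    train = [r for r in records if r["day"] < cutoff]
    test = [r for r in records if r["day"] >= cutoff]
    return train, test


# Synthetic history (assumption): one labeled record per month for 15 months.
start = date(2015, 1, 1)
records = [{"day": add_months(start, m), "failed": m % 5 == 0} for m in range(15)]

train, test = split_by_time(records, start)
print(len(train), len(test))
```

Splitting by time rather than at random keeps the test period strictly after the training period, so the model is evaluated the way it would be used: on data it has not yet seen.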
Non-data scientists will find it much simpler to understand the model’s results by looking at its output on a chart, which also builds confidence in the results. The chart compares historical data with what the model says would have happened. They match well, so we can trust the model.
Operation Analyst
Data Scientist
Actual Equipment Failure Data
Predicted Equipment Failure Data
Then the model is operationalized and your dashboards updated. The right people get alerts in real time and take action. Dashboards show predictions in reports, and data is there for operational discovery.
Operation Analyst
And he can continue working in the Data Lab on more and better models, predicting inventory, scheduling workers, weather impacts, and performance. It becomes essential to your work.
Data Scientist
Oracle Big Data Discovery: A Modern Data Lab for Everyone
catalog
transform
discover
collaborate
predict
explore
BigData + Big Data Discovery = Enabled Data Lab
BDA Cluster
BDD Node(s): Oracle Big Data Discovery
• Big Data Discovery Web Studio
• Big Data Discovery Dgraph
Hadoop Data Nodes: BDD Data Processing on YARN
• Sampling, Profiling, Auto-Enrichment, Cataloging, Data Set Transforms, Data Set Enrichments
Hadoop Nodes: Oracle Big Data
• Cloudera Enterprise
• Oracle Big Data Connectors (includes Oracle RAAH)
• Oracle Big Data Spatial and Graph
• Oracle Data Integrator Enterprise Edition
• Oracle Big Data SQL (add-on)
From Data Lab to Process
Stages: Lab, Mainstream
• Collect source data and explore their contents
• Select and prepare data for exploitation
• Experiment on data through advanced analytics
• Bring the value into production
• Transform workplace and workforce through insights
Roles: Experts, Data Scientists, Consumers
Advanced Analytics Approach
Induction: “Data Driven Research”, reasoning from the data to the general theory
Machine Learning on the Process
Data Discovery in the Lab (Data Scientists, Experts)
Source data are initially explored to uncover hidden relationships. This is the basis for picking the relevant features to feed prediction models (“feature engineering”).
Machine Learning in the Lab (Data Scientists)
Once the data context has been outlined and the most relevant features identified, ML models can be built and evaluated over historical and new (lab) data.
Advanced Analytics in the Mainstream (Consumers)
The final step is to run ML models, as well as newly found patterns, in the mainstream and make their outcome available to the broad user community through Data Visualization and Business Intelligence (i.e. historical or current data to be “scored” for predictions).
Cristian Spigariol
Uses of Machine Learning across Industries: Typical use case scenarios
Classification (predict among a set of options)
• Find and prevent customer churn
• Target the right customer with the right offer
• Predict customer response to an affinity card program
Regression (estimate a missing value)
• Predict how much a customer will spend
Clustering (find unknown patterns)
• Detect anomalous or suspicious activities
Association Rules (find correlations)
• Predict correlation among items
Graph Analysis (understand interactions)
• Understand influencers in social networks
Machine Learning through Graphs
• Graph systems focus on relationships rather than entities
– They are key to understanding highly connected systems and their behaviours (i.e. areas of strong/weak interaction) by examining how relationships spread throughout the graph
• Graph algorithms are self-contained
– The answer to complex problems resides in how entities (nodes) interact and not in the entities themselves or in external resources
– Graph algorithms are effective even with graphs based on entities with few properties
• Graphs cover a broad range of applications
– Their simple and flexible data model can describe a broad range of use cases, from financial systems and human neural networks to infrastructure networks (transportation, telco, electricity)
Marketing Analyses using Graphs
Graph algorithms can strongly improve the effectiveness of marketing analyses.
In customer profiling, we can extend the individual profile of a given customer by considering his/her ability to influence a circle of friends.
In marketing campaigns, identifying influencers can amplify the echo of the related promotional activities and increase the conversion rate.
In marketing campaigns, identifying strongly connected communities (people who interact on the basis of shared behaviors) can be the basis for customer segmentation.
Network Analysis using Graphs
Graph algorithms are extremely useful for optimizing networks.
Detection of weak links aims at identifying nodes in a transportation/telco/energy network that carry a high number of flows and are not balanced by a proper number of alternative paths (Betweenness centrality).
Network flow analysis consists in assigning a capacity to each connection (i.e. link between two nodes) and evaluating the total amount of flow that passes over it. The amount of flow on an edge cannot exceed the capacity of the edge.
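To make the weak-link idea concrete, here is a minimal sketch of betweenness centrality on a small unweighted, undirected graph, using Brandes’ algorithm as one standard way to compute it. The six-node network is made up for illustration: two triangles joined by a single link between `c` and `d`, so those two nodes carry every flow between the two halves.

```python
from collections import deque


def betweenness(graph):
    """Brandes' algorithm for an unweighted, undirected graph (adjacency lists)."""
    centrality = {v: 0.0 for v in graph}
    for s in graph:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack = []
        preds = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}
        dist = {v: -1 for v in graph}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies, visiting the farthest nodes first.
        delta = {v: 0.0 for v in graph}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                centrality[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: c / 2 for v, c in centrality.items()}


# Hypothetical network: triangles a-b-c and d-e-f joined by the link c-d.
network = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"],
    "d": ["c", "e", "f"], "e": ["d", "f"], "f": ["d", "e"],
}
scores = betweenness(network)
print(scores)
```

Nodes with a high score and no alternative path around them, like `c` and `d` here, are the candidates for weak links.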
Collaborative Filtering Pattern
If a person A has the same opinion as a person B on an issue, A is more likely to have B's opinion on a different issue than that of a randomly chosen person (Wikipedia).
1. Find similarities: find the people who show the same behavior as person A.
2. Select potential targets: select the items chosen by the people similar to person A.
3. Rank output by relevance: among the potential targets, favor the items with the highest relevance rank.
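The three steps of the pattern can be sketched as follows. This is a toy illustration, not the method used in the presentation: the purchase data is made up, Jaccard similarity is one possible choice of similarity measure, and the `min_similarity` threshold is a hypothetical parameter.

```python
# Hypothetical purchase history; names and items are made up for illustration.
purchases = {
    "A": {"tea", "mug"},
    "B": {"tea", "mug", "kettle"},
    "C": {"tea", "honey", "kettle"},
    "D": {"laptop"},
}


def jaccard(a, b):
    """Share of items two people have in common, between 0 and 1."""
    return len(a & b) / len(a | b)


def recommend(person, purchases, min_similarity=0.2):
    mine = purchases[person]
    # Step 1: find the people who behave like `person`.
    similar = {}
    for other, items in purchases.items():
        if other != person:
            sim = jaccard(mine, items)
            if sim >= min_similarity:
                similar[other] = sim
    # Step 2: collect the items they chose that `person` does not have yet.
    scores = {}
    for other, sim in similar.items():
        for item in purchases[other] - mine:
            scores[item] = scores.get(item, 0.0) + sim
    # Step 3: rank the potential targets by accumulated relevance.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


ranked = recommend("A", purchases)
print(ranked)
```

Here the kettle ranks first because both of the similar people chose it, while person D contributes nothing because D shares no items with A.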
Recommendation using Graph Analytical Algorithms
By using a centrality algorithm (Personalized PageRank) we can determine the most influential people in the circle of connections originating from Alice.
Circle of trust: Ricky 0.4, Simon 0.3, Lucia 0.1 (then John, Maria, ...)
We move from similarities to trust!
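A minimal sketch of Personalized PageRank by power iteration, where every restart jumps back to the source person. The follow graph below is hypothetical (the slide’s actual connections are not in the text), so the scores it produces will not match the 0.4 / 0.3 / 0.1 shown above; the damping factor of 0.85 is a common default, assumed here.

```python
def personalized_pagerank(graph, source, damping=0.85, iterations=50):
    """Power iteration in which all restart probability goes to `source`."""
    rank = {node: 0.0 for node in graph}
    rank[source] = 1.0
    for _ in range(iterations):
        nxt = {node: 0.0 for node in graph}
        for node, neighbours in graph.items():
            if neighbours:
                share = damping * rank[node] / len(neighbours)
                for n in neighbours:
                    nxt[n] += share
            else:
                # Dangling node: hand its mass back to the source.
                nxt[source] += damping * rank[node]
        nxt[source] += 1.0 - damping
        rank = nxt
    return rank


# Hypothetical "who trusts whom" graph around Alice.
graph = {
    "Alice": ["Ricky", "Simon", "Lucia"],
    "Ricky": ["Simon", "John"],
    "Simon": ["Ricky", "Maria"],
    "Lucia": ["Maria"],
    "John": [],
    "Maria": [],
}
rank = personalized_pagerank(graph, "Alice")
circle = sorted((p for p in rank if p != "Alice"), key=rank.get, reverse=True)
print(circle)
```

Because the random walk always restarts at Alice, the scores measure closeness to her specifically rather than global popularity, which is what turns a ranking into a circle of trust.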
Recommendation using Graph Analytical Algorithms
We determine the potential targets by selecting the products already bought by the trusted people (bipartite graph).
Circle of trust: Ricky, Simon, Lucia. Targets: Prod#3, Prod#4, Prod#7, Prod#9.
We start the relevance algorithm (SALSA) by measuring the relevance score, that is, the sum of the preferences received by each product.
Relevance scores: Prod#3 (2), Prod#4 (1), Prod#7 (2), Prod#9 (1).
We then walk the connections back to front to measure the hub score of each trusted person as the sum of the relevance ranks of the products they bought. It measures the ability of each trusted person to intercept the tastes of the circle.
Hub scores: Ricky (2), Simon (4), Lucia (4).
The new relevance score is the weighted sum of the preferences (hub ranks) received.
New relevance scores: Prod#3 (6), Prod#4 (4), Prod#7 (8), Prod#9 (4).
Prod#7 has the highest likelihood of being well accepted by Alice, since it has been chosen by the most “knowledgeable” trusted people.
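The scores on these slides can be reproduced with a short worked example. The purchase edges below are an assumption: the diagram itself is not part of the text, and this edge set was chosen because it yields every score shown. The code follows the unnormalized sums described on the slides, which is a simplification of SALSA; production versions typically normalize the scores at each step.

```python
# Assumed purchase edges (not in the source text); they reproduce the slide scores.
bought = {
    "Ricky": ["Prod#3"],
    "Simon": ["Prod#3", "Prod#7"],
    "Lucia": ["Prod#4", "Prod#7", "Prod#9"],
}


def relevance_scores(bought, hub):
    """Each product sums the hub weights of the people who bought it."""
    scores = {}
    for person, products in bought.items():
        for product in products:
            scores[product] = scores.get(product, 0) + hub[person]
    return scores


def hub_scores(bought, relevance):
    """Each person sums the relevance of the products they bought."""
    return {
        person: sum(relevance[product] for product in products)
        for person, products in bought.items()
    }


initial_hub = {person: 1 for person in bought}   # one vote per purchase
relevance_1 = relevance_scores(bought, initial_hub)
hub_1 = hub_scores(bought, relevance_1)
relevance_2 = relevance_scores(bought, hub_1)
best = max(relevance_2, key=relevance_2.get)
print(relevance_1, hub_1, relevance_2, best)
```

Repeating the last two steps alternates between hub and relevance scores; without normalization the raw numbers keep growing, so only their relative order is meaningful.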
The algorithm can be iterated many times. Each iteration reinforces the rank score and the relevance score. More iterations make the algorithm more effective, up to the point where the scores stabilize.
Graph Analytical Algorithms: Possible use cases
This recommendation approach is the basis of the WTF (Who to Follow) service at Twitter. It can be profitably applied to different industries, for example:
• to recommend insurance policies based on the most relevant opinions of “trusted” people
• to up-sell telco services with the same trust + expertise approach
Do it yourself? If so, please consider...
• Complexity: non-trivial algorithms that need domain-specific knowledge
• Productivity: bug fixing, tuning for precision and performance, support
• Architecture: graph algorithms need in-memory parallel execution as well as low-latency NoSQL storage
• Integration: you need to integrate your solution with the Big Data cluster to feed your graph database
Oracle Big Data Spatial and Graph – In-Memory Analyst
A rich set of built-in, parallel algorithms
• Detecting Components and Communities: Tarjan’s, Kosaraju’s, Weakly Connected Components, Label Propagation (w/ variants), Sparsification
• Ranking and Walking: PageRank, Personalized PageRank, Betweenness Centrality (w/ variants), Closeness Centrality, Degree Centrality, Eigenvector Centrality, HITS, Random walking and sampling (w/ variants)
• Evaluating Community Structures: Conductance, Modularity, Clustering Coefficient (Triangle Counting)
• Path-Finding: Hop-Distance (BFS), Dijkstra’s, Bi-directional Dijkstra’s, Bellman-Ford’s
• Link Prediction: SALSA (Twitter’s Who-to-follow)
• Other Classics: Vertex Cover
Parallel graph mutation operations
[Diagram: the original graph (nodes a to i) transformed into an Undirected Graph, a Simplified Graph, a Bipartite Graph (Left Set: “a,b,e”), a Sort-By-Degree (Renumbering) ordering, and a Filtered Subgraph]
Machine Learning is Data Driven
• Data is the fuel of ML algorithms
• The effectiveness of ML algorithms is strictly tied to the amount of available data
• To translate ML results into a competitive advantage, we need a paradigm shift in the way information management solutions are designed and managed.
A Data Driven Strategy with Machine Learning
New Paradigm: Adopt Standards, Don’t Move Data, Broadcast ML Results, Experiment and Act
• Data is heavy: don’t move it
• Move processing to the data instead
• Reduce complexity
• Facilitate integration
• Speak the language of Data Scientists (R, Python, Scala, Spark, Gremlin)
• Take advantage of new ML package releases (e.g. CRAN, MLlib)
• Define your models (Lab) and then move them into the mainstream (Prod)
• Score your models continuously (both in batch and in streaming)
• Keep them up to date
• Spread ML results throughout the user communities
• Predictions are new inputs for in-place processes or analyses (additional KPIs, properties, etc.)
Roberto Falcinelli
To us, data is an asset and a heritage:
we just need to find the right way to “look inside”,
to see what we normally do not see.
Data Driven Investigation: a powerful story
Understanding Data Visualization
• It is the study of how to represent data by using a visual approach rather than the traditional reporting method
• It is a visual way of telling a “story”
(*) Antoine De Saint Exupery - Le Petit Prince – Chapitre 1
Difference between Data Viz and Infographics
• Infographics:
– are usually static
– are artful
– offer less data, more conclusions
• Data Viz:
– gives users the right info
– is fully interactive and scalable
– is visually appealing
– works on any device
– puts advanced analytics at your fingertips
Telling a Data Driven Analytics Story
“Conversing” with the data:
Phenomenon: expenses in the UK are out of control
Two problems: 1. Travel expenses show a worrying peak 2. Salary expenses are growing steadily
Travel expenses: too many out-of-policy hotel expenses in July/August
Salaries: overtime expenses are rising relative to base salary. But why?
High turnover at the Call Center in July, mainly for economic reasons
Among the employees who resigned, clusters and correlations can be identified between cost centers and the reasons for resignation
Oracle Data Visualization Capabilities
• Advanced Analytics
• Any device
• Ask & Search
• Data MashUp & Discovery
Introducing Day by Day
A new app from the BI Mobile labs that will learn what users are interested in, when and where they are interested in it, and who they collaborate with
Introducing Synopsis
Visual, interactive and intuitive
Works in-line with the apps you know and love
Start analyzing directly from email
Don’t just look at your data… understand it
Go to the Apple Store or Play Store and look for “Oracle Synopsis”
DATA: Connect & Prepare, See & Detect, Model & Build, Deploy & Scale, Learn & Share
Everyone can contribute to:
• Find Hidden Patterns
• Build Collective Intelligence
• Liberate All Data
• Create Agile Enterprise
• Adapt to Your Needs
Can you do it just using Excel?
ANKI Overdrive Oracle Cloud Demo: Oracle Cloud integrated Applications and Platform Services showcased in a real racing car demo
Road to Big Data From Analytics Big Bang to Cloud Revolution
#RoadToBigData