2/5/2015
The many sources and rapid growth of data require a new approach
• Megabyte: 10⁶ bytes
• Gigabyte: 10⁹ bytes
• Terabyte: 10¹² bytes. 500 TB per day in …
• Petabyte: 10¹⁵ bytes. The CERN Large Hadron Collider generates 1 PB per second
• Exabyte: 10¹⁸ bytes. 1 EB of data is created on the Internet every day = 250 million DVDs
• Zettabyte: 10²¹ bytes. 1.3 ZB of network traffic by 2016
• Yottabyte: 10²⁴ bytes. This is our universe today = 250 trillion DVDs
• Brontobyte: 10²⁷ bytes. This will be our digital universe tomorrow, with data from the IoT
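The DVD equivalences in the data-scale list can be checked with a few lines of arithmetic. This is a minimal sketch that assumes about 4 GB per DVD (an assumption: a single-layer DVD actually holds 4.7 GB), which is the capacity that makes both the exabyte and the yottabyte figures come out exactly.

```python
# Decimal (SI) byte units, as used in the list: each step is a factor of 10^3.
UNITS = {
    "MB": 10**6,
    "GB": 10**9,
    "TB": 10**12,
    "PB": 10**15,
    "EB": 10**18,
    "ZB": 10**21,
    "YB": 10**24,
}

# Assumption: ~4 GB per DVD. A real single-layer DVD holds 4.7 GB,
# which would give somewhat smaller counts.
DVD_BYTES = 4 * UNITS["GB"]


def dvd_equivalent(amount: float, unit: str, dvd_bytes: int = DVD_BYTES) -> float:
    """Return how many DVDs of size dvd_bytes are needed to hold amount * unit bytes."""
    return amount * UNITS[unit] / dvd_bytes


if __name__ == "__main__":
    print(f"1 EB = {dvd_equivalent(1, 'EB'):,.0f} DVDs")  # 250,000,000
    print(f"1 YB = {dvd_equivalent(1, 'YB'):,.0f} DVDs")  # 250,000,000,000,000
```

With the real 4.7 GB capacity, pass `dvd_bytes=4_700_000_000`; the counts drop by roughly 15%.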
Discover, explore, and combine any data
• Find any data right from Excel: corporate, social, machine, Hadoop, open data
• Easily merge, transform, and clean up data
Explore & Visualize
Predictive Analytics
• Forecasting/extrapolation: What if these trends continue?
• Predictive modeling: What will happen next?
• Optimization: What's the best that can happen?
[Figure: competitive advantage and value across Presentation, Exploration, Discovery, and Intelligence; modes labeled Passive, Interactive, and Proactive]
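Forecasting by extrapolation answers "what if these trends continue?" The simplest version fits a straight line to past values by ordinary least squares and extends it forward. This is a minimal sketch under that assumption (a linear trend, evenly spaced periods); the sales figures are hypothetical.

```python
from typing import Sequence


def fit_line(ys: Sequence[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = a + b*t with t = 0, 1, ..., n-1; returns (a, b)."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two points to fit a trend")
    ts = range(n)
    t_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    # Slope = covariance(t, y) / variance(t); intercept passes through the means.
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
    sxx = sum((t - t_mean) ** 2 for t in ts)
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return intercept, slope


def extrapolate(ys: Sequence[float], steps: int) -> list[float]:
    """Extend the fitted line `steps` periods past the last observation."""
    a, b = fit_line(ys)
    n = len(ys)
    return [a + b * (n + k) for k in range(steps)]


if __name__ == "__main__":
    # Hypothetical monthly sales (illustrative only).
    sales = [100, 110, 120, 130]
    print(extrapolate(sales, 2))  # [140.0, 150.0]
```

Predictive modeling and optimization go further than this: they model drivers of the outcome rather than just continuing a trend line.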
What is Machine Learning?
Predictive computing: systems become smarter with experience.
Supervised learning: we want to learn a mapping from the input to the output; the correct values are provided by a supervisor:
• Fraud Detection
• Image Recognition
• Spam Filter
• Sales Forecast
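In supervised learning the training data arrives with its correct labels, and new inputs are labeled by generalizing from those examples. One of the simplest such methods is k-nearest neighbors (nearest-neighbor methods also appear in the algorithm list). This is a minimal sketch; the two pre-scaled features and the "legit"/"fraud" transactions are hypothetical.

```python
import math
from collections import Counter
from typing import Sequence

Point = tuple[float, ...]


def knn_predict(train: Sequence[tuple[Point, str]], x: Point, k: int = 3) -> str:
    """Label x by majority vote among its k nearest labeled training examples."""
    if not 1 <= k <= len(train):
        raise ValueError("k must be between 1 and the number of training examples")
    # Sort the labeled examples by Euclidean distance to x and keep the k closest.
    nearest = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    # The supervisor's labels do the work: the most common label among neighbors wins.
    # Use an odd k with two classes to avoid ties.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical transactions with two pre-scaled features, labeled by a supervisor.
    labeled = [
        ((0.10, 0.20), "legit"),
        ((0.20, 0.10), "legit"),
        ((0.15, 0.30), "legit"),
        ((0.90, 0.80), "fraud"),
        ((0.85, 0.95), "fraud"),
        ((0.80, 0.90), "fraud"),
    ]
    print(knn_predict(labeled, (0.20, 0.20)))  # legit
    print(knn_predict(labeled, (0.90, 0.90)))  # fraud
```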
Unsupervised learning: we want to find regularities in the data; the class labels of the training data are unknown:
• Customer Segmentation
• Movie Recommendation Engines
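Customer segmentation is a typical unsupervised task: there are no labels, so the algorithm has to find groups on its own. A common choice is k-means (Lloyd's algorithm), which alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its points. This is a minimal sketch; the customer features are hypothetical, and the deterministic "first k points" seeding is a simplification (real implementations typically use random or k-means++ seeding).

```python
import math
from typing import Sequence

Point = tuple[float, ...]


def kmeans(points: Sequence[Point], k: int, iters: int = 100) -> tuple[list[Point], list[int]]:
    """Lloyd's k-means: return (centroids, cluster index for each point)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Simplified, deterministic seeding: the first k points are the initial centroids.
    centroids = [tuple(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, new_assignment) if a == c]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster's centroid in place
        # Stop once neither the assignments nor the centroids change.
        if new_assignment == assignment and new_centroids == centroids:
            break
        assignment, centroids = new_assignment, new_centroids
    return centroids, assignment


if __name__ == "__main__":
    # Hypothetical customers: (annual spend in $k, store visits per month).
    customers = [(1.0, 1.0), (8.0, 9.0), (1.5, 2.0), (9.0, 8.0), (1.0, 1.5), (8.5, 8.5)]
    centers, segments = kmeans(customers, k=2)
    print(segments)  # [0, 1, 0, 1, 0, 1]
```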
Several scenarios across diverse industries:
• Churn analysis
• Predictive maintenance
• Spam filtering
• Ad targeting
• Recommendation engines
• Fraud detection
• Image detection & classification
• Forecasting
• Anomaly detection
Harvard Business Review, Thomas H. Davenport, October 2012
Business Problem → Modeling → Deployment → Business Value
Azure Machine Learning
Data → ML Studio → Publish API → Azure Machine Learning API → Devices & Applications
Data sources: HDInsight, SQL Server VM, SQL Database, Blobs & Tables, desktop files, Excel spreadsheets, other data files on PC
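Once a model is published as an API, devices and applications consume it by sending input rows over HTTPS and reading back predictions. The sketch below only builds such a request with the Python standard library and does not send it. Everything service-specific is an assumption: the URL and key are placeholders to be copied from the published service's API page, and the JSON body layout ("Inputs" → "input1" → "ColumnNames"/"Values", plus "GlobalParameters") is modelled on the classic Studio request-response format as I recall it, so check it against your service's own API documentation.

```python
import json
import urllib.request
from typing import Any, Sequence

# Placeholders (hypothetical): copy the real values from the published service's API page.
SCORING_URL = "https://example.invalid/score"
API_KEY = "<your-api-key>"


def build_scoring_request(
    column_names: Sequence[str],
    rows: Sequence[Sequence[Any]],
    url: str = SCORING_URL,
    api_key: str = API_KEY,
) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request asking a published model to score rows."""
    # Assumed body layout, modelled on the classic Studio request-response format.
    body = {
        "Inputs": {
            "input1": {
                "ColumnNames": list(column_names),
                "Values": [list(row) for row in rows],
            }
        },
        "GlobalParameters": {},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical input columns for a fraud-scoring model.
    req = build_scoring_request(["amount", "hour"], [[250.0, 3]])
    print(req.get_method(), req.full_url)
    # Against a live endpoint, urllib.request.urlopen(req) would send it;
    # json.load() on the response then yields the scored output.
```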
Algorithms
• R open source packages
• Mathematical programming
• Online analytical processing
• Graph analytics
• Text analytics
• Support vector machines
• Boosted decision trees
• Time series processing
• Association rule mining
• Neural networks
• Regression analysis
• Clustering
• Nearest-neighbor
In the future: support for extensibility, enabling users to add their own algorithms as modules
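Association rule mining looks for rules of the form "customers who buy A also buy B" and ranks them with two measures: support (how often the items occur together) and confidence (how often B appears given A, i.e. support(A ∪ B) / support(A)). This is a minimal sketch of those two measures only, not a full rule miner such as Apriori; the shopping baskets are hypothetical.

```python
from typing import Iterable, Sequence


def support(transactions: Sequence[set[str]], itemset: Iterable[str]) -> float:
    """Fraction of transactions that contain every item in itemset."""
    items = set(itemset)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(
    transactions: Sequence[set[str]], antecedent: Iterable[str], consequent: Iterable[str]
) -> float:
    """Confidence of the rule antecedent -> consequent: support(A ∪ B) / support(A)."""
    a = set(antecedent)
    both = a | set(consequent)
    sup_a = support(transactions, a)
    if sup_a == 0:
        raise ValueError("antecedent never occurs in the transactions")
    return support(transactions, both) / sup_a


if __name__ == "__main__":
    # Hypothetical shopping baskets (illustrative only).
    baskets = [
        {"bread", "milk"},
        {"bread", "butter"},
        {"bread", "milk", "butter"},
        {"milk"},
    ]
    print(support(baskets, {"bread", "milk"}))       # 0.5
    print(confidence(baskets, {"bread"}, {"milk"}))  # about 0.667, i.e. 2/3
```

A rule miner would enumerate candidate itemsets and keep only rules whose support and confidence exceed chosen thresholds.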
The Microsoft Cybercrime Center
Fraud Detection
The Impact of Cybercrime
• Cybercrime costs consumers $113 billion a year
• Every second, 12 people become victims of cybercrime: nearly 400 million every year
• 50% of online adults have been victims in the past year
• 1 in 5 small and medium enterprises are targeted by cyber criminals
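The per-second and per-year victim counts are consistent with each other, as a quick calculation shows. This sketch assumes a non-leap 365-day year.

```python
# Seconds in a non-leap year: 365 days * 24 h * 60 min * 60 s.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000

victims_per_second = 12
victims_per_year = victims_per_second * SECONDS_PER_YEAR

print(f"{victims_per_year:,}")  # 378,432,000
```

Roughly 378 million victims a year, which the slide rounds to "nearly 400 million".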
DEMO