LEVERAGING DATA DRIVEN RESEARCH THROUGH MICROSOFT AZURE

Dr. Miguel Fierro
Data Scientist at Microsoft
@miguelgfierro
[email protected]
https://miguelgfierro.com

Plymouth University | Jan 27, 2017 | Plymouth, UK



AZURE FOR RESEARCH AWARD

Plymouth University January 2017 - Dr. Miguel Fierro @miguelgfierro

[email protected]

Free Azure resources if awarded

Areas: data science, climate, health…

Example: the Alan Turing Institute was awarded $5M in Azure resources

OUTLINE

Spark and Hadoop with Azure

Azure ML Studio

Data Science Virtual Machine


SPARK & HADOOP WITH AZURE

WHAT IS HDINSIGHT

HDInsight: managed service

MANAGER GUI: AMBARI

APACHE HADOOP

Software for storing and analysing massive amounts (~TB) of structured and unstructured data

APACHE SPARK

Framework that runs large-scale data analytics applications

pySpark, Spark (Scala), SparkR

Up to 100x faster than Hadoop MapReduce for some workloads (in-memory processing)

APACHE KAFKA

Stream processing for real time apps

Publisher & subscriber messaging system

Millions of messages per second
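
The publisher and subscriber model above can be illustrated with a toy in-memory broker. This is a minimal sketch in plain Python, not the Kafka client API: the class `InMemoryBroker`, its methods, and the topic and consumer names are all hypothetical. It only shows the core idea that publishers append to a per-topic log while each subscriber tracks its own read offset.

```python
from collections import defaultdict


class InMemoryBroker:
    """Toy stand-in for a message broker: append-only topic logs plus consumer offsets."""

    def __init__(self):
        self._logs = defaultdict(list)    # topic -> ordered list of messages
        self._offsets = defaultdict(int)  # (topic, consumer) -> next index to read

    def publish(self, topic, message):
        """Append a message to the topic log and return its offset."""
        self._logs[topic].append(message)
        return len(self._logs[topic]) - 1

    def poll(self, topic, consumer):
        """Return the messages this consumer has not seen yet and advance its offset."""
        start = self._offsets[(topic, consumer)]
        batch = self._logs[topic][start:]
        self._offsets[(topic, consumer)] = start + len(batch)
        return batch


broker = InMemoryBroker()
broker.publish("clicks", {"user": "a", "page": "/home"})
broker.publish("clicks", {"user": "b", "page": "/docs"})

first_batch = broker.poll("clicks", consumer="dashboard")   # both messages so far
broker.publish("clicks", {"user": "a", "page": "/pricing"})
second_batch = broker.poll("clicks", consumer="dashboard")  # only the new message
archive_batch = broker.poll("clicks", consumer="archiver")  # all three, separate offset
```

Because offsets are kept per consumer, a slow subscriber does not block a fast one, which is one reason log-based brokers scale to high message rates.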

APACHE STORM

Distributed framework for real-time applications

ETL, continuous computation, online machine learning

Over a million operations per second on each node

APACHE HBASE

Non-relational database (NoSQL) for Big Data applications

Distributed, fault-tolerant and scalable

Built on top of HDFS (Hadoop Distributed File System)

APACHE HIVE

SQL-like language to query data in Hadoop systems

Word count program
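
The word count program shown on this slide did not survive extraction. As a rough illustration of what such a query computes on a Hadoop cluster, the map, shuffle and reduce steps can be sketched in plain Python. This is not HiveQL and not the speaker's code; the function names and the sample lines are assumptions made for the example.

```python
from collections import defaultdict


def map_phase(lines):
    """Emit a (word, 1) pair for every word in every line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Group the emitted counts by word, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped


def reduce_phase(grouped):
    """Sum the grouped counts to get one total per word."""
    return {word: sum(counts) for word, counts in grouped.items()}


# Made-up input; on a cluster each line would come from files in HDFS.
lines = ["big data on Azure", "big clusters process big data"]
word_counts = reduce_phase(shuffle_phase(map_phase(lines)))
```

In Hive the same result is expressed declaratively (split, group by word, count), and the engine plans the distributed execution.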

EXAMPLE OF ARCHITECTURE

DEMO: PYSPARK APPLICATION

Log analysis with PySpark

source: https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-apache-spark-custom-library-website-log-analysis

Predictive analysis on food inspection with PySpark

source: https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-apache-spark-machine-learning-mllib-ipython


AZURE ML STUDIO

WHAT IS AZURE ML STUDIO

GUI for Machine Learning

DATA INPUT/OUTPUT

DATA TRANSFORMATION

DATA MANIPULATION

FEATURE SELECTION

CLASSIFICATION & REGRESSION

TRAINING & SCORING

PYTHON & R SCRIPTS

AUTOMATIC API

DEMO: CREDIT RISK ANOMALY DETECTION

source: https://gallery.cortanaintelligence.com/Experiment/1219e87f8fb84e88a2e1b54256808bb3


DATA SCIENCE VIRTUAL MACHINE

WHAT IS THE DSVM

Windows:
- Anaconda with Python, Jupyter notebooks
- Microsoft R Server
- Visual Studio
- SQL Server
- Azure SDK
- Deep learning: CNTK & MXNet
- Machine learning: XGBoost

Linux:
- Anaconda with Python, Jupyter notebooks
- Microsoft R Server
- PyCharm
- Azure SDK
- Deep learning: CNTK & MXNet
- Machine learning: XGBoost, Weka

DEEP LEARNING DSVM

Libs:
- CNTK
- MXNet
- TensorFlow
- Keras

Examples: digit recognition, image recognition

NVIDIA TESLA K80

AI LANDSCAPE: IMAGES

ImageNet (image recognition competition) top-5 error (%)

AlexNet (2012): 15.4%
VGG (2014): 7.3%
Inception (2015): 6.7%
ResNet (2015): 3.6%
Inception-ResNet (2016): 3.1%
Human: 5.1%
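
For readers unfamiliar with the metric, top-5 error is the fraction of images whose true label is not among the model's five highest-scoring classes. A minimal sketch of the calculation, using made-up scores over 8 classes (the function name and the data are assumptions, not from the talk):

```python
def top5_error(scores_per_image, true_labels, k=5):
    """Fraction of images whose true label is not among the k highest-scoring classes."""
    misses = 0
    for scores, label in zip(scores_per_image, true_labels):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
        if label not in ranked[:k]:
            misses += 1
    return misses / len(true_labels)


# Hypothetical class scores for 2 images over 8 classes.
scores = [
    [0.05, 0.30, 0.10, 0.20, 0.15, 0.08, 0.07, 0.05],
    [0.40, 0.25, 0.15, 0.10, 0.05, 0.03, 0.01, 0.01],
]
labels = [2, 7]  # class 2 is in the first image's top five; class 7 is not in the second's
error = top5_error(scores, labels)  # one miss out of two images
```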

AI LANDSCAPE: SPEECH

Microsoft Research achieves human parity in conversational speech recognition

source: http://blogs.microsoft.com/next/2016/10/18/historic-achievement-microsoft-researchers-reach-human-parity-conversational-speech-recognition

CNN (VGG, ResNet, LACE)

RNN (Bi-LSTM)

Multi-GPU and multi-server (1-bit Stochastic Gradient Descent)
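
The idea behind 1-bit SGD is to cut communication between GPUs and servers by sending one bit per gradient element and carrying the quantization error over to the next step. The following is a simplified sketch in plain Python, assuming a zero threshold and one reconstruction value per sign; it is not the CNTK implementation, and the function names are hypothetical.

```python
def one_bit_quantize(gradient, residual):
    """Quantize a gradient to one bit per element, carrying the error forward.

    Returns (bits, (neg_value, pos_value), new_residual).
    """
    # Add the error left over from the previous step (error feedback).
    corrected = [g + r for g, r in zip(gradient, residual)]
    bits = [1 if c >= 0 else 0 for c in corrected]
    pos = [c for c in corrected if c >= 0]
    neg = [c for c in corrected if c < 0]
    # One reconstruction value per bit value: the mean of the elements it represents.
    pos_value = sum(pos) / len(pos) if pos else 0.0
    neg_value = sum(neg) / len(neg) if neg else 0.0
    decoded = [pos_value if b else neg_value for b in bits]
    new_residual = [c - d for c, d in zip(corrected, decoded)]
    return bits, (neg_value, pos_value), new_residual


def decode(bits, values):
    """Rebuild the approximate gradient from the bits and the two reconstruction values."""
    neg_value, pos_value = values
    return [pos_value if b else neg_value for b in bits]


gradient = [0.5, -0.25, 1.5, -0.75]  # made-up gradient values
residual = [0.0] * len(gradient)
bits, values, residual = one_bit_quantize(gradient, residual)
approx = decode(bits, values)
```

Only `bits` and the two values in `values` need to be exchanged between workers; `residual` stays local and is added to the next gradient so the error is not lost.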

IMAGE CLASSIFICATION

source: https://blogs.technet.microsoft.com/machinelearning/2016/11/15/imagenet-deep-neural-network-training-using-microsoft-r-server-and-azure-gpu-vms/

IMAGE CLASSIFICATION IMAGENET

[Figure: ImageNet examples showing the real class and the predicted class]

source: https://blogs.technet.microsoft.com/machinelearning/2016/11/15/imagenet-deep-neural-network-training-using-microsoft-r-server-and-azure-gpu-vms/

TEXT CLASSIFICATION

[Architecture diagram]

Train: Dataset; Azure NC24 VM with 4 K80 GPUs; .R; model.params

Score: Azure Cloud Services; Backend (.py); API; DNN; Web app (.js, .html); input text
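
The scoring side of the diagram, where a web app sends input text to a backend API that runs the model, might look roughly like the sketch below. Everything here is an assumption for illustration: the request format, the function names, and especially `placeholder_model`, which is a made-up keyword rule standing in for the trained DNN and its model.params file.

```python
import json


def placeholder_model(text):
    """Stand-in for the trained DNN: a made-up keyword rule, NOT a real model."""
    positive_words = {"good", "great", "excellent"}
    hits = sum(word in positive_words for word in text.lower().split())
    return {"label": "positive" if hits > 0 else "negative", "hits": hits}


def handle_score_request(body):
    """Parse a JSON request body, score the text, and return a JSON response string."""
    try:
        payload = json.loads(body)
        text = payload["text"]
    except (ValueError, KeyError, TypeError):
        return json.dumps({"error": "expected a JSON object with a 'text' field"})
    if not isinstance(text, str):
        return json.dumps({"error": "'text' must be a string"})
    return json.dumps({"input": text, "prediction": placeholder_model(text)})


# What the web app (.js) would send and receive, under the assumed format.
response = handle_score_request('{"text": "A great talk"}')
```

In a real deployment the handler would sit behind a web framework route and the placeholder would be replaced by a call into the deep learning library that loads the trained parameters.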

DEMO: TEXT CLASSIFICATION WEB APP
