Getting involved in deep learning with DL4J


Getting involved in deep learning with DL4J
EclipseCon Europe 2019 – 22nd October 2019 – Ludwigsburg
RCP Vision

Vincenzo Caselli – vincenzo.caselli@rcp-vision.com
Francesco Guidieri – francesco.guidieri@rcp-vision.com
Luca Paoli – luca.paoli@rcp-vision.com

AI, Machine & Deep Learning

• Artificial Intelligence
• Machine Learning
• Deep Learning

A paradigm shift

Algorithmic approach
• Goal: apply the algorithm rules to the input

Deep Learning
• Goal: make the input-output association "emerge" as a trained neural network (generalize from lots of data)
• In essence a Neural Network is an association machine

Deep Learning Origins

• First steps in the 1940s – McCulloch and Pitts
  http://wwwold.ece.utep.edu/research/webfuzzy/docs/kk-thesis/kk-thesis-html/node12.html
  "The early model of an artificial neuron is introduced by Warren McCulloch and Walter Pitts in 1943."

• 1958 – Frank Rosenblatt – The Perceptron

• YouTube, "History of Neural Networks": https://www.youtube.com/watch?v=6fXNiJXUheI

Inputs, Weights & Activation
How a node's state depends on its inputs and their weights

[Diagram: three inputs (Input#1 active = 1, Input#2 active = 1, Input#3 inactive = 0) feed a node through excitatory and inhibitory connections; the node sums the weighted inputs (Σ) and passes the result through an Activation Function.]
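
A minimal sketch of this idea in plain Java (not from the slides; the weights, inputs and step threshold are made-up example values): the node sums its weighted inputs and passes the result through an activation function.

// Illustrative sketch only: a single artificial node.
public class SingleNode {

    // Activation function: a simple step (fires 1 if the weighted sum is positive, else 0)
    static double activate(double weightedSum) {
        return weightedSum > 0 ? 1.0 : 0.0;
    }

    public static void main(String[] args) {
        double[] inputs  = { 1.0, 1.0, 0.0 };  // Input#1 active, Input#2 active, Input#3 inactive
        double[] weights = { 0.7, -0.4, 0.9 }; // positive = excitatory, negative = inhibitory

        double sum = 0.0; // Σ: weighted sum of the inputs
        for (int i = 0; i < inputs.length; i++) {
            sum += inputs[i] * weights[i];
        }

        System.out.println("Node output: " + activate(sum)); // 1.0, since 0.7 - 0.4 = 0.3 > 0
    }
}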


Organizing nodes in layers ...

Multi Layer Perceptron

Typical deep learning tasks

Classification tasks
• MNIST: image → which digit?  ... 0 | 1 | ... | 9
• Face recognition: image → is person A?  ... true | false
• Face classification: image → who is it?  ... A | B | C
• Fraud detection: data → fraud?  ... true | false

Regression tasks
• Automated Driving Systems: road images & sensor inputs → steering wheel angle?  ... 0° | -10° | +23°
• Rain forecast: sensor data → rain (mm)

A very minimal network & its training

Network to be trained: two inputs I1 and I2 connected to a single output O through weights w1 and w2.

Training Set:

  I1      I2      O
  0.12    0.87    0.70
  0.45    0.72    0.95
  ...     ...     ...

Global error for a given weights set

Network: I1, I2 → O with weights w1, w2
Target = desired output
Actual = current output

• weights (w1, w2) are randomly initialized
• for each item in the Training Set:
  - calc the Actual output
  - calc the distance from the Target output (= error)
• calc the global error at (w1, w2): e.g. Σ ei²  [global = for the entire Training Set]
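
A minimal sketch of this computation (the fixed weights and the linear output are illustrative assumptions; the two training rows are taken from the previous slide):

// Illustrative sketch: global (sum of squared) error of the minimal I1,I2 -> O network
// for one particular choice of weights.
public class GlobalError {

    public static void main(String[] args) {
        double[][] trainingSet = { // columns: I1, I2, Target O
            { 0.12, 0.87, 0.70 },
            { 0.45, 0.72, 0.95 },
        };
        double w1 = 0.5, w2 = 0.3; // "randomly" initialized weights (fixed here for clarity)

        double globalError = 0.0;
        for (double[] item : trainingSet) {
            double actual = item[0] * w1 + item[1] * w2; // Actual = current output
            double e = actual - item[2];                 // distance from the Target output
            globalError += e * e;                        // accumulate e² over the entire Training Set
        }
        System.out.println("Global error at (w1,w2): " + globalError);
    }
}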

Error as a function in the weights domain

Error = Cost = Loss Function

error = f(w1, w2, ...)

[Plot: the error surface over the (w1, w2) weights plane.]

The goal: finding the minimum cost

[3D plot of an example cost surface, ((x^3+4*y^2)*sqrt(abs(sin(x^2+y^2))))/100, rendered with http://al-roomi.org/3DPlot/index.html]

The goal: finding the minimum cost

Ideal Goal
• find the global minimum
• if it were a mathematical function we could use the derivative
• the domain is continuous

Practical Goal
• find a good minimum
• we can just move around the weights domain and do samplings of the error
• steps are discrete (may be too small or too large)

Adjusting weights for the path to a good minimum

[Plot: the error (cost) surface over the (w1, w2) weights plane.]

Where to move from here?
... and how much to move? (LR = Learning Rate)

Gradient Descent

The Gradient Descent algorithm allows finding local minima:

1. start from a given (random) initial point
2. find the direction with the steepest gradient descent
3. make a little step (Δw) in this direction (note: it's an N-dimensional space)
4. go to step 2

(like a skier in the fog ...)
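
A minimal sketch of this loop on an assumed one-dimensional cost function (the function, starting point and Learning Rate are illustrative, and the gradient is estimated numerically by sampling):

// Illustrative sketch of gradient descent on a simple 1-D cost function.
public class GradientDescentDemo {

    static double cost(double w) {
        return (w - 3.0) * (w - 3.0) + 1.0; // minimum at w = 3
    }

    public static void main(String[] args) {
        double w = -5.0;   // 1. a given (random) initial point
        double lr = 0.1;   // Learning Rate: how large each step is
        double h = 1e-6;   // small offset used to estimate the gradient

        for (int step = 0; step < 100; step++) {
            // 2. estimate the gradient by sampling the cost around w
            double gradient = (cost(w + h) - cost(w - h)) / (2 * h);
            // 3. make a little step downhill (against the gradient)
            w -= lr * gradient;
            // 4. repeat
        }
        System.out.println("w ≈ " + w + ", cost ≈ " + cost(w)); // ends up close to w = 3
    }
}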

Backpropagation algorithm

Basics date back to 1960 (Henry J. Kelley) and 1961 (Arthur E. Bryson).

"Learning representations by back-propagating errors"
Rumelhart, David E.; Hinton, Geoffrey E.; Williams, Ronald J. (8 October 1986, Nature 323)
https://www.nature.com/articles/323533a0

https://towardsdatascience.com/understanding-backpropagation-algorithm-7bb3aa2f95fd
https://www.linkedin.com/pulse/gradient-descent-backpropagation-ken-chen

[Plot: error e as a function of a weight w; the Learning Rate (LR) modulates the weight step Δw.]
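
In standard notation, each weight is moved against its error gradient, with the Learning Rate scaling the step:

  Δw = - LR · ∂e/∂w        w ← w + Δw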

Neural Network numbers

A simple neural network for classifying images:
• input: images of 400x300 pixels => 120,000 input nodes
• hidden layer #1: 50,000 nodes => 120,000 x 50,000
  => 6,000,000,000 connections (weights) just between the input and the first hidden layer!

Neural Networks ages

• bright & dark ages in the Neural Network history (several "winter" seasons)
• training requires: lots of data, great computational effort, a huge amount of time
• why so much success now?
  • availability of a great amount of data (Big Data)
  • high computing power in modern PCs

Why "Deep" learning?

• each hidden layer can learn different features

input → h1 → h2 → ... → hn → output
(increasing abstraction)

Why "Deep" learning?

• e.g. in image classification tasks:

image → lines, corners, ... → shapes → ... → complex figures → output
(increasing abstraction)

Many degrees of freedom

High number of parameters and topology choices:
• how many hidden layers?
• which Activation Function?
• Learning Rate value?
• topology:
  • Standard Multilayer Neural Network
  • Convolutional Neural Network (CNN)
  • Recurrent Neural Network (RNN)
  • other ...

Some priority

Input & Output layer topology & mapping have to be designed first!

MNIST

The MNIST dataset can be considered:
• a sort of "Hello World" in the Deep Learning domain
• a reference example against which to test learning paradigms or techniques

It consists of 70,000 28×28 pixel images of handwritten 0-9 digits.

MNIST: slicing an image for input

[Diagram: each 28×28 image is sliced row by row into 784 input nodes (0|1).]
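
A minimal plain-Java sketch of the slicing (independent of DL4J; the pixel matrix is a placeholder):

// Illustrative sketch: flatten a 28x28 image into a 784-element input vector, row by row.
int height = 28, width = 28;
double[][] pixels = new double[height][width]; // placeholder: one 28x28 image

double[] inputVector = new double[height * width]; // 784 input nodes
for (int row = 0; row < height; row++) {
    for (int col = 0; col < width; col++) {
        inputVector[row * width + col] = pixels[row][col];
    }
}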

MNIST: output layer

[Diagram: the 784 input nodes (from the 28×28 image) feed 10 output nodes, one per digit 0-9; only one output is on (=1), e.g. 0001000000 for the digit 3.]
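
The corresponding one-hot output encoding, as a minimal sketch (the helper name is illustrative, not a DL4J API):

// Illustrative sketch: encode a digit label into the 10 output nodes (only one on = 1).
static double[] oneHot(int digit) {
    double[] out = new double[10];
    out[digit] = 1.0;
    return out;
}
// oneHot(3) -> [0, 0, 0, 1, 0, 0, 0, 0, 0, 0], i.e. 0001000000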

MNIST, Training & Test Set

• 60,000 images in the Training Set (used to train the network)
• 10,000 images in the Test Set (used to measure how well the network is really learning)

The Test Set, which is excluded from the training, plays the role of a "third party evaluator".

[Diagram: network with 784 input nodes and 10 output nodes.]

MNIST Dataset setup

• download & unzip:
  http://github.com/myleott/mnist_png/raw/master/mnist_png.tar.gz


Deeplearning4j website

Eclipse New Project

• New Maven Project

DL4J Maven dependencies

<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>it.rcpvision.dl4j</groupId>
  <artifactId>it.rcpvision.dl4j.workbench</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <dependencies>
    <dependency>
      <groupId>org.nd4j</groupId>
      <artifactId>nd4j-native-platform</artifactId>
      <version>1.0.0-beta5</version>
    </dependency>
    <dependency>
      <groupId>org.deeplearning4j</groupId>
      <artifactId>deeplearning4j-core</artifactId>
      <version>1.0.0-beta5</version>
    </dependency>
    <dependency>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-jdk14</artifactId>
      <version>1.7.26</version>
    </dependency>
  </dependencies>
</project>

Constants & helper

//The absolute path of the folder containing the MNIST training and testing subfolders
private static final String MNIST_DATASET_ROOT_FOLDER = "/home/vincenzo/dl4j/mnist_png/";

//Height and width in pixels of each image
private static final int HEIGHT = 28;
private static final int WIDTH = 28;

//The total number of images in the training and testing sets
private static final int N_SAMPLES_TRAINING = 60000;
private static final int N_SAMPLES_TESTING = 10000;

//The number of possible outcomes of the network for each input,
//corresponding to the 0..9 digit classification
private static final int N_OUTCOMES = 10;

private static DataSetIterator getDataSetIterator(String folderPath, int nSamples) throws IOException {
    //... implemented on the next slide
}

DataSetIterator

private static DataSetIterator getDataSetIterator(String folderPath, int nSamples) throws IOException {
    File folder = new File(folderPath);
    File[] digitFolders = folder.listFiles();

    NativeImageLoader nil = new NativeImageLoader(HEIGHT, WIDTH);
    ImagePreProcessingScaler scaler = new ImagePreProcessingScaler(0, 1);

    INDArray input = Nd4j.create(new int[] { nSamples, HEIGHT * WIDTH });
    INDArray output = Nd4j.create(new int[] { nSamples, N_OUTCOMES });

    int n = 0;
    for (File digitFolder : digitFolders) {
        int labelDigit = Integer.parseInt(digitFolder.getName());
        File[] imageFiles = digitFolder.listFiles();
        for (File imageFile : imageFiles) {
            INDArray img = nil.asRowVector(imageFile);
            scaler.transform(img);
            input.putRow(n, img);
            output.put(n, labelDigit, 1.0);
            n++;
        }
    }

    DataSet dataSet = new DataSet(input, output);
    List<DataSet> listDataSet = dataSet.asList();
    Collections.shuffle(listDataSet, new Random(System.currentTimeMillis()));
    DataSetIterator dsi = new ListDataSetIterator<DataSet>(listDataSet, 10); // batchSize = 10
    return dsi;
}

Input & output initialization, setting & packaging into a DataSet and a DataSetIterator.

Building the MLP Network

//org.deeplearning4j.nn.conf.layers.DenseLayer
//org.deeplearning4j.nn.weights.WeightInit
//org.deeplearning4j.nn.conf.layers.OutputLayer
//org.nd4j.linalg.lossfunctions.LossFunctions.LossFunction

public static void main(String[] args) throws IOException {
    DataSetIterator dsi = getDataSetIterator(MNIST_DATASET_ROOT_FOLDER + "training", N_SAMPLES_TRAINING);

    MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
        .seed(123) //include a random seed for reproducibility
        // use stochastic gradient descent as an optimization algorithm
        .updater(new Nesterovs(0.006, 0.9)) //Learning Rate = 0.006, Momentum = 0.9
        .l2(1e-4)
        .list()
        .layer(new DenseLayer.Builder() //Layer#1: the first layer, with Xavier (random) weight initialization
            .nIn(HEIGHT * WIDTH)
            .nOut(1000)
            .activation(Activation.RELU)
            .weightInit(WeightInit.XAVIER)
            .build())
        .layer(new OutputLayer.Builder(LossFunction.NEGATIVELOGLIKELIHOOD) //Layer#2: the output layer
            .nIn(1000)
            .nOut(N_OUTCOMES)
            .activation(Activation.SOFTMAX) //typical for a 1-of-N output
            .weightInit(WeightInit.XAVIER)
            .build())
        .build();

Training & testing the Network

// Training
MultiLayerNetwork model = new MultiLayerNetwork(conf);
model.init();
//print the score every 500 iterations
model.setListeners(new ScoreIterationListener(500));
model.fit(dsi);

// Testing
DataSetIterator testDsi = getDataSetIterator(MNIST_DATASET_ROOT_FOLDER + "testing", N_SAMPLES_TESTING);
Evaluation eval = model.evaluate(testDsi); //org.nd4j.evaluation.classification.Evaluation
System.out.println(eval);

========================Evaluation Metrics========================
 # of classes:    10
 Accuracy:        0.9689
 Precision:       0.9686
 Recall:          0.9688
 F1 Score:        0.9687
Precision, recall & F1: macro-averaged (equally weighted avg. of 10 classes)

=========================Confusion Matrix=========================
    0    1    2    3    4    5    6    7    8    9
---------------------------------------------------
  969    0    1    1    0    2    3    1    3    0 | 0 = 0
    0 1117    3    2    0    1    5    0    7    0 | 1 = 1
    6    1 1002    1    3    0    3    8    8    0 | 2 = 2
    0    0    6  976    0   12    0    5    8    3 | 3 = 3
    1    0    5    1  940    0   13    2    2   18 | 4 = 4
    4    1    0    4    2  865    9    0    5    2 | 5 = 5
    9    2    1    0    3    8  931    1    3    0 | 6 = 6
    1    7   12    2    3    1    0  989    3   10 | 7 = 7
    4    1    3    5    3    4    6    4  942    2 | 8 = 8
    6    7    1    7   11    3    2    5    9  958 | 9 = 9

Confusion matrix format: Actual (rowClass) predicted as (columnClass) N times
==================================================================
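
The trained model can also be queried for a single image. A minimal sketch, assuming the model, constants and classes from the previous slides (the file path is a placeholder):

// Illustrative sketch: classify one handwritten-digit image with the trained model.
NativeImageLoader loader = new NativeImageLoader(HEIGHT, WIDTH);
ImagePreProcessingScaler scaler = new ImagePreProcessingScaler(0, 1);

INDArray img = loader.asRowVector(new File("/path/to/some_digit.png")); // placeholder path
scaler.transform(img);
img = img.reshape(1, HEIGHT * WIDTH); // make sure the shape is [1, 784]

INDArray prediction = model.output(img);    // 10 probabilities, one per digit
int digit = prediction.argMax(1).getInt(0); // index of the most probable class
System.out.println("Predicted digit: " + digit);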

Convolutional Neural Networks

Layers are not fully connected:
• A node receives input only from a group of adjacent nodes in the previous layer
• The same set of weights (a "filter") is applied to all nodes between two consecutive layers

Convolutional Neural Networks

• specific for images

E.g. a 3x3 (or 5x5) filter sliding over the image, from top-left to bottom-right
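
A minimal plain-Java sketch of the sliding-filter idea (a "valid" convolution with stride 1; the image and filter contents are placeholders):

// Illustrative sketch: slide a small filter over an image, top-left to bottom-right.
static double[][] convolve(double[][] image, double[][] filter) {
    int fh = filter.length, fw = filter[0].length;
    int outH = image.length - fh + 1, outW = image[0].length - fw + 1;
    double[][] out = new double[outH][outW];
    for (int y = 0; y < outH; y++) {
        for (int x = 0; x < outW; x++) {
            double sum = 0;
            for (int fy = 0; fy < fh; fy++) {
                for (int fx = 0; fx < fw; fx++) {
                    // the same set of weights (the filter) is reused at every position
                    sum += image[y + fy][x + fx] * filter[fy][fx];
                }
            }
            out[y][x] = sum;
        }
    }
    return out;
}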

Convolutional Neural Networks

• more filters
• each one detects different feature details
• after each Convolutional layer we put a Pooling (subsampling) layer in order to reduce subsequent dimensions
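
And a matching sketch of 2x2 max pooling with stride 2 (assumes even input sizes), which keeps only the strongest response in each 2x2 block and halves both dimensions:

// Illustrative sketch: 2x2 max pooling with stride 2.
static double[][] maxPool2x2(double[][] in) {
    double[][] out = new double[in.length / 2][in[0].length / 2];
    for (int y = 0; y < out.length; y++) {
        for (int x = 0; x < out[0].length; x++) {
            out[y][x] = Math.max(
                Math.max(in[2 * y][2 * x],     in[2 * y][2 * x + 1]),
                Math.max(in[2 * y + 1][2 * x], in[2 * y + 1][2 * x + 1]));
        }
    }
    return out;
}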

Building the Convolutional Network

//org.nd4j.linalg.lossfunctions.LossFunctions
//org.deeplearning4j.nn.conf.layers.ConvolutionLayer
//org.deeplearning4j.nn.conf.layers.SubsamplingLayer

int channels = 1;
MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
    .seed(123)
    .l2(0.0005) // ridge regression value
    .updater(new Nesterovs(0.006, 0.9))
    .weightInit(WeightInit.XAVIER)
    .list()
    .layer(new ConvolutionLayer.Builder(5, 5) //Layer#1 - Convolutional: 20 5x5 filters
        .nIn(channels)
        .stride(1, 1)
        .nOut(20)
        .activation(Activation.IDENTITY)
        .build())
    .layer(new SubsamplingLayer.Builder(SubsamplingLayer.PoolingType.MAX) //Layer#2 - Pooling 2x2
        .kernelSize(2, 2)
        .stride(2, 2)
        .build())
    .layer(new ConvolutionLayer.Builder(5, 5) //Layer#3 - Convolutional: 50 5x5 filters
        .stride(1, 1) // nIn need not be specified in later layers
        .nOut(50)
        .activation(Activation.IDENTITY)
        .build())
    .layer(new SubsamplingLayer.Builder(SubsamplingLayer.PoolingType.MAX) //Layer#4 - Pooling 2x2
        .kernelSize(2, 2)
        .stride(2, 2)
        .build())

Building the Convolutional Network (continued)

    .layer(new DenseLayer.Builder().activation(Activation.RELU) //Layer#5 - Fully connected layer with 500 nodes
        .nOut(500)
        .build())
    .layer(new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD) //Layer#6 - Output
        .nOut(N_OUTCOMES)
        .activation(Activation.SOFTMAX)
        .build())
    .setInputType(InputType.convolutionalFlat(HEIGHT, WIDTH, channels)) // InputType.convolutional for normal image
    .build();

========================Evaluation Metrics========================
 # of classes:    10
 Accuracy:        0.9839
 Precision:       0.9840
 Recall:          0.9836
 F1 Score:        0.9838
Precision, recall & F1: macro-averaged (equally weighted avg. of 10 classes)

=========================Confusion Matrix=========================
    0    1    2    3    4    5    6    7    8    9
---------------------------------------------------
  976    0    0    0    0    0    0    1    2    1 | 0 = 0
    0 1123    3    2    0    2    3    2    0    0 | 1 = 1
    2    1 1014    2    0    0    0   12    1    0 | 2 = 2
    0    0    1  998    1    2    0    5    1    2 | 3 = 3
    0    0    3    0  964    0    6    2    0    7 | 4 = 4
    2    0    0   10    1  863    5    1    2    8 | 5 = 5
    6    2    1    0    1    2  945    0    1    0 | 6 = 6
    1    1    5    1    0    0    0 1014    1    5 | 7 = 7
    4    0    2    2    4    0    2    5  948    7 | 8 = 8
    1    0    0    1    8    1    0    4    0  994 | 9 = 9

Confusion matrix format: Actual (rowClass) predicted as (columnClass) N times
==================================================================


A real example taken from Kaggle

Kaggle example with a CNN

int channels = 3;
MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
    .seed(123)
    .l2(0.0005) // ridge regression value
    .updater(new Nesterovs(0.005, 0.9))
    .weightInit(WeightInit.XAVIER)
    .list()
    .layer(new ConvolutionLayer.Builder(5, 5)
        .nIn(channels)
        .stride(1, 1)
        .nOut(40)
        .activation(Activation.RELU)
        .build())
    .layer(new SubsamplingLayer.Builder(SubsamplingLayer.PoolingType.MAX)
        .kernelSize(5, 5)
        .stride(2, 2)
        .build())
    .layer(new ConvolutionLayer.Builder(5, 5)
        .stride(1, 1) // nIn need not be specified in later layers
        .nOut(50)
        .activation(Activation.RELU)
        .build())
    .layer(new SubsamplingLayer.Builder(SubsamplingLayer.PoolingType.MAX)
        .kernelSize(5, 5)
        .stride(2, 2)
        .build())

========================Evaluation Metrics========================
 # of classes:    2
 Accuracy:        0.9457

Resources

Code in this talk: https://github.com/vincenzocaselli/dl4j-ece2019

Deeplearning4j website: deeplearning4j.org

Deeplearning4j GitHub: https://github.com/eclipse/deeplearning4j

Recommended reading: "Deep Learning - A Practitioner's Approach" by Adam Gibson and Josh Patterson