
Artificial Intelligence and Machine Learning

Vijay Gadepally, Jeremy Kepner, Lauren Milechin, Siddharth Samsi

DISTRIBUTION STATEMENT A. Approved for public release. Distribution is unlimited. This material is based upon work supported by the Under Secretary of Defense for Research and Engineering under Air Force Contract No. FA8702-15-D-0001. Any opinions, findings, conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the Under Secretary of Defense for Research and Engineering.

© 2020 Massachusetts Institute of Technology. Delivered to the U.S. Government with Unlimited Rights, as defined in DFARS Part 252.227-7013 or 7014 (Feb 2014). Notwithstanding any copyright notice, U.S. Government rights in this work are defined by DFARS 252.227-7013 or DFARS 252.227-7014 as detailed above. Use of this work other than as specifically authorized by the U.S. Government may violate any copyrights that exist in this work.

Slide contributions from: Siddharth Samsi, Albert Reuther, Jeremy Kepner, David Martinez, Lauren Milechin


Outline

• Artificial Intelligence Overview
• Machine Learning Deep Dives
  – Supervised Learning
  – Unsupervised Learning
  – Reinforcement Learning
• Conclusions/Summary


What is Artificial Intelligence?

Narrow AI: The theory and development of computer systems that perform tasks that augment human intelligence, such as perceiving, classifying, learning, abstracting, reasoning, and/or acting.

General AI: Full autonomy

Definition adapted from Oxford dictionary and inputs from Prof. Patrick Winston (MIT)

AI: Why Now?

The convergence of high performance computing, big data, and algorithms enables widespread AI development.

[Figure: three converging enablers: Big Data, Compute Power, and Machine Learning Algorithms. Source: DARPA/Public domain]


AI Canonical Architecture

[Figure: the AI canonical architecture. Data from sensors and sources (structured and unstructured) flows through data conditioning into algorithms (e.g., knowledge-based methods, unsupervised and supervised learning, transfer learning, reinforcement learning), running on modern computing (CPUs, GPUs, TPUs, custom, neuromorphic, quantum), and on to human-machine teaming (courses of action) across the spectrum from human to human-machine complement to machine, turning information into knowledge and insight for users (missions). The entire pipeline rests on robust AI: explainable AI, metrics and bias assessment, verification & validation, security (e.g., counter AI), and policy, ethics, safety, and training.]

GPU = Graphics Processing Unit; TPU = Tensor Processing Unit; CoA = Courses of Action

© IEEE. Figure 1 in Reuther, A., et al, "Survey and Benchmarking of Machine Learning Accelerators," 2019 IEEE High Performance Extreme Computing Conference (HPEC), Waltham, MA, USA, 2019, pp. 1-9, doi: 10.1109/HPEC.2019.8916327. All rights reserved. This content is excluded from our Creative Commons license. For more information, see https://ocw.mit.edu/help/faq-fair-use/

Select History of Artificial Intelligence

– 1950: "Computing Machinery and Intelligence" (the "Turing Test") published in MIND vol. LIX
– 1955: Western Joint Computer Conference Session on Learning Machines; MIT LL staff papers by G. Dineen, W.A. Clark, and O. Selfridge: "Programming Pattern Recognition," "Pattern Recognition and Modern Computers," and "Generalization of Pattern Recognition in a Self-Organizing System"
– 1956: Dartmouth Summer Research Project on AI (J. McCarthy, M. Minsky, N. Rochester, O. Selfridge, C. Shannon, and others)
– 1957: Memory Test Computer, the first computer to simulate the operation of neural networks
– 1957: Frank Rosenblatt's neural networks: a perceiving and recognizing automaton
– 1958: Symposium on the Mechanization of Thought Processes, National Physical Laboratory, UK
– 1959: Arthur Samuel, "Some Studies in Machine Learning Using the Game of Checkers," IBM Journal of R&D
– 1960: Recognizing hand-written characters, Robert Larson of the SRI AI Center
– 1961: James Slagle (a Minsky student), solving freshman calculus, MIT
– 1974–1980 and 1987–1993: the AI winters
– 1979: "An Assessment of AI from a Lincoln Laboratory Perspective," internal MIT LL publication (J. Forgie and J. Allen)
– 1982: Expert-systems pioneer DENDRAL project at Stanford
– 1984: Hidden Markov models
– 1986–present: the return of neural networks
– 1988: Statistical machine translation
– 1989: Convolutional neural networks
– 1994: Human-level spontaneous speech recognition
– 1997: IBM Deep Blue defeats reigning chess champion Garry Kasparov
– 2001–present: the availability of very large data sets
– 2005: Google's Arabic-to-English and Chinese-to-English translation
– 2007: DARPA Grand Challenge ("Urban Challenge")
– 2011: IBM Watson defeats former Jeopardy! champions Brad Rutter and Ken Jennings
– 2012: Team from U. of Toronto (Geoff Hinton's lab) wins the ImageNet Large Scale Visual Recognition Challenge with deep-learning software
– 2014: Google's GoogLeNet performs object classification at near-human performance
– 2015: DeepMind achieves human expert level of play on Atari games (using only raw pixels and scores)
– 2016: DeepMind AlphaGo defeats top human Go player Lee Sedol; DARPA Cyber Grand Challenge

Adapted from: The Quest for Artificial Intelligence, Nils J. Nilsson, 2010, and MIT Lincoln Laboratory Library and Archives


Artificial Intelligence Evolution

Four waves of AI (adapted from John Launchbury, Director I2O, DARPA), each characterized by its strength in perceiving, learning, abstracting, and reasoning:

– Reasoning: handcrafted knowledge
– Learning: statistical learning; lots of data enabled non-expert systems
– Context: contextual adaptation; adding context to AI systems
– Abstraction: the ability of the system to abstract

Systems evolve across these waves.

Spectrum of Commercial Organizations in the Machine Intelligence Field

Source: Shivon Zilis, 2016, https://www.oreilly.com/ideas/the-current-state-of-machine-intelligence-3-0

© Shivon Zilis and James Cham, designed by Heidi Skinner. All rights reserved. This content is excluded from our Creative Commons license. For more information, see https://ocw.mit.edu/help/faq-fair-use/


Data is Critical to Breakthroughs in AI

– 1994: Human-level read-speech recognition. Dataset (first available): spoken Wall Street Journal articles and other texts (1991). Algorithm (first proposed): hidden Markov model (1984).
– 1997: IBM Deep Blue defeated Garry Kasparov. Dataset: 700,000 grandmaster chess games, aka "The Extended Book" (1991). Algorithm: negascout planning algorithm (1983).
– 2005: Google's Arabic- and Chinese-to-English translation. Dataset: 1.8 trillion tokens from Google Web and News pages (collected in 2005). Algorithm: statistical machine translation algorithm (1988).
– 2011: IBM Watson became the world Jeopardy! champion. Dataset: 8.6 million documents from Wikipedia, Wiktionary, Wikiquote, and Project Gutenberg (updated in 2010). Algorithm: mixture-of-experts algorithm (1991).
– 2014: Google's GoogLeNet object classification at near-human performance. Dataset: ImageNet corpus of 1.5 million labeled images and 1,000 object categories (2010). Algorithm: convolutional neural network algorithm (1989).
– 2015: Google's DeepMind achieved human parity in playing 29 Atari games by learning general control from video. Dataset: Arcade Learning Environment dataset of over 50 Atari games (2013). Algorithm: Q-learning algorithm (1992).

On average, breakthroughs came 3 years after the dataset first became available and 18 years after the algorithm was first proposed.

Source: Train AI 2017, https://www.crowdflower.com/train-ai/

AI Canonical Architecture

[Figure: the AI canonical architecture repeated from earlier; the following slides walk through its components in turn (data conditioning, algorithms, modern computing, robust AI, human-machine teaming).]


Unstructured and Structured Data

Data conditioning/storage technologies (data to information):

– Infrastructure/Databases: indexing/organization/structure; domain-specific languages; high performance data access; declarative interfaces
– Data Curation: unsupervised machine learning; dimensionality reduction; clustering/pattern recognition; outlier detection
– Data Labeling: initial data exploration; highlighting missing or incomplete data; reorienting sensors/recapturing data; looking for errors/biases in collection

Data conditioning often takes up 80+% of overall AI/ML development work.

[Figure: example inputs to data conditioning, spanning unstructured and structured data types: speech, social media, human behavior, reports, sensors, network logs, metadata, and side-channel data.]

Machine Learning Algorithms Taxonomy

The five tribes of machine learning (from "The Five Tribes of Machine Learning" by Pedro Domingos):

– Symbolists (e.g., expert systems)
– Connectionists (e.g., DNN)
– Bayesians (e.g., naive Bayes)
– Analogizers (e.g., SVM)
– Evolutionaries (e.g., genetic programming)

DNN = Deep Neural Networks; SVM = Support Vector Machines; Exp. Sys. = Expert Systems

[Figure: nested fields, adapted from "Deep Learning" by Ian Goodfellow, Yoshua Bengio, and Aaron Courville: Artificial Intelligence contains Machine Learning, which contains Neural Nets, which contain Deep Neural Nets.]


Modern AI Computing Engines

Computing class and what it provides to AI:

– CPU: most popular computing platform; general purpose compute
– GPU: used by most for training algorithms (good for neural network backpropagation)
– TPU: speeds up inference time (domain-specific architecture)
– Custom: ability to speed up specific computations of interest (e.g., graphs); an active research area, along with neuromorphic computing
– Quantum: benefits largely unproven so far; recent results on HHL (solving linear systems of equations)

GPU = Graphics Processing Unit; TPU = Tensor Processing Unit; HHL = Harrow-Hassidim-Lloyd quantum algorithm

Selected results: [Figure: SpGEMM performance using the Graph Processor (G102), plotting traversed edges per second against watts for an ASIC graph processor (projected), an FPGA graph processor (measured), and the Cray XK7 Titan and Cray XT4 Franklin (measured), across system sizes from an 8-node mini-chassis to 16k nodes in 64 racks, spanning embedded to data center applications; plus an AlexNet forward-backward pass comparison.]

Neural Network Processing Performance

[Figure: peak giga-operations per second vs. peak power (W) for machine learning accelerators (e.g., MIT Eyeriss, Movidius X, Jetson TX1/TX2, Xavier, DGX-1/DGX-Station/DGX-2, Wave System/DPU, TrueNorth, GraphCore, K80, P100, V100, Turing, 2x SkyLake SP, Phi 7210F/7290F, Arria, Nervana, Goya, and TPU1/TPU2/TPU3/TPU Edge), with iso-efficiency lines at 100 GigaOps/W, 1 TeraOps/W, and 10 TeraOps/W. The legend encodes computation precision (Int8 through Float64), form factor (chip, card, system), and computation type (inference vs. training).]

Reuther, Albert, et al. "Survey and Benchmarking of Machine Learning Accelerators." arXiv preprint arXiv:1908.11348 (2019).

© IEEE. Figure 2 in Reuther, A., et al, "Survey and Benchmarking of Machine Learning Accelerators," 2019 IEEE High Performance Extreme Computing Conference (HPEC), Waltham, MA, USA, 2019, pp. 1-9, doi: 10.1109/HPEC.2019.8916327. All rights reserved. This content is excluded from our Creative Commons license. For more information, see https://ocw.mit.edu/help/faq-fair-use/


Robust AI: Preserving Trust

[Figure: confidence level in the machine making the decision vs. consequence of actions. High-confidence, low-consequence decisions are best matched to machines; low-confidence, high-consequence decisions are best matched to humans; in between, machines augment humans.]

Importance of Robust AI

Issue, robust AI feature, and example solutions:

– System vulnerable to adversarial action (both cyber and physical). Feature: Security. Solutions: model failure detection, red teaming.
– Unknown relationship between arbitrary input and machine output. Feature: Explainable AI. Solutions: explainability, dimensionality reduction, feature importance inference.
– Unwanted actions when controlling heavy or dangerous machinery. Feature: Policy, Ethics, Safety, and Training. Solutions: risk sensitivity, robust inference, high decision thresholds.
– User unfamiliarity or mistrust leads to lack of adoption. Feature: Metrics. Solutions: seamless integration, model expansion, transparent uncertainty.
– Algorithms need to meet mission specifications. Feature: Validation & Verification. Solutions: robust training, "portfolio" methods, regularization.


Human-Machine Teaming

A critical element of AI is understanding how humans and machines can work together for applications. Human-machine teaming will consist of intelligent assistants enabled by artificial intelligence, operating across the spectrum from human to human-machine complement to machine.

[Figure: two charts. (1) Confidence level vs. consequence of actions, with regions "best matched to machines," "machines augmenting humans," and "best matched to humans." (2) Scale vs. application complexity, with regions "machines more effective than humans," "machines augmenting humans," and "humans more effective than machines."]

Outline

• Artificial Intelligence Overview
• Machine Learning Deep Dives
  – Supervised Learning
  – Unsupervised Learning
  – Reinforcement Learning
• Conclusions/Summary


What is Machine Learning?

• Machine learning:
  – The study of algorithms that improve their performance at some task with experience (data)
  – Optimize based on a performance criterion using example data or past experience
• A combination of techniques from the statistics and computer science communities
• Informally: getting computers to program themselves
• Common tasks:
  – Classification
  – Regression
  – Prediction
  – Clustering
  – …

Traditional Programming vs. Machine Learning

– Traditional programming: Data + Program → Computer → Output
– Machine learning: Data + Output → Computer → Program

Source: Pedro Domingos, University of Washington
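To make the contrast concrete, here is a minimal sketch (an illustration, not from the slides; it assumes NumPy and scikit-learn are installed) in which the "program," a temperature-conversion rule, is first written by hand and then recovered by a model from data and outputs alone:

```python
# Minimal sketch: "learning the program" from data + outputs.
import numpy as np
from sklearn.linear_model import LinearRegression

# Traditional programming: we write the rule ourselves.
def c_to_f(celsius):
    return 9.0 / 5.0 * celsius + 32.0

# Machine learning: give the computer data and outputs; it finds the rule.
celsius = np.arange(-40.0, 101.0).reshape(-1, 1)   # data
fahrenheit = c_to_f(celsius).ravel()               # outputs

model = LinearRegression().fit(celsius, fahrenheit)
print(model.coef_[0], model.intercept_)  # ~1.8 and ~32.0: the learned "program"
```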


Machine Learning Techniques

– Supervised (labels)
– Unsupervised (no labels)
– Reinforcement (reward information)

Machine Learning Techniques

– Supervised (labels): classification, regression
– Unsupervised (no labels): clustering, dimensionality reduction
– Reinforcement (reward information)


Machine Learning Techniques

– Supervised (labels)
  • Classification: SVM, ensembles, tree-based techniques, naïve Bayes, k-nearest neighbors, logistic regression
  • Regression: (multiple) linear regression, non-linear regression, LASSO
  • Spanning both: neural nets
– Unsupervised (no labels)
  • Clustering: k-means, spectral methods, hierarchical methods, GMM
  • Dimensionality reduction and feature selection: PCA, kernel PCA, NMF, wrapper methods, filter methods, embedded methods
  • Structure learning: HMM
– Reinforcement (reward information): Q-learning, SARSA, Markov decision processes


Common ML Pitfalls

• Over-fitting vs. under-fitting (illustrated in the sketch below)
• Bad/noisy/missing data
• Model selection
• Lack of success metrics
• Linear vs. non-linear models
• Ignoring outliers
• Training vs. testing data
• Computational complexity, curse of dimensionality
• Correlation vs. causation
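As a concrete illustration of the first pitfall, the following sketch (assuming scikit-learn is available; not material from the slides) fits polynomials of increasing degree to noisy data. The high-degree model scores better on the training set but worse on held-out test data, which is the signature of over-fitting:

```python
# Minimal sketch of over-fitting vs. under-fitting with polynomial models.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=200)  # noisy target

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
for degree in (1, 3, 15):  # under-fit, reasonable fit, over-fit
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_tr, y_tr)
    print(degree,
          mean_squared_error(y_tr, model.predict(X_tr)),   # training error
          mean_squared_error(y_te, model.predict(X_te)))   # test error
```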

Outline

• Artificial Intelligence Overview
• Machine Learning Deep Dives
  – Supervised Learning
  – Unsupervised Learning
  – Reinforcement Learning
• Conclusions/Summary


Supervised Learning

• Start with labeled data (ground truth)
• Build a model that predicts labels
• Two general goals:
  – Regression: predict a continuous variable
  – Classification: predict a class or label
• Generally has a training step that forms the model

[Figure: training data (data + labels) is used to train a model; at test time the model maps new data to predicted labels.]
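The workflow in the figure maps directly to a few lines of code. A minimal sketch, assuming scikit-learn and its bundled Iris dataset, with a k-nearest-neighbor classifier standing in for the model:

```python
# Minimal sketch of the supervised train/predict workflow.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)             # data + labels (ground truth)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)  # train
predicted_labels = model.predict(X_test)      # predicted labels for test data
print((predicted_labels == y_test).mean())    # fraction predicted correctly
```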

Artificial Neural Networks

• Computing systems inspired by biological networks
• Systems learn, by repetitive training on examples, to do tasks
  – Generally a supervised learning technique (though unsupervised examples exist)
• Components: inputs, layers, outputs, weights
• Deep neural network: lots of "hidden layers"
• Popular variants:
  – Convolutional neural nets
  – Recursive neural nets
  – Deep belief networks
• Very popular these days, with many toolboxes and hardware support


Deep Neural Networks

Each layer applies weights W_i and biases b_i to the previous layer's output:

y_{i+1} = f(W_i y_i + b_i)

[Figure: a network mapping input y_0 through weight/bias pairs (W_0, b_0), (W_1, b_1), (W_2, b_2) to output y_3.]

Image source: http://www.rsipvision.com/exploring-deep-learning/
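The recurrence y_{i+1} = f(W_i y_i + b_i) is a few lines of NumPy. A minimal sketch with hypothetical layer sizes and random weights (an untrained network, useful only for checking shapes):

```python
# Minimal sketch of the layer recurrence y_{i+1} = f(W_i y_i + b_i).
import numpy as np

def f(x):                                # activation function, here ReLU
    return np.maximum(0.0, x)

rng = np.random.default_rng(0)
sizes = [4, 8, 8, 3]                     # y0 has 4 features; y3 has 3 outputs
Ws = [rng.normal(size=(m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
bs = [np.zeros(m) for m in sizes[1:]]

y = rng.normal(size=sizes[0])            # y0: the input
for W, b in zip(Ws, bs):
    y = f(W @ y + b)                     # y_{i+1} = f(W_i y_i + b_i)
print(y)                                 # y3: the network output
```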

Components of an Artificial Neural Network

[Figure: a single neuron/node receiving inputs 1.4, 2.5, and -0.06 through connections with weights w1, w2, w3, and applying an activation function f(x) to the weighted sum.]

Image source: https://www.cs.drexel.edu/~greenie/cs510/CS510-17-08.pdf

© Drexel University. All rights reserved. This content is excluded from our Creative Commons license. For more information, see https://ocw.mit.edu/help/faq-fair-use/

© RSIP. All rights reserved. This content is excluded from our Creative Commons license. For more information, see https://ocw.mit.edu/help/faq-fair-use/

Page 16: Artificial Intelligence and Machine Learning...MIND vol. LIX 2012 -Team from U. of Toronto (Geoff Hinton's lab) wins the ImageNet Large Scale Visual Recognition Challenge with deep-learning

16

Components of an Artificial Neural Network

With weights w1 = 2.7, w2 = 8.6, w3 = 0.002 and inputs -0.06, 2.5, and 1.4:

x = (-0.06 × 2.7) + (2.5 × 8.6) + (1.4 × 0.002) = 21.34

f(x) ≈ 1, so the neuron outputs approximately 1.

Image source: https://www.cs.drexel.edu/~greenie/cs510/CS510-17-08.pdf

© Drexel University. All rights reserved. This content is excluded from our Creative Commons license. For more information, see https://ocw.mit.edu/help/faq-fair-use/

AI and ML - 34VNG 010720

Common Activation Functions

• Step function: f(x) = 0 for x < 0, and 1 for x ≥ 0
• Sigmoid function: f(x) = 1 / (1 + e^(-x))
• Tanh function: f(x) = tanh(x)
• Rectified linear unit (ReLU): f(x) = max(0, x)
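A minimal NumPy rendering of these four functions, plus a check of the worked neuron example above (the sigmoid of 21.34 is within 10^-9 of 1):

```python
# Minimal sketch of the four activation functions listed above.
import numpy as np

def step(x):
    return np.where(x < 0.0, 0.0, 1.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    return np.tanh(x)

def relu(x):
    return np.maximum(0.0, x)

x = np.array([-2.0, 0.0, 2.0, 21.34])
for f in (step, sigmoid, tanh, relu):
    print(f.__name__, f(x))
# sigmoid(21.34) ~ 0.9999999995, matching f(x) ~ 1 on the previous slide
```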


Neural Network Landscape

[Figure: the "Neural Network Zoo" chart of neural network architectures.]

Image source: http://www.asimovinstitute.org/neural-network-zoo/

© Fjodor van Veen and Stefan Leijnen. All rights reserved. This content is excluded from our Creative Commons license. For more information, see https://ocw.mit.edu/help/faq-fair-use/

Neural Network Training

Key idea: adjusting the weights changes the function represented by the neural network (learning = optimization in weight space).

Training iteratively adjusts the weights to reduce the error (the difference between network output and target output). Weight update rules include:

– Perceptron training rule
– Linear programming
– Delta rule
– Backpropagation

Real neural network architectures can have thousands of input data points, hundreds of layers, and millions of weight changes per iteration.
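A minimal sketch of the idea (plain NumPy, a single linear neuron, hypothetical data): repeatedly nudge the weights down the gradient of the squared error, which is essentially the delta rule listed above.

```python
# Minimal sketch of iterative weight adjustment (delta rule) on one
# linear neuron: w <- w - lr * gradient of the squared error.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))              # 100 examples, 3 inputs each
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w                             # target outputs

w = np.zeros(3)                            # initial weights
lr = 0.1
for _ in range(200):                       # iteratively adjust weights
    error = X @ w - y                      # network output minus target
    w -= lr * X.T @ error / len(X)         # gradient step reduces the error
print(w)                                   # approaches [2.0, -1.0, 0.5]
```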


Neural Network Inference

• Using the trained model on previously "unlabeled" data


Neural Network Learning: Decision Boundary


Designing a Neural Network

Designing a neural network can be a complicated task, with many choices to make (several are exercised in the sketch below):

• Type of network:
  – Convolutional neural network
  – Deep feedforward neural network
  – Deep belief network
  – Long/short term memory
  – …
• Types of layers:
  – MaxPool
  – Dropout
  – Convolutional
  – Deconvolutional
  – Softmax
  – Fully connected
  – Skip layer
  – …
• Depth (number of layers)
• Inputs (number of inputs)
• Training algorithm:
  – Performance vs. quality
  – Stopping criteria
  – Performance function
• Metrics:
  – False positive rate
  – ROC curve
  – …
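To ground these choices, here is a minimal sketch (assuming TensorFlow/Keras is installed; the layer sizes are arbitrary) of a small convolutional classifier that exercises several of the layer types listed above:

```python
# Minimal sketch: design choices expressed as a small Keras model.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),                 # number of inputs
    tf.keras.layers.Conv2D(16, 3, activation="relu"),  # convolutional layer
    tf.keras.layers.MaxPooling2D(),                    # MaxPool layer
    tf.keras.layers.Dropout(0.25),                     # dropout layer
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),      # fully connected layer
    tf.keras.layers.Dense(10, activation="softmax"),   # softmax output
])
# The training algorithm and the metrics are further design choices:
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```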

Outline

• Artificial Intelligence Overview
• Machine Learning Deep Dives
  – Supervised Learning
  – Unsupervised Learning
  – Reinforcement Learning
• Conclusions/Summary


Unsupervised Learning

• The task of describing hidden structure in unlabeled data
• More formally: we observe features X1, X2, …, Xn and would like to find patterns among these features
• We are not interested in prediction, because we don't know what an output Y would look like
• The goal is to discover interesting things about the dataset: subgroups, patterns, clusters
• Typical tasks and associated algorithms:
  – Clustering
  – Data projection/preprocessing

More on Unsupervised Learning

• There is no single simple goal (such as maximizing a certain probability) for the algorithm
• Very popular because the techniques work on unlabeled data
  – Labeled data can be difficult and expensive to obtain
• Common techniques:
  – Clustering:
    • K-means
    • Nearest neighbor search
    • Spectral clustering
  – Data projection/preprocessing:
    • Principal component analysis
    • Dimensionality reduction
    • Scaling


Clustering

• Group objects or sets of features such that objects in the same cluster are more similar to each other than to objects in other clusters
• Optimal clusters should:
  – Minimize intra-cluster distance
  – Maximize inter-cluster distance
• Example of an intra-cluster measure: the squared error

  se = Σ_{i=1}^{k} Σ_{p ∈ c_i} ||p - m_i||²

  where m_i is the mean of all features in cluster c_i
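A minimal sketch, assuming scikit-learn: k-means clustering, whose objective is exactly the squared error se defined above (scikit-learn exposes it as inertia_):

```python
# Minimal sketch of k-means minimizing the within-cluster squared error.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two synthetic blobs of 2-D points
X = np.vstack([rng.normal(0.0, 0.5, size=(50, 2)),
               rng.normal(3.0, 0.5, size=(50, 2))])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.cluster_centers_)   # the cluster means m_i
print(kmeans.inertia_)           # se: sum of squared distances to the means
```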

Dimensionality Reduction

• The process of reducing the number of random variables under consideration
  – Key idea: reduce a large dataset to a much smaller one using only the high-variance dimensions
• Often used to simplify computation or the representation of a dataset
• Typical tasks:
  – Feature selection: find a subset of the original variables
  – Feature extraction: represent the data in lower dimensions
• Often key to good performance of other machine learning techniques such as regression and classification
• Other uses:
  – Compression: reduce a dataset to a smaller representation
  – Visualization: low-dimensional data is easy to see
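A minimal feature-extraction sketch, assuming scikit-learn and synthetic data with only two directions of real variance; PCA keeps those high-variance dimensions:

```python
# Minimal sketch of dimensionality reduction by PCA.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# 200 samples in 10 dimensions, but only 2 directions of real variance
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 10))
X = latent @ mixing + 0.05 * rng.normal(size=(200, 10))

pca = PCA(n_components=2).fit(X)
X_small = pca.transform(X)                  # the reduced 200 x 2 dataset
print(pca.explained_variance_ratio_.sum())  # ~0.99 of the variance retained
```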


Neural Networks and Unsupervised Learning

• Traditional applications of neural networks, such as image classification, fall into the realm of supervised learning:
  – Given example inputs x and target outputs y, learn the mapping between them
  – A trained network is supposed to give the correct target output for any input stimulus
  – Training is learning the weights
• In unsupervised settings, neural networks are largely used to find better representations for data: clustering and dimensionality reduction
• They add non-linear capabilities to these tasks

Example: Autoencoders

• A neural network architecture designed to find a compressed representation for data
• A feedforward, multi-layer perceptron
• The input layer and output layer have the same number of features
• Similar to dimensionality reduction, but allows for much more complex representations

[Figure: an encoder maps the inputs to a compressed representation; a decoder maps it back to a reconstructed input.]

© Leonardo Araujo dos Santos. All rights reserved. This content is excluded from our Creative Commons license. For more information, see https://ocw.mit.edu/help/faq-fair-use/
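A minimal sketch (assuming TensorFlow/Keras; the widths are hypothetical) of an autoencoder whose input and output layers have the same number of features, trained to reproduce its own input:

```python
# Minimal autoencoder sketch: input width = output width, trained on X -> X.
import numpy as np
import tensorflow as tf

n_features = 20
encoder = tf.keras.Sequential([
    tf.keras.Input(shape=(n_features,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(3, activation="relu"),   # compressed representation
])
decoder = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(n_features),             # reconstructed input
])
autoencoder = tf.keras.Sequential([encoder, decoder])
autoencoder.compile(optimizer="adam", loss="mse")

X = np.random.default_rng(0).normal(size=(1000, n_features)).astype("float32")
autoencoder.fit(X, X, epochs=5, verbose=0)         # the target is the input
print(encoder.predict(X[:1]))                      # the 3-number code
```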


Example: Replicator Neural Network

• Conceptually very similar to an autoencoder
• Used extensively for anomaly detection (looking for outliers)
• Salient differences from an autoencoder: a step activation function and the inclusion of dropout layers
  – The step activation squeezes the middle-layer outputs into a number of clusters
  – The dropout layers help with overfitting

[Figure: example replicator network architecture, with a staircase-shaped step activation function in the middle layer.]

Hawkins, Simon, et al. "Outlier detection using replicator neural networks." International Conference on Data Warehousing and Knowledge Discovery. Springer, Berlin, Heidelberg, 2002.
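The anomaly-detection recipe can be sketched with ordinary tools. The sketch below (assuming scikit-learn) trains a small network to reproduce its input and scores points by reconstruction error; note that MLPRegressor has neither the paper's staircase activation nor dropout layers, so this only approximates a true replicator network:

```python
# Minimal sketch of replicator-style anomaly detection: points the
# network reconstructs poorly are flagged as outliers.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
X[:5] += 6.0                              # five planted outliers

net = MLPRegressor(hidden_layer_sizes=(8, 3, 8), max_iter=2000,
                   random_state=0).fit(X, X)          # learn X -> X
errors = ((net.predict(X) - X) ** 2).mean(axis=1)     # reconstruction error
print(np.argsort(errors)[-5:])  # highest-error points: typically the outliers
```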

Outline

• Artificial Intelligence Overview
• Machine Learning Deep Dives
  – Supervised Learning
  – Unsupervised Learning
  – Reinforcement Learning
• Conclusions/Summary


Reinforcement Learning

• What makes reinforcement learning different from other machine learning paradigms?
  – There is no supervisor, only a reward signal
  – Feedback is delayed, not instantaneous
  – Time really matters (sequential, often interdependent data)
  – The agent's actions affect the subsequent data it receives
• Example: playing an Atari game
  – The rules of the game are unknown
  – Learn directly from interactive game-play
  – Pick actions on the joystick; see pixels and scores

[Figure: the agent-environment loop: at each step t the agent receives observation O_t and reward R_t and issues action A_t.]

Image source: "Introduction to Reinforcement Learning" by David Silver (Google DeepMind)
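A minimal sketch (plain Python, a hypothetical 1-D corridor of six states) of tabular Q-learning, one of the reinforcement learning algorithms named in the taxonomy earlier. The only feedback is a reward at the far end, and the update is Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)):

```python
# Minimal tabular Q-learning sketch on a 1-D corridor.
import random

N = 6                                   # states 0..5; reaching state 5 pays 1
Q = [[0.0, 0.0] for _ in range(N)]      # actions: 0 = left, 1 = right
alpha, gamma = 0.5, 0.9

random.seed(0)
for _ in range(500):                    # episodes
    s = 0
    while s != N - 1:
        a = random.choice([0, 1])       # explore (Q-learning is off-policy)
        s_next = max(0, s - 1) if a == 0 else s + 1
        r = 1.0 if s_next == N - 1 else 0.0          # delayed reward signal
        Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])
        s = s_next

greedy = [max((0, 1), key=lambda a: Q[s][a]) for s in range(N - 1)]
print(greedy)   # learned policy: all 1s (always move right toward the reward)
```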

Other Reinforcement Learning Examples

• Fly stunt maneuvers in a helicopter
  – Positive reward for following the desired trajectory; negative reward for crashing
• Defeat the world champion at backgammon
  – Positive/negative reward for winning/losing a game
• Manage an investment portfolio
  – Positive reward for each dollar in the bank
• Control a power station
  – Positive reward for producing power; negative reward for exceeding safety thresholds
• Make a humanoid robot walk
  – Positive reward for forward motion; negative reward for falling over

© David Silver. All rights reserved. This content is excluded from our Creative Commons license. For more information, see https://ocw.mit.edu/help/faq-fair-use/


Outline

• Artificial Intelligence Overview
• Machine Learning Deep Dives
  – Supervised Learning
  – Unsupervised Learning
  – Reinforcement Learning
• Conclusions/Summary

Summary

• There is lots of exciting research into AI/ML techniques
  – This course looks at a number of relatively easy strategies to mitigate the associated challenges
• Key ingredients for AI success:
  – Data availability
  – Computing infrastructure
  – Domain expertise/algorithms
• Large challenges remain in data availability and readiness for AI
• The MIT SuperCloud platform (next presentation) can be used to perform the heavy computation needed
• Further reading: "AI Enabling Technologies: A Survey," https://arxiv.org/abs/1905.03592


MIT OpenCourseWare https://ocw.mit.edu/

RES.LL-005 Mathematics of Big Data and Machine Learning IAP 2020

For information about citing these materials or our Terms of Use, visit: https://ocw.mit.edu/terms.