Philosophy of Deep Learning

Melanie Swan, Philosophy & Economic Theory

New School for Social Research, NY

Pfizer, New York NY, March 30, 2017. Slides: http://slideshare.net/LaBlogga

Philosophy of Deep Learning: Deep Qualia, Statistics, and Blockchain

Image credit: Nvidia

30 Mar 2017, Deep Learning

ASA P value misuse statement

Source: http://www.nature.com/news/statisticians-issue-warning-over-misuse-of-p-values-1.19503, http://amstat.tandfonline.com/doi/abs/10.1080/00031305.2016.1154108

ASA principles to guide P value use: the P value alone cannot determine whether a hypothesis is true or whether results are important
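To see why a P value alone says nothing about whether a result is important, here is a minimal sketch. It assumes a two-sample z-test with a known standard deviation of 1, and it uses hypothetical numbers. The same negligible difference in means becomes "highly significant" once the sample is large enough.

```python
import math

def two_sample_z_p_value(mean_diff, sd, n_per_group):
    """Two-sided p-value for a difference in means, assuming a known, equal
    standard deviation sd in both groups (a z-test, not a t-test)."""
    standard_error = sd * math.sqrt(2.0 / n_per_group)
    z = mean_diff / standard_error
    # Two-sided tail probability of the standard normal: P(|Z| >= |z|)
    return math.erfc(abs(z) / math.sqrt(2.0))

# Hypothetical scenario: the same tiny effect (0.01 standard deviations)
# measured with small, medium, and very large samples.
effect = 0.01
for n in (100, 10_000, 1_000_000):
    p = two_sample_z_p_value(effect, sd=1.0, n_per_group=n)
    print(f"n per group = {n:>9,}  p = {p:.3g}")
```

The effect size is identical in all three runs, and only the sample size changes. The P value therefore tracks sample size as much as it tracks importance, which is the point of the ASA principle.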


Melanie Swan: Philosophy and Economic Theory, New School for Social Research, New York NY; Founder, Institute for Blockchain Studies; Singularity University Instructor; Institute for Ethics and Emerging Technology Affiliate Scholar; EDGE Essayist; FQXi Advisor

Traditional Markets Background: Economics and Financial Theory Leadership

New Economies research group

Source: http://www.melanieswan.com, http://blockchainstudies.org/NSNE.pdf, http://blockchainstudies.org/Metaphilosophy_CFP.pdf

https://www.facebook.com/groups/NewEconomies


Deep Learning vocabulary: What do these terms mean?

Deep Learning, Machine Learning, Artificial Intelligence; Deep Belief Net; Perceptron, Artificial Neuron; MLP: Multilayer Perceptron; ReLU: Rectified Linear Unit; Artificial Neural Net; TensorFlow, Caffe, Theano, Torch, DL4J; Recurrent Neural Nets; Boltzmann Machine; Feedforward Neural Net; Open Source Deep Learning Frameworks; Google DeepDream, Google Brain, Google DeepMind



Key take-aways

1. What is deep learning? An advanced statistical method using logistic regression; deep learning is a subfield of machine learning and artificial intelligence

2. Why is deep learning important? A crucial method of algorithmic data manipulation

3. What do I need to know (as a data scientist)? Awareness of new methods like deep learning is needed to keep pace with data growth



Deep Learning and Data Science


Not optional: older algorithms cannot perform well enough to generate the requisite insights

Source: http://blog.algorithmia.com/introduction-to-deep-learning-2016


Agenda

Deep Learning Basics: definition, operation, drawbacks

Implications of Deep Learning: Deep Learning and the Brain; Deep Learning Blockchain Networks; Philosophy of Deep Learning

Image Source: http://www.opennn.net


Deep Learning Context

Source: Machine Learning Guide, 9. Deep Learning


Deep Learning Definition: “machines that learn to represent the world” (Yann LeCun)

Deep learning is a class of machine learning algorithms that use a cascade of layers of processing units to extract features from data. Each layer uses the output from the previous layer as input.

Two kinds of learning algorithms: supervised (classify labeled data) and unsupervised (find patterns in unlabeled data)

Two phases: training (existing data) and test (new data)

Source: Wikipedia, http://spectrum.ieee.org/automaton/robotics/artificial-intelligence/facebook-ai-director-yann-lecun-on-deep-learning
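The cascade in this definition can be sketched as a loop over layers, where each layer's output becomes the next layer's input. This is a minimal NumPy sketch. The layer sizes are hypothetical, the inputs are fake, and the weights are randomly initialized and untrained, so it shows only the data flow and not learning.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_layers(sizes, seed=0):
    """Create (weights, bias) pairs for consecutive layer sizes, e.g. [784, 128, 64, 10]."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(0.0, 0.1, (n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(x, layers):
    """Feed x through the cascade: each layer's output is the next layer's input."""
    activations = [x]
    for weights, bias in layers:
        x = sigmoid(x @ weights + bias)
        activations.append(x)
    return activations

layers = init_layers([784, 128, 64, 10])            # hypothetical layer sizes
batch = np.random.default_rng(1).random((5, 784))   # 5 fake input vectors
outputs = forward(batch, layers)
print([a.shape for a in outputs])  # [(5, 784), (5, 128), (5, 64), (5, 10)]
```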


What is Learning? When algorithms detect a system’s features or rules


Single-purpose AI: Deep Blue, 1997. Hard-coded rules

Question-answering AI: Watson, 2011. Natural-language processing; Deep Learning prototype

Multi-purpose AI structure: AlphaGo, 2016. Algorithm-detected rules, reusable template; Deep Learning machine

General purpose AI: Deep Qualia, 2xxx? Novel-situation problem-solving; algorithm edits/writes rules


Deep Learning: what is the problem space?

Source: Yann LeCun, CVPR 2015 keynote (Computer Vision), "What's wrong with Deep Learning" http://t.co/nPFlPZzMEJ

Level 1 – basic application areas: image, text, speech recognition; multi-factor recognition (label image with text); sentiment analysis

Level 2 – complex application areas: autonomous driving; disease diagnosis, tumor recognition, X-ray/MRI interpretation; seismic analysis (earthquake, energy, oil and gas)


Deep Learning Taxonomy: High-level fundamentals of machine learning

Source: Machine Learning Guide, 9. Deep Learning

AI (artificial intelligence): machine learning; other methods

Machine learning: supervised learning (labeled data: classification); unsupervised learning (unlabeled data: pattern recognition); reinforcement learning

Neural nets (NN): shallow learning (1-2 layers); deep learning (5-20 layers, expensive), including recurrent nets (text, speech) and convolutional nets (images)

Other methods: Bayesian inference, support vector machines, decision trees, k-means clustering, k-nearest neighbor


What is the problem? Computer Vision (and speech and text recognition)

Source: Quoc Le, https://arxiv.org/abs/1112.6209; Yann LeCun, NIPS 2016, https://drive.google.com/file/d/0BxKBnD5y2M8NREZod0tVdW5FLTQ/view

Marvin Minsky, 1966: “summer project”

Jeff Hawkins, 2004: Hierarchical Temporal Memory (HTM)

Quoc Le, 2011: Google Brain cat recognition

Yann LeCun, 2016: Predictive Learning; convolutional net for driving


Image Recognition: Basic Concept

Source: https://developer.clarifai.com/models

How many orange pixels?

Apple or Orange? Melanoma risk or healthy skin?

Degree of contrast in photo colors?
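The first and last questions on this slide are simple pixel statistics. This minimal sketch assumes an RGB image stored as a NumPy array of 0-255 values. It also assumes a hypothetical, hand-picked rule for what counts as "orange"; a trained model would learn its own features instead.

```python
import numpy as np

def orange_fraction(rgb):
    """Fraction of pixels matching a hypothetical 'orange' rule:
    strong red, medium green, weak blue."""
    r, g, b = (rgb[..., i].astype(int) for i in range(3))
    mask = (r > 200) & (g > 80) & (g < 180) & (b < 80)
    return mask.mean()

def contrast(rgb):
    """Contrast as the standard deviation of brightness (mean of R, G, B)."""
    brightness = rgb.astype(float).mean(axis=-1)
    return brightness.std()

# Hypothetical 4x4 image: left half orange, right half white
img = np.zeros((4, 4, 3), dtype=np.uint8)
img[:, :2] = (255, 140, 0)
img[:, 2:] = (255, 255, 255)
print(orange_fraction(img), contrast(img))
```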


Regression (review)

Linear regression: predict a continuous set of values (house prices)

Logistic regression: predict binary outcomes (0,1)

[Figure: logistic regression (sigmoid function) vs. linear regression]
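The contrast between the two regressions can be sketched in a few lines of NumPy. The data and the logistic parameters below are hypothetical and chosen for illustration only. Linear regression fits a line to continuous targets. Logistic regression passes a linear score through the sigmoid function to get a probability between 0 and 1, which is then thresholded into a 0/1 outcome.

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Linear regression: fit a straight line to continuous targets.
# Hypothetical data generated exactly from y = 2x + 1 (no noise).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0
slope, intercept = np.polyfit(x, y, deg=1)   # least-squares line
print(f"linear fit: y = {slope:.2f} x + {intercept:.2f}")

# Logistic regression: the same kind of linear score, passed through the sigmoid,
# gives a probability of the positive class; threshold at 0.5 for a 0/1 label.
w, b = 1.5, -3.0                              # hypothetical fitted parameters
probs = sigmoid(w * x + b)
labels = (probs >= 0.5).astype(int)
print(np.round(probs, 3), labels)
```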


Deep Learning Architecture

Source: Michael A. Nielsen, Neural Networks and Deep Learning


Example: Image recognition

1. Obtain training data set

2. Digitize pixels (convert images to numbers): divide the image into a 28x28 grid and assign a value (0-255) to each square based on brightness

3. Read into vector (array) (28x28 = 784 elements per image)

Source: Quoc V. Le, A Tutorial on Deep Learning, Part 1: Nonlinear Classifiers and The Backpropagation Algorithm, 2015, Google Brain, https://cs.stanford.edu/~quocle/tutorial1.pdf
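Steps 2 and 3 of this image recognition example can be sketched in NumPy. The sketch assumes the image is already a 28x28 grayscale array; loading and resizing a real image file would need an imaging library and is not shown. The random image is a hypothetical stand-in.

```python
import numpy as np

# Hypothetical 28x28 grayscale image: one brightness value (0-255) per grid square
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(28, 28), dtype=np.uint8)

# Step 3: read the grid into a single vector of 28 * 28 = 784 elements, row by row
vector = image.reshape(-1)
print(vector.shape)  # (784,)

# A common, optional preprocessing step: rescale brightness to the range [0, 1]
scaled = vector.astype(np.float32) / 255.0
```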



4. Load the spreadsheet of vectors into the deep learning system; each row of the spreadsheet is one input

Source: http://deeplearning.stanford.edu/tutorial; MNIST dataset: http://yann.lecun.com/exdb/mnist

[Figure: network diagram with 1. Input (vector data), 2. Hidden layers, 3. Output]
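Step 4 treats the training set as a table with one image per row. This minimal sketch assumes a hypothetical CSV layout: a label in the first column, followed by 784 pixel columns. That layout is an assumption for illustration and is not the official MNIST file format. The tiny two-row file is built in memory.

```python
import io
import numpy as np

# Hypothetical CSV with two rows: label, then 784 pixel values (0-255)
rows = []
for label, value in [(3, 0), (7, 255)]:
    rows.append(",".join([str(label)] + [str(value)] * 784))
csv_text = "\n".join(rows)

data = np.loadtxt(io.StringIO(csv_text), delimiter=",", dtype=np.int64)
labels = data[:, 0]           # one label per row
inputs = data[:, 1:] / 255.0  # each row is one 784-element input vector
print(inputs.shape, labels)
```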


What happens in the Hidden Layers?


The first layer learns primitive features (a line, an edge, the tiniest unit of sound) by finding combinations of the input vector data that occur more frequently than by chance. Logistic regression is performed and encoded at each processing node (Y/N (0,1)): does this example have this feature?

The layer feeds these basic features to the next layer, which trains itself to recognize slightly more complicated features (a corner, a combination of speech sounds).

Features are fed to new layers until the network recognizes full objects.


Feature Recognition in the Hidden Layers

Source: Yann LeCun, http://www.pamitc.org/cvpr15/files/lecun-20150610-cvpr-keynote.pdf


What happens in the Hidden Layers?

Source: Nvidia

First hidden layer extracts all possible low-level features from data (lines, edges, contours), next layers abstract into more complex features of possible relevance
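A single convolution filter illustrates the kind of low-level feature a first hidden layer can extract. This minimal sketch assumes a hand-made vertical-edge kernel. In a trained convolutional net, the kernel values are learned from data rather than chosen by hand. The 5x6 image is hypothetical.

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Slide kernel over image (no padding, stride 1) and sum the elementwise
    products at each position. Like most deep learning libraries, this is
    technically cross-correlation: the kernel is not flipped."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Hypothetical 5x6 image: dark left half (0), bright right half (1)
image = np.zeros((5, 6))
image[:, 3:] = 1.0

# Hand-made vertical-edge detector: responds where brightness rises left to right
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0]])

response = convolve2d_valid(image, kernel)
print(response)  # each row is [0. 3. 3. 0.]: strong response only at the edge
```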


Deep Learning Core concept:

Deep Learning systems learn increasingly complex features

Source: Andrew Ng


Deep Learning: Google Brain recognizes cats

Source: Quoc V. Le et al, Building high-level features using large scale unsupervised learning, 2011, https://arxiv.org/abs/1112.6209




[Figure: 1. Input, 2. Hidden layers, 3. Output guess (0,1)]


Deep Learning Math: test new data after the system iterates


Source: http://deeplearning.stanford.edu/tutorial; MNIST dataset: http://yann.lecun.com/exdb/mnist

Linear algebra: matrix multiplications of input vectors. Statistics: logistic regression units (Y/N (0,1)), probability weighting and updating, inference for outcome prediction. Calculus: optimization (minimizing a loss function) by gradient descent, with the gradients computed by back-propagation; the optimizer must cope with local minima and saddle points

[Figure: feed-forward pass (0,1) starting from values of 0.5; back-propagation pass updates probabilities; inference guess (0.75, 0.25) compared with actual (0, 1)]
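The feed-forward pass, the comparison of the guess with the actual label, and the back-propagation pass can be sketched end to end on a toy problem. This is a minimal NumPy sketch, not a production training loop. It assumes XOR as a hypothetical data set, one hidden layer of sigmoid units, a binary cross-entropy loss, and plain full-batch gradient descent. Back-propagation adjusts the weights, and the output probabilities change as a result.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_xor(hidden=4, lr=0.5, epochs=5000, seed=0):
    """Train a 2-input, one-hidden-layer network on XOR with back-propagation.
    Returns the loss at every epoch and the final output guesses."""
    rng = np.random.default_rng(seed)
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[0], [1], [1], [0]], dtype=float)   # actual labels
    W1 = rng.normal(0.0, 1.0, (2, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 1.0, (hidden, 1))
    b2 = np.zeros(1)
    losses = []
    for _ in range(epochs):
        # Feed-forward pass: matrix multiplications followed by sigmoid units
        h = sigmoid(X @ W1 + b1)        # hidden-layer outputs, shape (4, hidden)
        p = sigmoid(h @ W2 + b2)        # output guesses in (0, 1), shape (4, 1)
        # Compare guesses with actual labels (binary cross-entropy)
        losses.append(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))
        # Back-propagation pass: chain rule from the output back to the first layer
        d_out = (p - y) / len(X)                 # dLoss/d(output pre-activation)
        d_hidden = (d_out @ W2.T) * h * (1 - h)  # through W2 and the hidden sigmoids
        # Gradient descent: step each weight against its gradient
        W2 -= lr * (h.T @ d_out)
        b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * (X.T @ d_hidden)
        b1 -= lr * d_hidden.sum(axis=0)
    return losses, p

losses, guesses = train_xor()
print(f"loss: first epoch {losses[0]:.3f}, last epoch {losses[-1]:.3f}")
print(np.round(guesses.ravel(), 2))
```

With a different seed, learning rate, or hidden-layer size, XOR training can stall. This is one practical face of the local-minimum and saddle-point issues the slide mentions.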


Hidden Layer Unit, Perceptron, Neuron

Source: http://deeplearning.stanford.edu/tutorial; MNIST dataset: http://yann.lecun.com/exdb/mnist



Unit (processing unit, logistic regression unit), perceptron (“multilayer perceptron”), artificial neuron
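A single processing unit can be sketched as a weighted sum of its inputs plus a bias, followed by an activation. The weights below are hypothetical and chosen so the unit computes logical AND. With a step activation, the unit is the classic perceptron. With a sigmoid activation, it is a logistic regression unit that returns a graded value between 0 and 1.

```python
import numpy as np

def unit_output(x, weights, bias, activation="sigmoid"):
    """One processing unit: weighted sum of inputs plus bias, then an activation.
    'step' gives the classic perceptron; 'sigmoid' gives a logistic regression unit."""
    z = np.dot(x, weights) + bias
    if activation == "step":
        return (z > 0).astype(int)
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical weights that make the unit compute logical AND of two inputs
weights = np.array([1.0, 1.0])
bias = -1.5
inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

print(unit_output(inputs, weights, bias, "step"))     # [0 0 0 1]
print(unit_output(inputs, weights, bias, "sigmoid"))  # graded outputs between 0 and 1
```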


Kinds of Deep Learning Systems: What Deep Learning net to choose?

Source: Yann LeCun, CVPR 2015 keynote (Computer Vision), "What's wrong with Deep Learning" http://t.co/nPFlPZzMEJ

Supervised algorithms (classify labeled data)

Image (object) recognition: convolutional net (image processing), deep belief network, recursive neural tensor network

Text analysis (name recognition, sentiment analysis): recurrent net (iteration; character-level text), recursive neural tensor network

Speech recognition: recurrent net

Unsupervised algorithms (find patterns in unlabeled data): Boltzmann machine or autoencoder
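The "iteration" in a recurrent net means the same weights are applied at every step of a sequence, with a hidden state carried from one step to the next. This is a minimal NumPy sketch of the forward pass of a simple (Elman-style) recurrent net over a character sequence. It assumes a hypothetical four-letter vocabulary, untrained random weights, a tanh hidden state, and no bias terms.

```python
import numpy as np

vocab = ["h", "e", "l", "o"]                  # hypothetical character vocabulary
char_to_index = {c: i for i, c in enumerate(vocab)}

def one_hot(char):
    v = np.zeros(len(vocab))
    v[char_to_index[char]] = 1.0
    return v

rng = np.random.default_rng(0)
hidden_size = 8
W_xh = rng.normal(0, 0.1, (len(vocab), hidden_size))   # input -> hidden
W_hh = rng.normal(0, 0.1, (hidden_size, hidden_size))  # hidden -> hidden (the recurrence)
W_hy = rng.normal(0, 0.1, (hidden_size, len(vocab)))   # hidden -> output scores

def rnn_forward(text):
    """Process text one character at a time, reusing the same weights at each step.
    Returns a probability distribution over the next character at every step."""
    h = np.zeros(hidden_size)
    predictions = []
    for char in text:
        h = np.tanh(one_hot(char) @ W_xh + h @ W_hh)   # new state depends on old state
        scores = h @ W_hy
        probs = np.exp(scores) / np.exp(scores).sum()  # softmax
        predictions.append(probs)
    return np.array(predictions)

preds = rnn_forward("hello")
print(preds.shape)  # (5, 4): one next-character distribution per input character
```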


Advanced Deep Learning Architectures

Source: http://prog3.com/sbdm/blog/zouxy09/article/details/8781396

Deep Belief Network: connections between layers, not between units within a layer; establishes weighting guesses for processing units before the deep learning system is run

Used to pre-train systems to assign initial weights (more efficient)

Deep Boltzmann Machine: stochastic recurrent neural network; runs learning on internal representations; can represent and solve combinatoric problems

[Figure: Deep Boltzmann Machine and Deep Belief Network diagrams]
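Deep Belief Networks are commonly pre-trained by stacking restricted Boltzmann machines (RBMs) and training them one layer at a time. This is a minimal NumPy sketch of training a single binary RBM with one-step contrastive divergence (CD-1). The toy data and hyperparameters are hypothetical. A full DBN would repeat this layer by layer, feeding each RBM's hidden probabilities to the next RBM as its data.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_rbm(data, n_hidden=4, lr=0.1, epochs=1000, seed=0):
    """Train a binary RBM with CD-1. Returns weights, biases, and the
    reconstruction error recorded at each epoch."""
    rng = np.random.default_rng(seed)
    n_visible = data.shape[1]
    W = rng.normal(0.0, 0.1, (n_visible, n_hidden))
    b_vis = np.zeros(n_visible)
    b_hid = np.zeros(n_hidden)
    errors = []
    for _ in range(epochs):
        # Positive phase: hidden probabilities given the data, then a binary sample
        p_h0 = sigmoid(data @ W + b_hid)
        h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
        # Negative phase: reconstruct the visible units, then recompute hidden probabilities
        p_v1 = sigmoid(h0 @ W.T + b_vis)
        p_h1 = sigmoid(p_v1 @ W + b_hid)
        # CD-1 update: data correlations minus reconstruction correlations
        n = len(data)
        W += lr * (data.T @ p_h0 - p_v1.T @ p_h1) / n
        b_vis += lr * (data - p_v1).mean(axis=0)
        b_hid += lr * (p_h0 - p_h1).mean(axis=0)
        errors.append(np.mean((data - p_v1) ** 2))
    return W, b_vis, b_hid, errors

# Hypothetical binary data: two repeated 6-pixel patterns
data = np.array([[1, 1, 1, 0, 0, 0],
                 [0, 0, 0, 1, 1, 1]] * 5, dtype=float)
W, b_vis, b_hid, errors = train_rbm(data)
print(f"reconstruction error: first epoch {errors[0]:.3f}, last epoch {errors[-1]:.3f}")
```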


Convolutional net: Image Enhancement. Google DeepDream: a convolutional neural network enhances (potential) patterns in images, deliberately over-processing them

Source: Georges Seurat, Un dimanche après-midi à l'Île de la Grande Jatte, 1884-1886; http://web.cs.hacettepe.edu.tr/~aykut/classes/spring2016/bil722; Google DeepDream uses algorithmic pareidolia (seeing an image when none is present) to create a dream-like hallucinogenic appearance


How big are Deep Learning systems? Google Brain cat recognition, 2011

1 billion connections, 10 million images (200x200 pixel), 1,000 machines (16,000 cores), 3 days, each instantiation of the network spanned 170 servers, 20,000 object categories

State of the art, 2015-2016:

Nvidia facial recognition example, 2016: 100 million images, 10 layers, 18 parameters, 30 exaflops, 30 GPU days

Google: 11.2-billion-parameter system

Lawrence Livermore Lab: 15-billion-parameter system

Digital Reasoning (cognitive computing, Nashville TN), 2015: 160 billion parameters, trained on three multi-core computers overnight

Source: https://futurism.com/biggest-neural-network-ever-pushes-ai-deep-learning, Digital Reasoning paper: https://arxiv.org/pdf/1506.02338v3.pdf
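In size figures like these, "parameters" generally means the network's learned weights and biases. This minimal sketch counts them for a fully connected network, assuming hypothetical layer sizes and one bias per unit. Other architectures, such as convolutional nets, share weights across positions and are counted differently.

```python
def count_parameters(layer_sizes):
    """Weights plus biases in a fully connected feed-forward network.
    Each pair of adjacent layers contributes n_in * n_out weights and n_out biases."""
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out + n_out
    return total

# Hypothetical MNIST-sized network: 784 inputs, two hidden layers, 10 outputs
print(count_parameters([784, 128, 64, 10]))  # 109386 (100,480 + 8,256 + 650)
```

Even this small network has about 10^5 parameters. Parameter counts in the billions therefore imply far wider or deeper networks.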


Deep Learning, Deep Flaws? Even though now possible, still early days

Expensive and inefficient, big systems: only available to massive data processing operations (Google, Facebook, Microsoft, Baidu)

Black box: we don’t know how it works

Reusable model, but still can’t multi-task. Atari example: cannot learn multiple games; must drop Asteroids to learn Frogger

Need to add common sense to intelligence: background information, reasoning, planning, memory (update and remember states of the world)

…Deep Learning is still a Specialty System


DeepMind deep reinforcement learning (from the team behind AlphaGo) applied to Atari games

Source: http://www.theverge.com/2016/10/10/13224930/ai-deep-learning-limitations-drawbacks


We had the math, what took so long? A) Hardware, software, and processing advances; and B) more data

Key advances in hardware chips

GPU chips (graphics processing units): 3D graphics cards designed to do fast matrix multiplication

Google TPU chip (tensor processing unit): custom ASICs for machine learning, used in AlphaGo

Previously, training on the amount of data required was too slow to be useful; now neural nets can be trained quickly, though it is still expensive


Tensor: scalar (1 component), vector (3 components, one each for x, y, z), rank-2 tensor (3 × 3 = 9 components)

Google TPU chip (Tensor Processing Unit), 2016

Source: http://www.techradar.com/news/computing-components/processors/google-s-tensor-processing-unit-explained-this-is-what-the-future-of-computing-looks-like-1326915
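The scalar/vector/tensor distinction maps directly onto array shapes. This minimal NumPy sketch assumes three spatial dimensions (x, y, z), as on the slide. The image batch at the end has hypothetical sizes and shows the kind of higher-rank tensor that deep learning hardware processes.

```python
import numpy as np

scalar = np.array(5.0)                   # rank 0: a single number
vector = np.array([1.0, 2.0, 3.0])       # rank 1: one component per axis x, y, z
rank2 = np.arange(9.0).reshape(3, 3)     # rank 2: 3 x 3 = 9 components

for name, t in [("scalar", scalar), ("vector", vector), ("rank-2 tensor", rank2)]:
    print(f"{name}: rank {t.ndim}, shape {t.shape}, {t.size} components")

# A batch of 32 RGB images of 28x28 pixels is a rank-4 tensor (hypothetical sizes)
batch = np.zeros((32, 28, 28, 3))
print(batch.ndim, batch.size)  # 4 75264
```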







Deep Learning and the Brain



Deep learning neural networks are inspired by the structure of the cerebral cortex. The processing unit (perceptron, artificial neuron) is the mathematical representation of a biological neuron. Just as the cerebral cortex has several layers of interconnected neurons, a deep network has several layers of interconnected perceptrons


Deep Qualia machine? General purpose AI: mutual inspiration of neurological and computing research


Deep Qualia machine? Visual cortex is hierarchical with intermediate layers

The ventral (recognition) pathway in the visual cortex has multiple stages: Retina → LGN → V1 → V2 → V4 → PIT → AIT

Human brain simulation projects: Swiss Blue Brain project, European Human Brain Project

Source: Yann LeCun, http://www.pamitc.org/cvpr15/files/lecun-20150610-cvpr-keynote.pdf







Deep Learning Blockchain Networks



Blockchain Technology

Source: http://www.amazon.com/Bitcoin-Blueprint-New-World-Currency/dp/1491920491


What is Blockchain Technology? Blockchain technology is an Internet-based ledger system for submitting, logging, and tracking transactions

It allows the secure transfer of assets (like money) and information computationally, without a human intermediary: a secure asset-transfer protocol, comparable to email as a protocol for messages. The first application is currency (Bitcoin) and FinTech re-engineering, with subsequent applications in algorithmic data processing

Source: Blockchain Smartnetworks, https://www.slideshare.net/lablogga/blockchain-smartnetworks
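The ledger idea can be sketched as a hash chain. Each record stores the hash of the previous record, so altering any earlier transaction breaks every link after it. This is a minimal Python sketch with hypothetical transactions. It omits everything that makes a real blockchain secure across a network, including peer-to-peer distribution, consensus, and digital signatures.

```python
import hashlib
import json

def block_hash(block):
    """SHA-256 of the block's contents, serialized deterministically."""
    encoded = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()

def append_block(chain, transactions):
    """Add a block that commits to the previous block's hash."""
    previous = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"index": len(chain), "previous_hash": previous,
                  "transactions": transactions})

def is_valid(chain):
    """Check that every block still points at the hash of its predecessor."""
    return all(chain[i]["previous_hash"] == block_hash(chain[i - 1])
               for i in range(1, len(chain)))

ledger = []  # hypothetical transactions below
append_block(ledger, [{"from": "alice", "to": "bob", "amount": 5}])
append_block(ledger, [{"from": "bob", "to": "carol", "amount": 2}])
append_block(ledger, [{"from": "carol", "to": "alice", "amount": 1}])
print(is_valid(ledger))        # True

ledger[0]["transactions"][0]["amount"] = 500   # tamper with history
print(is_valid(ledger))        # False
```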


Deep Learning Blockchain Networks: help resolve Deep Learning challenges

Source: http://www.melanieswan.com, http://blockchainstudies.org/NSNE.pdf, http://blockchainstudies.org/Metaphilosophy_CFP.pdf

Deep Learning systems need greater capacity: put Deep Learning systems on the Internet in a secure, trackable, remunerable way; distributed, not parallel, systems

Deep Learning systems need more complexity and side modules: instantiate common sense, memory, and planning modules

Deep Learning systems do not reveal what happens in the hidden layers: track arbitrarily many transactions with smart contracts

Core blockchain functionality employed: automated coordination of massive amounts of operations via smart contracts (automatically executing Internet-based programs)


Deep Learning systems go online with Blockchain

Key point is to put Deep Learning systems on the Internet

Blockchain is a strong fit for controlling secure access while still providing the 24/7 availability, flexibility, scale, and side modules needed

Provide global infrastructure to work on current problems: genomic disease, protein modeling, financial risk assessment, astronomical data analysis



Combine Deep Learning and Blockchain Technology: deep learning technology, particularly coupled with blockchain systems, might create a new kind of global computing platform

Deep Learning and Blockchains are similar: both are indicative of a shift toward increasingly sophisticated and automated computational tools. The mode of operation of both is making (statistically supported) guesses about reality states of the world: predictive inference (deep learning) and cryptographic nonce guesses (blockchain). In our current sense-making model of the world, we are guessing at more complex forms of reality
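The "cryptographic nonce guesses" refer to the proof-of-work search in Bitcoin-style blockchains. Miners try one nonce after another until the block's hash meets a difficulty target. This minimal sketch assumes a hypothetical block string and a simplified target of a fixed number of leading zero hex digits. Bitcoin itself compares a double SHA-256 of a binary block header against a numeric target.

```python
import hashlib

def mine(block_data, difficulty=4):
    """Guess nonces until SHA-256(block_data + nonce) starts with `difficulty` zero hex digits."""
    target_prefix = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{block_data}{nonce}".encode("utf-8")).hexdigest()
        if digest.startswith(target_prefix):
            return nonce, digest
        nonce += 1

block = "hypothetical block: alice pays bob 5"
nonce, digest = mine(block)
print(nonce, digest)
```

Each extra zero hex digit multiplies the expected number of guesses by 16, which is how difficulty is tuned in this simplified scheme. Checking a proposed nonce, by contrast, takes a single hash.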


Advanced Computational Infrastructure

Deep Learning Blockchain Networks







Philosophy of Deep Learning



Human’s Role in the World is Changing

Sparse data: we control it. Abundant data: does it control us?

Deep Learning is emphasizing the presence of Big Data


Philosophy of Deep Learning - Definition


The Philosophy of Deep Learning is the branch of philosophy concerned with the definition, methods, and implications of Deep Learning

Internal Industry Practice: internal to the field, as a generalized articulation of the concepts, theory, and systems that comprise the overall use of deep learning algorithms

External Social Impact: external to the field, considering the impact of deep learning more broadly on individuals, society, and the world


3 Kinds of Philosophic Concerns

Ontology (existence, reality): What is it? What is deep learning? What does it mean?

Epistemology (knowledge): What knowledge are we gaining from deep learning? What is the proof standard?

Axiology or Valorization (ethics, aesthetics): What is noticed, overlooked? What is ethical practice? What is beauty, elegance?

Sources: http://www.melanieswan.com/documents/Philosophy_of_Big_Data_SWAN.pdf


Explanation: does the map fit the territory?


1626 map of “the Island of California”

Source: California Is An Island Off the Northerne Part of America; John Speed, "America," 1626, London

Explanandum: what is being explained

Explanans: the explanation


How do we understand reality? Methods, models, and tools

Descartes, Optics, 1637

Deep Learning, 2017








Key take-aways

1. What is deep learning? An advanced statistical method using logistic regression; deep learning is a subfield of machine learning and artificial intelligence

2. Why is deep learning important? A crucial method of algorithmic data manipulation

3. What do I need to know (as a data scientist)? Awareness of new methods like deep learning is needed to keep pace with data growth



Conclusion: Deep learning systems are machine learning algorithms that learn increasingly complex feature sets from data via hidden layers

Deep qualia systems might be a step forward in brain simulation in computer networks and general intelligence

Next-generation global infrastructure: Deep Learning Blockchain Networks merging deep learning systems and blockchain technology



Resources


Distill, a visual, interactive journal for machine learning research

http://distill.pub/


Thank You! Questions?
