27: Intro to Deep Learning + Wrap-up
Lisa Yan
June 8, 2020
Lisa Yan, CS109, 2020
Quick slide reference
0.1% of Deep Learning (LIVE)
Beyond the Basics (LIVE)
CS109 Wrap-Up (LIVE)
Deep Learning
LIVE
Logistic Regression Model
$z = \theta_0 + \sum_{j=1}^{m} \theta_j x_j$
$\hat{y} = \sigma(z) = P(Y = 1 \mid X = x)$
sigmoid function: $\sigma(z) = \frac{1}{1 + e^{-z}}$
[Plot: sigmoid function $\sigma(z)$ against $z$, for $z$ from -10 to 10 and $\sigma(z)$ from 0 to 1]
$\arg\max_{y \in \{0,1\}} P(Y = y \mid X = x)$
Not actually part of modeling, only part of prediction (so ignore for now)
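As a minimal sketch of the model on this slide, the sigmoid and the prediction rule could be written as follows. The parameter values and the 3-feature input are made up for illustration, and the helper names (`predict_proba`, `predict`) are hypothetical, not from the lecture.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real z into the range (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use an equivalent form that avoids overflow in exp.
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(theta, x):
    """P(Y = 1 | X = x), assuming theta[0] is the intercept theta_0."""
    z = theta[0] + sum(t * xj for t, xj in zip(theta[1:], x))
    return sigmoid(z)

def predict(theta, x):
    """Prediction step only (not part of modeling): threshold at 0.5."""
    return 1 if predict_proba(theta, x) > 0.5 else 0

# Hypothetical parameters and input, for illustration only.
theta = [-1.0, 2.0, -0.5, 0.25]
x = [1, 0, 1]
y_hat = predict_proba(theta, x)  # sigmoid(-1 + 2 + 0 + 0.25) = sigmoid(1.25)
print(y_hat, predict(theta, x))
```

Thresholding $\hat{y}$ at 0.5 is the same as taking the arg max over the two labels, which is why the arg max is only needed at prediction time.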
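Big ideas from Logistic Regression
$\hat{y} = \sigma\left(\theta_0 + \sum_{j=1}^{m} \theta_j x_j\right) = P(Y = 1 \mid X = x)$
sigmoid function: $\sigma(z) = \frac{1}{1 + e^{-z}}$
[Plot: sigmoid function $\sigma(z)$ against $z$, for $z$ from -10 to 10 and $\sigma(z)$ from 0 to 1]
[Diagram: inputs $x$ → weighted sum (+) → $\sigma$ → $\hat{y} = P(Y = 1 \mid X = x)$ → > 0.5?]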
One neuron
[Diagram: inputs $x$ → weighted sum (+) → $\sigma$ → $\hat{y}$]
One neuron = One logistic regression
Biological basis for neural networks
A neuron: One neuron = one logistic regression
Your brain (actually, probably someone else’s brain): Neural network = many logistic regressions
[Diagram: inputs $x_1, x_2, x_3, x_4$ with weights $\theta_1, \theta_2, \theta_3, \theta_4$ feeding a single neuron with output $y$]
Innovations in deep learning
Deep learning (neural networks) is the core idea behind the current revolution in AI.
AlphaGo (2016)
Computers making art
Detecting skin cancer
Esteva, Andre, et al. "Dermatologist-level classification of skin cancer with deep neural networks." Nature 542.7639 (2017): 115-118.
Deep learning
def Deep learning is maximum likelihood estimation with neural networks.
def A neural network is (at its core) many logistic regression pieces stacked on top of each other.
Lots of Logistic (regressions): LOL
[Diagram: $x$, input $[1, 0, \ldots, 1]$ → network → $\hat{y}$, output $P(Y \mid X = x)$ → > 0.5? Yes. Predict 1]
Digit recognition example
We make feature vectors from (digitized) pictures of numbers.
Input image → Input feature vector → Output label
$x^{(i)} = [0,0,0,0, \ldots, 1,0,0,1, \ldots, 0,0,1,0]$, $y^{(i)} = 0$
$x^{(i)} = [0,0,1,1, \ldots, 0,1,1,0, \ldots, 0,1,0,0]$, $y^{(i)} = 1$
Logistic Regression
$x$, input features (pixels, on/off)
$\hat{y}$, output
$\hat{y} = \sigma(\theta^T x) = P(Y = 1 \mid X = x)$
(each arrow in the diagram indicates a logistic regression connection)
For one input image: > 0.5? No. Predict 0
For another input image: > 0.5? Yes. Predict 1
What can we do to increase complexity of our model?
Big ideas from logistic regression
$x$, input features
$\hat{y}$, output
$\hat{y} = \sigma(\theta^T x)$
(each arrow in the diagram indicates a logistic regression connection)
Big idea #1: Model conditional probability $\hat{y} = P(Y \mid X = x)$ of class label given input
Big idea #2: $\sigma(\theta^T x)$, a non-linear transform of multiple values into one value, using parameter $\theta$
Neural network
$x$, input features
$h$, hidden layer
$\hat{y}$, output
> 0.5? No. Predict 0
Big idea #1: Model conditional probability $\hat{y} = P(Y \mid X = x)$ of class label given input
Feed neurons into other neurons
$x$, input features
$h$, hidden layer
$\hat{y}$, output
> 0.5? No. Predict 0
• Neuron = logistic regression
• Different parameters for every connection
Hidden neuron (+, $\sigma$): Big idea #2, $\sigma(\theta^T x)$, a non-linear transform of multiple values into one value, using parameter $\theta$
From $x$ to $h$: $|h|$ logistic regression connections, $|x| \cdot |h|$ parameters
From $h$ to $\hat{y}$ (output neuron): 1 logistic regression connection, $|h|$ parameters
• Neural networks are simply composed logistic regression units.
• 1 neuron = 1 logistic regression.
• Each neuron has its own set of parameters.
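To make "feeding neurons into other neurons" concrete, here is a rough sketch of one forward pass through a network with a single hidden layer. The layer sizes, the random parameter values, and the names `neuron` and `forward` are assumptions chosen for illustration; only the structure (each neuron is a logistic regression with its own parameters) comes from the slides.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def neuron(theta, inputs):
    """One neuron = one logistic regression: sigma(theta^T inputs)."""
    return sigmoid(sum(t * v for t, v in zip(theta, inputs)))

def forward(theta_h, theta_y, x):
    """Hidden layer first, then the output neuron fed by the hidden values."""
    h = [neuron(theta_j, x) for theta_j in theta_h]  # |h| logistic regressions
    y_hat = neuron(theta_y, h)                       # 1 logistic regression
    return h, y_hat

# Hypothetical sizes: |x| = 4 input features, |h| = 3 hidden neurons.
rng = random.Random(0)
n_inputs, n_hidden = 4, 3
theta_h = [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_hidden)]
theta_y = [rng.uniform(-1, 1) for _ in range(n_hidden)]

x = [1, 0, 0, 1]
h, y_hat = forward(theta_h, theta_y, x)
n_params = sum(len(t) for t in theta_h) + len(theta_y)
print(n_params)  # |x| * |h| + |h| = 4 * 3 + 3 = 15
print(1 if y_hat > 0.5 else 0)
```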
Demonstration
http://scs.ryerson.ca/~aharley/vis/conv/
Neural networks
A neural network (like logistic regression) gets intelligence from its parameters $\theta$.
Training
• Learn parameters $\theta$
• Find $\theta_{MLE}$ that maximizes likelihood of training data (MLE)
Testing/Prediction
For input feature vector $X = x$:
• Use parameters to compute $\hat{y} = P(Y = 1 \mid X = x)$
• Classify instance as: 1 if $\hat{y} > 0.5$, 0 otherwise
How do we learn the $|x| \cdot |h| + |h|$ parameters?
Gradient ascent + chain rule!
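Training Logistic Regression (Review)
1. Optimization problem:
$\theta_{MLE} = \arg\max_{\theta} \prod_{i=1}^{n} f(y^{(i)} \mid x^{(i)}, \theta) = \arg\max_{\theta} LL(\theta)$
$LL(\theta) = \sum_{i=1}^{n} y^{(i)} \log \hat{y}^{(i)} + (1 - y^{(i)}) \log(1 - \hat{y}^{(i)})$, where $\hat{y}^{(i)} = \sigma(\theta^T x^{(i)})$
2. Compute gradient:
$\frac{\partial LL(\theta)}{\partial \theta_j} = \sum_{i=1}^{n} \left[ y^{(i)} - \hat{y}^{(i)} \right] x_j^{(i)}$
3. Optimize:
initialize params
repeat many times:
    compute gradient
    params += η * gradient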
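The three review steps (log-likelihood, gradient, gradient ascent) might look like this in runnable form. This is only a sketch: the six-point dataset, the step size `eta`, and the number of steps are invented for illustration, and the first feature is fixed at 1 so that its parameter plays the role of an intercept.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def y_hat(theta, x):
    return sigmoid(sum(t * v for t, v in zip(theta, x)))

def log_likelihood(theta, xs, ys):
    """LL(theta) = sum_i y log(y_hat) + (1 - y) log(1 - y_hat)."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = y_hat(theta, x)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total

def gradient(theta, xs, ys):
    """dLL/dtheta_j = sum_i (y_i - y_hat_i) * x_ij."""
    grad = [0.0] * len(theta)
    for x, y in zip(xs, ys):
        err = y - y_hat(theta, x)
        for j, xj in enumerate(x):
            grad[j] += err * xj
    return grad

def train(xs, ys, eta=0.05, steps=5000):
    theta = [0.0] * len(xs[0])              # initialize params
    for _ in range(steps):                  # repeat many times
        grad = gradient(theta, xs, ys)      # compute gradient
        theta = [t + eta * g for t, g in zip(theta, grad)]  # params += eta * gradient
    return theta

# Made-up, non-separable data: [1 (intercept feature), one real-valued feature].
xs = [[1, 0.0], [1, 1.0], [1, 2.0], [1, 3.0], [1, 4.0], [1, 5.0]]
ys = [0, 0, 1, 0, 1, 1]
theta = train(xs, ys)
print(theta, log_likelihood(theta, xs, ys))
```

The update uses `+=` rather than `-=` because the goal is to climb the log-likelihood, not to descend a loss.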
Training a neural net
1. Optimization problem:
$\theta_{MLE} = \arg\max_{\theta} \prod_{i=1}^{n} f(y^{(i)} \mid x^{(i)}, \theta) = \arg\max_{\theta} LL(\theta)$
2. Compute gradient
3. Optimize
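1. Same output $\hat{y}$, same conditional log-likelihood
$x$, input features
$h$, hidden layer
$\hat{y}$, output
$P(Y = y \mid X = x) = \hat{y}$ if $y = 1$, and $1 - \hat{y}$ if $y = 0$
$L(\theta) = \prod_{i=1}^{n} P(Y = y^{(i)} \mid X = x^{(i)}, \theta) = \prod_{i=1}^{n} (\hat{y}^{(i)})^{y^{(i)}} (1 - \hat{y}^{(i)})^{1 - y^{(i)}}$
$LL(\theta) = \sum_{i=1}^{n} y^{(i)} \log \hat{y}^{(i)} + (1 - y^{(i)}) \log(1 - \hat{y}^{(i)})$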
1. (model is a little more complicated, though)
$x$, input features
$h$, hidden layer
$\hat{y}$, output
$h_j = \sigma(\theta_j^{(h)T} x)$ for $j = 1, \ldots, |h|$ ($|h|$ param vectors, each with dimension $|x|$)
$\hat{y} = \sigma(\theta^{(\hat{y})T} h) = P(Y = 1 \mid X = x)$ ($|h|$-dimensional param vector)
$P(Y = y \mid X = x) = \hat{y}$ if $y = 1$, and $1 - \hat{y}$ if $y = 0$
$LL(\theta) = \sum_{i=1}^{n} y^{(i)} \log \hat{y}^{(i)} + (1 - y^{(i)}) \log(1 - \hat{y}^{(i)})$
2. Compute gradient
$LL(\theta) = \sum_{i=1}^{n} y^{(i)} \log \hat{y}^{(i)} + (1 - y^{(i)}) \log(1 - \hat{y}^{(i)})$
$h_j = \sigma(\theta_j^{(h)T} x)$ for $j = 1, \ldots, |h|$
$\hat{y} = \sigma(\theta^{(\hat{y})T} h)$
Take gradient with respect to all $\theta$ parameters
Calculus refresher #1: Derivative(sum) = sum(derivative)
Calculus refresher #2: Chain rule!!!
3. Optimize
initialize params
repeat many times:
    compute gradient
    params += η * gradient
Training a neural net
1. Optimization problem
2. Compute gradient
3. Optimize
Wait, did we just skip something difficult?
2. Compute gradient via backpropagation
$h_j = \sigma(\theta_j^{(h)T} x)$ for $j = 1, \ldots, |h|$
$\hat{y} = \sigma(\theta^{(\hat{y})T} h)$
Take gradient with respect to all $\theta$ parameters
Learn the tricks behind backpropagation in CS229, CS231N, CS224N, etc.
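The slides leave backpropagation to later courses. As a hedged illustration only, applying the chain rule by hand to the one-hidden-layer model gives, for a single training example, $\partial LL / \partial \theta^{(\hat{y})}_j = (y - \hat{y}) h_j$ and $\partial LL / \partial \theta^{(h)}_{j,k} = (y - \hat{y}) \, \theta^{(\hat{y})}_j \, h_j (1 - h_j) \, x_k$. The sketch below computes these and compares one entry against a finite-difference estimate; all parameter values and function names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(theta_h, theta_y, x):
    """h_j = sigma(theta_h[j] . x), y_hat = sigma(theta_y . h)."""
    h = [sigmoid(sum(t * v for t, v in zip(theta_j, x))) for theta_j in theta_h]
    y_hat = sigmoid(sum(t * hj for t, hj in zip(theta_y, h)))
    return h, y_hat

def log_likelihood(theta_h, theta_y, x, y):
    _, y_hat = forward(theta_h, theta_y, x)
    return y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat)

def gradients(theta_h, theta_y, x, y):
    """Chain rule for one example, working backward from the output."""
    h, y_hat = forward(theta_h, theta_y, x)
    err = y - y_hat                                # dLL / d(output pre-activation)
    grad_y = [err * hj for hj in h]                # output-neuron parameters
    grad_h = [[err * theta_y[j] * h[j] * (1 - h[j]) * xk for xk in x]
              for j in range(len(h))]              # hidden-neuron parameters
    return grad_h, grad_y

# Hypothetical parameters and a single made-up example.
theta_h = [[0.5, -0.3, 0.8], [-0.6, 0.1, 0.4]]
theta_y = [0.7, -1.2]
x, y = [1.0, 0.0, 1.0], 1

grad_h, grad_y = gradients(theta_h, theta_y, x, y)

# Sanity check: central finite difference on theta_h[0][2].
eps = 1e-6
plus = [row[:] for row in theta_h]
minus = [row[:] for row in theta_h]
plus[0][2] += eps
minus[0][2] -= eps
numeric = (log_likelihood(plus, theta_y, x, y)
           - log_likelihood(minus, theta_y, x, y)) / (2 * eps)
print(grad_h[0][2], numeric)
```

Summing these per-example gradients over the training set gives the gradient used in the `params += η * gradient` update.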
Interlude for jokes
Probability as college students
(Note: not a valid PDF, but a useful construct nonetheless)
All remaining jokes
Knock knock. Who's there? Dishes. Dishes who? Dishes a great class.
Tell a joke about pizza! Or wait that might be too cheesy ;)
Why do you avoid the top of the bell curve? Because everyone there is mean.
Why is your nose in the middle of your face? Because it's the scenter :)
Two random variables were talking in a bar. They thought they were being discreet but I heard their chatter continuously.
What happens to a frog’s car when it breaks down? It gets toad away.
“A statistics professor plans to travel to a conference by plane. When he passes the security check, they discover a bomb in his carry-on! Of course, he is hauled off immediately for interrogation. "I don't understand!" the interrogating officer exclaims. "You're an accomplished
Lisa giving us a house tour.
“Three logicians walk into a bar. The bartender asks, "Will you all have a drink?" The first logician says "I don't know." The second logician says "I don't know." The third logician says "Yes.””
What did the fish say when he swam into a wall? Dam
What do you call it when you've been pulling too many all-nighters studying probability, and you see the ghost of Carl Friedrich Gauss? ... ... ... ... para-NORMAL activity (I'll take my extra credit now please)
What is similar about ice cream and bees? If you eat too many of either you’ll get a stomachache
Why does the Norwegian Navy have bar codes on the side of their ships? So when they come back to port they can...
what has two butts and kills people? an
(from mid-quarter feedback)
Beyond the basics
LIVE
Lisa Yan, CS109, 2019
Shared weights?
It turns out if you want to force some of your weights to be shared over different neurons, the math isn’t much harder.
Convolution is an example of such weight-sharing and is used a lot for vision (Convolutional Neural Networks, CNN).
Neural networks with multiple layers
[Diagram: input $x$ feeding a chain of hidden layers, ending in the log-likelihood $LL$]
Neurons learn features of the dataset
Neurons in later layers will respond strongly to high-level features of your training data.
If your training data is faces, you will get lots of face neurons.
If your training data is all of YouTube… you get a cat neuron.
[Figure panels: top stimuli in test set; optimal stimulus found by numerical optimization]
Le, et al., Building high-level features using large-scale unsupervised learning. ICML 2012
Multiple outputs?
Softmax is a generalization of the sigmoid function.
$z \in \mathbb{R}$: $P(Y = 1 \mid X = x) = \sigma(z)$
sigmoid($z$): value in range [0, 1] (equivalent: Bernoulli $p$)
$\mathbf{z} \in \mathbb{R}^k$: $P(Y = i \mid X = x) = \mathrm{softmax}(\mathbf{z})_i$
softmax($\mathbf{z}$): $k$-dimensional values in range [0, 1] that add up to 1 (equivalent: Multinomial $p_1, \ldots, p_k$)
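One way to see that softmax generalizes the sigmoid is to code both and compare them in the two-class case. This is a small sketch with arbitrary example scores; subtracting the maximum before exponentiating is a common numerical-stability trick and is not something the slide specifies.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax(z):
    """Map k real scores to k values in [0, 1] that add up to 1."""
    m = max(z)                              # shifting by a constant leaves the result unchanged
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])            # arbitrary example scores
print(probs, sum(probs))

# With two classes and scores [z, 0], softmax reproduces sigmoid(z).
z = 0.7
two_class = softmax([z, 0.0])
print(two_class[0], sigmoid(z))
```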
Softmax test metric: Top-5 error
Top-5 classification error: What % of datapoints did not have the correct class label in the top-5 predictions?
Example (class label: 5), probabilities of predictions $P(Y = y \mid X = x)$:
0.148 0.137 0.122 0.109 0.104 0.091 0.090 0.096 0.083 0.05
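The top-5 error metric described above could be computed roughly as follows. The probability vector and labels here are invented for the example (they are not the values shown on the slide), and `in_top_k` and `top_k_error` are hypothetical helper names.

```python
def in_top_k(probs, label, k=5):
    """True if `label` is among the k classes with the highest probability."""
    ranked = sorted(range(len(probs)), key=lambda c: probs[c], reverse=True)
    return label in ranked[:k]

def top_k_error(all_probs, labels, k=5):
    """Fraction of datapoints whose correct label is NOT in the top-k predictions."""
    misses = sum(1 for p, y in zip(all_probs, labels) if not in_top_k(p, y, k))
    return misses / len(labels)

# Made-up softmax output over 10 classes (sums to 1).
probs = [0.05, 0.30, 0.02, 0.20, 0.10, 0.15, 0.03, 0.08, 0.04, 0.03]
# Top-5 classes by probability: 1, 3, 5, 4, 7.
all_probs = [probs, probs, probs]
labels = [5, 0, 1]                # class 0 has only the 6th-highest probability
err = top_k_error(all_probs, labels)
print(err)                        # one miss out of three datapoints
```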
ImageNet classification
22,000 categories
14,000,000 images
Hand-engineered features (SIFT, HOG, LBP), Spatial pyramid, Sparse Coding/Compression
Example categories:
…
smoothhound, smoothhound shark, Mustelus mustelus
American smooth dogfish, Mustelus canis
Florida smoothhound, Mustelus norrisi
whitetip shark, reef whitetip shark, Triaenodon obseus
Atlantic spiny dogfish, Squalus acanthias
Pacific spiny dogfish, Squalus suckleyi
hammerhead, hammerhead shark
smooth hammerhead, Sphyrna zygaena
smalleye hammerhead, Sphyrna tudes
shovelhead, bonnethead, bonnet shark, Sphyrna tiburo
angel shark, angelfish, Squatina squatina, monkfish
electric ray, crampfish, numbfish, torpedo
smalltooth sawfish, Pristis pectinatus
guitarfish
roughtail stingray, Dasyatis centroura
butterfly ray
eagle ray
spotted eagle ray, spotted ray, Aetobatus narinari
cownose ray, cow-nosed ray, Rhinoptera bonasus
manta, manta ray, devilfish
Atlantic manta, Manta birostris
devil ray, Mobula hypostoma
grey skate, gray skate, Raja batis
little skate, Raja erinacea
…
Example images: Stingray, Manta ray
Le, et al., Building high-level features using large-scale unsupervised learning. ICML 2012
ImageNet classification challenge
Full dataset: 22,000 categories, 14,000,000 images
Challenge: 1000 categories
1,200,000 images in train set
200,000 images in test set
ImageNet challenge: Top-5 classification error
(lower is better)
Random guess: 99.5%
$P(\text{true class label not in 5 guesses}) = \frac{\binom{999}{5}}{\binom{1000}{5}} = \frac{995}{1000}$
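The random-guess baseline can be checked with exact arithmetic. This short sketch assumes, as on the slide, 1000 equally likely classes and 5 distinct guesses; `math.comb` requires Python 3.8 or later.

```python
from fractions import Fraction
from math import comb

n_classes, k = 1000, 5

# Ways to choose 5 guesses that all avoid the true class, over all ways to choose 5 guesses.
p_miss = Fraction(comb(n_classes - 1, k), comb(n_classes, k))

print(p_miss)         # 199/200, i.e. 995/1000
print(float(p_miss))  # 0.995
```

The ratio simplifies to $(1000 - 5)/1000$ because all but one factor of the two binomial coefficients cancel.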
ImageNet challenge: Top-5 classification error
(lower is better)
Random guess: 99.5%
Pre-Neural Networks: 25.8%
16.4%
GoogLeNet (2015): ?
Humans (2014): 5.1%
SENet (2017): 2.25%
Russakovsky et al., ImageNet Large Scale Visual Recognition Challenge. IJCV 2015
Szegedy et al., Going Deeper With Convolutions. CVPR 2015
Hu et al., Squeeze-and-Excitation Networks. Preprint arXiv 2017
GoogLeNet (2015)
1 Trillion Artificial Neurons (btw human brains have 1 billion neurons)
22 layers deep!
Multiple, multi-class output
Szegedy et al., Going Deeper With Convolutions. CVPR 2015
Good ML = Generalization
Goal of machine learning: build models that generalize well to predicting new data
Overfitting: fitting the training data too well, so we lose generality of model
• Polynomial on the right fits training data perfectly!
• Which would you rather use to predict a new data point?
Prevent overfitting?
Dropout (training technique)
When your model is training, randomly turn off your neurons with probability 0.5. It will make your network more robust.
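A rough sketch of dropout applied to one layer's activations is shown below. The slide only says to turn neurons off with probability 0.5 during training; the rescaling of surviving activations by 1 / (1 - p) (often called "inverted dropout") is an added assumption here, a common convention that keeps the expected activation unchanged so nothing needs to change at test time.

```python
import random

def dropout(activations, p_drop=0.5, training=True, rng=random):
    """Randomly zero each activation with probability p_drop during training."""
    if not training:
        return list(activations)            # at test time, every neuron stays on
    keep = 1.0 - p_drop
    # Assumed convention: scale survivors by 1 / keep to preserve the expected value.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)                      # fixed seed so the example is repeatable
hidden = [1.0] * 1000                       # made-up hidden-layer activations
dropped = dropout(hidden, p_drop=0.5, rng=rng)
n_off = sum(1 for a in dropped if a == 0.0)
print(n_off)                                # roughly half of the 1000 neurons
```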
Making decisions?
Not everything is classification.
Deep Reinforcement Learning
Instead of having the output of a model be a probability, you make output an expectation.
http://cs.stanford.edu/people/karpathy/convnetjs/demo/rldemo.html
Deep Reinforcement Learning
DeepMind Atari Games: score compared to best human
http://cs.stanford.edu/people/karpathy/convnetjs/demo/rldemo.html
What’s missing?
Where is your data coming from?
Ethics and datasets
Sometimes machine learning feels universally unbiased.
We can even prove our estimators are “unbiased” (mathematically).
Google/Nikon/HP had biased datasets.
Should your data be unbiased?
Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings
Tolga Bolukbasi¹, Kai-Wei Chang², James Zou², Venkatesh Saligrama¹,², Adam Kalai²
¹Boston University, 8 Saint Mary’s Street, Boston, MA
²Microsoft Research New England, 1 Memorial Drive, Cambridge, MA
arXiv:1607.06520v1 [cs.CL] 21 Jul 2016

Abstract
The blind application of machine learning runs the risk of amplifying biases present in data. Such a danger is facing us with word embedding, a popular framework to represent text data as vectors which has been used in many machine learning and natural language processing tasks. We show that even word embeddings trained on Google News articles exhibit female/male gender stereotypes to a disturbing extent. This raises concerns because their widespread use, as we describe, often tends to amplify these biases. Geometrically, gender bias is first shown to be captured by a direction in the word embedding. Second, gender neutral words are shown to be linearly separable from gender definition words in the word embedding. Using these properties, we provide a methodology for modifying an embedding to remove gender stereotypes, such as the association between the words receptionist and female, while maintaining desired associations such as between the words queen and female. We define metrics to quantify both direct and indirect gender biases in embeddings, and develop algorithms to “debias” the embedding. Using crowd-worker evaluation as well as standard benchmarks, we empirically demonstrate that our algorithms significantly reduce gender bias in embeddings while preserving its useful properties such as the ability to cluster related concepts and to solve analogy tasks. The resulting embeddings can be used in applications without amplifying gender bias.

1 Introduction
There have been hundreds or thousands of papers written about word embeddings and their applications, from Web search [27] to parsing Curriculum Vitae [16]. However, none of these papers have recognized how blatantly sexist the embeddings are and hence risk introducing biases of various types into real-world systems.
A word embedding represents each word (or common phrase) $w$ as a $d$-dimensional word vector $\vec{w} \in \mathbb{R}^d$. Word embeddings, trained only on word co-occurrence in text corpora, serve as a dictionary of sorts for computer programs that would like to use word meaning. First, words with similar semantic meanings tend to have vectors that are close together. Second, the vector differences between words in embeddings have been shown to represent relationships between words [32, 26]. For example given an analogy puzzle, “man is to king as woman is to x” (denoted as man:king :: woman:x), simple arithmetic of the embedding vectors finds that x=queen is the best answer because:
$\overrightarrow{\text{man}} - \overrightarrow{\text{woman}} \approx \overrightarrow{\text{king}} - \overrightarrow{\text{queen}}$
Similarly, x=Japan is returned for Paris:France :: Tokyo:x. It is surprising that a simple vector arithmetic can simultaneously capture a variety of relationships. It has also excited practitioners because such a tool could be useful across applications involving natural language. Indeed, they are being studied and used in a variety of downstream applications (e.g., document ranking [27], sentiment analysis [18], and question retrieval [22]).
However, the embeddings also pinpoint sexism implicit in text. For instance, it is also the case that:
$\overrightarrow{\text{man}} - \overrightarrow{\text{woman}} \approx \overrightarrow{\text{computer programmer}} - \overrightarrow{\text{homemaker}}$.
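To illustrate the analogy arithmetic quoted above, here is a toy sketch. The 2-dimensional vectors are hand-made for this example and are not real word embeddings (real ones have hundreds of dimensions and are learned from text); the function names are hypothetical, and excluding the three query words from the candidates is a common convention assumed here.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def solve_analogy(emb, a, b, c):
    """Solve a : b :: c : x by finding the word closest to b - a + c."""
    target = [vb - va + vc for va, vb, vc in zip(emb[a], emb[b], emb[c])]
    candidates = [w for w in emb if w not in (a, b, c)]   # skip the query words
    return max(candidates, key=lambda w: cosine(emb[w], target))

# Toy, hand-made vectors: first coordinate ~ "royalty", second ~ "gender".
toy = {
    "man":   [0.0, 1.0],
    "woman": [0.0, -1.0],
    "king":  [1.0, 1.0],
    "queen": [1.0, -1.0],
    "apple": [-1.0, 0.2],
}

answer = solve_analogy(toy, "man", "king", "woman")
print(answer)  # queen
```

The same arithmetic applied to embeddings learned from biased text is what surfaces pairs like computer programmer and homemaker.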
Dataset: Google News
Bolukbasi et al., Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings. NIPS 2016
Should our unbiased data collection reflect society’s systemic bias?
How can we explain decisions?
If your task is image classification, reasoning about high-level features is relatively easy.
Everything can be visualized.
What if you are trying to classify social outcomes?
• Criminal recidivism
• Job performance
• Policing
• Terrorist risk
• At-risk kids
Ethics in Machine Learning is a whole new field.
CS109 Wrap-Up
LIVE
What have we learned in CS109?
A wild journey
Computer science
Probability
From combinatorics to probability…
$P(E) + P(E^C) = 1$
…to random variables and the Central Limit Theorem…
Bernoulli
Poisson
Gaussian
…to statistics, parameter estimation, and machine learning
[Figure labels: A happy Bhutanese person; Flu; Undergrad; Tired; Fever]
Lots and lots of analysis
Climate sensitivity
Bloom filters
Peer grading
Biometric keystroke recognition
WebMD inference
p-hacking?
Coursera A/B testing
Heart
Ancestry
Netflix
After CS109
Statistics
Stats 200 – Statistical Inference
Stats 208 – Intro to the Bootstrap
Stats 209 – Group Methods/Causal Inference
Theory
CS 161 – Algorithmic analysis
Stats 217 – Stochastic Processes
CS 238 – Decision Making Under Uncertainty
CS 228 – Probabilistic Graphical Models
AI
CS 221 – Intro to AI
CS 229 – Machine Learning
CS 230 – Deep Learning
CS 224N – Natural Language Processing
CS 231N – Conv Neural Nets for Visual Recognition
CS 234 – Reinforcement Learning
Applications
CS 279 – Bio Computation
Literally any class with numbers in it
What have we done together this quarter?
The CS109 teaching team
Section
Python session
Overleaf tutorial
Discussion, synchronous and asynchronous
Tea hours
Turns out jigsaw puzzles aren’t probabilistic!
What have you learned from CS109?
What do you want to remember in 5 years?
What do you want to remember in the next 5 years?
Why study probability + CS?
Source: US Bureau of Labor Statistics
Interdisciplinary
Closest thing to magic
Everyone is welcome!
Technology magnifies. What do we want magnified?
You are all one step closer to improving the world.
(all of you!)
The end
See you soon… :)