Bayesian and Connectionist Approaches to Learning
Tom Griffiths, Jay McClelland, Alison Gopnik, Mark Seidenberg




Page 1: Bayesian and Connectionist Approaches to Learning Tom Griffiths, Jay McClelland Alison Gopnik, Mark Seidenberg


Page 2

Who Are We and What Do We Study?

We are cognitive and developmental psychologists who use mathematical and computational models together with experimental studies of children and adults.

We study human cognitive processes ranging from object recognition, language processing, and reading to semantic cognition, naïve physics, and causal reasoning.

Page 3

Our Question

How do probabilistic/Bayesian and connectionist/neural network models relate?

Page 4

Brains all round…

Page 5

Schedule

Tom Griffiths: Probabilistic/Bayesian Approaches
Jay McClelland: Connectionist/Neural Network Approaches
Alison Gopnik: Causal Reasoning
Mark Seidenberg: Language Acquisition
Open Discussion: Robotics, Machine Learning, Other Applications…

Page 6

Emergent Functions of Simple Systems

J. L. McClelland
Stanford University

Page 7

Topics

Emergent probabilistic optimization in neural networks

Relationship between competence/rational approaches and mechanistic (including connectionist) approaches

Some models that bring connectionist and probabilistic approaches into proximal contact

Page 8

Connectionist Units Calculate Posteriors Based on Priors and Evidence

Given:

A unit representing hypothesis h_i, with binary inputs j representing the state of various elements of evidence e, where for all j, p(e_j) is assumed conditionally independent given h_i

A bias on the unit equal to log[prior_i / (1 - prior_i)]

Weights to the unit from each input equal to log[p(e_j|h_i) / p(e_j|not h_i)]

If the output of the unit is computed by taking the logistic function of the net input,

net_i = bias_i + Σ_j a_j w_ij

a_i = 1 / [1 + exp(-net_i)]

then a_i = p(h_i|e).

A set of units for mutually exclusive alternatives can assign the posterior probability to each in a similar way, using the softmax activation function:

a_i = exp(g net_i) / Σ_i' exp(g net_i')

If the gain g = 1, this constitutes probability matching. As g increases, more and more of the activation goes to the most likely alternative(s).

[Diagram: input from unit j feeding unit i via weight w_ij]
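This claim is easy to check numerically. The sketch below (with made-up probabilities, not values from the slides) builds a unit with the stated bias and weights and compares its logistic output to a direct Bayesian calculation:

```python
import math

# Numeric check of the slide's claim: a logistic unit whose bias is the log
# prior odds and whose weights are log likelihood ratios outputs the exact
# Bayesian posterior p(h_i | e). All probabilities below are made up for
# illustration.

prior = 0.2                      # p(h_i)
p_e_h = [0.9, 0.7]               # p(e_j = 1 | h_i) for two evidence elements
p_e_not_h = [0.3, 0.4]           # p(e_j = 1 | not h_i)
a = [1, 1]                       # observed binary inputs (both elements present)

# Unit parameters as defined on the slide
bias = math.log(prior / (1 - prior))
w = [math.log(p / q) for p, q in zip(p_e_h, p_e_not_h)]

# Unit output: logistic of the net input
net = bias + sum(aj * wj for aj, wj in zip(a, w))
unit_output = 1 / (1 + math.exp(-net))

# Direct Bayesian computation over h_i vs. not-h_i for comparison
joint_h = prior * math.prod(p_e_h)
joint_not_h = (1 - prior) * math.prod(p_e_not_h)
posterior = joint_h / (joint_h + joint_not_h)

print(unit_output, posterior)    # the two values agree
```

The agreement is exact because the net input equals the log posterior odds, which the logistic function converts back into a probability.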

Page 9

Emergent Outcomes from Local Computations

(Hopfield, ’82; Hinton & Sejnowski, ’83)

If w_ij = w_ji and units are updated asynchronously, setting a_i = 1 if net_i > 0 and a_i = 0 otherwise, a network will settle to a state s which is a local maximum in a measure Rumelhart et al. (1986) called G:

G(s) = Σ_i<j w_ij a_i a_j + Σ_i a_i (bias_i + ext_i)

If each unit instead sets its activation to 1 with probability logistic(net_i), then

p(s) = exp(G(s)) / Σ_s' exp(G(s'))
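The settling result can be illustrated in a few lines: with symmetric weights, each asynchronous update changes G by (a_i_new - a_i_old) * net_i, which is never negative. The sketch below uses arbitrary random weights and biases, with external inputs folded into the biases:

```python
import random

# Sketch of the Hopfield ('82) result: with symmetric weights and
# asynchronous binary updates (a_i = 1 iff net_i > 0), the goodness
# G(s) = sum_{i<j} w_ij a_i a_j + sum_i a_i bias_i never decreases,
# so the network settles into a local maximum of G. Network size,
# weights, and biases are arbitrary illustrative values.

random.seed(0)
n = 5
w = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        w[i][j] = w[j][i] = random.uniform(-1, 1)   # symmetric: w_ij = w_ji
bias = [random.uniform(-0.5, 0.5) for _ in range(n)]

def goodness(a):
    pairs = sum(w[i][j] * a[i] * a[j] for i in range(n) for j in range(i + 1, n))
    return pairs + sum(a[i] * bias[i] for i in range(n))

a = [random.choice([0, 1]) for _ in range(n)]
history = [goodness(a)]
for _ in range(50):                                  # asynchronous updates
    i = random.randrange(n)
    net = bias[i] + sum(w[i][j] * a[j] for j in range(n))
    a[i] = 1 if net > 0 else 0
    history.append(goodness(a))

print(history[0], history[-1])   # goodness only ever climbs along the way
```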

Page 10

A Tweaked Connectionist Model (McClelland & Rumelhart, 1981) that is Also a Graphical Model

Each pool of units in the IA model is equivalent to a Dirichlet variable (c.f. Dean, 2005).

This is enforced by using softmax to set one of the a_i in each pool to 1 with probability:

p_j = exp(net_j) / Σ_j' exp(net_j')

Weight arrays linking the variables are the equivalent of the ‘edges’ encoding conditional relationships between states of these different variables.

Biases at the word level encode the prior p(w).

Weights are bi-directional, but encode generative constraints (p(l|w), p(f|l)).

At equilibrium with gain g = 1, the network’s probability of being in state s equals p(s|I).
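The softmax on these slides involves a gain parameter whose symbol did not survive transcription; it is written as g below. A quick sketch of its effect, using made-up net inputs: at g = 1 the pool probability-matches, and as g grows the activation concentrates on the most likely alternative.

```python
import math

# Effect of the gain g in a_i = exp(g * net_i) / sum_i' exp(g * net_i').
# The net inputs are arbitrary example values for one pool of units.

def softmax(nets, g):
    exps = [math.exp(g * x) for x in nets]
    total = sum(exps)
    return [e / total for e in exps]

nets = [1.0, 0.5, 0.0]           # hypothetical net inputs within one pool

for g in (1, 4, 16):
    # higher g -> more activation on the largest net input
    print(g, [round(p, 3) for p in softmax(nets, g)])
```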

Page 11

But that’s not the true PDP approach to Perception/Cognition/etc.…

We want to learn how to represent the world and constraints among its constituents from experience, using (to the fullest extent possible) a domain-general approach.

In this context, the prototypical connectionist learning rules correspond to probability maximization or matching.

Back Propagation Algorithm:

Δw_ij = ε δ_i a_j

Maximizes p(o_i|I) for each output unit.

Boltzmann Machine Learning Algorithm:

Δw_ij = ε (a_i^+ a_j^+ - a_i^- a_j^-)

Learns to match probabilities of entire output states o given the current input. That is, it minimizes

Σ_I ∫ p(o|I) log[p(o|I) / q(o|I)] do
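The Boltzmann rule can be sketched with a toy, fully visible network, where the free-running ("minus" phase) expectations can be computed exactly by enumerating all states. The target distribution, learning rate, and network size below are illustrative choices, not anything from the slides:

```python
import itertools
import math

# A toy, fully visible Boltzmann machine trained with the slide's rule,
# dw_ij = eps * (<a_i a_j>+ - <a_i a_j>-), where "+" averages over the
# clamped (data) distribution and "-" over the free-running model. With
# three units the model expectations are exact sums over all 8 states.

n = 3
states = list(itertools.product([0, 1], repeat=n))

# Illustrative target distribution: favor states where units 0 and 1 agree
raw = {s: (3.0 if s[0] == s[1] else 1.0) for s in states}
z = sum(raw.values())
target = {s: v / z for s, v in raw.items()}

w = [[0.0] * n for _ in range(n)]
bias = [0.0] * n

def goodness(s):
    pairs = sum(w[i][j] * s[i] * s[j] for i in range(n) for j in range(i + 1, n))
    return pairs + sum(bias[i] * s[i] for i in range(n))

def model_dist():
    g = {s: math.exp(goodness(s)) for s in states}
    tot = sum(g.values())
    return {s: v / tot for s, v in g.items()}

def kl(p, q):
    return sum(p[s] * math.log(p[s] / q[s]) for s in states)

eps = 0.2
kl_before = kl(target, model_dist())
for _ in range(500):
    q = model_dist()                     # free-running ("minus") phase
    for i in range(n):
        for j in range(i + 1, n):
            plus = sum(target[s] * s[i] * s[j] for s in states)
            minus = sum(q[s] * s[i] * s[j] for s in states)
            w[i][j] = w[j][i] = w[i][j] + eps * (plus - minus)
        bias[i] += eps * (sum(target[s] * s[i] for s in states)
                          - sum(q[s] * s[i] for s in states))
kl_after = kl(target, model_dist())

print(kl_before, kl_after)   # the KL divergence shrinks as training proceeds
```

The shrinking KL divergence between the target and the model's equilibrium distribution is the probability-matching behavior the slide describes.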

Page 12

Recent Developments

Hinton’s deep belief networks are fully distributed, learned connectionist models that use a restricted form of the Boltzmann machine (no intra-layer connections). They are fast and beat other machine learning methods.

Adding generic constraints (sparsity, locality) allows such networks to learn efficiently and generalize very well in demanding task contexts.

Hinton, Osindero, and Teh (2006). A fast learning algorithm for deep belief networks. Neural Computation, 18, 1527-54.

Page 13

Topics

Emergent probabilistic optimization in neural networks

Relationship between competence/rational approaches and mechanistic (including connectionist) approaches

Some models that bring connectionist and probabilistic approaches into proximal contact

Page 14

Two Perspectives

People are rational; their behavior is optimal.

They seek explicit internal models of the structure of the world, within which to reason:
Optimal structure type for each domain
Optimal structure instance within type

People evolved through an optimization process, and are likely to approximate optimality/rationality within limits.

Fundamental aspects of natural/intuitive cognition may depend largely on implicit knowledge.

Natural structure (e.g. language) does not exactly correspond to any specific structure type.

Culture/school encourages us to think and reason explicitly, and gives us tools for this; we do so under some circumstances.

Many connectionist models do not directly address this kind of thinking; eventually they should be elaborated to do so.

Page 15

Two Perspectives, Cont’d

Resource limits and implementation constraints are unknown, and should be ignored in determining what is rational/optimal.

Inference is still hard, and prior domain-specific constraints are therefore essential.

Human behavior won’t be understood without considering the constraints it operates under.

Determining what is optimal sans constraints is always useful, even so.

Such an effort should not presuppose individual humans intend to derive an explicit model.

Inference is hard, and domain-specific priors can help, but domain-general mechanisms subject to generic constraints deserve full exploration.

In some cases such models may closely approximate what might be the optimal explicit model.

But that model might only be an approximation, and the domain-specific constraints might not be necessary.

Page 16

Perspectives on Development

A competence-level approach can ask: what is the best representation a child could have, given the data gathered to date? The entire data sample is retained, and the optimal model is re-estimated.

The developing child is an on-line learning system; the parameters of the mind are adjusted as each new experience comes in, and the experiences themselves are rapidly lost.
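The batch/on-line contrast can be made concrete with the simplest possible learner, a running mean (the data stream below is made up):

```python
# The contrast on this slide in miniature: a "batch" learner that keeps the
# whole sample and re-estimates from scratch, versus an "on-line" learner
# that nudges a parameter after each experience and then discards it. For a
# simple mean estimate, both arrive at the same answer.

data = [2.0, 4.0, 3.0, 5.0, 1.0]

# Batch: the entire data sample is retained, and the optimum re-estimated
batch_estimate = sum(data) / len(data)

# On-line: only a running parameter and a count survive each experience
estimate, count = 0.0, 0
for x in data:
    count += 1
    estimate += (x - estimate) / count   # incremental mean update

print(batch_estimate, estimate)          # both equal 3.0
```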

Page 17

Is a Convergence Possible?

Yes! It is possible to ask what is optimal/rational within any set of constraints:

Time
Architecture
Algorithm
Reliability and dynamics of the hardware

It is then possible to ask how close some mechanism actually comes to achieving optimality, within the specified constraints.

It is also possible to ask how close it comes to explaining actual human performance, including performance in learning and response to experience during development.

Page 18

Topics

Emergent probabilistic optimization in neural networks

Relationship between competence/rational approaches and mechanistic (including connectionist) approaches

Some models that bring connectionist and probabilistic approaches into proximal contact

Page 19

Models that Bring Connectionist and Probabilistic Approaches into Proximal Contact

Graphical IA model of Context Effects in Perception
In progress; see Movellan & McClelland, 2001.

Leaky Competing Accumulator Model of Decision Dynamics
Usher and McClelland, 2001, and the large family of related decision-making models.

Models of Unsupervised Category Learning
Competitive Learning, OME, TOME (Lake et al., ICDL08).

Subjective Likelihood Model of Recognition Memory
McClelland and Chappell, 1998 (c.f. REM, Steyvers and Shiffrin, 1997), and a forthcoming variant using distributed item representations.