
Learning with Intractable Inference and Partial Supervision

Jacob Steinhardt

Stanford University

[email protected]

September 8, 2015


Page 2: Learning with Intractable Inference and Partial Supervision - Stanford … · 2015. 12. 3. · Learning with Intractable Inference and Partial Supervision Jacob Steinhardt Stanford

Motivation

An Example

Company officials refused to comment.公司官员拒绝对此发表评论。

He said the company would appeal.他表示该公司将提出上诉。

Statistical reasoning: aggregate data across sentences to reach conclusions.Computational reasoning: focus on easily disambiguated words first.

Tension: statistics wants to expose information (aggregation), while computerscience wants to hide it (abstraction, adaptivity).

Statistical inference is computationally intractable.

How can we bring these two paradigms together?

J. Steinhardt (Stanford) Learning and Inference September 8, 2015 2 / 31

Page 3: Learning with Intractable Inference and Partial Supervision - Stanford … · 2015. 12. 3. · Learning with Intractable Inference and Partial Supervision Jacob Steinhardt Stanford

Motivation

An Example

Company officials refused to comment.公司官员拒绝对此发表评论。

He said the company would appeal.他表示该公司将提出上诉。

Statistical reasoning: aggregate data across sentences to reach conclusions.Computational reasoning: focus on easily disambiguated words first.

Tension: statistics wants to expose information (aggregation), while computerscience wants to hide it (abstraction, adaptivity).

Statistical inference is computationally intractable.

How can we bring these two paradigms together?

J. Steinhardt (Stanford) Learning and Inference September 8, 2015 2 / 31

Page 4: Learning with Intractable Inference and Partial Supervision - Stanford … · 2015. 12. 3. · Learning with Intractable Inference and Partial Supervision Jacob Steinhardt Stanford

Motivation

An Example

Company officials refused to comment.公司官员拒绝对此发表评论。

He said the company would appeal.他表示该公司将提出上诉。

Statistical reasoning: aggregate data across sentences to reach conclusions.

Computational reasoning: focus on easily disambiguated words first.

Tension: statistics wants to expose information (aggregation), while computerscience wants to hide it (abstraction, adaptivity).

Statistical inference is computationally intractable.

How can we bring these two paradigms together?

J. Steinhardt (Stanford) Learning and Inference September 8, 2015 2 / 31

Page 5: Learning with Intractable Inference and Partial Supervision - Stanford … · 2015. 12. 3. · Learning with Intractable Inference and Partial Supervision Jacob Steinhardt Stanford

Motivation

An Example

Company officials refused to comment.公司官员拒绝对此发表评论。

He said the company would appeal.他表示该公司将提出上诉。

Statistical reasoning: aggregate data across sentences to reach conclusions.Computational reasoning: focus on easily disambiguated words first.

Tension: statistics wants to expose information (aggregation), while computerscience wants to hide it (abstraction, adaptivity).

Statistical inference is computationally intractable.

How can we bring these two paradigms together?

J. Steinhardt (Stanford) Learning and Inference September 8, 2015 2 / 31

Page 6: Learning with Intractable Inference and Partial Supervision - Stanford … · 2015. 12. 3. · Learning with Intractable Inference and Partial Supervision Jacob Steinhardt Stanford

Motivation

An Example

Company officials refused to comment.公司官员拒绝对此发表评论。

He said the company would appeal.他表示该公司将提出上诉。

Statistical reasoning: aggregate data across sentences to reach conclusions.Computational reasoning: focus on easily disambiguated words first.

Tension: statistics wants to expose information (aggregation), while computerscience wants to hide it (abstraction, adaptivity).

Statistical inference is computationally intractable.

How can we bring these two paradigms together?

J. Steinhardt (Stanford) Learning and Inference September 8, 2015 2 / 31

Page 7: Learning with Intractable Inference and Partial Supervision - Stanford … · 2015. 12. 3. · Learning with Intractable Inference and Partial Supervision Jacob Steinhardt Stanford

Motivation

An Example

Company officials refused to comment.公司官员拒绝对此发表评论。

He said the company would appeal.他表示该公司将提出上诉。

Statistical reasoning: aggregate data across sentences to reach conclusions.Computational reasoning: focus on easily disambiguated words first.

Tension: statistics wants to expose information (aggregation), while computerscience wants to hide it (abstraction, adaptivity).

Statistical inference is computationally intractable.

How can we bring these two paradigms together?

J. Steinhardt (Stanford) Learning and Inference September 8, 2015 2 / 31

Page 8: Learning with Intractable Inference and Partial Supervision - Stanford … · 2015. 12. 3. · Learning with Intractable Inference and Partial Supervision Jacob Steinhardt Stanford

Motivation

An Example

Company officials refused to comment.公司官员拒绝对此发表评论。

He said the company would appeal.他表示该公司将提出上诉。

Statistical reasoning: aggregate data across sentences to reach conclusions.Computational reasoning: focus on easily disambiguated words first.

Tension: statistics wants to expose information (aggregation), while computerscience wants to hide it (abstraction, adaptivity).

Statistical inference is computationally intractable.

How can we bring these two paradigms together?

J. Steinhardt (Stanford) Learning and Inference September 8, 2015 2 / 31


Formal Setting

1 Motivation

2 Formal Setting

3 Reified Context Models

4 Relaxed Supervision

5 Open Questions


Formal Setting

Setting: Structured Prediction

input x : [image of a handwritten word]

output y : v o l c a n i c

Goal: learn θ to maximize $\mathbb{E}_{x,y \sim \mathcal{D}}[\log p_\theta(y \mid x)]$

Structured output space $\mathcal{Y}$ — requires inference

Formal Setting

Supervised Learning is Easy

Recall: want to maximize $\mathbb{E}[\log p_\theta(y \mid x)]$.

Suppose $p_\theta(y \mid x) \propto \exp(\theta^\top \phi(x,y))$. Then:

$$\nabla_\theta \log p_\theta(y \mid x) = \underbrace{\phi(x,y)}_{\text{given}} - \underbrace{\mathbb{E}_{\hat{y} \sim p_\theta(\cdot \mid x)}[\phi(x,\hat{y})]}_{\text{inference}}.$$

Inference errors will be corrected by the supervision signal $\phi(x,y)$ over the course of learning.

In practice, anything reasonable (MCMC, beam search) works.

Conceptually, one can use Searn (Daume III et al., 2009) or pseudolikelihood (Besag, 1975) to obviate the need for inference.

Approximate inference is easy in supervised settings, unless we care about estimating uncertainty (calibration, precision/recall).
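To make the two gradient terms concrete, here is a minimal Python sketch (not from the talk): it computes the gradient above for a log-linear model, with the inference term done by brute-force enumeration over a small candidate set Y. The feature function phi is an assumption; it should return a NumPy vector.

```python
import numpy as np

def loglinear_grad(theta, phi, x, y, Y):
    """Gradient of log p_theta(y | x) for p_theta(y | x) ∝ exp(theta·phi(x, y)).

    Enumeration over the candidate outputs Y stands in for inference; in a
    real structured model, computing this expectation is the hard part.
    """
    scores = np.array([theta @ phi(x, y_hat) for y_hat in Y])
    probs = np.exp(scores - scores.max())  # stabilized softmax
    probs /= probs.sum()
    # "inference" term: E_{y_hat ~ p_theta(.|x)}[phi(x, y_hat)]
    model_expectation = sum(p * phi(x, y_hat) for p, y_hat in zip(probs, Y))
    # "given" term minus "inference" term
    return phi(x, y) - model_expectation
```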

Formal Setting

Partially Supervised Structured Prediction

input x : Company officials refused to comment.

latent z : (unobserved)

output y : 公司官员拒绝对此发表评论。

Goal: learn θ to maximize $\mathbb{E}_{x,y \sim \mathcal{D}}[\log p_\theta(y \mid x)]$, where $p_\theta(y \mid x) = \sum_z p_\theta(y, z \mid x)$.

Again assume $p_\theta(y, z \mid x) \propto \exp(\theta^\top \phi(x,z,y))$. Then

$$\nabla_\theta \log p_\theta(y \mid x) = \underbrace{\mathbb{E}_{z \sim p_\theta(\cdot \mid x, y)}[\phi(x,z,y)]}_{\text{inference on } z} - \underbrace{\mathbb{E}_{\hat{z},\hat{y} \sim p_\theta(\cdot \mid x)}[\phi(x,\hat{z},\hat{y})]}_{\text{inference on } z,\, y}.$$

Inference errors on z get reinforced during learning.

Inference is often hardest (and most consequential) at the beginning of learning!
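The same enumeration-based sketch extends to the latent-variable gradient (again an illustration, not the talk's implementation): the first term clamps the observed y and infers z; the second infers z and y jointly. Z and Y are assumed small enough to enumerate.

```python
import numpy as np

def latent_loglinear_grad(theta, phi, x, y, Z, Y):
    """Gradient of log p_theta(y | x) = log sum_z p_theta(y, z | x),
    where p_theta(y, z | x) ∝ exp(theta·phi(x, z, y))."""
    def expected_features(pairs):
        feats = [phi(x, z, y_hat) for z, y_hat in pairs]
        scores = np.array([theta @ f for f in feats])
        probs = np.exp(scores - scores.max())
        probs /= probs.sum()
        return sum(p * f for p, f in zip(probs, feats))

    # "inference on z": clamp the observed y, take the expectation over z
    clamped = expected_features([(z, y) for z in Z])
    # "inference on z, y": expectation over latent and output jointly
    free = expected_features([(z, y_hat) for z in Z for y_hat in Y])
    return clamped - free
```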

Formal Setting

This Work

Two thrusts:

1 How can we reify computation as part of a statistical model?

2 How can we relax the supervision signal to aid computation while still maintaining consistent parameter estimates?

Formal Setting

Related Work

Learning tractable models / accounting for approximations

sum-product networks (Poon & Domingos, 2011)

max-violation perceptron (Huang, Fayong, & Guo, 2012; Zhang et al., 2013; Yu et al., 2013)

fast-mixing Markov chains (S. & Liang, 2015)

many others (Barbu, 2009; Daume III, Langford, & Marcu, 2009; Domke, 2011; Stoyanov, Ropson, & Eisner, 2011; Niepert & Domingos, 2014; Li & Zemel, 2014; Shi, S., & Liang, 2015)

Improving expressivity of variational inference

combining with MCMC (Salimans, Kingma, & Welling, 2015)

using neural networks (Kingma & Welling, 2013; Mnih & Gregor, 2014)

Computational-statistical tradeoffs

huge body of recent work (Berthet & Rigollet, 2013; Chandrasekaran & Jordan, 2013; Zhang et al., 2013; Zhang, Wainwright, & Jordan, 2014; Christiano, 2014; Daniely, Linial, & Shalev-Shwartz, 2014; Garg, Ma, & Nguyen, 2014; Shamir, 2014; Braverman et al., 2015; S. & Duchi, 2015; S., Valiant, & Wager, 2015)


Reified Context Models

1 Motivation

2 Formal Setting

3 Reified Context Models

4 Relaxed Supervision

5 Open Questions


Reified Context Models

Structured Prediction Task

input x : [image of a handwritten word]

output y : v o l c a n i c

Reified Context Models

Contexts Are Key

For the output v o l c a . . . :

DP: states v, *o, **l, ***c (each state remembers only the most recent character)

beam search: states v, vo, vol, volc (each state is a full prefix)

Key idea: contexts!

*o def= {ao, bo, co, . . .} (the set of all two-character strings ending in o)

Reified Context Models

Desiderata

Short contexts (e.g. r, *o, **l, ***c and v, *a, **i, ***r):

coverage

better uncertainty estimates (precision)

stabler partially supervised learning updates

Long contexts (e.g. r, ro, rol, rolc and v, ra, ral, ralc):

expressivity (capture complex dependencies)

Mixed-length contexts (e.g. r, ro, rol, *olc; v, ra, ral, ***c; y, *o, *ol, ***r; *, **, ***, ****) ← best of both worlds

Reified Context Models

Reifying Contexts

input x : [image of a handwritten word]

output y : v o l c a n i c

context c : v, *o, *ol, *olc, · · ·

"Context sets", one per position:

C1 = {r, v, y, *}
C2 = {ro, ra, *o, **}
C3 = {rol, ral, *ol, ***}
C4 = {*olc, ***c, ***r, ****}

Challenge: how to trade off contexts of different lengths?

⟹ Reify contexts as part of the model!

Reified Context Models

Reified Context Models

Given:

context sets $C_1, \ldots, C_L$

features $\phi_i(c_{i-1}, y_i)$

Define the model

$$p_\theta(y_{1:L}, c_{1:L-1}) \propto \exp\left(\sum_{i=1}^{L} \theta^\top \phi_i(c_{i-1}, y_i)\right) \cdot \underbrace{\kappa(y, c)}_{\text{consistency}}$$

Graphical model structure: a chain alternating Y1, C1, Y2, C2, . . . , with a consistency factor κ on each (Ci−1, Yi, Ci) triple and feature factors φ2, . . . , φ5 on each (Ci−1, Yi) pair ⟹ inference via forward-backward!
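Since each κ factor couples only $(c_{i-1}, y_i, c_i)$, the partition function is computable by a forward pass over contexts. The sketch below assumes interfaces not given in the talk: score(i, c_prev, y) returns $\theta^\top \phi_i(c_{\mathrm{prev}}, y)$, and kappa(c_prev, y, c) is a 0/1 consistency check.

```python
import numpy as np
from collections import defaultdict

def rcm_log_partition(context_sets, alphabet, score, kappa):
    """Forward algorithm for a reified context model (a sketch).

    context_sets[i] is the context set after position i + 1 (pass a trivial
    final set, e.g. {""}, for the last position). Runtime is
    O(sum_i |C_i| * |alphabet| * |C_{i+1}|), so small context sets keep
    inference cheap no matter how large the true state space is.
    """
    log_alpha = {"": 0.0}  # c_0 is the empty context
    for i, C in enumerate(context_sets):
        nxt = defaultdict(lambda: -np.inf)
        for c_prev, a in log_alpha.items():
            for y in alphabet:
                s = a + score(i, c_prev, y)
                for c in C:
                    if kappa(c_prev, y, c):  # consistency factor kappa
                        nxt[c] = np.logaddexp(nxt[c], s)
        log_alpha = dict(nxt)
    return np.logaddexp.reduce(np.array(list(log_alpha.values())))
```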

Page 56: Learning with Intractable Inference and Partial Supervision - Stanford … · 2015. 12. 3. · Learning with Intractable Inference and Partial Supervision Jacob Steinhardt Stanford

Reified Context Models

Reified Context Models

Given:

context sets C1, . . . ,CL

features φi(ci−1,yi)

Define the model

pθ (y1:L,c1:L−1) ∝ exp

(L

∑i=1

θ>

φi(ci−1,yi)

)· κ(y ,c)︸ ︷︷ ︸

consistency

Graphical model structure:

Y1 Y2 Y3 Y4 Y5

C1 C2 C3 C4

κ κ κ κ κφ2 φ3 φ4 φ5

inference viaforward-backward!

J. Steinhardt (Stanford) Learning and Inference September 8, 2015 14 / 31

Page 57: Learning with Intractable Inference and Partial Supervision - Stanford … · 2015. 12. 3. · Learning with Intractable Inference and Partial Supervision Jacob Steinhardt Stanford

Reified Context Models

Adaptive Context Selection

Select context sets Ci during forward pass of inference

Greedily select contexts with largest mass

abcde...

c

e

ce?

C1

cacb...eaeb...?a...

ca

?a

ca?a??

C2

etc.

Biases towards short contexts unless there is high confidence.

J. Steinhardt (Stanford) Learning and Inference September 8, 2015 15 / 31

Page 58: Learning with Intractable Inference and Partial Supervision - Stanford … · 2015. 12. 3. · Learning with Intractable Inference and Partial Supervision Jacob Steinhardt Stanford

Reified Context Models

Adaptive Context Selection

Select context sets Ci during forward pass of inference

Greedily select contexts with largest mass

abcde...

c

e

ce?

C1

cacb...eaeb...?a...

ca

?a

ca?a??

C2

etc.

Biases towards short contexts unless there is high confidence.

J. Steinhardt (Stanford) Learning and Inference September 8, 2015 15 / 31

Page 59: Learning with Intractable Inference and Partial Supervision - Stanford … · 2015. 12. 3. · Learning with Intractable Inference and Partial Supervision Jacob Steinhardt Stanford

Reified Context Models

Adaptive Context Selection

Select context sets Ci during forward pass of inference

Greedily select contexts with largest mass

abcde...

c

e

ce?

C1

cacb...eaeb...?a...

ca

?a

ca?a??

C2

etc.

Biases towards short contexts unless there is high confidence.

J. Steinhardt (Stanford) Learning and Inference September 8, 2015 15 / 31

Page 60: Learning with Intractable Inference and Partial Supervision - Stanford … · 2015. 12. 3. · Learning with Intractable Inference and Partial Supervision Jacob Steinhardt Stanford

Reified Context Models

Adaptive Context Selection

Select context sets Ci during forward pass of inference

Greedily select contexts with largest mass

abcde...

c

e

ce?

C1

cacb...eaeb...?a...

ca

?a

ca?a??

C2

etc.

Biases towards short contexts unless there is high confidence.

J. Steinhardt (Stanford) Learning and Inference September 8, 2015 15 / 31

Page 61: Learning with Intractable Inference and Partial Supervision - Stanford … · 2015. 12. 3. · Learning with Intractable Inference and Partial Supervision Jacob Steinhardt Stanford

Reified Context Models

Adaptive Context Selection

Select context sets Ci during forward pass of inference

Greedily select contexts with largest mass

abcde...

c

e

ce?

C1

cacb...eaeb...?a...

ca

?a

ca?a??

C2

etc.

Biases towards short contexts unless there is high confidence.

J. Steinhardt (Stanford) Learning and Inference September 8, 2015 15 / 31

Page 62: Learning with Intractable Inference and Partial Supervision - Stanford … · 2015. 12. 3. · Learning with Intractable Inference and Partial Supervision Jacob Steinhardt Stanford

Reified Context Models

Adaptive Context Selection

Select context sets Ci during forward pass of inference

Greedily select contexts with largest mass

abcde...

c

e

ce?

C1

cacb...eaeb...?a...

ca

?a

ca?a??

C2

etc.

Biases towards short contexts unless there is high confidence.

J. Steinhardt (Stanford) Learning and Inference September 8, 2015 15 / 31

Page 63: Learning with Intractable Inference and Partial Supervision - Stanford … · 2015. 12. 3. · Learning with Intractable Inference and Partial Supervision Jacob Steinhardt Stanford

Reified Context Models

Adaptive Context Selection

Select context sets Ci during forward pass of inference

Greedily select contexts with largest mass

abcde...

c

e

ce?

C1

cacb...eaeb...?a...

ca

?a

ca?a??

C2

etc.

Biases towards short contexts unless there is high confidence.

J. Steinhardt (Stanford) Learning and Inference September 8, 2015 15 / 31

Page 64: Learning with Intractable Inference and Partial Supervision - Stanford … · 2015. 12. 3. · Learning with Intractable Inference and Partial Supervision Jacob Steinhardt Stanford

Reified Context Models

Adaptive Context Selection

Select context sets Ci during forward pass of inference

Greedily select contexts with largest mass

abcde...

c

e

ce?

C1

cacb...eaeb...?a...

ca

?a

ca?a??

C2

etc.

Biases towards short contexts unless there is high confidence.

J. Steinhardt (Stanford) Learning and Inference September 8, 2015 15 / 31

Page 65: Learning with Intractable Inference and Partial Supervision - Stanford … · 2015. 12. 3. · Learning with Intractable Inference and Partial Supervision Jacob Steinhardt Stanford

Reified Context Models

Adaptive Context Selection

Select context sets Ci during forward pass of inference

Greedily select contexts with largest mass

abcde...

c

e

ce?

C1

cacb...eaeb...?a...

ca

?a

ca?a??

C2

etc.

Biases towards short contexts unless there is high confidence.

J. Steinhardt (Stanford) Learning and Inference September 8, 2015 15 / 31

Page 66: Learning with Intractable Inference and Partial Supervision - Stanford … · 2015. 12. 3. · Learning with Intractable Inference and Partial Supervision Jacob Steinhardt Stanford

Reified Context Models

Adaptive Context Selection

Select context sets Ci during forward pass of inference

Greedily select contexts with largest mass

abcde...

c

e

ce?

C1

cacb...eaeb...?a...

ca

?a

ca?a??

C2

etc.

Biases towards short contexts unless there is high confidence.

J. Steinhardt (Stanford) Learning and Inference September 8, 2015 15 / 31
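The talk specifies only "largest mass"; the budget and wildcard-merging details below are assumptions. This sketch keeps the top-k candidate contexts by forward mass and collapses the remainder into a single wildcard context.

```python
def select_contexts(candidate_mass, budget, wildcard):
    """Greedy context selection for one position (a sketch).

    candidate_mass maps candidate context -> forward probability mass.
    The `budget` highest-mass contexts are kept; all others are merged
    into the single `wildcard` context, which absorbs their mass.
    """
    ranked = sorted(candidate_mass, key=candidate_mass.get, reverse=True)
    kept = {c: candidate_mass[c] for c in ranked[:budget]}
    leftover = sum(candidate_mass[c] for c in ranked[budget:])
    if leftover > 0.0:
        kept[wildcard] = kept.get(wildcard, 0.0) + leftover
    return kept
```

For example, select_contexts({"c": 0.4, "e": 0.3, "a": 0.2, "b": 0.1}, 2, "?") returns {"c": 0.4, "e": 0.3, "?": 0.3}, matching the C1 = {c, e, ?} step above.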

Reified Context Models

Precision

input x : [image of a handwritten word]

output y : v o l c a n i c

The model assigns a probability to each prediction, so it can restrict predictions to its most confident subset.

Measure precision (# of correct words) vs. recall (# of words predicted); comparison: beam search.

[Figure: precision-recall curves on word recognition (precision axis from 0.86 to 1.00), comparing beam search and RCM.]
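The curve in the figure comes from thresholding per-word confidence. For reference, a standard way to compute such a precision-recall curve (not specific to the talk):

```python
import numpy as np

def precision_recall_curve(confidence, correct):
    """Precision vs. recall when predicting only on the most confident words.

    confidence: model probability for each predicted word.
    correct: 1 if the corresponding prediction was right, else 0.
    Point k on the curve uses only the k most confident predictions.
    """
    order = np.argsort(-np.asarray(confidence, dtype=float))
    hits = np.asarray(correct, dtype=float)[order]
    k = np.arange(1, len(hits) + 1)
    precision = np.cumsum(hits) / k  # fraction correct among the top k
    recall = k / len(hits)           # fraction of all words predicted
    return recall, precision
```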

Reified Context Models

Partially Supervised Learning

Decipherment task:

cipher: am ↦ 5, I ↦ 13, what ↦ 54, . . .

latent z : I am what I am

output y : 13 5 54 13 5

Goal: determine the cipher.

Fit a 2nd-order HMM with EM, using RCMs for the approximate E-step.

Use the learned emissions to determine the cipher.

Again compare to beam search (Nuhn et al., 2013).

Fraction of correctly mapped words:

[Figure: mapping accuracy (0.0 to 0.8) vs. training passes (0 to 20) on decipherment, comparing RCM and beam search.]
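For orientation, here is a heavily simplified EM skeleton for this setup. It is first-order rather than 2nd-order, and leaves the E-step as a black box (in the talk, that step is approximated with an RCM); all names and the smoothing constant are assumptions.

```python
import numpy as np

def em_decipher(cipher_seqs, n_words, n_symbols, e_step, n_passes=20):
    """EM for a substitution-cipher HMM (sketch).

    cipher_seqs: observed sequences of cipher symbols (the y's).
    e_step(seq, emit) must return an (n_words, n_symbols) matrix of expected
    emission counts for one sequence -- the intractable inference step.
    The learned emission table yields the word -> symbol cipher.
    """
    emit = np.full((n_words, n_symbols), 1.0 / n_symbols)
    for _ in range(n_passes):
        counts = np.full_like(emit, 1e-6)  # smoothing avoids empty rows
        for seq in cipher_seqs:
            counts += e_step(seq, emit)    # E-step: expected counts
        emit = counts / counts.sum(axis=1, keepdims=True)  # M-step
    return emit.argmax(axis=1), emit  # cipher guess and emission table
```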

Reified Context Models

Contexts During Training

Context lengths increase smoothly during training:

[Figure: average context length (1.5 to 4.5) vs. number of passes (0 to 20) on decipherment; example contexts progress from ****** to ***ing to idding.]

Start of training: little information, short contexts.

End of training: lots of information, long contexts.


Reified Context Models

Discussion

RCMs provide both expressivity and coverage, which enable:

More accurate uncertainty estimates (precision)

Better partially supervised learning updates

Reproducible experiments on Codalab: codalab.org/worksheets



Relaxed Supervision

1 Motivation

2 Formal Setting

3 Reified Context Models

4 Relaxed Supervision

5 Open Questions


Page 89: Learning with Intractable Inference and Partial Supervision - Stanford … · 2015. 12. 3. · Learning with Intractable Inference and Partial Supervision Jacob Steinhardt Stanford

Relaxed Supervision

Intractable Supervision

Sometimes, even supervision is intractable:

input x: What is the largest city in California?
latent z: argmax(λx. CITY(x) ∧ LOC(x, CA), λx. POPULATION(x))
output y: Los Angeles

Intractable no matter how simple the model is!

...but there are likely statistical relationships (e.g., between CITY and "Los Angeles") that learning can exploit.

Need a way to relax the likelihood...

...while maintaining good statistical properties (asymptotic consistency).

J. Steinhardt (Stanford) Learning and Inference September 8, 2015 21 / 31


Relaxed Supervision

Approach

[Figure: the (θ, β) parameter space, divided into a tractable region and an intractable region.]

Start with an intractable likelihood q(y | z) and model family pθ(z | x).

Replace q(y | z) with a family of likelihoods qβ(y | z) (some very easy).

Derive constraints on (θ, β) that ensure tractability.

Learn within the tractable region.

J. Steinhardt (Stanford) Learning and Inference September 8, 2015 22 / 31


Relaxed Supervision

Relaxed Supervision: Example

input x: Company officials refused to comment.
latent z: (unobserved)
output y: 公司官员拒绝对此发表评论。 ("Company officials refused to comment on this.")

Idea: instead of requiring the model's output ỹ to match the observed output y, penalize by a weighted distance distβ(ỹ, y):

ℓ(θ, β; x, y) = −log( ∑_{z, ỹ} pθ(z, ỹ | x) exp(−distβ(ỹ, y)) )

As β → ∞, we recover the original objective...

...but optimizing will send β → 0!

Two questions:

How to create natural pressure to increase β?

How to define distances for general problems?

J. Steinhardt (Stanford) Learning and Inference September 8, 2015 23 / 31
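To make the relaxed objective concrete, here is a minimal sketch on a toy discrete problem (not the talk's implementation): p_zy is a hypothetical joint table pθ(z, ỹ | x) over two latents and two candidate outputs, dist holds distβ(ỹ, y) for the observed y, and a single scalar β stands in for the weight vector. The printout also illustrates the degenerate pull toward β → 0, where the loss is trivially minimized.

```python
# Minimal sketch of the relaxed loss on a toy discrete problem.
# p_zy and dist are hypothetical stand-ins, not from the talk.
import numpy as np

def relaxed_loss(p_zy, dist, beta):
    """l(theta, beta; x, y) = -log sum_{z, y~} p(z, y~ | x) exp(-beta * dist(y~, y))."""
    weights = np.exp(-beta * dist)           # soft match score for each candidate y~
    return -np.log(np.sum(p_zy * weights))   # sum over z (rows) and y~ (columns)

p_zy = np.array([[0.3, 0.1],                 # p(z, y~ | x): 2 latents x 2 outputs
                 [0.2, 0.4]])
dist = np.array([0.0, 1.0])                  # column 0 is the observed output y

for beta in [0.0, 1.0, 10.0]:
    print(beta, relaxed_loss(p_zy, dist, beta))
# beta = 0 gives loss 0 (every y~ "matches"), hence the pull toward beta -> 0;
# beta = 10 gives approximately -log p(y | x) = -log 0.5, the original objective.
```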


Relaxed Supervision

Relaxed Supervision: Formal Framework

Assume (WLOG) that z → y is deterministic: y = f(z).

Let S(z, y) ∈ {0, 1} encode the constraint [f(z) = y].

Take projections πj : Y → Yj, j = 1, ..., k.

Let Sj(z, y) = [πj(f(z)) = πj(y)] be the projected constraint.

Define the distance function

distβ(z, y) = ∑_{j=1}^{k} βj · (1 − Sj(z, y)).

Note: distβ can be featurized as −β⊤ψ(z, y), where ψj = Sj − 1.

Lemma

Suppose that π1 × ··· × πk is injective. Then

S(z, y) = ∧_{j=1}^{k} Sj(z, y).

J. Steinhardt (Stanford) Learning and Inference September 8, 2015 24 / 31
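As a sanity check of the lemma, here is a tiny sketch (an assumed setup, not from the talk): the output space is bit-triples, πj extracts the j-th bit (so π1 × π2 × π3 is injective), and f is the identity map for illustration. The exact constraint then equals the conjunction of the projected constraints.

```python
# Toy check of the lemma: with jointly injective projections,
# S(z, y) = [f(z) == y] equals the conjunction of the S_j(z, y).
from itertools import product

Y = list(product([0, 1], repeat=3))                    # output space: bit-triples
projections = [lambda y, j=j: y[j] for j in range(3)]  # pi_j = j-th bit
f = lambda z: z                                        # deterministic map, identity here

for z in Y:
    for y in Y:
        S = (f(z) == y)
        S_conj = all(pi(f(z)) == pi(y) for pi in projections)
        assert S == S_conj
print("S(z, y) == conjunction of S_j(z, y) on this toy space")
```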


Relaxed Supervision

Example: Unordered Supervision

input x: a b a a
latent z: d c d d
output y: {c: 1, d: 3}

Let count(·, j) count the number of occurrences of character j.

Decomposition (with f(z) = multiset(z) and πj(y) = count(y, j)):

S(z, y) = [y = multiset(z)]  ⟺  ∧_{j=1}^{V} [count(z, j) = count(y, j)] = ∧_{j=1}^{V} Sj(z, y).

J. Steinhardt (Stanford) Learning and Inference September 8, 2015 25 / 31
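A minimal sketch of the unordered-supervision projections (hypothetical code, assuming strings over a small alphabet): πj is the count of character j, and Sj compares the counts of z against the observed multiset y.

```python
# Projected constraints for unordered supervision:
# S_j(z, y) = [count(z, j) == count(y, j)] for each character j.
from collections import Counter

def projected_constraints(z, y_counts, vocab):
    z_counts = Counter(z)
    return {j: z_counts[j] == y_counts.get(j, 0) for j in vocab}

z = "dcdd"                                  # latent character sequence
y_counts = {"c": 1, "d": 3}                 # observed output: a multiset of characters
S = projected_constraints(z, y_counts, vocab="abcd")
print(S)                                    # every S_j is True iff multiset(z) == y
print("S(z, y) =", all(S.values()))
```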


Relaxed Supervision

Example: Conjunctive Semantic Parsing

Side information: predicates {Q1, ..., Qm}, e.g. Q6 = [DOG] = the set of all dogs.

input x: brown dog (input utterance)
latent z: (Q11, Q6) (set of all brown objects, set of all dogs)
output y: Q11 ∩ Q6 (denotation, observed as a set)

For z = (Qj1, ..., QjL), define the denotation ⟦z⟧ = Qj1 ∩ ··· ∩ QjL.

Decomposition (with πj(y) = I[y ⊆ Qj]):

S(z, y) = [y = ⟦z⟧]  ⟺  ∧_{j=1}^{m} [ I[⟦z⟧ ⊆ Qj] = I[y ⊆ Qj] ] = ∧_{j=1}^{m} Sj(z, y).

J. Steinhardt (Stanford) Learning and Inference September 8, 2015 26 / 31
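A minimal sketch of the conjunctive-parsing projections, with predicates as plain Python sets over a hypothetical toy domain: πj(y) = I[y ⊆ Qj], and Sj checks that this indicator agrees for ⟦z⟧ and for the observed denotation y.

```python
# Denotations and projected constraints for conjunctive semantic parsing.
def denotation(z, predicates):
    """[[z]] = Q_{j1} ∩ ... ∩ Q_{jL} for z = (j1, ..., jL)."""
    result = set(predicates[z[0]])
    for j in z[1:]:
        result &= predicates[j]
    return result

predicates = {6: {"rex", "fido", "spot"},       # Q6 = [DOG]   (hypothetical domain)
              11: {"fido", "spot", "chair3"}}   # Q11 = [BROWN]
z = (11, 6)                                     # latent parse for "brown dog"
y = denotation(z, predicates)                   # observed denotation

S = {j: (denotation(z, predicates) <= Q) == (y <= Q)   # S_j: subset indicators agree
     for j, Q in predicates.items()}
print(y, "S(z, y) =", all(S.values()))
```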


Relaxed Supervision

Normalization Constant

Create pressure to increase β by adding a normalization constant:

qβ(y | z) = exp( β⊤ψ(z, y) − A(β) ),  where β⊤ψ(z, y) = −distβ(z, y),

ℓ(θ, β; x, y) = −log( ∑z pθ(z | x) qβ(y | z) ).

Lemma

Given π1, ..., πk, let A(β) := ∑_{j=1}^{k} log(1 + (|Yj| − 1) exp(−βj)). Then ∑_ỹ exp(−distβ(z, ỹ)) ≤ exp(A(β)) for all z, so qβ(· | z) is (sub)normalized: ∑_ỹ qβ(ỹ | z) ≤ 1.

Lemma

Jointly minimizing L(θ, β) = E[ℓ(θ, β; x, y)] yields a consistent estimate of the true parameters θ*.

J. Steinhardt (Stanford) Learning and Inference September 8, 2015 27 / 31
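A small numeric check of the first lemma (an assumed toy setup): when Y is exactly the product Y1 × Y2, so the projections are jointly bijective, ∑_ỹ exp(−distβ(z, ỹ)) equals exp(A(β)); for a general injective embedding it can only be smaller.

```python
# Check: sum_{y~} exp(-dist_beta(z, y~)) <= exp(A(beta)),
# with equality when Y is the full product of the Y_j.
import numpy as np
from itertools import product

sizes = [2, 3]                                  # |Y_1|, |Y_2|
beta = np.array([0.5, 1.5])
z_proj = (0, 2)                                 # pi_j(f(z)) for a fixed z

A = sum(np.log(1 + (s - 1) * np.exp(-b)) for s, b in zip(sizes, beta))

total = sum(np.exp(-np.dot(beta, [y[0] != z_proj[0], y[1] != z_proj[1]]))
            for y in product(range(sizes[0]), range(sizes[1])))

print(np.log(total), A)                         # equal here, since Y = Y_1 x Y_2
assert np.log(total) <= A + 1e-9
```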


Relaxed Supervision

Constraints for Efficient Inference

Inference task:

∇θ log pθ(y | x) = E_{z ∼ pθ(· | x, y)}[φ(x, z, y)] − E_{z, ỹ ∼ pθ(· | x)}[φ(x, z, ỹ)]

(first term: sample z given x, y; second term: sample z, ỹ given x alone).

pθ,β(z | x, y) ∝ pθ(z | x) qβ(y | z) ∝ pθ(z | x) exp(β⊤ψ(z, y)).

Rejection sampler: sample z from pθ(z | x), then accept with probability exp(β⊤ψ(z, y)) ≤ 1.

Bound the expected number of samples:

∑_{(x, y) ∈ Data} ( ∑z pθ(z | x) exp(β⊤ψ(z, y)) )^{−1} ≤ τ.  (1)

The inner sum is a ratio of normalization constants, and we can optimize subject to (1) (similar to CCCP).

J. Steinhardt (Stanford) Learning and Inference September 8, 2015 28 / 31
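A minimal sketch of the rejection sampler (hypothetical code, with an enumerable latent space for clarity): draw z from the prior pθ(z | x) and accept with probability exp(β⊤ψ(z, y)), which is at most 1 since every ψj = Sj − 1 ≤ 0. The expected number of draws is the inverse acceptance rate, exactly the per-example term bounded in (1).

```python
# Rejection sampler for p_{theta,beta}(z | x, y) ∝ p_theta(z | x) exp(beta^T psi(z, y)).
import numpy as np

rng = np.random.default_rng(0)

def sample_posterior(p_z, accept_prob, max_tries=100_000):
    """p_z[z] = prior p_theta(z | x); accept_prob[z] = exp(beta^T psi(z, y)) <= 1."""
    for tries in range(1, max_tries + 1):
        z = rng.choice(len(p_z), p=p_z)
        if rng.random() < accept_prob[z]:
            return z, tries          # E[tries] = 1 / sum_z p_z[z] * accept_prob[z]
    raise RuntimeError("acceptance rate too low; constraint (1) would be violated")

p_z = np.array([0.6, 0.3, 0.1])                  # hypothetical prior over 3 latents
accept = np.exp(np.array([-2.0, 0.0, -1.0]))     # z = 1 satisfies all constraints
print(sample_posterior(p_z, accept))
```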


Relaxed Supervision

Constraints for Efficient Inference

Inference task:

∇θ logpθ (y | x) = Ez∼pθ (·|x ,y)[φ(x , z,y)]︸ ︷︷ ︸sample z given x ,y

−Ez,y∼pθ (·|x)[φ(x , z, y)]︸ ︷︷ ︸sample z given x

.

pθ ,β (z | x ,y) ∝ pθ (z | x)qβ (y | z)∝ pθ (z | x)exp(β>ψ(z,y)).

Rejection sampler:

sample from pθ (z | x)accept with probability exp(β>ψ(z,y)).

Bound expected number of samples:

∑x ,y∈Data

(∑z

pθ (z | x)exp(β>ψ(z,y))

)−1

≤ τ. (1)

Ratio of normalization constants: can optimize subject to (1) (similar to CCCP).

J. Steinhardt (Stanford) Learning and Inference September 8, 2015 28 / 31

Page 135: Learning with Intractable Inference and Partial Supervision - Stanford … · 2015. 12. 3. · Learning with Intractable Inference and Partial Supervision Jacob Steinhardt Stanford

Relaxed Supervision

Experiments

Conjunctive semantic parsing:

0 10 20 30 40 50iteration

0.0

0.2

0.4

0.6

0.8

1.0

accu

racy

FixedBeta(0.5)FixedBeta(0.2)FixedBeta(0.1)

0 10 20 30 40 50iteration

100

101

102

103

104

105

num

ber o

f sam

ples

FixedBeta(0.5)FixedBeta(0.2)FixedBeta(0.1)

J. Steinhardt (Stanford) Learning and Inference September 8, 2015 29 / 31

Page 136: Learning with Intractable Inference and Partial Supervision - Stanford … · 2015. 12. 3. · Learning with Intractable Inference and Partial Supervision Jacob Steinhardt Stanford

Relaxed Supervision

Experiments

Conjunctive semantic parsing:

0 10 20 30 40 50iteration

0.0

0.2

0.4

0.6

0.8

1.0

accu

racy

AdaptBeta(500)FixedBeta(0.5)FixedBeta(0.2)FixedBeta(0.1)

0 10 20 30 40 50iteration

100

101

102

103

104

105

num

ber o

f sam

ples

AdaptBeta(500)FixedBeta(0.5)FixedBeta(0.2)FixedBeta(0.1)

J. Steinhardt (Stanford) Learning and Inference September 8, 2015 29 / 31

Page 137: Learning with Intractable Inference and Partial Supervision - Stanford … · 2015. 12. 3. · Learning with Intractable Inference and Partial Supervision Jacob Steinhardt Stanford

Open Questions

1 Motivation

2 Formal Setting

3 Reified Context Models

4 Relaxed Supervision

5 Open Questions

J. Steinhardt (Stanford) Learning and Inference September 8, 2015 30 / 31

Open Questions

Scale up to larger tasks: semantic parsing, reinforcement learning, program induction

Extend to Bayesian models

Understand non-convex optimization

Metacomputation using Reified Context Models?

Probabilistic abstract interpretation

Statistics & Computation: still a long way to go

Thanks! 谢谢 (Thank you!)

J. Steinhardt (Stanford) Learning and Inference September 8, 2015 31 / 31
