
Page 1: Relational Probability Models

Relational Probability Models

Brian Milch

MIT

IPAM Summer School

July 16, 2007

Page 2: Relational Probability Models

Objects, Attributes, Relations

[Diagram: researchers with Specialty attributes (RL, BNs, Theory) connected by AuthorOf edges to papers with Topic attributes (RL, BNs, Theory); Reviews edges link researchers to papers they review.]

Page 3: Relational Probability Models

Specific Scenario

[Diagram: Prof. Smith is AuthorOf papers Smith98a and Smith00; each paper is a sequence of word positions (word 1, word 2, word 3, ...) linked to its document by InDoc, with one document beginning "Bayesian networks have become a ...".]

Page 4: Relational Probability Models

Bayesian Network for This Scenario

[Diagram: Smith.Specialty (BNs) is parent of Smith98a.Topic and Smith00.Topic (both BNs), which in turn are parents of the word variables Smith98a word 1-4 and Smith00 word 1-4, with values such as "Bayesian", "networks", "have", "become".]

BN is specific to Smith98a and Smith00

Page 5: Relational Probability Models

Abstract Knowledge

• Humans have abstract knowledge that can be applied to any individuals

• How can such knowledge be:
  – Represented?
  – Learned?
  – Used in reasoning?

Page 6: Relational Probability Models

Outline

• Relational probability models (RPMs)
  – Representation
  – Inference
  – Learning

• Relational uncertainty: extending RPMs with probabilistic models for relations

Abstract probabilistic model for attributes + relational skeleton (objects & relations) → graphical model

Page 7: Relational Probability Models

Representation

• Have to represent:
  – Set of variables
  – Dependencies
  – Conditional probability distributions (CPDs)
  (all of these depend on the relational skeleton)

• Many proposed languages
• We'll use Bayesian logic (BLOG) [Milch et al. 2005]

Page 8: Relational Probability Models

Typed First-Order Logic

• Objects divided into types:
  Researcher, Paper, WordPos, Word, Topic

• Express attributes and relations with functions and predicates:
  FirstAuthor(paper) → Researcher (non-random)
  Specialty(researcher) → Topic (random)
  Topic(paper) → Topic (random)
  Doc(wordpos) → Paper (non-random)
  WordAt(wordpos) → Word (random)

Page 9: Relational Probability Models

Set of Random Variables

• For random functions, have a variable for each tuple of argument objects

  Researcher: Smith, Jones
  Paper: Smith98a, Smith00, Jones00
  WordPos: Smith98a_1, …, Smith98a_3212, Smith00_1, etc.

  Specialty: Specialty(Smith), Specialty(Jones)
  Topic: Topic(Smith98a), Topic(Smith00), Topic(Jones00)
  WordAt: WordAt(Smith98a_1), …, WordAt(Smith98a_3212), WordAt(Smith00_1), …, WordAt(Smith00_2774), WordAt(Jones00_1), …, WordAt(Jones00_4893)

Page 10: Relational Probability Models

Dependency Statements

Specialty(r) ~ TabularCPD[[0.5, 0.3, 0.2]];

Topic(p) ~ TabularCPD[[0.90, 0.01, 0.09],
                      [0.02, 0.85, 0.13],
                      [0.10, 0.10, 0.80]]
    (Specialty(FirstAuthor(p)));

WordAt(wp) ~ TabularCPD[[0.03,..., 0.02, 0.001,...],
                        [0.03,..., 0.001, 0.02,...],
                        [0.03,..., 0.003, 0.003,...]]
    (Topic(Doc(wp)));

The rows and columns of these CPDs are indexed by the values BNs, RL, Theory (and, for WordAt, the columns range over words such as "the", "Bayesian", "reinforcement", ...). The logical term in parentheses, e.g. Specialty(FirstAuthor(p)), identifies the parent node.
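To make this concrete, here is a minimal Python sketch (not part of the original deck) of how a tabular CPD with a logical-term parent is sampled; the CPD values are the ones above.

import numpy as np

TOPICS = ["BNs", "RL", "Theory"]

# Topic CPD from the slide: row = value of Specialty(FirstAuthor(p)),
# column = value of Topic(p).
TOPIC_CPD = np.array([[0.90, 0.01, 0.09],
                      [0.02, 0.85, 0.13],
                      [0.10, 0.10, 0.80]])

def sample_topic(first_author_specialty, rng):
    # Resolve the parent value to a row, then sample the child value.
    row = TOPIC_CPD[TOPICS.index(first_author_specialty)]
    return rng.choice(TOPICS, p=row)

rng = np.random.default_rng(0)
print(sample_topic("RL", rng))  # "RL" with probability 0.85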

Page 11: Relational Probability Models

Conditional Dependencies

• Predicting the length of a paper
  – Conference paper: generally equals the conference page limit
  – Otherwise: depends on the verbosity of the author

• Model this with a conditional dependency statement, using a first-order formula as the condition:

Length(p)
  if ConfPaper(p) then ~ PageLimitPrior()
  else ~ LengthCPD(Verbosity(FirstAuthor(p)));
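A Python sketch of the same if/else dependency follows; PageLimitPrior and LengthCPD are hypothetical stand-ins with made-up distributions.

import random

def sample_length(p, conf_paper, first_author, verbosity, rng):
    if conf_paper[p]:
        # ~ PageLimitPrior(): e.g., a prior over common page limits
        return rng.choice([6, 8, 10])
    # ~ LengthCPD(Verbosity(FirstAuthor(p))): longer for verbose authors
    return rng.randint(5, 5 + 3 * verbosity[first_author[p]])

rng = random.Random(0)
print(sample_length("Smith00", {"Smith00": False}, {"Smith00": "Smith"},
                    {"Smith": 4}, rng))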

Page 12: Relational Probability Models

Variable Numbers of Parents

• What if we allow multiple authors?
  – Let the skeleton specify a predicate AuthorOf(r, p)

• Topic(p) now depends on the specialties of multiple authors, so the number of parents depends on the skeleton

Page 13: Relational Probability Models

Aggregation

• Aggregate distributions: a mixture of distributions conditioned on the individual elements of a multiset [Taskar et al., IJCAI 2001]

Topic(p) ~ TopicAggCPD({Specialty(r) for Researcher r : AuthorOf(r, p)});
    (multiset defined by formula)

• Aggregate values: apply an aggregation function, here Mode, to the multiset

Topic(p) ~ TopicCPD(Mode({Specialty(r) for Researcher r : AuthorOf(r, p)}));
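A small Python sketch of value aggregation; the AuthorOf skeleton and specialties here are hypothetical.

from collections import Counter

def mode(values):
    # Aggregation function: most common element of a multiset
    # (ties broken arbitrarily).
    return Counter(values).most_common(1)[0][0]

# Hypothetical skeleton: AuthorOf as a paper -> researchers map.
authors = {"Smith&Jones01": ["Smith", "Jones", "Lee"]}
specialty = {"Smith": "BNs", "Jones": "RL", "Lee": "BNs"}

# Parent value for Topic(p): Mode({Specialty(r) : AuthorOf(r, p)})
agg = mode([specialty[r] for r in authors["Smith&Jones01"]])
print(agg)  # "BNs" -- Topic(p) ~ TopicCPD(agg) conditions on this value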

Page 14: Relational Probability Models

Semantics: Ground BN

[Diagram: a skeleton with researchers R1, R2 and papers P1, P2, P3 (FirstAuthor links; P1 has 3212 words, P2 has 2774, P3 has 4893) expands into a ground BN in which Spec(R1) and Spec(R2) are parents of Topic(P1), Topic(P2), Topic(P3), and each Topic node is parent of its paper's word variables W(P1_1), ..., W(P1_3212), etc.]
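A minimal grounding sketch in Python, assuming a toy encoding of the skeleton; it just enumerates the ground-BN edges for the Spec → Topic → word model.

skeleton = {
    "first_author": {"P1": "R1", "P2": "R1", "P3": "R2"},
    "num_words":    {"P1": 3212, "P2": 2774, "P3": 4893},
}

def ground_edges(skel):
    edges = []
    for p, r in skel["first_author"].items():
        edges.append((f"Spec({r})", f"Topic({p})"))
        for i in range(1, skel["num_words"][p] + 1):
            edges.append((f"Topic({p})", f"W({p}_{i})"))
    return edges

edges = ground_edges(skeleton)
print(len(edges))   # 10882: 3 Spec->Topic edges plus one edge per word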

Page 15: Relational Probability Models

When Is Ground BN Acyclic?

• Look at the symbol graph
  – Node for each random function
  – Read off edges from dependency statements
  (here: Specialty → Topic → WordAt)

• Theorem: If the symbol graph is acyclic, then the ground BN is acyclic for every skeleton [Koller & Pfeffer, AAAI 1998]
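A sketch of the acyclicity test on the symbol graph (an edge-list representation is assumed), using a standard depth-first search.

def has_cycle(edges):
    graph = {}
    for u, v in edges:
        graph.setdefault(u, []).append(v)
        graph.setdefault(v, [])
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {n: WHITE for n in graph}

    def dfs(n):
        color[n] = GRAY
        for m in graph[n]:
            if color[m] == GRAY or (color[m] == WHITE and dfs(m)):
                return True
        color[n] = BLACK
        return False

    return any(color[n] == WHITE and dfs(n) for n in graph)

# Symbol graph from the slides (acyclic), versus a model that adds a
# self-loop on Specialty (cf. the advisor example on the next slide).
print(has_cycle([("Specialty", "Topic"), ("Topic", "WordAt")]))   # False
print(has_cycle([("Specialty", "Specialty"),
                 ("Specialty", "Topic"), ("Topic", "WordAt")]))   # True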

Page 16: Relational Probability Models

Acyclic Relations

• Suppose a researcher's specialty depends on his/her advisor's specialty:

Specialty(r)
  if Advisor(r) != null then ~ SpecCPD(Specialty(Advisor(r)))
  else ~ SpecialtyPrior();

• The symbol graph now has a self-loop on Specialty!
• Require certain nonrandom functions to be acyclic: F(x) < x under some partial order
• Label symbol-graph edges with "<" and "=" signs; get a stronger acyclicity theorem [Friedman et al., IJCAI 1999]

Page 17: Relational Probability Models

Inference: Knowledge-Based Model Construction (KBMC)

• Construct the relevant portion of the ground BN [Breese 1992; Ngo & Haddawy 1997]

[Diagram: skeleton with researchers R1, R2 and papers P1, P2, P3; the constructed BN contains Spec(R1), Spec(R2), Topic(P1), Topic(P2), Topic(P3), and the word variables of P1 and P3 only, e.g. for a query about Topic(P2) given the words of P1 and P3.]
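A sketch of the construction step, assuming a hypothetical parents map for the ground BN: nodes that are not ancestors of the query or evidence variables are barren and need not be constructed.

def relevant_subnetwork(parents, query_vars, evidence_vars):
    # Collect query/evidence nodes and all of their ancestors.
    needed, stack = set(), list(query_vars) + list(evidence_vars)
    while stack:
        v = stack.pop()
        if v not in needed:
            needed.add(v)
            stack.extend(parents.get(v, []))
    return needed

parents = {
    "Topic(P1)": ["Spec(R1)"], "Topic(P2)": ["Spec(R1)"],
    "Topic(P3)": ["Spec(R2)"],
    "W(P1_1)": ["Topic(P1)"], "W(P2_1)": ["Topic(P2)"],
    "W(P3_1)": ["Topic(P3)"],
}
# W(P2_1) is barren for this query and is not constructed.
print(relevant_subnetwork(parents, ["Topic(P2)"],
                          ["W(P1_1)", "W(P3_1)"]))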

Page 18: Relational Probability Models

Inference on Constructed Network

• Run standard BN inference algorithm
  – Exact: variable elimination / junction tree
  – Approximate: Gibbs sampling, loopy belief propagation

• Exploit some repeated structure with lifted inference [Pfeffer et al., UAI 1999; Poole, IJCAI 2003; de Salvo Braz et al., IJCAI 2005]

Page 19: Relational Probability Models

Lifted Inference

• Suppose: Specialty(r) ~ SpecCPD(ThesisTopic(r));

• With n researchers, part of the ground BN is:
  ThesisTopic(R1) → Specialty(R1), ..., ThesisTopic(Rn) → Specialty(Rn)

• Could sum out the ThesisTopic(R) nodes one by one
• But parameter sharing implies:
  – Summing the same potential every time
  – Obtaining the same potential over Specialty(R) for each R

• Can just do the summation once, eliminate the whole family of RVs, and store a "lifted" potential on Specialty(r)

[Pfeffer et al., UAI 1999; Poole, IJCAI 2003; de Salvo Braz et al., IJCAI 2005]
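A numeric Python sketch of doing the summation once; the prior over thesis topics and the SpecCPD values are made up.

import numpy as np

thesis_prior = np.array([0.4, 0.3, 0.3])
spec_cpd = np.array([[0.8, 0.1, 0.1],      # rows: ThesisTopic(r)
                     [0.1, 0.8, 0.1],      # cols: Specialty(r)
                     [0.1, 0.1, 0.8]])

# Sum out ThesisTopic once; by parameter sharing the result is the
# marginal potential on Specialty(r) for *every* researcher r.
lifted_potential = thesis_prior @ spec_cpd
print(lifted_potential)   # [0.38 0.31 0.31], reused for R1, ..., Rn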

Page 20: Relational Probability Models

Learning

• Assume types and functions are given

• Straightforward task: given structure, learn parameters
  – Just like in BNs, but parameters are shared across variables for the same function, e.g., Topic(Smith98a), Topic(Jones00), etc.

• Harder task: learn dependency structure

Page 21: Relational Probability Models

Structure Learning for BNs

• Find the BN structure M that maximizes

  $p(M) \int p(\text{data} \mid M, \theta_M)\, p(\theta_M \mid M)\, d\theta_M$

• Greedy local search over structures
  – Operators: add, delete, reverse edges
  – Exclude cyclic structures
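Below is a minimal Python sketch of the greedy local search (not from the deck): a toy score stands in for the Bayesian score above, and the edge-reversal operator is omitted for brevity.

from itertools import permutations

def is_acyclic(edges):
    # Kahn-style check: repeatedly strip nodes with no incoming edges;
    # if none can be stripped but nodes remain, there is a cycle.
    nodes = {n for e in edges for n in e}
    rem = set(edges)
    while nodes:
        roots = {n for n in nodes if all(v != n for _, v in rem)}
        if not roots:
            return False
        nodes -= roots
        rem = {(u, v) for (u, v) in rem if u not in roots}
    return True

def greedy_search(nodes, score):
    edges, improved = set(), True
    while improved:
        improved = False
        for u, v in permutations(nodes, 2):
            for cand in (edges | {(u, v)}, edges - {(u, v)}):
                if cand != edges and is_acyclic(cand) \
                        and score(cand) > score(edges):
                    edges, improved = set(cand), True
    return edges

# Toy score that peaks at a known structure.
target = {("A", "B"), ("B", "C")}
score = lambda es: -len(set(es) ^ target)
print(greedy_search(["A", "B", "C"], score))  # {('A', 'B'), ('B', 'C')}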

Page 22: Relational Probability Models

Logical Structure Learning

• In an RPM, want a logical specification of each node's parent set

• Deterministic analogue: inductive logic programming (ILP) [Dzeroski & Lavrac 2001; Flach & Lavrac 2002]

• Classic work on RPMs by Friedman, Getoor, Koller & Pfeffer [1999]
  – We'll call their models FGKP models (they call them "probabilistic relational models" (PRMs))

Page 23: Relational Probability Models

FGKP Models

• Each dependency statement has the form:

Func(x) ~ TabularCPD[...](s1,..., sk)

  where s1,..., sk are slot chains

• Slot chains
  – Basically logical terms: Specialty(FirstAuthor(p))
  – But can also treat predicates as "multi-valued functions": Specialty(AuthorOf(p)), whose multiset of values must be aggregated

[Diagram: paper Smith&Jones01 with AuthorOf links to Smith and Jones; the slot chain Specialty(AuthorOf(p)) yields the multiset {BNs, RL}, which an aggregate summarizes.]

Page 24: Relational Probability Models

Structure Learning for FGKP Models

• Greedy search again
  – But add or remove whole slot chains
  – Start with chains of length 1, then 2, etc.
  – Check for acyclicity using the symbol graph

Page 25: Relational Probability Models

Outline

• Relational probability models (RPMs)
  – Representation
  – Inference
  – Learning

• Relational uncertainty: extending RPMs with probabilistic models for relations

Abstract probabilistic model for attributes + relational skeleton (objects & relations) → graphical model

Page 26: Relational Probability Models

Relational Uncertainty: Example

• Questions: Who will review my paper, and what will its average review score be?

[Diagram: researchers with Specialty and Generosity attributes (RL / 2.9, Prob. Models / 2.2, Theory / 1.8), AuthorOf and Reviews edges to papers with known Topic (RL, RL, Prob Models) and unknown AvgScore ("?").]

Page 27: Relational Probability Models

Possible Worlds

[Diagram: a grid of possible worlds that differ in which researchers review which papers and in the review scores; e.g., some worlds give every paper score 1.0, while others assign scores such as 2.3, 1.9, 3.1, 2.7, 1.8, 2.1.]

Page 28: Relational Probability Models

Simplest Approach to Relational Uncertainty

• Add predicate Reviews(r, p)
• Can model this with existing syntax [Getoor et al., JMLR 2002]:

Reviews(r, p) ~ ReviewCPD(Specialty(r), Topic(p));

• Potential drawbacks:
  – Reviews(r, p) nodes are independent given specialties and topics
  – Expected number of reviews per paper grows with the number of researchers in the skeleton

Page 29: Relational Probability Models

Another Approach: Reference Uncertainty

• Say each paper gets k reviews [Getoor et al., JMLR 2002]
  – Can add Review objects to the skeleton
  – For each paper p, include k review objects rev with PaperReviewed(rev) = p

• Uncertain about the values of the function Reviewer(rev)

[Diagram: review objects linked to their paper by PaperReviewed, with unknown Reviewer values marked "?".]

Page 30: Relational Probability Models

Models for Reviewer(rev)

• Explicit distribution over researchers?
  – No: won't generalize across skeletons

• Selection models:
  – Uniform sampling from researchers with certain attribute values [Getoor et al., JMLR 2002]
  – Weighted sampling, with weights determined by attributes [Pasula et al., IJCAI 2001]

Page 31: Relational Probability Models

Examples of Reference Uncertainty

• Choosing based on the Specialty attribute:

ReviewerSpecialty(rev) ~ SpecSelectionCPD
    (Topic(PaperReviewed(rev)));

Reviewer(rev) ~ Uniform({Researcher r :
    Specialty(r) = ReviewerSpecialty(rev)});

• Choosing by weighted sampling, with a set of pairs as the CPD argument:

Weight(rev, r) = CompatibilityWeight
    (Topic(PaperReviewed(rev)), Specialty(r));

Reviewer(rev) ~ WeightedSample({(r, Weight(rev, r))
    for Researcher r});
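Python sketches of both selection models; the researcher attributes and CompatibilityWeight used here are hypothetical stand-ins.

import random

specialty = {"R1": "RL", "R2": "RL", "R3": "Theory"}

def uniform_reviewer(reviewer_specialty, rng):
    # Uniform sampling from researchers with a given attribute value,
    # mirroring Reviewer(rev) ~ Uniform({r : Specialty(r) = ...}).
    pool = [r for r, s in specialty.items() if s == reviewer_specialty]
    return rng.choice(pool)

def weighted_reviewer(paper_topic, rng):
    # Weighted sampling; a made-up compatibility weight stands in for
    # CompatibilityWeight(Topic(...), Specialty(r)).
    def compatibility_weight(topic, spec):
        return 3.0 if topic == spec else 1.0
    researchers = list(specialty)
    weights = [compatibility_weight(paper_topic, specialty[r])
               for r in researchers]
    return rng.choices(researchers, weights=weights, k=1)[0]

rng = random.Random(0)
print(uniform_reviewer("RL", rng), weighted_reviewer("RL", rng))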

Page 32: Relational Probability Models

Context-Specific Dependencies

RevScore(rev) ~ ScoreCPD(Generosity(Reviewer(rev)));
                         (Reviewer(rev) is a random object)

AvgScore(p) = Mean({RevScore(rev) for Review rev :
    PaperReviewed(rev) = p});

• Consequence of relational uncertainty: dependencies become context-specific
  – RevScore(Rev1) depends on Generosity(R1) only when Reviewer(Rev1) = R1

Page 33: Relational Probability Models

Semantics: Ground BN

• Can still define a ground BN
• Parents of node X are all basic RVs whose values are potentially relevant in evaluating the right-hand side of X's dependency statement:

RevScore(rev) ~ ScoreCPD(Generosity(Reviewer(rev)));

• Example: for RevScore(Rev1)…
  – Reviewer(Rev1) is always relevant
  – Generosity(R) might be relevant for any researcher R

Page 34: Relational Probability Models

Ground BN

[Diagram: Topic(P1) is parent of RevSpecialty(Rev1) and RevSpecialty(Rev2); the RevSpecialty nodes and Specialty(R1), Specialty(R2), Specialty(R3) are parents of Reviewer(Rev1) and Reviewer(Rev2); each RevScore(Revi) has parents Reviewer(Revi) and all of Generosity(R1), Generosity(R2), Generosity(R3).]

Page 35: Relational Probability Models

Inference

• Can still use the ground BN, but it's often very highly connected

• Alternative: Markov chain over possible worlds [Pasula & Russell, IJCAI 2001]
  – In each world, only certain dependencies are active

Page 36: Relational Probability Models

Markov Chain Monte Carlo (MCMC)

• Markov chain σ1, σ2, ... over worlds consistent with the evidence E

• Designed so the unique stationary distribution is proportional to p(σ)

• Fraction of σ1, σ2, ..., σN that fall in the query event Q converges to P(Q | E) as N → ∞

[Diagram: the set of worlds E consistent with the evidence, with the query event Q as a subset.]

Page 37: Relational Probability Models

Metropolis-Hastings MCMC

• Metropolis-Hastings process: in world σ,
  – sample new world σ′ from proposal distribution q(σ′ | σ)
  – accept proposal with probability

    $\min\left(1, \frac{p(\sigma')\, q(\sigma \mid \sigma')}{p(\sigma)\, q(\sigma' \mid \sigma)}\right)$

    otherwise remain in σ
• Stationary distribution is p(σ)
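A generic Metropolis-Hastings sketch in Python with a toy three-world target; with a symmetric proposal the q terms cancel.

import math, random

def metropolis_hastings(x0, log_p, propose, n_steps, rng):
    # propose(x, rng) returns (x_new, log q(x_new|x), log q(x|x_new));
    # log_p is the (unnormalized) log world probability.
    x, samples = x0, []
    for _ in range(n_steps):
        x_new, log_q_fwd, log_q_rev = propose(x, rng)
        log_ratio = log_p(x_new) + log_q_rev - log_p(x) - log_q_fwd
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = x_new                    # accept; otherwise remain in x
        samples.append(x)
    return samples

# Toy target over three worlds with p proportional to [1, 2, 3];
# the uniform proposal is symmetric, so its log-q terms are 0.
weights = [1.0, 2.0, 3.0]
propose = lambda x, rng: (rng.randrange(3), 0.0, 0.0)
chain = metropolis_hastings(0, lambda x: math.log(weights[x]),
                            propose, 20000, random.Random(1))
print(sum(1 for x in chain if x == 2) / len(chain))   # approx. 3/6 = 0.5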

Page 38: Relational Probability Models

Computing Acceptance Ratio Efficiently

• World probability is

  $p(\sigma) = \prod_X P(\sigma_X \mid \mathrm{pa}_\sigma(X))$

  where $\mathrm{pa}_\sigma(X)$ is the instantiation of X's active parents in σ

• If the proposal changes only X, then all factors not containing X cancel in p(σ′) and p(σ)

• Result: time to compute the acceptance ratio often doesn't depend on the number of objects

[Pasula et al., IJCAI 2001]
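A sketch of the local ratio computation, assuming each variable's children are known in advance; under relational uncertainty the set of active children can itself depend on the world, which this toy version ignores.

import math

def log_accept_ratio(world, var, new_value, log_factor, children):
    # Only the factors for `var` and its children mention var, so all
    # other factors cancel in p(world') / p(world).
    touched = [var] + children[var]
    new_world = dict(world)
    new_world[var] = new_value
    return (sum(log_factor(new_world, v) for v in touched)
            - sum(log_factor(world, v) for v in touched))

# Toy model A -> B: changing A touches only the factors for A and B.
cpds = {
    "A": lambda w: math.log(0.6 if w["A"] else 0.4),
    "B": lambda w: math.log(0.9 if w["B"] == w["A"] else 0.1),
}
print(log_accept_ratio({"A": True, "B": True}, "A", False,
                       lambda w, v: cpds[v](w), {"A": ["B"], "B": []}))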

Page 39: Relational Probability Models

Learning Models for Relations

• Binary predicate approach [Getoor et al., JMLR 2002]:

Reviews(r, p) ~ ReviewCPD(Specialty(r), Topic(p));

  – Use existing search over slot chains

• Selecting based on attributes:

ReviewerSpecialty(rev) ~ SpecSelectionCPD
    (Topic(PaperReviewed(rev)));

Reviewer(rev) ~ Uniform({Researcher r :
    Specialty(r) = ReviewerSpecialty(rev)});

  – Search over sets of attributes to look at
  – Search over parent slot chains for choosing attribute values

Page 40: Relational Probability Models

Summary

• Human knowledge is more abstract than basic graphical models

• Relational probability models
  – Logic-based representation
  – Structure learning by search over slot chains
  – Inference by KBMC

• Relational uncertainty
  – Natural extension to logic-based representation
  – Approximate inference by MCMC

Page 41: Relational Probability Models


References

• Wellman, M. P., Breese, J. S., and Goldman, R. P. (1992) “From knowledge bases to decision models”. Knowledge Engineering Review 7:35-53.

• Breese, J.S. (1992) “Construction of belief and decision networks”. Computational Intelligence 8(4):624-647.

• Ngo, L. and Haddawy, P. (1997) “Answering queries from context-sensitive probabilistic knowledge bases”. Theoretical Computer Sci. 171(1-2):147-177.

• Koller, D. and Pfeffer, A. (1998) “Probabilistic frame-based systems”. In Proc. 15th AAAI National Conf. on AI, pages 580-587.

• Friedman, N., Getoor, L., Koller, D., and Pfeffer, A. (1999) “Learning probabilistic relational models”. In Proc. 16th Int’l Joint Conf. on AI, pages 1300-1307.

• Pfeffer, A., Koller, D., Milch, B., and Takusagawa, K. T. (1999) “SPOOK: A System for Probabilistic Object-Oriented Knowledge”. In Proc. 15th Conf. on Uncertainty in AI, pages 541-550.

• Taskar, B., Segal, E., and Koller, D. (2001) “Probabilistic classification and clustering in relational data”. In Proc. 17th Int’l Joint Conf. on AI, pages 870-878.

• Getoor, L., Friedman, N., Koller, D., and Taskar, B. (2002) “Learning probabilistic models of link structure”. J. Machine Learning Res. 3:679-707.

• Taskar, B., Abbeel, P., and Koller, D. (2002) “Discriminative probabilistic models for relational data”. In Proc. 18th Conf. on Uncertainty in AI, pages 485-492.

Page 42: Relational Probability Models


References

• Poole, D. (2003) “First-order probabilistic inference”. In Proc. 18th Int’l Joint Conf. on AI, pages 985-991.

• de Salvo Braz, R., Amir, E., and Roth, D. (2005) “Lifted first-order probabilistic inference”. In Proc. 19th Int’l Joint Conf. on AI, pages 1319-1325.

• Dzeroski, S. and Lavrac, N., eds. (2001) Relational Data Mining. Springer.

• Flach, P. and Lavrac, N. (2002) “Learning in Clausal Logic: A Perspective on Inductive Logic Programming”. In Computational Logic: Logic Programming and Beyond (Essays in Honour of Robert A. Kowalski), Springer Lecture Notes in AI volume 2407, pages 437-471.

• Pasula, H. and Russell, S. (2001) “Approximate inference for first-order probabilistic languages”. In Proc. 17th Int’l Joint Conf. on AI, pages 741-748.

• Milch, B., Marthi, B., Russell, S., Sontag, D., Ong, D. L., and Kolobov, A. (2005) “BLOG: Probabilistic Models with Unknown Objects”. In Proc. 19th Int’l Joint Conf. on AI, pages 1352-1359.