2010, October 29th. Contact: Shou Matsumoto (cardialfly@[yahoo|gmail].com). On Implementing Probabilistic Relational Models

UnBBayes-PRM - On Implementing Probabilistic Relational Models


DESCRIPTION

UnBBayes is a probabilistic network framework written in Java. It has both a GUI and an API with inference, sampling, learning and evaluation. It supports BN, ID, MSBN, OOBN, HBN, MEBN/PR-OWL and PRM, as well as structure, parameter and incremental learning. This presentation talks about UnBBayes-PRM, a plugin for UnBBayes that has a simple implementation of Probabilistic Relational Models. This presentation was given by Shou Matsumoto from the University of Brasilia in Brazil via web conference to PhD students at George Mason University in the US, at the Friday seminar called Krypton (http://krypton.c4i.gmu.edu/), on October 29, 2010.

Citation preview

Page 1: UnBBayes-PRM - On Implementing Probabilistic Relational Models

2010, October 29th
Contact: Shou Matsumoto (cardialfly@[yahoo|gmail].com)

On Implementing Probabilistic Relational Models

Page 2: UnBBayes-PRM - On Implementing Probabilistic Relational Models

Content
Purpose
Contextualization
  • E/R
  • PRM
  • Link Uncertainty
A Java implementation
  • UnBBayes-PRM*

*Project page: http://sourceforge.net/projects/unbbayes/

Page 3: UnBBayes-PRM - On Implementing Probabilistic Relational Models

Objectives
What is this presentation for?
  – Overview of PRM and its underlying concepts
  – Overview of extensions of PRM
    • Link uncertainty
  – To present a simple implementation of PRM
    • UnBBayes-PRM

Page 4: UnBBayes-PRM - On Implementing Probabilistic Relational Models

Motivations
E/R models are heavily used
  – Most commercial databases are based on E/R models
PRM allows E/R with uncertainty
  – PRM is compatible with optimizations of BN and E/R
Implementations of PRM are rare

Page 5: UnBBayes-PRM - On Implementing Probabilistic Relational Models

Target
For whom is this presentation intended?
  – People interested in PRM
    • E.g. database architects willing to incorporate probabilistic reasoning
    • People looking for a BN extension with the expressiveness of relational calculus
  – People looking for a PRM tool
    • E.g. developers looking for a sample implementation
    • Learners willing to exercise PRM

Note: We assume you have basic knowledge of Bayesian Networks.

Page 6: UnBBayes-PRM - On Implementing Probabilistic Relational Models

What is PRM?
BN + E/R = PRM

Page 7: UnBBayes-PRM - On Implementing Probabilistic Relational Models

What is E/R?
E/R = Entity-Relationship
Abstract conceptual representation of data
  – Often used in relational database models
    • E.g. Oracle, MySQL, PostgreSQL...
Entities = “nouns”
  – A set of elements in a domain
Relationships = “verbs”
  – Capture how 2 or more entities are related
Attributes = “characteristics”

Note: Attributes hold the actual data content.

Page 8: UnBBayes-PRM - On Implementing Probabilistic Relational Models

What is E/R?
Constraints
  – Cardinality
    • 1-1, 1-many, many-1, many-many
  – Primary Key (PK):
    • Minimal set of uniquely identifying attributes
  – Foreign Key (FK):
    • Attributes that refer to other attributes (PK)
      – This is used to implement relationships
  – Allowed values
  – Etc.

Page 9: UnBBayes-PRM - On Implementing Probabilistic Relational Models

What is E/R?
E/R can be represented as a set of tables
  – Entities → tables
  – Attributes → columns
  – Values of attributes → content of a cell
  – 1-1 and 1-many (many-1) relationships → FK
  – Many-many relationships → table + FK
Problem
  – Classic E/R models do not handle uncertainty

Note: UnBBayes-PRM sees E/R as a set of tables.

Page 10: UnBBayes-PRM - On Implementing Probabilistic Relational Models

So, what is PRM?
Probabilistic Relational Models
  – Template for a probability distribution over a database (E/R model)
    • Compact graphical probabilistic model
      – Well-defined semantics
    • Natural domain modeling
      – Objects, properties, relations...
    • Attributes can depend on attributes of related entities
    • Generalization over a variety of situations

Page 11: UnBBayes-PRM - On Implementing Probabilistic Relational Models

So, what is PRM?
PRM's learning algorithms
  – Capture relationships in Bayesian learning algorithms
    • There's no need to “flatten” the database
PRMs are composed of:
  – Relational Schema,
  – Relational Skeleton,
  – Probability distribution.

Note: Machine learning is a major concern in PRM.

Page 12: UnBBayes-PRM - On Implementing Probabilistic Relational Models

Schema
Static part
  – Entities + Relationships + Attributes
  – PK, FK, possible (allowed) values...

[Figure: table "Person" with the columns
  ID: PK
  Father: FK to Person
  Mother: FK to Person
  BloodType: any of {A, B, AB, O}
next to the corresponding E/R diagram: entity Person with attribute BloodType and the relationships hasFather and hasMother from Person to Person]

Page 13: UnBBayes-PRM - On Implementing Probabilistic Relational Models

Skeleton
Dynamic part
  – Instantiation of a Schema
  – Actual objects
    • Attributes are filled with some values

ID         Father     Mother  BloodType
Augustine  NULL       NULL    O
George     Augustine  Mary    NULL
Mary       NULL       NULL    A
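The Schema/Skeleton split can be sketched in a few lines of Python. This is only an illustration, not UnBBayes-PRM's actual (Java) data model: the dictionary layout and the `validate` helper are hypothetical, and `None` stands in for NULL. The rows are the three Person objects from the slide.

```python
# Hypothetical in-memory layout; not the classes used by UnBBayes-PRM.
SCHEMA = {
    "Person": {
        "pk": "ID",
        "fks": {"Father": "Person", "Mother": "Person"},  # column -> referenced entity
        "allowed": {"BloodType": {"A", "B", "AB", "O"}},   # column -> allowed values
    }
}

# Skeleton: the objects shown on the slide (None plays the role of NULL).
SKELETON = {
    "Person": [
        {"ID": "Augustine", "Father": None, "Mother": None, "BloodType": "O"},
        {"ID": "George", "Father": "Augustine", "Mother": "Mary", "BloodType": None},
        {"ID": "Mary", "Father": None, "Mother": None, "BloodType": "A"},
    ]
}


def validate(schema, skeleton):
    """Return a list of constraint violations (empty if the skeleton is valid)."""
    errors = []
    keys = {}
    # First pass: primary keys must be present and unique per entity.
    for entity, rows in skeleton.items():
        pk = schema[entity]["pk"]
        seen = set()
        for row in rows:
            if row[pk] is None or row[pk] in seen:
                errors.append(f"{entity}: missing or duplicate PK {row[pk]!r}")
            seen.add(row[pk])
        keys[entity] = seen
    # Second pass: foreign keys must point to existing PKs, values must be allowed.
    for entity, rows in skeleton.items():
        spec = schema[entity]
        for row in rows:
            for column, target in spec["fks"].items():
                value = row[column]
                if value is not None and value not in keys.get(target, set()):
                    errors.append(f"{entity}.{column}: dangling FK {value!r}")
            for column, allowed in spec["allowed"].items():
                value = row[column]
                if value is not None and value not in allowed:
                    errors.append(f"{entity}.{column}: value {value!r} not allowed")
    return errors
```

Calling `validate(SCHEMA, SKELETON)` on the slide's data returns an empty list; a row whose Father names a non-existent Person would be reported as a dangling FK.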

Page 14: UnBBayes-PRM - On Implementing Probabilistic Relational Models

PRM's structure
Schema + probabilistic dependencies
Attributes have path expressions describing the parents of that attribute.
  – Path expression = slot chain
    • List of FKs
  – If the slot chain contains a 1-many relationship, the number of parents is unknown
Conditional Probability Distribution (CPD)
  – Conditional Probability Table (CPT)
  – Functions + parameters

Note: (Slot chain = empty) := no parents | parents reside in the same table
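As a rough illustration of how a slot chain selects parents, the Python sketch below follows a list of FKs starting from one object. The `follow` function and the `~` prefix for traversing an FK backwards (the 1-many direction) are hypothetical notation introduced here, not something taken from UnBBayes-PRM.

```python
# Objects from the Person example; None plays the role of NULL.
PEOPLE = {
    "Augustine": {"Father": None, "Mother": None},
    "Mary": {"Father": None, "Mother": None},
    "George": {"Father": "Augustine", "Mother": "Mary"},
}


def follow(slot_chain, start, table):
    """Return the IDs reached from `start` by following `slot_chain`.

    A slot "X" follows FK X forwards (many-1, at most one target).
    A slot "~X" (hypothetical notation) follows FK X backwards (1-many),
    so the number of objects reached is not known in advance.
    """
    current = [start]
    for slot in slot_chain:
        reached = []
        for obj in current:
            if slot.startswith("~"):
                fk = slot[1:]
                # Every object whose FK points at `obj` is reached.
                reached.extend(sorted(o for o, row in table.items() if row[fk] == obj))
            else:
                target = table[obj][slot]
                if target is not None:
                    reached.append(target)
        current = reached
    return current
```

With this data, `follow(["Father"], "George", PEOPLE)` yields `["Augustine"]`, while `follow(["~Father"], "Augustine", PEOPLE)` yields the list of Augustine's children, whose length depends on the skeleton. An empty chain returns the starting object itself, which matches the case where parents reside in the same table.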

Page 15: UnBBayes-PRM - On Implementing Probabilistic Relational Models

PRM's structure

CPD of BloodType:

Father  A    A    A    ...
Mother  A    B    AB   ...
A       75%  25%  50%  ...
B       0%   25%  25%  ...
AB      0%   25%  25%  ...
O       25%  25%  0%   ...

[Figure: entity Person with a PK, two FKs (FK1 and FK2, for Father and Mother) and the attribute BloodType. BloodType receives one edge from the BloodType of the object referenced by FK1 and one edge from the BloodType of the object referenced by FK2. An instantiation is shown with the objects John Doe, Jane Doe and Me.]
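A CPT like this one is just a lookup keyed by the parents' states. The Python sketch below encodes only the three columns that the slide shows (the remaining columns are elided on the slide and are left out here); the names `STATES`, `CPT` and `p_blood_type` are illustrative.

```python
STATES = ("A", "B", "AB", "O")  # row order of the table on the slide

# Keys are (father, mother); values are P(child = A, B, AB, O).
# Only the three columns visible on the slide are included.
CPT = {
    ("A", "A"): (0.75, 0.00, 0.00, 0.25),
    ("A", "B"): (0.25, 0.25, 0.25, 0.25),
    ("A", "AB"): (0.50, 0.25, 0.25, 0.00),
}


def p_blood_type(child, father, mother):
    """P(BloodType = child | father, mother); raises KeyError for unlisted columns."""
    column = CPT[(father, mother)]
    return column[STATES.index(child)]
```

Each column sums to 1, as a conditional distribution must, and `p_blood_type("O", "A", "A")` reads the 25% cell of the first column.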

Page 16: UnBBayes-PRM - On Implementing Probabilistic Relational Models

CPD with aggregation
How do we declare the CPD if the number of parents is unknown?
Approach 1: special-purpose scripts
  – E.g. UnBBayes-MEBN's CPD scripts
    • A set of IF-THEN-ELSE statements
Approach 2: aggregation
  – E.g. Mode, Max, Min, Average...
    • Equivalent to an intermediate “deterministic” node

Note: UnBBayes-PRM uses approach 2.
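The aggregation approach can be pictured as a deterministic function that collapses any number of parent values into a single value, which then indexes an ordinary CPT. The Python sketch below is an assumption-laden illustration: the state ordering used for min/max and the tie-breaking rule for mode are choices made here, not documented behaviour of UnBBayes-PRM.

```python
from collections import Counter

ORDER = ("A", "B", "AB", "O")  # assumed ordering of states, used by min/max and ties


def aggregate(function, values, order=ORDER):
    """Collapse a list of parent values into one value (None if there are no parents)."""
    if not values:
        return None
    rank = order.index
    if function == "min":
        return min(values, key=rank)
    if function == "max":
        return max(values, key=rank)
    if function == "mode":
        counts = Counter(values)
        best = max(counts.values())
        # Assumption: ties are broken in favour of the earliest state in `order`.
        return min((v for v, c in counts.items() if c == best), key=rank)
    raise ValueError(f"unknown aggregate function: {function}")
```

Because the result is a single state regardless of how many parents there are, the CPD only needs one column per possible aggregated value.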

Page 17: UnBBayes-PRM - On Implementing Probabilistic Relational Models

Inference
Instantiation of a BN from the skeleton
Descriptive attributes become random variables
Once generated, further inference is done as in a normal BN (evidence propagation)
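A minimal Python sketch of the instantiation step, under simplifying assumptions: every dependency uses a slot chain of length one (a single FK), and attribute values already present in the skeleton are treated as evidence. The `ground` function and the `Object.Attribute` node naming are hypothetical.

```python
# Skeleton of the Person example; None plays the role of NULL.
SKELETON = {
    "Augustine": {"Father": None, "Mother": None, "BloodType": "O"},
    "George": {"Father": "Augustine", "Mother": "Mary", "BloodType": None},
    "Mary": {"Father": None, "Mother": None, "BloodType": "A"},
}

# Dependency template: child attribute -> list of (FK, parent attribute).
DEPENDENCIES = {"BloodType": [("Father", "BloodType"), ("Mother", "BloodType")]}


def ground(skeleton, dependencies):
    """Build the nodes, edges and evidence of the BN instantiated from a skeleton."""
    nodes, edges, evidence = [], [], {}
    for obj, row in skeleton.items():
        for attribute, parents in dependencies.items():
            node = f"{obj}.{attribute}"
            nodes.append(node)  # one random variable per object and attribute
            if row[attribute] is not None:
                evidence[node] = row[attribute]  # known value (assumed to act as evidence)
            for fk, parent_attribute in parents:
                if row[fk] is not None:
                    edges.append((f"{row[fk]}.{parent_attribute}", node))
    return nodes, edges, evidence
```

For the three objects above this produces three BloodType nodes, two edges into `George.BloodType`, and evidence on Augustine and Mary. The result can then be handed to any ordinary BN inference algorithm.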

Page 18: UnBBayes-PRM - On Implementing Probabilistic Relational Models

Does the instantiated BN have cycles?
Case 1: check at PRM schema level
  – Schema has no cycle → instances have no cycle
Case 2: schema contains cycles, but the instantiated BN does not

[Figure: at schema level, Person.BloodType depends on Person.BloodType through (Father) and (Mother), which forms a cycle; the instantiated BN over the objects Augustine, George Washington and Mary, each with a BloodType node, has no cycle]
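Both cases reduce to an acyclicity test on a directed graph, applied either to the schema-level dependency graph or to the instantiated BN. The Python sketch below uses Kahn's topological-sort algorithm; the function name and the node labels are illustrative, and every edge endpoint is assumed to appear in `nodes`.

```python
from collections import defaultdict, deque


def is_acyclic(nodes, edges):
    """Return True if the directed graph (nodes, edges) has no cycle (Kahn's algorithm)."""
    indegree = {node: 0 for node in nodes}
    children = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1
    # Start from nodes without parents and peel the graph layer by layer.
    queue = deque(node for node, degree in indegree.items() if degree == 0)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    # Nodes on a cycle never reach indegree 0, so they are never visited.
    return visited == len(nodes)
```

On the Person example, the schema-level graph has a self-loop on `Person.BloodType` and fails the test, while the instantiated graph with edges from the parents' BloodType nodes to the child's BloodType node passes it. This is the situation described as Case 2.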

Page 19: UnBBayes-PRM - On Implementing Probabilistic Relational Models

Extension: link uncertainty
So far we have only mentioned distributions over attributes of the objects in a model
  – Only the values of the attributes were uncertain
Uncertainty over the relational structure of the domain has not been addressed yet
  – Structure uncertainty
    • Values of FKs are uncertain
      – Slot chains are uncertain
Reference uncertainty & existence uncertainty

Note: Link uncertainty is not implemented in UnBBayes-PRM.

Page 20: UnBBayes-PRM - On Implementing Probabilistic Relational Models

Reference uncertainty
Slots' (FK) values become random variables
  – Problem
    • Unknown number of possible values
      – It's difficult to declare a CPD at schema level
  – Solution
    • Create partitions based on “other attributes”
      – Assuming that ordinal attributes have a known number of possible values

Page 21: UnBBayes-PRM - On Implementing Probabilistic Relational Models

Reference uncertainty

[Figure, before: Entity1 (PK, FKToEntity2) refers to Entity2 (PK, BooleanAttrib). The possible values of FKToEntity2 are the PKs of Entity2, so their number is unknown.]

[Figure, after: Entity1 (PK, Selector, FKToEntity2) refers to Entity2 (PK, BooleanAttrib).
  – Selector: possible values: 2 (true/false). Links to a set (partition) of instances of Entity2, based on the current value of BooleanAttrib.
  – FKToEntity2: links to a single instance of Entity2, based on the current value of PK.]

Note: We can now specify parents of FKs and CPD.
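The selector idea can be sketched in Python as follows. The instance names, the selector probabilities and the assumption that an FK is chosen uniformly within the selected partition are all illustrative choices made here; the slides do not state how the choice inside a partition is distributed.

```python
from collections import defaultdict

# Hypothetical instances of Entity2: PK -> BooleanAttrib.
ENTITY2 = {"e1": True, "e2": True, "e3": False}


def partition(instances):
    """Group the PKs of Entity2 by the value of the partition attribute."""
    parts = defaultdict(list)
    for pk, value in instances.items():
        parts[value].append(pk)
    return dict(parts)


def fk_distribution(selector_dist, instances):
    """P(FK = pk) = P(Selector = partition of pk) / size of that partition.

    Assumes a uniform choice inside each partition.
    """
    dist = {}
    for value, members in partition(instances).items():
        for pk in members:
            dist[pk] = selector_dist[value] / len(members)
    return dist
```

The selector has a fixed, known number of states (two here), so its CPD can be declared at schema level even though the number of Entity2 instances is only known once a skeleton is given.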

Page 22: UnBBayes-PRM - On Implementing Probabilistic Relational Models

Reference uncertainty: instantiating BN
Edge types:
  – I: within a single object
  – II: between objects
  – III: from FKs of a slot chain
  – IV: from partition attributes to selectors
  – V: from selectors to FK

Source: extracted from Probabilistic Relational Models (Getoor et al., SRL07)

Page 23: UnBBayes-PRM - On Implementing Probabilistic Relational Models

Existence uncertainty
Creation of a Boolean attribute “Exists” in tables
  – Technically, entities also contain “Exists”
    • But we assume instances (objects) of entities “do exist” if they were instantiated
      – So, this mechanism is mainly for relationships
  – Because “Exists” is not a FK, we can use it as a normal random variable.
    • No major changes to BN instantiation

Note: Objects are related to every possible object, with a probability between 0% and 100%.

Page 24: UnBBayes-PRM - On Implementing Probabilistic Relational Models

UnBBayes-PRM
Open-source Java software
  – GUI & inference machine
Features
  – Edit Schema and Skeleton as tables
  – Edit probabilistic dependencies as CPT
  – Edit constraints (PK, FK and allowed values)
  – Generate BN from Skeleton
  – Save/load projects from file
Developed as a plug-in for UnBBayes:
  – Alpha version (for internal use)

Project page: http://sourceforge.net/projects/unbbayes/

Page 25: UnBBayes-PRM - On Implementing Probabilistic Relational Models

UnBBayes-PRM

Note: A plugin descriptor is the main and minimal content of a plugin.

Page 26: UnBBayes-PRM - On Implementing Probabilistic Relational Models

UnBBayes-PRM

Note: A plugin descriptor is the main and minimal content of a plugin.

Page 27: UnBBayes-PRM - On Implementing Probabilistic Relational Models

UnBBayes-PRM

Page 28: UnBBayes-PRM - On Implementing Probabilistic Relational Models

UnBBayes-PRM - I/O

/* Table and PK declaration */
CREATE TABLE "Person" (
    "id" VARCHAR2(300) not null,
    "Father" VARCHAR2(300),
    "Mother" VARCHAR2(300),
    "BloodType" VARCHAR2(300)
);
ALTER TABLE "Person" ADD CONSTRAINT PK_Person
    PRIMARY KEY ("id");
/* Possible values */
ALTER TABLE "Person" ADD CONSTRAINT CK_BloodType
    CHECK ("BloodType" IN ('A', 'B', 'AB', 'O'));
/* Foreign keys (relationships) */
ALTER TABLE "Person" ADD CONSTRAINT FK_Person_Father
    FOREIGN KEY ("Father") REFERENCES "Person" ("id");
ALTER TABLE "Person" ADD CONSTRAINT FK_Person_Mother
    FOREIGN KEY ("Mother") REFERENCES "Person" ("id");

Note: PRM is currently stored as a SQL script. This is a temporary solution.

Page 29: UnBBayes-PRM - On Implementing Probabilistic Relational Models

UnBBayes-PRM - I/O
Dependencies are stored as in-table comments

COMMENT ON COLUMN Person.BloodType IS 'Person.BloodType()[ FK_Person_Father ] , Person.BloodType()[ FK_Person_Mother ] ; { 0.75 0.0 0.0 0.25 0.25 0.25 0.25 0.25 (...)(...) }';

Basic format:
  – <listOfParents>;{<listOfProbabilities>}
<listOfParents> := comma-separated list
  – <parentClass>.<parentColumn>(<aggregateFunction>){<listOfForeignKeys>}
    • <listOfForeignKeys> represents a slot chain

Note: This is also a temporary solution.
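To make the comment format concrete, here is a small Python sketch that parses a dependency string of this shape. It is not the parser used by UnBBayes-PRM. It follows the example comment, in which the list of foreign keys is written in square brackets, and it assumes that the foreign keys of a slot chain are separated by whitespace or commas. The sample string in the usage note is truncated by hand to the first eight probabilities visible on the slide.

```python
import re

# <parentClass>.<parentColumn>(<aggregateFunction>)[<listOfForeignKeys>]
PARENT = re.compile(r"(\w+)\.(\w+)\((\w*)\)\[([^\]]*)\]")


def parse_dependency(comment):
    """Split '<listOfParents>;{<listOfProbabilities>}' into parents and probabilities."""
    parents_part, _, probabilities_part = comment.partition(";")
    parents = [
        {
            "class": parent_class,
            "column": column,
            "aggregate": aggregate or None,  # empty parentheses mean no aggregation
            # Assumption: FKs in a slot chain are separated by spaces or commas.
            "slot_chain": foreign_keys.replace(",", " ").split(),
        }
        for parent_class, column, aggregate, foreign_keys in PARENT.findall(parents_part)
    ]
    numbers = probabilities_part.strip().strip("{}").split()
    return parents, [float(number) for number in numbers]
```

For instance, parsing `"Person.BloodType()[ FK_Person_Father ] , Person.BloodType()[ FK_Person_Mother ] ; { 0.75 0.0 0.0 0.25 0.25 0.25 0.25 0.25 }"` gives two parents, each with a one-element slot chain and no aggregate function, plus a flat list of eight probabilities.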

Page 30: UnBBayes-PRM - On Implementing Probabilistic Relational Models

UnBBayes-PRM: limitations
No support for link uncertainty
  – But existence uncertainty can be “simulated”
Only 1 attribute as PK
Only String types allowed
  – Thus, no sequences are allowed
No marginalization
  – Cannot delete dependencies
    • We must re-create the attribute or edit the SQL script

Page 31: UnBBayes-PRM - On Implementing Probabilistic Relational Models

UnBBayes-PRM: limitations
2 edges (dependencies) to the same attribute are not allowed
  – Even using different slot chains
3 aggregation functions:
  – mode, min, max.
No machine learning
No direct access to an actual database (yet)
  – Only by means of a SQL script.

Page 32: UnBBayes-PRM - On Implementing Probabilistic Relational Models

UnBBayes-PRM: (possible) future work
Add extension points for plug-ins
Integration with DBMS
  – Constraints/rules can be delegated to the DBMS
    • Some of the limitations may be automatically fixed
Implement machine learning and link uncertainty
Edit E/R models as diagrams
PRM → MSBN compilation

Note: DBMS = DataBase Management System

Page 33: UnBBayes-PRM - On Implementing Probabilistic Relational Models

UnBBayes-PRM: (possible) future work
Implement Dynamic PRM
  – Dynamic BN + E/R
Integration with PROXIMITY¹
  – RDN - Relational Dependency Network
    • Generalization of BN + E/R + Relational Markov Network

¹ A Java open-source tool from the University of Massachusetts Amherst

Page 34: UnBBayes-PRM - On Implementing Probabilistic Relational Models

Finally
PRM looks practical
  – Uncertainty on relational data
    • Immediate applicability in databases
      – Advanced DBMS can add advanced features
Machine learning seems to be PRM's major concern
  – It was not addressed by this presentation

Page 35: UnBBayes-PRM - On Implementing Probabilistic Relational Models

Finally
PRM cannot specify advanced rules and constraints on conditional probabilities
  – Some conditions must be fulfilled “manually”
  – Some may be fulfilled by DBMS features
UnBBayes-PRM provides an editor and inference engine for basic PRM

Page 36: UnBBayes-PRM - On Implementing Probabilistic Relational Models

Questions?

Project page: http://sourceforge.net/projects/unbbayes/