
Page 1

Knowledge-Based Discovery: Using Semantics in Machine Learning

Bruce Buchanan, Joe Phillips

University of Pittsburgh

buchanan@cs.pitt.edu, josephp@cs.pitt.edu

Page 2

Intelligent Systems Laboratory

• Faculty: Bruce Buchanan (P.I.), John Aronis
• Collaborators: John Rosenberg (Biol. Sci.), Greg Cooper (Medicine), Bob Ferrell (Genetics), Janyce Wiebe (CS), Lou Penrod (Rehab. Med.), Rich Simpson (Rehab. Sci.), Russ Altman (Stanford MIS)

• Research Associates: Joe Phillips, Paul Hodor, Vanathi Gopalakrishnan, Wendy Chapman

• Ph.D. Students: Gary Livingston, Dan Hennessy, Venkat Kolluri, Will Bridewell, Lili Ma

• M.S. Students: Karl Gossett

Page 3

GOALS:
(A) Learn understandable & interesting rules from data
(B) Construct an understandable & coherent model from rules

METHOD: Use background knowledge to search for:
• simple rules with familiar predicates
• interesting and novel rules
• coherent models

Page 4

Rules or Models: Understandable | Interesting

• Familiar syntax (conditional rules)
• Syntactically simple
• Semantically simple
• Familiar predicates
• Accurate predictions
• Meaningful rules
• Relevant to the question
• Novel
• Cost-effective
• Coherent model

Page 5

The RL Program

[Architecture diagram: Explicit Bias, Training Examples, and a Partial Domain Model feed RL, which produces RULES; a Performance Program applies the RULES to New Cases to make Predictions; HAMB assembles the RULES into a MODEL.]

Page 6

(A) Individual Rules

• J. Phillips

• Rehabilitation Medicine Data

Page 7

Simple single rules

• Syntactic simplicity: fewer terms on the LHS
• Explicitly stated constraints (e.g., rules with no more than N terms)
• Tagged attributes (e.g., a rule must use at least one control attribute to be interesting)
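As an illustration of how such bias constraints might be checked, here is a minimal Python sketch; the rule representation, the attribute tags, and the names max_terms and required_tag are assumptions for this example, not part of RL itself.

```python
# Illustrative sketch only (not RL's implementation): a conjunctive rule's LHS is a list
# of (attribute, operator, value) terms; attribute tags come from background knowledge.

def acceptable(rule_lhs, attribute_tags, max_terms=3, required_tag="control"):
    """Check the stated bias: at most max_terms LHS terms and at least one control attribute."""
    if len(rule_lhs) > max_terms:                        # syntactic simplicity
        return False
    tags = {attribute_tags.get(attr) for attr, _op, _val in rule_lhs}
    return required_tag in tags                          # tagged-attribute constraint

# Hypothetical rehabilitation-domain attributes and rule:
tags = {"age": "given", "therapy_hours": "control", "improvement": "observed"}
rule = [("age", ">", 70), ("therapy_hours", "<", 10)]
print(acceptable(rule, tags))   # True: two terms, and one of them is a control attribute
```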

Page 8

Simple sets of rules

• Syntactic simplicity: fewer rules
  – Prefer independent rules
    • E.g., in physics: U(x) = U_gravity(x) + U_electronic(x) + U_magnetic(x)
  – HAMB removes highly similar terms from the feature set
  – Less independence when there is feedback (e.g., in medicine)
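The slides do not say how HAMB measures "highly similar"; the sketch below uses absolute Pearson correlation with an assumed 0.95 cutoff as one plausible stand-in.

```python
# Sketch: drop one member of every pair of highly similar features. Correlation as the
# similarity measure and the 0.95 threshold are assumptions, not HAMB's actual test.
import numpy as np

def remove_similar_features(X, names, threshold=0.95):
    corr = np.abs(np.corrcoef(X, rowvar=False))    # |Pearson r| between feature columns
    keep = []
    for j in range(X.shape[1]):
        if all(corr[j, k] < threshold for k in keep):
            keep.append(j)                         # keep only features unlike those already kept
    return X[:, keep], [names[j] for j in keep]

X = np.array([[1.0, 2.1, 5.0],
              [2.0, 4.0, 1.0],
              [3.0, 6.2, 4.0]])
_, kept = remove_similar_features(X, ["a", "a_scaled", "b"])
print(kept)   # ['a', 'b'] -- 'a_scaled' is nearly a copy of 'a'
```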

Page 9

Interestingness:

• Given, controlled, and observed attributes
  – Explicitly state observed attributes as the interesting targets
• Temporal
  – Future (or distant-past) predictions are interesting
• Influence diagram (e.g., Bayes net)
  – Strong but more indirect influences are interesting

Page 10

Using typed attribute background knowledge

• Organize terms into "given", "controlled", and "observed"
  – E.g., in a medical domain: "demographics", "intervention", and "outcome"
• Benefits:
  – Categorization of rules by whether they use givens (default), controls (controllable), or both (conditionally controllable)
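A minimal sketch of that categorization; the rule and tag representations, and the hypothetical medical attributes, are assumptions made for illustration.

```python
# Sketch of the slide's categorization of rules by the types of their LHS attributes.

def categorize(rule_lhs_attrs, attr_type):
    types = {attr_type[a] for a in rule_lhs_attrs}
    if types == {"given"}:
        return "default"                      # uses givens only
    if types == {"controlled"}:
        return "controllable"                 # uses controls only
    if types == {"given", "controlled"}:
        return "conditionally controllable"   # uses both
    return "other"

# Hypothetical medical-domain typing: demographics are given, interventions controlled.
attr_type = {"age": "given", "race": "given",
             "therapy_rate": "controlled", "therapy_time": "controlled"}
print(categorize(["age"], attr_type))                    # default
print(categorize(["therapy_rate"], attr_type))           # controllable
print(categorize(["age", "therapy_time"], attr_type))    # conditionally controllable
```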

Page 11

Typed attribute example

• Rehab. Medicine (RL; Phillips, Buchanan, Penrod)
• > 2000 records
• Attribute typing:
  – given: demographic (age, race, sex); medical (admit general_condition, specific_condition)
  – controlled: temporal (time, rate)
  – observed: medical (absolute, normalized)

Page 12

Example interestingness:

• Group rules by whether they predict from medical attributes, demographic attributes, or both:
  – By medical:
    • Left_Body_Stroke => poor improvement (interesting, expected)
  – By demographic:
    • High_age => poor improvement (interesting, expected)
    • (Race=X) => poor improvement (interesting, NOT expected)

Page 13

Using temporal background knowledge

• Organize data by time
  – Utility may or may not extend to other metric spaces (e.g., space, mass)
• Benefits:
  – Predictions parameterized by time: f(t)
    • Future or distant past may be interesting
  – Cyclical patterns

Page 14

Temporal example

• Geophysics (Scienceomatic; Phillips 2000)
  – Subduction-zone discoveries of the type:
    d(q_after) = d(q_main) + m*[t(q_after) - t(q_main)] + b
  – NOTE: This is not an accurate prediction!
  – Interesting, since quakes generally cannot be predicted

[Plot with axes X and d]
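A tiny numeric illustration of the discovered linear form; the values of m, b, and the quake depths and times are invented for the example, and, as the slide notes, the relation is interesting rather than an accurate predictor.

```python
# Illustration of the discovered form d(q_after) = d(q_main) + m*[t(q_after) - t(q_main)] + b.
# All numbers below are made up for the example.

def predicted_depth(d_main, t_main, t_after, m, b):
    return d_main + m * (t_after - t_main) + b

# Hypothetical main shock at 40 km depth, aftershock 10 days later, assumed m and b:
print(predicted_depth(d_main=40.0, t_main=0.0, t_after=10.0, m=0.8, b=2.0))   # 50.0
```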

Page 15

Using influence diagram background knowledge

• This is future work!
• Organize terms to follow a pre-existing influence diagram
  – E.g., Bayesian nets, but conditional probabilities are not needed
• Benefits:
  – Suggest hidden variables and new influences: f(x) => f'(x, y)
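Since this is future work, the sketch below is only one way the "indirect influences are interesting" idea might be operationalized: measure how many edges separate a rule's antecedent from the target in a hand-supplied influence diagram. The diagram and threshold are invented.

```python
# Hypothetical sketch (the talk leaves this as future work): an antecedent whose shortest
# directed path to the target is longer than one edge influences it only indirectly,
# which the interestingness criterion would reward.
from collections import deque

def path_length(graph, src, dst):
    """Shortest directed path length from src to dst, or None if unreachable."""
    seen, frontier = {src: 0}, deque([src])
    while frontier:
        node = frontier.popleft()
        if node == dst:
            return seen[node]
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen[nxt] = seen[node] + 1
                frontier.append(nxt)
    return None

diagram = {"smoking": ["inflammation"], "inflammation": ["poor_improvement"]}
dist = path_length(diagram, "smoking", "poor_improvement")
print(dist is not None and dist > 1)   # True: indirect influence, hence interesting
```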

Page 16

Interestingness summary

• How different types of background knowledge help us achieve interestingness:
  – Explicitly stated: "observed" attributes
  – Implicitly stated: parameterized equations with "interesting" parameters
  – Learned: "new" influence factors

Page 17

(B) Coherent Models

• B. Buchanan

• Protein Data

Page 18

EXAMPLE: Predicting Ca++ Binding Sites (G. Livingston)

Given: 3-D descriptions of 16 sites in proteins that bind calcium ions and 100 other sites that do not.

Find: a model that allows predicting whether a proposed new site will bind Ca++ [in terms of a subset of 63 attributes].

Page 19

Ca++ binding sites in proteins: SOME ATTRIBUTES

ATOM-NAME-IS-C, ATOM-NAME-IS-O, CHARGE, CHARGE-WITH-HIS, HYDROPHOBICITY, MOBILITY, RESIDUE-CLASS1-IS-CHARGED, RESIDUE-CLASS1-IS-HYDROPHOBIC, RESIDUE-CLASS2-IS-ACIDIC, RESIDUE-CLASS2-IS-NONPOLAR, RESIDUE-CLASS2-IS-UNKNOWN, RESIDUE-NAME-IS-ASP, RESIDUE-NAME-IS-GLU, RESIDUE-NAME-IS-HOH, RESIDUE-NAME-IS-LEU, RESIDUE-NAME-IS-VAL, RING-SYSTEM, SECONDARY-STRUCTURE1-IS-4-HELIX, SECONDARY-STRUCTURE1-IS-BEND, SECONDARY-STRUCTURE1-IS-HET, SECONDARY-STRUCTURE1-IS-TURN, SECONDARY-STRUCTURE2-IS-BETA, SECONDARY-STRUCTURE2-IS-HET, VDW-VOLUME

Page 20

Predicting Ca++ Binding Sites

Semantic types of attributes, e.g.:
• Physical: solvent accessibility, charge, VDW volume
• Chemical: heteroatom, oxygen, carbonyl, ASN
• Structural: helix, beta-turn, ring-system, mobility

Page 21

Coherent Model = subset of locally acceptable rules that

• explains as much of the data as possible
• uses entrenched predicates [Goodman]
• uses predicates of the same semantic type
• uses predicates of the same grain size
• uses classes AND their complements
• avoids rules that are "too similar": identical, subsuming, or semantically close
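A partial sketch of the coherence filter, covering only the identical and subsuming cases from the list above; the rule representation is an assumption, and the semantic-type, grain-size, and complement checks are omitted.

```python
# Partial sketch of building a coherent subset: drop any rule that is identical to, or
# subsumed by (same prediction from a subset of its terms), a rule already kept.
# Rules are modeled as (frozenset of LHS terms, predicted class) -- an assumption.

def subsumes(general, specific):
    return general[1] == specific[1] and general[0] <= specific[0]

def coherent_subset(rules):
    kept = []
    for rule in sorted(rules, key=lambda r: len(r[0])):    # consider most general first
        if not any(subsumes(k, rule) for k in kept):       # also discards duplicates
            kept.append(rule)
    return kept

r1 = (frozenset({("oxygens", ">", 6.5)}), "SITE")
r2 = (frozenset({("oxygens", ">", 6.5), ("charge", ">", 18.5)}), "SITE")   # subsumed by r1
print(coherent_subset([r2, r1]))   # only r1 survives
```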

Page 22

EXAMPLE: predict Ca++ binding sites in proteins

158 rules found independently. E.g.:

R1: IF a site has (a) charge > 18.5 AND (b) no. of C=O > 18.75 THEN it binds calcium

R2: IF a site has (a) charge > 18.5 AND (b) no. of ASN > 15 THEN it binds calcium

Page 23

Predicting Ca++ Binding Sites

semantic network of attributes

[Semantic-network diagram: Heteroatoms → Sulfur, Oxygen, ..., Nitrogen; functional groups such as "Hydroxyl", Carbonyl, Amide, Amine (SH, OH); residues such as ASP, GLU, ASN, GLN, ..., PRO and CYS, SER, THR, TYR]

Page 24

Ca++ binding sites in proteins: 58 rules above threshold

threshold = at least 80% TP AND no more than 20% FP
42 rules predict SITE
16 rules predict NON-SITE

Average accuracy over five 5-fold cross-validations = 100% for the redundant model with 58 rules
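A direct sketch of the stated acceptance threshold; the per-rule confusion counts are assumed inputs, and the example numbers are hypothetical.

```python
# Sketch of the rule-acceptance threshold: at least 80% true-positive rate and no more
# than 20% false-positive rate. The confusion counts for a rule are assumed to be known.

def above_threshold(tp, fn, fp, tn, min_tpr=0.80, max_fpr=0.20):
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr >= min_tpr and fpr <= max_fpr

# Hypothetical rule covering 14 of the 16 sites and 5 of the 100 non-sites:
print(above_threshold(tp=14, fn=2, fp=5, tn=95))   # True (TPR 0.875, FPR 0.05)
```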

Page 25

Predicting Ca++ Binding Sites: Prefer complementary rules -- e.g.,

R59: IF, within 5 Å of a site, # oxygens > 6.5 THEN it binds calcium

R101: IF, within 5 Å of a site, # oxygens <= 6.5 THEN it does NOT bind calcium
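A sketch of detecting an R59/R101-style complementary pair: two single-attribute rules that split the same threshold in opposite directions and predict opposite classes. The tuple representation is an assumption.

```python
# Sketch: complementary rule pairs in the R59/R101 sense. Representation assumed:
# (attribute, operator, threshold, prediction).

def complementary(r1, r2):
    a1, op1, t1, c1 = r1
    a2, op2, t2, c2 = r2
    return a1 == a2 and t1 == t2 and c1 != c2 and {op1, op2} == {">", "<="}

r59  = ("oxygens_within_5A", ">",  6.5, "SITE")
r101 = ("oxygens_within_5A", "<=", 6.5, "NON-SITE")
print(complementary(r59, r101))   # True
```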

Page 26

5 Å Radius Model: Five perfect rules*

R1. #Oxygen LE 6.5 --> NON-SITE
R2. Hydrophobicity GT -8.429 --> NON-SITE
R3. #Oxygen GT 6.5 --> SITE
R4. Hydrophobicity LE -8.429 --> SITE
R5. #Carbonyl GT 4.5 & #Peptide LE 10.5 --> SITE

*(100% of TPs and 0 FPs)
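The five rules above are concrete enough to transcribe directly; the sketch below does so, but the attribute names of the example site and the majority-vote conflict resolution are assumptions (the slides do not say how conflicting rule firings are combined).

```python
# The five 5 Å radius rules, transcribed as predicates over a site's attribute dict.
# Conflict resolution by majority vote is an assumption made for this illustration.

RULES = [
    (lambda s: s["oxygen"] <= 6.5,                           "NON-SITE"),  # R1
    (lambda s: s["hydrophobicity"] > -8.429,                 "NON-SITE"),  # R2
    (lambda s: s["oxygen"] > 6.5,                            "SITE"),      # R3
    (lambda s: s["hydrophobicity"] <= -8.429,                "SITE"),      # R4
    (lambda s: s["carbonyl"] > 4.5 and s["peptide"] <= 10.5, "SITE"),      # R5
]

def classify(site):
    votes = [label for condition, label in RULES if condition(site)]
    return max(set(votes), key=votes.count)   # assumed majority vote

site = {"oxygen": 8, "hydrophobicity": -9.1, "carbonyl": 5, "peptide": 9}
print(classify(site))   # SITE (R3, R4, and R5 fire)
```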

Page 27

Final Result: Ca++ binding sites in proteins

Model with 5 rules:
• same accuracy
• no unique predicates
• no subsumed or very similar rules
• more general rules for SITES (prior prob. < 0.01)
• more specific rules for NON-SITES (prior prob. > 0.99)

Page 28

Predicting Ca++ Binding Sites: Attribute Hierarchies

RESIDUE CLASS 1
• POLAR (ASN, CYS, GLN, HIS, SER, THR, TYR, TRP, GLY)
• CHARGED (ARG, ASP, GLU, LYS)
• HYDROPHOBIC (ALA, ILE, LEU, MET, PHE, PRO, VAL)

Page 29

Attribute Hierarchies: RESIDUE CLASS 2
• POLAR (ASN, CYS, GLN, HIS, SER, THR, TYR, TRP, GLY)
• CHARGED
  – ACIDIC (ARG, ASP, GLU)
  – BASIC (LYS)
• NONPOLAR (ALA, ILE, LEU, MET, PHE, PRO, VAL)

[Diagram also singles out TRP and HIS]
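One way such a hierarchy can serve as background knowledge is as a lookup that generalizes residue-level predicates to class-level ones; the sketch below encodes RESIDUE CLASS 2 exactly as listed on the slide (memberships are not altered), while the function and variable names are assumptions.

```python
# Sketch: RESIDUE CLASS 2 as a lookup, so a residue-level predicate (e.g.,
# RESIDUE-NAME-IS-ASP) can be generalized to a class-level one (e.g.,
# RESIDUE-CLASS2-IS-ACIDIC). Memberships are copied from the slide.

RESIDUE_CLASS_2 = {
    "POLAR":    ["ASN", "CYS", "GLN", "HIS", "SER", "THR", "TYR", "TRP", "GLY"],
    "ACIDIC":   ["ARG", "ASP", "GLU"],
    "BASIC":    ["LYS"],
    "NONPOLAR": ["ALA", "ILE", "LEU", "MET", "PHE", "PRO", "VAL"],
}

def class_of(residue, hierarchy=RESIDUE_CLASS_2):
    return next((cls for cls, members in hierarchy.items() if residue in members), None)

print(class_of("ASP"))   # ACIDIC
print(class_of("LEU"))   # NONPOLAR
```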

Page 30

CONCLUSION

Induction systems can be augmented with semantic criteria to provide:

(A) interesting & understandable rules
  • syntactically simple
  • meaningful

(B) coherent models
  • equally predictive
  • closer to a theory

Page 31

CONCLUSION

• We have shown:
  – how specific types of background knowledge might be incorporated into the rule discovery process
  – possible benefits of incorporating those types of knowledge:
    • more coherent models
    • more understandable models
    • more accurate models