Today
• Logistic Regression
  – Maximum Entropy Formulation
• Decision Trees Redux
  – Now using Information Theory
• Graphical Models
  – Representing conditional dependence graphically
Optimization
• We know the gradient of the objective function, but how do we find its maximum?
• Setting the gradient to zero and solving analytically is nontrivial.
• Instead, approximate numerically by following the gradient uphill, as in the sketch below.
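When no closed form is available, a minimal numerical option is plain gradient ascent. The sketch below is illustrative only; the toy objective, step size, and stopping tolerance are assumptions, not anything fixed by the slides.

```python
import numpy as np

def gradient_ascent(grad, w0, lr=0.1, tol=1e-6, max_iter=1000):
    """Climb the gradient in small steps until the update is negligible."""
    w = np.asarray(w0, dtype=float)
    for _ in range(max_iter):
        step = lr * grad(w)
        w = w + step
        if np.linalg.norm(step) < tol:
            break
    return w

# Toy example: maximize f(w) = -(w - 3)^2, whose gradient is -2(w - 3).
print(gradient_ascent(lambda w: -2 * (w - 3), w0=[0.0]))  # ~[3.]
```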
Entropy
• A measure of uncertainty, or equivalently of “information”: H(X) = −Σx p(x) log2 p(x).
• High uncertainty means high entropy.
• Rare events are more “informative” than common events: an outcome with probability p carries −log2 p bits of surprise.
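A small sketch of these ideas in code; the example distributions are made up.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits: H(X) = -sum_x p(x) * log2 p(x)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                  # treat 0 * log 0 as 0
    return -np.sum(p * np.log2(p))

print(entropy([0.5, 0.5]))   # 1.0 bit: maximal uncertainty over two outcomes
print(entropy([0.9, 0.1]))   # ~0.47 bits: less uncertainty, lower entropy
print(-np.log2(0.01))        # ~6.64 bits: a rare event is highly "informative"
```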
Maximum Entropy
• Logistic Regression is also known as Maximum Entropy (MaxEnt).
• Entropy is a concave function, so maximizing it is a convex optimization problem and we can expect convergence.
• Constrain this optimization to enforce good classification.
• Maximize the likelihood of the data while keeping the distribution of weights as even as possible.
  – Include as many useful features as possible.
Optimization formulation
• Let the weights represent the likelihood of each value of each feature.
• Then choose the most even distribution that still fits the data:
  maximize H(p) subject to Ep[fi] = Ẽ[fi] for each feature i
  (the model’s expected value of each feature must match its empirical average).
Solving MaxEnt formulation
• This is a convex optimization problem: a concave objective (entropy) with linear constraints.
• Solve it with Lagrange multipliers, one per feature constraint.
• The resulting dual representation, with one dual variable per feature i, is exactly the maximum likelihood estimation of Logistic Regression.
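A minimal sketch of that equivalence, assuming a standard binary logistic regression setup (the toy data, learning rate, and iteration count are illustrative assumptions): the gradient of the log-likelihood is the gap between the empirical feature expectations and the model’s feature expectations, which are exactly the MaxEnt constraints.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, iters=5000):
    """Maximum likelihood for logistic regression by gradient ascent."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        p = sigmoid(X @ w)                    # model's P(y=1 | x)
        # Empirical feature expectation minus the model's expectation:
        # when these match, the MaxEnt constraints are satisfied.
        w += lr * X.T @ (y - p) / len(y)
    return w

# Toy data (illustrative): a bias column plus one feature.
X = np.array([[1.0, 0.2], [1.0, 0.9], [1.0, 1.5], [1.0, 2.3]])
y = np.array([0.0, 0.0, 1.0, 1.0])
print(fit_logistic(X, y))  # weights separating the two classes
```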
Decision Trees
• Nested ‘if’-statements for classification (see the sketch below).
• Each Decision Tree node contains a feature and a split point.
• Challenges:
  – Determine which feature and split point to use.
  – Determine which branches are worth including at all (pruning).
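A minimal sketch of the nested-if view; the features and split points are invented for illustration, loosely echoing the example tree that follows.

```python
def classify(color, height, weight):
    """Each nested test is one tree node: a feature plus a split point.
    All feature names and thresholds here are hypothetical."""
    if color == "blue":
        if height < 66:
            return "f"
        return "m"
    else:
        if weight < 140:
            return "f"
        return "m"

print(classify("blue", 70, 180))  # "m"
```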
Decision Trees
[Figure: an example decision tree. The root splits on color (blue / brown / green); lower nodes split on height (h) and weight (w) thresholds such as < 66, < 140, and < 150; the leaves predict male (m) or female (f).]
Ranking Branches
• Last time, we used classification accuracy to measure the value of a branch.
[Figure: a split of 6M / 6F on height < 68, giving branches of 1M / 5F and 5M / 1F.]
• 50% accuracy before the branch, 83.3% accuracy after it: a 33.3% improvement.
Ranking Branches
• Alternatively, measure the decrease in entropy of the class distribution following the split.
[Figure: the same height < 68 split of 6M / 6F into 1M / 5F and 5M / 1F.]
• H = 1 bit before the branch (the 6M / 6F distribution is uniform); H ≈ 0.65 bits after it, an information gain of ≈ 0.35 bits, as computed below.
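A quick check of those numbers, using the class counts from the figure.

```python
import numpy as np

def entropy(counts):
    """Shannon entropy (bits) of a class distribution given as counts."""
    p = np.asarray(counts, dtype=float)
    p = p / p.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

h_before = entropy([6, 6])                               # 1.0 bit
# Each branch holds 6 of the 12 examples, so both are weighted 1/2.
h_after = 0.5 * entropy([1, 5]) + 0.5 * entropy([5, 1])  # ~0.65 bits
print(h_before - h_after)  # information gain: ~0.35 bits
```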
InfoGain Criterion
• Calculate the decrease in entropy across a split point.
• This represents the amount of information gained by the split.
• Unlike accuracy, this criterion behaves consistently wherever the split sits in the decision tree.
  – It is more applicable to N-way classification.
  – Accuracy only reflects the mode of the class distribution.
  – Entropy can be reduced while leaving the mode unaffected, so a split can be informative even when accuracy does not improve.
Graphical Models and Conditional Independence
• Graphical models concern probabilities generally, but are used in classification and clustering.
• Both Linear Regression and Logistic Regression use probabilistic models.
• Graphical models allow us to structure and visualize probabilistic models and the relationships between their variables.
(Joint) Probability Tables
• Represent a multinomial joint probability over K variables as a K-dimensional table.
• Assuming D binary variables, how big is this table? (2^D entries.)
• What if we had multinomials with M entries each? (M^D entries.)
Probability Models
• What if the variables are independent?
• If x and y are independent: p(x, y) = p(x) p(y).
• The original distribution can then be factored into one small table per variable.
• How big is this table if each variable is binary? (Two entries per variable, so 2D in total rather than 2^D.)
Conditional Independence
• Independence assumptions are convenient (Naïve Bayes), but rarely true.
• More often, some groups of variables are dependent while others are independent.
• Still others are conditionally independent.
Conditional Independence
• Two variables x and z are conditionally independent given y if p(x, z | y) = p(x | y) p(z | y).
• E.g. y = flu?, x = achiness?, z = headache?: achiness and headache are related, but conditionally independent given whether you have the flu.
Structure of Graphical Models
• Graphical models allow us to represent dependence relationships between variables visually.
  – Graphical models are directed acyclic graphs (DAGs).
  – Nodes: random variables.
  – Edges: dependence relationships.
  – No edge: independent variables.
  – The direction of an edge indicates a parent-child relationship.
  – Parent: source, the “trigger”.
  – Child: destination, the “response”.
Example Graphical Models
• The parents of a node i are denoted πi.
• Factorization of the joint in a graphical model: p(x1, …, xN) = ∏i p(xi | xπi), i.e. each variable is conditioned only on its parents.
[Figure: two two-node graphs: an independent pair x, y and a dependent pair x → y.]
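A minimal sketch of that factorization for the two-node graph x → y; the table values are hypothetical.

```python
# p(x, y) = p(x) * p(y | x): one small table per node, conditioned on parents.
p_x = {0: 0.7, 1: 0.3}                 # x has no parents
p_y_given_x = {0: {0: 0.9, 1: 0.1},    # p(y | x = 0)
               1: {0: 0.4, 1: 0.6}}    # p(y | x = 1)

def joint(x, y):
    return p_x[x] * p_y_given_x[x][y]

# The factored joint is still a valid distribution:
print(sum(joint(x, y) for x in (0, 1) for y in (0, 1)))  # 1.0
print(joint(1, 1))                                       # 0.3 * 0.6 = 0.18
```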
Basic Graphical Models
• Independent variables.
• Observations.
• When we observe a variable (fix its value from data), we color its node grey.
• Observing a variable allows us to condition on it, e.g. p(x, z | y).
• Given an observation, we can generate pdfs for the other variables.
[Figure: three nodes x, y, z, shown unobserved and again with the observed variable shaded.]
Example Graphical Models
• Markov Chain: x → y → z, which factors as p(x, y, z) = p(x) p(y | x) p(z | y).
• Are x and z conditionally independent given y? Yes: p(z | x, y) = p(z | y) follows directly from the factorization.
[Figure: the chain x → y → z.]
Model Parameters as Nodes
• Treating model parameters as random variables, we can include them in a graphical model.
• Multivariate Bernoulli: each variable has its own parameter.
38
µ0µ0
x0x0
µ1µ1
x1x1
µ2µ2
x2x2
Model Parameters as Nodes
• Treating model parameters as random variables, we can include them in a graphical model.
• Multinomial: a single parameter µ is shared by all the variables.
[Figure: one parameter node µ with edges to x0, x1, and x2.]
Naïve Bayes Classification
• The observed variables xi are independent given the class variable y.
• The distribution can be optimized using maximum likelihood on each variable separately (see the sketch below).
• This makes it easy to combine various types of distributions.
[Figure: the Naïve Bayes graph: class y with edges to x0, x1, and x2.]
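A minimal sketch of Naïve Bayes with binary features, assuming Bernoulli likelihoods; the toy data are made up, and the Laplace smoothing is an addition beyond the slide to avoid zero probabilities. The fit really is maximum likelihood on each variable separately: just counting.

```python
import numpy as np

def fit_naive_bayes(X, y):
    """Per-class prior plus per-feature Bernoulli parameters (counting)."""
    classes = np.unique(y)
    prior = {c: np.mean(y == c) for c in classes}
    # p(x_i = 1 | y = c), with Laplace smoothing (an assumption here)
    cond = {c: (X[y == c].sum(axis=0) + 1) / ((y == c).sum() + 2)
            for c in classes}
    return prior, cond

def predict(x, prior, cond):
    def log_post(c):
        p = cond[c]
        return np.log(prior[c]) + np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))
    return max(prior, key=log_post)

# Toy binary data (hypothetical): rows are feature vectors (x0, x1, x2).
X = np.array([[1, 1, 0], [1, 0, 0], [0, 1, 1], [0, 0, 1]])
y = np.array([0, 0, 1, 1])
prior, cond = fit_naive_bayes(X, y)
print(predict(np.array([1, 1, 0]), prior, cond))  # 0
```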
Graphical Models
• Graphical representation of dependency relationships.
• Directed acyclic graphs.
• Nodes as random variables.
• Edges define dependency relations.
• What can we do with Graphical Models?
  – Learn parameters to fit data.
  – Understand independence relationships between variables.
  – Perform inference (marginals and conditionals).
  – Compute likelihoods for classification.
Plate Notation
• To indicate a repeated variable, draw a plate around it.
[Figure: the graph y → x0, x1, …, xn, and its plate-notation equivalent: y → xi with xi inside a plate labeled n.]
Completely observed Graphical Model
• We have observations for every node.
• The simplest (least general) graph assumes the variables are all independent.
Completely observed Graphical Model
• We have observations for every node.
• The second simplest graph assumes complete dependence: every variable depends on the others.
Maximum Likelihood
• Each node has a conditional probability table, θ.
• Given the tables, we can construct the pdf.
• Use maximum likelihood to find the best settings of θ; with fully observed data this reduces to normalized counts, as in the sketch below.
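A minimal sketch for a single edge x → y with hypothetical fully observed samples; each CPT entry θ is estimated as a normalized count.

```python
from collections import Counter

# Fully observed samples of (x, y) for the edge x -> y (hypothetical data).
data = [(0, 0), (0, 0), (0, 1), (1, 1), (1, 1), (1, 0)]

# ML estimate of the CPT: theta_{y|x} = count(x, y) / count(x)
pair_counts = Counter(data)
x_counts = Counter(x for x, _ in data)
cpt = {(x, y): pair_counts[(x, y)] / x_counts[x] for x, y in pair_counts}

print(cpt[(0, 0)])  # 2/3 = p(y=0 | x=0)
print(cpt[(1, 1)])  # 2/3 = p(y=1 | x=1)
```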
Conditional Dependence Test
• We can check conditional independence in a graphical model:
  – “Is achiness (x3) independent of the flu (x0) given fever (x1)?”
  – “Is achiness (x3) independent of sinus infections (x2) given fever (x1)?”
D-Separation and Bayes Ball
• Intuition: nodes are separated, or “blocked”, by sets of nodes.
  – E.g. nodes x1 and x2 “block” the path from x0 to x5, so x0 is conditionally independent of x5 given x1 and x2.
Bayes Ball Algorithm
• Shade the observed nodes xc.
• Place a “ball” at each node in xa.
• Bounce the balls around the graph according to a set of local rules.
• If no ball reaches xb, then xa and xb are conditionally independent given xc. (An equivalent reachability-based test is sketched below.)
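Rather than transcribing the bouncing rules, the sketch below uses the classical equivalent test for d-separation: restrict the graph to the ancestors of xa, xb, and xc, moralize it, delete the observed nodes, and check reachability. The parents-dictionary encoding is an assumption for illustration.

```python
from collections import deque

def d_separated(parents, xa, xb, xc):
    """True if xa and xb are d-separated given xc in the DAG `parents`."""
    # 1. Ancestral set: the query nodes plus all of their ancestors.
    relevant = set(xa) | set(xb) | set(xc)
    stack = list(relevant)
    while stack:
        for p in parents.get(stack.pop(), []):
            if p not in relevant:
                relevant.add(p)
                stack.append(p)
    # 2. Moralize: undirected parent-child edges, plus "marry" co-parents.
    adj = {v: set() for v in relevant}
    for v in relevant:
        ps = [p for p in parents.get(v, []) if p in relevant]
        for p in ps:
            adj[v].add(p); adj[p].add(v)
        for i in range(len(ps)):
            for j in range(i + 1, len(ps)):
                adj[ps[i]].add(ps[j]); adj[ps[j]].add(ps[i])
    # 3. BFS from xa, never entering an observed node in xc.
    seen, queue = set(xa), deque(xa)
    while queue:
        for nxt in adj[queue.popleft()] - seen:
            if nxt not in xc:
                seen.add(nxt)
                queue.append(nxt)
    return not (seen & set(xb))

# Chain x -> y -> z: x and z are d-separated given y.
print(d_separated({"y": ["x"], "z": ["y"]}, {"x"}, {"z"}, {"y"}))  # True
```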
Undirected Graphs
• What if we allow undirected graphs? What do they correspond to?
• Not cause/effect or trigger/response, but general dependence.
• Example: image pixels, where each pixel is a Bernoulli variable.
  – P(x11, …, x1M, …, xM1, …, xMM)
  – Bright pixels have bright neighbors.
• There are no parents, just probabilities.
• Grid models like this are called Markov Random Fields.
Undirected Graphs
• Undirected separability is easy.
• To check the conditional independence of A and B given C, check whether A can reach B in the graph without passing through any node in C (see the sketch below).
[Figure: an undirected graph over nodes A, B, C, and D.]
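A minimal sketch of that reachability check; the edges of the example graph are hypothetical, standing in for the figure.

```python
def separated(adj, a, b, c):
    """A and B are conditionally independent given C if no path joins
    them once the nodes in C are removed (a simple DFS)."""
    seen, stack = set(), [n for n in a if n not in c]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(n for n in adj[node] if n not in c)
    return not (seen & set(b))

# A four-node example (edges are hypothetical):
adj = {"A": {"C"}, "B": {"C", "D"}, "C": {"A", "B"}, "D": {"B"}}
print(separated(adj, {"A"}, {"B"}, {"C"}))   # True: C blocks every path
print(separated(adj, {"A"}, {"D"}, set()))   # False: A-C-B-D is open
```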