Lecture 2: Statistical learning primer for biologists
Alan Qi
Purdue Statistics and CS
Jan. 15, 2009
Outline
• Basics of probability
• Regression
• Graphical models: Bayesian networks and Markov random fields
• Unsupervised learning: K-means and expectation maximization
Probability Theory
• Sum Rule
• Product Rule
The Rules of Probability
• Sum Rule
• Product Rule
Bayes’ Theorem
posterior ∝ likelihood × prior
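In standard notation (with X and Y generic random variables, a labeling assumed here), the two rules and Bayes' theorem read:

p(X) = \sum_Y p(X, Y)   (sum rule)
p(X, Y) = p(Y \mid X)\, p(X)   (product rule)
p(Y \mid X) = \frac{p(X \mid Y)\, p(Y)}{p(X)}, \qquad p(X) = \sum_Y p(X \mid Y)\, p(Y)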
Probability Density & Cumulative Distribution Functions
Expectations
Conditional Expectation (discrete)
Approximate Expectation (discrete and continuous)
Variances and Covariances
The Gaussian Distribution
Gaussian Mean and Variance
The Multivariate Gaussian
Gaussian Parameter Estimation
Likelihood function
Maximum (Log) Likelihood
Properties of \mu_{ML} and \sigma^2_{ML}
• \mu_{ML} is unbiased
• \sigma^2_{ML} is biased
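For a univariate Gaussian fit to observations x_1, ..., x_N (notation assumed here), the maximum likelihood estimates and their expectations are:

\mu_{ML} = \frac{1}{N} \sum_{n=1}^{N} x_n, \qquad \sigma^2_{ML} = \frac{1}{N} \sum_{n=1}^{N} (x_n - \mu_{ML})^2
E[\mu_{ML}] = \mu \ \text{(unbiased)}, \qquad E[\sigma^2_{ML}] = \frac{N-1}{N}\, \sigma^2 \ \text{(biased)}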
Curve Fitting Re-visited
Maximum Likelihood
Determine w_{ML} by minimizing the sum-of-squares error E(w).
Predictive Distribution
MAP: A Step towards Bayes
Determine w_{MAP} by minimizing the regularized sum-of-squares error \widetilde{E}(w).
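For a curve y(x, w) fit to targets t_1, ..., t_N, a common form of these two error functions is (symbols assumed here; λ is the regularization coefficient):

E(w) = \frac{1}{2} \sum_{n=1}^{N} \{ y(x_n, w) - t_n \}^2
\widetilde{E}(w) = \frac{1}{2} \sum_{n=1}^{N} \{ y(x_n, w) - t_n \}^2 + \frac{\lambda}{2} \| w \|^2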
Bayesian Curve Fitting
Bayesian Networks
• Directed Acyclic Graph (DAG)
Bayesian Networks
General Factorization
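For a DAG over variables x_1, ..., x_K, the general factorization is (pa_k denotes the parents of node k):

p(x) = \prod_{k=1}^{K} p(x_k \mid pa_k)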
Generative Models
• Causal process for generating images
Discrete Variables (1)
• General joint distribution: K^2 − 1 parameters
• Independent joint distribution: 2(K-1) parameters
Discrete Variables (2)
General joint distribution over M variables: K^M − 1 parameters
M node Markov chain: K-1+(M-1)K(K-1) parameters
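The chain count can be read off the factorization below (a brief justification, assuming a first-order chain): the first node needs K − 1 free probabilities, and each of the remaining M − 1 conditional tables needs K(K − 1).

p(x_1, \ldots, x_M) = p(x_1) \prod_{m=2}^{M} p(x_m \mid x_{m-1})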
Discrete Variables: Bayesian Parameters (1)
Discrete Variables: Bayesian Parameters (2)
Shared prior
Parameterized Conditional Distributions
If x_1, \ldots, x_M are discrete K-state variables, p(y \mid x_1, \ldots, x_M) in general has O(K^M) parameters.
The parameterized form
requires only M + 1 parameters
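One such parameterized form, shown here as a logistic-sigmoid example for binary variables (the specific functional form is an assumption, not taken from the slide), is:

p(y = 1 \mid x_1, \ldots, x_M) = \sigma\left( w_0 + \sum_{i=1}^{M} w_i x_i \right), \qquad \sigma(a) = \frac{1}{1 + e^{-a}}

which has exactly the M + 1 parameters w_0, ..., w_M.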
Conditional Independence
• a is independent of b given c
• Equivalently
• Notation
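In equations (standard notation):

p(a \mid b, c) = p(a \mid c)
p(a, b \mid c) = p(a \mid c)\, p(b \mid c)
a \perp b \mid c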
Conditional Independence: Example 1
Conditional Independence: Example 1
Conditional Independence: Example 2
Conditional Independence: Example 2
Conditional Independence: Example 3
• Note: this is the opposite of Example 1, with c unobserved.
Conditional Independence: Example 3
Note: this is the opposite of Example 1, with c observed.
“Am I out of fuel?”
B = Battery (0 = flat, 1 = fully charged)
F = Fuel Tank (0 = empty, 1 = full)
G = Fuel Gauge Reading (0 = empty, 1 = full)
And hence
“Am I out of fuel?”
Probability of an empty tank is increased by observing G = 0.
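The update follows Bayes' theorem, marginalizing over the unobserved battery state (formula structure only; the numerical values from the original slide are not reproduced here):

p(F = 0 \mid G = 0) = \frac{p(G = 0 \mid F = 0)\, p(F = 0)}{p(G = 0)}, \qquad p(G = 0 \mid F = 0) = \sum_{B} p(G = 0 \mid B, F = 0)\, p(B)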
“Am I out of fuel?”
Probability of an empty tank is reduced by observing B = 0. This is referred to as “explaining away”.
The Markov Blanket
Factors independent of x_i cancel between numerator and denominator.
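For a directed graph this conditional can be written as follows, and the factors that survive the cancellation define the Markov blanket of x_i (its parents, children, and co-parents):

p(x_i \mid x_{\{j \ne i\}}) = \frac{\prod_k p(x_k \mid pa_k)}{\sum_{x_i} \prod_k p(x_k \mid pa_k)}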
Cliques and Maximal Cliques
Clique
Maximal Clique
Joint Distribution
• where \psi_C(x_C) is the potential over clique C and
• Z is the normalization coefficient; note: with M K-state variables, Z contains K^M terms.
• Energies and the Boltzmann distribution
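Written out (standard notation), the joint distribution, normalization coefficient, and energy-based potentials are:

p(x) = \frac{1}{Z} \prod_{C} \psi_C(x_C), \qquad Z = \sum_{x} \prod_{C} \psi_C(x_C), \qquad \psi_C(x_C) = \exp\{ -E(x_C) \}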
Illustration: Image De-Noising (1)
(Figure: original image and noisy image)
Illustration: Image De-Noising (2)
Illustration: Image De-Noising (3)
(Figure: noisy image and restored image, obtained with ICM)
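A minimal sketch of ICM for a binary de-noising model of this kind, assuming pixels x_i, y_i ∈ {−1, +1} and an energy E(x, y) = h Σ_i x_i − β Σ_{i,j} x_i x_j − η Σ_i x_i y_i; the function name and coefficient values are illustrative, not taken from the slides.

import numpy as np

def icm_denoise(y, h=0.0, beta=1.0, eta=2.1, n_sweeps=10):
    """Iterated conditional modes for binary image de-noising.

    y : 2-D array with entries in {-1, +1} (the noisy image).
    Returns a restored image x that locally minimizes the energy
    E(x, y) = h*sum(x_i) - beta*sum_neighbors(x_i x_j) - eta*sum(x_i y_i).
    """
    x = y.copy()
    rows, cols = x.shape
    for _ in range(n_sweeps):
        for i in range(rows):
            for j in range(cols):
                # Sum of the 4-connected neighbours of pixel (i, j).
                nb = 0.0
                if i > 0:        nb += x[i - 1, j]
                if i < rows - 1: nb += x[i + 1, j]
                if j > 0:        nb += x[i, j - 1]
                if j < cols - 1: nb += x[i, j + 1]
                # Local energy for x_ij = +1 and x_ij = -1; keep the lower one.
                e_plus = h - beta * nb - eta * y[i, j]
                e_minus = -h + beta * nb + eta * y[i, j]
                x[i, j] = 1 if e_plus < e_minus else -1
    return x

# Toy usage: flip 10% of the pixels of a simple binary image, then restore.
rng = np.random.default_rng(0)
clean = np.ones((32, 32), dtype=int)
clean[8:24, 8:24] = -1
noise = rng.random(clean.shape) < 0.1
noisy = np.where(noise, -clean, clean)
restored = icm_denoise(noisy)
print("pixels differing from clean image:", int((restored != clean).sum()))

Each sweep visits every pixel and keeps whichever of its two states gives the lower local energy, so the total energy never increases.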
Converting Directed to Undirected Graphs (1)
Converting Directed to Undirected Graphs (2)
• Additional links: “marrying parents”, i.e., moralization
Directed vs. Undirected Graphs (2)
Inference on a Chain
Computational time increases exponentially with N.
Inference on a Chain
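For a chain of N discrete nodes with K states each, passing messages forward and backward replaces the exponential-cost sum with a linear-cost one (standard chain-inference result; ψ denotes the pairwise potentials):

\mu_\alpha(x_n) = \sum_{x_{n-1}} \psi_{n-1,n}(x_{n-1}, x_n)\, \mu_\alpha(x_{n-1}), \qquad \mu_\beta(x_n) = \sum_{x_{n+1}} \psi_{n,n+1}(x_n, x_{n+1})\, \mu_\beta(x_{n+1})
p(x_n) = \frac{1}{Z}\, \mu_\alpha(x_n)\, \mu_\beta(x_n)

The cost is O(N K^2) rather than the O(K^N) of direct marginalization.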
Supervised Learning
• Supervised learning: learning with examples or labels, e.g., classification and regression
• Examples: linear regression (the example we just saw), generalized linear models (e.g., probit classification), support vector machines, Gaussian process classification, etc.
• Take CS590M Machine Learning in fall 2009.
Unsupervised Learning
• Supervised learning: learning with examples or labels, e.g., classification and regression
• Unsupervised learning: learning without examples or labels, e.g., clustering, mixture models, PCA, non-negative matrix factorization
K-means Clustering: Goal
Cost Function
Two Stage Updates
Optimizing Cluster Assignment
Optimizing Cluster Centers
Convergence of Iterative Updates
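In equations (standard K-means notation, assumed here), the cost and the two alternating updates are:

J = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk}\, \| x_n - \mu_k \|^2
r_{nk} = 1 \ \text{if} \ k = \arg\min_j \| x_n - \mu_j \|^2, \ \text{else} \ 0
\mu_k = \frac{\sum_n r_{nk}\, x_n}{\sum_n r_{nk}}

Each step can only decrease J (or leave it unchanged), which is why the iterative updates converge to a local optimum.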
Example of K-Means Clustering
Mixture of Gaussians
• Mixture of Gaussians:
• Introduce latent variables:
• Marginal distribution:
Conditional Probability
• Responsibility that component k takes for explaining the observation.
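Written out (standard notation), the mixture marginal and the responsibilities are:

p(x) = \sum_{k=1}^{K} \pi_k\, N(x \mid \mu_k, \Sigma_k)
\gamma(z_k) = p(z_k = 1 \mid x) = \frac{\pi_k\, N(x \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j\, N(x \mid \mu_j, \Sigma_j)}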
Maximum Likelihood
• Maximize the log likelihood function \ln p(X \mid \pi, \mu, \Sigma) = \sum_{n=1}^{N} \ln \sum_{k=1}^{K} \pi_k N(x_n \mid \mu_k, \Sigma_k)
Maximum Likelihood Conditions (1)
• Setting the derivative of the log likelihood with respect to \mu_k to zero:
Maximum Likelihood Conditions (2)
• Setting the derivative of the log likelihood with respect to \Sigma_k to zero:
Maximum Likelihood Conditions (3)
• Lagrange function:
• Setting its derivative to zero and using the normalization constraint, we obtain:
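Collecting the three conditions (the standard mixture-of-Gaussians maximum likelihood results), with N_k = \sum_n \gamma(z_{nk}):

\mu_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk})\, x_n, \qquad \Sigma_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk})\, (x_n - \mu_k)(x_n - \mu_k)^T, \qquad \pi_k = \frac{N_k}{N}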
Expectation Maximization for Mixture Gaussians
• Although the previous conditions do not provide closed-form solutions, we can use them to construct iterative updates:
• E step: compute responsibilities \gamma(z_{nk}).
• M step: compute new means \mu_k, covariances \Sigma_k, and mixing coefficients \pi_k.
• Loop over the E and M steps until the log likelihood stops increasing (a code sketch follows below).
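A minimal sketch of this EM loop for a Gaussian mixture, assuming numpy/scipy and a data matrix X of shape (N, D); the function name, initialization, and stopping tolerance are illustrative choices, not taken from the lecture.

import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K, n_iters=100, seed=0):
    """EM for a K-component Gaussian mixture; returns (pi, means, covs, log_liks)."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    # Initialization: random data points as means, shared covariance, uniform weights.
    means = X[rng.choice(N, K, replace=False)]
    covs = np.array([np.cov(X.T) + 1e-6 * np.eye(D) for _ in range(K)])
    pi = np.full(K, 1.0 / K)
    log_liks = []
    for _ in range(n_iters):
        # E step: responsibilities gamma[n, k] under the current parameters.
        dens = np.stack([pi[k] * multivariate_normal.pdf(X, means[k], covs[k])
                         for k in range(K)], axis=1)          # shape (N, K)
        gamma = dens / dens.sum(axis=1, keepdims=True)
        # M step: update means, covariances, and mixing coefficients.
        Nk = gamma.sum(axis=0)                                  # shape (K,)
        means = (gamma.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - means[k]
            covs[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(D)
        pi = Nk / N
        # Track the log likelihood and stop once it no longer increases.
        log_lik = np.log(dens.sum(axis=1)).sum()
        log_liks.append(log_lik)
        if len(log_liks) > 1 and log_lik - log_liks[-2] < 1e-6:
            break
    return pi, means, covs, log_liks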
Example
• EM on the Old Faithful data set.
General EM Algorithm
EM as Lower Bounding Methods
• Goal: maximize \ln p(X \mid \theta) = \ln \sum_Z p(X, Z \mid \theta)
• Define: L(q, \theta) = \sum_Z q(Z) \ln \frac{p(X, Z \mid \theta)}{q(Z)}, \quad KL(q \| p) = -\sum_Z q(Z) \ln \frac{p(Z \mid X, \theta)}{q(Z)}
• We have \ln p(X \mid \theta) = L(q, \theta) + KL(q \| p)
Lower Bound
• L(q, \theta) is a functional of the distribution q(Z).
• Since KL(q \| p) \ge 0 and \ln p(X \mid \theta) = L(q, \theta) + KL(q \| p), L(q, \theta) is a lower bound of the log likelihood function \ln p(X \mid \theta).
Illustration of Lower Bound
Lower Bound Perspective of EM
• Expectation step: maximizing the functional lower bound L(q, \theta) over the distribution q(Z).
• Maximization step: maximizing the lower bound L(q, \theta) over the parameters \theta.
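Concretely (the standard EM updates implied by this view, with \theta^{old} the current parameter values):

\text{E step:} \quad q(Z) = p(Z \mid X, \theta^{old})
\text{M step:} \quad \theta^{new} = \arg\max_{\theta} \sum_{Z} p(Z \mid X, \theta^{old}) \ln p(X, Z \mid \theta)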
Illustration of EM Updates