
Page 1: Some Two-Block Problems

Some Two-Block Problems

Douglas M. Hawkins

NCSU ECCR NISS Feb 2007

(work with Despina Stefan)

Page 2: Some Two-Block Problems

Disclaimer

No new results or material coming up. I will present some things that are known, and some useful-looking extensions. Extensions that look worth pursuing are for another day.

Page 3: Some Two-Block Problems

Two Conceptual Settings

The usual QSAR setting has one dependent variable (‘activity’) and a vector of predictors (‘structure’), and seeks a model connecting the two.

Variants include a vector of dependent variables, and/or predictors that break logically into blocks.

Page 4: Some Two-Block Problems

Example – first type

In drug discovery, the concern is with efficacy (one measure) and also with safety (many measures).

Safety endpoints constitute a vector of dependents to relate to the vector of predictors.

Commonly we handle safety with a collection of QSAR models predicting individual adverse events (AEs). But other approaches are possible.

Page 5: Some Two-Block Problems

Example – second type

Or we may have a single dependent, and the predictors may break into blocks, e.g.

molecular structure variables, microarray measures, proteomic measures, Ames test toxicity.

Page 6: Some Two-Block Problems

First type in detail

In the first setting, we have an m-component vector Y of dependents and a p-component vector X of predictors that we seek to relate.

The classical tool is canonical correlation analysis.

Page 7: Some Two-Block Problems

Canonical Correlation

Consider classical setting – psychometrics:

X and Y are scores on two batteries of tests thought to measure innate ability.

Seek a common linking subspace. Find coefficient vectors a and b such that

aᵀX and bᵀY

are maximally correlated.

Page 8: Some Two-Block Problems

Canonical continued

The idea is that aᵀX and bᵀY capture a latent dimension, conceptually like a factor-analysis factor.

Having found the maximizing pair a, b, go off at right angles and get another, orthogonal, maximizing pair. Do so repeatedly.

Finding k such “significant” coefficient vector pairs points to the data containing k dimensions in which X and Y co-vary. So CC is a dimension reduction method (DRM).

Page 9: Some Two-Block Problems

How do we fit CC?

The least-squares criterion leads to a closed-form eigenvalue problem.

Another potential approach is an alternating fit algorithm: get a trial b; regress bᵀY on X to get a trial a; regress aᵀX on Y to get a new trial b. Iterate to convergence.
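
A minimal numpy sketch of this alternating algorithm, assuming column-centered data matrices X (n x p) and Y (n x m); the function name, starting value, and convergence test are illustrative, not from the talk:

```python
import numpy as np

def cc_pair_alternating(X, Y, n_iter=100, tol=1e-10):
    """First canonical pair by alternating least-squares regressions."""
    rng = np.random.default_rng(0)
    b = rng.standard_normal(Y.shape[1])         # trial b
    corr_old = 0.0
    for _ in range(n_iter):
        # regress b'Y on X to get a trial a
        a, *_ = np.linalg.lstsq(X, Y @ b, rcond=None)
        # regress a'X on Y to get a new trial b
        b, *_ = np.linalg.lstsq(Y, X @ a, rcond=None)
        b /= np.linalg.norm(Y @ b)              # fix the scale
        corr = np.corrcoef(X @ a, Y @ b)[0, 1]
        if abs(corr - corr_old) < tol:          # converged?
            break
        corr_old = corr
    return a, b
```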

Page 10: Some Two-Block Problems

Algorithm continued

This gives the first coefficient vector pair. Deflate both X and Y, start all over, and get the second coefficient pair. Continue until you have ‘enough’ dimensions. It is a hideously inefficient calculation compared to the eigen approach.
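
A sketch of the deflation step in the same notation: remove the fitted canonical variate from every column of each block, then rerun the alternating fit on the deflated blocks.

```python
def deflate(Z, t):
    """Remove from each column of Z its projection on the score vector t."""
    t = t / np.linalg.norm(t)
    return Z - np.outer(t, t @ Z)

# after finding a pair (a, b):
#   X = deflate(X, X @ a)
#   Y = deflate(Y, Y @ b)
# then call cc_pair_alternating(X, Y) again for the next pair
```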

Page 11: Some Two-Block Problems

What about outliers?

As usual, LS is susceptible to outliers, and so CC is also.

The alternating optimization algorithm allows a choice of other, outlier-resistant criteria. For example, use the L1 criterion, or trimmed least squares, to get a robust CC.

I don’t know anyone who has tried this idea, but it is straightforward to do.
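
As a sketch of that untried idea, one could swap the least-squares steps for L1 (median) regressions, here via statsmodels' quantile regression at q = 0.5; the helper names and fixed iteration count are illustrative:

```python
import numpy as np
import statsmodels.api as sm

def l1_fit(Z, t):
    """L1 (median) regression of t on the columns of Z."""
    return sm.QuantReg(t, Z).fit(q=0.5).params

def robust_cc_pair(X, Y, n_iter=50):
    b = np.ones(Y.shape[1])          # crude starting value
    for _ in range(n_iter):
        a = l1_fit(X, Y @ b)         # L1-regress b'Y on X
        b = l1_fit(Y, X @ a)         # L1-regress a'X on Y
        b /= np.linalg.norm(Y @ b)   # fix the scale
    return a, b
```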

Page 12: Some Two-Block Problems

Non-negative CC

Alternating optimization provides a route to non-negative canonical correlation (NNCC).

Fit the alternating regressions, as in the sketch above, but restrict the coefficients to be non-negative using standard inequality-constrained regression methods. This leads to NNCC.
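
A sketch of NNCC using scipy's non-negative least squares as the inequality-constrained regression; again the names and the fixed iteration count are illustrative:

```python
import numpy as np
from scipy.optimize import nnls

def nncc_pair(X, Y, n_iter=50):
    """First non-negative canonical pair via alternating NNLS regressions."""
    b = np.ones(Y.shape[1])
    for _ in range(n_iter):
        a, _ = nnls(X, Y @ b)        # constrained so that a >= 0
        b, _ = nnls(Y, X @ a)        # constrained so that b >= 0
        nrm = np.linalg.norm(Y @ b)
        if nrm == 0:                 # degenerate all-zero solution
            break
        b /= nrm                     # fix the scale
    return a, b
```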

Page 13: Some Two-Block Problems

Robust NNCC

When fitting the alternating regressions, use an outlier-resistant criterion.

For example, the L1 norm. The marriage of the L1 norm and non-negative coefficients leads to a linear program. This may prove surprisingly reasonable computationally.

Page 14: Some Two-Block Problems

And while we are at it…

If we use the L1 criterion and non-negative coefficients, we can also impose an L1 penalty on the coefficient vector.

This again leads to a linear programming problem. The Koenker/Portnoy paper suggests this can be solved in time competitive with L2 regression.

An L1 penalty on the coefficient vector, the LASSO, is a known route to automatic sparsity.
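
One way the combined problem might be posed as a linear program with scipy's linprog: minimize the L1 loss plus lam times the coefficient sum, with all coefficients non-negative. Setting lam = 0 recovers the plain L1, non-negative regression step of the previous slide. The formulation below is an assumed sketch, not taken from the talk:

```python
import numpy as np
from scipy.optimize import linprog

def l1_nonneg_lasso(X, y, lam=1.0):
    """Minimize sum|y - Xb| + lam*sum(b) subject to b >= 0, as an LP.

    Decision variables are (b, u, v), all >= 0, with residual y - Xb = u - v,
    so that |y - Xb| = u + v at the optimum.
    """
    n, p = X.shape
    c = np.concatenate([lam * np.ones(p), np.ones(n), np.ones(n)])
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])   # Xb + u - v = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    return res.x[:p]                               # the coefficient vector b
```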

Page 15: Some Two-Block Problems

Detour – Ridge and LASSO

In regression, penalizing the L2 norm of the coefficient vector gives ridge regression; penalizing the L1 norm gives the LASSO.

The LASSO gives sparse coefficients; ridge does not. Given a set of “equivalent” predictors, the LASSO keeps one and drops the rest; ridge smooths all their coefficients toward a common consensus figure.
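
A small synthetic illustration of that contrast with scikit-learn; the data and penalty values are made up for the demonstration:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(1)
z = rng.standard_normal(200)
# three near-duplicate ("equivalent") predictors of the same signal
X = np.column_stack([z + 0.01 * rng.standard_normal(200) for _ in range(3)])
y = z + 0.1 * rng.standard_normal(200)

print(Ridge(alpha=1.0).fit(X, y).coef_)  # three similar, shrunken coefficients
print(Lasso(alpha=0.1).fit(X, y).coef_)  # typically one kept, the rest zeroed
```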

Page 16: Some Two-Block Problems

CC is not widely used.

CC is unhelpful in safety studies; we care about the incidence of headaches and of diarrhea, not about 0.7*headache - 0.5*diarrhea.

But CC can be a valuable screen. Variables with “large” loadings apparently relate in some way to variables on the other side. The converse, though, is not true.

Extended robust and/or NN versions could be valuable tools.

Page 17: Some Two-Block Problems

PLS

PLS is also able to relate a vector Y and a vector X.

Computation is a lot faster than for CC. But PLS also has an underlying LS criterion, so you are still at the mercy of outliers, and it also gives you linear combinations of variables, which are not easy to interpret.
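
For concreteness, a minimal scikit-learn PLS sketch relating a vector Y to a vector X; the synthetic data and component count are illustrative:

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(2)
X = rng.standard_normal((100, 20))            # predictor vector, n = 100
Y = (X[:, :2] @ rng.standard_normal((2, 5))
     + 0.1 * rng.standard_normal((100, 5)))   # dependent vector, m = 5

pls = PLSRegression(n_components=2).fit(X, Y)
Y_hat = pls.predict(X)    # fitted values for all m dependents at once
```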

Page 18: Some Two-Block Problems

Second Setting

Suppose we have predictors that divide into natural blocks X1, X2, … Xk.

The obvious analysis method adjoins all the predictors and fits a QSAR in the usual way; nothing new.

Page 19: Some Two-Block Problems

Predictor Blocks

Or we can form subsets of the blocks (2^k - 1 possible) and fit a QSAR on each subset of blocks. Use measures of additional information to see how much each block adds to its predecessors. It is helpful to know, for example, whether microarray measures add usefully to atom pairs.

Again, nothing earth-shattering. Exhaustive enumeration of the subsets is thinkable, as we typically have only a few blocks.
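
A sketch of this exhaustive screen, assuming the blocks arrive as a list of numpy arrays sharing rows; in-sample R^2 stands in here for whatever measure of additional information is preferred:

```python
import numpy as np
from itertools import combinations
from sklearn.linear_model import LinearRegression

def screen_block_subsets(blocks, y):
    """Fit an OLS QSAR on each of the 2**k - 1 non-empty subsets of blocks."""
    k = len(blocks)
    for r in range(1, k + 1):
        for subset in combinations(range(k), r):
            X = np.hstack([blocks[i] for i in subset])
            r2 = LinearRegression().fit(X, y).score(X, y)
            print(subset, round(r2, 3))
```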

Page 20: Some Two-Block Problems

Different Way of Thinking

Return to CC.

It was not wonderfully helpful as a modeling tool.

But might be successful as a DRM.

Page 21: Some Two-Block Problems

A DRM Model

Suppose there are ‘a few’ latent dimensions. These dimensions drive Y and the Xk blocks.

Maybe we can recover the latent dimensions from the X blocks, and use these to predict Y.

Potential for a huge reduction in the standard errors of the components if the model holds.

Principal component regression (PCR) is a special case of this, obtained when we have only one block.
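
In the one-block special case, PCR is just a two-step pipeline, sketched here with scikit-learn; the component count is a user choice:

```python
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# PCR: extract a few principal components of X, then regress y on them
pcr = make_pipeline(PCA(n_components=3), LinearRegression())
# usage: pcr.fit(X, y); pcr.predict(X_new)
```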

Page 22: Some Two-Block Problems

Example

With two blocks of predictors, X1 and X2: do a CC of the two blocks, then use these apparently-common dimensions as predictors of Y.
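
A sketch of this two-block recipe with scikit-learn's CCA; the number of retained dimensions n_dims is a user choice:

```python
import numpy as np
from sklearn.cross_decomposition import CCA
from sklearn.linear_model import LinearRegression

def cc_drm_fit(X1, X2, y, n_dims=3):
    """CC the two predictor blocks, then regress y on the common dimensions."""
    cca = CCA(n_components=n_dims).fit(X1, X2)
    T1, T2 = cca.transform(X1, X2)     # canonical variates of each block
    Z = np.hstack([T1, T2])            # the apparently-common dimensions
    return cca, LinearRegression().fit(Z, y)
```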

Page 23: Some Two-Block Problems

Is this like a PCA of adjoined X?

In principle, no. Getting ‘under the hood’ of the eigensolution to CC, step 1 is to ‘multistandardize’: transform X to W = EX and transform Y to V = FY, where the elements of W are uncorrelated and the elements of V are uncorrelated.

Step 2 is an SVD of the cross-covariance matrix of W and V. The multistandardization step flattens out the principal components of both X and Y.
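
A numpy sketch of that eigensolution, written in row-data convention (so the slide's W = EX becomes W = X Eᵀ); it assumes both blocks have nonsingular covariance matrices:

```python
import numpy as np

def canonical_correlations(X, Y):
    """Canonical correlations via multistandardize-then-SVD (rows = cases)."""
    n = X.shape[0]
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    # multistandardize: whiten each block so its covariance is the identity
    E = np.linalg.inv(np.linalg.cholesky(X.T @ X / (n - 1)))
    F = np.linalg.inv(np.linalg.cholesky(Y.T @ Y / (n - 1)))
    W = X @ E.T
    V = Y @ F.T
    # the singular values of the cross-covariance are the canonical correlations
    return np.linalg.svd(W.T @ V / (n - 1), compute_uv=False)
```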

Page 24: Some Two-Block Problems

which means….

To come out of the CC as an important latent dimension, covarying within X alone or within Y alone is not enough; the dimension needs to be common to the two blocks.

Thus CC of the two blocks is, in principle, a different DRM approach.

Page 25: Some Two-Block Problems

Three or more blocks

CC covers two predictor blocks. There are several ways to generalize to three or more blocks.

A recent University of Minnesota PhD thesis by Despina Stefan discussed a number of them.

In it, she looked at generalized CC as a DRM for use in QSAR.

Page 26: Some Two-Block Problems

Does it work?

She simulated a setting with 3 latent dimensions that determined both the blocks of X and the dependent Y.

Doing this DRM on the predictor blocks and regressing on the constructed variables was highly effective when there was appreciable noise in the relationships from the latent dimensions to the X and Y.

Page 27: Some Two-Block Problems

Real-data results

Limited testing on real data sets to date. Results have been OK, but not earth-shattering. We await the setting where there really are a few underlying latent dimensions.

Page 28: Some Two-Block Problems

And non-negative?

These results were in the sign-unconstrained setting. It is reasonable to expect them to carry over to the non-negative equivalents. NN variants of the multi-block approach, used as a DRM, should be straightforward and potentially powerful QSAR tools.

Page 29: Some Two-Block Problems

Wrapup

The first setting, vector Y, is familiar from the early days of psychometrics. Robust and/or NN variants seem ripe for picking.

Second setting, multiple predictor blocks, is gaining relevance. Robust and/or NN variants seem straightforward to develop.

Work on unrestricted formulations indicates potential for specialized DRM approaches; this should carry over.