
Page 1: Some Two-Block Problems

Some Two-Block Problems

Douglas M. Hawkins

NCSU ECCR NISS Feb 2007

(work with Despina Stefan)

Page 2: Some Two-Block Problems

Disclaimer

No new results or material coming up. I will present some things that are known, and some useful-looking extensions. Extensions that look worth pursuing are for another day.

Page 3: Some Two-Block Problems

Two Conceptual Settings

The usual QSAR setting has one dependent variable (‘activity’) and a vector of predictors (‘structure’), and seeks a model connecting the two.

Variants include a vector of dependent variables, and/or predictors that break logically into blocks.

Page 4: Some Two-Block Problems

Example – first type

In drug discovery, the concern is with efficacy (one measure) and also with safety (many measures).

Safety endpoints constitute a vector of dependents to relate to the vector of predictors.

Commonly we handle safety with a collection of QSAR models predicting individual adverse events (AEs). But other approaches are possible.

Page 5: Some Two-Block Problems

Example – second type

Or we may have a single dependent, and the predictors may break into blocks, e.g.

molecular structure variables, microarray measures, proteomic measures, Ames test toxicity.

Page 6: Some Two-Block Problems

First type in detail

In the first setting, we have an m-component vector Y of dependents and a p-component vector X of predictors that we seek to relate.

The classical tool is canonical correlation analysis.

Page 7: Some Two-Block Problems

Canonical Correlation

Consider classical setting – psychometrics:

X and Y are scores on two batteries of tests thought to measure innate ability.

Seek a common linking subspace. Find coefficient vectors a and b such that

aᵀX and bᵀY

are maximally correlated.

Page 8: Some Two-Block Problems

Canonical continued

The idea is that aᵀX and bᵀY capture a latent dimension, conceptually like a factor-analysis factor.

Having found the maximizing pair a, b, go off at right angles and get another, orthogonal, maximizing pair. Do so repeatedly.

Finding k such “significant” coefficient vector pairs points to the data containing k dimensions in which X and Y co-vary. So CC is a dimension reduction method (DRM).

Page 9: Some Two-Block Problems

How do we fit CC?

The least-squares criterion leads to a closed-form eigenvalue problem.

Another potential approach is an alternating fit algorithm: get a trial b; regress bᵀY on X to get a trial a; regress aᵀX on Y to get a new trial b. Iterate to convergence.
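
A minimal numpy sketch of this alternating algorithm, assuming column-centered data matrices X (n x p) and Y (n x m); the function name, starting value, and convergence test are illustrative, not from the talk:

```python
import numpy as np

def cc_pair_alternating(X, Y, n_iter=100, tol=1e-10):
    """First canonical pair by alternating least-squares regressions."""
    rng = np.random.default_rng(0)
    b = rng.standard_normal(Y.shape[1])         # trial b
    corr_old = 0.0
    for _ in range(n_iter):
        # regress b'Y on X to get a trial a
        a, *_ = np.linalg.lstsq(X, Y @ b, rcond=None)
        # regress a'X on Y to get a new trial b
        b, *_ = np.linalg.lstsq(Y, X @ a, rcond=None)
        b /= np.linalg.norm(Y @ b)              # fix the scale
        corr = np.corrcoef(X @ a, Y @ b)[0, 1]
        if abs(corr - corr_old) < tol:          # converged?
            break
        corr_old = corr
    return a, b
```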

Page 10: Some Two-Block Problems

Algorithm continued

This gives the first coefficient vector pair. Deflate both X and Y, start all over, and get the second coefficient pair. Continue until you have ‘enough’ dimensions. It is a hideously inefficient calculation compared to the eigen approach.
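
A sketch of the deflation step in the same notation: remove the fitted canonical variate from every column of each block, then rerun the alternating fit on the deflated blocks.

```python
def deflate(Z, t):
    """Remove from each column of Z its projection on the score vector t."""
    t = t / np.linalg.norm(t)
    return Z - np.outer(t, t @ Z)

# after finding a pair (a, b):
#   X = deflate(X, X @ a)
#   Y = deflate(Y, Y @ b)
# then call cc_pair_alternating(X, Y) again for the next pair
```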

Page 11: Some Two-Block Problems

What about outliers?

As usual, LS is susceptible to outliers, and so CC is also.

The alternating optimization algorithm allows a choice of other, outlier-resistant criteria. For example, use the L1 criterion, or trimmed least squares, to get a robust CC.

I don’t know anyone who has tried this idea, but it is straightforward to do.
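
As a sketch of that untried idea, one could swap the least-squares steps for L1 (median) regressions, here via statsmodels' quantile regression at q = 0.5; the helper names and fixed iteration count are illustrative:

```python
import numpy as np
import statsmodels.api as sm

def l1_fit(Z, t):
    """L1 (median) regression of t on the columns of Z."""
    return sm.QuantReg(t, Z).fit(q=0.5).params

def robust_cc_pair(X, Y, n_iter=50):
    b = np.ones(Y.shape[1])          # crude starting value
    for _ in range(n_iter):
        a = l1_fit(X, Y @ b)         # L1-regress b'Y on X
        b = l1_fit(Y, X @ a)         # L1-regress a'X on Y
        b /= np.linalg.norm(Y @ b)   # fix the scale
    return a, b
```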

Page 12: Some Two-Block Problems

Non-negative CC

Alternating optimization provides a route to non-negative canonical correlation (NNCC).

Fit the alternating regressions, as in the sketch above, but restrict the coefficients to be non-negative using standard inequality-constrained regression methods. This leads to NNCC.
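
A sketch of NNCC using scipy's non-negative least squares as the inequality-constrained regression; again the names and the fixed iteration count are illustrative:

```python
import numpy as np
from scipy.optimize import nnls

def nncc_pair(X, Y, n_iter=50):
    """First non-negative canonical pair via alternating NNLS regressions."""
    b = np.ones(Y.shape[1])
    for _ in range(n_iter):
        a, _ = nnls(X, Y @ b)        # constrained so that a >= 0
        b, _ = nnls(Y, X @ a)        # constrained so that b >= 0
        nrm = np.linalg.norm(Y @ b)
        if nrm == 0:                 # degenerate all-zero solution
            break
        b /= nrm                     # fix the scale
    return a, b
```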

Page 13: Some Two-Block Problems

Robust NNCC

When fitting the alternating regressions, use an outlier-resistant criterion.

For example, the L1 norm. The marriage of the L1 norm and non-negative coefficients leads to a linear program. This may prove surprisingly reasonable computationally.

Page 14: Some Two-Block Problems

And while we are at it…

If we use the L1 criterion and non-negative coefficients, we can also impose an L1 penalty on the coefficient vector.

This again leads to a linear programming problem. The Koenker/Portnoy paper suggests this can be solved in time competitive with L2 regression.

An L1 penalty on the coefficient vector, the LASSO, is a known route to automatic sparsity.
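
One way the combined problem might be posed as a linear program with scipy's linprog: minimize the L1 loss plus lam times the coefficient sum, with all coefficients non-negative. Setting lam = 0 recovers the plain L1, non-negative regression step of the previous slide. The formulation below is an assumed sketch, not taken from the talk:

```python
import numpy as np
from scipy.optimize import linprog

def l1_nonneg_lasso(X, y, lam=1.0):
    """Minimize sum|y - Xb| + lam*sum(b) subject to b >= 0, as an LP.

    Decision variables are (b, u, v), all >= 0, with residual y - Xb = u - v,
    so that |y - Xb| = u + v at the optimum.
    """
    n, p = X.shape
    c = np.concatenate([lam * np.ones(p), np.ones(n), np.ones(n)])
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])   # Xb + u - v = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    return res.x[:p]                               # the coefficient vector b
```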

Page 15: Some Two-Block Problems

Detour – Ridge and LASSO

In regression, penalizing the L2 norm of the coefficient vector gives ridge regression; penalizing the L1 norm gives the LASSO.

The LASSO gives sparse coefficients; ridge does not. Given a set of “equivalent” predictors, the LASSO keeps one and drops the rest; ridge smooths all their coefficients toward a common consensus figure.
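
A small synthetic illustration of that contrast with scikit-learn; the data and penalty values are made up for the demonstration:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(1)
z = rng.standard_normal(200)
# three near-duplicate ("equivalent") predictors of the same signal
X = np.column_stack([z + 0.01 * rng.standard_normal(200) for _ in range(3)])
y = z + 0.1 * rng.standard_normal(200)

print(Ridge(alpha=1.0).fit(X, y).coef_)  # three similar, shrunken coefficients
print(Lasso(alpha=0.1).fit(X, y).coef_)  # typically one kept, the rest zeroed
```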

Page 16: Some Two-Block Problems

CC is not widely used.

CC is unhelpful in safety studies; we care about the incidence of headaches and of diarrhea, not about 0.7*headache - 0.5*diarrhea.

But CC can be a valuable screen. Variables with “large” loadings apparently relate in some way to variables on the other side. The converse, though, is not true.

Extended robust and/or NN versions could be valuable tools.

Page 17: Some Two-Block Problems

PLS

PLS is also able to relate a vector Y and a vector X.

Computation is a lot faster than for CC. But PLS also has an underlying LS criterion, so you are still at the mercy of outliers, and it also gives you linear combinations of variables, which are not easy to interpret.
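
For concreteness, a minimal scikit-learn PLS sketch relating a vector Y to a vector X; the synthetic data and component count are illustrative:

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(2)
X = rng.standard_normal((100, 20))            # predictor vector, n = 100
Y = (X[:, :2] @ rng.standard_normal((2, 5))
     + 0.1 * rng.standard_normal((100, 5)))   # dependent vector, m = 5

pls = PLSRegression(n_components=2).fit(X, Y)
Y_hat = pls.predict(X)    # fitted values for all m dependents at once
```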

Page 18: Some Two-Block Problems

Second Setting

Suppose we have predictors that divide into natural blocks X1, X2, … Xk.

The obvious analysis method adjoins all the predictors and fits a QSAR in the usual way; nothing new.

Page 19: Some Two-Block Problems

Predictor Blocks

Or we can form subsets of the blocks (2^k - 1 possible) and fit a QSAR on each subset of blocks. Use measures of additional information to see how much each block adds to its predecessors. It is helpful to know, for example, whether microarray measures add usefully to atom pairs.

Again, nothing earth-shattering. Exhaustive enumeration of the subsets is thinkable, as we typically have only a few blocks.
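
A sketch of this exhaustive screen, assuming the blocks arrive as a list of numpy arrays sharing rows; in-sample R^2 stands in here for whatever measure of additional information is preferred:

```python
import numpy as np
from itertools import combinations
from sklearn.linear_model import LinearRegression

def screen_block_subsets(blocks, y):
    """Fit an OLS QSAR on each of the 2**k - 1 non-empty subsets of blocks."""
    k = len(blocks)
    for r in range(1, k + 1):
        for subset in combinations(range(k), r):
            X = np.hstack([blocks[i] for i in subset])
            r2 = LinearRegression().fit(X, y).score(X, y)
            print(subset, round(r2, 3))
```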

Page 20: Some Two-Block Problems

Different Way of Thinking

Return to CC.

It was not wonderfully helpful as a modeling tool.

But might be successful as a DRM.

Page 21: Some Two-Block Problems

A DRM Model

Suppose there are ‘a few’ latent dimensions. These dimensions drive Y and the Xk blocks.

Maybe we can recover the latent dimensions from the X blocks, and use these to predict Y.

Potential for a huge reduction in the standard errors of the components if the model holds.

Principal component regression (PCR) is a special case of this, obtained when we have only one block.
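
In the one-block special case, PCR is just a two-step pipeline, sketched here with scikit-learn; the component count is a user choice:

```python
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# PCR: extract a few principal components of X, then regress y on them
pcr = make_pipeline(PCA(n_components=3), LinearRegression())
# usage: pcr.fit(X, y); pcr.predict(X_new)
```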

Page 22: Some Two-Block Problems

Example

With two blocks of predictors, X1 and X2: do a CC of the two blocks, then use these apparently-common dimensions as predictors of Y.
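
A sketch of this two-block recipe with scikit-learn's CCA; the number of retained dimensions n_dims is a user choice:

```python
import numpy as np
from sklearn.cross_decomposition import CCA
from sklearn.linear_model import LinearRegression

def cc_drm_fit(X1, X2, y, n_dims=3):
    """CC the two predictor blocks, then regress y on the common dimensions."""
    cca = CCA(n_components=n_dims).fit(X1, X2)
    T1, T2 = cca.transform(X1, X2)     # canonical variates of each block
    Z = np.hstack([T1, T2])            # the apparently-common dimensions
    return cca, LinearRegression().fit(Z, y)
```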

Page 23: Some Two-Block Problems

Is this like a PCA of adjoined X?

In principle, no. Getting ‘under the hood’ of the eigensolution to CC, step 1 is to ‘multistandardize’: transform X to W = EX and transform Y to V = FY, where the elements of W are uncorrelated and the elements of V are uncorrelated.

Step 2 is an SVD of the cross-covariance matrix of W and V. The multistandardization step flattens out the principal components of both X and Y.
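
A numpy sketch of that eigensolution, written in row-data convention (so the slide's W = EX becomes W = X Eᵀ); it assumes both blocks have nonsingular covariance matrices:

```python
import numpy as np

def canonical_correlations(X, Y):
    """Canonical correlations via multistandardize-then-SVD (rows = cases)."""
    n = X.shape[0]
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    # multistandardize: whiten each block so its covariance is the identity
    E = np.linalg.inv(np.linalg.cholesky(X.T @ X / (n - 1)))
    F = np.linalg.inv(np.linalg.cholesky(Y.T @ Y / (n - 1)))
    W = X @ E.T
    V = Y @ F.T
    # the singular values of the cross-covariance are the canonical correlations
    return np.linalg.svd(W.T @ V / (n - 1), compute_uv=False)
```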

Page 24: Some Two-Block Problems

which means….

To come out of the CC as an important latent dimension, covarying within X alone or within Y alone is not enough; the dimension needs to be common to the two blocks.

Thus CC of the two blocks is, in principle, a different DRM approach.

Page 25: Some Two-Block Problems

Three or more blocks

CC covers two predictor blocks. There are several ways to generalize to three or more blocks.

A recent University of Minnesota PhD thesis by Despina Stefan discussed a number of them.

In it, she looked at generalized CC as a DRM for use in QSAR.

Page 26: Some Two-Block Problems

Does it work?

She simulated a setting with 3 latent dimensions that determined both the blocks of X and the dependent Y.

Doing this DRM on the predictor blocks and regressing on the constructed variables was highly effective when there was appreciable noise in the relationships from the latent dimensions to the X and Y.

Page 27: Some Two-Block Problems

Real-data results

Limited testing on real data sets to date. Results have been OK, but not earth-shattering. We await the setting where there really are a few underlying latent dimensions.

Page 28: Some Two-Block Problems

And non-negative?

These results were in the sign-unconstrained setting. It is reasonable to expect them to carry over to the non-negative equivalents. NN variants of the multi-block approach, used as a DRM, should be straightforward and potentially powerful QSAR tools.

Page 29: Some Two-Block Problems

Wrapup

The first setting, vector Y, is familiar from the early days of psychometrics. Robust and/or NN variants seem ripe for picking.

Second setting, multiple predictor blocks, is gaining relevance. Robust and/or NN variants seem straightforward to develop.

Work on unrestricted formulations indicates potential for specialized DRM approaches; this should carry over.