What to do when it's all relative: How to analyse relative abundances without being misled - David Lovell

FOR FURTHER INFORMATION

Correlation is not an appropriate measure of association for relative data and can lead to contradictory conclusions if used naïvely. Figure 2 (left) shows that even though absolute expression levels are overwhelmingly positively correlated in this experiment, relative expression levels misleadingly suggest nothing of the sort.

Correlation with purely relative data (proportions, percentages, ppm, RPKM, etc.) gives no indication of relationships within the corresponding absolute data. Furthermore, the process of “standardizing” or “normalizing” measurements by dividing through by a common randomly distributed amount can induce what Karl Pearson in 1896 called spurious correlation.
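The spurious correlation effect can be reproduced with simulated numbers. The following is a minimal sketch, not the analysis behind the figures: the three amounts x, y and z are hypothetical, independent random values (not the yeast data), and `pearson` is a helper written for this illustration.

```python
import random

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / (var_a * var_b) ** 0.5

rng = random.Random(0)  # fixed seed so the sketch is repeatable
n = 5000

# Three mutually independent positive amounts (hypothetical values).
x = [rng.uniform(1.0, 2.0) for _ in range(n)]
y = [rng.uniform(1.0, 2.0) for _ in range(n)]
z = [rng.uniform(1.0, 2.0) for _ in range(n)]

# x and y are unrelated, so their correlation should be near zero.
r_absolute = pearson(x, y)

# Dividing both by the same random amount z induces a positive correlation.
r_ratio = pearson([a / c for a, c in zip(x, z)],
                  [b / c for b, c in zip(y, z)])

print(f"corr(x, y)     = {r_absolute:+.2f}")
print(f"corr(x/z, y/z) = {r_ratio:+.2f}")
```

The only thing x/z and y/z share is the divisor, yet that alone is enough to make them appear associated.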

REFERENCES

[1] Lovell, D., Müller, W., Taylor, J., Zwart, A., and Helliwell, C. “Proportions, Percentages, PPM: Do the Molecular Biosciences Treat Compositional Data Right?” In Compositional Data Analysis: Theory and Applications (Vera Pawlowsky-Glahn and Antonella Buccianti, eds), 191–207. Chichester, UK: John Wiley & Sons, Ltd, 2011.

What to do when it's all relative:

Motivation

Common measurement processes in transcriptomics, proteomics, metabolomics, metagenomics and ecology produce data that carry only relative information, requiring additional (often untested [2]) assumptions to make inference about absolute abundance. Less common is awareness of why these relative data need special analysis and interpretation [1].

Our aim is to change that. To help ensure researchers do not draw the wrong conclusions from relative abundances, we use yeast gene expression data to illustrate the issues (Figure 1).

In molecular bioscience, measurements of relative abundance are, well… abundant.

Appreciation that they need special analysis and interpretation is scarce.

Correlation is often used as a measure of association between different relative biomolecular abundances. This is wrong and potentially misleading except when total abundance is fixed across conditions.

We present an alternative that is right for relative abundances.

How to analyse relative abundances without being misled

David R Lovell (CSIRO), Vera Pawlowsky-Glahn (University of Girona) and Juan José Egozcue (Universitat Politècnica de Cataluña)

CSIRO COMPUTATIONAL INFORMATICS

David Lovell
CSIRO/Australian Bioinformatics Network
e [email protected]
www.csiro.au/people/David.Lovell

[2] Lovén, J., et al. 2012. “Revisiting Global Gene Expression Analysis.” Cell 151 (3) (October 26): 476–482.

[3] Marguerat, S., et al. 2012. “Quantitative Analysis of Fission Yeast Transcriptomes and Proteomes in Proliferating and Quiescent Cells.” Cell 151 (3) (October 26): 671–683.

Figure 1: (Left) Absolute and (Right) relative expression levels of 3031 yeast mRNAs over a 16-point time course experiment in which cells were deprived of nutrients [3]. The red and blue pairs of mRNAs are used in Figure 2. Note that the absolute abundances show that production of different mRNAs is generally positively correlated (i.e., expression levels change in the same direction) in this experiment. Note also that mRNA abundance spans six orders of magnitude.

[4] D. Lovell, V. Pawlowsky-Glahn, and J. J. Egozcue. “Have You Got Things in Proportion? A Practical Strategy for Exploring Association in High-dimensional Compositions.” In Proceedings of the 5th International Workshop on Compositional Data Analysis, edited by K Hron, Peter Filzmoser, and M Templ, 100–110. Vorau, Austria, 2013. http://www.codawork2013.com

Proportionality to the rescue

If the relative abundances of two different molecules stay in a fixed proportion to one another across different experimental conditions, then their absolute abundances behave proportionally also:

$x_i/t_i \propto y_i/t_i$ implies that $x_i \propto y_i$

where $x_i$ and $y_i$ are the absolute abundances of the molecules, and $t_i$ is the total abundance at condition $i$.
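The step behind this implication is a one-line cancellation. It is sketched here with k denoting the constant of proportionality, a symbol introduced for this illustration and not used in the original:

```latex
\frac{x_i}{t_i} = k\,\frac{y_i}{t_i} \ \text{for all } i
\quad\Longrightarrow\quad
x_i = k\,y_i \ \text{for all } i,
\qquad \text{since } t_i > 0 .
```

The total cancels, so the constant relating the relative abundances is the same one relating the absolute abundances, however the total varies from condition to condition.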

All we need now is a measure of how close two amounts are to behaving proportionally.

Figure 2: (Left) Histograms of correlation coefficients calculated (appropriately) from the absolute data (x-axis) and (inappropriately) from the relative data (y-axis). The blue and the red points correspond to the blue and red pairs of mRNA in Figure 1, illustrating that correlating relative abundances can lead us to draw the opposite conclusion about the relationship between variables. (Right) Histograms of correlation coefficients calculated (appropriately) from the absolute data (x-axis) and the φ() values from the relative data (y-axis). The red rectangle highlights the fact that while the absolute abundances of the vast majority of mRNAs are strongly positively correlated, only a very few behave proportionally.

φ: a measure of “goodness of fit to proportionality”

We have shown [4] that

$\phi(\log x, \log y) = 1 + \beta^2 - 2\beta|r|$

tells us how proportional x and y are. In this equation

β is the slope of the Standardised Major Axis of log x, log y, and r is the correlation of log x, log y.

As x and y behave more proportionally, the slope of their logarithms approaches 1, as does their correlation, and φ(log x, log y) approaches 0.
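As a rough illustration, φ can be computed directly from its definition. This is a minimal sketch rather than the authors' implementation: the function name `phi` and the example values are made up for illustration, and the Standardised Major Axis slope is taken to be the ratio of the standard deviations of log y and log x, carrying the sign of r.

```python
import math

def phi(x, y):
    """phi(log x, log y) = 1 + beta^2 - 2*beta*|r| for positive sequences x and y."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    sxx = sum((a - mx) ** 2 for a in lx)
    syy = sum((b - my) ** 2 for b in ly)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    r = sxy / math.sqrt(sxx * syy)  # correlation of log x, log y
    # Standardised major axis slope: sd(log y) / sd(log x), signed like r.
    beta = math.copysign(math.sqrt(syy / sxx), r)
    return 1 + beta ** 2 - 2 * beta * abs(r)

# Hypothetical abundances across four conditions.
x = [1.0, 2.0, 4.0, 8.0]
proportional = [3.0 * v for v in x]   # y = 3x, so phi is (numerically) 0
squared = [v ** 2 for v in x]         # perfectly correlated on the log scale, but slope 2
print(phi(x, proportional))
print(phi(x, squared))
```

The second case is the important one: log x and log x² are perfectly correlated, yet φ is well away from 0 because the slope is 2 rather than 1.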

Use φ on all your relatives…

Now you can analyse the relationships between relative abundances with confidence using φ() instead of correlation. Figure 3 shows how it helps select strongly proportional mRNA pairs from the yeast data. Figure 4 shows how it can be used as the basis of familiar analyses and visualisations, including graphs, heatmaps and hierarchical clustering.
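Selecting strongly proportional pairs from purely relative data can be sketched as follows. This is a sketch under stated assumptions, not the pipeline used for the figures: the four molecules A to D and their abundances are invented, `clr` and `phi_of_logs` are helper names chosen for this illustration, clr is taken to be the centred log-ratio transform (log of each part minus the mean log over all parts in that condition), and the 0.05 cut-off is the one quoted in the figure captions.

```python
import math
from itertools import combinations

def clr(parts):
    """Centred log-ratio transform of one composition (one condition)."""
    logs = [math.log(p) for p in parts]
    centre = sum(logs) / len(logs)
    return [v - centre for v in logs]

def phi_of_logs(a, b):
    """1 + beta^2 - 2*beta*|r| for two series that are already on a log scale."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    r = sab / math.sqrt(saa * sbb)
    beta = math.copysign(math.sqrt(sbb / saa), r)  # standardised major axis slope
    return 1 + beta ** 2 - 2 * beta * abs(r)

names = ["A", "B", "C", "D"]
# Hypothetical absolute abundances: rows are conditions, columns are molecules.
# B is constructed as 3 * A; C and D vary independently of A.
absolute = [
    [10, 30, 50, 5],
    [20, 60, 40, 9],
    [40, 120, 45, 3],
    [30, 90, 80, 7],
    [15, 45, 20, 12],
]

# Discard the totals, as a relative measurement process would.
relative = [[v / sum(row) for v in row] for row in absolute]

# clr-transform each condition, then collect one series per molecule.
clr_rows = [clr(row) for row in relative]
series = {name: [row[k] for row in clr_rows] for k, name in enumerate(names)}

proportional_pairs = [
    (p, q) for p, q in combinations(names, 2)
    if phi_of_logs(series[p], series[q]) < 0.05
]
print(proportional_pairs)
```

Because the clr transform is unchanged by rescaling a whole condition, working from the proportions recovers the pair that was built to be proportional in the absolute data, even though the totals were thrown away.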

Whether you have only relative abundance data, or whether you wish to explore relative relationships in absolute abundances:

Don’t be misled: use φ

Figure 3: (Left) β and r² values of a subset of the mRNA pairs that are strongly proportional, coloured by their φ() values. (Right) Absolute expression levels of the 424 pairs of mRNAs with φ(clr(x_i); clr(x_j)) < 0.05 plotted on a natural scale.

Figure 4: φ() can be used in place of correlation as the basis of many familiar analyses and visualisations. (Right) A subset of mRNAs clustered using φ() as a distance metric. (Below) mRNAs with φ(clr(x_i); clr(x_j)) < 0.05 visualised as a graph with edges between strongly proportional mRNAs.

