Differential Analysis & FDR Correction
Correlation Analysis Steps
Step 1: Construction of input data table in EXCEL
Step 2: Save EXCEL file into tab delimited txt file
Step 3: Upload data - tab delimited txt file
Step 4: Choose correlation algorithm
Step 5: Enter your email and submit
Step 6: Result interpretation: global FDR
Step 7: Result interpretation: local FDR
Step 1: Construction of input data table in EXCEL

Sample          Clinical parameter   Gene   Gene   Gene
patient.1.name  …                    …      …      …
patient.2.name  …                    …      …      …
…               …                    …      …      …
…               …                    …      …      …
Input data format:
• Cell A1: “sample”
• 1st column: patient names or IDs
• 1st row:
  • Cell B1 (2nd cell of the 1st row): clinical parameter
  • Cell C1 and the cells after it: gene names
• 2nd column: values of one clinical parameter
• All other cells should be molecular data
  • one sample/patient per row
  • e.g. array intensity or protein quantities
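The layout above can be checked before upload. The following is a minimal sketch, assuming the table has already been saved as tab delimited text; the sample names, gene names, values, and the `parse_input_table` helper are all hypothetical and are not part of the web tool.

```python
import csv
import io

# Hypothetical example data in the layout described above; names and values
# are made up for illustration.
EXAMPLE_TXT = (
    "sample\tclinical.parameter\tGENE1\tGENE2\n"
    "patient.1\t5.0\t10.5\t200.0\n"
    "patient.2\t7.5\t11.0\t180.0\n"
    "patient.3\t6.1\t9.8\t210.0\n"
)


def parse_input_table(text):
    """Parse a tab delimited table: one header row, then one sample per row."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    header, body = rows[0], rows[1:]
    if header[0].strip().lower() != "sample":
        raise ValueError('the first cell must contain "sample"')
    if len(header) < 3:
        raise ValueError("need one clinical parameter column and at least one gene column")
    samples, clinical = [], []
    # Note: duplicate gene names would share one list in this simple sketch.
    genes = {name: [] for name in header[2:]}
    for line_no, row in enumerate(body, start=2):
        if len(row) != len(header):
            raise ValueError(f"row {line_no} has {len(row)} cells, expected {len(header)}")
        samples.append(row[0])            # 1st column: patient name or ID
        clinical.append(float(row[1]))    # 2nd column: clinical parameter value
        for name, value in zip(header[2:], row[2:]):
            genes[name].append(float(value))  # remaining cells: molecular data
    return samples, header[1], clinical, genes


samples, clinical_name, clinical, genes = parse_input_table(EXAMPLE_TXT)
print(samples)
print(clinical_name, clinical)
print(genes)
```

A non-numeric cell raises `ValueError` from `float`, which is a cheap way to catch stray text in the molecular data columns.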
EXCEL file example
Step 2: Save EXCEL file into tab delimited txt file
Step 3: Upload data - tab delimited txt file
Input data “input.cor.txt” selected
Step 4: Choose algorithm for correlation analysis
Choose correlation algorithm
Step 4: which one to choose?
• Rank based correlation: studies the relationship between different rankings on the same set of items
  • During the analysis, raw scores are converted to rankings
  • Spearman
  • Kendall
• Pearson product-moment correlation coefficient
To correlate a clinical variable to molecular data:
Spearman’s rank, Kendall tau, or Pearson product-moment correlation coefficient analysis?
Spearman’s Rank Correlation Coefficient Analysis:
Spearman rank correlation is used when you have two measurement variables and one “hidden” nominal variable that groups the measurements into pairs. It is a non-parametric test for correlation and is used when one or both of the variables consist of ranks.
Kendall Tau Correlation Coefficient Analysis:
Kendall's tau correlation coefficient measures the strength of the relationship between two variables. It provides a distribution-free test of independence and a measure of the strength of dependence between two variables. It requires two variables, X and Y, that are paired observations. Both variables should be at least ordinal.
Pearson Product-Moment Correlation Coefficient Analysis:
The Pearson product-moment correlation coefficient is a common measure of the correlation (linear dependence) between two variables X and Y. It is very widely used in the sciences as a measure of the strength of linear dependence between two variables, giving a value somewhere between +1 and -1 inclusive.
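To make the difference between the three coefficients concrete, here is a minimal sketch in plain Python. It is not the web tool's implementation, which the slides do not describe: it computes only the coefficients (no P-values), uses average ranks for ties in Spearman, and uses the simple tau-a form of Kendall's coefficient, which has no tie correction. The example data are made up.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation coefficient of paired lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Convert raw scores to ranks 1..n; tied values get the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_a(x, y):
    """Kendall tau-a: (concordant - discordant) / number of pairs."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            score += (s > 0) - (s < 0)
    return score / (n * (n - 1) / 2)


# Hypothetical data: the gene value rises monotonically but not linearly
# with the clinical parameter.
clinical = [1.0, 2.0, 3.0, 4.0, 5.0]
gene = [1.0, 4.0, 9.0, 16.0, 25.0]
r_pearson = pearson(clinical, gene)
r_spearman = spearman(clinical, gene)
r_kendall = kendall_tau_a(clinical, gene)
print(r_pearson, r_spearman, r_kendall)
```

In this example both rank based coefficients reach 1 because the relationship is perfectly monotonic, while Pearson stays below 1 because the relationship is not linear.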
Step 5: Enter your email and submit
Enter your email
Submit
Step 6: Result interpretation: Global FDR
Single hypothesis test = correlation between one gene and one clinical variable
FDR plot red line: Total Discoveries (TD) or Total Discovery rate = 1
FDR plot green line: False Discoveries (MEAN) or False Discovery Rate FDR (MEAN)
FDR plot black bar line: False Discoveries (MEDIAN) or False Discovery Rate FDR (MEDIAN)
FDR plot blue line: False Discoveries (95%) or False Discovery Rate FDR (95%)
FDR plot dotted black line: FDR=0.05
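The legend suggests that, at each P-value threshold, the plot shows the ratio of false discoveries (FD) to total discoveries (TD). The slides do not say how the false discoveries are estimated. The sketch below assumes they come from repeated null runs (for example, by permuting the clinical variable) and summarises them by mean, median, and 95th percentile. The function names, the nearest-rank percentile, and all P-values are hypothetical.

```python
import math
import statistics


def count_below(p_values, threshold):
    """Number of tests called significant at this P-value threshold."""
    return sum(p <= threshold for p in p_values)


def global_fdr_curves(observed_p, null_p_runs, thresholds):
    """Return (threshold, TD, mean FD/TD, median FD/TD, 95% FD/TD) rows."""
    rows = []
    for t in thresholds:
        td = count_below(observed_p, t)
        fd = sorted(count_below(run, t) for run in null_p_runs)
        if td == 0:
            rows.append((t, td, None, None, None))  # no discoveries, no ratio
            continue
        idx = max(0, math.ceil(0.95 * len(fd)) - 1)  # nearest-rank 95th percentile
        rows.append((
            t,
            td,
            min(1.0, statistics.mean(fd) / td),
            min(1.0, statistics.median(fd) / td),
            min(1.0, fd[idx] / td),
        ))
    return rows


# Hypothetical P-values: one observed analysis and four null runs.
observed_p = [0.001, 0.002, 0.004, 0.03, 0.2, 0.5, 0.7, 0.9]
null_p_runs = [
    [0.04, 0.3, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95],
    [0.008, 0.2, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],
    [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8],
    [0.02, 0.03, 0.3, 0.5, 0.6, 0.7, 0.8, 0.9],
]
rows = global_fdr_curves(observed_p, null_p_runs, [0.005, 0.01, 0.05])
for row in rows:
    print(row)

# Thresholds whose mean FD/TD falls below the 0.05 dotted line.
passing = [t for t, td, mean_fdr, *_ in rows if mean_fdr is not None and mean_fdr < 0.05]
print(passing)
```

Under these assumptions, a threshold is worth considering only where the chosen FD/TD curve lies below the FDR = 0.05 line.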
[Figure: global FDR plot. x-axis: single hypothesis test P-value thresholds; y-axis: global FDR. Curves: 1 = TD/TD, mean FD/TD, median FD/TD, 95% FD/TD; dotted line at FDR = 0.05.]
Step 6: How to read the gFDR plots
• Commonly used global FDR cutoff: 0.05
• If there are no significant features, no data points will show up below the 0.05 dotted horizontal line
Features which satisfy global FDR < 0.05
Commonly used gFDR cutoff: 0.05
Step 7: Result interpretation: local FDR
[Figure: local FDR plot. x-axis: single hypothesis test P-value; y-axis: local FDR.]
Step 7: How to read the lFDR plots
It has been suggested (Aubert, et al., 2004) that the first abrupt change in the local FDR can indicate a good threshold for selecting genuinely statistically significant features.
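The slides do not define "abrupt", so the following is only one possible reading, not the method of Aubert et al. It sorts features by P-value and reports the last feature before the local FDR first rises by at least `min_jump` between neighbours. The `first_abrupt_change` function, the `min_jump` parameter with its default, and the example values are all hypothetical.

```python
def first_abrupt_change(p_values, local_fdr, min_jump=0.1):
    """Return (p_value, local_fdr) of the last feature before the first jump.

    A "jump" here means the local FDR increases by at least min_jump between
    two neighbouring features once they are sorted by P-value. Returns None
    if no such jump exists.
    """
    pairs = sorted(zip(p_values, local_fdr))
    for (p_prev, f_prev), (p_next, f_next) in zip(pairs, pairs[1:]):
        if f_next - f_prev >= min_jump:
            return p_prev, f_prev
    return None


# Hypothetical results: the local FDR climbs slowly, then jumps after the
# third feature.
p_values = [0.001, 0.002, 0.004, 0.010, 0.020, 0.040]
local_fdr = [0.01, 0.02, 0.03, 0.25, 0.40, 0.60]
cutoff = first_abrupt_change(p_values, local_fdr)
print(cutoff)
```

Features with a P-value at or below the returned P-value would be the candidates under this heuristic. Checking the choice against the plot is still advisable, since the result depends on `min_jump`.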
[Figure: local FDR plot with the 1st abrupt change of lFDR marked. x-axis: single hypothesis test P-value; y-axis: local FDR.]
Click to download result file
Local FDR results:
• 1st column: feature name
• 2nd column: correlation test P value
• 3rd column: local FDR results
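Once downloaded, the three-column result can be filtered programmatically. This sketch assumes the result file is tab delimited and starts with a header row; the slides confirm neither. The column labels, feature names, values, the `read_local_fdr_results` helper, and the `max_lfdr` cutoff are hypothetical.

```python
import csv
import io

# Hypothetical result file content: feature name, correlation test P value,
# local FDR.
RESULT_TXT = (
    "feature\tp.value\tlocal.fdr\n"
    "GENE1\t0.0004\t0.02\n"
    "GENE2\t0.0100\t0.30\n"
    "GENE3\t0.0020\t0.04\n"
)


def read_local_fdr_results(text, has_header=True):
    """Read (feature, p_value, local_fdr) rows from tab delimited text."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    if has_header:
        next(reader)  # skip the assumed header row
    return [(name, float(p), float(lfdr)) for name, p, lfdr in reader]


results = read_local_fdr_results(RESULT_TXT)

# Keep features at or below a user-chosen local FDR cutoff, ordered by P value.
max_lfdr = 0.05  # hypothetical cutoff; choose it from the lFDR plot
selected = sorted(
    (row for row in results if row[2] <= max_lfdr),
    key=lambda row: row[1],
)
for name, p, lfdr in selected:
    print(name, p, lfdr)
```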