Differential Gene Expression: Ischemic vs. Nonischemic Jing Hu Dongmei Li Shuyan Wan Richard Yamada Jeong-Me Yoon Zailong Wang (Mentor)

1. Differential Gene Expression: Ischemic vs. Nonischemic

  • Jing Hu, Dongmei Li, Shuyan Wan, Richard Yamada, Jeong-Me Yoon, Zailong Wang (Mentor)

2. Outline for Our Talk

  • Introduction and summary of previous work (Richard)
  • Exploratory Analysis of Data (Jeong-Mi)
  • Statistical Methods (Shuyan)
  • Selected Gene Analysis (Jing)
  • Conclusions and Further Work (Dongmei)

3. Human Heart Function

4. Arteries

5. What is Ischemic Cardiomyopathy?

  • Ischemic: lack of blood and oxygen
  • Cardio: refers to the heart
  • Myopathy: muscle-related disease
  • Ischemic cardiomyopathy is a medical term that doctors use to describe patients who have congestive heart failure that is a result of coronary artery disease (the coronary arteries are blocked).

6. Ischemic Cardiac Myopathy

  • Risk Factors: genetics, smoking, high fat diet, obesity, and prior heart problems
  • Incidence: 1 in 100, typically male, starting with middle age
  • Symptoms include: chest pain, shortness of breath, irregular/rapid pulse, and sensation of feeling the heart beat
  • Treatment Regimens: ACE inhibitors, beta blockers, angioplasty (to improve blood flow to the damaged or weakened heart muscle), and heart transplant (severe cases)

7. The Basic Scientific Question

  • What changes in cardiac transcription profiles are brought about by heart failure?
  • Two ways to attack the question: molecular biology (hypothesis-based) vs. high-throughput techniques (i.e., microarrays followed by confirmation of gene expression with qPCR)

8. Differential Expression between ischemic and non-ischemic cardiomyopathy patients

  • Gene expression analysis of ischemic and nonischemic cardiomyopathy: shared and distinct genes in the development of heart failure
  • M. Kittleson, K. Minhas, R. Irizarry, S. Ye, G. Edness, E. Breton, J. Conte, G. Tomaselli, J. Garcia, and J. Hare. Physiol. Genomics, 21:299-307, 2005

9. Methods of Kittleson et al

  • 31 cardiomyopathy vs. 6 normal patients (clinical characteristics were reasonably similar within groups)
  • Tissue taken from cardiomyopathy patients at the time of LVAD placement or cardiac transplantation
  • Identified differentially expressed genes in 2 comparisons, NICM (nonischemic cardiomyopathy: hypertrophic, valvular, alcoholic) vs. NF (nonfailing) hearts and ICM (ischemic cardiomyopathy) vs. NF hearts, using significance analysis of microarrays
  • Identified genes with FDR < 5% and absolute fold change greater than 2.0

10. Conclusions of Kittleson et al

  • No prior hypothesis; the microarray experiment was used to generate hypotheses
  • Types of genes differentially expressed (41 total): cell growth and maintenance (9), signal transduction (7), metabolism (3), cell adhesion/cell communication (2), binding (2), catalytic activity (2), nucleus (3), and other (13)

11. Conclusions of Kittleson et al

  • Predominance of fatty acid metabolism genes in NICM: the genesis of NICM might be metabolic in nature
  • Predominance of abnormalities in catalytic activity in ICM (serine proteinase inhibitors)
  • TNFRSF11B (member of TNF receptor subfamily) is significantly downregulated in ICM

12. Experimental Procedure for Data that We are Using

  • Collected myocardial samples from patients undergoing cardiac transplantation whose failure arises from ischemic cardiomyopathy and from "normal" organ donors whose hearts cannot be used for transplants
  • The transcriptional profile of the mRNA in these samples was measured with gene array technology.
  • Changes in transcriptional profiles can be correlated with the physiologic profile of heart-failure hearts acquired at the time of transplantation.

13. Working Hypothesis?

  • Based on the results of Kittleson et al., we can state a simple working hypothesis:
  • The genes we find to be differentially expressed, using our own statistical analysis of the data, should be roughly the same as those Kittleson et al. reported in their paper.

14. Exploratory Analysis of Data

  • Goal: identify genes that are differentially expressed between Ischemic and Normal.
  • Affymetrix data with two populations:
  • Expression levels of 54,675 genes are measured in 32 Ischemic samples and 14 Normal samples
  • How do we compare?

15.

  • Pre-processing:
  • Obtain only the expression measurements of the data (i.e., put them into an exprSet) using the defaults of the justRMA method:
  • bgcorrect.method = rma
  • normalize.method = quantiles
  • summary.method = liwong
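
As a rough illustration of what the quantile normalization step does, the sketch below forces every array to share the same empirical distribution by replacing each value with the mean of the values that hold the same rank across arrays. It is a simplified stand-alone version, not the Bioconductor implementation: the function name `quantile_normalize` and the toy intensities are hypothetical, and ties are broken arbitrarily rather than averaged.

```python
def quantile_normalize(columns):
    """Quantile-normalize a list of equal-length columns (one per array)."""
    n_genes = len(columns[0])
    n_arrays = len(columns)
    # Sort each array, then average the sorted values rank by rank.
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [
        sum(col[r] for col in sorted_cols) / n_arrays for r in range(n_genes)
    ]
    normalized = []
    for col in columns:
        # order[r] is the position of the r-th smallest value in this array.
        order = sorted(range(n_genes), key=lambda i: col[i])
        out = [0.0] * n_genes
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized


# Toy log-intensities for two arrays and three genes (made-up numbers).
arrays = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
print(quantile_normalize(arrays))
# [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```

After normalization, each array contains exactly the same set of values; only their assignment to genes differs.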

16.

  • Histogram of Ischemic/Normal:
  • The distribution is skewed right.
  • The range is from 4 to 14.
  • Both histograms have similar shapes.
  • Boxplot of Ischemic/Normal:
  • There are many outliers among the upper values.
  • The intensity of Ischemic is higher than that of Normal.
  • Histogram of MAD (Median Absolute Deviation) and cut-off method by MAD:
  • Keep genes with MAD > 0.1.
  • This filters out 675 genes from a total of 54,675 genes.
  • Quantile-Quantile plot:
  • A visual aid for identifying genes with unusual test statistics.
  • It shows a large deviation at the right tail.
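
The MAD cut-off can be sketched as follows. The function names and the toy expression values are hypothetical, and the unscaled MAD is an assumption: the slides do not say whether a consistency constant was applied (R's `mad` multiplies by 1.4826 by default), so the `scale` argument is left explicit.

```python
from statistics import median


def mad(values, scale=1.0):
    """Median absolute deviation of one gene across samples.

    scale=1.0 gives the raw MAD (an assumption); scale=1.4826 would
    mimic the default of R's mad().
    """
    med = median(values)
    return scale * median([abs(v - med) for v in values])


def filter_by_mad(expression, cutoff=0.1):
    """Keep only genes whose MAD across samples exceeds the cutoff."""
    return {gene: vals for gene, vals in expression.items() if mad(vals) > cutoff}


# Toy data: one nearly constant gene and one variable gene (made-up numbers).
expression = {
    "gene_flat": [7.00, 7.00, 7.05, 7.00],
    "gene_variable": [5.0, 6.0, 8.0, 9.0],
}
print(list(filter_by_mad(expression)))  # ['gene_variable']
```

Genes that barely vary across all samples carry little information about the Ischemic vs. Normal contrast, which is the motivation for dropping them before testing.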

17. 18. 19. 20. 21. 22.

  • t-Test for:
  • Mean difference between Ischemic and Normal
  • $H_0: \mu_{\text{Ischemic}} = \mu_{\text{Normal}}$ vs. $H_1: \mu_{\text{Ischemic}} \neq \mu_{\text{Normal}}$
  • We test 54,675 genes simultaneously, so we adjust for multiple testing when assessing the statistical significance of the observed associations, in order to control the false positive rate.
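
A per-gene two-sample t statistic can be sketched as below. The slides do not state which variant of the t-test was used, so the unequal-variance (Welch) form here is an assumption, and `welch_t` and the sample values are hypothetical. Turning the statistic into a p-value needs the t distribution, which is left to a statistics library.

```python
import math
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch two-sample t statistic and its approximate degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    # Squared standard error contributed by each group.
    se2_a = variance(group_a) / n_a
    se2_b = variance(group_b) / n_b
    t = (mean(group_a) - mean(group_b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t, df


# Toy log2 expression values for a single gene (made-up numbers).
ischemic = [8.1, 8.4, 8.3, 8.6]
normal = [7.0, 7.2, 7.1]
t_stat, dof = welch_t(ischemic, normal)
print(round(t_stat, 2), round(dof, 2))
```

In the full analysis this computation would be repeated once per gene, giving one raw p-value per gene for the multiple-testing step.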

23. Multiple Hypothesis Testing

  • Motivation:
  • To identify as many differentially expressed genes as possible, while incurring a relatively low proportion of false positives.
  • H0: no differential gene expression (between the Ischemic and Normal groups)
  • Large multiplicity problem: more than fifty thousand hypotheses are tested simultaneously.
  • How can we control the false positive rate genomewide? FDR or pFDR.

24. Table 1. Possible outcomes from thresholding m genes for significance (m p-values with some cutoff point applied).

                                    Called significant (reject H0)    Called not significant (accept H0)    Total
True null (H0 is true)              F (# of false positives)          m0 - F                                m0
True alternative (Ha is true)       T (# of true positives)           m1 - T                                m1
Total                               S (# of significant features)     m - S                                 m

25. False Discovery Rate

  • FDR = E(F/S)
  • To handle the case S = 0, FDR is defined as E(F/S | S > 0) P(S > 0); equivalently, define F/S = 0 if S = 0.
  • Alternatively, define pFDR = E(F/S | S > 0). When m is large, P(S > 0) is approximately 1 and FDR is approximately equal to pFDR.
  • FDR is a measure of the overall accuracy of a set of significant features.
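
To make the quantities F, T and S concrete, the sketch below counts them for one set of calls and returns the realized ratio F/S, using the convention F/S = 0 when S = 0. The FDR is the expectation of this ratio over repeated experiments, which a single data set cannot give. The truth labels are of course unknown in a real microarray study; the function name and inputs are hypothetical and only make sense for simulated data.

```python
def outcome_counts(is_null, called_significant):
    """Count false positives F, true positives T, and significant calls S."""
    false_pos = sum(
        1 for null, sig in zip(is_null, called_significant) if null and sig
    )
    n_significant = sum(1 for sig in called_significant if sig)
    true_pos = n_significant - false_pos
    # Realized false discovery proportion, defined as 0 when nothing is called.
    fdp = false_pos / n_significant if n_significant > 0 else 0.0
    return {"F": false_pos, "T": true_pos, "S": n_significant, "FDP": fdp}


# Five simulated genes: the first two are truly null (made-up labels).
is_null = [True, True, False, False, False]
called = [True, False, True, True, False]
print(outcome_counts(is_null, called))
```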

26. Linear Step-Up Procedure

27. Steps

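  • Select the desired limit $q$ on the FDR
  • Order the p-values: $p_{(1)} \le p_{(2)} \le \dots \le p_{(m)}$
  • Let $k = \max\{\, i : p_{(i)} \le \tfrac{i}{m}\, q \,\}$
  • Reject $H_{(1)}, \dots, H_{(k)}$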
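
The linear step-up procedure of Benjamini and Hochberg can be sketched in a few lines: sort the p-values, find the largest rank i whose p-value is at most i·q/m, and reject every hypothesis up to that rank. The function name `bh_reject` and the example p-values are hypothetical.

```python
def bh_reject(pvalues, q=0.05):
    """Return a list of booleans: True where the hypothesis is rejected."""
    m = len(pvalues)
    # Indices of the hypotheses ordered by increasing p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0
    for rank, idx in enumerate(order, start=1):
        # Step-up: keep the LARGEST rank that satisfies the bound.
        if pvalues[idx] <= rank * q / m:
            k = rank
    rejected = [False] * m
    for idx in order[:k]:
        rejected[idx] = True
    return rejected


print(bh_reject([0.001, 0.008, 0.039, 0.041, 0.9], q=0.05))
# [True, True, False, False, False]
```

Because the search keeps the largest qualifying rank, a small p-value can be rejected even when it misses its own bound, provided a larger p-value further down the list meets its bound. For example, with p-values 0.02, 0.03, 0.04, 0.045 and q = 0.05, only the last one meets its bound, yet all four hypotheses are rejected.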

28. FDR Adjusted P-Values

  • For an individual hypothesis,
  • FDR-adjusted p-value = the lowest level of FDR for which the hypothesis is first included in the set of rejected hypotheses

29. Data inter-dependencies

  • Between genes: co-regulation, spatial effects
  • Between measurement errors of expression levels: RNA source, normalization process, pooled variability estimation
  • Multiple testing of such data will produce correlated test statistics!

30. Correlated Test Statistics

  • Positive dependency (Benjamini & Yekutieli, 2001 and Yekutieli, 2002):
  • The linear step-up procedure controls the FDR for positively dependent test statistics.
  • This condition is satisfied by:
    • positively correlated one-sided normal and t test statistics.
    • absolute values of normal and t test statistics, when all null hypotheses are true.

31. BH and BY procedure

  • BH
    • Adjusted p-values for the Benjamini & Hochberg (1995) step-up FDR-controlling procedure (independent and positive regression dependent test statistics).
  • BY
    • Adjusted p-values for the Benjamini & Yekutieli (2001) step-up FDR-controlling procedure (general dependency structures).
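
Both sets of adjusted p-values can be computed with one routine. For BH, the adjusted value at rank i is the minimum over ranks j ≥ i of m·p(j)/j; BY uses the same formula multiplied by the constant 1 + 1/2 + … + 1/m, which makes it more conservative. This is a plain-Python sketch with a hypothetical function name, not the routine used to produce the results in these slides.

```python
def adjust_pvalues(pvalues, method="BH"):
    """Step-up FDR-adjusted p-values (method is 'BH' or 'BY')."""
    m = len(pvalues)
    if method == "BH":
        factor = 1.0
    elif method == "BY":
        # Harmonic-sum penalty that covers general dependency structures.
        factor = sum(1.0 / i for i in range(1, m + 1))
    else:
        raise ValueError("method must be 'BH' or 'BY'")
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted p-values are capped at 1
    # Walk from the largest p-value down, carrying the running minimum.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, factor * m * pvalues[idx] / rank)
        adjusted[idx] = running_min
    return adjusted


raw = [0.001, 0.008, 0.039, 0.041, 0.9]
print(adjust_pvalues(raw, "BH"))
print(adjust_pvalues(raw, "BY"))
```

Counting how many adjusted p-values fall at or below a cutoff then gives the number of genes called significant at that FDR level.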

32. Our Results

  • Number of genes called significant at each p-value cutoff (raw, BH-adjusted, BY-adjusted):

    cutoff    rawp     BH       BY
    0         17577    17577    17577
    1e-04     38400    37960    35207
    2e-04     39239    38833    35935
    3e-04     39717    39334    36373
    4e-04     40053    39690    36714
    5e-04     40321    39972    36966
    6e-04     40565    40174    37166
    7e-04     40786    40389    37370
    8e-04     40948    40569    37513
    9e-04     41096    40739    37661
    0.01      41226    40885    37781

33. Plot of sorted adjusted p-values

34. Plot of adjusted p-values vs. test statistics

35. Gene Selection Analysis

  • Further select genes based on the fold change between two conditions (Ischemic vs. Normal)
  • The fold change for each gene is calculated as the average expression over all Ischemic samples divided by the average expression over all normal samples.

36. 37. Fold change cutoff value

  • There are 1495 genes with log2(fold change) > 1, and 26 genes with log2(fold change) < -1
  • There are only 43 genes with log2(fold change) > 2, and 3 genes with log2(fold change) < -2
  • We choose the first option (|log2(fold change)| > 1), which gives 1495 + 26 = 1521 genes
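
The fold change calculation and the cutoff of 1 on the log2 scale can be sketched as follows. The function names and intensities are hypothetical, and the inputs are assumed to be on the linear intensity scale. If the expression values are already log2-transformed, as RMA output is, the log2 fold change is usually taken as the difference of the group means instead.

```python
import math
from statistics import mean


def log2_fold_change(ischemic, normal):
    """log2 of (mean Ischemic expression / mean Normal expression)."""
    return math.log2(mean(ischemic) / mean(normal))


def classify(log2_fc, cutoff=1.0):
    """Label a gene by its log2 fold change relative to the cutoff."""
    if log2_fc > cutoff:
        return "up"
    if log2_fc < -cutoff:
        return "down"
    return "unchanged"


# Toy linear-scale intensities for one gene (made-up numbers).
fc = log2_fold_change([400.0, 600.0], [100.0, 150.0])
print(fc, classify(fc))  # 2.0 up
```

A cutoff of 1 on the log2 scale corresponds to at least a two-fold change in either direction.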

38. 39. Discussion

  • Among the 54,675 mRNA transcripts present on the Affymetrix microarray platform, 675 housekeeping genes were filtered out.
  • After selecting genes with a BY-adjusted p-value below 0.0001, 35,207 genes remain for the fold change analysis.
  • After fold change selection, only 1521 genes are left for further selection.
  • Finally, 74 up-regulated genes and 26 down-regulated genes are selected from the microarray analysis for further biological verification and study.

40. Summary of the Selected Genes

  • Of the 100 genes, there are 53 genes that have known biological functions. The functions of the other 47 genes are unknown.

41. Gene Classification

  • Based on the biological process of the genes, the 100 genes can be classified in several categories.

42. Biological Function Classification

43. 44. 45.

46. Differentially Expressed Genes in ISC-Normal Comparisons

  • Among the 100 genes that are differentially expressed between ischemic and normal, the majority fall into cell adhesion, cell growth and maintenance, signal transduction, muscle contraction and development, immune response, and regulation of transcription.
  • Most of the genes in the above processes are up-regulated, except one or two genes in cell growth and maintenance and in cell adhesion.
  • Few genes belong to metabolism, inflammatory response, acute phase response and oncogenesis.

47. An important gene for Ischemic Cardiomyopathy

  • Serine proteinase inhibitors have an anti-ischemic protective effect, which has been previously observed in pigs subjected to experimentally induced myocardial ischemia (Khan 2004): aprotinin reduces reperfusion injury after regional ischemia and cardioplegic arrest. Protease inhibition may represent a molecular strategy to prevent postoperative myocardial injury after surgical revascularization with cardiopulmonary bypass.
  • It was hypothesized to be an important gene in Kittleson's paper (Physiol. Genomics, 2005).

48. The significance of the results

  • The differential expression analysis identifies genes that are either up-regulated or down-regulated in ischemic patients. These genes can be correlated with clinical parameters in heart failure patients, and the results support ongoing efforts to incorporate expression profiling-based biomarkers in determining prognosis and response to therapy in heart failure.

49. Comparison with Kittleson et al.'s Paper

  • Although only one gene is common to both analyses, this is reasonable considering the differences in sample size, tissue, and statistical analysis method.
  • However, most of the genes identified in our analysis fall into the same categories of biological function.

50. Limitation

  • Because the circumstances that make a donor heart ineligible for cardiac transplantation, such as infection or prolonged hypotension, can also affect gene expression, a normally functioning unused donor heart is not the same as a normal heart.

51. Future Work

  • First, the gene expression profile of these 100 genes needs to be verified by Northern blot or real-time RT-PCR (qPCR).
  • After verification, some genes of unknown function with high fold change can be selected so that biologists can study their functions.

52. Acknowledgements

  • MBI (Prof. Friedman and staff)
  • Professors Shili Lin and Joseph Verducci
  • Dr. Zailong Wang
  • Dr. Nusrat Rabbee