Page 1: undergrad thesis

Differential Gene Expression in RNA-Seq Data for Oral Squamous Cell Carcinoma Using Bioconductor

By: Kasturi P Chandwadkar

BBI 8th sem

BI-12

19-Apr-15

Page 2: undergrad thesis

Overview

• Introduction

• Methodology

• Results

• Conclusion

• References

Page 3: undergrad thesis

INTRODUCTION

• Oral squamous cell carcinoma (OSCC) accounts for 90% of oral cancers, and the risk increases with age.

• Techniques for assessing and quantifying RNA by high-throughput sequencing are collectively known as "RNA-Seq".

• RNA-Seq has been applied to profile the complex transcriptomes of mammalian samples, including human embryonic kidney cells and B cells, mouse embryonic stem cells, blastomeres, and different mouse tissues.

Page 4: undergrad thesis

ADVANTAGES OF RNA SEQ

• One advantage of RNA-Seq over other profiling technologies such as microarrays is the ability to query all transcripts without prior knowledge of the location and structure of genes.

• RNA-Seq is not limited to detecting transcripts that correspond to existing genomic sequence.

• RNA-Seq has very low background signal because DNA sequences can be unambiguously mapped to unique regions of the genome.

Page 5: undergrad thesis

R AND BIOCONDUCTOR PACKAGES

• R (http://cran.at.r-project.org) is a comprehensive statistical environment and programming language for professional data analysis and graphical display.

• Bioconductor (http://www.bioconductor.org/) provides many additional R packages for statistical data analysis in different life science areas, such as tools for microarray, sequence and genome analysis.

• Packages used for differential gene expression:

  • Biostrings

  • biomaRt

  • baySeq

  • DESeq

  • edgeR

Page 6: undergrad thesis

Methodology

• RETRIEVAL OF NGS DATA

  • The RNA-Seq data (FASTQ files) for oral squamous cell carcinoma were taken from the Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo), accession number GSE20116.

• MAPPING OF GENOMIC READS

  • The short reads are mapped (aligned) to the reference genome using Bowtie.

• GENERATING COUNT FILE

  • A count file is a matrix in which each count is the number of reads that mapped to a genomic region of the reference genome, and each Id is the annotation of that genomic region.

• GETTING DIFFERENTIALLY EXPRESSED GENES

  • edgeR

  • DESeq

  • baySeq
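The count file described above can be pictured with a small sketch. It is written in Python for illustration rather than with the R/Bioconductor tooling used in this project; the sample names, gene ids and read assignments are invented, and `build_count_matrix` is a hypothetical helper, not a function from any package.

```python
from collections import Counter

def build_count_matrix(assignments):
    """Build {gene_id: {sample: count}} from per-sample lists of gene ids.

    `assignments` maps a sample name to the gene id that each aligned
    read was assigned to (one entry per read). Hypothetical input format.
    """
    samples = sorted(assignments)
    per_sample = {s: Counter(assignments[s]) for s in samples}
    genes = sorted({g for counts in per_sample.values() for g in counts})
    # A Counter returns 0 for a gene with no reads in that sample.
    return {g: {s: per_sample[s][g] for s in samples} for g in genes}

# Toy data: each list entry stands for one aligned read.
reads = {
    "normal": ["geneA", "geneA", "geneB"],
    "tumor": ["geneA", "geneC", "geneC", "geneC"],
}
matrix = build_count_matrix(reads)
for gene, row in matrix.items():
    print(gene, row["normal"], row["tumor"])
```

Each row of the resulting matrix is one genomic region (here a gene) and each column is one sample, which is the shape of input that count-based packages such as edgeR, DESeq and baySeq work from.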

Page 7: undergrad thesis

RNA-Seq analysis pipeline for detecting DGE

SHORT READS

↓

ALIGN READS TO REFERENCE GENOME

↓

PREPARE COUNT FILE FROM SAM FILE

↓

GET DIFFERENTIAL GENE EXPRESSION: edgeR, DESeq, baySeq

↓

List of DEG from each package

↓

Venn diagram of DEG from three packages

Page 8: undergrad thesis

Results

• DATA EXPLORATION

Outlier in the data

Page 9: undergrad thesis

edgeR

Gene id    logFC        P-value
KRT36      -8.103353    7.842049e-15
SFTPB      -8.120520    2.535246e-14
CA3        -6.443105    1.804193e-13
TNNC2      -6.431288    3.040273e-13
MAGEA11     8.881312    1.124744e-12

TOP 5 DIFFERENTIALLY EXPRESSED GENES
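The logFC values that edgeR reports are base-2 logarithms, so a value can be turned into a linear fold change with 2^logFC. The Python sketch below does this for the values in the edgeR table above. Which group is the numerator is not stated on the slide, so "higher" and "lower" are only relative labels, and `linear_fold_change` is a hypothetical helper name.

```python
# logFC values copied from the edgeR table above.
log_fc = {
    "KRT36": -8.103353,
    "SFTPB": -8.120520,
    "CA3": -6.443105,
    "TNNC2": -6.431288,
    "MAGEA11": 8.881312,
}

def linear_fold_change(log2_fc):
    """Convert a log2 fold change to a linear ratio."""
    return 2.0 ** log2_fc

for gene, lfc in log_fc.items():
    ratio = linear_fold_change(lfc)
    # For a negative logFC, 1/ratio is the size of the reduction.
    magnitude = ratio if lfc >= 0 else 1.0 / ratio
    direction = "higher" if lfc >= 0 else "lower"
    print(f"{gene}: about {magnitude:.0f}-fold {direction}")
```

A logFC of 1 therefore means a doubling, and a logFC of -1 a halving.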

Page 10: undergrad thesis

DESeq

Gene id    logFC        p-value
FBP2       Infinite     1.576300e-05
TUSC5      Infinite     9.160142e-04
UTS2R      Infinite     1.520430e-03
ADIPOQ     7.231394     1.444721e-03
C6         7.162190     9.311805e-05

TOP 5 UPREGULATED GENES

Gene id    logFC        p-value
EMX1       -Infinite    1.765941e-03
VTCN1      7.289467     4.408178e-07
HOXD11     5.504204     2.041803e-04
HOXC8      5.503361     1.621344e-04
C5orf38    5.428227     9.407919e-05

TOP 5 DOWNREGULATED GENES
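The "Infinite" entries in the tables above are what a log2 ratio gives when the mean count in one of the two conditions is zero. A minimal Python sketch of that behaviour follows, assuming the fold change is a ratio of mean normalized counts; the counts are invented and `log2_fold_change` is a hypothetical helper, not a DESeq function.

```python
import math

def log2_fold_change(mean_a, mean_b):
    """Return log2(mean_b / mean_a), handling zero means explicitly."""
    if mean_a == 0 and mean_b == 0:
        return math.nan      # ratio is undefined
    if mean_a == 0:
        return math.inf      # expressed only in condition B
    if mean_b == 0:
        return -math.inf     # expressed only in condition A
    return math.log2(mean_b / mean_a)

print(log2_fold_change(0.0, 35.0))   # gene seen only in condition B
print(log2_fold_change(12.0, 0.0))   # gene seen only in condition A
print(log2_fold_change(10.0, 80.0))  # ordinary case with a finite value
```

Genes with an infinite value have no finite effect size, so they cannot be ranked by magnitude alongside genes with a finite logFC.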

Page 11: undergrad thesis

baySeq

Gene id     LIKELIHOOD    FDR
RRAGD       0.9987850     0.001214965
TGFBR3      0.9981198     0.001547566
PYGM        0.9973711     0.001908003
SH3BGRL2    0.9973000     0.002106007
PLA2G2A     0.9972789     0.002229018

TOP 5 DIFFERENTIALLY EXPRESSED GENES
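The FDR column in the table above is numerically consistent with a running average of (1 - likelihood) over the ranked genes. The Python sketch below recomputes it from the likelihood values in the table. That the column was produced this way is an inference from the numbers, not something stated on the slide, and `running_fdr` is a hypothetical helper name.

```python
# (gene, likelihood, reported FDR) copied from the baySeq table above.
rows = [
    ("RRAGD", 0.9987850, 0.001214965),
    ("TGFBR3", 0.9981198, 0.001547566),
    ("PYGM", 0.9973711, 0.001908003),
    ("SH3BGRL2", 0.9973000, 0.002106007),
    ("PLA2G2A", 0.9972789, 0.002229018),
]

def running_fdr(likelihoods):
    """Mean of (1 - likelihood) over the top k genes, for k = 1..n."""
    fdrs = []
    total = 0.0
    for k, likelihood in enumerate(likelihoods, start=1):
        total += 1.0 - likelihood
        fdrs.append(total / k)
    return fdrs

estimated = running_fdr([likelihood for _, likelihood, _ in rows])
for (gene, _, reported), est in zip(rows, estimated):
    print(f"{gene}: estimated {est:.6f}, reported {reported:.6f}")
```

Small differences in the last digits are expected because the likelihoods on the slide are rounded to seven decimal places.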

Page 12: undergrad thesis

Venn Diagram Of DGE With P-value Less Than 0.01
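A Venn diagram of this kind can be derived by filtering each package's results at the cutoff and intersecting the resulting gene sets. A minimal Python sketch follows; the gene ids and p-values are invented placeholders, not results from this study, and `significant` is a hypothetical helper.

```python
CUTOFF = 0.01  # threshold named in the slide title

# Hypothetical per-package results: gene id -> p-value.
# baySeq reports likelihoods and FDR rather than p-values; a single
# p-value-style cutoff is used for all three only to keep this short.
results = {
    "edgeR": {"g1": 0.001, "g2": 0.004, "g3": 0.2, "g4": 0.009},
    "DESeq": {"g1": 0.002, "g2": 0.03, "g4": 0.005, "g5": 0.008},
    "baySeq": {"g1": 0.0005, "g3": 0.006, "g5": 0.5},
}

def significant(pvalues, cutoff=CUTOFF):
    """Return the set of gene ids whose p-value is below the cutoff."""
    return {gene for gene, p in pvalues.items() if p < cutoff}

sig = {name: significant(pvalues) for name, pvalues in results.items()}
e, d, b = sig["edgeR"], sig["DESeq"], sig["baySeq"]

# The seven regions of a three-set Venn diagram.
venn = {
    "all three": e & d & b,
    "edgeR and DESeq only": (e & d) - b,
    "edgeR and baySeq only": (e & b) - d,
    "DESeq and baySeq only": (d & b) - e,
    "edgeR only": e - d - b,
    "DESeq only": d - e - b,
    "baySeq only": b - e - d,
}
for region, genes in venn.items():
    print(region, sorted(genes))
```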

Page 13: undergrad thesis

Conclusion

• We have demonstrated that our DGE method can be successfully applied to RNA-Seq samples from tumor and matched normal tissues.

• Using three different statistical methods to infer differential gene expression in oral squamous cell carcinoma (OSCC), we found 215 genes that were identified by all three packages.

• 1054 genes are common between edgeR and DESeq, 217 are common between DESeq and baySeq, and 278 are common between edgeR and baySeq.
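The counts above are enough to work out how many genes each pair of packages shares without the third. The short Python check below does that arithmetic, assuming the pairwise figures are total overlaps that include the 215 genes found by all three packages (the slide does not say whether they already exclude them).

```python
common_all = 215  # genes reported by all three packages

# Pairwise overlaps as stated in the conclusion.
pairwise = {
    ("edgeR", "DESeq"): 1054,
    ("DESeq", "baySeq"): 217,
    ("edgeR", "baySeq"): 278,
}

# Subtract the triple overlap to get genes shared by exactly two packages.
pair_only = {pair: n - common_all for pair, n in pairwise.items()}

for (first, second), n in pair_only.items():
    print(f"{first} and {second} but not the third package: {n}")
```

Under that assumption, 839 genes are shared only by edgeR and DESeq, 2 only by DESeq and baySeq, and 63 only by edgeR and baySeq.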

Page 14: undergrad thesis

The table below lists some of the differentially expressed genes in the cancer sample that may be related to cancer.

Gene id     Description
KRT36       keratin, type I cuticular
ADIPOQ      adiponectin C1Q and collagen domain containing
PLA2G2A     Phospholipase A2, group IIA (platelets, synovial fluid)
CEACAM7     Carcinoembryonic antigen-related cell adhesion molecule
SPINK7      Serine peptidase inhibitor, Kazal type 7 (putative); esophagus cancer related gene 22
ALDH1A2     Aldehyde dehydrogenase 1 family, member
ENDOU       Endonuclease, polyU-specific
ANGPTL1     Angiopoietins
GDF10       Growth differentiation factor 10
TUSC5       Tumor suppressor candidate 5

