Page 1: Transcriptomics - bioinformatics.uni-muenster.de

Prof. Dr. Jürgen Gadau

Transcriptomics

Page 2: Transcriptomics - bioinformatics.uni-muenster.de

What is transcriptomics?

• The sum of all transcribed components (RNA) of a genome (DNA) is referred to as its transcriptome.

Page 3: Transcriptomics - bioinformatics.uni-muenster.de

What does my laboratory do?

N. giraulti ♂

N. vitripennis ♂

Page 4: Transcriptomics - bioinformatics.uni-muenster.de

HDL

N. giraulti ♂

N. vitripennis ♂

Male Courtship behavior

Male sex pheromone evolution

[Figure: axis from 0 to 100 with category labels V… G… V… G…]

F2 Hybrid breakdown

Prezygotic

Postzygotic

Functional and Comparative Genomics of speciation and species differences

Cuticular hydrocarbon pattern – species and sex recognition

Page 5: Transcriptomics - bioinformatics.uni-muenster.de

Evolution of Gene Regulation – Phenotypic Plasticity

Reproductive division of labor

Worker Subcastes

non-reproductive division of labor

Aggression

COOPERATION AND CONFLICT

Page 6: Transcriptomics - bioinformatics.uni-muenster.de

What is Functional Genomics?

…and how does transcriptomics factor in?

Page 7: Transcriptomics - bioinformatics.uni-muenster.de

• Functional genomics is a field of molecular biology that attempts to make use of the vast wealth of data produced by genomic and transcriptomic projects (such as genome sequencing projects and RNA-seq) to describe gene (and protein) functions and interactions.

• Functional genomics uses genomic data to study gene and protein expression and function on a global scale (genome-wide or system-wide), focusing on gene transcription, translation and protein-protein interactions, and often involving high-throughput methods.

Some Definitions…

Page 8: Transcriptomics - bioinformatics.uni-muenster.de

Genotype to phenotype

• Classic Mendelian genetics, and the way we teach genetics and evolution, looks at a single locus that we can map.

• One gene (white) → one character

• Organisms, and the genetic architecture of both simple and complex traits, are much more complex than this single-locus picture.

*In January 1910, more than a century ago, Thomas Hunt Morgan discovered his first Drosophila mutant, a white-eyed male (Morgan 1910).

White-eyed mutant*

Page 9: Transcriptomics - bioinformatics.uni-muenster.de

From Genotype to Phenotype… is a little bit more complex than we usually think!

Environment

Good flier

Bad flier

Page 10: Transcriptomics - bioinformatics.uni-muenster.de

Environment

Development

Genotype-Phenotype-Fitness-Map

Phenotype-Space

Genotype-Space

Aggressive -> Monogyny

Cooperative -> Polygyny

Fitness

…Or Behavior

GENE-EXPRESSION

Page 11: Transcriptomics - bioinformatics.uni-muenster.de

…one more complication

?

Fixed Genetic Differences versus

Phenotypic Plasticity

Page 12: Transcriptomics - bioinformatics.uni-muenster.de

Any Questions?

Page 13: Transcriptomics - bioinformatics.uni-muenster.de

qPCR

Page 14: Transcriptomics - bioinformatics.uni-muenster.de

Methods to measure Gene Regulation

Page 15: Transcriptomics - bioinformatics.uni-muenster.de

RNA-SEQ/MICROARRAY INTRODUCTION

Microarray

Page 16: Transcriptomics - bioinformatics.uni-muenster.de

RNA-Seq

Page 17: Transcriptomics - bioinformatics.uni-muenster.de

What are the advantages of RNA-Seq vs Microarray?

• Prior sequence (genome or transcriptome) not required

• Microarrays are costly and time-consuming to design and produce, and each array works for only one species

• Greater dynamic range and sensitivity than microarrays

• Improved ability to discriminate regions of high sequence identity

• Greater multiplexing of samples

Page 18: Transcriptomics - bioinformatics.uni-muenster.de

Typical RNA-Seq Strategy

Library Construction: ~300 bp insert size

Sequencing: Illumina HiSeq 2000, 104 bp paired-end reads

Assembly or Alignment: contigs built from next-gen 104 bp reads

Page 19: Transcriptomics - bioinformatics.uni-muenster.de

https://galaxyproject.org/tutorials/rb_rnaseq/

Page 20: Transcriptomics - bioinformatics.uni-muenster.de

Transcriptome pipeline using UNIX-based tools

Bowtie: Creates reference genome index for alignment (based on Broad Anocar2.0 assembly)

Tophat: Aligns RNA-Seq reads to reference index

Cufflinks and Cuffdiff: Tools for quantification and statistical analysis of RNA-Seq reads; can also improve annotation

IGV/Tablet/Apollo: Viewers for RNA-Seq output from Cufflinks

Page 21: Transcriptomics - bioinformatics.uni-muenster.de

Tools for RNA-Seq Analysis (mostly UNIX/Linux)

• Bowtie: http://bowtie-bio.sourceforge.net/index.shtml
• Tophat: http://tophat.cbcb.umd.edu/
• Samtools: http://samtools.sourceforge.net/
• Picard tools: http://picard.sourceforge.net/
• Cufflinks: http://cufflinks.cbcb.umd.edu/
• Apollo: http://apollo.berkeleybop.org/current/index.html
• IGV: http://www.broadinstitute.org/igv/
• ABySS: http://www.bcgsc.ca/platform/bioinfo/software/abyss

Page 22: Transcriptomics - bioinformatics.uni-muenster.de

Resources Needed

• Genomic sequence in FASTA format
– For alignment of RNA sequence

• Annotations in GTF (Gene Transfer Format) format
– For binning RNA-Seq reads into transcripts for quantitative analysis, as well as to help with annotation in Apollo

• RNA-Seq reads in FASTQ format (sequence and quality scores)
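As a rough illustration of what "sequence and quality scores" means in a FASTQ file, the sketch below reads simple four-line records and decodes the quality string into Phred scores. It assumes the common Phred+33 encoding and unwrapped records; the function name `parse_fastq` and the example read are hypothetical and not part of any tool named in these slides.

```python
from typing import Iterator, List, NamedTuple


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    qualities: List[int]  # one Phred score per base


def parse_fastq(text: str, offset: int = 33) -> Iterator[FastqRecord]:
    """Yield records from FASTQ text made of plain four-line records.

    Assumes Phred+33 quality encoding unless another offset is given.
    """
    lines = text.strip().splitlines()
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ text must contain four lines per record")
    for i in range(0, len(lines), 4):
        header, sequence, separator, quality = lines[i:i + 4]
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed record starting at line {i + 1}")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ")
        # Each quality character encodes one score: ASCII code minus offset.
        scores = [ord(char) - offset for char in quality]
        yield FastqRecord(header[1:], sequence, scores)


# Hypothetical single-read example.
example = "@read_1\nACGT\n+\nII5!\n"
records = list(parse_fastq(example))
print(records[0])
```

For real data, an established parser is usually the safer choice; this only shows how the two halves of a record relate.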

Page 23: Transcriptomics - bioinformatics.uni-muenster.de

Bowtie

Creates reference genome index for alignment

• Commands needed:
– bowtie-build -f (for FASTA file) reference_in.fa out_file_name
– Runtime: ~4 hrs for vertebrate genome

• Output:
– 6 files that will be used by tophat

Page 24: Transcriptomics - bioinformatics.uni-muenster.de

Tophat

Aligns RNA-Seq reads to reference index

• Commands needed:
– tophat -r (average gap distance for paired reads) -o (output directory) reference_input (name you gave output in bowtie) RNAseq_reads_1.fastq RNAseq_reads_2.fastq
– Runtime: multiple days

• Output:
– Read alignments in .bam or .sam format
– Junctions, insertions and deletions in .bed format
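The index and alignment commands can be assembled as in the sketch below, which also works out a value for `-r`. It assumes that the "~300 bp insert size" on the Typical RNA-Seq Strategy slide is the fragment length without adapters, so the gap between the two 104 bp mates is the fragment length minus both reads. All file and index names are placeholders, and the commands are only printed, not run.

```python
import shlex
from typing import List


def mate_inner_distance(fragment_length: int, read_length: int) -> int:
    """Gap between the two mates of a pair: fragment minus both reads."""
    return fragment_length - 2 * read_length


def bowtie_build_command(reference_fasta: str, index_name: str) -> List[str]:
    """Arguments for building a Bowtie index from a FASTA reference."""
    return ["bowtie-build", "-f", reference_fasta, index_name]


def tophat_command(index_name: str, reads_1: str, reads_2: str,
                   inner_distance: int, output_dir: str) -> List[str]:
    """Arguments for a paired-end TopHat alignment against that index."""
    return ["tophat", "-r", str(inner_distance), "-o", output_dir,
            index_name, reads_1, reads_2]


# Assumed values: ~300 bp fragments, 104 bp reads (from the strategy slide).
gap = mate_inner_distance(300, 104)

# Placeholder file names, for illustration only.
build = bowtie_build_command("reference_in.fa", "anole_index")
align = tophat_command("anole_index", "RNAseq_reads_1.fastq",
                       "RNAseq_reads_2.fastq", gap, "tophat_out")

print(shlex.join(build))
print(shlex.join(align))
```

Newer TopHat releases expect a Bowtie 2 index by default, so the index step may differ depending on the installed version.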

Page 25: Transcriptomics - bioinformatics.uni-muenster.de

Tophat aligns reads to reference genome

Page 26: Transcriptomics - bioinformatics.uni-muenster.de

Cufflinks

Tool for quantification and statistical analysis of RNA-Seq reads

• Commands used:
– cufflinks -G (.GTF annotation file) input_file.sam
  • Basic command that gives FPKM values and errors for each transcript in a tab-delimited file
  • Runtime: several hours (applies to all Cuff programs)
– cuffcompare -r reference_annotation.gtf input_file_tissue1.gtf (from cufflinks) input_file_tissue2.gtf
  • Secondary analysis using output from cufflinks. Gives data on pooled transcript levels, a .GTF file for annotation based on all tissues, and major isoforms found (if annotated in reference .GTF file)
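The tab-delimited FPKM output can be read with a few lines of Python. The sketch assumes the column names `tracking_id`, `FPKM`, `FPKM_conf_lo` and `FPKM_conf_hi` used in Cufflinks' `.fpkm_tracking` files; the two rows are invented placeholder values, not real results.

```python
import csv
import io
from typing import Dict, TextIO, Tuple

# Invented example rows; real files contain more columns and many more rows.
SAMPLE = (
    "tracking_id\tgene_short_name\tFPKM\tFPKM_conf_lo\tFPKM_conf_hi\n"
    "gene_1\tgeneA\t120.5\t100.0\t141.0\n"
    "gene_2\tgeneB\t3.2\t1.1\t5.3\n"
)


def read_fpkm(handle: TextIO) -> Dict[str, Tuple[float, float, float]]:
    """Map each tracking ID to (FPKM, lower bound, upper bound)."""
    table = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        table[row["tracking_id"]] = (
            float(row["FPKM"]),
            float(row["FPKM_conf_lo"]),
            float(row["FPKM_conf_hi"]),
        )
    return table


fpkm = read_fpkm(io.StringIO(SAMPLE))
for gene, (value, low, high) in fpkm.items():
    print(f"{gene}: FPKM {value} (interval {low} to {high})")
```

Looking columns up by header name rather than by position keeps the reader working if the column order differs between versions.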

Page 27: Transcriptomics - bioinformatics.uni-muenster.de

Cufflinks quantifies expression level based on FPKM

[Figure: FPKM values (axis 0 to 500) for the genes Ttn, Acta1, Mef2a and Mstn]

FPKM = Fragments per kilobase-pair per million sequenced reads

Page 28: Transcriptomics - bioinformatics.uni-muenster.de

RPKM (FPKM avoids counting fragments twice in paired-end sequencing)

• Count up the total reads in a sample and divide that number by 1,000,000. This is our “per million” scaling factor.

• Divide the read counts by the “per million” scaling factor. This normalizes for sequencing depth, giving you reads per million (RPM).

• Divide the RPM values by the length of the gene, in kilobases. This gives you RPKM.
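The three steps above can be sketched as a small function. The function name and the example numbers (500 reads on a 2,000 bp gene in a sample of 20 million reads) are made up purely to show the arithmetic.

```python
def rpkm(read_count: int, gene_length_bp: int, total_reads: int) -> float:
    """Reads per kilobase per million reads, following the three steps."""
    # Step 1: the "per million" scaling factor for this sample.
    per_million = total_reads / 1_000_000
    # Step 2: normalize for sequencing depth (reads per million, RPM).
    rpm = read_count / per_million
    # Step 3: normalize for gene length, expressed in kilobases.
    return rpm / (gene_length_bp / 1_000)


# Hypothetical gene: 500 reads, 2,000 bp long, 20 million reads in the sample.
value = rpkm(read_count=500, gene_length_bp=2_000, total_reads=20_000_000)
print(value)  # 12.5
```

For paired-end data the same arithmetic applied to fragments instead of individual reads gives FPKM.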

Page 29: Transcriptomics - bioinformatics.uni-muenster.de

WHAT TO DO WITH TRANSCRIPTOMIC DATA?

Page 30: Transcriptomics - bioinformatics.uni-muenster.de

• Sequencing cDNA
• mRNA vs unprocessed RNA (depends on library prep, but you want to get rid of the ribosomal RNAs)
• Annotate coding SNPs
• Transcript isoforms
• Splice junctions
• Count abundance of transcripts

Transcripts found by RNA-Seq

Page 31: Transcriptomics - bioinformatics.uni-muenster.de

Review: Elements of protein coding genes

Page 32: Transcriptomics - bioinformatics.uni-muenster.de

Review: Processed transcript

Page 33: Transcriptomics - bioinformatics.uni-muenster.de

Review: Alternative splicing

Page 34: Transcriptomics - bioinformatics.uni-muenster.de

Gene Orthology vs. Paralogy

Ancestor Gene A

lizard Gene A1 Gene A2

mouse Gene A1 Gene A2

paralogs (e.g. Gene A1 and Gene A2 within one species)

orthologs (e.g. lizard Gene A1 and mouse Gene A1)
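The diagram can be turned into a small rule, assuming (as the diagram suggests, since both species carry A1 and A2) that the duplication of Ancestor Gene A happened before the lizard and mouse lineages split. The function name and the `(species, copy)` tuples are hypothetical.

```python
from typing import Tuple

Gene = Tuple[str, str]  # (species, copy), e.g. ("lizard", "A1")


def relationship(gene_a: Gene, gene_b: Gene) -> str:
    """Classify two members of the Gene A family.

    Assumes the A1/A2 duplication predates the species split, so
    different copies trace back to the duplication event (paralogs)
    and the same copy in two species traces back to speciation
    (orthologs).
    """
    if gene_a == gene_b:
        return "same gene"
    if gene_a[1] != gene_b[1]:
        return "paralogs"
    return "orthologs"


print(relationship(("lizard", "A1"), ("mouse", "A1")))
print(relationship(("lizard", "A1"), ("lizard", "A2")))
print(relationship(("lizard", "A1"), ("mouse", "A2")))
```

If a duplication happened after the species split, this simple rule would no longer hold and a gene tree would be needed.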

Page 35: Transcriptomics - bioinformatics.uni-muenster.de

• Not all genes in the genome encode proteins.

• Human chromosome 11 has 1,300 protein-coding genes and 200 noncoding RNA genes, whose final product is an RNA, not a protein.

• Some are involved in the regulation of other genes, whereas others may play structural roles in various nuclear or cytoplasmic processes.

• MicroRNA (miRNA) genes are noncoding RNA genes. There are at least 250 in the human genome.

• miRNAs are short, 22-nucleotide-long noncoding RNAs, at least some of which control the expression or repression of other genes during development.

From Thompson, Medical Genetics

Review: Non-protein coding genes

Page 36: Transcriptomics - bioinformatics.uni-muenster.de

Long non-coding RNAs: where they come from

Page 37: Transcriptomics - bioinformatics.uni-muenster.de

Long non-coding RNAs have many functions in transcription and translation

Page 38: Transcriptomics - bioinformatics.uni-muenster.de

Validation of your RNA-Seq results: different methods

– Rerun the same sample on the same instrument
– Use multiple sequencing methods for the same sample.
  • Quantitative
    – qPCR
    – RNA-Seq
    – Microarrays
  • Qualitative
    – PCR
    – In situ
– Run multiple samples of same tissue type

Page 39: Transcriptomics - bioinformatics.uni-muenster.de

How does RT-qPCR work?

Page 40: Transcriptomics - bioinformatics.uni-muenster.de

How does RT-qPCR work?