Lecture 4. Topics in Gene Regulation and Epigenomics (Basics) The Chinese University of Hong Kong CSCI5050 Bioinformatics and Computational Biology


Page 1: Lecture 4. Topics in Gene Regulation and Epigenomics (Basics) The Chinese University of Hong Kong CSCI5050 Bioinformatics and Computational Biology

Lecture 4. Topics in Gene Regulation and Epigenomics (Basics)

The Chinese University of Hong Kong
CSCI5050 Bioinformatics and Computational Biology

Page 2: Lecture 4. Topics in Gene Regulation and Epigenomics (Basics) The Chinese University of Hong Kong CSCI5050 Bioinformatics and Computational Biology

CSCI5050 Bioinformatics and Computational Biology | Kevin Yip-cse-cuhk | Fall 2015 2

Lecture outline
1. Introduction to gene regulation and epigenetics
2. Problems in computational biology and bioinformatics

Last update: 26-Sep-2015

Page 3: Lecture 4. Topics in Gene Regulation and Epigenomics (Basics) The Chinese University of Hong Kong CSCI5050 Bioinformatics and Computational Biology

INTRODUCTION TO GENE REGULATION AND EPIGENETICS

Part 1

Page 4: Lecture 4. Topics in Gene Regulation and Epigenomics (Basics) The Chinese University of Hong Kong CSCI5050 Bioinformatics and Computational Biology

Gene regulation
• Here defined as the control of the amount and the products of a gene
• Amount:
  – Number of transcripts produced
  – Number of proteins produced
• Products:
  – RNAs
    • Isoforms
    • Modifications
  – Proteins
    • Modifications

Page 5: Lecture 4. Topics in Gene Regulation and Epigenomics (Basics) The Chinese University of Hong Kong CSCI5050 Bioinformatics and Computational Biology

Gene “expression”
• Gene expression is a general term for the production of gene products
• More specific terms:
  – Transcription rate (number of new transcripts per unit time)
  – Transcript level (total number of transcripts in the cell)
  – Translation rate
  – Protein level
• These quantities are correlated but not identical, and sometimes only weakly correlated

Page 6: Lecture 4. Topics in Gene Regulation and Epigenomics (Basics) The Chinese University of Hong Kong CSCI5050 Bioinformatics and Computational Biology

Gene regulation
• Expression of genes needs to be tightly regulated
  – Differentiation into different cell types
  – Response to environmental conditions
• How are genes regulated?
  – Transcriptional
  – Post-transcriptional
  – Translational
  – Post-translational
• Analogy: lighting control

Page 7: Lecture 4. Topics in Gene Regulation and Epigenomics (Basics) The Chinese University of Hong Kong CSCI5050 Bioinformatics and Computational Biology

A simple illustration
[Figure: genes G1 to G4 and proteins P1, P3, P5, P6 and P7, with methylation (Me) and acetylation (Ac) marks]

Page 8: Lecture 4. Topics in Gene Regulation and Epigenomics (Basics) The Chinese University of Hong Kong CSCI5050 Bioinformatics and Computational Biology

A simple illustration
[Figure: the illustration of genes G1 to G4, proteins P1, P3, P5, P6 and P7, and Me/Ac marks, annotated with the following mechanisms]
• Transcription factor binding
• Histone modifications
• Chromatin accessibility
• Protein-protein interactions and DNA long-range interactions
• Protein-RNA interactions
• miRNA-mRNA interactions
• DNA methylation

Page 9: Lecture 4. Topics in Gene Regulation and Epigenomics (Basics) The Chinese University of Hong Kong CSCI5050 Bioinformatics and Computational Biology

More details and other mechanisms
• Transcriptional regulation
  – Transcription factors
    • Binding to promoter vs. distal elements (e.g., enhancers)
    • Activators vs. repressors
• Post-transcriptional regulation
  – Capping
  – Poly-adenylation
  – Splicing
  – RNA editing
  – mRNA degradation
• Translation
  – Translational repression
• Post-translational
  – Protein modifications (e.g., phosphorylation)

Image source: http://www.emunix.emich.edu/~rwinning/genetics/eureg.htm

Page 10: Lecture 4. Topics in Gene Regulation and Epigenomics (Basics) The Chinese University of Hong Kong CSCI5050 Bioinformatics and Computational Biology

Epigenetics
• Wikipedia: “the study of heritable changes in gene expression or cellular phenotype caused by mechanisms other than changes in the underlying DNA sequence”
  – Heritable: can be passed on to offspring (daughter cells)
  – Mechanisms other than changes in DNA
    • Same DNA, different outcomes

Page 11: Lecture 4. Topics in Gene Regulation and Epigenomics (Basics) The Chinese University of Hong Kong CSCI5050 Bioinformatics and Computational Biology

Active and inactive epigenetic signals
• DNA methylation
• Chromatin remodeling
• Histone modifications
• RNA transcripts
• ...

Image credit: Zhou et al., Nature Reviews Genetics 12(1):7-18, (2011)

Page 12: Lecture 4. Topics in Gene Regulation and Epigenomics (Basics) The Chinese University of Hong Kong CSCI5050 Bioinformatics and Computational Biology

DNA methylation
• Methyl group (-CH3) added to cytosine in eukaryotic DNA, usually next to a guanine (in a CpG dinucleotide)
  – Forming 5-methylcytosine
    • Can be further modified into 5-hydroxymethylcytosine
• Hypermethylation at promoter can cause gene repression
• Recent studies have suggested links between DNA methylation and
  – Protein binding
  – Transcriptional elongation
  – Splicing
• Gene imprinting: parent-specific expression
• Implications in diseases
• De novo vs. maintenance methylation

Image source: http://www.zymoresearch.com/media/images/products/D5405-2.jpg, http://missinglink.ucsf.edu/lm/genes_and_genomes/methylation.html
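To make the notion of a CpG dinucleotide concrete, here is a minimal sketch that scans one strand of a DNA string for a C immediately followed by a G. The function name `cpg_positions` and the example sequence are invented for illustration; measuring which of these sites are actually methylated requires experimental data such as bisulfite sequencing.

```python
def cpg_positions(seq):
    """Return the 0-based positions of the C in every CpG dinucleotide."""
    s = seq.upper()
    return [i for i in range(len(s) - 1) if s[i] == "C" and s[i + 1] == "G"]


# Hypothetical sequence, for illustration only.
example = "ACGTCGGCG"
print(cpg_positions(example))  # [1, 4, 7]
```

Because the reverse complement of CG is also CG, every site found on one strand corresponds to a CpG site on the opposite strand as well.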

Page 13: Lecture 4. Topics in Gene Regulation and Epigenomics (Basics) The Chinese University of Hong Kong CSCI5050 Bioinformatics and Computational Biology

Chromatin remodeling
• Chromatin: compact structure of DNA and proteins
  – DNA wraps around histone proteins to form nucleosomes
  – Nucleosome positioning can be changed dynamically, affecting DNA accessibility (e.g., to binding proteins)

Image credit: Wikipedia

Page 14: Lecture 4. Topics in Gene Regulation and Epigenomics (Basics) The Chinese University of Hong Kong CSCI5050 Bioinformatics and Computational Biology

Histone modifications
• Modification of specific residues on histone proteins
  – Acetylation, methylation, phosphorylation, ubiquitination, etc.
  – Nomenclature: H3K4me3 (histone protein H3, lysine 4, tri-methylation)
  – Histone modifications give different types of signals in gene regulation

Image credit: Zhou et al., Nature Reviews Genetics 12(1):7-18, (2011)
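The nomenclature above can be read mechanically: histone, residue letter, residue position, modification. The sketch below is one possible parser, not a complete implementation of the naming scheme. The regular expression, the lookup tables, and the function name `parse_histone_mark` are assumptions made for this example, and only a handful of common residues and modifications are covered.

```python
import re

# Assumed pattern: histone (H1, H2A, H2B, H3, H4), one-letter residue,
# residue position, then a modification suffix.
HISTONE_MARK = re.compile(r"^(H(?:1|2A|2B|3|4))([KRST])(\d+)(me[123]|ac|ph|ub)$")

RESIDUES = {"K": "lysine", "R": "arginine", "S": "serine", "T": "threonine"}
MODIFICATIONS = {
    "me1": "mono-methylation",
    "me2": "di-methylation",
    "me3": "tri-methylation",
    "ac": "acetylation",
    "ph": "phosphorylation",
    "ub": "ubiquitination",
}


def parse_histone_mark(name):
    """Split a mark name such as 'H3K4me3' into its four components."""
    match = HISTONE_MARK.match(name)
    if match is None:
        raise ValueError(f"unrecognized histone mark: {name!r}")
    histone, residue, position, mod = match.groups()
    return histone, RESIDUES[residue], int(position), MODIFICATIONS[mod]


print(parse_histone_mark("H3K4me3"))  # ('H3', 'lysine', 4, 'tri-methylation')
```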

Page 15: Lecture 4. Topics in Gene Regulation and Epigenomics (Basics) The Chinese University of Hong Kong CSCI5050 Bioinformatics and Computational Biology

Non-coding RNA
• There are different types of functional RNA that are not translated into proteins

Type                   Abbreviation   Function
Ribosomal RNA          rRNA           Translation
Transfer RNA           tRNA           Translation
Small nuclear RNA      snRNA          Splicing
Small nucleolar RNA    snoRNA         Nucleotide modifications
MicroRNA               miRNA          Gene regulation
Small interfering RNA  siRNA          Gene regulation
…                      …              …

Page 16: Lecture 4. Topics in Gene Regulation and Epigenomics (Basics) The Chinese University of Hong Kong CSCI5050 Bioinformatics and Computational Biology

MicroRNA
• Short (~22 nucleotides) RNAs that regulate gene expression by promoting mRNA degradation or repressing translation

Image credit: wikipedia, Sun et al., Annual Review of Biomedical Engineering 12:1-27, (2010)

Page 17: Lecture 4. Topics in Gene Regulation and Epigenomics (Basics) The Chinese University of Hong Kong CSCI5050 Bioinformatics and Computational Biology

Gene regulation and epigenetics
• Some mechanisms are known to regulate gene expression. For example:
  – Transcription factor binding can activate or repress transcription
  – miRNA-mRNA binding can promote mRNA cleavage or repress translation
• Some signals are correlated with expression, but the causal direction is not certain (or not fixed). For example:
  – Promoter DNA methylation and transcriptional repression
  – Histone modifications and expression levels
• The different mechanisms are not independent.

Page 18: Lecture 4. Topics in Gene Regulation and Epigenomics (Basics) The Chinese University of Hong Kong CSCI5050 Bioinformatics and Computational Biology

High-throughput methods (recap)
• Protein-DNA binding (ChIP-seq, ChIP-exo, ...)
• DNA long-range interactions (ChIA-PET, Hi-C, TCC, ...) [project]
• DNA methylation (bisulfite sequencing, RRBS, MeDIP-seq, MBDCap-seq, ...) [project]
• Open chromatin (DNase-seq, FAIRE-seq, ...)
• Histone modifications (ChIP-seq)
• Gene expression (RNA-seq, CAGE, ...), isoforms [project]
• Protein-RNA binding (CLIP-Seq, HITS-CLIP, PAR-CLIP, RIP-seq, ...) [project]
• ...

Page 19: Lecture 4. Topics in Gene Regulation and Epigenomics (Basics) The Chinese University of Hong Kong CSCI5050 Bioinformatics and Computational Biology

PROBLEMS IN COMPUTATIONAL BIOLOGY AND BIOINFORMATICS

Part 2

Page 20: Lecture 4. Topics in Gene Regulation and Epigenomics (Basics) The Chinese University of Hong Kong CSCI5050 Bioinformatics and Computational Biology

Some related CBB Problems
• Analysis of chromatin patterns [project]
• Identification of regulatory elements [lecture, discussion paper]
• Reconstruction of transcription factor (TF) regulatory networks [project]
• Identification of non-coding RNAs [project]
• Prediction of miRNA targets [project]
• Construction of gene expression models [project]

Page 21: Lecture 4. Topics in Gene Regulation and Epigenomics (Basics) The Chinese University of Hong Kong CSCI5050 Bioinformatics and Computational Biology

Analysis of chromatin patterns
• Computational tasks:
  – Segmentation of the human genome
    • Fixed-size bins
    • Based on annotation
    • Unsupervised clustering
      – Hidden Markov models
    • Supervised classification
  – Data aggregation and integration
  – Large-scale correlations
    • Learning of signal shapes
  – Visualization
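The simplest segmentation listed above, fixed-size bins, can be sketched as follows. This is only an illustration under assumed inputs: `signal` stands for a hypothetical per-base signal track (for example, read coverage of one histone mark), and the bin size is arbitrary. The function name `bin_signal` is also made up for this example.

```python
import numpy as np


def bin_signal(signal, bin_size):
    """Average a per-base signal over consecutive fixed-size bins.

    The last bin may be shorter than bin_size if the length is not a multiple.
    """
    signal = np.asarray(signal, dtype=float)
    n_bins = int(np.ceil(len(signal) / bin_size))
    return np.array(
        [signal[i * bin_size:(i + 1) * bin_size].mean() for i in range(n_bins)]
    )


# Hypothetical per-base signal, binned with a bin size of 2.
print(bin_signal([1, 1, 3, 3, 5], 2))
```

The binned values from several tracks can then be stacked into one feature vector per bin, which is the usual input to the clustering or classification steps.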

Page 22: Lecture 4. Topics in Gene Regulation and Epigenomics (Basics) The Chinese University of Hong Kong CSCI5050 Bioinformatics and Computational Biology

Genome segmentation
• Using chromatin state to segment the genome
  – Hidden Markov model
  – Clustering
• Annotate identified states using biological knowledge

Image credit: Ernst and Kellis, Nature Methods 9(3):215-216, (2012)
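To give a feel for how a hidden Markov model turns per-bin observations into segments, the sketch below runs Viterbi decoding on a toy two-state model. It is not the method of any particular published tool. The states (0 = "inactive", 1 = "active"), the binary observations (mark absent or present in a bin), and all probabilities are assumptions chosen for illustration. In practice the parameters are learned from data and each bin carries several marks.

```python
import numpy as np


def viterbi(obs, start_p, trans_p, emit_p):
    """Most likely hidden-state path of a discrete HMM, computed in log space."""
    obs = np.asarray(obs)
    n_states = len(start_p)
    log_t = np.log(trans_p)   # log_t[i, j]: log P(next state j | current state i)
    log_e = np.log(emit_p)    # log_e[i, o]: log P(observation o | state i)
    score = np.log(start_p) + log_e[:, obs[0]]
    back = np.zeros((len(obs), n_states), dtype=int)
    for t in range(1, len(obs)):
        cand = score[:, None] + log_t          # cand[i, j]: best path ending i -> j
        back[t] = cand.argmax(axis=0)          # best predecessor of each state j
        score = cand.max(axis=0) + log_e[:, obs[t]]
    # Trace back from the best final state.
    path = [int(score.argmax())]
    for t in range(len(obs) - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]


# Hypothetical parameters: states tend to persist, and the mark is
# mostly present in the "active" state.
start_p = [0.5, 0.5]
trans_p = [[0.9, 0.1], [0.1, 0.9]]
emit_p = [[0.9, 0.1], [0.2, 0.8]]
bins = [0, 0, 1, 1, 1, 0, 0]   # mark absent/present in seven consecutive bins
print(viterbi(bins, start_p, trans_p, emit_p))  # [0, 0, 1, 1, 1, 0, 0]
```

Consecutive bins with the same decoded state form one segment, which can then be annotated using biological knowledge.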

Page 23: Lecture 4. Topics in Gene Regulation and Epigenomics (Basics) The Chinese University of Hong Kong CSCI5050 Bioinformatics and Computational Biology

Global chromatin patterns
• Many recent findings relate chromatin patterns to other features
  – Global example: histone modifications, recombination rates and chromosome 1D and 3D structures in C. elegans

Image credit: Gerstein et al., Science 330(6012):1775-1787, (2010)

Page 24: Lecture 4. Topics in Gene Regulation and Epigenomics (Basics) The Chinese University of Hong Kong CSCI5050 Bioinformatics and Computational Biology

Local chromatin patterns
  – Histone modifications and protein binding at promoters and enhancers in human

Image credit: Heintzman et al., Nature Genetics 39(3):311-318, (2007)

Page 25: Lecture 4. Topics in Gene Regulation and Epigenomics (Basics) The Chinese University of Hong Kong CSCI5050 Bioinformatics and Computational Biology

Identifying regulatory elements
• There are different types of protein-binding regions in the DNA
  – Promoters
  – Enhancers
  – Silencers
  – Insulators
  – ...
• How to locate them in the genome?

Image credit: Raab and Kamakaka, Nature Reviews Genetics 11(6):439-446, (2010)

Page 26: Lecture 4. Topics in Gene Regulation and Epigenomics (Basics) The Chinese University of Hong Kong CSCI5050 Bioinformatics and Computational Biology

Identifying regulatory elements
• Some useful information:
  – Genomic location
    • E.g., promoters are around transcription start sites
  – Evolutionary conservation
    • Functional regions are more conserved
  – Protein binding signals and motifs
    • E.g., EP300 at enhancers, CTCF at insulators
  – Chromatin features
    • E.g., DNase I hypersensitivity, H3K4me1 and H3K27ac at active enhancers
  – Reporter assays
  – ...
• Difficulty: integrating different types of information

Page 27: Lecture 4. Topics in Gene Regulation and Epigenomics (Basics) The Chinese University of Hong Kong CSCI5050 Bioinformatics and Computational Biology

Reconstruction of TF network
• Goals:
  – Identifying TF binding sites
  – Determining the target genes of each TF
    • In different cell types
    • In different conditions
  – Deducing how gene expression is regulated by TFs
  – Studying how TFs interact with each other
• Methods:
  1. From expression data
  2. Sequence-based (motif analysis)
  3. From binding experiments
  – Sign of regulation (activation vs. repression) usually not determined for #2 and #3

Page 28: Lecture 4. Topics in Gene Regulation and Epigenomics (Basics) The Chinese University of Hong Kong CSCI5050 Bioinformatics and Computational Biology

Expression-based methods
• Input: expression levels of genes
  – Usually from microarrays
  – Often time series data
• Output: a network (i.e., directed graph)
  – Each node is a gene (and its protein product)
  – An A→B edge means A is a TF and it regulates B
• Types:
  – Differential equations
  – Probabilistic networks
  – Boolean networks

[Figure: line plot of the expression levels (vertical axis, 0 to 1) of genes G1 to G10 (horizontal axis, 0 to 200)]

Page 29: Lecture 4. Topics in Gene Regulation and Epigenomics (Basics) The Chinese University of Hong Kong CSCI5050 Bioinformatics and Computational Biology

Expression-based methods
• Differential equations
  – Models (y_j: expression level of gene j, a_ji: influence of TF i on gene j):
    • Linear
    • Sigmoidal
    • ...
  – Methods:
    • Solve system of equations to get best-fit parameter values
  – Difficulties:
    • Many parameters when there are many TFs
      – Insufficient training data
      – L1 (LASSO) regularization to control the number of non-zero variables
    • Long running time

Linear model:
$$\frac{dy_j}{dt} = a_{j0} + \sum_{k=1}^{m} a_{jk} y_k = a_{j0} + \sum_{k \neq j} a_{jk} y_k + a_{jj} y_j$$

Sigmoidal model:
$$\frac{dy_j}{dt} = \frac{b_{j1}}{1 + \exp\left(-\left(a_{j0} + \sum_{k} a_{jk} y_k\right)\right)} - b_{j2} y_j$$
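As a rough sketch of "solving the system of equations to get best-fit parameter values", the code below fits a linear model of the assumed form dy_j/dt = a_j0 + sum_k a_jk y_k to time-series data. It approximates the derivative by forward differences and uses ordinary least squares rather than the L1-regularized fit mentioned on the slide. The function names, the two-gene example, and all parameter values are hypothetical.

```python
import numpy as np


def simulate_euler(y0, a0, A, dt, n_steps):
    """Generate a toy time series from the linear model with Euler steps."""
    Y = [np.asarray(y0, dtype=float)]
    for _ in range(n_steps):
        Y.append(Y[-1] + dt * (a0 + A @ Y[-1]))
    return np.array(Y)


def fit_linear_ode(Y, dt):
    """Estimate a0 (length m) and A (m x m, A[j, k] = a_jk) from Y of shape (T, m)."""
    dY = (Y[1:] - Y[:-1]) / dt                        # forward-difference derivative
    X = np.hstack([np.ones((len(dY), 1)), Y[:-1]])    # columns: intercept, y_1..y_m
    coef, *_ = np.linalg.lstsq(X, dY, rcond=None)     # shape (m + 1, m)
    return coef[0], coef[1:].T


# Hypothetical two-gene system: each gene decays and is activated by the other.
true_a0 = np.array([0.1, 0.05])
true_A = np.array([[-0.5, 0.2], [0.3, -0.4]])
Y = simulate_euler([1.0, 0.0], true_a0, true_A, dt=0.1, n_steps=30)
a0_hat, A_hat = fit_linear_ode(Y, dt=0.1)
print(np.round(a0_hat, 3))
print(np.round(A_hat, 3))
```

With m genes this model has m(m + 1) parameters, which is why the slide flags insufficient training data and suggests regularization when there are many TFs.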

Page 30: Lecture 4. Topics in Gene Regulation and Epigenomics (Basics) The Chinese University of Hong Kong CSCI5050 Bioinformatics and Computational Biology

Boolean networks
• Consider each gene to be either on or off
• Treat the gene regulatory network as a Boolean network (similar to an electric circuit)
  – Expression of a gene at time t+1 depends on the expression of genes that regulate it at time t
  – Goal: Find the logical relationships between genes

Image credit: Akutsu and Miyano, Pacific Symposium on Biocomputing 4:17-28, (1999)
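The update rule described above (the state at time t+1 is a Boolean function of the regulators' states at time t) can be simulated in a few lines. The sketch below only simulates a network with known rules; it does not perform the inference task of finding the rules from data. The three genes A, B, C and their rules are invented for illustration.

```python
def step(state, rules):
    """Synchronous update: every gene reads the state at time t."""
    return {gene: bool(rule(state)) for gene, rule in rules.items()}


def simulate(state, rules, n_steps):
    """Return the list of states from time 0 to time n_steps."""
    trajectory = [dict(state)]
    for _ in range(n_steps):
        state = step(state, rules)
        trajectory.append(state)
    return trajectory


# Hypothetical rules: C represses A, A activates B, A AND B activate C.
rules = {
    "A": lambda s: not s["C"],
    "B": lambda s: s["A"],
    "C": lambda s: s["A"] and s["B"],
}

start = {"A": True, "B": False, "C": False}
for t, s in enumerate(simulate(start, rules, 5)):
    print(t, s)
```

With these particular rules the network returns to its starting state after five updates, a simple example of the cyclic behavior that Boolean networks can produce.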

Page 31: Lecture 4. Topics in Gene Regulation and Epigenomics (Basics) The Chinese University of Hong Kong CSCI5050 Bioinformatics and Computational Biology

From binding data
• Input: binding signals of transcription factors in the whole genome
  – Usually from ChIP-chip or ChIP-seq
  – Or from motifs
  – (Best to combine both)
• Output: TF regulatory network
• Difficulties:
  – Finding binding sites
    • Peak calling
    • Motif analysis
  – Associating binding sites with target genes
    • Promoters (e.g., 500bp upstream of transcription start site)
    • More difficult for distal binding sites
    • Expression patterns could help
  – Evaluating functional effects of binding (strong vs. weak, transient binding)
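The promoter-based association step can be sketched as below, assuming that peak calling has already produced one summit coordinate per binding site. The promoter is taken as the 500 bp immediately upstream of the transcription start site (TSS), following the example on the slide, with strand taken into account. The data layout, gene names, TF names, and coordinates are all hypothetical, and distal binding sites are simply ignored.

```python
def promoter_interval(tss, strand, upstream=500):
    """Promoter as the `upstream` bp immediately upstream of the TSS."""
    if strand == "+":
        return tss - upstream, tss
    return tss, tss + upstream   # upstream of a minus-strand gene has larger coordinates


def assign_peaks(peaks, genes, upstream=500):
    """Map each TF to the set of genes with one of its peak summits in their promoter.

    peaks: list of (tf, chrom, summit); genes: {name: (chrom, tss, strand)}.
    """
    targets = {}
    for name, (chrom, tss, strand) in genes.items():
        start, end = promoter_interval(tss, strand, upstream)
        for tf, p_chrom, summit in peaks:
            if p_chrom == chrom and start <= summit <= end:
                targets.setdefault(tf, set()).add(name)
    return targets


# Hypothetical annotation and peaks.
genes = {"geneA": ("chr1", 1000, "+"), "geneB": ("chr1", 5000, "-")}
peaks = [
    ("TF1", "chr1", 800),    # inside the geneA promoter
    ("TF1", "chr1", 5300),   # inside the geneB promoter (minus strand)
    ("TF2", "chr1", 1200),   # downstream of the geneA TSS, not assigned
    ("TF2", "chr2", 900),    # different chromosome, not assigned
]
print(assign_peaks(peaks, genes))
```

Each (TF, gene) pair in the result corresponds to one directed edge of the TF regulatory network. The sign of the regulation is not determined by this step.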

Page 32: Lecture 4. Topics in Gene Regulation and Epigenomics (Basics) The Chinese University of Hong Kong CSCI5050 Bioinformatics and Computational Biology

Combining both types of data
1. Use expression data to infer initial network
2. Identify potential regulators
3. Search for binding motifs of these regulators
4. Incorporate global occurrence of these motifs at gene promoters to refine the network

Image credit: Tamada et al., Bioinformatics 19(Suppl.2):ii227-ii236, (2003)

Page 33: Lecture 4. Topics in Gene Regulation and Epigenomics (Basics) The Chinese University of Hong Kong CSCI5050 Bioinformatics and Computational Biology

Identification of non-coding RNAs
• High-throughput experiments have recently shown that a vast amount of DNA is transcribed into RNA
• What are these transcripts?
  – Experimental artifacts?
  – Unannotated protein-coding genes?
  – Non-functional transcripts?
  – Functional non-coding RNAs?
  – Pseudogenes?

Page 34: Lecture 4. Topics in Gene Regulation and Epigenomics (Basics) The Chinese University of Hong Kong CSCI5050 Bioinformatics and Computational Biology

Construction of expression models
• Given the many different mechanisms involved in gene regulation, how are they related to each other?
  – Are they redundant?
  – Do they simply add to each other, or have synergistic effects?
  – Which have more impact on final expression levels?
  – What are their time scales?
  – When is each mechanism used?

Page 35: Lecture 4. Topics in Gene Regulation and Epigenomics (Basics) The Chinese University of Hong Kong CSCI5050 Bioinformatics and Computational Biology

Construction of expression models
• Modeling and prediction
  – An indirect way to estimate how good a model is: evaluating the accuracy of predicted expression
• Prediction of:
  – Expression level
    • Regression: y_i ≈ f(x_i)
    • Classification: (y_i > t) ≈ f(x_i)
  – Differential expression
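The two prediction settings can be illustrated with a deliberately simple model. In the sketch below, x_i is a vector of hypothetical chromatin feature values for gene i, y_i is its expression level, and f is a linear function fitted by least squares. The classification variant is obtained here by thresholding the regression output at t, which is a simplification rather than a dedicated classifier. All data are synthetic and the function names are made up.

```python
import numpy as np


def fit_linear(X, y):
    """Least-squares fit of y ≈ w[0] + X @ w[1:]."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return w


def predict(X, w):
    return w[0] + X @ w[1:]


def pearson(a, b):
    """Regression accuracy: correlation between predicted and observed levels."""
    return float(np.corrcoef(a, b)[0, 1])


def threshold_accuracy(y_true, y_pred, t):
    """Classification accuracy: fraction of genes whose (y > t) label is predicted."""
    return float(np.mean((y_true > t) == (y_pred > t)))


# Synthetic data: 200 genes, 3 features, linear signal plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 1.0 + X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.5, size=200)

# Fit on the first half, evaluate on the held-out second half.
w = fit_linear(X[:100], y[:100])
y_hat = predict(X[100:], w)
print("correlation:", round(pearson(y[100:], y_hat), 3))
print("accuracy at t=1:", round(threshold_accuracy(y[100:], y_hat, 1.0), 3))
```

Evaluating on genes that were not used for fitting is what makes the prediction accuracy a meaningful, if indirect, measure of how good the model is.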

Page 36: Lecture 4. Topics in Gene Regulation and Epigenomics (Basics) The Chinese University of Hong Kong CSCI5050 Bioinformatics and Computational Biology

Construction of expression models
• Chromatin features and expression

Image credit: Cheng et al., Genome Biology 12(2):R15, (2011)

Page 37: Lecture 4. Topics in Gene Regulation and Epigenomics (Basics) The Chinese University of Hong Kong CSCI5050 Bioinformatics and Computational Biology

Construction of expression models
• Model construction and accuracy

Image credit: Cheng et al., Genome Biology 12(2):R15, (2011)

Page 38: Lecture 4. Topics in Gene Regulation and Epigenomics (Basics) The Chinese University of Hong Kong CSCI5050 Bioinformatics and Computational Biology

“Histone code” hypothesis
• The statistical models are good, but too complex for humans to interpret
• Is there a simple set of rules (i.e., a “code”) that can easily tell the expression level of a gene?

Image credit: Cheng et al., Genome Biology 12(2):R15, (2011)

Page 39: Lecture 4. Topics in Gene Regulation and Epigenomics (Basics) The Chinese University of Hong Kong CSCI5050 Bioinformatics and Computational Biology

Summary
• “Gene expression” is a general term with several possible meanings
• Gene expression is regulated by many mechanisms, including (but not limited to)
  – Transcription factor binding
  – DNA long-range interactions
  – DNA methylation
  – Chromatin structure
  – Histone modifications
  – MicroRNA-mRNA binding
• A lot of new genome-wide data
• Many emerging research topics in CBB