Welcome to CS374 Algorithms in Biology

Welcome to CS374 Algorithms in Biology. Overview Administrivia Molecular Biology and Computation DNA, proteins, cells, evolution Some examples of



Overview

• Administrivia

• Molecular Biology and Computation

  » DNA, proteins, cells, evolution

  » Some examples of CS in biology

• Computer Scientists vs Biologists


CS374: Algorithms in Biology (cs374.stanford.edu)

1. Attendance
• At most 2 classes may be missed without affecting the grade

2. Lectures
• Most important requirement
• Select an available topic and day; send an email to Serafim and George
• Read the papers; meet with Serafim 1-2 weeks before the lecture
• Ask George any questions about the papers while preparing the presentation
• Schedule a long (2 hr) meeting with Serafim the day before the lecture
• Slides are due at noon before the lecture


3. Scribing
• Please sign up on a first-come, first-served basis
• Due 1 week after the lecture; edited and distributed 2 weeks after the lecture
• George will help you edit

4. Summaries
• Select 1 lecture among the first 10 and 1 lecture among the rest
• Find one relevant paper
• Write a 1-page summary of the paper:
  » Paper reference
  » Abstract
  » Discussion
• Ask George with questions or for feedback

5. Have fun!


Structure of DNA double helix

[Figure: DNA double helix, with each nucleotide's phosphate group, sugar, and nitrogenous base (A, C, G, T) labeled; captions “Physicist” and “Ornithologist”]
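In the double helix, A on one strand pairs with T on the other and C pairs with G, and the two strands run in opposite directions. So either strand determines its partner: complement every base, then reverse the order so the partner also reads 5' to 3'. A minimal Python sketch of that relationship (the name `reverse_complement` is our own, chosen for illustration):

```python
# Base-pairing rules of the double helix: A<->T, C<->G (upper- and lowercase).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(strand: str) -> str:
    """Return the partner strand, read 5'->3': complement each base, then reverse."""
    return strand.translate(COMPLEMENT)[::-1]


print(reverse_complement("AGTAGC"))  # GCTACT
```

Applying the function twice returns the original strand, mirroring the fact that the two strands of the helix specify each other.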


DNA to RNA, and genes

DNA: ~3×10⁹ nucleotides long in humans; contains ~22,000 genes

RNA: carries the “message” for “translating”, or “expressing”, one gene

[Figure: DNA → RNA → protein → folded protein, with the steps labeled transcription, translation, and folding]
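Transcription copies a gene's DNA into RNA, with U (uracil) in place of T; translation then reads the RNA three bases (one codon) at a time, mapping each codon to an amino acid until a stop codon is reached. A minimal sketch, assuming the input is the coding strand, reading starts at its first base, and the standard genetic code applies; `transcribe`, `translate`, and `CODON_TABLE` are illustrative names of our own:

```python
BASES = "UCAG"
# Standard genetic code, with codons ordered UUU, UUC, UUA, UUG, UCU, ...
# (first base outermost); "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def transcribe(coding_strand: str) -> str:
    """DNA coding strand -> mRNA: same sequence, with U in place of T."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Read codons from the first base; stop at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate(transcribe("ATGGCGTGCTAA")))  # MAC
```

Folding, the third step, has no comparably simple rule, which is why it appears later in this deck as an open problem.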


Structure of proteins

Composed of a chain of amino acids, each with the general structure H2N-CH(R)-COOH, where the side group R is one of 20 possible groups.

The sequence of amino acids folds to form a complex 3-D structure.

The structure of a protein is intimately connected to its function.


All living organisms are composed of cells


Genetics in the 20th Century


21st Century

AGTAGCACAGACTACGACGAGACGATCGTGCGAGCGACGGCGTAGTGTGCTGTACTGTCGTGTGTGTGTACTCTCCTCTCTCTAGTCTACGTGCTGTATGCGTTAGTGTCGTCGTCTAGTAGTCGCGATGCTCTGATGTTAGAGGATGCACGATGCTGCTGCTACTAGCGTGCTGCTGCGATGTAGCTGTCGTACGTGTAGTGTGCTGTAAGTCGAGTGTAGCTGGCGATGTATCGTGGT

AGTAGCACAGACTACGACGAGACGATCGTGCGAGCGACGGCGTAGTGTGCTGTACTGTCGTGTGTGTGTACTCTCCTCTCTCTAGTCTACGTGCTGTATGCGTTAGTGTCGTCGTCTAGTAGTCGCGATGCTCTGATGTTAGAGGATGCACGATGCTGCTGCTACTAGCGTGCTGCTGCGATGTAGCTGTCGTACGTGTAGTGTGCTGTAAGTCGAGTGTAGCTGGCGATGTATCGTGGT

AGTAGGACAGACTACGACGAGACGATCGTGCGAGCGACGGCGTAGTGTGCTGTACTGTCGTGTGTGTGTACTCTCCTCTCTCTAGTCTACGTGCTGTATGCGTTAGTGTCGTCGTCTAGTAGTCGCGATGCTCTGATGTTAGAGGATGCACGATGCTGCTGCTACTAGCGTGCTGCTGCGATGTAGCTGTCGTACGTGTAGTGTGCTGTAAGTCGAGTGTAGCTGGCGATGTATCGTGGT
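The third sequence above carries a G at position 6 where the first two carry a C: a single-nucleotide difference of the kind that distinguishes one individual's genome from another's. For sequences of equal length, such differences can be listed by a position-by-position scan; a minimal sketch (the function name is hypothetical):

```python
def point_differences(seq_a: str, seq_b: str) -> list[tuple[int, str, str]]:
    """List (1-based position, base in seq_a, base in seq_b) wherever the two differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    return [(i + 1, x, y) for i, (x, y) in enumerate(zip(seq_a, seq_b)) if x != y]


print(point_differences("AGTAGCACAG", "AGTAGGACAG"))  # [(6, 'C', 'G')]
```

This only works when the sequences line up base for base; insertions and deletions require alignment, which comes up later in the deck.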


Computational Biology

• Organize & analyze massive amounts of biological data

  » Enable biologists to use data

  » Form testable hypotheses

  » Discover new biology


DNA to RNA, and genes

[Same DNA → RNA → protein figure as on the earlier “DNA to RNA, and genes” slide, with step 1 marked]


Some examples of the central role of CS

1. Sequencing

AGTAGCACAGACTACGACGAGACGATCGTGCGAGCGACGGCGTAGTGTGCTGTACTGTCGTGTGTGTGTACTCTCCT

[Figure: a genome of 3×10⁹ nucleotides, read in pieces of ~500 nucleotides]


Computational fragment assembly
• Introduced ~1980
• 1995: assemble DNA pieces up to 1,000,000 nucleotides long
• 2000: assemble the whole human genome

A big puzzle: ~60 million pieces
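Assuming the ~500-nucleotide piece length from the previous slide, ~60 million pieces total about 60×10⁶ × 500 = 3×10¹⁰ nucleotides, roughly 10 times the 3×10⁹-nucleotide genome, so each position is covered about 10 times on average. That redundancy is what lets overlapping pieces be stitched back together. The sketch below is a toy greedy-merge assembler (repeatedly join the two pieces with the longest suffix-prefix overlap); it is not the method used for the human genome, and it ignores sequencing errors, repeats, and the reverse-complement strand. `overlap`, `greedy_assemble`, and `min_len` are hypothetical names:

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of reads with the longest overlap; return the contigs."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        # Reads contained inside another read add no new sequence.
        reads = [r for r in reads if not any(r != o and r in o for o in reads)]
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no overlaps left: remaining reads stay as separate contigs
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return reads


print(greedy_assemble(["CAGACTACG", "AGTAGCAC", "GCACAGAC"]))  # ['AGTAGCACAGACTACG']
```

The input order of the pieces does not matter here; the overlaps alone determine how they are joined.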


Complete genomes today

More than 300 complete genomes have been sequenced


DNA to RNA, and genes

[Same DNA → RNA → protein figure as on the earlier “DNA to RNA, and genes” slide, with steps 1 and 2 marked]


Where are the genes?

2. Gene Finding

In humans:
• ~22,000 genes
• ~1.5% of human DNA


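[Figure: a stretch of DNA with candidate gene signals marked: start codon atg, stop codon tga, and candidate splice-site sequences such as ggtgag, caggtg, cagatg, cagttg, caggccggtgag]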


[Figure: gene structure from 5’ to 3’: start codon ATG, Exon 1, Intron 1, Exon 2, Intron 2, Exon 3, stop codon TAG/TGA/TAA, with splice sites at the exon-intron boundaries]

2. Gene Finding: Topics in CS374
• Finding noncoding RNA genes
• Finding short words that regulate the expression of genes
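The simplest gene-finding idea suggested by the gene-structure figure is to scan for open reading frames: an ATG start codon followed, in the same reading frame, by a TAA, TAG, or TGA stop codon. A minimal sketch on one strand only; it ignores introns and splice sites, which real gene finders must model, and `find_orfs` and `min_codons` are hypothetical names:

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 2) -> list[tuple[int, int]]:
    """Return sorted (start, end) spans, 0-based and end-exclusive (stop codon included),
    of ATG...stop open reading frames in all three frames of the given strand."""
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG in this frame opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # look for the next ATG after this stop
    return sorted(orfs)


print(find_orfs("CCATGAAATTTTGACC"))  # [(2, 14)]
```

In real genomes, random ATG...stop stretches are common, so this scan produces many false candidates; that gap is what the statistical gene-finding methods covered in the course address.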


DNA to RNA, and genes

[Same DNA → RNA → protein figure as on the earlier “DNA to RNA, and genes” slide, with steps 1, 2 (labeled “easy”), and 3 marked]


3. Protein Folding

• The amino-acid sequence of a protein determines its 3D fold
• The 3D fold of a protein determines its function
• Can we predict the 3D fold of a protein given its amino-acid sequence?

Holy grail of compbio: a 35-year-old problem
Draws on molecular dynamics, robotics, machine learning, computational geometry

Topics on Proteins in CS374

1. Protein Structure
• Protein Structure Comparison
• Evolution of Protein Domains
• Molecular Dynamics & Drug Targets
• Protein Classification
• Protein Folding Dynamics
• Protein Kinetics

2. Protein Comparison
• Latest multiple alignment tools
• Selecting parameters for alignment
• Phylogenetic trees


Complete Genomes

More than 200 complete genomes have been sequenced


Evolution


Evolution at the DNA level

[Figure: changes to a DNA sequence passed on to the next generation; three marked “OK”, two marked “X”, and one marked “Still OK?”]


4. Sequence Comparison
Sequence conservation implies function

Sequence comparison is key to:
• Finding genes
• Determining function
• Uncovering evolutionary processes


Sequence Comparison: Alignment

AGGCTATCACCTGACCTCCAGGCCGATGCCC

TAGCTATCACGACCGCGGTCGATTTGCCCGAC

-AGGCTATCACCTGACCTCCAGGCCGA--TGCCC---
 || |||||||  ||||x|  || |||  |||||
TAG-CTATCAC--GACCGC--GGTCGATTTGCCCGAC

Sequence Alignment
• Introduced ~1970
• BLAST: 1990, one of the most cited papers in history
• Still a very active area of research

[Figure: BLAST compares a query sequence against a database (DB)]
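The alignment on this slide is a global alignment of two whole sequences, which can be computed exactly by dynamic programming (the Needleman-Wunsch algorithm). A minimal sketch, assuming simple illustrative scores (match +1, mismatch -1, gap -1) that are not necessarily the parameters behind the slide's alignment. BLAST works differently: it searches a large database for local matches using heuristics, trading guaranteed optimality for speed.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Global alignment of a and b; returns (score, aligned_a, aligned_b)."""

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] = best score for aligning the prefixes a[:i] and b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # pair a[i-1] with b[j-1]
                score[i - 1][j] + gap,  # a[i-1] against a gap
                score[i][j - 1] + gap,  # b[j-1] against a gap
            )
    # Trace back from the bottom-right cell to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


print(needleman_wunsch("ACGT", "AGT"))  # (2, 'ACGT', 'A-GT')
```

The table has (len(a)+1) × (len(b)+1) cells, so time and memory grow with the product of the sequence lengths; that cost is why database search relies on faster heuristics such as BLAST.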


Comparison of Human, Mouse, and Rat

Topics on Genomics in CS374

• Indexing large databases: newest BLAST techniques

• Repeat detection

• Genomic rearrangements: finding the order of shuffles between two genomes
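One basic idea behind indexing a large sequence database is a table from every length-k word (k-mer) to the positions where it occurs; each query k-mer found in the table becomes a seed that a BLAST-style search would then extend into a longer alignment. A minimal sketch of the indexing and seed-lookup steps only, with hypothetical function names and no extension step:

```python
from collections import defaultdict


def build_kmer_index(db: str, k: int) -> dict[str, list[int]]:
    """Map every length-k word of the database to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(db) - k + 1):
        index[db[i:i + k]].append(i)
    return dict(index)


def find_seeds(query: str, index: dict[str, list[int]], k: int) -> list[tuple[int, int]]:
    """Return (query_pos, db_pos) pairs where a k-mer of the query occurs in the database."""
    return [
        (q, d)
        for q in range(len(query) - k + 1)
        for d in index.get(query[q:q + k], [])
    ]


index = build_kmer_index("AGTAGCACAG", 3)
print(find_seeds("TAGC", index, 3))  # [(0, 2), (1, 3)]
```

The index is built once, after which each query k-mer is a single dictionary lookup instead of a scan of the whole database.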


5. Clustering of Microarrays
Clinical prediction of leukemia type

• 2 types: acute lymphoid (ALL) and acute myeloid (AML)
• Different treatments and outcomes
• Predict the type before treatment?

Bone marrow samples: ALL vs AML

Measure the expression level of each gene
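Each bone marrow sample becomes a vector of gene expression levels, and samples with similar profiles can be grouped by clustering those vectors. A minimal k-means sketch in plain Python, assuming Euclidean distance and farthest-point initialization; k-means is one common clustering method, not necessarily the one used in the leukemia study this slide refers to, and the expression values in the example are made up:

```python
import math


def kmeans(samples: list[list[float]], k: int, max_iter: int = 100) -> list[int]:
    """Cluster expression vectors into k groups; return one cluster label per sample."""
    # Farthest-point initialization: start from the first sample, then repeatedly
    # add the sample farthest from all centroids chosen so far.
    centroids = [list(samples[0])]
    while len(centroids) < k:
        farthest = max(samples, key=lambda s: min(math.dist(s, c) for c in centroids))
        centroids.append(list(farthest))

    labels = None
    for _ in range(max_iter):
        # Assignment step: each sample joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(s, centroids[c])) for s in samples
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [s for s, lab in zip(samples, labels) if lab == c]
            if members:
                centroids[c] = [sum(vals) / len(members) for vals in zip(*members)]
    return labels


# Hypothetical expression levels of 3 genes in 4 samples.
samples = [[5.0, 0.2, 3.1], [4.8, 0.1, 2.9], [0.3, 6.0, 0.5], [0.1, 5.7, 0.4]]
print(kmeans(samples, 2))  # [0, 0, 1, 1]
```

Clustering is unsupervised: it finds groups without being told which samples are ALL or AML, so the groups still have to be checked against known diagnoses before they can be used for prediction.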


6. Protein networks

Newer research area
• Construct networks from multiple data sources
• Navigate networks
• Compare networks across organisms

Draws on: statistics, machine learning, graph algorithms, databases

Topics on Protein Networks in CS374

1. Integration: build networks from multiple sources

2. Alignment: compare networks across species

3. Mathematical properties: modular, scale-free
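A common first step in navigating a protein interaction network is to find the shortest chain of interactions linking two proteins. A minimal breadth-first-search sketch, assuming the network is an undirected graph given as a list of interacting pairs; the function name and the protein names in the example are placeholders:

```python
from collections import deque


def shortest_path(edges: list[tuple[str, str]], source: str, target: str):
    """Return the shortest list of proteins linking source to target, or None if unconnected."""
    graph: dict[str, set[str]] = {}
    for u, v in edges:  # undirected: record each interaction in both directions
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    if source not in graph or target not in graph:
        return None
    parent = {source: None}  # also marks visited nodes
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:  # walk parent links back to the source
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbor in sorted(graph[node]):  # sorted for deterministic output
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    return None


edges = [("P1", "P2"), ("P2", "P3"), ("P3", "P4"), ("P1", "P5"), ("P5", "P4")]
print(shortest_path(edges, "P1", "P4"))  # ['P1', 'P5', 'P4']
```

Because BFS explores proteins in order of distance from the source, the first time it reaches the target it has found a path with the fewest interactions.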


7. Human evolution


Topics on Human Population Genetics in CS374

1. Evolution: finding fast-evolving genes in human populations

2. Migration: tracing the migration of humans out of Africa through genetic studies


8. Building circuits from cells


The abstract submission deadline is 11:59 pm, Sunday, October 1, 2006.


Computer Scientists vs Biologists


• (Almost) nothing is ever true or false in biology

• Everything is true or false in computer science


• Biologists strive to understand the complicated, messy natural world

• Computer scientists seek to build their own clean and organized virtual worlds


• Biologists are obsessed with being the first to discover something

• Computer scientists are obsessed with being the first to invent or prove something


• Biologists are comfortable with the idea that all data have errors

• Computer scientists are not


• Computer scientists get high-paid jobs after graduation

• Biologists typically have to complete one or more 5-year post-docs...


Computer Science is to Biology what Mathematics is to Physics

“Antedisciplinary” Science

What is computational biology?

http://compbiol.plosjournals.org/perlserv/?request=get-document&doi=10.1371/journal.pcbi.0010006