
Welcome to CS374

Algorithms in Biology

Overview

• Administrivia

• Molecular Biology and Computation

» DNA, proteins, cells, evolution

» Some examples of CS in biology

• Computer Scientists vs Biologists

CS374: Algorithms in Biology, cs374.stanford.edu

1. Attendance

• At most 2 classes missed without affecting grade

2. Lectures

• Most important requirement

• Select available topic & day, send email to Serafim and George

• Read papers, meet with Serafim 1-2 weeks before lecture

• Ask George any questions on papers while preparing presentation

• Schedule long (2 hr) meeting with Serafim the day before lecture

• Slides due at noon before lecture


3. Scribing

• Please sign up on a first-come, first-served basis

• Due 1 week after lecture; edited & distributed 2 weeks after lecture

• George will help you edit

4. Summaries

• Select 1 lecture among the first 10 and 1 lecture among the rest

• Find one relevant paper

• Write a 1-page summary of the paper

» Paper reference

» Abstract

» Discussion

• Ask George with questions or for feedback

5. Have fun!

Structure of DNA double helix

[Figure: DNA double helix built from nucleotides, each made of a phosphate group, a sugar, and a nitrogenous base (A, C, G, T); photo captions “Physicist” and “Ornithologist”]

DNA to RNA, and genes

DNA: ~3 × 10^9 nucleotides long in humans; contains ~22,000 genes

RNA: carries the “message” for “translating”, or “expressing”, one gene

[Figure: DNA → RNA → protein, with the steps labeled transcription, translation, and folding]
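To make the transcription and translation steps concrete, here is a minimal Python sketch. It assumes the input is the coding strand of DNA (so transcription amounts to replacing T with U) and uses the standard genetic code; the names `transcribe`, `translate`, and `CODON_TABLE` are hypothetical, chosen only for illustration.

```python
# Standard genetic code: amino acids for all 64 codons, with each codon position
# enumerated in U, C, A, G order ('*' marks a stop codon).
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # UUU ... UGG
    "LLLLPPPPHHQQRRRR"  # CUU ... CGG
    "IIIMTTTTNNKKSSRR"  # AUU ... AGG
    "VVVVAAAADDEEGGGG"  # GUU ... GGG
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def transcribe(coding_dna: str) -> str:
    """Return the mRNA for a coding-strand DNA sequence (T becomes U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate mRNA codon by codon until a stop codon or the end of the sequence."""
    protein = []
    for pos in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[pos:pos + 3]]
        if aa == "*":  # stop codon: translation ends here
            break
        protein.append(aa)
    return "".join(protein)


mrna = transcribe("ATGGCCTGGTAA")
print(mrna)             # AUGGCCUGGUAA
print(translate(mrna))  # MAW
```

Folding, the third arrow in the figure, has no comparably simple sketch; it is the protein folding problem discussed below.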

Structure of proteins

Composed of a chain of amino acids.

Each amino acid has the general structure H2N-CH(R)-COOH, where the side chain R is one of 20 possible groups.

The sequence of amino acids folds to form a complex 3-D structure.

The structure of a protein is intimately connected to its function.

All living organisms are composed of cells

Genetics in the 20th Century

21st Century

AGTAGCACAGACTACGACGAGACGATCGTGCGAGCGACGGCGTAGTGTGCTGTACTGTCGTGTGTGTGTACTCTCCTCTCTCTAGTCTACGTGCTGTATGCGTTAGTGTCGTCGTCTAGTAGTCGCGATGCTCTGATGTTAGAGGATGCACGATGCTGCTGCTACTAGCGTGCTGCTGCGATGTAGCTGTCGTACGTGTAGTGTGCTGTAAGTCGAGTGTAGCTGGCGATGTATCGTGGT

AGTAGCACAGACTACGACGAGACGATCGTGCGAGCGACGGCGTAGTGTGCTGTACTGTCGTGTGTGTGTACTCTCCTCTCTCTAGTCTACGTGCTGTATGCGTTAGTGTCGTCGTCTAGTAGTCGCGATGCTCTGATGTTAGAGGATGCACGATGCTGCTGCTACTAGCGTGCTGCTGCGATGTAGCTGTCGTACGTGTAGTGTGCTGTAAGTCGAGTGTAGCTGGCGATGTATCGTGGT

AGTAGGACAGACTACGACGAGACGATCGTGCGAGCGACGGCGTAGTGTGCTGTACTGTCGTGTGTGTGTACTCTCCTCTCTCTAGTCTACGTGCTGTATGCGTTAGTGTCGTCGTCTAGTAGTCGCGATGCTCTGATGTTAGAGGATGCACGATGCTGCTGCTACTAGCGTGCTGCTGCGATGTAGCTGTCGTACGTGTAGTGTGCTGTAAGTCGAGTGTAGCTGGCGATGTATCGTGGT

Computational Biology

• Organize & analyze massive amounts of biological data

» Enable biologists to use data

» Form testable hypotheses

» Discover new biology

AGTAGCACAGACTACGACGAGACGATCGTGCGAGCGACGGCGTAGTGTGCTGTACTGTCGTGTGTGTGTACTCTCCTCTCTCTAGTCTACGTGCTGTATGCGTTAGTGTCGTCGTCTAGTAGTCGCGATGCTCTGATGTTAGAGGATGCACGATGCTGCTGCTACTAGCGTGCTGCTGCGATGTAGCTGTCGTACGTGTAGTGTGCTGTAAGTCGAGTGTAGCTGGCGATGTATCGTGGT


Some examples of the central role of CS

1. Sequencing

AGTAGCACAGACTACGACGAGACGATCGTGCGAGCGACGGCGTAGTGTGCTGTACTGTCGTGTGTGTGTACTCTCCT

Genome: 3 × 10^9 nucleotides

Sequencing reads: ~500 nucleotides each


Computational Fragment Assembly

• Introduced ~1980

• 1995: assemble DNA pieces up to 1,000,000 nucleotides long

• 2000: assemble the whole human genome

A big puzzle: ~60 million pieces
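The assembly “puzzle” can be illustrated with a toy greedy assembler that repeatedly merges the pair of fragments with the longest suffix/prefix overlap. This is a minimal sketch under simplifying assumptions (error-free reads, a single strand, no repeats longer than the overlaps); it is not the method used to assemble the human genome, and `greedy_assemble`, `overlap`, and `min_overlap` are hypothetical names.

```python
from itertools import permutations


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b (>= min_len), else 0."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads: list[str], min_overlap: int = 3) -> list[str]:
    """Merge the best-overlapping pair of contigs until no overlap >= min_overlap remains."""
    # Drop reads fully contained in another read; they add no new sequence.
    contigs = [r for r in reads if not any(r != o and r in o for o in reads)]
    contigs = list(dict.fromkeys(contigs))  # remove exact duplicates, keep order
    while len(contigs) > 1:
        best_len, best_pair = 0, None
        for a, b in permutations(contigs, 2):
            olen = overlap(a, b, min_overlap)
            if olen > best_len:
                best_len, best_pair = olen, (a, b)
        if best_pair is None:  # no overlaps left: return separate contigs
            break
        a, b = best_pair
        contigs.remove(a)
        contigs.remove(b)
        contigs.append(a + b[best_len:])
    return contigs


reads = ["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"]
print(greedy_assemble(reads))  # ['ATTAGACCTGCCGGAATAC']
```

Real reads contain sequencing errors, and the human genome is full of repeats; greedy merging can collapse repeats incorrectly, which is part of what makes the ~60-million-piece puzzle hard.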

Complete genomes today

More than 300 complete genomes have been sequenced


Where are the genes?

2. Gene Finding

In humans:

• ~22,000 genes

• ~1.5% of human DNA

[Figure: gene structure from 5’ to 3’: Exon 1, Intron 1, Exon 2, Intron 2, Exon 3, with splice sites at the exon/intron boundaries]

Start codon: ATG

Stop codon: TAG/TGA/TAA
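A first, much-simplified computational view of this picture is to scan each reading frame for open reading frames (ORFs): an ATG start codon followed in frame by a TAG/TGA/TAA stop codon. This sketch ignores introns, splice sites, and the reverse strand, so it is closer to prokaryotic gene finding than to finding human genes; `find_orfs` and `min_codons` are hypothetical names.

```python
STOP_CODONS = {"TAG", "TGA", "TAA"}


def find_orfs(dna: str, min_codons: int = 2) -> list[tuple[int, int]]:
    """Return (start, end) index pairs of forward-strand ORFs.

    start is the index of the ATG; end is the index just past the stop codon.
    Per reading frame, only the first ATG before each stop codon is reported.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(dna) - 2, 3):
            codon = dna[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos
            elif start is not None and codon in STOP_CODONS:
                # ORF length in codons, counting the stop codon itself.
                if (pos + 3 - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3))
                start = None
    return sorted(orfs)


print(find_orfs("CCATGAAATTTTGACC"))  # [(2, 14)]
```

In practice a minimum ORF length far larger than 2 codons is needed to avoid the many short ORFs that occur by chance.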

2. Gene Finding: Topics in CS374

• Finding noncoding RNA genes

• Finding short words that regulate the expression of genes


3. Protein Folding

• The amino-acid sequence of a protein determines its 3D fold

• The 3D fold of a protein determines its function

• Can we predict the 3D fold of a protein given its amino-acid sequence?

Holy grail of compbio: a 35-year-old problem

Approaches draw on molecular dynamics, robotics, machine learning, and computational geometry

Topics on Proteins in CS374

1. Protein Structure

• Protein Structure Comparison

• Evolution of Protein Domains

• Molecular Dynamics & Drug Targets

• Protein Classification

• Protein Folding Dynamics

• Protein Kinetics

2. Protein Comparison

• Latest multiple alignment tools

• Selecting parameters for alignment

• Phylogenetic trees

Complete Genomes

More than 200 complete genomes have been sequenced

Evolution

Evolution at the DNA level

[Figure: mutations in a DNA sequence passed to the next generation, marked “OK”, “X”, or “Still OK?”]

4. Sequence Comparison

Sequence conservation implies function

Sequence comparison is key to:

• Finding genes

• Determining function

• Uncovering the evolutionary processes

Sequence Comparison: Alignment

AGGCTATCACCTGACCTCCAGGCCGATGCCC

TAGCTATCACGACCGCGGTCGATTTGCCCGAC

-AGGCTATCACCTGACCTCCAGGCCGA--TGCCC---
TAG-CTATCAC--GACCGC--GGTCGATTTGCCCGAC
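Alignments like the one above are typically computed by dynamic programming. Below is a minimal global-alignment sketch in the Needleman-Wunsch style; the scoring scheme (match +1, mismatch -1, gap -2) is an assumption for illustration, not the parameters behind the alignment shown, so its output may differ from it.

```python
def global_align(s: str, t: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns (score, aligned_s, aligned_t), where '-' marks a gap.
    """
    def sub(a: str, b: str) -> int:
        return match if a == b else mismatch

    n, m = len(s), len(t)
    # score[i][j] = best score for aligning the prefixes s[:i] and t[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(s[i - 1], t[j - 1]),  # match or mismatch
                score[i - 1][j] + gap,  # gap in t
                score[i][j - 1] + gap,  # gap in s
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_s, out_t = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(s[i - 1], t[j - 1]):
            out_s.append(s[i - 1])
            out_t.append(t[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_s.append(s[i - 1])
            out_t.append("-")
            i -= 1
        else:
            out_s.append("-")
            out_t.append(t[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_s)), "".join(reversed(out_t))


score, top, bottom = global_align("AGGCTATCACCTGACCTCCAGGCCGATGCCC",
                                  "TAGCTATCACGACCGCGGTCGATTTGCCCGAC")
print(score)
print(top)
print(bottom)
```

The table has (n+1)(m+1) cells, so time and memory grow with the product of the sequence lengths; this is why database-scale search relies on heuristics such as BLAST.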

Sequence Alignment

• Introduced ~1970

• BLAST: 1990, one of the most cited papers in history

• Still a very active area of research

[Figure: a query sequence searched against a database (DB) using BLAST]
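BLAST-style search starts from short exact word matches (“seeds”) between the query and the database, which are then extended into longer alignments. The sketch below shows only the seeding step, using a k-mer hash index over a toy database; `build_index`, `find_seeds`, and k = 4 are illustrative assumptions, and the extension and statistical-significance steps of real BLAST are omitted.

```python
from collections import defaultdict


def build_index(database: dict[str, str], k: int) -> dict[str, list[tuple[str, int]]]:
    """Map every k-mer to the (sequence name, offset) positions where it occurs."""
    index = defaultdict(list)
    for name, seq in database.items():
        for pos in range(len(seq) - k + 1):
            index[seq[pos:pos + k]].append((name, pos))
    return index


def find_seeds(query: str, index, k: int) -> list[tuple[int, str, int]]:
    """Return (query offset, sequence name, database offset) for every exact k-mer hit."""
    hits = []
    for qpos in range(len(query) - k + 1):
        for name, dpos in index.get(query[qpos:qpos + k], []):
            hits.append((qpos, name, dpos))
    return hits


db = {"seq1": "TTACGTACGA", "seq2": "GGGACGTT"}
idx = build_index(db, k=4)
print(find_seeds("ACGTA", idx, k=4))
# [(0, 'seq1', 2), (0, 'seq2', 3), (1, 'seq1', 3)]
```

Building the index once and reusing it for many queries is the idea behind indexing large databases; hits on the same diagonal (equal database offset minus query offset, as for the two `seq1` hits) are natural candidates for extension.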

Comparison of Human, Mouse, and Rat

Topics on Genomics in CS374

• Indexing Large Databases: newest BLAST techniques

• Repeat Detection

• Genomic Rearrangements: finding the order of shuffles between two genomes
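One simple way to quantify how shuffled two genomes are is to count breakpoints: neighboring genes in one genome that are not neighbors in the other. The sketch below assumes each genome is an unsigned ordering of the same genes (gene orientation ignored), relabels genome A as 1..n, and counts breakpoints in genome B. Because one reversal removes at most two breakpoints, half the count (rounded up) is a lower bound on the number of reversals. This is a textbook measure shown for illustration, not necessarily the approach covered in the course; the function names are hypothetical.

```python
import math


def count_breakpoints(genome_a: list[str], genome_b: list[str]) -> int:
    """Count breakpoints in genome_b relative to genome_a (unsigned gene orders).

    genome_b is relabeled so genome_a becomes 1..n, then framed with 0 and n+1;
    consecutive entries that differ by more than 1 form a breakpoint.
    """
    if sorted(genome_a) != sorted(genome_b):
        raise ValueError("genomes must contain the same genes")
    rank = {gene: i + 1 for i, gene in enumerate(genome_a)}
    perm = [0] + [rank[g] for g in genome_b] + [len(genome_a) + 1]
    return sum(1 for x, y in zip(perm, perm[1:]) if abs(x - y) != 1)


def reversal_lower_bound(genome_a: list[str], genome_b: list[str]) -> int:
    """Each reversal removes at most 2 breakpoints, so at least ceil(b / 2) are needed."""
    return math.ceil(count_breakpoints(genome_a, genome_b) / 2)


a = ["g1", "g2", "g3", "g4", "g5"]
b = ["g1", "g4", "g3", "g2", "g5"]  # g2..g4 reversed
print(count_breakpoints(a, b), reversal_lower_bound(a, b))  # 2 1
```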

5. Clustering of Microarrays: Clinical prediction of leukemia type

• 2 types: acute lymphoid (ALL) and acute myeloid (AML)

• Different treatment & outcomes

• Predict type before treatment?

Bone marrow samples: ALL vs AML

Measure the expression level of each gene
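To illustrate grouping samples by their gene-expression profiles, here is a minimal k-means sketch with k = 2 (one cluster per hoped-for leukemia type). Each row is one sample and each column one gene's measured level; the numbers are made-up toy values, not real ALL/AML data, and k-means is only one of many clustering methods, with no claim that it matches the original study's approach.

```python
import math


def kmeans(samples: list[list[float]], k: int, max_iter: int = 100) -> list[int]:
    """Lloyd's k-means on expression profiles (one vector of gene levels per sample).

    Centroids start at the first k samples so the sketch is deterministic.
    Returns a cluster label in 0..k-1 for every sample.
    """
    centroids = [list(s) for s in samples[:k]]
    labels: list[int] = []
    for _ in range(max_iter):
        # Assignment step: each sample joins its nearest centroid (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: math.dist(s, centroids[c])) for s in samples
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean profile of its members.
        for c in range(k):
            members = [s for s, lab in zip(samples, labels) if lab == c]
            if members:
                centroids[c] = [sum(vals) / len(members) for vals in zip(*members)]
    return labels


profiles = [
    [9.0, 1.0, 8.5],  # toy sample: 3 hypothetical genes
    [1.2, 7.8, 0.9],
    [8.7, 1.5, 9.1],
    [0.8, 8.2, 1.1],
    [9.3, 0.7, 8.8],
]
print(kmeans(profiles, k=2))  # [0, 1, 0, 1, 0]
```

Clustering only groups samples; naming the clusters ALL or AML, or predicting the type of a new patient before treatment, requires samples whose type is already known.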

6. Protein networks

Newer research area

• Construct networks from multiple data sources

• Navigate networks

• Compare networks across organisms

Relevant areas: statistics, machine learning, graph algorithms, databases

Topics on Protein Networks in CS374

1. Integration: build networks from multiple sources

2. Alignment: compare networks across species

3. Mathematical properties: modular, scale-free
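“Scale-free” refers to a degree distribution in which a few hub proteins have many interactions while most have few (roughly a power law). As a minimal sketch, assuming the network is given as an undirected list of protein-protein interaction pairs, the degree distribution can be tabulated as follows; the protein names are hypothetical toy data.

```python
from collections import Counter


def degree_distribution(edges: list[tuple[str, str]]) -> dict[int, int]:
    """Map each degree d to the number of proteins with exactly d interaction partners."""
    partners: dict[str, set[str]] = {}
    for a, b in edges:
        if a == b:
            continue  # this sketch ignores self-interactions
        partners.setdefault(a, set()).add(b)
        partners.setdefault(b, set()).add(a)
    counts = Counter(len(p) for p in partners.values())
    return dict(sorted(counts.items()))


edges = [("hub", "p1"), ("hub", "p2"), ("hub", "p3"), ("hub", "p4"), ("p1", "p2")]
print(degree_distribution(edges))  # {1: 2, 2: 2, 4: 1}
```

On a real interaction network, plotting log(count) against log(degree) and looking for a roughly straight line is a common first check for scale-free behavior; a rigorous test needs more care.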

7. Human evolution


Topics on Human Population Genetics in CS374

1. Evolution: finding fast-evolving genes in human populations

2. Migration: tracing the migration of humans out of Africa by genetic studies

8. Building circuits from cells

The abstract submission deadline is 11:59 pm, Sunday, October 1, 2006.

Computer Scientists vs Biologists


• (Almost) nothing is ever true or false in biology

• Everything is true or false in computer science


• Biologists strive to understand the complicated, messy natural world

• Computer scientists seek to build their own clean and organized virtual worlds

• Biologists are obsessed with being the first to discover something

• Computer scientists are obsessed with being the first to invent or prove something


• Biologists are comfortable with the idea that all data have errors

• Computer scientists are not


• Computer scientists get high-paid jobs after graduation

• Biologists typically have to complete one or more 5-year post-docs...


Computer Science is to Biology what Mathematics is to Physics

“Antedisciplinary” Science

What is computational biology?

http://compbiol.plosjournals.org/perlserv/?request=get-document&doi=10.1371/journal.pcbi.0010006
