
Page 1: Bio info-r-matics

BIO-INFO-R-MATICS

劉佳鑫

Institute of Statistical Science, Academia Sinica

International Graduate Program in Bioinformatics, Academia Sinica

Page 2: Bio info-r-matics

HI! STORY~~~C.S.

1939: Atanasoff-Berry Computer (first electronic digital computer)

1954: FORTRAN

1955: First disk storage (IBM)

1963: BASIC

1969: UNIX OS

1970: DRAM

1972: C

1981~85: DOS, Windows, C++

1994: JAVA

2007

Page 3: Bio info-r-matics

MAY 1952

Rosalind Elsie Franklin

Photo 51

Page 4: Bio info-r-matics

HI! STORY~~~BIOLOGY

1939: Computer science

1953

Page 5: Bio info-r-matics

HI! STORY~~~BIOLOGY

1939: Computer science

1953: Structure of DNA (base pairing: A = T, C ≡ G)
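The pairing rule A = T, C ≡ G (two hydrogen bonds between A and T, three between C and G) means that one strand fully determines its partner. A minimal base-R sketch, assuming an uppercase A/C/G/T string; `reverse.complement()` is a hypothetical helper name, not a function from any package:

```r
# Hypothetical helper: reverse complement of a DNA string using Watson-Crick pairing
# (A pairs with T, C pairs with G). Assumes uppercase A/C/G/T input only.
reverse.complement <- function(seq) {
  complement <- chartr("ACGT", "TGCA", seq)                  # swap each base for its partner
  paste(rev(strsplit(complement, "")[[1]]), collapse = "")   # read the partner strand 5' -> 3'
}

reverse.complement("ATGCGT")
```

Reversing matters because the two strands are antiparallel: the partner strand is conventionally written in its own 5' to 3' direction.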

Page 6: Bio info-r-matics

HI! STORY~~~BIOLOGY

1801: Lamarckian evolution

1822~1884: The theories of heredity

1859: Darwinian evolution (On the Origin of Species)

1890~1962: Sir Ronald Aylmer Fisher (neo-Darwinian synthesis, population genetics, ANOVA, The Design of Experiments)

Statistics!!

1939: Computer science

1943: Chromosome, gene, protein

1944: DNA was the agent responsible for transferring genetic information

1953: Structure of DNA

1990~2003: Human Genome Project

2008~2012: 1000 Genomes Project

Page 7: Bio info-r-matics

DATE! DATE! DATE!

Biology + Computer science + Statistics → Bioinformatics

Biology: Genetics, Evolution, Cell biology, Molecular biology, Biochemistry, Medical science

Page 8: Bio info-r-matics

Bioinformatics

Biology

Page 9: Bio info-r-matics

BIOINFORMATICIAN

5 levels

Level 1: Analyze data on websites ……

Level 2: Install software and run it!

Level 3: Programming (C/R/JAVA/PERL/Python……)

Level 4: Write scripts implementing known algorithms, combine them, and maintain them

Level 5: Develop new algorithms to solve questions in biology


Page 10: Bio info-r-matics

LEVEL 1

Paste

Submit

Page 11: Bio info-r-matics

LEVEL 1

Page 12: Bio info-r-matics

LEVEL 2

Installation guide

Page 13: Bio info-r-matics

DATA SCIENCE

Page 14: Bio info-r-matics
Page 15: Bio info-r-matics

Visualization

Parsing
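Parsing usually starts with a sequence file. A minimal sketch of a FASTA parser in base R, assuming well-formed input in which every record has a `>` header line followed by at least one sequence line; `parse.fasta()` and the toy records are hypothetical:

```r
# Hypothetical minimal FASTA parser (assumes every record has at least one sequence line)
parse.fasta <- function(lines) {
  is.header <- startsWith(lines, ">")
  record <- cumsum(is.header)                        # record number for every line
  ids <- sub("^>\\s*(\\S+).*$", "\\1", lines[is.header], perl = TRUE)  # first word of header
  seqs <- tapply(lines[!is.header], record[!is.header], paste, collapse = "")
  setNames(as.character(seqs), ids)                  # named vector: id -> full sequence
}

fasta <- c(">seq1 toy example", "ACGT", "GG", ">seq2", "TTA")
parse.fasta(fasta)
```

Multi-line sequences are joined per record, which is the usual pitfall when reading FASTA line by line.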

Page 16: Bio info-r-matics

LEVEL 3

Page 17: Bio info-r-matics

LEVEL 4

seed <- 1234
sample.size <- 500
resample.number <- 1000
alpha <- 0.05

set.seed(seed)  # the seed was defined but never set; set it so runs are reproducible
original.sample <- runif(sample.size, min = 0, max = 1)

# Bootstrap: resample with replacement and record the mean of each resample
boot.means <- numeric(resample.number)
for (counter in seq_len(resample.number)) {
  temp <- sample(original.sample, size = length(original.sample), replace = TRUE)
  boot.means[counter] <- mean(temp)
}
resample.results <- data.frame(Run.Number = seq_len(resample.number), mean = boot.means)

# Sort by mean so that row k holds the k-th smallest resampled mean
resample.results <- resample.results[order(resample.results$mean), ]
lowerCI.row <- round(resample.number * alpha / 2)        # row of the alpha/2 quantile
upperCI.row <- round(resample.number * (1 - alpha / 2))  # row of the 1 - alpha/2 quantile
median.row <- round(resample.number / 2)
rows <- c(median.row, lowerCI.row, upperCI.row)

# Row "value": median and percentile CI of the mean; row "run": the resample that produced each
mc.table <- as.data.frame(rbind(resample.results$mean[rows],
                                resample.results$Run.Number[rows]))
colnames(mc.table) <- c("median", "lowerCI", "upperCI")
rownames(mc.table) <- c("value", "run")
mc.table

Monte Carlo simulation

Hidden Markov model

Simulated annealing

Bayesian analysis

……
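A Hidden Markov model is typically scored with the forward algorithm, which sums the probability of the observed sequence over all hidden state paths. A minimal sketch with a made-up two-state model; the state names, all probabilities, and `forward.prob()` are illustrative assumptions:

```r
# Forward algorithm: P(observations | HMM), summing over all hidden state paths.
forward.prob <- function(obs, init, trans, emit) {
  # alpha[s] = P(obs[1..t], hidden state at t = s)
  alpha <- init * emit[, obs[1]]
  for (t in seq_along(obs)[-1]) {
    alpha <- as.vector(alpha %*% trans) * emit[, obs[t]]  # propagate, then emit obs[t]
  }
  sum(alpha)
}

# Toy two-state model (illustrative parameters; each row sums to 1)
states <- c("CpG", "background")
init <- c(0.5, 0.5)
trans <- matrix(c(0.9, 0.1,
                  0.2, 0.8), nrow = 2, byrow = TRUE, dimnames = list(states, states))
emit <- matrix(c(0.1, 0.4, 0.4, 0.1,
                 0.3, 0.2, 0.2, 0.3), nrow = 2, byrow = TRUE,
               dimnames = list(states, c("A", "C", "G", "T")))

obs <- strsplit("CGCGA", "")[[1]]
forward.prob(obs, init, trans, emit)
```

The recursion costs O(length × states²) instead of enumerating every state path, which is why it scales to genome-length sequences.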

Page 18: Bio info-r-matics

LEVEL 5

Page 19: Bio info-r-matics

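Zhouyi (I Ching): above form and below form

形而上者謂之道 形而下者謂之器: What is above form is called the Dao (the Way); what is below form is called the Qi (the vessel).

「形」 (form): the phenomena of heaven and the shapes of the earth

「道」 (Dao): the abstract principles that exist above the phenomena of heaven and earth (Metaphysics)

「器」 (Qi): the concrete things produced by the changes of heaven and earth and the interplay of yin and yang

道器不相離: the Dao and the Qi are inseparable; just as there are heaven and earth, there is the principle of the Taiji (Supreme Ultimate)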

Page 20: Bio info-r-matics

QUESTION

(Figure: E. coli with labels lysC, lysine, and glucose)

Page 21: Bio info-r-matics

QUESTION

Less than 10% of the human genome has known (?) functions

• Only 1% codes for protein

AIM: Identify functional elements in the human genome

• Genetic approach

• Evolutionary approach

• Biochemical approach

Page 22: Bio info-r-matics

GENETIC

Relies on sequence alterations to establish the biological relevance of a DNA segment

Page 23: Bio info-r-matics

GENETIC

Association ~~~~

• Pearson

• Spearman

• Logistic

Genome-Wide Association Studies (GWAS)

NHGRI GWAS Catalog
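Each association method listed above corresponds to a base R function: `cor.test()` for Pearson and Spearman correlation and `glm(..., family = binomial)` for logistic regression. A sketch on simulated data for a single SNP; the additive 0/1/2 genotype coding, the effect sizes, and the sample size are assumptions for illustration, not a real GWAS:

```r
# Simulated single-SNP association test; all data are synthetic.
set.seed(1)
n <- 200
genotype <- rbinom(n, size = 2, prob = 0.3)                        # 0, 1, or 2 minor alleles
trait <- 0.5 * genotype + rnorm(n)                                 # quantitative phenotype
case <- rbinom(n, size = 1, prob = plogis(-1 + 0.8 * genotype))    # binary (case/control) phenotype

pearson  <- cor.test(genotype, trait, method = "pearson")
spearman <- cor.test(genotype, trait, method = "spearman", exact = FALSE)  # ties: no exact p-value
logistic <- glm(case ~ genotype, family = binomial)

c(pearson.r = unname(pearson$estimate),
  spearman.rho = unname(spearman$estimate),
  log.odds.ratio = unname(coef(logistic)["genotype"]))
```

A GWAS repeats a test like this for every SNP, so the per-SNP p-values must then be corrected for hundreds of thousands of tests.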

Page 24: Bio info-r-matics

GENETICS

Page 25: Bio info-r-matics

EVOLUTIONARY

Comparative genomics

Only 5% of mammalian genomes are under strong evolutionary constraint across multiple species (e.g., human, mouse, and dog)

Multiple sequence alignment
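Constraint is read off a multiple alignment column by column. A minimal sketch that scores each aligned position by the fraction of species sharing the most common base; the three toy sequences and `column.identity()` are hypothetical, and real conservation scores use phylogenetic models rather than this simple count:

```r
# Toy multiple alignment (one aligned sequence per species; made-up sequences)
alignment <- c(human = "ATGCTAGC",
               mouse = "ATGCTGGC",
               dog   = "ATGATAGC")

# Fraction of sequences carrying the most common base in each alignment column
column.identity <- function(aln) {
  m <- do.call(rbind, strsplit(aln, ""))   # character matrix: species x position
  apply(m, 2, function(col) max(table(col)) / length(col))
}

column.identity(alignment)
```

Positions 4 and 6 drop to 2/3 because one species differs there; fully conserved columns score 1.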

Page 26: Bio info-r-matics

BIOCHEMICAL

Detect biochemical activity

Encyclopedia of DNA Elements (ENCODE) Project

50% of nucleotides in the human genome are readily recognizable as repeat elements.

Page 27: Bio info-r-matics

ANOTHER TOPIC IN BIOINFORMATICS

The most popular bio-industry in human history

Page 28: Bio info-r-matics

ANOTHER TOPIC IN BIOINFORMATICS

Enzyme kinetics
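Enzyme kinetics is classically summarized by the Michaelis-Menten rate law, v = Vmax·[S] / (Km + [S]). A sketch in base R that simulates noisy rate measurements and recovers Vmax and Km with nonlinear least squares (`nls()`); the substrate concentrations, the "true" parameters, and the noise level are made-up assumptions:

```r
# Michaelis-Menten rate law: v = Vmax * S / (Km + S)
mm.rate <- function(S, Vmax, Km) Vmax * S / (Km + S)

# Synthetic measurements (illustrative: Vmax = 10, Km = 2, Gaussian noise)
set.seed(42)
S <- c(0.5, 1, 2, 5, 10, 20, 50)
v <- mm.rate(S, Vmax = 10, Km = 2) + rnorm(length(S), sd = 0.2)

# Nonlinear least-squares fit of both parameters
fit <- nls(v ~ Vmax * S / (Km + S), start = list(Vmax = max(v), Km = median(S)))
coef(fit)
```

Km is the substrate concentration at which the rate is half of Vmax, so concentrations around Km carry most of the information about it.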

Page 29: Bio info-r-matics

ANOTHER TOPIC IN BIOINFORMATICS

Metabolic control analysis

wiki
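The central quantity of metabolic control analysis is the flux control coefficient. It measures the relative change in a pathway flux J per relative change in the activity E_i of enzyme i. In standard notation (not taken from the slide):

```latex
C^{J}_{E_i} = \frac{\partial J / J}{\partial E_i / E_i} = \frac{\partial \ln J}{\partial \ln E_i},
\qquad
\sum_{i=1}^{n} C^{J}_{E_i} = 1
```

The right-hand identity is the flux-control summation theorem: control over a flux is shared among all n enzymes of the pathway, so no single step is automatically "rate-limiting".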

Page 30: Bio info-r-matics

BIOLOGICAL NETWORKS

Metabolic network

KEGG; 2004, NRG, Barabási

2008, Science Signaling

Signal transduction network

Protein interaction network
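A basic first look at any of these networks is its degree distribution (how many nodes have k links), the property Barabási's scale-free analysis builds on. A minimal sketch from an edge list in base R; the six protein names and their interactions are hypothetical:

```r
# Toy undirected protein-interaction edge list (hypothetical proteins)
edges <- data.frame(from = c("P1", "P1", "P1", "P1", "P2", "P3", "P5"),
                    to   = c("P2", "P3", "P4", "P5", "P3", "P6", "P6"),
                    stringsAsFactors = FALSE)

# Degree = number of edges touching each node (count both endpoints of every edge)
degree <- table(c(edges$from, edges$to))

# Degree distribution: number of nodes with each degree k
degree.distribution <- table(degree)

degree
degree.distribution
```

In a scale-free network this distribution follows a power law, so a few hubs (here P1) carry many of the links.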

Page 31: Bio info-r-matics

BIOLOGICAL NETWORK ANALYSIS

Metabolic control analysis

Flux balance analysis

Sensitivity analysis

Network property analysis

Differential equation

Partial differential equation

Linear programming

Genetic algorithm
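Two of the methods above, differential equations and sensitivity analysis, fit in one small sketch. It integrates a toy linear pathway S → X → P with explicit Euler steps, then estimates a normalized local sensitivity of the product to one rate constant by finite differences. The pathway, the mass-action rates, the rate constants, and `simulate.pathway()` are all illustrative assumptions:

```r
# Toy pathway S -> X -> P with mass-action rates v1 = k1*S and v2 = k2*X (illustrative)
simulate.pathway <- function(k1, k2, S0 = 1, t.end = 10, dt = 0.001) {
  S <- S0; X <- 0; P <- 0
  for (step in seq_len(round(t.end / dt))) {
    v1 <- k1 * S              # flux S -> X
    v2 <- k2 * X              # flux X -> P
    S <- S - v1 * dt          # explicit Euler update of dS/dt = -v1
    X <- X + (v1 - v2) * dt   # dX/dt = v1 - v2
    P <- P + v2 * dt          # dP/dt = v2
  }
  c(S = S, X = X, P = P)
}

base <- simulate.pathway(k1 = 0.5, k2 = 1)

# Normalized local sensitivity of P to k1: (relative change in P) / (relative change in k1)
delta <- 0.01
perturbed <- simulate.pathway(k1 = 0.5 * (1 + delta), k2 = 1)
sensitivity.P.k1 <- ((perturbed["P"] - base["P"]) / base["P"]) / delta

base
sensitivity.P.k1
```

Explicit Euler is the simplest integrator and needs a small step; for stiff models a dedicated ODE solver would be the usual choice.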

Page 32: Bio info-r-matics

TAKE HOME MESSAGE

History

Components of bioinformatics

Levels of involvement in bioinformatics

Some topics in bioinformatics

Keep thinking……

Google!