BIO-INFO-R-MATICS
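Chia-Hsin Liu (劉佳鑫)
Institute of Statistical Science, Academia Sinica
Bioinformatics Program, Taiwan International Graduate Program, Academia Sinica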
HI! STORY~~~C.S.
1939 Atanasoff-Berry (first electronic digital computer)
1954 FORTRAN
1955 First disk storage (IBM)
1963 BASIC
1969 UNIX OS
1970 SD RAM
1972 C
1981~85 DOS, Windows, C++
1994 JAVA
2007
HI! STORY~~~BIOLOGY
1801 Lamarckian Evolution
1822~1884 The theories of Heredity (Gregor Mendel)
1859 Darwinian Evolution, On the Origin of Species
1890~1962 Sir Ronald Aylmer Fisher: neo-Darwinian synthesis, population genetics, ANOVA, The Design of Experiments
Statistics !!
1939 Computer science
1943 Chromosome, gene, protein
1944 DNA was the agent responsible for transferring genetic information
MAY, 1952 Rosalind Elsie Franklin, Photo No. 51
1953 Structure of DNA, A = T, C ≡ G
1990~2003 Human Genome Project
2008~2012 1000 Genomes Project
DATE! DATE! DATE!
Biology, Computer, Statistics
Genetics
Evolution
Cell biology
Molecular biology
Biochemistry
Medical science
Bioinformatics
Biology
BIOINFORMATICIAN
5 levels
Level 1: Analyze data on websites ……
Level 2: Install software and run it!
Level 3: Programming (C/R/JAVA/PERL/Python……)
Level 4: Write scripts of known algorithms, combine, and maintain them
Level 5: Develop new algorithms to solve questions in biology
LEVEL 1
Paste
Submit
LEVEL 2
Installation guide
DATA SCIENCE
Visualization
Parsing
LEVEL 3
LEVEL 4
# Bootstrap (Monte Carlo resampling) of the mean of a uniform sample
seed <- 1234
sample.size <- 500
resample.number <- 1000
alpha <- 0.05
set.seed(seed)  # fix the random seed so the run is reproducible
original.sample <- runif(sample.size, min = 0, max = 1)
resample.results <- data.frame("Run.Number" = NULL, "mean" = NULL)
for (counter in 1:resample.number) {
  # Draw a resample with replacement and record its mean
  temp <- sample(original.sample, size = length(original.sample), replace = TRUE)
  temp.mean <- mean(temp)
  temp.table.row <- data.frame("Run.Number" = counter, "mean" = temp.mean)
  resample.results <- rbind(resample.results, temp.table.row)
}
# Sort the runs by mean so that row positions correspond to percentiles
resample.results <- resample.results[with(resample.results, order(mean)), ]
lowerCI.row <- resample.number * alpha / 2
upperCI.row <- resample.number * (1 - (alpha / 2))
median.row <- resample.number / 2
median <- resample.results$mean[median.row]
lowerCI <- resample.results$mean[lowerCI.row]
upperCI <- resample.results$mean[upperCI.row]
# Run numbers of the resamples that produced each estimate
median.run <- resample.results$Run.Number[median.row]
lowerCI.run <- resample.results$Run.Number[lowerCI.row]
upperCI.run <- resample.results$Run.Number[upperCI.row]
# Row 1 holds the estimates, row 2 the corresponding run numbers
mc.table <- data.frame("median" = NULL, "lowerCI" = NULL, "upperCI" = NULL)
values <- data.frame(median, lowerCI, upperCI)
runs <- as.numeric(data.frame(median.run, lowerCI.run, upperCI.run))
mc.table <- rbind(mc.table, values)
mc.table <- rbind(mc.table, runs)
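The resampling loop above can also be written more compactly. The following is only a sketch under stated assumptions, not the code from the slide: the helper name bootstrap.mean.ci is hypothetical, and it relies on base R's replicate() and quantile(). Because quantile() interpolates between order statistics, its interval may differ slightly from one read off sorted row positions.

```r
# Hypothetical helper (not from the slides): percentile bootstrap interval
# for the mean of a numeric vector, using only base R.
bootstrap.mean.ci <- function(x, resample.number = 1000, alpha = 0.05, seed = 1234) {
  set.seed(seed)  # make the resampling reproducible
  # Each replicate draws length(x) values with replacement and takes their mean
  boot.means <- replicate(resample.number,
                          mean(sample(x, size = length(x), replace = TRUE)))
  # Percentile interval: the alpha/2 and 1 - alpha/2 quantiles of the bootstrap means
  ci <- quantile(boot.means, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
  c(median = median(boot.means), lowerCI = ci[1], upperCI = ci[2])
}

set.seed(1234)
original.sample <- runif(500, min = 0, max = 1)
result <- bootstrap.mean.ci(original.sample)
print(result)
```

Avoiding rbind() inside the loop also means the results are not copied on every iteration, which matters as resample.number grows.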
Monte Carlo simulation
Hidden Markov model
Simulated annealing
Bayesian analysis
……
LEVEL 5
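Zhouyi (Book of Changes): above form and below form
"What is above form is called the Dao (the Way); what is below form is called the Qi (the vessel)."
"Form" (形): the phenomena of heaven and the shapes of the earth
"Dao" (道): the abstract principles that underlie the phenomena of heaven and the shapes of the earth (Metaphysics)
"Qi" (器): the concrete things produced by the changes of heaven and earth and the interaction of yin and yang
Dao and Qi are inseparable: wherever there are heaven and earth, there is the principle of Taiji (the Supreme Ultimate)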
QUESTION
[Figure: E. coli, lysC, lysine, glucose]
QUESTION
Less than 10% of the human genome has known(?) functions
• Only 1% codes for protein
Identify functional elements in the human genome
• Genetic approach
• Evolutionary approach
• Biochemical approach
AIM
GENETIC
Relies on sequence alterations to establish the biological relevance of a DNA segment
GENETIC
Association ~~~~
• Pearson
• Spearman
• Logistic
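The three association measures listed above can be tried out in base R. This is a minimal sketch on simulated data: the variables genotype, trait, and disease and all effect sizes are assumptions made up for illustration, not data from the talk.

```r
# Simulated data (hypothetical, for illustration only)
set.seed(1)
n <- 200
genotype <- rbinom(n, size = 2, prob = 0.3)   # minor-allele count: 0, 1, or 2
trait <- 0.5 * genotype + rnorm(n)            # quantitative trait
disease <- rbinom(n, size = 1, prob = plogis(-1 + 0.8 * genotype))  # case/control status

# Pearson: linear correlation between genotype and a quantitative trait
pearson <- cor.test(genotype, trait, method = "pearson")
# Spearman: rank-based correlation; exact = FALSE because the genotypes contain ties
spearman <- cor.test(genotype, trait, method = "spearman", exact = FALSE)
# Logistic regression: association between genotype and a binary outcome
logistic <- glm(disease ~ genotype, family = binomial)

print(pearson$estimate)
print(spearman$estimate)
print(summary(logistic)$coefficients)
```

A genome-wide study repeats a test of this kind for every variant, so the resulting p-values need a correction for multiple testing.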
Genome-Wide Association Studies (GWAS)
NHGRI GWAS Catalog
EVOLUTIONARY
Comparative genomics
Only 5% of mammalian genomes are under strong evolutionary constraint across multiple species (e.g., human, mouse, and dog)
Multiple sequence alignment technology
BIOCHEMICAL
Detect biochemical activity
Encyclopedia of DNA Elements (ENCODE) Project
50% of nucleotides in the human genome are readily recognizable
as repeat elements.
ANOTHER TOPIC IN BIOINFORMATICS
The most popular bio-industry in human history
ANOTHER TOPIC IN BIOINFORMATICS
Enzyme kinetics
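As one concrete instance of enzyme kinetics, the sketch below evaluates the Michaelis-Menten rate law, v = Vmax * S / (Km + S). The slide does not name a specific model, so the choice of this law, the function name michaelis.menten, and the parameter values are all assumptions for illustration.

```r
# Michaelis-Menten rate law: v = Vmax * S / (Km + S)
michaelis.menten <- function(S, Vmax, Km) {
  Vmax * S / (Km + S)
}

# Hypothetical parameter values, chosen only for illustration
Vmax <- 10                   # maximum reaction rate
Km <- 2                      # substrate concentration at which v = Vmax / 2
S <- c(0, 1, 2, 5, 10, 50)   # substrate concentrations
v <- michaelis.menten(S, Vmax, Km)
print(data.frame(S = S, v = v))
```

The rate rises almost linearly while S is well below Km and saturates toward Vmax as S grows.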
ANOTHER TOPIC IN BIOINFORMATICS
Metabolic control analysis
wiki
BIOLOGICAL NETWORKS
Metabolism network
KEGG
2004, NRG, Barabasi
2008, Science Signaling
Signal transduction network
Protein interaction network
BIOLOGICAL NETWORKS ANALYSIS
Metabolic control analysis
Flux balance analysis
Sensitivity analysis
Network property analysis
Differential equation
Partial differential equation
Linear programming
Genetic algorithm
TAKE HOME MESSAGE
History
Components of bioinformatics
Levels of involvement in bioinformatics
Some topics in bioinformatics
Keep thinking……
Google!