21st Century Biology: Informatics in the Post-Genome Era · Informatics in the Post-Genome Era...

Preview:

Citation preview

21st Century Biology:Informatics in the Post-Genome Era

Robert J. RobbinsFred Hutchinson Cancer Research Center

1124 Columbia Street, LV-101Seattle, Washington 98104

rrobbins@fhcrc.org(206) 667 4778

2

Topics

• Biotechnology will be the “magic”technology of the 21st Century.

3

Topics

• Biotechnology will be the “magic”technology of the 21st Century.

• Information Technology (IT) has aspecial relationship with biology.

4

Topics

• Biotechnology will be the “magic”technology of the 21st Century.

• Information Technology (IT) has aspecial relationship with biology.

• Moore’s Law constantly transforms IT(and everything else).

5

Topics

• Biotechnology will be the “magic”technology of the 21st Century.

• Information Technology (IT) has aspecial relationship with biology.

• Moore’s Law constantly transforms IT(and everything else).

• Current funding mechanisms for bio-information infrastructure are hopelesslyinadequate to meet future needs and mustbe radically reformed.

Introduction

MagicalTechnology

7

Magic

To a person from 1897, much currenttechnology would seem like magic.

8

Magic

To a person from 1897, much currenttechnology would seem like magic.

What technology of 2097 would seemmagical to a person from 1997?

9

Magic

To a person from 1897, much currenttechnology would seem like magic.

What technology of 2097 would seemmagical to a person from 1997?

Candidate: Biotechnology so advancedthat the distinction between living andnon-living is blurred.

IT-BiologySynergism

11

IT is Special

Information Technology:

• affects the performance and themanagement of tasks

12

IT is Special

Information Technology:

• affects the performance and themanagement of tasks

• allows the manipulation of hugeamounts of highly complex data

13

IT is Special

Information Technology:

• affects the performance and themanagement of tasks

• allows the manipulation of hugeamounts of highly complex data

• is incredibly plastic(programming and poetry are both exercises in pure thought)

14

IT is Special

Information Technology:

• affects the performance and themanagement of tasks

• allows the manipulation of hugeamounts of highly complex data

• is incredibly plastic(programming and poetry are both exercises in pure thought)

• improves exponentially(Moore’s Law)

15

Biology is Special

Life is Characterized by:

• individuality

16

Biology is Special

Life is Characterized by:

• individuality

• historicity

17

Biology is Special

Life is Characterized by:

• individuality

• historicity

• contingency

18

Biology is Special

Life is Characterized by:

• individuality

• historicity

• contingency

• high (digital) information content

19

Biology is Special

Life is Characterized by:

• individuality

• historicity

• contingency

• high (digital) information contentNo law of large numbers...

20

Biology is Special

Life is Characterized by:

• individuality

• historicity

• contingency

• high (digital) information contentNo law of large numbers, since everyliving thing is genuinely unique.

21

IT-Biology Synergism

• Physics needs calculus, the method formanipulating information aboutstatistically large numbers of vanishinglysmall, independent, equivalent things.

22

IT-Biology Synergism

• Physics needs calculus, the method formanipulating information aboutstatistically large numbers of vanishinglysmall, independent, equivalent things.

• Biology needs information technology, themethod for manipulating informationabout large numbers of dependent,historically contingent, individual things.

23

Biology is Special

For it is in relation to the statistical point of viewthat the structure of the vital parts of livingorganisms differs so entirely from that of anypiece of matter that we physicists and chemistshave ever handled in our laboratories ormentally at our writing desks.

Erwin Schrödinger. 1944. What is Life.

24

Genetics as Code

[The] chromosomes ... contain in some kind of code-script the entire pattern of the individual's futuredevelopment and of its functioning in the mature state.... [By] code-script we mean that the all-penetratingmind, once conceived by Laplace, to which everycausal connection lay immediately open, could tellfrom their structure whether [an egg carrying them]would develop, under suitable conditions, into a blackcock or into a speckled hen, into a fly or a maize plant,a rhodo-dendron, a beetle, a mouse, or a woman.

Erwin Schrödinger. 1944. What is Life.

25

One Human SequenceWe now know thatSchrödinger’s mysterioushuman “code-script”consists of 3.3 billionbase pairs of DNA.

26

One Human Sequence

Typed in 10-pitch font, one human sequence would stretch for morethan 5,000 miles. Digitally formatted, it could be stored on one CD-ROM. Biologically encoded, it fits easily within a single cell.

We now know thatSchrödinger’s mysterioushuman “code-script”consists of 3.3 billionbase pairs of DNA.

27

Bio-digital Information

DNA is a highly efficient digital storage device:

• There is more mass-storage capacity in theDNA of a side of beef than in all the hard drivesof all the world’s computers.

28

Bio-digital Information

DNA is a highly efficient digital storage device:

• There is more mass-storage capacity in theDNA of a side of beef than in all the hard drivesof all the world’s computers.

• Storing all of the (redundant) information in allof the world’s DNA on computer hard diskswould require that the entire surface of the Earthbe covered to a depth of three miles in Conner1.0 gB drives.

Genomics:An Example

30

Computers as Instruments

Computers are not just tools for catalogingexisting knowledge. They are instruments thatchange the way we can see the biologicalworld. Computers allow us to see genomes,just as radio telescopes let us see quasars andmicroscopes let us see cells.

31

Human Genome Project - Goals– construction of a high-resolution genetic map of the human

genome;

USDOE. 1990. Understanding Our Genetic Inheritance.The U.S. Human Genome Project: The First Five Years.

32

Human Genome Project - Goals– construction of a high-resolution genetic map of the human

genome;

– production of a variety of physical maps of all humanchromosomes and of the DNA of selected modelorganisms;

USDOE. 1990. Understanding Our Genetic Inheritance.The U.S. Human Genome Project: The First Five Years.

33

Human Genome Project - Goals– construction of a high-resolution genetic map of the human

genome;

– production of a variety of physical maps of all humanchromosomes and of the DNA of selected modelorganisms;

– determination of the complete sequence of human DNA andof the DNA of selected model organisms;

USDOE. 1990. Understanding Our Genetic Inheritance.The U.S. Human Genome Project: The First Five Years.

34

Human Genome Project - Goals– construction of a high-resolution genetic map of the human

genome;

– production of a variety of physical maps of all humanchromosomes and of the DNA of selected modelorganisms;

– determination of the complete sequence of human DNA andof the DNA of selected model organisms;

– development of capabilities for collecting, storing,distributing, and analyzing the data produced;

USDOE. 1990. Understanding Our Genetic Inheritance.The U.S. Human Genome Project: The First Five Years.

35

Human Genome Project - Goals– construction of a high-resolution genetic map of the human

genome;

– production of a variety of physical maps of all humanchromosomes and of the DNA of selected modelorganisms;

– determination of the complete sequence of human DNA andof the DNA of selected model organisms;

– development of capabilities for collecting, storing,distributing, and analyzing the data produced;

– creation of appropriate technologies necessary to achievethese objectives.

USDOE. 1990. Understanding Our Genetic Inheritance.The U.S. Human Genome Project: The First Five Years.

36

Base Pairs in GenBank

0

10 0 ,00 0 ,00 0

20 0 ,00 0 ,00 0

30 0 ,00 0 ,00 0

40 0 ,00 0 ,00 0

50 0 ,00 0 ,00 0

60 0 ,00 0 ,00 0

70 0 ,00 0 ,00 0

80 0 ,00 0 ,00 0

90 0 ,00 0 ,00 0

0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100

GenBank Release Numbers

9493929190898887 95 96 97

37

0

10 0 ,00 0 ,00 0

20 0 ,00 0 ,00 0

30 0 ,00 0 ,00 0

40 0 ,00 0 ,00 0

50 0 ,00 0 ,00 0

60 0 ,00 0 ,00 0

70 0 ,00 0 ,00 0

80 0 ,00 0 ,00 0

90 0 ,00 0 ,00 0

0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100

GenBank Release Numbers

9493929190898887 95 96 97

Base Pairs in GenBank

Growth in GenBank is exponential.More data were added in the last 10weeks than were added in the first 10years of the project.

38

Infrastructure and the HGP

Progress towards all of the [Genome Project]goals will require the establishment of well-funded centralized facilities, including a stockcenter for the cloned DNA fragmentsgenerated in the mapping and sequencingeffort and a data center for the computer-basedcollection and distribution of large amounts ofDNA sequence information.

National Research Council. 1988. Mapping and Sequencing theHuman Genome. Washington, DC: National Academy Press. p. 3

39

Databases and the Genome Project

[The] database developer should provide, insome real sense, an intellectual focus for theinterpretation of genomic data.

NIH-DOE Ad Hoc Committee on Genome Databases

21st CenturyBiology

The Approach

41

Paradigm Shift in Biology

The new paradigm, now emerging, is that all the‘genes’ will be known (in the sense of beingresident in databases available electronically),and that the starting point of a biologicalinvestigation will be theoretical. An individualscientist will begin with a theoretical conjecture,only then turning to experiment to follow or testthat hypothesis.

Walter Gilbert. 1991. Towards a paradigm shift in biology. Nature, 349:99.

42

Paradigm Shift in Biology

To use [the] flood of knowledge, which will pouracross the computer networks of the world,biologists not only must become computerliterate, but also change their approach to theproblem of understanding life.

Walter Gilbert. 1991. Towards a paradigm shift in biology. Nature, 349:99.

21st CenturyBiology

The People

44

Human Resources Issues

• Reduction in need for non-IT staff

45

Human Resources Issues

• Reduction in need for non-IT staff

• Increase in need for IT staff, especially“information engineers”

46

Human Resources Issues

• Reduction in need for non-IT staff

• Increase in need for IT staff, especially“information engineers”

In modern biology, a general trend is toconvert expert work into staff work andfinally into computation. New expertise isrequired to design, carry out, and interpretcontinuing work.

47

Human Resources Issues

Elbert Branscomb: “You must recognize thatsome day you may need as many computerscientists as biologists in your labs.”

48

Human Resources Issues

Elbert Branscomb: “You must recognize thatsome day you may need as many computerscientists as biologists in your labs.”

Craig Venter: “At TIGR, we already havetwice as many computer scientists on ourstaff.”

Exchange at DOE workshop on high-throughput sequencing.

21st CenturyBiology

The Science

50

Fundamental Dogma

The fundamental dogma ofmolecular biology is that genes act tocreate phenotypes through a flow ofinformation from DNA to RNA toproteins, to interactions amongproteins, and ultimately tophenotypes.

Collections of individual phenotypes,of course, constitute a population.

DNA

RNA

Proteins

Circuits

Phenotypes

Populations

51

Fundamental DogmaDNA

RNA

Proteins

Circuits

Phenotypes

Populations

GenBankEMBLDDBJ

MapDatabases

SwissPROTPIR

PDB

Although a few databases already existto distribute molecular information,

52

Fundamental DogmaDNA

RNA

Proteins

Circuits

Phenotypes

Populations

GenBankEMBLDDBJ

MapDatabases

SwissPROTPIR

PDB

Gene Expression?

Clinical Data ?

Regulatory Pathways?Metabolism?

Biodiversity?

Neuroanatomy?

Development ?

Molecular Epidemiology?

Comparative Genomics?

the post-genomic era will need manymore to collect, manage, and publishthe coming flood of new findings.

Although a few databases already existto distribute molecular information,

21st CenturyBiology

The Literature

54

Electronic Data Publishing

G D B -- Beta Hemoglobin---------------------------------------------- ** Locus Detail View **-------------------------------------------------------- Symbol: HBB Name: hemoglobin, beta MIM Num: 141900 Location: 11p15.5 Created: 01 Jan 86 00:00-------------------------------------------------------- ** Polymorphism Table **--------------------------------------------------------Probe Enzyme=================================beta-globin cDNA RsaIbeta-globin cDNA,JW10+ AvaIIPstbeta,JW102,BD23,pB+ BamHIpRK29,Unknown HindIIbeta-IVS2 probe HphIIVS-2 normal HphIUnknown AvrIIbeta-IVS2 probe AsuI

55

Electronic Data Publishing

O M I M -- Beta Hemoglobin----------------------------------------------Title*141900 HEMOGLOBIN--BETA LOCUS[HBB; SICKLE CELL ANEMIA, INCLUDED;BETA-THALASSEMIAS, INCLUDED;HEINZ BODY ANEMIAS, BETA-GLOBINTYPE, ...]

The alpha and beta loci determine thestructure of the 2 types of polypeptidechains in adult hemoglobin, Hb A. Byautoradiography using heavy-labeledhemoglobin-specific messenger RNA,Price et al. (1972) found labeling of achromosome 2 and a group B chromo-some. They concluded, incorrectly as itturned out, that the beta-gamma-deltalinkage group was on a group Bchromosome since the zone of labelingwas longer on that chromosome than onchromosome 2 (which by this reasoning

G D B -- Beta Hemoglobin---------------------------------------------- ** Locus Detail View **-------------------------------------------------------- Symbol: HBB Name: hemoglobin, beta MIM Num: 141900 Location: 11p15.5 Created: 01 Jan 86 00:00-------------------------------------------------------- ** Polymorphism Table **--------------------------------------------------------Probe Enzyme=================================beta-globin cDNA RsaIbeta-globin cDNA,JW10+ AvaIIPstbeta,JW102,BD23,pB+ BamHIpRK29,Unknown HindIIbeta-IVS2 probe HphIIVS-2 normal HphIUnknown AvrIIbeta-IVS2 probe AsuI

56

Electronic Data Publishing

GenBank -- Beta Hemoglobin----------------------------------------------DEFINITION [DEF] [HUMHBB] Human beta globin regionLOCUS [LOC] HUMHBBACCESSION NO. [ACC]J00179 J00093 J00094 J00096J00158 J00159 J00160 J00161

KEYWORDS [KEY]Alu repetitive element; HPFH;KpnI repetitive sequence; RNApolymerase III; allelicvariation; alternate cap site;

SEQUENCEgaattctaatctccctctcactactgtctagtatccctcaaggagtggtggctcatgtcttgagctcaagagtttgatataaaaaaaaattagccaggcaaatgggaggatcccttgagcgcactccagcct

GenBank -- Beta Hemoglobin----------------------------------------------DEFINITION [DEF] [HUMHBB] Human beta globin regionLOCUS [LOC] HUMHBBACCESSION NO. [ACC]J00179 J00093 J00094 J00096J00158 J00159 J00160 J00161

KEYWORDS [KEY]Alu repetitive element; HPFH;KpnI repetitive sequence; RNApolymerase III; allelicvariation; alternate cap site;

SEQUENCEgaattctaatctccctctcactactgtctagtatccctcaaggagtggtggctcatgtcttgagctcaagagtttgatataaaaaaaaattagccaggcaaatgggaggatcccttgagcgcactccagcct

O M I M -- Beta Hemoglobin----------------------------------------------Title*141900 HEMOGLOBIN--BETA LOCUS[HBB; SICKLE CELL ANEMIA, INCLUDED;BETA-THALASSEMIAS, INCLUDED;HEINZ BODY ANEMIAS, BETA-GLOBINTYPE, ...]

The alpha and beta loci determine thestructure of the 2 types of polypeptidechains in adult hemoglobin, Hb A. Byautoradiography using heavy-labeledhemoglobin-specific messenger RNA,Price et al. (1972) found labeling of achromosome 2 and a group B chromo-some. They concluded, incorrectly as itturned out, that the beta-gamma-deltalinkage group was on a group Bchromosome since the zone of labelingwas longer on that chromosome than onchromosome 2 (which by this reasoning

G D B -- Beta Hemoglobin---------------------------------------------- ** Locus Detail View **-------------------------------------------------------- Symbol: HBB Name: hemoglobin, beta MIM Num: 141900 Location: 11p15.5 Created: 01 Jan 86 00:00-------------------------------------------------------- ** Polymorphism Table **--------------------------------------------------------Probe Enzyme=================================beta-globin cDNA RsaIbeta-globin cDNA,JW10+ AvaIIPstbeta,JW102,BD23,pB+ BamHIpRK29,Unknown HindIIbeta-IVS2 probe HphIIVS-2 normal HphIUnknown AvrIIbeta-IVS2 probe AsuI

57

Electronic Data PublishingP I R -- Beta Hemoglobin----------------------------------------------DEFINITION

[HBHU] Hemoglobin beta chain- Human, chimpanzee, pygmy chimpanzee, and gorilla

SUMMARY [SUM] #Type Protein #Molecular-weight 15867 #Length 146 #Checksum 1242

SEQUENCEV H L T P E E K S A V T A L W GK V N V D E V G G E A L G R L LV V Y P W T Q R F F E S F G D LS T P D A V M G N P K V K A H GK K V L G A F S D G L A H L D NL K G T F A T L S E L H C D K LH V D P E N F R L L G N V L V CV L A H H F G

P I R -- Beta Hemoglobin----------------------------------------------DEFINITION

[HBHU] Hemoglobin beta chain- Human, chimpanzee, pygmy chimpanzee, and gorilla

SUMMARY [SUM] #Type Protein #Molecular-weight 15867 #Length 146 #Checksum 1242

SEQUENCEV H L T P E E K S A V T A L W GK V N V D E V G G E A L G R L LV V Y P W T Q R F F E S F G D LS T P D A V M G N P K V K A H GK K V L G A F S D G L A H L D NL K G T F A T L S E L H C D K LH V D P E N F R L L G N V L V CV L A H H F G

GenBank -- Beta Hemoglobin----------------------------------------------DEFINITION [DEF] [HUMHBB] Human beta globin regionLOCUS [LOC] HUMHBBACCESSION NO. [ACC]J00179 J00093 J00094 J00096J00158 J00159 J00160 J00161

KEYWORDS [KEY]Alu repetitive element; HPFH;KpnI repetitive sequence; RNApolymerase III; allelicvariation; alternate cap site;

SEQUENCEgaattctaatctccctctcactactgtctagtatccctcaaggagtggtggctcatgtcttgagctcaagagtttgatataaaaaaaaattagccaggcaaatgggaggatcccttgagcgcactccagcct

GenBank -- Beta Hemoglobin----------------------------------------------DEFINITION [DEF] [HUMHBB] Human beta globin regionLOCUS [LOC] HUMHBBACCESSION NO. [ACC]J00179 J00093 J00094 J00096J00158 J00159 J00160 J00161

KEYWORDS [KEY]Alu repetitive element; HPFH;KpnI repetitive sequence; RNApolymerase III; allelicvariation; alternate cap site;

SEQUENCEgaattctaatctccctctcactactgtctagtatccctcaaggagtggtggctcatgtcttgagctcaagagtttgatataaaaaaaaattagccaggcaaatgggaggatcccttgagcgcactccagcct

O M I M -- Beta Hemoglobin----------------------------------------------Title*141900 HEMOGLOBIN--BETA LOCUS[HBB; SICKLE CELL ANEMIA, INCLUDED;BETA-THALASSEMIAS, INCLUDED;HEINZ BODY ANEMIAS, BETA-GLOBINTYPE, ...]

The alpha and beta loci determine thestructure of the 2 types of polypeptidechains in adult hemoglobin, Hb A. Byautoradiography using heavy-labeledhemoglobin-specific messenger RNA,Price et al. (1972) found labeling of achromosome 2 and a group B chromo-some. They concluded, incorrectly as itturned out, that the beta-gamma-deltalinkage group was on a group Bchromosome since the zone of labelingwas longer on that chromosome than onchromosome 2 (which by this reasoning

G D B -- Beta Hemoglobin---------------------------------------------- ** Locus Detail View **-------------------------------------------------------- Symbol: HBB Name: hemoglobin, beta MIM Num: 141900 Location: 11p15.5 Created: 01 Jan 86 00:00-------------------------------------------------------- ** Polymorphism Table **--------------------------------------------------------Probe Enzyme=================================beta-globin cDNA RsaIbeta-globin cDNA,JW10+ AvaIIPstbeta,JW102,BD23,pB+ BamHIpRK29,Unknown HindIIbeta-IVS2 probe HphIIVS-2 normal HphIUnknown AvrIIbeta-IVS2 probe AsuI

58

Electronic Scholarly Publishing

HTTP://WWW.ESP.ORGThe ESP site is dedicated to the electronic publishing ofscientific and other scholarly materials. Of particularinterest are the history of science, genetics,computational biology, and genome research.

59

Electronic Scholarly Publishing

The Classical Genetics:Foundations seriesprovides ready accessto typeset-quality,electronic editions ofimportant publicationsthat can otherwise bevery difficult to find.

60

Electronic Scholarly Publishing

“Hardy” (of Hardy-Weinberg) is a namewell known to moststudents of biology.

61

Electronic Scholarly Publishing

But how many haveread, or even seen,all of Hardy’sbiological writings?

This is it: A single,one-page letter to theeditor of Science.

62

Electronic Scholarly Publishing

http://www.blocks.fhcrc.org/~kinesinElectronic publishing is especially appropriate for somekinds of dynamic review papers.

63

Electronic Scholarly Publishing

Reviews may be revisedand maintained in realtime...

64

Electronic Scholarly Publishing

and it is easy to providelarge amounts of in-depth supporting andrelated data.

65

Electronic Scholarly Publishing

66

Electronic Scholarly Publishing

http://www.esp.org/books/darwin/beagleEntire monographs can be made instantly available to readers world-wide..

67

Electronic Scholarly Publishing

Today’s computertechnology was nearlyunimaginable just tenyears ago. The technol-ogy of ten years fromnow will also bringmany surprises.

How is it that IT canmaintain such anamazing rate ofsustained change?

And what, if any, arethe implications of thatrate of change forbiology?

Moore’s Law

Transforms InfoTech(and everything else)

69

Moore’s Law: The Statement

Every eighteen months, thenumber of transistors that canbe placed on a chip doubles.

Gordon Moore, co-founder of Intel...

70

Moore’s Law: The EffectP

erfo

rman

ce(c

on

stan

t co

st)

Time

100,000

10,000

1,000

100

10

71

Moore’s Law: The EffectP

erfo

rman

ce(c

on

stan

t co

st)

Cos

t(c

on

stan

t p

erfo

rma

nce

)

Time

100,000 100,000

10,000

1,000

100

10

10,000

1,000

100

10

72

Moore’s Law: The Effect

Three Phases of Novel IT Applications

• It’s Impossible

73

Moore’s Law: The Effect

Three Phases of Novel IT Applications

• It’s Impossible

• It’s Impractical

74

Moore’s Law: The Effect

Three Phases of Novel IT Applications

• It’s Impossible

• It’s Impractical

• It’s Overdue

75

Moore’s Law: The EffectP

erfo

rman

ce(c

on

stan

t co

st)

Cos

t(c

on

stan

t p

erfo

rma

nce

)

Time

100,000 100,000

10,000

1,000

100

10

10,000

1,000

100

10

D

P

76

Moore’s Law: The EffectP

erfo

rman

ce(c

on

stan

t co

st)

Cos

t(c

on

stan

t p

erfo

rma

nce

)

Time

100,000 100,000

10,000

1,000

100

10

10,000

1,000

100

10

D

P

77

Moore’s Law: The EffectP

erfo

rman

ce(c

on

stan

t co

st)

Cos

t(c

on

stan

t p

erfo

rma

nce

)

Time

100,000 100,000

10,000

1,000

100

10

10,000

1,000

100

10

D

P

78

Moore’s Law: The EffectP

erfo

rman

ce(c

on

stan

t co

st)

Cos

t(c

on

stan

t p

erfo

rma

nce

)

Time

100,000 100,000

10,000

1,000

100

10

10,000

1,000

100

10

D

P

C

79

Moore’s Law: The EffectP

erfo

rman

ce(c

on

stan

t co

st)

Cos

t(c

on

stan

t p

erfo

rma

nce

)

Time

100,000 100,000

10,000

1,000

100

10

10,000

1,000

100

10

D

P

C

80

Moore’s Law: The EffectP

erfo

rman

ce(c

on

stan

t co

st)

Cos

t(c

on

stan

t p

erfo

rma

nce

)

Time

100,000 100,000

10,000

1,000

100

10

10,000

1,000

100

10

D

P

AA

C

81

Moore’s Law: The EffectP

erfo

rman

ce(c

on

stan

t co

st)

Cos

t(c

on

stan

t p

erfo

rma

nce

)

Time

100,000 100,000

10,000

1,000

100

10

10,000

1,000

100

10

D

P

A

C

82

Moore’s Law: The EffectP

erfo

rman

ce(c

on

stan

t co

st)

Cos

t(c

on

stan

t p

erfo

rma

nce

)

Time

100,000 100,000

10,000

1,000

100

10

10,000

1,000

100

10

D

P

A

C

Relevance for biology?

83

Cost (constant performance)

1,000

10,000

100,000

1,000,000

10,000,000

1975 1980 1985 1990 1995 2000 2005

UniversityPurchase

84

Cost (constant performance)

1,000

10,000

100,000

1,000,000

10,000,000

1975 1980 1985 1990 1995 2000 2005

UniversityPurchase

DepartmentPurchase

85

Cost (constant performance)

1,000

10,000

100,000

1,000,000

10,000,000

1975 1980 1985 1990 1995 2000 2005

RO1 GrantPurchase

UniversityPurchase

DepartmentPurchase

86

Cost (constant performance)

1,000

10,000

100,000

1,000,000

10,000,000

1975 1980 1985 1990 1995 2000 2005

PersonalPurchase

RO1 GrantPurchase

UniversityPurchase

DepartmentPurchase

Funding forInformation

Infrastructure

88

The Problem

• IT moves at “Internet Speed” and respondsrapidly to market forces.

89

The Problem

• IT moves at “Internet Speed” and respondsrapidly to market forces.

• IT will play a central role in 21st Centurybiology.

90

The Problem

• IT moves at “Internet Speed” and respondsrapidly to market forces.

• IT will play a central role in 21st Centurybiology.

• Current levels of support for public bio-information infrastructure are too low.

91

The Problem

• IT moves at “Internet Speed” and respondsrapidly to market forces.

• IT will play a central role in 21st Centurybiology.

• Current levels of support for public bio-information infrastructure are too low.

• Reallocation of federal funding is difficult,and subject to political pressures.

92

The Problem

• IT moves at “Internet Speed” and respondsrapidly to market forces.

• IT will play a central role in 21st Centurybiology.

• Current levels of support for public bio-information infrastructure are too low.

• Reallocation of federal funding is difficult,and subject to political pressures.

• Federal-funding decision processes areponderously slow and inefficient.

93

Federal Funding of Bio-Databases

The challenges:

94

Federal Funding of Bio-Databases

The challenges:

• providing adequate funding levels

95

Federal Funding of Bio-Databases

The challenges:

• providing adequate funding levels

• making timely, efficient decisions

96

Federal Funding of Bio-Databases

Appropriate funding level:

97

Federal Funding of Bio-Databases

Appropriate funding level:

• approx. 10% of research funding

98

Federal Funding of Bio-Databases

Appropriate funding level:

• approx. 10% of research funding

• i.e., 1 - 2 billion dollars per year

99

Federal Funding of Bio-Databases

Appropriate funding level:

• approx. 10% of research funding

• i.e., 1 - 2 billion dollars per year

Source of estimate:

- Experience of IT-transformed industries.

- Current support for IT-rich biological research.

100

Market Forces

Vendors

productsservices

Buyers

In a simple market economy, vendors try to anticipatethe needs of buyers and offer products and services tomeet those needs.

101

Market Forces

Vendors

productsservices

Buyers

$

purchases

In a simple market economy, vendors try to anticipatethe needs of buyers and offer products and services tomeet those needs.

Real users decide whether or not to buy a product orservice, depending upon whether or not it meets a realneed at a reasonable price.

102

Market Forces

Vendors

productsservices

Buyers

$

purchases

In a simple market economy, vendors try to anticipatethe needs of buyers and offer products and services tomeet those needs.

Real users decide whether or not to buy a product orservice, depending upon whether or not it meets a realneed at a reasonable price.

Business 101 Insight:

Successful vendors target aniche and excel at meeting theneeds of that niche.

103

Market Forces

VentureCapital

Vendors$

Buyers

$Stock

Offerings

Funding to initiate the developmentof products and services come frominvestors, not from buyers.

$

VendorInvestment

productsservices

$

purchases

104

Market Forces

VentureCapital

Vendors$

Buyers

$Stock

Offerings

Funding to initiate the developmentof products and services come frominvestors, not from buyers.

Investors decide whether or not toprovide start-up funding based uponthe estimated ability of the vendor tocreate products and services that willmeet real needs at competitive prices.

$

VendorInvestment

productsservices

$

purchases

105

Federal Funding

Database

Users

productsservices

$

purchases

If biological databases were drivenby market forces, individual userswould choose what services theyneed and individual databaseproviders would choose whatservices to make available.

106

Federal Funding

Investors

Database

$

Users

productsservices

$

purchases

If biological databases were drivenby market forces, individual userswould choose what services theyneed and individual databaseproviders would choose whatservices to make available.

Investors would provide start-upmoney on the likelihood ofsuccessful products and servicesbeing developed.

107

Federal Funding

Investors

Database

$

Users

productsservices

$

purchases

If biological databases were drivenby market forces, individual userswould choose what services theyneed and individual databaseproviders would choose whatservices to make available.

Investors would provide start-upmoney on the likelihood ofsuccessful products and servicesbeing developed.

Ultimate success would depend onmeeting the needs of real users.Decisions could be made rapidly, inresponse to changing needs andemerging opportunities.

108

Federal Funding

Agency

Database

Reviewers

OtherAgencies

AgencyAdvisors

Congress

productsservices

OMB $

$ DatabaseAdvisors

Users

Instead, funding decisions forbiological databases can follow aponderously slow course, withalmost no opportunity for input fromreal users.

Those most knowledgeable about aparticular database are oftenexcluded from participating in thereview process because of a possible“conflict of interest” status with thedatabase provider.

109

Federal Funding of Bio-Databases

Possible solution - create market forces:

110

Federal Funding of Bio-Databases

Possible solution - create market forces:

• stop supporting the supply side of biodatabasesthrough slow, inefficient processes.

111

Federal Funding of Bio-Databases

Possible solution - create market forces:

• stop supporting the supply side of biodatabasesthrough slow, inefficient processes.

• start supporting the demand side through fast,efficient processes.

112

Federal Funding of Bio-Databases

Possible solution - create market forces:

• stop supporting the supply side of biodatabasesthrough slow, inefficient processes.

• start supporting the demand side through fast,efficient processes.

• provide guaranteed supplementary funding,redeemable only for access to bio-databases.

113

Federal Funding of Bio-Databases

Possible solution - create market forces:

• stop supporting the supply side of biodatabasesthrough slow, inefficient processes.

• start supporting the demand side through fast,efficient processes.

• provide guaranteed supplementary funding,redeemable only for access to bio-databases.

• data stamps

114

Federal Funding of Bio-Databases

Possible solution - create market forces:

• stop supporting the supply side of biodatabasesthrough slow, inefficient processes.

• start supporting the demand side through fast,efficient processes.

• provide guaranteed supplementary funding,redeemable only for access to bio-databases.

• data stamps, AKA food (for-thought) stamps ?!

115

Food (for thought) Stamps

Funding Agencies could:

• provide a 10% supplement to every researchgrant in the form of “stamps” redeemable only atdatabase providers.

116

Food (for thought) Stamps

Funding Agencies could:

• provide a 10% supplement to every researchgrant in the form of “stamps” redeemable only atdatabase providers.

• allow the “stamps” to be transferable amongscientists, so that a market for them couldemerge.

117

Food (for thought) Stamps

Funding Agencies could:

• provide a 10% supplement to every researchgrant in the form of “stamps” redeemable only atdatabase providers.

• allow the “stamps” to be transferable amongscientists, so that a market for them couldemerge.

• provide funding only after the stamps have beenredeemed at a database provider.

118

Food (for thought) Stamps

Problems:

• how to estimate the amount of FFT stamps thatwould actually be redeemed (and thus therequired budget set-aside).

119

Food (for thought) Stamps

Problems:

• how to estimate the amount of FFT stamps thatwould actually be redeemed (and thus therequired budget set-aside).

• how to identify “approved” database providers.

120

Food (for thought) Stamps

Problems:

• how to estimate the amount of FFT stamps thatwould actually be redeemed (and thus therequired budget set-aside).

• how to identify “approved” database providers.

• how to initiate the FFT system.

121

Food (for thought) Stamps

Problems:

• how to estimate the amount of FFT stamps thatwould actually be redeemed (and thus therequired budget set-aside).

• how to identify “approved” database providers.

• how to initiate the FFT system.

• etc etc

122

Food (for thought) Stamps

Alternatives (if no solution emerges):• increasingly inefficient research activities (abject

failure will occur when it becomes easier to redoresearch than to obtain the results of prior work).

123

Food (for thought) Stamps

Alternatives (if no solution emerges):• increasingly inefficient research activities (abject

failure will occur when it becomes easier to redoresearch than to obtain the results of prior work).

• loss of access to bio-databases for public-sectorresearch.

124

Food (for thought) Stamps

Alternatives (if no solution emerges):• increasingly inefficient research activities (abject

failure will occur when it becomes easier to redoresearch than to obtain the results of prior work).

• loss of access to bio-databases for public-sectorresearch.

• movement of majority of “important” biologicalresearch into the private sector.

125

Food (for thought) Stamps

Alternatives (if no solution emerges):• increasingly inefficient research activities (abject

failure will occur when it becomes easier to redoresearch than to obtain the results of prior work).

• loss of access to bio-databases for public-sectorresearch.

• movement of majority of “important” biologicalresearch into the private sector.

• loss of American pre-eminence (if othercountries solve the problems first).

126

Food (for thought) Stamps

Bottom Line:

• This might not be the answer, but

127

Food (for thought) Stamps

Bottom Line:

• This might not be the answer, but

• it’s FOOD FOR THOUGHT

128

Slides:

http://www.esp.org/rjr/beckman.pdf

Recommended