Board on Research Data and Information, National Research Council
“Changing Roles of Libraries in Support of Scientific Data Activities”
June 3, 2010
More Data, More Use, Less Lead Time: Scientific Data Activities at the National Library of Medicine
Betsy L. Humphreys, Deputy Director
National Library of Medicine
www.nlm.nih.gov
NLM & Scientific Data
• Data categories
  – Substances
  – Sequences
  – Clinical Research
  – Taxonomies/Nomenclatures/Ontologies
NLM & Scientific Data
• Challenges (aka Problems)
  – Much more data
    • Greater NIH/other investment in generating data
    • High throughput methods
    • New, unfunded mandate(s)
  – Much less lead time
    • Need to achieve standardization more rapidly
Growth In PubChem Tested Substances
Number of Studies Registered at ClinicalTrials.gov since May 1, 2005
[Chart: Records (0 to 100,000) by "Week of", 2005 through January 2010. Annotations: ICMJE; FDAAA 801; ~25-30 / wk; ~250 / wk; ~320 / wk]
2,317 Results Records submitted (Sept 2008 – March 2010)
  – About 30 new results records per week; 80 re-submissions per week
  – Anticipate increase in rate as rules become clear and outreach continues
UMLS Metathesaurus – May 2010 version
NLM & Scientific Data
• Strengths
  – Mission & Track Record
    • Curation, Storage, Permanent Access, Standards, R & D
  – Robust Infrastructure
    • Staff Expertise, Advisory Structure, Computing, Communications
  – Connections between different kinds of data, information
  – Strong US partnerships and international collaborations
  – Heavy use
• Weaknesses
  – The “defects of our qualities”
  – Limited resources
  – Less user outreach/training than desirable
Hazardous Substances Data, 1978-
Toxic Release Inventory Data, 1987-
National Center for Biotechnology Information, 1988-
– Design, develop, implement, and manage automated systems for collection, storage, retrieval, analysis, & dissemination of knowledge concerning molecular biology, biochemistry, & genetics
– Perform research into advanced methods of computer-based information processing capable of representing and analyzing the vast number of biologically important molecules and compounds
– Enable persons engaged in biotechnology research and medical care to use these systems & methods
– Coordinate, as much as is practicable, efforts to gather biotechnology information on an international basis
Benzene – PubChem Bioassay Results
Entrez Web Traffic (Unique IP Addresses): 1999 - 2009
[Chart: unique IP addresses (vertical axis 100,000 to 1,000,000) by year, 1998 to 2009]
- ~2 million users a day
- 100 million hits a day
- 5 terabytes of data a day
- 3,500 web hits a second (peak)
PubChem Users per Day
Current Activities/Future Plans
• Continued emphasis on:
  – Improving the input
    • Tagging, standardization, explicit links (e.g., GenBank #s, NCT #s)
  – Increasing data curation efficiency
  – Use of “influentials” to promote standards, best practices
  – US Partnerships & International collaborations
  – Computer center efficiency, security
  – Better discovery, retrieval, display methods
PubMed Central Article Request Frequency - calendar 2009
  0 times: 2%
  1 time: 8%
  2 times: 4%
  3-5 times: 8%
  6-10 times: 8%
  11-100 times: 41%
  100+ times: 28%
Available: 1.9 Million Articles
Used: 98%; Used > 10 times: 69%
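The two summary figures on this slide can be re-derived from the request-frequency shares. The following is a minimal sketch, assuming the shares are the whole-number percentages shown on the slide (they sum to 99 rather than 100, presumably because of rounding). The variable names are illustrative and do not come from the presentation.

```python
# Request-frequency shares (percent of available articles), as read from the slide.
shares = {
    "0 times": 2,
    "1 time": 8,
    "2 times": 4,
    "3-5 times": 8,
    "6-10 times": 8,
    "11-100 times": 41,
    "100+ times": 28,
}

# The rounded shares do not have to add up to exactly 100.
total = sum(shares.values())

# "Used" is everything except articles requested 0 times.
used = 100 - shares["0 times"]

# "Used > 10 times" combines the two highest-frequency buckets.
used_more_than_10 = shares["11-100 times"] + shares["100+ times"]

print(f"Sum of rounded shares: {total}%")
print(f"Used: {used}%")
print(f"Used > 10 times: {used_more_than_10}%")
```

The sketch computes "Used" as 100 minus the "0 times" share rather than as the sum of the non-zero buckets, since the rounded buckets alone would give 97.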