Board on Research Data and Information, National Research Council “Changing Roles of Libraries in Support of Scientific Data Activities” June 3, 2010 More Data, More Use, Less Lead Time: Scientific Data Activities at the National Library of Medicine Betsy L. Humphreys Deputy Director National Library of Medicine www.nlm.nih.gov


Page 1: Betsy L. Humphreys Deputy Director  National Library of Medicine nlm.nih

Board on Research Data and Information, National Research Council
“Changing Roles of Libraries in Support of Scientific Data Activities”

June 3, 2010

More Data, More Use, Less Lead Time: Scientific Data Activities at the National Library of Medicine

Betsy L. Humphreys
Deputy Director
National Library of Medicine
www.nlm.nih.gov

Page 2: Betsy L. Humphreys Deputy Director  National Library of Medicine nlm.nih

NLM & Scientific Data

• Data categories
  – Substances
  – Sequences
  – Clinical Research
  – Taxonomies/Nomenclatures/Ontologies

Page 3: Betsy L. Humphreys Deputy Director  National Library of Medicine nlm.nih

NLM & Scientific Data

• Challenges (aka Problems)
  – Much more data
    • Greater NIH/other investment in generating data
    • High throughput methods
    • New, unfunded mandate(s)
  – Much less lead time
    • Need to achieve standardization more rapidly

Page 4: Betsy L. Humphreys Deputy Director  National Library of Medicine nlm.nih
Page 5: Betsy L. Humphreys Deputy Director  National Library of Medicine nlm.nih

Growth In PubChem Tested Substances

Page 6: Betsy L. Humphreys Deputy Director  National Library of Medicine nlm.nih
Page 7: Betsy L. Humphreys Deputy Director  National Library of Medicine nlm.nih

Number of Studies Registered at ClinicalTrials.gov since May 1, 2005

[Chart: x-axis "Week of" (weekly ticks through 1/24/2010); y-axis "Records" (0 to 100,000). Annotations: ICMJE; FDAAA 801; ~25-30 / wk; ~250 / wk; ~320 / wk]

2,317 Results Records submitted (Sept 2008 – March 2010)
  – About 30 new results records per week; 80 re-submissions per week
  – Anticipate increase in rate as rules become clear and outreach continues

Page 8: Betsy L. Humphreys Deputy Director  National Library of Medicine nlm.nih
Page 9: Betsy L. Humphreys Deputy Director  National Library of Medicine nlm.nih

UMLS Metathesaurus – May 2010 version

Page 10: Betsy L. Humphreys Deputy Director  National Library of Medicine nlm.nih

NLM & Scientific Data

• Strengths
  – Mission & Track Record
    • Curation, Storage, Permanent Access, Standards, R & D
  – Robust Infrastructure
    • Staff Expertise, Advisory Structure, Computing, Communications
  – Connections between different kinds of data, information
  – Strong US partnerships and international collaborations
  – Heavy use
• Weaknesses
  – The “defects of our qualities”
  – Limited resources
  – Less user outreach/training than desirable

Page 11: Betsy L. Humphreys Deputy Director  National Library of Medicine nlm.nih
Page 12: Betsy L. Humphreys Deputy Director  National Library of Medicine nlm.nih

Hazardous Substances Data, 1978-

Page 13: Betsy L. Humphreys Deputy Director  National Library of Medicine nlm.nih

Toxic Release Inventory Data, 1987-

Page 14: Betsy L. Humphreys Deputy Director  National Library of Medicine nlm.nih

National Center for Biotechnology Information, 1988-

– Design, develop, implement, and manage automated systems for collection, storage, retrieval, analysis, & dissemination of knowledge concerning molecular biology, biochemistry, & genetics

– Perform research into advanced methods of computer-based information processing capable of representing and analyzing the vast number of biologically important molecules and compounds

– Enable persons engaged in biotechnology research and medical care to use these systems & methods

– Coordinate, as much as is practicable, efforts to gather biotechnology information on an international basis

Page 15: Betsy L. Humphreys Deputy Director  National Library of Medicine nlm.nih

Benzene – PubChem Bioassay Results

Page 16: Betsy L. Humphreys Deputy Director  National Library of Medicine nlm.nih
Page 17: Betsy L. Humphreys Deputy Director  National Library of Medicine nlm.nih

Entrez Web Traffic (Unique IP Addresses): 1999 - 2009

[Chart: x-axis years 1998 to 2009; y-axis unique IP addresses, 100,000 to 1,000,000]

  – ~2 million users a day
  – 100 million hits a day
  – 5 terabytes of data a day
  – 3,500 web hits a second (peak)
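As a rough cross-check of the traffic figures above, the following Python sketch derives a few implied averages. It assumes the slide's round numbers are exact, that 1 terabyte is 10^12 bytes, and that traffic is spread evenly over a 24-hour day. The variable names are illustrative and do not come from the presentation.

```python
# Back-of-the-envelope averages implied by the Entrez traffic figures.
# Assumptions (not from the slide): round numbers are exact,
# 1 TB = 10**12 bytes, and load is uniform across the day.
SECONDS_PER_DAY = 24 * 60 * 60

users_per_day = 2_000_000        # "~2 million users a day"
hits_per_day = 100_000_000       # "100 million hits a day"
bytes_per_day = 5 * 10**12       # "5 terabytes of data a day"
peak_hits_per_second = 3_500     # "3,500 web hits a second (peak)"

avg_hits_per_second = hits_per_day / SECONDS_PER_DAY
peak_to_average = peak_hits_per_second / avg_hits_per_second
hits_per_user = hits_per_day / users_per_day
kilobytes_per_hit = bytes_per_day / hits_per_day / 1_000

print(f"Average hits per second: {avg_hits_per_second:.0f}")
print(f"Peak-to-average ratio:   {peak_to_average:.1f}")
print(f"Hits per user per day:   {hits_per_user:.0f}")
print(f"Kilobytes per hit:       {kilobytes_per_hit:.0f}")
```

Under these assumptions the average works out to roughly 1,157 hits per second, so the stated peak is about three times the daily average, with about 50 hits per user and about 50 KB transferred per hit.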

Page 18: Betsy L. Humphreys Deputy Director  National Library of Medicine nlm.nih

PubChem Users per Day

Page 19: Betsy L. Humphreys Deputy Director  National Library of Medicine nlm.nih

Current Activities/Future Plans

• Continued emphasis on:
  – Improving the input
    • Tagging, standardization, explicit links (e.g., GenBank #s, NCT #s)
  – Increasing data curation efficiency
  – Use of “influentials” to promote standards, best practices
  – US Partnerships & International collaborations
  – Computer center efficiency, security
  – Better discovery, retrieval, display methods
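The "explicit links" item can be illustrated with a minimal Python sketch that pulls ClinicalTrials.gov registration numbers out of free text. It assumes that an NCT number is the letters "NCT" followed by eight digits. The function name and the sample sentence are hypothetical, and this is not a description of how NLM performs tagging.

```python
import re

# Assumed identifier shape: "NCT" followed by exactly eight digits.
NCT_PATTERN = re.compile(r"\bNCT\d{8}\b")


def find_nct_numbers(text):
    """Return the distinct NCT numbers found in text, in order of first appearance."""
    found = []
    for match in NCT_PATTERN.findall(text):
        if match not in found:
            found.append(match)
    return found


# Made-up sample text for illustration only.
sample = (
    "Trial registration: ClinicalTrials.gov NCT00000102. "
    "See also NCT00000102 and NCT01234567."
)
print(find_nct_numbers(sample))
```

When authors state identifiers like these explicitly in a manuscript, a simple pattern match is enough to link the article to the registry record, which is one reason explicit links reduce curation effort.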

Page 20: Betsy L. Humphreys Deputy Director  National Library of Medicine nlm.nih
Page 21: Betsy L. Humphreys Deputy Director  National Library of Medicine nlm.nih

PubMed Central Article Request Frequency - calendar 2009

  – 0 times: 2%
  – 1 time: 8%
  – 2 times: 4%
  – 3-5 times: 8%
  – 6-10 times: 8%
  – 11-100 times: 41%
  – 100+ times: 28%

Available: 1.9 Million Articles. Used: 98%. Used > 10 times: 69%.
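The two summary percentages follow directly from the request-frequency breakdown above, as this short Python sketch shows. The dictionary simply restates the slide's rounded values; the variable names are illustrative.

```python
# Share of PubMed Central articles by number of requests in calendar 2009,
# in percent, as given on the slide (values are rounded).
request_share = {
    "0 times": 2,
    "1 time": 8,
    "2 times": 4,
    "3-5 times": 8,
    "6-10 times": 8,
    "11-100 times": 41,
    "100+ times": 28,
}

# "Used" means requested at least once.
used = 100 - request_share["0 times"]

# "Used > 10 times" combines the two highest buckets.
used_more_than_10 = request_share["11-100 times"] + request_share["100+ times"]

total = sum(request_share.values())

print(f"Used: {used}%")
print(f"Used > 10 times: {used_more_than_10}%")
print(f"Sum of all buckets: {total}%")
```

The buckets sum to 99% rather than 100%, which is presumably a rounding effect in the published slide values.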