
Reproducibility, Argument and Data in Translational Medicine

© 2015 Massachusetts General Hospital

Tim Clark, Ph.D.
Assistant Professor of Neurology

Massachusetts General Hospital & Harvard Medical School
Massachusetts Alzheimer Disease Research Center

seminar presentation at the Biostatistics Department,
Harvard T.H. Chan School of Public Health,

February 4, 2015

“It has become apparent that an alarming number of published results cannot be reproduced by other people.

That is what caused John Ioannidis to write his now famous paper, Why Most Published Research Findings Are False [1].

That sounds very strong. But in some areas of science it is probably right.”

- David Colquhoun [2]

1. Ioannidis, J.P.A. (2005) Why Most Published Research Findings Are False, PLoS Med, 2, e124.
2. Colquhoun, D. (2014) An investigation of the false discovery rate and the misinterpretation of p-values, Royal Society Open Science, 1.

Outline

• The translation gap

• The false reported discovery rate

• Attrition in pharmaceutical pipelines

• Historical background on reproducibility

• Logical status of scientific articles

• Coping strategies at the ecosystem level

• The global argument graph

• Conclusions & postscript

[Figures: translational phases T1 and T2; Scannell et al. (2012) Nat Rev Drug Discov, 11(3), 191–200.]

• ~80–90% of top-tier academic research findings cannot be reproduced in pharma target-discovery labs.

• All phases in pharma discovery, development, preclinical and clinical have significant attrition.

• ~ 90% attrition in clinical trials has huge financial and social impact: risk avoidance.

Hay et al. (2014) Nature Biotechnology, 32, 40–51.
Begley and Ellis (2012) Nature, 483, 531–533.
Prinz et al. (2011) Nat Rev Drug Discov, 10, 712.

[Figure: non-reproducibility, 11%; Begley CG and Ellis LM, Nature 2012, 483(7391), 531–533.]

Obokata et al. 2014 Nature

On Obokata et al. 2014, from pubpeer.com & imgur.com

[…]

http://imgur.com/1nBfKTr

• Obokata et al. received extraordinary scrutiny because of its surprising conclusions.

But what proportion of more “ordinary” papers receive this type of scrutiny?

It received further scrutiny because upon examination there turned out to be fraud.

What about non-fraudulent, but incorrect papers?

• Furthermore…

(1) It seems possible that Obokata’s fraudulent use of data came from her inability to reproduce the original Vacanti lab experiments in the RIKEN environment.

(2) We do not know whether the technique began with fraud at Harvard, or was simply “reproduced by fraud” when legitimate reproduction failed at RIKEN.

Colquhoun 2014

• “Almost universal failure of biomedical papers to appreciate what governs the false discovery rate.”

• “If you use p=0.05 to suggest that you have made a discovery, you will be wrong at least 30% of the time.”

• “If, as is often the case, experiments are underpowered, you will be wrong most of the time.”

• “To keep your false discovery rate below 5%, you need to use a three-sigma rule, or to insist on p≤0.001.”

False discovery rate in diagnostic tests

• For disorder X, a test correctly diagnoses

• 95% of people without X as “false(X)” (specificity = .95) and

• 80% with X as “true(X)” (sensitivity = .80).

• Prevalence of X in the population = 1%

Diagnostic tests (contd.)

Colquhoun 2014, “An investigation of the false discovery rate and the misinterpretation of p-values”, Royal Society Open Science, 1.

False discovery rate: 86%

Drug screening

• Assume drug candidates work in 10% of cases.

• Power = 0.8, sig level 0.05

• False discovery rate = 45/(45+80)=36%

False discovery rate: 36%

“We optimistically estimate the median statistical power of studies in the neuroscience field to be between about 8% and about 31%.”

Button et al. 2013 Nature Reviews Neuroscience 14: 365-376

Underpowered

• Sensitivity (power) = 0.2: 20% of true effects test positive (20 true positive tests); 80% test negative (80 false negative tests).

• False discovery rate = 45/(45+20) = 69%

False discovery rate: 69%
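The three worked examples above (diagnostic test, drug screening, underpowered screening) all follow from the same bookkeeping: count expected false positives against expected true positives. A minimal sketch in Python (the function name and layout are mine, not from the talk):

```python
def false_discovery_rate(prevalence, power, false_positive_rate):
    """Expected fraction of 'positive' results that are actually false.

    prevalence          -- prior probability a tested hypothesis is true
    power (sensitivity) -- P(test positive | hypothesis true)
    false_positive_rate -- P(test positive | hypothesis false), e.g. alpha
    """
    true_positives = prevalence * power
    false_positives = (1 - prevalence) * false_positive_rate
    return false_positives / (false_positives + true_positives)

# Diagnostic test: prevalence 1%, sensitivity 0.80, specificity 0.95
print(round(false_discovery_rate(0.01, 0.80, 0.05), 2))   # -> 0.86

# Drug screening: 10% of candidates work, power 0.8, alpha 0.05
print(round(false_discovery_rate(0.10, 0.80, 0.05), 2))   # -> 0.36

# Underpowered screening: same prior and alpha, but power only 0.2
print(round(false_discovery_rate(0.10, 0.20, 0.05), 2))   # -> 0.69
```

The same function reproduces Colquhoun's warning: lowering power while keeping p = 0.05 pushes the false discovery rate from roughly a third to over two-thirds.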

Pharma attrition & productivity

attrition = 95.9%

$1.78 billion per new drug

Paul, S.M., et al. (2010) How to improve R&D productivity: the pharmaceutical industry's grand challenge, Nat Rev Drug Discov, 9, 203-214.


target selection

?

“Improving the quality of target selection is the single most important factor to transform industry productivity and bring innovative new medicines to patients.”

Bunnage, M.E. (2011) Getting pharmaceutical R&D back on target, Nat Chem Biol, 7, 335-339.

Historical background

Reproducibility

c. 1660: Robert Boyle and colleagues were concerned with the scientific validity of claims, e.g. “transformation of lead into gold”…

Scientific facts would now be established by reproducible demonstration before a “jury of one’s peers”,

with “virtual witnessing” for those not present, using the new information technology of the scientific journal & the scientific article.

adapted from [1] Steven Shapin 1984, Pump and Circumstance: Robert Boyle’s Literary Technology. Social Studies of Science 14(4):481-520

BOYLE: “We took a large and lusty frog and having included him in a small receiver we drew out the air not very much and left him very much swelled and able to move his throat from time to time - though not so fast as when he freely breathed before the exsuction (extraction) of the air. He continued alive about two hours that we took notice of, sometimes removing from one side of the receiver to the other, but he swelled more than before, and did not appear by any motion of his throat or thorax (chest) to exercise respiration. But his head was not very much swelled, nor his mouth forced open. After he had remained there somewhat above 3 hours, for it was not 3 hours and an half, perceiving noe signe of life in him, we let in the air upon him, at which the formerly tumid (swelled) body shrunk very much, but seemed not to have any other change wrought in it and though we took him out of the receiver yet in the free air it self, he continued to appear stark dead nevertheless to see the utmost of the experiment having caused him to be carried into a garden and layd upon the grass all night, the next morning we found him perfectly alive again.” (BP 18, fol. 127r)

adapted from Carusi 2015, “Virtual Witnessing”, in Future of Research Communications & eScholarship, Mathematical Institute, Oxford UK, 11-12 January 2015.

Definition: A scientific article is a

1. defeasible argument for claims; supported by
2. exhibited, reproducible data and methods, and
3. explicit references to other work in the domain;
4. described using domain-agreed technical terminology.
5. It exists in a complex ecosystem of technologies, people and activities.

Logical status of a scientific article

17th Century: Phil. Trans. Royal Society v.1 (1665–6)

21st Century: J Immunology v.187 (2010)

Efforts to improve the ecosystem

• Mandatory open access

• Direct data citation & archiving

• Methods cataloging & ID

• Open annotation (W3C OA)

• Micro- & nano-publications μPub

• Reproducibility initiative

Joint Declaration of Data Citation Principles

endorsed by over 80 scholarly organizations

Direct deposition and citation of primary research data

“Micropublications” may be used to construct a graph of the discussion and evidence, including challenges.

Clark, Ciccarese & Goble: Micropublications: a Semantic Model of Claims, Evidence, Argument and Annotation for Biomedical Communication. Journal of Biomedical Semantics 2014 5:28 (http://www.jbiomedsem.com/content/5/1/28/abstract).

IPS: http://www.ebi.ac.uk/efo/EFO_0004905

Stem Cell: http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#C12662

SemanticTags

http://purl.org/mp/mp:claim

http://purl.org/mp/mp:supportedBy

http://purl.org/mp/mp:data

Micropublication

Creating micropublications

Micropublication semantic summary

{ :MP3 rdf:type mp:Micropublication ;
      mp:name "MP(a3)" ;
      mp:description "Digital summary of Spillman et al. 2010" ;
      pav:authoredBy [ a foaf:Person ; foaf:name "Tim Clark" ] ;
      pav:createdBy [ a foaf:Person ; foaf:name "Tim Clark" ] ;
      pav:createdOn "2013-03-06T09:49:12-05:00"^^xsd:dateTime ;
      mp:argues :C3 ;
      mp:supportedBy <info:doi/10.1371/journal.pone.0009979> . }

:MP3 = {
  :S1 rdf:type mp:Statement ;
      mp:hasContent "Rapamycin [is] an inhibitor of the mTOR pathway." ;
      mp:supportedBy <info:doi/10.1038/nature08221> .

  :S2 rdf:type mp:Statement ;
      mp:hasContent "PDAPP mice accumulate soluble and deposited Aβ and develop AD-like synaptic deficits as well as cognitive impairment and hippocampal atrophy." ;
      mp:supportedBy <info:doi/10.1073/pnas.96.6.3228> .

  :S3 rdf:type mp:Statement ;
      mp:hasContent "Rapamycin-fed transgenic PDAPP mice showed improved learning (Figure 1a) and memory (Figure 1b). We observed significant deficits in learning and memory in control-fed transgenic PDAPP animals." ;
      mp:supportedBy <http://www.jneurosci.org/content/20/11/4050> .

  :M1 rdf:type mp:Procedure ;
      mp:hasName "Rapamycin-supplemented mouse diet protocol" ;
      mp:hasContent "We fed a rapamycin-supplemented diet... or control chow to groups of PDAPP mice and littermate non-transgenic controls for 13 weeks. At the end of treatment (7 mo), learning and memory were tested using the Morris water maze." .

  :M2 rdf:type mp:Material ;
      mp:hasName "PDAPP J20" ;
      mp:hasDescription "Lennart Mucke's PDAPP J20 transgenic mice, as obtained from JAX, stock#006293" ;
      mp:describedBy <http://jaxmice.jax.org/strain/006293.html> .

  :D1 rdf:type mp:Data ;
      pav:retrievedFrom <http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0009979#pone-0009979-g001> ;
      mp:supportedBy :M1, :M2 .

  :C3 rdf:type mp:Claim ;
      mp:hasContent "Inhibition of mTOR by rapamycin can slow or block AD progression in a transgenic mouse model of the disease." ;
      mp:supportedBy :S1, :S2, :S3, :D1 .
} .

Navigable claim-evidence networks

Figure from Greenberg SA, British Medical Journal 2009, 339:b2680
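A claim-evidence network of this kind can be sketched as a small directed graph: each claim points at the statements, data, methods and materials that support it, and a reader (or program) can walk the chain back to primary evidence. The following toy is illustrative only; the node names echo the micropublication example earlier, but the graph code is mine, not part of the mp: model:

```python
# Toy claim-evidence network: edges run from a node to its supports.
supports = {
    "C3": ["S1", "S2", "S3", "D1"],   # claim supported by statements + data
    "D1": ["M1", "M2"],               # data supported by method + material
}

def evidence_closure(node, graph):
    """All nodes reachable from `node` along supportedBy edges."""
    seen = set()
    stack = [node]
    while stack:
        current = stack.pop()
        for child in graph.get(current, []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

print(sorted(evidence_closure("C3", supports)))
# -> ['D1', 'M1', 'M2', 'S1', 'S2', 'S3']
```

The point of the traversal is that a challenge to any node in the closure (say, the mouse strain :M2) propagates to every claim that transitively depends on it.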

Micro-pubs + Logical formalisms

coming soon: OpenBEL… (Biological Expression Language)

W3C Open Annotation Model

<body1> a cnt:ContentAsText, dctypes:Text ;
    cnt:chars "content" ;
    dc:format "text/plain" .

<target1> dc:format "application/pdf" .

<anno1> a oa:Annotation ;
    oa:hasBody <body1> ;
    oa:hasTarget <target1> .

RDF
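The annotation pattern above can be mirrored in plain Python to show its shape: an annotation node linking a body to a target, each triple a (subject, predicate, object) tuple. This is a sketch only; a real system would emit RDF with a proper library rather than tuples:

```python
# The W3C Open Annotation example above, as bare triples -- illustrative only.
triples = [
    ("body1",   "rdf:type",     "cnt:ContentAsText"),
    ("body1",   "cnt:chars",    "content"),
    ("body1",   "dc:format",    "text/plain"),
    ("target1", "dc:format",    "application/pdf"),
    ("anno1",   "rdf:type",     "oa:Annotation"),
    ("anno1",   "oa:hasBody",   "body1"),
    ("anno1",   "oa:hasTarget", "target1"),
]

def objects_of(subject, predicate, graph):
    """All objects for a given subject/predicate pair."""
    return [o for s, p, o in graph if s == subject and p == predicate]

# The annotation links a plain-text body to a PDF target.
print(objects_of("anno1", "oa:hasBody", triples))    # -> ['body1']
print(objects_of("anno1", "oa:hasTarget", triples))  # -> ['target1']
```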

Micropublication of Obokata’s original claims & data

Micropublication of discussion from PubPeer & Riken

But is this really such a great idea?

Does failure to reproduce invalidate the original experiment, or the reproduction experiment?

Transparency vs. Reproducibility

• Both require significant effort to achieve progress, but transparency is more pragmatic.

• Transparency should naturally lead to more rapid correction/validation/responsibility.

• Open licenses will facilitate assessment of reproducibility in transparent content.

• Innovation and standardization needed in filtering and identification of most reproducible works.


adapted with thanks, from a talk by Iain Hrynaszkiewicz, Nature Publishing Group, on “Transparency vs. Reproducibility”, Mathematical Institute, Oxford UK, Jan. 11, 2015

Should Scholarly Research Aim for Reproducibility or Robustness?

Reproducibility: The ability of an entire experiment or study to be reproduced, ideally according to the same reproducible experimental description and procedure

Robustness: A characteristic of a phenomenon or finding that can still be detected effectively while the variables of a test system are altered

A robust concept can be observed without failure under a variety of conditions

A robust finding may be (biologically) more relevant than reproducibility

⇨ Robustness of data may be key

adapted with thanks, from a talk by Thomas Steckler, Janssen Pharmaceuticals, on “Reproducibility vs. Robustness”, Mathematical Institute, Oxford UK, Jan. 11, 2015

Conclusions

• False reported discovery rate (FRDR) is a systemic problem in biomedical research and communication.

• FRDR drives up pharmaceutical attrition and the cost of health care, and negatively impacts translation T1–T4.

• There are statistical, ethical, informatics and social components to scientific reproducibility - all of which need to be addressed.

Postscript

Ernest Rutherford: “All science is either physics

or stamp collecting.”

Paraphrase: Physics is the best and most rigorous of all scientific enterprises, i.e., the “gold standard”.

Historical values of the speed of light

• pre-17th century: ∞ (instantaneous)

• 1638 Galileo: at least 10 times faster than sound

• 1675 Ole Roemer: 200,000 km/s

• 1728 James Bradley: 301,000 km/s

• 1849 Hippolyte Louis Fizeau: 313,300 km/s

• 1862 Leon Foucault: 299,796 km/s

• Today: 299,792.458 km/s

Acknowledgements

• Sudeshna Das

• Paolo Ciccarese

• Emily Merrill

• Stian Soiland-Reyes

• Carole Goble

• Maryann Martone

• Annamaria Carusi

• Iain Hrynaszkiewicz

• Thomas Steckler

• Brad Hyman