
  • James Brophy MD FRCP PhD, McGill University Health Center,

    McGill University, Montreal, Quebec

    Réseau Québécois de Recherche sur les Médicaments

    Session II: Big Data: a Québec gold mine to be exploited

    June 1, 2015

    Pharmacoepidemiology – Big data, Big problems, Big solutions

  • Conflicts of Interest

    I have no known conflicts associated with this presentation and, to the best of my knowledge, am equally disliked by all pharmaceutical and device companies

  • Outline - agenda

    In the context of pharmacoepidemiology:
    - What are big data?
    - What are the big problems with big data?
    - Are there innovative solutions to these problems?



  • What is the definition of big data? Something that:

    - doesn't fit into Excel (65,536-row limit in Excel 97–2003)
    - makes you say "wow"
    - makes you uncomfortable working with it
    - only applies to genomics

    Wikipedia: Big data is high volume, high velocity, and/or high variety information to enable enhanced decision making, insight discovery and process optimization.

  • How big is big data?


  • Just because it's big, is it right?


    Over 6 million Americans have reached the age of 112. Just 13 are claiming benefits, and 67,000 of them are WORKING.
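A data-veracity problem like this one is often caught by a simple plausibility (range) check before any analysis. A minimal sketch, assuming a hypothetical record layout of (record_id, birth_year) pairs and an assumed ceiling of 115 years for a plausible age; neither the schema nor the threshold comes from the slides.

```python
from datetime import date

# Assumed ceiling for a plausible human age: an illustrative choice, not a standard.
MAX_PLAUSIBLE_AGE = 115


def flag_implausible_ages(records, as_of_year=None, max_age=MAX_PLAUSIBLE_AGE):
    """Return the IDs of records whose implied age is negative or above max_age.

    records: iterable of (record_id, birth_year) pairs (hypothetical schema).
    """
    if as_of_year is None:
        as_of_year = date.today().year
    flagged = []
    for record_id, birth_year in records:
        age = as_of_year - birth_year
        if age < 0 or age > max_age:
            flagged.append(record_id)
    return flagged


# Hypothetical records, for illustration only.
records = [("A1", 1950), ("A2", 1890), ("A3", 1905), ("A4", 2030)]
print(flag_implausible_ages(records, as_of_year=2015))  # ['A2', 'A4']
```

A range check only catches impossible values; plausible-looking but wrong values still require validation against an external source.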

  • More big data hubris

    1. 2008 stock market crash: lots of economic data, but incorrect models failed to predict and even facilitated the crash (The Black Swan, N. Taleb)

    2. Google: "We can accurately estimate the current level of weekly influenza activity in each region of the United States, with a reporting lag of about one day." (Nature 2009)


  • More big data hubris

    Google Flu Trends was wrong for 100 out of 108 weeks since August 2011

    The error was a systematic over-estimate (Science, March 14, 2014)
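If Google Flu Trends' errors were purely random, an over-estimate and an under-estimate would be about equally likely in any given week. A minimal sketch of an exact one-sided sign test, treating the 100 wrong weeks as over-estimates (as the systematic over-estimate implies) and assuming independent weeks with a 50/50 null. Weekly errors are in fact autocorrelated, so the true p-value is likely less extreme, but the order of magnitude makes the point that this is bias, not noise.

```python
from math import comb


def sign_test_upper_tail(successes: int, n: int) -> float:
    """Exact P(X >= successes) for X ~ Binomial(n, 0.5): a one-sided sign test."""
    return sum(comb(n, k) for k in range(successes, n + 1)) / 2 ** n


# Figures from the slide: over-estimates in 100 of 108 weeks.
p = sign_test_upper_tail(100, 108)
print(f"P(>= 100 over-estimates out of 108 | no systematic bias) = {p:.1e}")
```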

  • So the big question

    It is not the volume, velocity or variety of the data that is the problem, but rather its VERACITY

    Also a problem for pharmacoepidemiology?


  • Pharmacoepidemiology 2010


  • 2010: both studies used the UK GPRD database (1996–2006 and 1995–2005)


    BMJ: RR 2

    JAMA: RR 1.07

  • Me, too


  • 2 RAMQ cohorts


  • NEJM RCT 2014



  • Problems with Big Data

    Most big data is observational -> biases (selection, information) and confounding

    Big data -> small random errors, tight CIs, small p-values; but systematic errors are not captured by these CIs -> false sense of precision

    Big data often leads to ignoring other pertinent evidence that should be synthesized to reach the most reasonable conclusions
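The "false sense of precision" point can be seen in a small simulation. A minimal sketch, assuming a hypothetical setting in which the exposure has no true effect on the outcome but an unmeasured confounder drives both treatment choice and outcome; all parameters are illustrative. As n grows, the naive confidence interval shrinks tightly around a biased estimate instead of around the true value of 0.

```python
import random
import statistics
from math import sqrt


def naive_effect_and_ci(n, seed=0):
    """Simulate n patients where the exposure has NO true effect, but an
    unmeasured confounder raises both the chance of exposure and the outcome.
    Return the naive (unadjusted) difference in mean outcome and its 95% CI."""
    rng = random.Random(seed)
    exposed, unexposed = [], []
    for _ in range(n):
        u = rng.gauss(0, 1)                  # unmeasured confounder (e.g. frailty)
        p_exposed = 0.7 if u > 0 else 0.3    # confounder drives treatment choice
        y = u + rng.gauss(0, 1)              # outcome depends on u only; true effect = 0
        (exposed if rng.random() < p_exposed else unexposed).append(y)
    diff = statistics.fmean(exposed) - statistics.fmean(unexposed)
    se = sqrt(statistics.variance(exposed) / len(exposed)
              + statistics.variance(unexposed) / len(unexposed))
    return diff, (diff - 1.96 * se, diff + 1.96 * se)


for n in (1_000, 100_000):
    diff, (lo, hi) = naive_effect_and_ci(n)
    print(f"n={n:>7}: estimate {diff:.3f}, 95% CI ({lo:.3f}, {hi:.3f})")
```

More data narrows the interval but does nothing about the bias (about 0.64 here); only design or measurement of the confounder can.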


  • Principles for working with big data

    Government:
    - Privacy / accessibility
    - Integrity of the data

    Researchers:
    - Privacy / security
    - Processing the data (design, analysis, model selection)
    - Interpreting the results: epistemologically, it is important to distinguish information (data), knowledge (causal inferences) and wisdom (systematic incorporation of all knowledge)


  • Learning from Big Data

    More than big data, we need:
    - better data, rich in important confounders
    - better research designs, especially experimental data
    - a better appreciation of the quantitative sciences (uncertainty, causal inference)
    - domain knowledge: specific clinical information

    Must incorporate prior evidence:
    - If good prior data exist, use informative priors
    - If there are very few prior data, use agnostic/uniform prior beliefs
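A minimal sketch of why the choice of prior matters: a conjugate normal update on the log relative-risk scale, where a very wide normal prior stands in for the agnostic/uniform case. All numbers (the new study's RR and CI, the prior means and SDs) are hypothetical and chosen only for illustration.

```python
from math import exp, log, sqrt


def normal_update(prior_mean, prior_sd, data_mean, data_se):
    """Conjugate normal-normal update on the log relative-risk scale.
    Returns the posterior mean and SD (both on the log scale)."""
    w_prior = 1 / prior_sd ** 2          # prior precision
    w_data = 1 / data_se ** 2            # precision of the new estimate
    post_var = 1 / (w_prior + w_data)
    post_mean = post_var * (w_prior * prior_mean + w_data * data_mean)
    return post_mean, sqrt(post_var)


# Hypothetical new study: RR 1.5, 95% CI 1.2 to 1.875 -> SE on the log scale.
data_mean = log(1.5)
data_se = (log(1.875) - log(1.2)) / (2 * 1.96)

priors = {
    "informative (prior RR 1.0, sd 0.1)": (0.0, 0.1),
    "near-agnostic (sd 10)": (0.0, 10.0),
}
for label, (m, s) in priors.items():
    pm, ps = normal_update(m, s, data_mean, data_se)
    print(f"{label}: posterior RR {exp(pm):.2f} "
          f"(95% CrI {exp(pm - 1.96 * ps):.2f} to {exp(pm + 1.96 * ps):.2f})")
```

With strong prior evidence of no effect, the same data are pulled toward RR 1; with a near-flat prior, the posterior essentially reproduces the new study. This is why the prior has to be stated and justified.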

  • What is the purpose of pharmacoepidemiology?

    - Patterns of drug utilization
    - Generating new information on drug safety
    - Supplementing premarketing effectiveness studies (different populations, better precision)

    However, the overall purpose is to provide insights or causal inferences, not merely associations generated from large data sets.


  • Estimating causal effects

    1. Randomized experiments
    2. Natural experiments
    3. Instrumental variables
    4. Regression discontinuity
    5. Difference in differences
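Of the designs listed, difference in differences is the easiest to show in a few lines: the change in the exposed group minus the change in a comparison group. This removes fixed differences between the groups and secular trends shared by both, provided the parallel-trends assumption holds. A minimal sketch with hypothetical event rates.

```python
def diff_in_diff(treated_before, treated_after, control_before, control_after):
    """Difference-in-differences estimate: (change in treated) - (change in control).
    Valid only under parallel trends, i.e. without the intervention both groups
    would have changed by the same amount."""
    return (treated_after - treated_before) - (control_after - control_before)


# Hypothetical event rates per 1,000 patient-years, before and after a policy
# change that affected only the "treated" region.
effect = diff_in_diff(treated_before=50.0, treated_after=42.0,
                      control_before=48.0, control_after=46.0)
print(effect)  # -6.0
```

A naive before/after comparison in the treated region alone would report -8; 2 of those 8 events per 1,000 reflect a trend also seen in the comparison region.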


  • An example

  • Results


  • Problems

    - Not sure of the benefit in the North American (NA) context
    - Changing everyone in Quebec to ticagrelor would cost $25 million
    - Doing a large conventional RCT could cost $10–50 million
    - What to do?


  • Using big data effectively

    - Most of the cost is for the follow-up
    - We have excellent administrative databases with reliable measures of death and cardiac outcomes, so we could minimize costs
    - Need to avoid selection bias, so we could randomize at the start and then simply observe
    - New design: a randomized registry can answer the question at a reasonable cost
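A minimal sketch of the randomized-registry idea: randomize once at enrollment, then obtain outcomes passively by linkage rather than through trial follow-up visits. The arm labels, patient IDs and the `had_event` mapping (standing in for a query of the administrative database) are all hypothetical; real linkage, consent and missing-data handling are not shown.

```python
import random


def randomize(patient_ids, arms=("usual care", "ticagrelor"), seed=2015):
    """1:1 permuted-block randomization (block size 2) at registry enrollment.
    Arm labels are hypothetical."""
    rng = random.Random(seed)
    allocation = {}
    ids = list(patient_ids)
    for i in range(0, len(ids), 2):
        block = list(arms)
        rng.shuffle(block)
        for pid, arm in zip(ids[i:i + 2], block):
            allocation[pid] = arm
    return allocation


def event_rates(allocation, had_event):
    """Outcome ascertainment by linkage. had_event maps patient ID -> True/False
    and stands in for the administrative database; an ID missing from it is
    treated as 'no event' (a simplifying assumption)."""
    counts = {}
    for pid, arm in allocation.items():
        n, events = counts.get(arm, (0, 0))
        counts[arm] = (n + 1, events + int(had_event.get(pid, False)))
    return {arm: events / n for arm, (n, events) in counts.items()}


patients = [f"P{i}" for i in range(10)]
alloc = randomize(patients)
# Hypothetical linked outcomes (e.g. death or MI within one year).
outcomes = {"P0": True, "P3": True, "P7": True}
rates = event_rates(alloc, outcomes)
print(rates)
```

Randomization at the start removes selection bias; the registry then does the expensive part, follow-up, at little marginal cost.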

  • Conclusion

    - Instead of focusing on a "big data" revolution, better to pursue an "all data" revolution, including replication
    - Recognize that the critical change has been innovative designs and analytics, which can be applied to both traditional and new data
    - Big data is an aid to thinking, not a substitute for thinking
    - The goal of this revolution is to provide a deeper, clearer understanding of our world

    Science, March 14, 2014

  • Thank you


  • Learning from big data

    Must incorporate prior evidence:
    - If good prior data exist, use informative priors
    - If there are very few prior data, use agnostic or uniform prior beliefs
    - In all cases, you must be able to specify where you are and why; with an agnostic approach, a validation study is needed
    - Avoid confusing prior beliefs with prior evidence -> biases


  • How Much Data is There?

    2.5 quintillion bytes of data were generated every day in 2012

    As much data is now generated in just two days as was created from the dawn of civilization until 2003.

    Harvard Business Review, Dec
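The daily data-volume figure usually cited for 2012 is 2.5 quintillion (2.5 × 10^18) bytes. A quick unit check, assuming decimal (SI) prefixes as most such statistics do, puts that scale into more familiar units.

```python
# Decimal (SI) units, as used in most data-volume statistics.
BYTES_PER_TB = 10 ** 12
BYTES_PER_EB = 10 ** 18

daily_bytes = 2.5 * 10 ** 18  # "2.5 quintillion bytes" per day (2012 figure)

daily_eb = daily_bytes / BYTES_PER_EB
daily_tb = daily_bytes / BYTES_PER_TB
print(f"{daily_eb:g} EB/day = {daily_tb:,.0f} TB/day")  # 2.5 EB/day = 2,500,000 TB/day
```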


  • Where things go wrong is where tools of this kind are used not as an aid to thinking but as a substitute for thinking. When the information provided is used (this was one of David Ogilvy's favourite quotations) as a drunk uses a lamppost: for support rather than illumination.


  • What can big data find in healthcare?


  • Big data & inferences

    Washington Post, March 21

  • What is the correct inference?

    Americans spend too much on gambling and too much on the important stuff of politics

    Americans spend too much on gambling and not enough on the important stuff of politics

    Americans don't spend too much on gambling, but spending on politics is out of control


  • Looking in detail

    Consider that there are 316 million Americans:
    - Basketball: 13% gambled, average bet $200
    - Elections: 80% adults, average $25
    - Elections: 1% of 1% of the population (31,600 people) spent 28% of the total, or $2 B; average contribution $64,000

    A very small sample of Americans is controlling the election process
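A quick check of the slide's arithmetic, using only the figures it states. The "80% adults, average $25" line is left out because the slide does not give the number of adults it applies to. The average per top donor comes out near $63,000, consistent with the slide's rounded $64,000.

```python
US_POPULATION = 316_000_000  # figure used on the slide

# Basketball betting: 13% of Americans, average bet $200 (slide figures).
basketball_total = US_POPULATION * 0.13 * 200

# Election spending: "1% of 1%" of the population gave $2 B, i.e. 28% of the total.
top_donors = US_POPULATION // 100 // 100
top_donor_total = 2_000_000_000
avg_contribution = top_donor_total / top_donors
implied_election_total = top_donor_total / 0.28

print(f"Basketball bets: ${basketball_total / 1e9:.1f} B")
print(f"Top donors: {top_donors:,}, average ${avg_contribution:,.0f}")
print(f"Implied total election spending: ${implied_election_total / 1e9:.1f} B")
```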


  • How unequal?


  • Do statins increase or decrease the risk of cancer?




  • Maybe neither

    Maybe this is an isolated case and dates from 2007. Surely we are better today.

  • Do statins cause diabetes?



  • Statins & diabetes: who do you believe?

    - Both studies were published in May 2013
    - Both studies were published in high-impact journals
    - Both used validated administrative datasets
    - Both were published by renowned investigators


  • Statins & diabetes: who do you believe?

    Even more confusing & troublesome:
    - Both used THE SAME validated administrative datasets (Ontario)
    - Both used essentially THE SAME patients (>65, no diabetes, new statin users from 1997 (2004) to 2010)
    - Both sets of authors are from THE SAME academic institution (Sunnybrook, U of T)


  • Adaptive randomization & ethics


  • In the end, it seems doubtful that adaptive allocation generally improves risk/benefit for patients.

    It requires larger sample sizes -> more patients, more research procedures, more visits.

    Since costs scale with sample size, more resources are consumed in answering a single research question than with a fixed 1:1 design.
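Why unequal allocation costs patients: for a two-arm comparison of means with equal variances, the variance of the difference is proportional to 1/n1 + 1/n2, which for a fixed total is smallest at 1:1. A minimal sketch of the resulting inflation in total sample size for a fixed k:1 ratio. Outcome-adaptive designs vary the ratio over time and binary outcomes behave only approximately like this, so treat it as an illustration of the direction and rough size of the effect.

```python
def sample_size_inflation(ratio: float) -> float:
    """Total sample size needed with a fixed k:1 allocation, relative to 1:1,
    for the same power when comparing two means with equal variances.
    With n1 = kN/(1+k) and n2 = N/(1+k), 1/n1 + 1/n2 = (1+k)**2 / (k*N);
    matching the 1:1 value 4/N0 gives N/N0 = (1+k)**2 / (4k)."""
    k = ratio
    return (1 + k) ** 2 / (4 * k)


for k in (1, 1.5, 2, 3, 4):
    print(f"{k}:1 allocation -> {sample_size_inflation(k):.3f} x the 1:1 sample size")
```

A 2:1 split needs about 12.5% more patients and a 3:1 split about a third more, for the same precision.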


  • Adaptive randomization & ethics

    Does outcome-adaptive allocation better accommodate clinical equipoise and promote informed consent?

    Does adaptive allocation offer a partial remedy for the therapeutic misconception associated with fixed randomization?
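To make outcome-adaptive allocation concrete: as outcomes accrue, the probability of assignment shifts toward the arm that appears to be doing better. A minimal sketch using Thompson sampling with Beta(1, 1) priors, one common response-adaptive rule; the slides do not say which rule was discussed, and the true response rates, trial size and seed below are hypothetical.

```python
import random


def thompson_trial(p_true=(0.30, 0.45), n_patients=400, seed=1):
    """Simulate outcome-adaptive allocation by Thompson sampling, Beta(1, 1) priors.
    Each new patient goes to the arm whose posterior draw of the success rate is
    highest. Returns allocation counts [arm A, arm B] for the first and second
    half of the trial."""
    rng = random.Random(seed)
    successes = [0, 0]
    failures = [0, 0]
    halves = [[0, 0], [0, 0]]
    for i in range(n_patients):
        draws = [rng.betavariate(successes[a] + 1, failures[a] + 1) for a in (0, 1)]
        arm = 0 if draws[0] >= draws[1] else 1
        halves[0 if i < n_patients // 2 else 1][arm] += 1
        if rng.random() < p_true[arm]:      # simulated binary outcome
            successes[arm] += 1
        else:
            failures[arm] += 1
    return halves


first_half, second_half = thompson_trial()
print("first half  (arm A, arm B):", first_half)
print("second half (arm A, arm B):", second_half)
```

Comparing the two halves shows the drift in allocation; the resulting imbalance is also what drives the larger sample sizes discussed above.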


  • Arguing against

    Hey and Kimmelman suggest that adaptive designs do not improve risk/benefit for subjects but increase the total burden on both patients and research systems by demanding larger sample sizes.

    They suggest that these designs redistribute, rather than dissolve, tensions in informed consent.

    They suggest that these designs may have validity problems.

  • A source of bias?

    Given that the odds of receiving the better treatment will improve over the course of the trial,

    It is in the best interests of