Use of Social Media for Data Mining in Pharmacovigilance
Nabarun Dasgupta, MPH, PhD
Chief Science Officer, Epidemico, Inc.
Presented at ISOP 2015 Prague, October 29, 2015
Slides available @epidemico


Page 2

Disclosures
Epidemico is commercializing aspects of this research. Epidemico is a wholly owned subsidiary of Booz Allen Hamilton.

01. US FDA
FDA has been funding research technology for 3 years.
Office of the Chief Scientist (OCS)
Center for Tobacco Products (CTP)
Office of International Programs (OIP)

02. EUROPE - IMI
A public-private partnership entering Year 2. The WEB-RADR project is supported by the Innovative Medicines Initiative Joint Undertaking (IMI JU) under grant agreement n° 115632, resources of which are composed of financial contributions from the European Union's Seventh Framework Programme (FP7/2007-2013) and EFPIA companies' in-kind contribution.

03. INDUSTRY
GSK has been a development partner to extend technology and obtain GxP software validation, over the last 2 years.

Page 3

01. Background
Why bother looking at social media?

Page 4

The Drug I'm Taking
The Drug I'm Responsible for Monitoring

Page 5

02. Methods
How to tame social media

Page 6

Data Processing Overview

01. Acquire
Collect social media data from APIs, third-party authorized resellers, and automated scraping

02. Filter
Automated processes to identify relevant posts and clean data

03. Curate
OPTIONAL: remove false positives, disentangle attributions, expand dictionary, correct geography

04. Statistics… and Determining Causation
Automated statistics and visual display serve hypothesis generation. HOWEVER, determining causation requires information from other sources.

Page 7

Automated Processes
For making sense of social media data

Identify Events
Automated NLP, Bayesian classifier, machine learning

Consolidate
Reduce multiplicate copies, remove identifiers

Translate Vernacular
Internet vernacular to MedDRA preferred terms

Geotag
Use post geodata or profile location (~50% of Proto-AEs)

Page 8

Proto-AE
Posts with Resemblance to Adverse Events

Dizziness: MedDRA 10013573, ICD-10 R42
Asthenia: MedDRA 10003549, ICD-10 R53

For more examples see demo.medwatcher.org

Page 9

Internet Vernacular Translation

Visual impairment (MedDRA 10047571) / Visual impairment (SNOMED 397540003)
lost their eyesight, seeing weird color, seeing weird colour, doublevision, couldn’t see, double vision, blind, googley eyed, blurry vision, changes in vision, cross eyed, seeing weird, vision change, blindness, cross vision, visual snow, googly eyed, seeing double, couldnt see, crosseyed, blurry

Decreased appetite (MedDRA 10061428) / Loss of appetite (SNOMED 79890006)
making me eat like a mouse, anorexic, lost appetite, #notevenhungry, appetite is nonexistent, apetite surpressed, didn’t get hungry, dont want to eat, killed my apetite, miss feeling hungry, killed my appetite, can’t eat, lost apetite, lost my appetite, no appetitey, lack of apetite, stomach small, lost teh appetite, never hungry, never want to eat, cant eat

Typos: apetite surpressed, killed my apetite, lost apetite, lack of apetite, lost teh appetite
Spelling Variations: seeing weird color, seeing weird colour, googley eyed, googly eyed
Combined or invented words: #notevenhungry, no appetitey
Implied phrases: making me eat like a mouse, never want to eat
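The phrase lists on this slide amount to a lookup table from vernacular to coded terms. Below is a minimal sketch of such a lookup. It assumes plain case-insensitive substring matching, which the slides do not describe. `VERNACULAR` and `translate` are hypothetical names, and only a handful of the slide's phrases are included.

```python
import re

VISUAL = ("Visual impairment", 10047571)      # MedDRA code shown on the slide
APPETITE = ("Decreased appetite", 10061428)   # MedDRA code shown on the slide

# A few phrases copied from the slide, typos included. The structure is hypothetical.
VERNACULAR = {
    "seeing double": VISUAL,
    "doublevision": VISUAL,
    "blurry vision": VISUAL,
    "googly eyed": VISUAL,
    "killed my apetite": APPETITE,
    "lost teh appetite": APPETITE,
    "#notevenhungry": APPETITE,
    "never want to eat": APPETITE,
}

def translate(post):
    """Return the set of (preferred term, code) pairs whose phrases occur in the post."""
    text = re.sub(r"\s+", " ", post.lower())  # normalize case and whitespace
    return {term for phrase, term in VERNACULAR.items() if phrase in text}

print(translate("This inhaler killed my apetite and now I'm seeing double"))
```

Substring matching is the simplest possible choice and will over-match short phrases such as "blind"; a production dictionary would likely need word boundaries and context rules.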


Page 11

System Statistics
As of October 23, 2015

2,500 Regulated products
Drugs, medical devices, vaccines, biologics, tobacco products, supply chain

10,000 Vernacular English phrases
Updated daily by human curators. Mandarin, French, Spanish & Dutch next.

63,000,000 Posts collected
Public posts from Facebook, Twitter & patient forums, back to 2012

380,000 Manually classified
Certified coders train machine learning classifiers & build dictionary

Page 12

System Performance
How well does automated processing work?

Sensitivity or Recall: 88%
Automated tools identify 9 out of 10 adverse events.

Positive Predictive Value or Precision: 68%
7 out of 10 posts flagged by the automated tools contain AE information. Can increase to 100% with manual curation. May vary by product subset.

Across all products, all time, all data sources
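For readers who want the arithmetic behind the two metrics, here is a small sketch using the standard definitions of recall and precision. The counts are hypothetical, chosen only so the results round to the slide's 88% and 68%; they are not the study's data.

```python
def recall(true_pos, false_neg):
    """Sensitivity: share of real adverse-event posts that the tools flag."""
    return true_pos / (true_pos + false_neg)

def precision(true_pos, false_pos):
    """Positive predictive value: share of flagged posts that really contain AE information."""
    return true_pos / (true_pos + false_pos)

# Hypothetical counts, not taken from the presentation.
TP, FN, FP = 88, 12, 41

print(f"recall = {recall(TP, FN):.0%}")        # recall = 88%
print(f"precision = {precision(TP, FP):.0%}")  # precision = 68%
```

Manual curation removes false positives after the fact, which is why precision can be pushed toward 100% while recall stays fixed by what the automated step found.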

Page 13

SCRANTON B, DE VRIES J, BERINATO S, BORTZ C. HARVARD BUSINESS REVIEW, JUNE 2013

Page 14

How to build a machine learning classifier
You are all MedDRA Certified Coders, right?

Post to classify: Medicine Prozak® (Fluoxetine) for the treatment of bulimia nervosa . Buy now Prozac-20mg "Fluoxetine" ...

Labels: Proto-AE, Trash, Benefit

Instant Feedback: % Confidence that this is a Proto-AE
81%, 27%, 74%, 4%, 2%, 75%, 92%

Congratulations! You just trained the classifier.

Page 15

Indicator score calculation
How well does automated processing work?

Example post: Albuterol has me feeling all sorts of dizzy and weak this morning

1. Tokenization
albuterol, albuterol has, albuterol has me, has me, has me feeling, feeling, feeling all, feeling all sorts, sorts, sorts of dizzy, dizzy, dizzy and weak, weak, weak this, weak this morning, morning

2. Score calculation
For each token “w”

$$b(w) = \frac{\text{number of positives containing } w}{\text{total number of positives}}$$

$$g(w) = \frac{\text{number of negatives containing } w}{\text{total number of negatives}}$$

[Chart, y-axis 0% to 100%, one column per token. Legend: Keep = proportion out of all keeps; Trash = proportion of all trash; None = proportion of posts token never shows up in]
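The two ratios defined on this slide can be computed directly from a set of hand-labeled posts. The sketch below assumes word n-grams of length 1 to 3 as tokens, which is a guess from the token list on the slide. `ngrams`, `token_rates` and the labeled posts are hypothetical, and the step that combines b(w) and g(w) into a final indicator score is not shown because the slide does not specify it.

```python
from collections import Counter

def ngrams(text, max_n=3):
    """Lowercased word n-grams of length 1..max_n (an assumed tokenization)."""
    words = text.lower().split()
    return {
        " ".join(words[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(words) - n + 1)
    }

def token_rates(posts):
    """posts: iterable of (text, is_positive). Returns {token: (b, g)}."""
    pos_total = neg_total = 0
    pos_counts, neg_counts = Counter(), Counter()
    for text, is_positive in posts:
        tokens = ngrams(text)  # a set, so each post counts a token once
        if is_positive:
            pos_total += 1
            pos_counts.update(tokens)
        else:
            neg_total += 1
            neg_counts.update(tokens)
    rates = {}
    for w in set(pos_counts) | set(neg_counts):
        b = pos_counts[w] / pos_total if pos_total else 0.0
        g = neg_counts[w] / neg_total if neg_total else 0.0
        rates[w] = (b, g)
    return rates

# Hypothetical hand-labeled posts (True = keep / Proto-AE, False = trash).
LABELED = [
    ("Albuterol has me feeling all sorts of dizzy and weak this morning", True),
    ("albuterol made me dizzy", True),
    ("buy albuterol online now", False),
    ("cheap inhalers buy now", False),
]

rates = token_rates(LABELED)
print(rates["dizzy"])      # (1.0, 0.0)
print(rates["albuterol"])  # (1.0, 0.5)
```

In this toy data "dizzy" appears only in kept posts, while "albuterol" appears in both classes and so carries less information on its own.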

Page 16

03. One Applied Example
albuterol or salbutamol
Established oral and inhaled drug for asthma & COPD
Highly genericized market
A priori selected demonstration drug at demo.medwatcher.org

Page 17

Methods
Albuterol/salbutamol has been a demo drug from project inception due to client interest

01. Public Facebook and Twitter
1 year ending October 23, 2015
albuterol, salbutamol, brand names

02. Proto-AEs
English language only
Use 0.7 indicator score threshold
Multiplicate copies consolidated

03. No curation
Need to check for false positives

04. Statistics
Tree plots, top events, PRR
Comparison to deduplicated UK spontaneous reports
Drug Analysis Prints from MHRA
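The Proto-AE selection step (score threshold plus consolidation of multiplicate copies) could look roughly like the sketch below. The 0.7 threshold comes from the slide. Treating it as inclusive, and detecting copies by comparing text after stripping links, case and extra whitespace, are assumptions. `normalize` and `select_proto_aes` are hypothetical names.

```python
import re

THRESHOLD = 0.7  # indicator score threshold stated on the slide

def normalize(text):
    """Drop links, case and extra whitespace so reposted copies compare equal."""
    text = re.sub(r"https?://\S+", "", text.lower())
    return re.sub(r"\s+", " ", text).strip()

def select_proto_aes(posts, threshold=THRESHOLD):
    """posts: iterable of (text, indicator_score). Keep one copy of each qualifying post."""
    seen = set()
    kept = []
    for text, score in posts:
        key = normalize(text)
        if score >= threshold and key not in seen:  # inclusive cutoff is an assumption
            seen.add(key)
            kept.append((text, score))
    return kept

# Hypothetical posts and scores.
POSTS = [
    ("Albuterol has me feeling dizzy", 0.92),
    ("albuterol has me feeling   dizzy http://t.co/abc", 0.90),  # copy of the first post
    ("Buy Ventolin now", 0.04),
]

print(select_proto_aes(POSTS))  # [('Albuterol has me feeling dizzy', 0.92)]
```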

Page 18

Spam Happens
Exclude n=297 Twitter posts mentioning Ventolin, having a link, 19-Mar-2015 to 02-Apr-2015
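The exclusion rule on this slide has four conditions (source, product mention, link, date window), which can be written as a single predicate. This is a sketch only: the `is_spam` function, its argument layout and the link pattern are assumptions, and the example posts are invented.

```python
import re
from datetime import date

SPAM_START, SPAM_END = date(2015, 3, 19), date(2015, 4, 2)  # window from the slide
LINK = re.compile(r"https?://\S+")

def is_spam(source, text, posted):
    """True for Twitter posts that mention Ventolin, carry a link, and fall in the window."""
    return (
        source == "twitter"
        and "ventolin" in text.lower()
        and LINK.search(text) is not None
        and SPAM_START <= posted <= SPAM_END
    )

# Hypothetical posts: (source, text, date posted).
posts = [
    ("twitter", "Cheap Ventolin here http://example.com/buy", date(2015, 3, 25)),
    ("twitter", "Ventolin makes my hands shake", date(2015, 3, 25)),
    ("facebook", "Ventolin deals http://example.com", date(2015, 3, 25)),
]
kept = [p for p in posts if not is_spam(*p)]
print(len(kept))  # 2
```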

Page 19

Albuterol/Salbutamol: Proto-AE vs. Mentions
24-Oct 2014 to 23-Oct 2015, Facebook & Twitter

n=63,284 Mentions
Mentions represent the patient voice. They exclude most spam, news, coupons, and corporate announcements.

3% of Mentions were Proto-AEs
[Chart values: 2.5%, 3.1%, 3.2%]

n = 1,946 Proto-AEs
Unique posts from Twitter & Facebook

83% From Twitter
1,613 from Twitter & 333 from Facebook

2,987 Drug-Event Pairs
Posts can have multiple symptoms and products mentioned within same posts

81% MedDRA SOCs
21 out of 26 System Organ Classes

82 HLGTs
Higher level group terms

125 HLTs
Higher level terms

199 Preferred Terms
MedDRA v18
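The percentages on this slide follow from the counts it reports. A quick check, using only the slide's own numbers (the pairs-per-post figure is derived here and does not appear on the slide):

```python
mentions = 63_284
proto_aes = 1_946
twitter, facebook = 1_613, 333
drug_event_pairs = 2_987
socs_seen, socs_total = 21, 26

# The two source counts add up to the Proto-AE total.
assert twitter + facebook == proto_aes

print(f"{proto_aes / mentions:.1%} of Mentions were Proto-AEs")       # 3.1% of Mentions were Proto-AEs
print(f"{twitter / proto_aes:.0%} from Twitter")                      # 83% from Twitter
print(f"{socs_seen / socs_total:.0%} of MedDRA SOCs")                 # 81% of MedDRA SOCs
print(f"{drug_event_pairs / proto_aes:.2f} drug-event pairs per post")  # 1.53 drug-event pairs per post
```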

Page 20

Albuterol/Salbutamol by System Organ Class
24-Oct 2014 to 23-Oct 2015, Facebook & Twitter, n=1,946 Proto-AEs
Rectangle size = Proto-AE count | Color intensity = PRR

[Tree plot labels: Cardiac disorders; Ear and labyrinth disorders; Eye disorders; Gastrointestinal disorders; General disorders and administration site conditions; Immune system disorders; Infections and infestations; Injury, poisoning and procedural complications; Investigations; Metabolism and nutrition disorders; Musculoskeletal and connective tissue disorders; Neoplasms benign, malignant and unspecified (incl cysts and polyps); Nervous system disorders; Psychiatric disorders; Respiratory, thoracic and mediastinal disorders; Skin and subcutaneous tissue disorders; Social circumstances; Surgical and medical procedures; Vascular disorders]

Page 21

Albuterol/Salbutamol by Preferred Term
24-Oct 2014 to 23-Oct 2015, Facebook & Twitter, n=1,946 Proto-AEs
Rectangle size = Proto-AE count | Color intensity = PRR | Black = small sample PRR restriction

[Tree plot labels: Palpitations; Drug ineffective; Pain; Malaise; Incorrect dose administered; Hypersensitivity; Insomnia; Altered state of consciousness; Feeling jittery; Tremor; Restlessness; Cough; Lung disorder; Anxiety; Crying; Somnolence; Wheezing; Dizziness; Nonspecific reaction; Fatigue; Condition aggravated; Feeling abnormal; Heart rate increased; Drug dose omission; Bronchitis; Headache; Nausea; Vomiting; Psychomotor hyperactivity; Burning sensation; Affect lability; Dependence; Pyrexia; Pneumonia; Irritability; Muscle twitching]

Page 22

Albuterol/Salbutamol – Top 5 by Count
24-Oct 2014 to 23-Oct 2015, Facebook & Twitter, n=1,946 Proto-AEs
Public Facebook and Twitter combined.

[Bar chart, x-axis 0 to 500]
Bar labels, in slide order: Tremor, Drug ineffective, Altered state of consciousness, Malaise, Cough
Bar values, in slide order: 123, 136, 175, 204, 433

Represents the things that most concern patients

Page 23

Albuterol/Salbutamol – Top 5 by PRR
24-Oct 2014 to 23-Oct 2015, Facebook & Twitter, n=1,946 Proto-AEs
Unadjusted PRRs calculated against all other products in social listening database (n=2,551 medical products), 5 or more events

[Bar chart, x-axis 0 to 40]
Bar labels, in slide order: Tremor, Palpitations, Feeling jittery, Tachycardia, Wheezing
Bar values, in slide order: 23.6, 26.6, 31.1, 32.5, 35.7

Corresponds well with product label
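The slides do not spell out the PRR formula. The sketch below uses the standard 2x2 definition of the proportional reporting ratio, PRR = [a / (a + b)] / [c / (c + d)], with the slide's "5 or more events" restriction applied to the target cell. The function name and the counts are hypothetical.

```python
def prr(a, b, c, d, min_events=5):
    """Unadjusted proportional reporting ratio from a 2x2 table.

    a: target product, event of interest    b: target product, all other events
    c: other products, event of interest    d: other products, all other events
    Returns None when fewer than min_events target reports exist or the
    comparison group has no reports of the event.
    """
    if a < min_events or c == 0:
        return None
    # Same value as (a / (a + b)) / (c / (c + d)), rearranged to a single division.
    return (a * (c + d)) / (c * (a + b))

# Hypothetical counts: 20 of 100 target-product posts mention the event,
# versus 10 of 1,000 posts for all other products.
print(prr(20, 80, 10, 990))  # 20.0
print(prr(3, 97, 10, 990))   # None
```

A PRR of 20 would mean the event is mentioned 20 times more often, proportionally, for the target product than for the comparison set. As the deck notes elsewhere, this supports hypothesis generation rather than causal conclusions.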

Page 24

Correlation: Spontaneous Report vs. Social Media
Non-parametric density (isometric plot) at HLGT level

Spontaneous: 07-Apr-1969 to 26-Aug-2015 (46.5 years), 2,199 unique non-fatal reports, n=3,851 reactions at HLGT, 156 unique HLGT
Social: 24-Oct-2014 to 23-Oct-2015 (1 year), 1,946 unique non-fatal reports, n=2,419 reactions at HLGT, 57 unique HLGT

Plot legend: Concordance | More common in spontaneous data | More common in social media data

Labeled HLGTs:
Respiratory disorders NEC (cough, lung disorder, oropharyngeal pain, rhinorrhoea, chest pain, dyspnoea)
Movement disorders (tremor)
Therapeutic/non-therapeutic effects (drug ineffective)
General system disorders (malaise, feeling jittery, pain, condition aggravated, fatigue, feeling abnormal, asthenia)
Sleep disorders (insomnia, somnolence)
Epidermal & dermal conditions (pruritus, skin discomfort, rash, angioedema, urticaria)

Page 25

04. Social Media in Practice
Projects already underway for safety surveillance

Page 26

Social Data in Concert with Spontaneous
Rectifying a false negative in a sponsor’s safety database

Routine Portfolio Surveillance
Social media monitoring using Epidemico platform

Twitter Post Identified
“Did this product cause x, y, z? I’ve had 3 administrations of this product and ever since then have felt a, b, c.”

Review Company’s Safety Database
A couple of cases had been previously reported to the sponsor but had not crossed the signal threshold.

Submit to Regulators
Assessment of causality and importance led the sponsor to initiate a submission with possible label change

Page 27

Monitoring Discussions about Clinical Trials
Protecting validity and blinding

Discontinuation
I'm in a [Sponsor] double blind #### trial and could receive the placebo up until week 16 at which point I'm guaranteed to receive the drug. What I don't understand is that I can't be told if I was on the placebo for the first 16 weeks. I'm not seeing any positive benefit from whatever I've been taking for 8 weeks so far, and I would really like to know at week 16 if it's the drug because then I can bail from the study at that time and move on to something else…

Unblinding
[My brother] started on Thursday. He told me yesterday that he tried to chew a capsule (I'm not sure why, I think he wanted to see if it had a flavor) & it was hard as a rock. I’m assuming maybe it was eneric [sic] coated or something. He took the no flavor to mean he's on the placebo, but I actually wonder if they would bother enteric coating a placebo. Probably doesn't matter & I know we're not supposed to know anyway, just wondered if anyone else's was enteric coated.

Page 28

Beyond Pharmacovigilance
Hospital sentiment, counterfeits

Page 29

Coming 1Q2016: Free Public API
In appreciation for the public investment in this tool (FDA, IMI & Web-RADR)
For social media, clinical trial patient diaries, case safety reports, scientific literature

01. Identify Products, Translate Vernacular to MedDRA
Collaboration with WHO-UMC & MedDRA MSSO

02. Remove Spam, Identify AE, Benefits, Sentiment
Reduce noise, provide indicator scores

03. Entity Extraction for Causal Determination
Demographics, time to onset, dose, route, medical conditions, lab results, challenge

04. Anonymize by Removing PII
Names, hospital name, medical record #, phone, email
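Of the PII types listed for the anonymization step, email addresses and phone numbers are the two that simple patterns can catch. The sketch below is illustrative only: the regular expressions and the `scrub` function are assumptions rather than the announced API, and names, hospital names and medical record numbers would need entity recognition that is not attempted here.

```python
import re

# Deliberately simple patterns; real-world formats vary far more than this.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"\(?\+?\d[\d\s().-]{7,}\d")

def scrub(text):
    """Replace email addresses and phone-like digit runs with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)

print(scrub("Reach me at 555-123-4567 or jane.doe@example.com"))
# Reach me at [PHONE] or [EMAIL]
```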

Page 30

05. Limitations
What does “bias” mean in this setting?

Page 31

Challenges
A few among many

01. Causality & Quantitative Emphasis
How well can patients assess causality? How much should we care? Is a solely quantitative approach justified?

02. Patient Privacy vs. Muting the Patient Voice
Are we willing to disenfranchise the patient voice on privacy grounds?

03. Stability of Data Availability & Emerging Arenas
How do you build tools for an ever-shifting data environment?

04. Regulation Unclear
Can we all agree that nobody wants to see each Tweet as a separate case report in a safety database? So, why is everyone so afraid?

Page 32

Are social media data biased?
YES! All data are biased. But, can we understand the bias and make use of the information?

Page 33

What we know, how we know it
Each source of product safety information provides different perspectives on patient experience

Three Dimensions of Bias

1. Patient-Reported Outcomes
Social media can elucidate what patients experience most frequently.

2. Seriousness
Traditional PV focus on most serious of events may be less central in social media.

3. Completeness
Social media posts may be less complete. Conversely, they contain less sensitive identifying information than electronic health records.

[Chart, y-axis 0% to 100%. Sources: Clinical Trials, Sponsor AE, Practice Data, Social Media. Series: PRO, Seriousness, Completeness. Hypothetical data for illustrative purposes only.]

Page 34

Carrie, Mike, Aranka, Ciara, Nabarun, Wenjie, Harold, John, Robin, Chi, Sue, Kelly, Kate, Clark, Chris, Robin, Derek, Matt, Essam, Melissa, Kyra, Nico, Lou, Carly, Lucas

Page 35

Thank you

@epidemico
@WEBRADR

demo.medwatcher.org