Upload
doanthu
View
225
Download
0
Embed Size (px)
Citation preview
Chief Science Officer, Epidemico, Inc.
Presented at ISOP 2015 Prague
October 29, 2015
01
Slides available @epidemico
02
Use of Social Media for Data Mining in Pharmacovigilance Nabarun Dasgupta, MPH, PhD
DisclosuresEpidemico is commercializing aspects of this research. Epidemico is wholly-owned subsidiary of Booz Allen Hamilton.
FDA has been funding research technology for 3 years.
Office of the Chief Scientist (OCS)Center for Tobacco Products (CTP)
Office of International Programs (OIP)
US FDA 01
A public-private partnership entering Year 2. The WEB-RADR project is supported by theInnovative Medicines Initiative Joint Undertaking (IMI JU) under grant agreement n° 115632,resources of which are composed of financial contributions from the European Union's SeventhFramework Programme (FP7/2007-2013) and EFPIA companies’ in kind contribution.
EUROPE - IMI
02
GSK has been a development partner to extendtechnology and obtain GxP software validation,
over the last 2 years.
INDUSTRY
03
Background Why bother looking at social media?
01
Infographic elements Slide sub title
The Drug I’m Taking The Drug I’m Responsible for Monitoring
Methods How to tame social media
02
Data Processing Overview
Automated processes to identifyrelevant posts and clean data
02. Filter
Automated statistics and visual display serve hypothesis generation. HOWEVER, determining causation requiresinformation from other sources.
04. Statistics… and Determining Causation
OPTIONAL: remove false positives,disentangle attributions,
expand dictionary, correct geography
03. Curate
Collect social media data from APIs, third-partyauthorized resellers, and automated scraping
01. Acquire
Automated ProcessesFor making sense of social media data
Use post geodata orprofile location
(~50% of Proto-AEs)
GeotagInternet vernacular to MedDRA
preferred terms
Translate Vernacular
Reduce multiplicate copiesRemove identifiers
Consolidate
Automated NLPBayesian classifierMachine learning
Identify Events
Proto-AEPosts with Resemblance to Adverse Events
Dizziness MedDRA: 10013573
ICD-10: R53
Asthenia MedDRA: 10003549
ICD-10: R42
For more examples see demo.medwatcher.org
Spelling Variations Combined or invented words Implied phrases Internet Vernacular Translation
lost their eyesight
seeing weird color seeing weird colour
doublevision
couldn’t see double vision
blind googley eyed
blurry vision
changes in vision cross eyed seeing weird
vision change blindness
cross vision visual snow
googly eyed seeing double
making me eat like a mouse anorexic
lost appetite
#notevenhungry appetite is nonexistent
apetite surpressed
didn’t get hungry
dont want to eat
killed my apetite
miss feeling hungry
killed my appetite can’t eat lost apetite
lost my appetite no appetitey
lack of apetite stomach small
lost teh appetite
never hungry never want to eat
cant eat
couldnt see crosseyed
blurry
Visual impairment MedDRA 10047571
Visual impairment SNOMED 397540003
Decreased appetite MedDRA 10061428
Loss of appetite SNOMED 79890006
apetite surpressed
killed my apetite lost apetite
lack of apetite
lost teh appetite
Typos
seeing weird color seeing weird colour
googley eyed
googly eyed
#notevenhungry
no appetitey
making me eat like a mouse
never want to eat
Visual impairment MedDRA 10047571
Visual impairment SNOMED 397540003
Decreased appetite MedDRA 10061428
Loss of appetite SNOMED 79890006
making me eat like a mouse anorexic
lost appetite
#notevenhungry appetite is nonexistent
apetite surpressed
didn’t get hungry
dont want to eat
miss feeling hungry
killed my appetite can’t eat lost apetite
lost my appetite no appetitey
lack of apetite stomach small
lost teh appetite
never hungry never want to eat
cant eat
lost their eyesight
seeing weird color seeing weird colour
doublevision
couldn’t see double vision
blind googley eyed
blurry vision
changes in vision cross eyed seeing weird
vision change blindness
cross vision visual snow
googly eyed seeing double
couldnt see crosseyed
blurry
killed my apetite
System Statistics
Drugs, medical devices,vaccines, biologics, tobaccoproducts, supply chain
Regulated products2,500
Updated daily by humancurators. Mandarin, French,Spanish & Dutch next.
Vernacular English phrases10,000
Public posts from Facebook, Twitter & patient forums, back to 2012
Posts collected63,000,000
Certified coders train machinelearning classifiers
& build dictionary
Manually classified380,000
As of October 23, 2015
System PerformanceHow well does automated processing work?
Automated tools identify 9 out of 10 adverse events.
Sensitivity or Recall
7 out of 10 posts contain AEinformation. Can increase to100% with manual curation.May vary by product subset.
Positive Predictive Valueor Precision68%
88%
Across all products, all time, all data sources
SC
RA
NTO
N B
, DE V
RIE
S J
, BE
RIN
ATO
S, B
OR
TZ C
. HA
RVA
RD
BU
SIN
ES
S R
EV
IEW
, JU
NE 2
013
How to build a machine learning classifier Congratulations! You just trained the classifier.
Proto-AE Trash
Medicine Prozak® (Fluoxetine) for the treatment of bulimia nervosa . Buy now Prozac-20mg "Fluoxetine" ...
Benefit
81%27%74%4%2%75%92%
You are all MedDRA Certified Coders, right?
Instant Feedback: % Confidence that this is a Proto-AE
Indicator score calculation How well does automated processing work?
Albuterol has me feeling all sorts of dizzy and weak this morning
0%
10%
20%
30%
40%
50%
60%
70%
80%
90%
100%
Proportion out of all keeps
Keep
Proportion of posts token never shows up in
None
Proportion of all trash
Trash
albuterol albuterol has albuterol has me has me has me feeling feeling feeling all feeling all sorts sorts sorts of dizzy dizzy dizzy and weak weak weak this weak this morning morning morning
1. Tokenization 2. Score calculation
For each token “w”
b(w) = (number of positives containing w) (total number of positives)
g(w) = (number of negatives containing w)
(total number of negatives)
One Applied Example albuterol or salbutamolEstablished oral and inhaled drug for asthma & COPDHighly genericized marketA priori selected demonstration drug at demo.medwatcher.org
03
MethodsAlbuterol/salbutamol has been a demo drug from project inception due to client interest
English language onlyUse 0.7 indicator score thresholdMultiplicate copies consolidated
02. Proto-AEs
Tree plots, top events, PRRComparison to deduplicated UK spontaneous reportsDrug Analysis Prints form MHRA
04. Statistics
Need to check for false positives03. No curation
1 year ending October 23, 2015albuterol, salbutamol, brand names
01. Public Facebook and Twitter
Spam HappensExclude n=297 Twitter posts mentioning Ventolin, having a link, 19-Mar-2015 to 02-Apr-2015
Albuterol/Salbutamol: Proto-AE vs. Mentions
Mentions represent the patient voice.They exclude most spam, news, coupons,
and corporate announcements.
3% of Mentions were Proto-AEs
2.5%
3.1%3.2%
Unique posts from Twitter & Facebookn = 1,946 Proto-AEs
21 out of 26 System Organ Classes81% MedDRA SOCs
Posts can have multiple symptoms andproducts mentioned within same posts
2,987 Drug-Event Pairs
1,613 from Twitter & 333 from Facebook83% From Twitter
Higher level group terms82 HLGTs
Higher level terms125 HLTs
MedDRA v18199 Preferred Terms
24-Oct 2014 to 23-Oct 2015, Facebook & Twitter n=63,284 Mentions
Albuterol/Salbutamol by System Organ Class24-Oct 2014 to 23-Oct 2015, Facebook & Twitter n=1,946 Proto-AEs Rectangle size = Proto-AE count | Color intensity = PRR
Cardiac disorders Ear and labyrinth disorders
Eye disorders
Gastrointestinal disorders
General disorders and administration site conditions
Immune system disorders
Infections and infestations
Injury, poisoning and procedural complications Investigations
Metabolism and nutrition disorders
Musculoskeletal and connective tissue disorders
Neoplasms benign, malignant and unspecified (incl cysts and polyps)
Nervous system disorders
Psychiatric disorders
Respiratory, thoracic and mediastinal disorders Skin and subcutaneous tissue disorders
Social circumstances
Surgical and medical procedures
Vascular disorders
Albuterol/Salbutamol by Preferred Term 24-Oct 2014 to 23-Oct 2015, Facebook & Twitter n=1,946 Proto-AEs Rectangle size = Proto-AE count | Color intensity = PRR | Black = small sample PRR restriction
Palpitations
Drug ineffective
Pain Malaise
Incorrect dose administered
Hypersensitivity
Insomnia
Altered state of consciousness Feeling jittery
Tremor
Restlessness
Cough
Lung disorder
Anxiety Crying Somnolence
Wheezing
Dizziness
Nonspecific reaction
Fatigue Condition aggravated
Feeling abnormal
Heart rate increased
Drug dose omission
Bronchitis
Headache
Nausea Vomiting Psychomotor hyperactivity
Burning sensation
Affect lability Dependence
Pyrexia
Pneumonia
Irritability Muscle twitching
Albuterol/Salbutamol – Top 5 by Count24-Oct 2014 to 23-Oct 2015, Facebook & Twitter n=1,946 Proto-AEs
Public Facebook and Twitter combined.
123
136
175
204
433
0 50 100 150 200 250 300 350 400 450 500
Tremor
Drug ineffective
Altered state of consciousness
Malaise
Cough
Represents the things that most concern patients
Albuterol/Salbutamol – Top 5 by PRR24-Oct 2014 to 23-Oct 2015, Facebook & Twitter n=1,946 Proto-AEs
Unadjusted PRRs calculated against all other products in social listening database (n=2,551 medical products), 5 or more events
23.6
26.6
31.1
32.5
35.7
0 5 10 15 20 25 30 35 40
Tremor
Palpitations
Feeling jittery
Tachycardia
Wheezing
Corresponds well with product label
Correlation: Spontaneous Report vs. Social Media Non-parametric density (isometric plot) at HLGT level
Respiratory disorders NEC (cough, lung disorder, oropharyngeal pain, rhinorrhoea, chest pain, dyspnoea)
Movement disorders (tremor)
Therapeutic/non-therapeutic effects (drug ineffective)
General system disorders (malaise, feeling jittery, pain, condition aggravated,
fatigue, feeling abnormal, asthenia)
Sleep disorders (insomnia, somnolence)
Epidermal & dermal conditions (pruritus, skin discomfort, rash, angioedema, urticaria)
Spontaneous 07-Apr-1969 to 26-Aug-2015
46.5 years 2,199 unique non-fatal reports
n=3,851 reactions at HLGT 156 unique HLGT
Social 24-Oct-2014 to 23-Oct-2015 1 year 1,946 unique non-fatal reports N=2,419 reactions at HLGT 57 unique HLGT
Concordance More common in spontaneous data More common in social media data
Social Media in Practice Projects already underway for safety surveillance
04
Social Data in Concert with SpontaneousRectifying a false negative in an sponsor’s safety database
“Did this product cause x, y, z? I’ve had 3 administrations of this product andever since then have felt a, b, c.”
Twitter Post Identified
Assessment of causality and importance led the sponsor to initiate asubmission with possible label change
Submit to Regulators
Social media monitoring usingEpidemico platform
Routine Portfolio Surveillance
A couple of cases had been previouslyreported to the sponsor but
had not crossed the signal threshold.
Review Company’s Safety Database
Monitoring Discussions about Clinical TrialsProtecting validity and blinding
DiscontinuationI'm in a [Sponsor] double blind #### trial andcould receive the placebo up until week 16 at
which point I'm guaranteed to receive the drug. What I don't understand is that I can't be told if I w
as on the placebo for the first 16 weeks.I'm not seeing any positive benefit from whatever
I've been taking for 8 weeks so far, and I wouldreally like to know at week 16 if it's the drug
because then I can bail from the study at thattime and move on to something else…
Unblinding[My brother] started on Thursday. He told me yesterday that he tried to chew a capsule(I'm not sure why, I think he wanted to see if it had a flavor) & it was hard as a rock. I’massuming maybe it was eneric [sic] coated orsomething. He took the no flavor to meanhe's on the placebo, but I actually wonder ifthey would bother enteric coating a placebo.Probably doesn't matter & I know we're notsupposed to know anyway, just wondered ifanyone else's was enteric coated.
Beyond PharmacovigilanceHospital sentiment, counterfeits
Collaboration with WHO-UMC & MedDRA MSSO
Identify Products, Translate Vernacular to MedDRA
01
Reduce noise, provide indicator scoresRemove Spam, Identify AE , Benefits, Sentiment
02
Demographics, time to onset, dose, route,medical conditions, lab results, challenge
Entity Extraction for Causal Determination03
Names, hospital name, medical record #, phone, emailAnonymize by Removing PII
04
Coming 1Q2016: Free Public API In appreciation for the public investment in this tool (FDA, IMI & Web-RADR)
For social media, clinical trial patient diaries, case safety reports, scientific literature
Limitations What does “bias” mean in this setting?
05
How well can patients assess causality? How much should we care?Is a solely quantitative approach justified?
Causality & Quantitative Emphasis
01
Are we willing to disenfranchise the patient voice on privacy grounds?Patient Privacy vs. Muting the Patient Voice
02
How do you build tools for an ever-shifting data environment?Stability of Data Availability & Emerging Arenas
03
Can we all agree that nobody wants to see each Tweet as a separatecase report in a safety database? So, why is everyone so afraid?
Regulation Unclear
04
Challenges A few among many
Are social media data biased?YES! All data are biased. But, can we understand the bias and make use of the information?
What we know, how we know itEach source of product safety information provides different perspectives on patient experience
0%
10%
20%
30%
40%
50%
60%
70%
80%
90%
100%
Clinical Trials Sponsor AE Practice Data Social Media
PRO Seriousness Completeness
Social media can elucidate whatpatients experience most frequently.
1. Patient-Reported Outcomes
Traditional PV focus on most serious of eventsmay be less central in social media.
2. Seriousness
Social media posts may by less complete.Conversely, contain less sensitive identifying
information than electronic health records.
3. Completeness
Three Dimensions of Bias
Hypothetical data for illustrative purposes only.
Carrie Mike Aranka Ciara Nabarun Wenjie
Harold John Robin Chi Sue
Kelly Kate Clark Chris Robin
Derek
Matt
Essam
Melissa
Kyra
Nico
Lou
Carly
Lucas