Caitlin Rivers
Virginia Bioinformatics Institute
V i r g i n i a T e c h
Ethical use of Twitter for #DigDisDet
Motivation
• Willowbrook Hepatitis Study (1963-1966)
• Jewish Chronic Disease Hospital Cancer Study, Brooklyn (1963)
• Stanford Prison Experiment (1971)
• Tuskegee Syphilis Study (1932-1972)
Motivation
• Laud Humphreys, sociologist (1960s)
• “To avoid bias, Humphreys secretly followed some men and recorded the license number on their vehicles. A year later, Humphreys showed up at their private homes and claimed to be a health service interviewer. He asked them questions about their marital status, race, job, and other personal questions”
-Historical Cases of Unethical Research
Serena Marsden & Melissa Melander
University of North Dakota
http://www.und.edu/instruct/wstevens/PROPOSALCLASS/MARSDEN&MELANDER2.htm
Twitter in a nutshell
● ‘Microblogging’ social media service
● Connecting with people who share interests
● Default privacy setting is ‘open’ (public)
● ~500 million users
● ~340 million tweets sent daily around the world
Twitter API
● Application programming interface (API)
● The most convenient API among the social networks
● Streaming API provides a ~1% sample of all tweets
● Search-term and author-specific APIs are also available
● API accounts are freely available
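As a rough sense of scale, combining the ~1% streaming sample with the ~340 million daily tweets cited earlier gives the approximate volume a single streaming connection delivers per day. This assumes the sample is a uniform 1%, which is only an approximation:

```latex
0.01 \times \left(340 \times 10^{6}\right) \approx 3.4 \times 10^{6}\ \text{tweets per day}
```

Even a "small" sample is therefore millions of records a day, each carrying the per-tweet fields listed next.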
Wikimedia Commons
Twitter API
● Data streamed include:
○ Tweet text
○ Username
○ Timestamp
○ Text location*
○ Geolocation*
○ Number of friends and followers
○ And more...
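The fields listed above can be pictured as one record per tweet. The sketch below is a hypothetical illustration, not code from the original: the JSON key names (`text`, `created_at`, `user.screen_name`, `user.location`, `user.friends_count`, `user.followers_count`, `coordinates`) are assumed to follow the v1.1-era tweet object and may differ in other API versions, and nothing here calls the Twitter API. The two starred location fields are modeled as optional, on the assumption that the asterisk marks fields not present on every tweet.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class TweetRecord:
    text: str
    username: str
    timestamp: str
    text_location: Optional[str]                 # free-text profile location; often empty
    geolocation: Optional[Tuple[float, float]]   # (lat, lon); only if the user shared it
    friends: int
    followers: int


def parse_tweet(raw: dict) -> TweetRecord:
    """Map a tweet dict onto the fields above (key names are assumptions, see lead-in)."""
    user = raw.get("user", {})
    coords = raw.get("coordinates")
    geo = None
    if coords and coords.get("coordinates"):
        lon, lat = coords["coordinates"]  # GeoJSON order is [longitude, latitude]
        geo = (lat, lon)
    return TweetRecord(
        text=raw.get("text", ""),
        username=user.get("screen_name", ""),
        timestamp=raw.get("created_at", ""),
        text_location=user.get("location") or None,  # treat "" as missing
        geolocation=geo,
        friends=user.get("friends_count", 0),
        followers=user.get("followers_count", 0),
    )
```

Note how many of these fields identify a person directly (username) or indirectly (text, locations), which is what the norms later in this deck address.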
Twitter for research
● Population-level research for trends and patterns
● Syndromic surveillance (e.g., influenza-like illness, ILI), vaccine sentiment, disaster response, natural disaster surveillance, etc.
● User-centric use cases are also possible:
○ Longitudinal studies?
○ Contagion within a social network?
○ Information cascades?
www.connectedaction.net
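Population-level surveillance can be done without retaining anything about individual users. The sketch below counts, per day, tweets that mention an ILI-related word and keeps nothing else. The term list `ILI_TERMS` is a hypothetical placeholder (a real study would need a validated one), and the timestamp format is assumed to match the v1.1-style `created_at` string.

```python
import re
from collections import Counter
from datetime import datetime

# Hypothetical keyword set; not a validated ILI lexicon.
ILI_TERMS = {"flu", "fever", "cough"}


def daily_ili_counts(tweets):
    """Count tweets per calendar day that mention any ILI term.

    Only the date and a running count are kept: no usernames,
    tweet IDs, or tweet text leave this function.
    """
    counts = Counter()
    for tweet in tweets:
        # Whole-word match, so "flu" does not fire on "influence" or "fluent".
        words = set(re.findall(r"[a-z]+", tweet["text"].lower()))
        if words & ILI_TERMS:
            # Assumed format, e.g. "Mon Jan 07 10:00:00 +0000 2013"
            day = datetime.strptime(tweet["created_at"], "%a %b %d %H:%M:%S %z %Y").date()
            counts[day] += 1
    return counts
```

Returning only per-day counts is what "data collected and analyzed in aggregate" looks like in practice; the crude keyword match is deliberate to keep the sketch short.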
Existing guidelines
“Research involving the collection or study of existing data, documents, records, pathological specimens, or diagnostic specimens, if these sources are publicly available or if the information is recorded by the investigator in such a manner that subjects cannot be identified, directly or through identifiers linked to the subjects.”
- Department of Health and Human Services, Policy for Protection of Human Research Subjects (45 CFR 46.101(b)(4))
Existing guidelines
“Our Services are primarily designed to help you share information with the world. Most of the information you provide us is information you are asking us to make public. [...] Our default is almost always to make the information you provide public for as long as you do not delete it from Twitter [...]. Your public information is broadly and instantly disseminated.”
- Twitter Privacy Policy (Flesch-Kincaid Grade Level: 12)
Existing Norms
Wikimedia Commons
Differing expectations in public vs private spaces
Counting red shirts at the mall: OK
Counting red shirts in homes: No
Following red shirts around the mall to learn about purchasing behavior: No
Issues
What if…
● users don’t know or understand their data are available?
● identifiable data are used in a way that harms the user?
● natural privacy boundaries are violated?
● Offline, all of these would violate IRB standards, but what about online spaces?
Chan Lowe, Sun Sentinel
Proposed DDD Norms
www.scienceprogress.org
Applied to DigDisDet
Data collected and analyzed in aggregate: OK
Data collected from a specific user: No
Data collected from specific users across multiple sources: No
Proposed DDD Norms
1. Avoid publishing identifiable data.
• Tweet text
• Author handle
2. Do not use data to procure more data from other sources.
• “Snowball” sampling using identifying info
• Surfing linked accounts
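One way to apply norm 1 in a data pipeline is an allowlist: rather than trying to delete every identifying field, keep only fields known to be non-identifying. Verbatim tweet text is left out because it can often be searched to find its author. This is a hypothetical sketch: the names in `PUBLISHABLE_FIELDS` (`date`, `keyword`, `region`) are illustrative assumptions, and the @handle check is a last-line safeguard, not a guarantee of anonymity.

```python
import re

# Hypothetical allowlist of derived, non-identifying fields.
PUBLISHABLE_FIELDS = {"date", "keyword", "region"}

# Rough pattern for Twitter-style @handles.
HANDLE_PATTERN = re.compile(r"@\w+")


def strip_identifiers(record: dict) -> dict:
    """Return a copy holding only allowlisted fields.

    Tweet text and author handle are never on the allowlist, so they are
    dropped; any surviving string that still looks like an @handle raises.
    """
    cleaned = {k: v for k, v in record.items() if k in PUBLISHABLE_FIELDS}
    for value in cleaned.values():
        if isinstance(value, str) and HANDLE_PATTERN.search(value):
            raise ValueError("record still contains an @handle")
    return cleaned
```

An allowlist fails safe: a new field added upstream (say, a user ID) is excluded by default instead of leaking until someone remembers to block it.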
Ideas for DDD Norms
3. Be especially careful with geographic data.
• Protect coordinates as you would any other identifying data
4. Seek IRB approval for individual-based study designs.
• Likely requires consent
• e.g., following a user who identifies as depressed
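For norm 3, one common way to protect coordinates is to coarsen them before storage or publication. This is a minimal sketch, assuming that rounding to a fixed number of decimal places is coarse enough for the study at hand; the function name `coarsen` and the default of one decimal place are illustrative choices, not recommendations from the original.

```python
def coarsen(lat: float, lon: float, decimals: int = 1) -> tuple:
    """Round a (lat, lon) pair to `decimals` places.

    0.1 degree of latitude is roughly 11 km; a degree of longitude covers
    less ground the farther you are from the equator.
    """
    return (round(lat, decimals), round(lon, decimals))
```

Precision should be chosen with local population density in mind: in sparsely populated areas even a coarse cell may contain only a handful of users.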
A cautionary example
• Name
• Cell phone number
• Favorite music, TV, sports, hobbies
• Doesn’t like to read
• School he attends
• Love life
• Where he vacations
• Bad habits
• His social network
Parting Motivation
• Laud Humphreys, sociologist (1960s)
• “To avoid bias, Humphreys secretly followed some men and recorded the license number on their vehicles. A year later, Humphreys showed up at their private homes and claimed to be a health service interviewer. He asked them questions about their marital status, race, job, and other personal questions”
-Historical Cases of Unethical Research
Serena Marsden & Melissa Melander
University of North Dakota
http://www.und.edu/instruct/wstevens/PROPOSALCLASS/MARSDEN&MELANDER2.htm
What else can we do?
Caitlin Rivers, MPH
Network Dynamics and Simulation Science Laboratory
Virginia Bioinformatics Institute
Virginia Tech