Creating structured biomedical knowledge networks via crowdsourcing
Tong Shu Li
Su Lab, The Scripps Research Institute
Bio-Ontologies SIG, ISMB 2015
2015-07-10
Knowledge networks allow for result interpretation
Bainbridge 2011
Network creation process
Relationship extraction subproblems
Crowdsourcing introduction
• Members of the public perform small tasks for small amounts of money
• Tasks are usually difficult for computers
• Workers contribute as a way of earning supplemental income
• Useful source of labor for academics and companies
Crowdsourcing driven biocuration
• Goal: replicate work done by PhD biocurators with members of the crowd
• Advantages:
  • Scalability
  • Faster results at a lower cost
  • Well suited for non-automatable tasks where an expert is not necessary
Crowdsourcing relies on gold standards for validation
• Crowdsourcing methods need to be validated with gold standards
• Gold standard: EU-ADR corpus [1]
  • “Positive”: known relationship
  • “Speculative”: uncertain relationship
  • “Negative”: known lack of relationship
  • “False”: no claim of relationship
• Sentence-bound relationships
• 300 abstracts annotated with relationships between genes/diseases/drugs
[1] van Mulligen et al. (2012) J. Biomed. Inform. 45: 879
Platform interface for relation annotation
Crowd agreement with the EU-ADR
• Strict agreement with EU-ADR: 71.67% (43/60 sentences)
• Agreement after combining speculative and positive: 76.67%
• 10 judgements/sentence
• 10 cents/judgement
• Time to complete: 2 hours
• Total cost: $182.21 USD
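The agreement figures above can be reproduced in spirit with a short sketch. This is a minimal illustration, not the pipeline used in the talk: it assumes each sentence's crowd answer is the most common label among its judgements, and the helper names (`top_choice`, `agreement_rate`) and the toy data are hypothetical.

```python
from collections import Counter

def top_choice(judgements):
    """Return the most common label for one sentence and the
    fraction of workers who chose it."""
    label, votes = Counter(judgements).most_common(1)[0]
    return label, votes / len(judgements)

def agreement_rate(crowd_labels, gold_labels, merge=None):
    """Fraction of sentences where the crowd label matches the gold label.

    `merge` optionally maps labels onto a shared label before comparing,
    e.g. {"speculative": "positive"} to combine the two categories.
    """
    merge = merge or {}
    matches = sum(
        merge.get(c, c) == merge.get(g, g)
        for c, g in zip(crowd_labels, gold_labels)
    )
    return matches / len(gold_labels)

# Toy data, invented for illustration (not taken from the study):
# 10 judgements for each of 3 sentences.
judgements_per_sentence = [
    ["positive"] * 7 + ["speculative"] * 3,
    ["speculative"] * 6 + ["positive"] * 4,
    ["false"] * 9 + ["negative"] * 1,
]
gold = ["positive", "positive", "false"]

crowd = [top_choice(j)[0] for j in judgements_per_sentence]
strict = agreement_rate(crowd, gold)
relaxed = agreement_rate(crowd, gold, merge={"speculative": "positive"})
```

On the toy data the second sentence only counts as a match once speculative and positive are combined, which is the same effect as the move from 71.67% to 76.67% reported above. If the combined figure uses the same 60 sentences, 76.67% corresponds to 46 matches.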
Variability of gold standards
[Figure. Axes: number of experts who chose that relationship type; percent of raw EU-ADR relations]
Crowd agreement as a proxy for clarity
[Figure. Axis: percent of crowd which chose published EU-ADR answer]
Crowd agreement and accuracy probability
[Figure. Axes: percent crowd agreement for the top choice; percent of annotations which agreed with EU-ADR]
Abstract level relationship extraction
Preliminary results
• AUC of 0.904
• Max F-score of 0.791 (0.773 precision, 0.809 recall)
• Max F-score achieved at a voting score of 0.407
• 4.5 hours, $54.72 USD to annotate 30 abstracts
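As a check on the numbers above, the F-score is the harmonic mean of precision and recall. The sketch below also shows how a voting-score cutoff could turn crowd votes into yes/no predictions. It assumes the voting score is a per-relationship value between 0 and 1 and that scores at or above the cutoff count as positive; the slides do not define either, and the function names and toy data are hypothetical.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def predict(voting_scores, threshold):
    """Label a relationship as present when its voting score
    reaches the threshold (assumed convention: >=)."""
    return [score >= threshold for score in voting_scores]

def precision_recall(predicted, actual):
    """Compute precision and recall from parallel lists of booleans."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Reported values from the slide: precision 0.773, recall 0.809.
reported_f = f_score(0.773, 0.809)  # rounds to 0.791

# Toy data, invented for illustration (not taken from the study).
scores = [0.9, 0.5, 0.45, 0.1]
actual = [True, False, True, False]
predicted = predict(scores, threshold=0.407)
toy_precision, toy_recall = precision_recall(predicted, actual)
toy_f = f_score(toy_precision, toy_recall)
```

Sweeping the threshold over all observed scores and keeping the one with the highest F-score is one way a "max F-score at a voting score of 0.407" result could be obtained.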
Conclusion and next steps
• Gold standards are variable and imperfect
• Binary agreement may hide interesting information
• Expert and crowd agreement can be used to measure gold standard consistency
• Ambiguous portions of a gold standard may need to be treated differently during evaluations
• Integration with machine learning methods
  • Data generation
  • Feature extraction
• Semantically typed relationships
Acknowledgements
• Dr. Andrew Su
• Dr. Benjamin Good
• Dr. Laura Furlong
• Dr. Zhiyong Lu
• The Su Lab
• We’re hiring!
EU-ADR relationship examples
• Positive
  • For exposure levels within standard recommended guidelines, radioisotopes are far more likely to play a role in the occurrence of spontaneous abortions than X-rays.
• Speculative
  • Information from the SITE Cohort Study should clarify whether use of these immunosuppressive drugs for ocular inflammation increases the risk of mortality and fatal cancer.
• Negative
  • We found no evidence of impaired control of the carbohydrate and lipid metabolism or aggravation of vascular lesions during the two years an etonogestrel implant was used by diabetic women.
• False
  • The frequency of PONV did not correlate to the amounts of alfentanil, propofol, postoperative antiemetics consumed, or to female gender, non-smoking status, and history of PONV or motion sickness.
Data for all 244 drug-disease sentences
Crowd agreement and accuracy probability
[Figure. Axes: percent of annotations which agreed with EU-ADR; percent crowd agreement for the top choice]