[Figure: consistency with the NCBI gold standard (Development Corpus), scale 0–70, for four annotation sources: mturk experiment 1 (minimum 3 votes per annotation); mturk experiment 2 (minimum 3 votes per annotation); NCBO annotator (Human Disease Ontology); conditional random field trained on the AZ corpus (only "all" reported).]
Mark2Cure: a crowdsourcing platform for biomedical literature annotation
Benjamin M Good, Max Nanis, Andrew I Su
The Scripps Research Institute, La Jolla, California, USA

ABSTRACT
Identifying concepts and relationships in biomedical text enables knowledge to be applied in computational analyses, such as gene set enrichment evaluations, that would otherwise be impossible. As such, there is a long and fruitful history of BioNLP projects that apply natural language processing to address this challenge. However, the state of the art in BioNLP still leaves much room for improvement in precision, recall and the complexity of knowledge structures that can be extracted automatically. Expert curators remain vital to the process of knowledge extraction but are in short supply.
Recent studies have shown that workers on microtasking platforms such as Amazon’s Mechanical Turk (AMT) can, in aggregate, generate high-quality annotations of biomedical text. In addition, several recent volunteer-based citizen science projects have demonstrated the public’s strong desire and ability to participate in the scientific process even without any financial incentives. Based on these observations, the Mark2Cure initiative is developing a Web interface for engaging large groups of people in the process of manual literature annotation. The system will support both microtask workers and volunteers. These workers will be directed by scientific leaders from the community to help accomplish ‘quests’ associated with specific knowledge extraction problems. In particular, we are working with patient advocacy groups such as the Chordoma Foundation to identify motivated volunteers and to develop focused knowledge extraction challenges. We are currently evaluating the first prototype of the annotation interface using the AMT platform.

FUNDING
We acknowledge support from the National Institute of General Medical Sciences (GM089820 and GM083924).

CONTACT
Benjamin Good: [email protected]; Andrew Su: [email protected]

REFERENCES
1. Zhai, Haijun, et al. "Web 2.0-based crowdsourcing for high-quality gold standard development in clinical natural language processing." Journal of Medical Internet Research 15.4 (2013).
2. Doğan, Rezarta Islamaj, and Zhiyong Lu. "An improved corpus of disease mentions in PubMed citations." Proceedings of the 2012 Workshop on Biomedical Natural Language Processing. Association for Computational Linguistics, 2012.
Challenge
Next Steps
RESULTS: Comparison to concept recognition tools
Proof of Concept: Experiment with AMT (work in progress)
Consistency(A, B) = 2 × 100 × (N shared annotations) / (N(A) + N(B))
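The score is a Dice-style overlap scaled to 0–100. A minimal sketch in Python, assuming each annotation is represented as a (start, end) character-offset span (the representation and example data are hypothetical):

```python
def consistency(a, b):
    """Consistency(A, B) = 2 * 100 * |A ∩ B| / (|A| + |B|),
    a Dice-style overlap between two annotation sets, scaled to 0-100."""
    a, b = set(a), set(b)
    if not a and not b:
        return 100.0  # two empty annotation sets agree trivially
    return 2 * 100 * len(a & b) / (len(a) + len(b))

# Hypothetical disease-mention spans as (start, end) offsets in one abstract.
gold = {(10, 28), (55, 62), (90, 104)}
crowd = {(10, 28), (55, 62), (120, 131)}
print(consistency(gold, crowd))  # 2 shared mentions out of 3 + 3 -> ~66.7
```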
Can non-experts annotate disease occurrences in text better than machines?
To what degree can we reproduce the NCBI disease corpus [2]?
Objectives for Annotators
• Highlight all diseases and disease abbreviations: “...are associated with Huntington disease ( HD )... HD patients received...”; “The Wiskott-Aldrich syndrome ( WAS ) ...”
• Highlight the longest span of text specific to a disease: “...contains the insulin-dependent diabetes mellitus locus...” and not just ‘diabetes’; “...was initially detected in four of 33 colorectal cancer families...”
• Highlight disease conjunctions as single, long spans: “...the life expectancy of Duchenne and Becker muscular dystrophy patients...”; “...a significant fraction of familial breast and ovarian cancer , but undergoes...”
• Highlight symptoms, the physical results of having a disease: “XFE progeroid syndrome can cause dwarfism, cachexia, and microcephaly. Patients often display learning disabilities, hearing loss, and visual impairment.”
• Highlight all occurrences of disease terms: “Women who carry a mutation in the BRCA1 gene have an 80 % risk of breast cancer by the age of 70. Individuals who have rare alleles of the VNTR also have an increased risk of breast cancer ( 2-4 )”.
• 6900 disease mentions in 793 PubMed abstracts
• developed by a team of 12 annotators
• covers all sentences in a PubMed abstract
• disease mentions are categorized into Specific Disease, Disease Class, Composite Mention and Modifier categories
Goal: structure all knowledge published as text on the same day it appears in PubMed, with expert-human-level precision and recall
[Figure: number of articles added to PubMed (y-axis 0–1,000,000).]
Approach: Citizen Science
Idea: People are very effective processors of text, even in areas where they aren’t experts [1]. Numerous experiments have shown the public’s desire to contribute to science. Let’s give them an opportunity to help annotate the biomedical literature.
Use the AMT to test the concept before attempting to motivate a citizen science movement
Testing on the 100-abstract “development set”: 5 workers per abstract, $0.06 per completed abstract
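As a quick sanity check of the budget, assuming each of the 5 workers is paid $0.06 for every abstract they complete:

```python
# Cost of one AMT run: 100 abstracts, 5 workers per abstract,
# $0.06 paid per completed abstract.
n_abstracts = 100
workers_per_abstract = 5
pay_per_completion = 0.06
total_cost = n_abstracts * workers_per_abstract * pay_per_completion
print(f"${total_cost:.2f}")  # $30.00 per experiment
```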
RESULTS: two experiments
AMT workers performed better than a conditional random field trained on the AZ corpus.
Examples
• Continued refinement of the annotation interface with AMT
• Experiment to compare AMT results versus volunteers
• Collaborations with disease groups such as the Chordoma Foundation to prime the flow of citizen scientist annotators
We are hiring! Postdocs and programmers interested in crowdsourcing and bioinformatics: contact [email protected]
[Figure: precision, recall, and F-score (0–1) as a function of the number of votes per annotation (1–5).]
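The vote threshold trades precision against recall: keeping only spans that at least k of the workers marked tends to raise precision and lower recall. A minimal sketch of this aggregation and scoring (the span representation and example data are hypothetical):

```python
from collections import Counter

def score_at_threshold(worker_spans, gold, k):
    """Keep spans annotated by at least k workers, then compute
    precision, recall, and F1 against the gold-standard spans.
    worker_spans: one set of (start, end) spans per worker."""
    votes = Counter(s for spans in worker_spans for s in set(spans))
    kept = {s for s, v in votes.items() if v >= k}
    tp = len(kept & gold)
    precision = tp / len(kept) if kept else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical spans from 5 workers on one abstract.
workers = [{(0, 5), (10, 20)}, {(0, 5)}, {(0, 5), (30, 40)},
           {(10, 20), (0, 5)}, {(0, 5), (10, 20)}]
gold = {(0, 5), (10, 20)}
print(score_at_threshold(workers, gold, 3))  # (1.0, 1.0, 1.0) on this toy data
print(score_at_threshold(workers, gold, 5))  # recall drops as k gets strict
```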
Experiment 1
Costs
• one week each ($30)
• one month of Turk-specific developer time...
Exp. 2 changes
• Worker instructions: expanded with more examples
• Minor interface changes (selecting one term automatically selects all other occurrences)
Nearly identical results
Exp. 1 results