Validation of chemical data on Wikipedia. Martin A. Walker, Dept. of Chemistry, SUNY Potsdam, Member of the Wikipedia Chemistry Project. Overview: Introduction; Raising general quality in Wikipedia; Validating chemical data in Wikipedia; Recent developments in Wikipedia Chemistry; The future?
Validation of chemical data on Wikipedia
Martin A. Walker
Dept. of Chemistry, SUNY Potsdam
Member of the Wikipedia Chemistry Project
Overview
• Introduction
• Raising general quality in Wikipedia
• Validating chemical data in Wikipedia
• Recent developments in Wikipedia Chemistry
• The future?
• Questions?
INTRODUCTION
What is Wikipedia – and what is it not?
Wikipedia is…
• An encyclopedia
• A useful resource for chemistry
• Written by volunteers
• Editable by anyone
• Free to be copied, re-used
• Free as in “no cost”
Wikipedia is not…
• A database
• A place to publish original research
• An authoritative resource for chemistry
• Written mainly by kids, or by paid professionals
• Free to re-use without attribution
• Run by a corporation
Types of chemistry article
WIKIPROJECT CHEMISTRY
• Chemical concepts
• Chemical reactions & processes
• Chemists
WIKIPROJECT ELEMENTS
• Chemical elements
WIKIPROJECT CHEMICALS
• Chemical substances
WIKIPROJECT PHARMACOLOGY
• Pharmaceuticals
WIKIPROJECT CELL & MOLECULAR BIOLOGY
• Molecular biology
WikiProject Chemistry
General chemistry content
Reactions & processes, concepts, chemists’ biographies, etc.
WikiProject Chemicals
• ~60 members (~20 active)
• Collaborates on writing quality articles and standards for:
– developing data boxes for articles
– chemical naming, structure drawing
– article assessment
• Data validation
• Collaboration with CAS
Wim Van Dorst, a Dutch member of WP:Chem since March 2005.
Most articles have a Chembox
Chembox is designed to be machine readable and “database friendly”
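To illustrate what “machine readable” can mean here, the following is a minimal sketch of pulling field values out of infobox-style wikitext. The sample text is a simplified, made-up layout (real Chemboxes nest their fields inside sub-templates), and the field names `CASNo`, `Formula` and `BoilingPtC` are used only as illustrative assumptions. The function `parse_infobox_fields` is hypothetical and is not part of any Wikipedia tool.

```python
import re

# Illustrative wikitext only: a flattened, simplified layout that is assumed
# for this sketch and is not copied from any live Wikipedia article.
SAMPLE_WIKITEXT = """{{Chembox
| Name = Water
| CASNo = 7732-18-5
| Formula = H2O
| BoilingPtC = 100
}}"""


def parse_infobox_fields(wikitext: str) -> dict[str, str]:
    """Collect single-line '| key = value' entries into a dictionary.

    Simplification: nested templates and multi-line values are not handled.
    """
    fields = {}
    for line in wikitext.splitlines():
        match = re.match(r"^\|\s*([^=|]+?)\s*=\s*(.*?)\s*$", line)
        if match:
            fields[match.group(1)] = match.group(2)
    return fields


fields = parse_infobox_fields(SAMPLE_WIKITEXT)
print(fields["CASNo"])
```

Because each value sits under a named key, a script can read or compare individual data fields without having to interpret the article prose.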
WikiProject Pharmacology
Most articles have a Drugbox
Traffic can be very high….
Even for specialized topics
RAISING GENERAL QUALITY IN WIKIPEDIA
WMF: Long term strategy
Expand the “virtuous circle”
Diagram by User:Randomran – Creative Commons license
Article assessment – by editors
Assessment guides article improvement priorities
Article ratings – by users
Pending changes (flagged revisions)
“Articles under PC protection are open for editing, but changes will be visible to readers who are not logged in only after being checked for obvious vandalism and clear errors.”
WikiTrust
• Downloadable as an extension to Firefox, this adds a tab above the article:
VALIDATION OF WIKIPEDIA CHEMICAL DATA
How I use the key terms
Validation => “How can I be sure the data are correct?”
Curation = fixing errors
Content validation
• In 2008 a data validation drive was initiated for basic chemical identifiers
• Led to a collaboration with CAS, to ensure Wikipedia CAS registry nos. are correct
• Now around 3500 substances have been validated against CAS Common Chemistry, as having correct name, structure & CAS RN
• Other fields now being validated
• Validated content indicated with a check mark
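As a side note on CAS registry numbers: every CAS RN ends in a check digit, so a quick self-consistency test is possible before any comparison against a reference source. The sketch below shows that standard check-digit calculation. It is only a format check, not the validation described above, which compared entries against CAS Common Chemistry; the function name `cas_check_digit_ok` is an assumption made for this example.

```python
import re


def cas_check_digit_ok(cas_rn: str) -> bool:
    """Return True if the string has CAS RN layout and a consistent check digit."""
    match = re.fullmatch(r"(\d{2,7})-(\d{2})-(\d)", cas_rn)
    if match is None:
        return False
    body = match.group(1) + match.group(2)
    # Weight the body digits 1, 2, 3, ... starting from the rightmost digit,
    # then compare the weighted sum modulo 10 with the final check digit.
    total = sum(int(d) * w for w, d in enumerate(reversed(body), start=1))
    return total % 10 == int(match.group(3))


print(cas_check_digit_ok("7732-18-5"))  # water
```

A number that passes this test can still be assigned to the wrong substance, which is why checking against an authoritative source remains necessary.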
CommonChemistry
• Launched in April 2009
• Came about as a result of a collaboration between CAS & Wikipedia
• Offered as a free service for CAS RNs for members of the public.
Organized by WP:Chemicals
• Moderate participation from members of WP:Pharmacology
The approach to validation
• Every old version of every article (each identified by a RevID) is preserved for posterity, and can potentially serve as a permanent record of a validated version.
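A RevID can be turned into a permanent link using the standard MediaWiki `index.php?title=…&oldid=…` URL form. The sketch below builds such a link; the helper name `permalink` and the RevID value are placeholders chosen for illustration, not a real validated revision.

```python
from urllib.parse import urlencode


def permalink(title: str, rev_id: int, site: str = "https://en.wikipedia.org") -> str:
    """Build a link to one fixed revision of a wiki page."""
    # MediaWiki page titles use underscores in place of spaces.
    query = urlencode({"title": title.replace(" ", "_"), "oldid": rev_id})
    return f"{site}/w/index.php?{query}"


# 123456789 is a placeholder RevID, used here only to show the URL shape.
print(permalink("Acetic acid", 123456789))
```

Storing the RevID alongside a validated value means the exact text that was checked can always be retrieved, whatever later edits are made.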
Protecting validated fields
PROBLEM: This is “the encyclopedia anyone can edit” – so anyone can change the BP of water to 200 °C.
SOLUTION: A bot patrols the pages, and watches for edits to key fields. Any dubious edits are flagged with a red X (next to the data), and logged. System developed by Dirk Beetstra (Eindhoven University of Technology). It is the only such tool on Wikipedia.
Validation protected by bot
• If anyone tries to vandalize a validated field, this will be flagged by a bot soon afterwards.
– This example received a red X 11 minutes after it was vandalized.
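The core idea of the patrol, comparing key fields in the current revision with those in the validated revision, can be sketched as follows. This is an illustration under stated assumptions and not the bot's actual code: the watched field names, the `Flag` record and the function `flag_changed_fields` are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical choice of key fields to watch; the real bot's list is not shown here.
WATCHED_FIELDS = ("CASNo", "BoilingPtC")


@dataclass
class Flag:
    field: str
    validated: str
    current: str


def flag_changed_fields(validated: dict[str, str], current: dict[str, str]) -> list[Flag]:
    """Return one Flag per watched field whose value differs from the validated one."""
    flags = []
    for name in WATCHED_FIELDS:
        old = validated.get(name, "")
        new = current.get(name, "")
        if old.strip() != new.strip():
            flags.append(Flag(name, old, new))
    return flags


validated = {"CASNo": "7732-18-5", "BoilingPtC": "100"}
vandalized = {"CASNo": "7732-18-5", "BoilingPtC": "200"}  # the "BP of water" example
for flag in flag_changed_fields(validated, vandalized):
    print(f"X {flag.field}: {flag.validated!r} -> {flag.current!r}")
```

A flagged change is not necessarily vandalism; it marks the value as no longer matching the validated record so that a human can review it.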
Validated revisionIDs
Checking structures
• In 2008–2010, around 3000 chemical structures were informally checked against CAS Common Chemistry
• PROBLEM: Structures are loaded from an external file on Wikimedia Commons, which can be “invisibly” changed
Since fall 2010
Now the bot has been modified to watch changes to the RevID of the Wikimedia Commons structure image
A few hundred images now validated
Drugboxes
Drugboxes are patrolled by the bot, but at present WP:PHARM is not active in formal validation. Most of the work is done by Dirk Beetstra, using official lists from data sources (e.g., ChEBI).
THE FUTURE?
Validation of melting points
• Physical properties are much harder – require human validation
• Collaboration beginning with JC Bradley (Drexel) & A Lang (Oral Roberts) on MPs.
Supplementary data pages
Supplementary data pages can host MP validation sources
These pages have room to list all sources with linked refs – providing a “paper trail” to original sources
Other future developments
• New formats for content – books, for cellphones (Kiwix, Wikipock, Okawix)
• Offline versions that use quality checks and vandalism checks
– for use in schools, developing countries, etc.
• More validated data fields, with “paper trails” and real-time checks
• Mashups with other sites
• Integration with lab instrumentation, lab notebooks, etc?
Acknowledgements
• Antony Williams (RSC ChemSpider)
• Dirk Beetstra (Tech Univ Eindhoven)
• User:Physchim62 and many other Wikipedians
• JC Bradley and Andrew Lang
ANY QUESTIONS?
Thank you for your attention