Building a Web Application to Monitor PubMed Retraction Notices. Neil Saunders, CSIRO Mathematics, Informatics and Statistics, Building E6B, Macquarie University Campus, North Ryde. December 1, 2011.

Building A Web Application To Monitor PubMed Retraction Notices


DESCRIPTION

Monitoring PubMed retraction notices using Ruby, MongoDB, Sinatra and Heroku. Talk given to internal CSIRO Bioinformatics User Group, December 1 2011.


Page 1: Building A Web Application To Monitor PubMed Retraction Notices

Building a Web Application to Monitor PubMed Retraction Notices

Neil Saunders

CSIRO Mathematics, Informatics and Statistics

Building E6B, Macquarie University Campus

North Ryde

December 1, 2011

Page 2: Building A Web Application To Monitor PubMed Retraction Notices

Retraction Watch

Page 3: Building A Web Application To Monitor PubMed Retraction Notices

Project Aims

Monitor PubMed for retractions

Retrieve retraction data and store locally for analysis

Develop web application to display retraction data

Page 4: Building A Web Application To Monitor PubMed Retraction Notices

PubMed - advanced search, RSS and send-to-file

Page 5: Building A Web Application To Monitor PubMed Retraction Notices

Updates in Google Reader

Page 6: Building A Web Application To Monitor PubMed Retraction Notices

PubMed - MeSH

Page 7: Building A Web Application To Monitor PubMed Retraction Notices

PubMed - EUtils

http://www.ncbi.nlm.nih.gov/books/NBK25501/

Page 8: Building A Web Application To Monitor PubMed Retraction Notices

EInfo example script

#!/usr/bin/ruby
require 'rubygems'
require 'bio'
require 'hpricot'
require 'open-uri'

Bio::NCBI.default_email = "[email protected]"
ncbi = Bio::NCBI::REST.new
url = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi?db="
ncbi.einfo.each do |db|
  puts "Processing #{db}..."
  File.open("#{db}.txt", "w") do |f|
    doc = Hpricot(open("#{url + db}"))
    (doc/'//fieldlist/field').each do |field|
      name = (field/'/name').inner_html
      fullname = (field/'/fullname').inner_html
      description = (field/'description').inner_html
      f.write("#{name},#{fullname},#{description}\n")
    end
  end
end

Page 9: Building A Web Application To Monitor PubMed Retraction Notices

EInfo script - output

ALL,All Fields,All terms from all searchable fields
UID,UID,Unique number assigned to publication
FILT,Filter,Limits the records
TITL,Title,Words in title of publication
WORD,Text Word,Free text associated with publication
MESH,MeSH Terms,Medical Subject Headings assigned to publication
MAJR,MeSH Major Topic,MeSH terms of major importance to publication
AUTH,Author,Author(s) of publication
JOUR,Journal,Journal abbreviation of publication
AFFL,Affiliation,Author's institutional affiliation and address
...
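The comma-separated output above can be read back into a lookup table. The following is a minimal sketch, not part of the original talk: `parse_einfo_fields` is a hypothetical helper, and the sample lines are copied from the output shown above. The split is limited to three parts so that commas inside the description survive; a comma inside the full name would still break it.

```ruby
# Sample lines copied from the EInfo script output.
EINFO_SAMPLE = <<~TEXT
  ALL,All Fields,All terms from all searchable fields
  UID,UID,Unique number assigned to publication
  MESH,MeSH Terms,Medical Subject Headings assigned to publication
  AUTH,Author,Author(s) of publication
TEXT

# Hypothetical helper: returns { "ALL" => { fullname: ..., description: ... }, ... }
def parse_einfo_fields(text)
  text.each_line.with_object({}) do |line, fields|
    # Limit the split to 3 parts so commas in the description are kept.
    name, fullname, description = line.chomp.split(",", 3)
    next if name.nil? || name.empty?
    fields[name] = { fullname: fullname, description: description }
  end
end

fields = parse_einfo_fields(EINFO_SAMPLE)
puts fields["MESH"][:fullname]   # MeSH Terms
```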

Page 10: Building A Web Application To Monitor PubMed Retraction Notices

MongoDB Overview

MongoDB is a so-called “NoSQL” database

Key features:

Document-oriented

Schema-free

Documents stored in collections

http://www.mongodb.org/
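To make "schema-free documents stored in collections" concrete, here is a toy in-memory stand-in written in plain Ruby. It is an illustration only and does not use the MongoDB driver: `ToyCollection` is a hypothetical class, and its `save` mimics the insert-or-replace-by-`_id` behaviour that the scripts in this talk rely on.

```ruby
# In-memory stand-in for a collection; NOT the MongoDB driver.
class ToyCollection
  def initialize
    @docs = {}
  end

  # Insert the document, or replace an existing one with the same "_id".
  def save(doc)
    @docs[doc.fetch("_id")] = doc
  end

  def find_one(id)
    @docs[id]
  end

  def count
    @docs.size
  end
end

col = ToyCollection.new
# Documents in one collection need not share the same keys (schema-free).
col.save({ "_id" => 1977, "total" => 260517, "retracted" => 3 })
col.save({ "_id" => "22106469", "PubmedData" => { "PublicationStatus" => "ppublish" } })
# Saving again with an existing "_id" replaces the document rather than adding one.
col.save({ "_id" => 1977, "total" => 260517, "retracted" => 3, "year" => 1977 })
puts col.count   # 2
```

Because `save` replaces by `_id`, a script that uses the year or the PMID as `_id` can be rerun without creating duplicates.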

Page 11: Building A Web Application To Monitor PubMed Retraction Notices

Saving to a database collection: ecount

#!/usr/bin/ruby

require "rubygems"
require "bio"
require "mongo"

db = Mongo::Connection.new.db('pubmed')
col = db.collection('ecount')
Bio::NCBI.default_email = "[email protected]"
ncbi = Bio::NCBI::REST.new

1977.upto(Time.now.year) do |year|
  all = ncbi.esearch_count("#{year}[dp]", {"db" => "pubmed"})
  term = ncbi.esearch_count("Retraction of Publication[ptyp] #{year}[dp]",
                            {"db" => "pubmed"})
  record = {"_id" => year, "year" => year, "total" => all,
            "retracted" => term, "updated_at" => Time.now}
  col.save(record)
  puts "#{year}..."
end

puts "Saved #{col.count} records."
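The counts in this script come from the EUtils ESearch service. For readers without BioRuby, the sketch below builds an equivalent ESearch URL with the standard library only. It is an assumption about what such a request looks like, not a description of BioRuby's internals: `esearch_count_url` is a hypothetical helper, `rettype=count` asks ESearch to return only the hit count, and `https` is used in place of the `http` URL shown earlier.

```ruby
require "uri"

# Base path of the E-utilities, as in the EInfo script (https assumed here).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

# Hypothetical helper: URL for an ESearch request that returns only a count.
def esearch_count_url(term, db: "pubmed")
  query = URI.encode_www_form("db" => db, "term" => term, "rettype" => "count")
  "#{EUTILS_BASE}/esearch.fcgi?#{query}"
end

# Same search term as the script above, for a single year.
puts esearch_count_url("Retraction of Publication[ptyp] 2011[dp]")
```

Fetching the URL and reading the count from the returned XML is left out, so the sketch runs without network access.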

Page 12: Building A Web Application To Monitor PubMed Retraction Notices

ecount collection

> db.ecount.findOne()
{
  "_id" : 1977,
  "retracted" : 3,
  "updated_at" : ISODate("2011-11-15T03:58:10.729Z"),
  "total" : 260517,
  "year" : 1977
}
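Storing both the total and the retracted count per year allows retractions to be normalised by publication volume. The following sketch shows one possible calculation on a record shaped like the one above; `retractions_per_100k` is a hypothetical helper and the choice of "per 100,000 publications" is an assumption made for illustration.

```ruby
# Hypothetical helper: retraction notices per 100,000 PubMed records in a year.
def retractions_per_100k(record)
  total = record["total"]
  # Guard against missing or zero totals to avoid dividing by zero.
  return 0.0 if total.nil? || total.zero?
  record["retracted"].to_f / total * 100_000
end

# Values taken from the ecount document shown above.
record = { "_id" => 1977, "year" => 1977, "total" => 260517, "retracted" => 3 }
puts retractions_per_100k(record).round(2)   # 1.15
```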

Page 13: Building A Web Application To Monitor PubMed Retraction Notices

Saving to a database collection: entries

#!/usr/bin/ruby

require "rubygems"
require "mongo"
require "crack"

db = Mongo::Connection.new.db("pubmed")
col = db.collection('entries')
col.drop

xmlfile = "#{ENV['HOME']}/Dropbox/projects/pubmed/retractions/data/retract.xml"
xml = Crack::XML.parse(File.read(xmlfile))

xml['PubmedArticleSet']['PubmedArticle'].each do |article|
  article['_id'] = article['MedlineCitation']['PMID']
  col.save(article)
end

puts "Saved #{col.count} articles."

Page 14: Building A Web Application To Monitor PubMed Retraction Notices

entries collection

{
  "_id" : "22106469",
  "PubmedData" : {
    "PublicationStatus" : "ppublish",
    "ArticleIdList" : {
      "ArticleId" : "22106469"
    },
    "History" : {
      "PubMedPubDate" : [
        {
          "Minute" : "0",
          "Month" : "11",
          "PubStatus" : "entrez",
          "Day" : "23",
          "Hour" : "6",
          "Year" : "2011"
        },
        {
          "Minute" : "0",
          "Month" : "11",
          "PubStatus" : "pubmed",
          "Day" : "23",
          "Hour" : "6",
          "Year" : "2011"
        },
        ...
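Documents like this are deeply nested, so pulling out a single value means walking several levels of hashes. The sketch below uses a trimmed copy of the entry above and `Hash#dig` to find a history date by its `PubStatus`. `history_date` is a hypothetical helper. It also assumes that a lone child element may arrive as a Hash rather than an Array, as `ArticleId` does above, and normalises that case.

```ruby
require "date"

# Trimmed copy of the entries document shown above.
entry = {
  "_id" => "22106469",
  "PubmedData" => {
    "PublicationStatus" => "ppublish",
    "History" => {
      "PubMedPubDate" => [
        { "PubStatus" => "entrez", "Year" => "2011", "Month" => "11", "Day" => "23" },
        { "PubStatus" => "pubmed", "Year" => "2011", "Month" => "11", "Day" => "23" }
      ]
    }
  }
}

# Hypothetical helper: Date for the history entry with the given PubStatus, or nil.
def history_date(entry, status)
  dates = entry.dig("PubmedData", "History", "PubMedPubDate")
  # Assumption: a single child element may be a Hash instead of an Array.
  dates = [dates] if dates.is_a?(Hash)
  found = Array(dates).find { |d| d["PubStatus"] == status }
  found && Date.new(found["Year"].to_i, found["Month"].to_i, found["Day"].to_i)
end

puts history_date(entry, "pubmed")   # 2011-11-23
```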

Page 15: Building A Web Application To Monitor PubMed Retraction Notices

Saving to a database collection: timeline

#!/usr/bin/ruby

require "rubygems"
require "mongo"
require "date"

db = Mongo::Connection.new.db('pubmed')
entries = db.collection('entries')
timeline = db.collection('timeline')

dates = entries.find.map { |entry| entry['MedlineCitation']['DateCreated'] }
dates.map! { |d| Date.parse("#{d['Year']}-#{d['Month']}-#{d['Day']}") }
dates.sort!
data = (dates.first..dates.last).inject(Hash.new(0)) { |h, date| h[date] = 0; h }
dates.each { |date| data[date] += 1 }
data = data.sort
data.map! { |e| ["Date.UTC(#{e[0].year},#{e[0].month - 1},#{e[0].day})", e[1]] }

data.each do |date|
  timeline.save({"_id" => date[0].gsub(".", "_"), "date" => date[0], "count" => date[1]})
end

puts "Saved #{timeline.count} dates in timeline."

Page 16: Building A Web Application To Monitor PubMed Retraction Notices

timeline collection

> db.timeline.findOne()
{
  "_id" : "Date_UTC(1977,7,12)",
  "date" : "Date.UTC(1977,7,12)",
  "count" : 1
}
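The timeline step can be tried without a database. The sketch below reproduces the same idea with the standard library only: fill every day between the first and last date with zero, count the notices per day, then emit `Date.UTC(...)` strings. `daily_counts` and `highcharts_utc` are hypothetical names and the sample dates are made up. JavaScript months are zero-based, which is why `Date.UTC(1977,7,12)` in the document above means 12 August 1977.

```ruby
require "date"

# Hypothetical helper: JavaScript Date.UTC expression for a Ruby Date.
def highcharts_utc(day)
  # JavaScript months run from 0 to 11, hence month - 1.
  "Date.UTC(#{day.year},#{day.month - 1},#{day.day})"
end

# Hypothetical helper: [date expression, count] for every day in the range.
def daily_counts(dates)
  return [] if dates.empty?
  first, last = dates.minmax
  # Start every day at zero so that days without notices still appear.
  counts = (first..last).each_with_object({}) { |day, h| h[day] = 0 }
  dates.each { |day| counts[day] += 1 }
  counts.sort.map { |day, n| [highcharts_utc(day), n] }
end

# Made-up sample: one notice on 12 August 1977, two on 14 August 1977.
sample = [Date.new(1977, 8, 12), Date.new(1977, 8, 14), Date.new(1977, 8, 14)]
daily_counts(sample).each { |row| p row }
```

For the sample, the middle day (13 August) appears with a count of 0, which keeps the time axis continuous.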

Page 17: Building A Web Application To Monitor PubMed Retraction Notices

Sinatra: minimal example

require "rubygems"
require "sinatra"

get "/" do
  "Hello World"
end

# ruby myapp.rb
# http://localhost:4567

Page 18: Building A Web Application To Monitor PubMed Retraction Notices

Highcharts: minimal example code

var chart = new Highcharts.Chart({
  chart: {
    renderTo: 'container',
    defaultSeriesType: 'line'
  },
  xAxis: {
    categories: ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
                 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
  },
  series: [{
    data: [29.9, 71.5, 106.4, 129.2, 144.0, 176.0,
           135.6, 148.5, 216.4, 194.1, 95.6, 54.4]
  }]
});

// <div id="container" style="height: 400px"></div>

Page 19: Building A Web Application To Monitor PubMed Retraction Notices

Highcharts: minimal example result

Page 20: Building A Web Application To Monitor PubMed Retraction Notices

Web Application Overview

|---config.ru
|---Gemfile
|---main.rb
|---public
|   |---javascripts
|   |   |---dark-blue.js
|   |   |---dark-green.js
|   |   |---exporting.js
|   |   |---gray.js
|   |   |---grid.js
|   |   |---highcharts.js
|   |   |---jquery-1.4.2.min.js
|   |---stylesheets
|       |---main.css
|---Rakefile
|---statistics.rb
|---views
    |---about.haml
    |---byyear.haml
    |---date.haml
    |---error.haml
    |---index.haml
    |---journal.haml
    |---journals.haml
    |---layout.haml
    |---test.haml
    |---total.haml

Page 21: Building A Web Application To Monitor PubMed Retraction Notices

Sinatra Application Code - main.rb

# main.rb
configure do
  # a bunch of config stuff goes here
  # DB = connection to MongoDB database
  # timeline
  timeline = DB.collection('timeline')
  set :data, timeline.find.to_a.map { |e| [e['date'], e['count']] }
end

# views
get "/" do
  haml :index
end

Page 22: Building A Web Application To Monitor PubMed Retraction Notices

Sinatra Views - index.haml

%h3 PubMed Retraction Notices - Timeline
%p Last update: #{options.updated_at}

%div#container(style="margin-left: auto; margin-right: auto; width: 800px;")

:javascript
  $(function () {
    new Highcharts.Chart({
      chart: {
        renderTo: 'container',
        defaultSeriesType: 'area',
        width: 800,
        height: 600,
        zoomType: 'x',
        marginTop: 80
      },
      legend: { enabled: false },
      title: { text: 'Retractions by date' },
      xAxis: { type: 'datetime' },
      yAxis: { title: { text: 'Retractions' } },
      series: [{
        data: #{options.data.inspect.gsub(/"/,"")}
      }],
      // more stuff goes here...
    });
  });
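The `inspect.gsub` expression in the view is what turns Ruby data into a JavaScript array literal. The sketch below shows the effect on two made-up rows shaped like the timeline documents. `to_highcharts_series` is a hypothetical alternative that builds the same string explicitly, since removing every double quote is only safe when the data contains nothing but these date expressions and integers.

```ruby
# Rows shaped like the timeline collection: [date expression, count].
data = [["Date.UTC(1977,7,12)", 1], ["Date.UTC(1977,7,13)", 0]]

# Approach used in the view: inspect the array, then strip the quotes so that
# Date.UTC(...) is evaluated by the browser instead of being treated as a string.
js_literal = data.inspect.gsub(/"/, "")
puts js_literal   # [[Date.UTC(1977,7,12), 1], [Date.UTC(1977,7,13), 0]]

# Hypothetical alternative: build the literal explicitly, coercing counts to integers.
def to_highcharts_series(rows)
  "[" + rows.map { |date_expr, count| "[#{date_expr}, #{Integer(count)}]" }.join(", ") + "]"
end

puts to_highcharts_series(data) == js_literal   # true
```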

Page 23: Building A Web Application To Monitor PubMed Retraction Notices

Deployment: Heroku + MongoHQ

Heroku.com - free application hosting (for small apps)

Almost as simple as:

$ git remote add heroku [email protected]:appname.git

$ git push heroku master

MongoHQ.com - free MongoDB database hosting (up to 16 MB)

Page 24: Building A Web Application To Monitor PubMed Retraction Notices

“Final” product

Application - http://pmretract.heroku.com

Code - http://github.com/neilfws/PubMed