Who is INSPIRE?
Where does INSPIRE come from?
How does HEP communicate?
What do scientists want?
What is Invenio?
Where does INSPIRE go?
How do we go there together?
CERN: European Organization for Nuclear Research (since 1954)
• World-leading HEP laboratory, Geneva (CH)
• 2500 staff (mostly engineers, administrators/services)
• 9000 users (physicists from 580 institutes in 85 countries)
• 3 Nobel prizes (Accelerators, Detectors, Discoveries)
• Invented the web
• Ready to re-start the 27-km (6bn€) LHC accelerator, “the big-bang machine”
• Top management committed to Open Access
• Runs a 1-million-object Digital Library
CERN Convention (1953): ante-litteram Open Access manifesto
“… the results of its experimental and theoretical work shall be published or otherwise made generally available”
INSPIRE team @ CERN
• Being recruited (IT) – 100% (API, grid-ification)
• Jukka Klem (OA) – 80% (Applications)
• Jean-Yves le Meur (IT) – Infra supervision
• Tibor Šimko (IT) – Tech supervision
• Tim Smith (IT) – Infra strategy & MGA
• Salvatore Mele (OA) – Apps strategy & TB
• TBC: Junior developer (OA/IT) – (Interface applications/API)
Who is INSPIRE?
Fermilab, CERN, DESY, SLAC
Who are our buddies?
arXiv, ADS, PDG, Durham, KEK
Which publishers do we talk to?
APS, SISSA, Elsevier, Springer, World Scientific
The HEP “preprint culture”
L. Goldschmidt-Clermont, 1965, http://eprints.rclis.org/archive/00000445/02/communication_patterns.pdf
• Scientific journals of the ’60s too slow for HEP
• Mass-mail preprints to institutes worldwide
• Ante litteram (institute-pays) Open Access
• CERN Library starts to index and display preprints
• Leading research libraries “serve” preprints
CERN Library, circa 1960
Before e-mail and RSS...
L. Addis, 2002, http://www.slac.stanford.edu/spires/papers/history.html
• SLAC Library (Stanford) maintains preprint lists
• Sending lists to subscribers worldwide as of ’62
• Scientists then request preprints of interest
• Published articles go on anti-preprint list
• Indispensable working tool from ’60s to ’80s
SPIRES: first electronic catalogue
http://www.slac.stanford.edu/spires/papers/history.html
http://www-conf.slac.stanford.edu/interlab99/program/kunz/EarlyWeb.frame.pdf
• SLAC Library, 1974: now 750’000 records
• With Fermilab (US) and DESY (DE) Libraries
• Electronic catalogue of preprint metadata
• Updated with publication reference
• First terminal login, then e-mail interface
• Then the first web server in the U.S.
Date: Fri, 13 Dec 91 17:55:53 GMT+0100
From: [email protected] (Tim Berners-Lee)
Subject: WWW to SPIRES on SLACVM - Experimental
To: [email protected], [email protected]
There is an experimental W3 server for the SPIRES High energy Physics preprint database, thanks to Terry Hung, Paul Kunz and Louise Addis of SLAC. It's only just been put up, so don't expect perfection. With the w3 line mode browser, follow a link to it from our home page,
- Tim
Paul Kunz wrote a few days ago:-
"The SLAC Library maintainer of SPIRES databases, Louise Addis, is absolutely delighted. She will ask for a permanent VM service machine and finish off the polishing. Things are really moving now."
arXiv.org: the archetypal repository
• P. Ginsparg, LANL, 1991. Now Cornell Library
• E-mail based, then immediately on the web
• No mandate, no debate, author-driven
• 1/2 million preprints. Growing beyond HEP
http://vmsstreamer1.fnal.gov/VMS_Site_03/Lectures/Colloquium/presentations/090506Ginsparg.pdf
Where do HEP scientists go for info?
• Survey of 2’000+ scientists (10% of the community)
• Library/community answers to info needs
• Google as proxy of arXiv, SPIRES, publishers
Gentil-Beccot et al. arxiv:0804.2701
What more do users want?
Gentil-Beccot et al. arxiv:0804.2701
[Chart: importance ratings, from “Not important” to “Very important”, for depth of coverage, quality of content, and access to full text]
Where do users see the systems go?
Gentil-Beccot et al. arxiv:0804.2701
• Seamless Open Access to pre-’90s articles
• “Greyer” literature (laboratory reports)
• Conference slides (linked with articles)
• “Publication” of “ancillary” material:
  – Data behind tables, figures
  – Re-usable experimental data
• Some sort of peer-review overlaid on arXiv
• “Smarter” search tools
What would users give?
Gentil-Beccot et al. arxiv:0804.2701
• Would users contribute by tagging articles?
• Indexing and keywording in a Web2.0 world!
• Immense potential to be harnessed
[Chart: fraction of answers versus seniority in the field, for “Would contribute 30 minutes/week or more” and “Would not contribute”]
Building INSPIRE
http://www.projecthepinspire.net/
• Joint project of CERN, DESY, FERMILAB, SLAC
• Switch off aging SPIRES infrastructure
• Import 750’000+ records into an Invenio instance
• Inherit 50’000+ users (60+ million searches/year)
• Roll out 1Q10 (working on back-office tools)
• Out of the box: totally new back-office
• Bi-directional feeds with arXiv and publishers
Releasing INSPIRE
http://www.projecthepinspire.net/
Medium-term add-ons to INSPIRE (2Q10-4Q10)
• Full-text searching warehouse, Open Access & Copyrighted
• Author disambiguation (algorithm & web2.0)
• Personal shelves, with annotations. Alerts
• Drop-box for old preprints, theses … (advocacy campaign)
• Widespread “drop”, describe and search non-text material
• User-generated tags (taxonomic & à la Flickr)
• Thesaurus-based semantics, then folksonomy & ontology
Use computational power of e-Infrastructure to grow repository services
1. Back-office infrastructural services
2. Back-office content-analysis services
3. Novel front-line services
1. Back-office infrastructural services
I. Parallelization of full-text indexing
II. OCR’ing old holdings/new scanned submissions
III. “Gorilla” classification of content
IV. Text-mining for metadata and citation extraction
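
To make the first item concrete, here is a minimal sketch of parallel full-text indexing as a map/merge job. It is an illustration under stated assumptions, not the INSPIRE or Invenio implementation: the function names (`index_chunk`, `merge_indexes`, `build_index`), the record format, and the tokenizer are all hypothetical, and a thread pool stands in for the grid workers so the example runs anywhere.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
import re


def index_chunk(records):
    """Map step: build a partial inverted index for one chunk of (record_id, text) pairs."""
    partial = defaultdict(set)
    for record_id, text in records:
        # Hypothetical tokenizer: lowercase alphanumeric runs.
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            partial[term].add(record_id)
    return partial


def merge_indexes(partials):
    """Merge step: union the posting sets of all partial indexes."""
    merged = defaultdict(set)
    for partial in partials:
        for term, ids in partial.items():
            merged[term] |= ids
    return merged


def build_index(records, n_chunks=4):
    """Split the records into chunks, index the chunks concurrently, then merge."""
    records = list(records)
    chunks = [records[i::n_chunks] for i in range(n_chunks)]
    # Assumption: a local thread pool replaces the grid; each chunk is independent,
    # so the same map step could be shipped to separate worker nodes instead.
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        partials = list(pool.map(index_chunk, chunks))
    return merge_indexes(partials)


# Made-up sample records, for illustration only.
records = [
    (1, "Higgs boson searches at the LHC"),
    (2, "Lattice QCD and the Higgs mass"),
    (3, "Neutrino oscillations"),
]
index = build_index(records)
print(sorted(index["higgs"]))
```

Because the chunks share no state, the map step is the part that would benefit from more workers; the merge step stays sequential in this sketch.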
2. Back-office content-analysis services
Clustering of “similar” records for
I. Discovery (if you want this you might want that)
II. Ranking (first result is what you want)
Nightly re-clustering of holdings, including daily updates:
1. User-generated tags
2. New additions with their metadata/citations/logs
Use citations, author network, tags, logs
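
As a rough illustration of the "if you want this you might want that" idea, the sketch below scores record similarity by Jaccard overlap of each record's citations and tags. This is only one possible similarity measure that a clustering job could be built on, not a clustering algorithm and not the method planned for INSPIRE; the record identifiers, feature strings, and function names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two feature collections: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_records(features, record_id, top_n=3):
    """Return up to top_n other records that share features with record_id, best first."""
    target = features[record_id]
    scored = [
        (jaccard(target, feats), other)
        for other, feats in features.items()
        if other != record_id
    ]
    # Drop records with no overlap, then sort by descending score (ties by id).
    scored = [(score, other) for score, other in scored if score > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in scored[:top_n]]


# Made-up feature sets mixing citations and user tags, for illustration only.
features = {
    "rec-A": {"cites:X", "cites:Y", "tag:higgs"},
    "rec-B": {"cites:X", "cites:Y", "tag:susy"},
    "rec-C": {"cites:X", "tag:neutrino"},
    "rec-D": {"tag:cosmology"},
}
print(similar_records(features, "rec-A"))  # prints ['rec-B', 'rec-C']
```

Author-network links and search logs could be added as further feature strings in the same sets; a nightly job would recompute the scores after each day's additions.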
3. Novel front-line services
Reqs: Impossible without a Grid, but latency tolerant
“Find me a mentor”
User uploads A4-size research synopsis
INSPIRE identifies appropriate mentor (or referee)
Depends on success of parallel semantic project
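
A toy version of the matching step could look like the sketch below, which ranks candidate mentors by cosine similarity between word counts of the synopsis and of each candidate's paper text. It assumes plain word overlap as a stand-in for the semantic analysis the slide refers to; the author names, texts, and function names are made up for illustration.

```python
from collections import Counter
import math
import re


def term_vector(text):
    """Bag-of-words vector: word -> count, lowercase letters only."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def suggest_mentor(synopsis, author_texts):
    """Return the author whose collected text is closest to the synopsis."""
    query = term_vector(synopsis)
    return max(author_texts, key=lambda a: cosine(query, term_vector(author_texts[a])))


# Hypothetical authors and abstracts, for illustration only.
author_texts = {
    "Author One": "lattice qcd simulations of hadron masses",
    "Author Two": "dark matter direct detection with liquid xenon",
}
print(suggest_mentor("a proposal on dark matter detection", author_texts))  # prints Author Two
```

Scoring one synopsis against every author's full corpus is the kind of batch workload that tolerates latency, which fits the requirement stated on the slide.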