LinkSphere: P2P Cross Database Search -- Architecture and Issues
Hugo Mills, University of Reading
LinkSphere
• Linking Researchers and their Data
• Social networking for researchers
• Cross-database search
– Mostly Arts and Humanities datasets
– “Promoting serendipity”
– Access by and presentation of datasets to wider audiences
Datasets
• Museums
• Archives
• Archaeology: Silchester Excavation, IADB
• Ure Museum of Classical Archaeology
• CentAUR: ePrints Library
• Beckett Collection
• Cole Museum of Zoology
• Film Collection
• Herbarium
• Typography Collections
Tycho
• Fully asynchronous peer-to-peer communications framework
• Written in Java
• Fully distributed
• Robust
– “A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable.” (Leslie Lamport)
• Has a simple distributed data store (“Virtual Registry”) for client metadata
Tycho
• (Relatively) lightweight
– 3 MiB for a fully functional system
• Fast
• Flexible, Extensible
– Bootstrap handlers
– Additional message types
– VR extensions
– Alternative communication protocols
– Discovery of core mediators via Bonjour/ZeroConf
XDB System Architecture
[Architecture diagram. Labels: VR (×4), Repo (×4), Meta (×4), Tycho Core, repository access via JDBC / Web API / SPARQL / ..., Search App (×2), REST search API]
User Interface
• Main UI is web-based
– Uses AJAX
– Currently embedded within the LinkSphere project site
– Will ultimately move to the SNS
• Any UI possible using the REST API
Issues
• Getting the data is hard
– Implementation problems
– Maintenance problems
– Admin problems
– Social problems
– Legal problems
“Muddling along”
• Archive of material for intra-departmental use only
– Some legal issues involved
• Group of technicians administering the data
– Poor quality data
• Excel spreadsheet(!)
• Reluctant to have index of material made public
“Not ready yet”
• Big university projects
• New systems, (potentially) large data sets
• MERL museums archive (AdLib)
– Data all loaded from previous systems
– Access modules not yet installed
• CentAUR publications archive (ePrints 3)
– Very little data available yet
“Works For Me”
• Custom web application
– PHP, sophisticated
• External developer
• No documentation
• MySQL underneath
“It works, but...” (part 1)
• Non-technical users
• Admins are Mac-only, desktop-only people
• FileMaker Pro
• DB structure and UI developed externally
– No documentation
– This has bad implications
“It works, but...” (part 2)
• Completely custom application
– External developer
– No documentation (again)
– Large lump of write-only Perl
• Custom data store
– Not SQL. Not XML. Not RDF.
• No external access
Unreachable data
• Uncommunicative systems
• Custom applications
– Developers/administrators AWOL
• Custom data models
• Lost passwords
• Excel spreadsheets
– See also, “Uncommunicative”
Unreachable data
• Private data
– Legal issues
– Possessive owners
• Internal use only
• Poor quality
• No data!
Conclusions
• Building the software is easy
• There is still lots of hard-to-reach data out there
• Issues are largely not technical
• More outreach to A&H areas needed
Acknowledgements and thanks
• LinkSphere team: Mark Baker, Shirley Williams, Pat Parslow (Reading), Claire Warwick, Melissa Terras, Claire Ross (UCL)
• Repository owners at Reading: Amy Smith (Ure Museum), Guy Baxter (University Archivist), Mary Dyson, Hadj Messelles (Typography), Jonathan Bignell (Film Studies), Alison Sutton (CentAUR), Mike Fulford, Amanda Clarke (Silchester)
• JISC VRE 3 programme
Tycho Architecture
[Network diagram. Node labels: VR (×4), M (×4), C (×8)]
REST Interface
• /api/query
– POST to start new query asynchronously
• /api/query/query_id
– GET for query metadata
– DELETE to cancel query (or it will time-out naturally)
• /api/query/query_id/start/finish
– GET a range of results from the query
• Feedback API coming soon
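The query lifecycle above (start, inspect, page through results, cancel) can be sketched as a set of request builders. This is a minimal illustration only: the base URL, the form-encoded "q" field in the POST body, and the function names are all assumptions, since the slides specify the paths and HTTP methods but not the host, request body, or response format. Whether "finish" is inclusive is also not stated.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

# Hypothetical base URL; the slides do not name a host or port.
BASE_URL = "http://localhost:8080"


def start_query(terms: str) -> Request:
    """POST /api/query: start a new query asynchronously."""
    # Assumption: a form-encoded "q" field. The real body format is not documented here.
    body = urlencode({"q": terms}).encode("utf-8")
    return Request(f"{BASE_URL}/api/query", data=body, method="POST")


def query_metadata(query_id: str) -> Request:
    """GET /api/query/query_id: fetch metadata about a running query."""
    return Request(f"{BASE_URL}/api/query/{quote(query_id, safe='')}", method="GET")


def query_results(query_id: str, start: int, finish: int) -> Request:
    """GET /api/query/query_id/start/finish: fetch a range of results."""
    if start < 0 or finish < start:
        raise ValueError("expected 0 <= start <= finish")
    path = f"/api/query/{quote(query_id, safe='')}/{start}/{finish}"
    return Request(BASE_URL + path, method="GET")


def cancel_query(query_id: str) -> Request:
    """DELETE /api/query/query_id: cancel a query before it times out."""
    return Request(f"{BASE_URL}/api/query/{quote(query_id, safe='')}", method="DELETE")
```

Each builder returns an unsent `urllib.request.Request`; passing it to `urllib.request.urlopen` would perform the call. Because queries run asynchronously, a client would typically poll the metadata endpoint and fetch result ranges as they become available.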
REST Interface
• /api/repository
– GET list of repositories currently online
• /api/repository/repo_id
– GET for repository metadata
  • Link to repository itself
  • Link to LinkSphere description of it
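The two repository endpoints can be sketched the same way. As before, the base URL and function names are hypothetical, and the response format (including how the two links are represented) is not given in the slides, so this sketch only builds the requests.

```python
from urllib.parse import quote
from urllib.request import Request

# Hypothetical base URL; the slides do not name a host or port.
BASE_URL = "http://localhost:8080"


def list_repositories() -> Request:
    """GET /api/repository: list the repositories currently online."""
    return Request(f"{BASE_URL}/api/repository", method="GET")


def repository_metadata(repo_id: str) -> Request:
    """GET /api/repository/repo_id: fetch metadata for one repository."""
    return Request(f"{BASE_URL}/api/repository/{quote(repo_id, safe='')}", method="GET")
```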