LinkSphere: P2P Cross Database Search -- Architecture and Issues Hugo Mills University of Reading

Page 1

LinkSphere: P2P Cross Database Search -- Architecture and Issues

Hugo Mills

University of Reading

Page 2

LinkSphere

• Linking Researchers and their Data

• Social networking for researchers

• Cross-database search

– Mostly Arts and Humanities datasets

– “Promoting serendipity”

– Access by and presentation of datasets to wider audiences

Page 3

Datasets

• Museums Archives

• Archaeology: Silchester Excavation, IADB

• Ure Museum of Classical Archaeology

• CentAUR: ePrints Library

• Beckett Collection

• Cole Museum of Zoology

• Film Collection

• Herbarium

• Typography Collections

Page 4

Tycho

• Fully asynchronous peer-to-peer communications framework

• Written in Java

• Fully distributed

• Robust

– “A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable.” (Leslie Lamport)

• Has a simple distributed data store (“Virtual Registry”) for client metadata

Page 5

Tycho

• (Relatively) lightweight

– 3 MiB for a fully functional system

• Fast

• Flexible, Extensible

– Bootstrap handlers

– Additional message types

– VR extensions

– Alternative communication protocols

– Discovery of core mediators via Bonjour/ZeroConf

Page 6

XDB System Architecture

[Diagram. Labels shown: Tycho Core; VR (×4); Repo (×4); Meta (×4); JDBC, Web API, SPARQL, ...; REST search API; Search App (×2)]
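The diagram labels suggest a search application sending one query to several repositories, each reached through its own adapter (JDBC, Web API, SPARQL, ...). The following is a minimal sketch of that fan-out idea only. It uses Python's asyncio in place of Tycho's messaging, and `search_all`, the adapter callables, and the fake repositories are all hypothetical names introduced for illustration.

```python
import asyncio


async def search_all(repositories, terms):
    """Query every repository concurrently and collect whatever comes back.

    `repositories` maps a repository name to an async callable taking the
    search terms; both are hypothetical stand-ins for real adapters
    (JDBC, Web API, SPARQL, ...).
    """
    names = list(repositories)
    outcomes = await asyncio.gather(
        *(repositories[name](terms) for name in names),
        return_exceptions=True,  # a failing repository should not sink the search
    )
    results, failed = [], []
    for name, outcome in zip(names, outcomes):
        if isinstance(outcome, Exception):
            failed.append(name)  # remember which repository could not answer
        else:
            results.extend((name, hit) for hit in outcome)
    return results, failed


# Fake adapters, used only to demonstrate the call.
async def fake_museum(terms):
    return [f"{terms} vase"]


async def fake_archive(terms):
    raise ConnectionError("repository offline")
```

Calling `asyncio.run(search_all({"museum": fake_museum, "archive": fake_archive}, "attic"))` returns the hits from the repository that answered together with the names of those that did not, so an unreachable repository degrades the result rather than breaking the search.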

Page 7

User Interface

• Main UI is web-based

– Uses AJAX

– Currently embedded within the LinkSphere project site

– Will ultimately move to the SNS

• Any UI possible using the REST API

Page 8

Issues

• Getting the data is hard

– Implementation problems

– Maintenance problems

– Admin problems

– Social problems

– Legal problems

Page 9

“Muddling along”

• Archive of material for intra-departmental use only

– Some legal issues involved

• Group of technicians administering the data

– Poor quality data

• Excel spreadsheet(!)

• Reluctant to have index of material made public

Page 10

“Not ready yet”

• Big university projects

• New systems, (potentially) large data sets

• MERL museums archive (AdLib)

– Data all loaded from previous systems

– Access modules not yet installed

• CentAUR publications archive (ePrints 3)

– Very little data available yet

Page 11

“Works For Me”

• Custom web application

– PHP, sophisticated

• External developer

• No documentation

• MySQL underneath

Page 12

“It works, but...” (part 1)

• Non-technical users

• Admins are Mac-only, desktop-only people

• FileMaker Pro

• DB structure and UI developed externally

– No documentation

– This has bad implications

Page 13

“It works, but...” (part 2)

• Completely custom application

– External developer

– No documentation (again)

– Large lump of write-only Perl

• Custom data store

– Not SQL. Not XML. Not RDF.

• No external access

Page 14

Unreachable data

• Uncommunicative systems

• Custom applications

– Developers/administrators AWOL

• Custom data models

• Lost passwords

• Excel spreadsheets

– See also, “Uncommunicative”

Page 15

Unreachable data

• Private data

– Legal issues

– Possessive owners

• Internal use only

• Poor quality

• No data!

Page 16

Conclusions

• Building the software is easy

• There is still lots of hard-to-reach data out there

• Issues are largely not technical

• More outreach to A&H areas needed

Page 17

Acknowledgements and thanks

• LinkSphere team: Mark Baker, Shirley Williams, Pat Parslow (Reading), Claire Warwick, Melissa Terras, Claire Ross (UCL)

• Repository owners at Reading: Amy Smith (Ure Museum), Guy Baxter (University Archivist), Mary Dyson, Hadj Messelles (Typography), Jonathan Bignell (Film Studies), Alison Sutton (CentAUR), Mike Fulford, Amanda Clarke (Silchester)

• JISC VRE 3 programme

Page 18

Page 19

Tycho Architecture

[Diagram. Labels shown: VR (×4), M (×4), C (×8)]

Page 20

REST Interface

• /api/query

– POST to start new query asynchronously

• /api/query/query_id

– GET for query metadata

– DELETE to cancel query (or it will time-out naturally)

• /api/query/query_id/start/finish

– GET a range of results from the query

• Feedback API coming soon
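The query endpoints above can be exercised with a small client. The following is a minimal sketch, not the project's own client: the base URL, the form-encoded `q` field used as the POST body, and the function names are assumptions for illustration, since the slides do not specify a host, a request body, or a response format. The functions only build the requests; sending them (for example with `urllib.request.urlopen`) is left to the caller.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

# Hypothetical base URL; the slides do not give a host or port.
BASE_URL = "http://localhost:8080"


def start_query(terms, base=BASE_URL):
    """POST /api/query: start a new query asynchronously.

    Assumption: the search terms travel as a form-encoded field named "q".
    """
    body = urlencode({"q": terms}).encode("utf-8")
    return Request(f"{base}/api/query", data=body, method="POST")


def query_metadata(query_id, base=BASE_URL):
    """GET /api/query/query_id: fetch metadata for a query."""
    qid = quote(str(query_id), safe="")
    return Request(f"{base}/api/query/{qid}", method="GET")


def cancel_query(query_id, base=BASE_URL):
    """DELETE /api/query/query_id: cancel the query instead of letting it time out."""
    qid = quote(str(query_id), safe="")
    return Request(f"{base}/api/query/{qid}", method="DELETE")


def query_results(query_id, start, finish, base=BASE_URL):
    """GET /api/query/query_id/start/finish: fetch a range of results."""
    qid = quote(str(query_id), safe="")
    return Request(f"{base}/api/query/{qid}/{int(start)}/{int(finish)}", method="GET")
```

Because queries run asynchronously, a caller would presumably start a query, then fetch successive ranges with `query_results` as results arrive, and either cancel the query explicitly or let it time out.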

Page 21

REST Interface

• /api/repository

– GET list of repositories currently online

• /api/repository/repo_id

– GET for repository metadata

    • Link to repository itself

    • Link to LinkSphere description of it
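Seen from the server side, the two repository endpoints amount to a small route table. The following is a minimal sketch of how such paths could be matched; it is not LinkSphere's implementation, and the handler labels, the `match_route` helper, and the example repository id `ure` are hypothetical.

```python
import re

# Route table for the two repository endpoints on the slide.
# The handler names are hypothetical labels, not LinkSphere identifiers.
ROUTES = [
    ("GET", re.compile(r"^/api/repository/?$"), "list_repositories"),
    ("GET", re.compile(r"^/api/repository/(?P<repo_id>[^/]+)/?$"), "repository_metadata"),
]


def match_route(method, path):
    """Return (handler_name, path_params) for a request, or None if nothing matches."""
    for route_method, pattern, handler in ROUTES:
        found = pattern.match(path)
        if route_method == method.upper() and found:
            return handler, found.groupdict()
    return None
```

For example, `match_route("GET", "/api/repository/ure")` selects the metadata handler with `repo_id` set to `ure`, while any method other than GET falls through to `None`.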