Building Digital Libraries Made Easy: Toward Open Digital Libraries
ICADL 2002 – Singapore – Dec. 2002
Edward A. Fox (with Hussein Suleman, Ming Luo)
fox@vt.edu http://fox.cs.vt.edu
CS DLRL Internet TIC NDLTD CITIDEL NSDL …
Virginia Tech, Blacksburg, VA, USA
Acknowledgements (Selected)
• Sponsors: ACM, Adobe, DLF, IBM, Mellon Foundation, Microsoft, NSF (Grants CDA-9312611; DUE-0121741, 0136690, 0121679; IIS-0080748, 0086227, 0002935, and 9986089), OCLC, SOLINET, UNESCO, US Dept. Ed. (FIPSE), VTLS, …
• Faculty/Staff (now): Boots Cassel, Su-Shing Chen, Debra Dudley, Jeremy Frumkin, Joe Futrelle, Lee Giles, Martin Halbert, Rex Hartson, John Impagliazzo, Deborah Knox, JAN Lee, Kurt Maly, Gail McMillan, Eric Morgan, Manuel Perez, Muhammad Zubair, …
• Students: Fernando Das Neves, Marcos Goncalves, Rohit Kelapure, Aaron Krowne, Paul Mather, Ryan Richardson, Priya Shivakumar, Wensi Xi, Liang Xu, Baoping Zhang, …
Outline
• Overview, Problem
• Experience: Case Study Projects
• Open Archives Initiative
• Hussein Suleman Dissertation
• DL in a Box, OCKHAM
• Summary and Conclusion
Overview
We
• address the problem of how to develop DLs;
• build on experience in building many DLs;
• strive for simplicity as per OCKHAM initiative;
• build upon the Open Archives Initiative;
• demonstrate our approach in diverse situations;
• and invite all to
  • use DL-in-a-box and
  • help build Open Digital Libraries.
Problem
Why do DL developers continue to “reinvent the wheel”? The top 10 reasons are:
1. The library budget won’t allow purchase of a commercial DL system.
2. Unless the development effort is local, there won’t be any control.
3. DLs are extensions of DBMSs, so they are simple applications to develop.
4. Since DLs operate on the Web, one must adopt the newest W3C proposal.
Problem – cont’d
5. Since technology moves so quickly, it is essential to follow the latest fad.
6. CS students always develop from scratch.
7. This team knows it can do it better.
8. This system must have more capabilities than any other system.
9. This DL has to be more flexible and extensible.
10. This is the right system architecture – at last!
Experience: Case Study Projects
• AmericanSouth.org
• NDLTD
• CSTC
• JERIC
• CITIDEL
• NSDL
• Digital Library in a Box
AmericanSouth.org
• Domain: culture and history of the southern region of America (USA)
• Genre: diverse distributed collections at a dozen universities
• Submission & Collection: local sites → Emory University (for SOLINET)
Networked Digital Library of Theses and Dissertations (NDLTD)
• Domain: graduate education and research
• Genre: electronic theses and dissertations (ETDs)
• Submission & Collection: local sites → www.ndltd.org, www.theses.org
Computer Science Teaching Center (CSTC)
• Domain: teaching computer science
• Genre: courseware
• Submission & Collection: www.cstc.org
CS Teaching Center (CSTC): Lessons Learned
• Instead of building large, expensive multimedia packages that become obsolete and are difficult to re-use, concentrate on small knowledge units.
• Learners benefit from having well-crafted modules that have been reviewed and tested.
• Use digital libraries to build a powerful base of support for learners, upon which a variety of courses, self-study tutorials & reference resources can be built.
Browsing (2)
ACM Journal of Educational Resources in Computing (JERIC)
• Domain: teaching computer science
• Genre: courseware, scholarly articles
• Submission & Collection: CSTC, ACM Digital Library
JERIC
• Journal of Educational Resources in Computing
• Accessible from www.cstc.org and www.acm.org and www.citidel.org
• ACM and SIGCSE support
• Refereed and interactive
• Part of ACM Digital Library
Computing and Information Technology Interactive Digital Educational Library (CITIDEL)
• Domain: computing / information technology
• Genre: one-stop-shopping for teachers & learners: courseware (CSTC, JERIC), leading DLs (ACM, IEEE-CS, DB&LP, CiteSeer), PlanetMath.org, technical reports, …
• Submission & Collection: sub/partner collections → www.citidel.org
CITIDEL Team
• An NSDL Collection Track project
• Led by Virginia Tech, with co-PIs:
  • Fox (director, DL systems)
  • Lee (history)
  • Perez (user interface, Spanish support)
• Partners
  • College of New Jersey (Knox)
  • Hofstra (Impagliazzo)
  • Villanova (Cassel)
  • Penn State (Giles)
Summary of Spring 2001 Survey of CITIDEL-related Collections and their Sizes
Size of Collection    Number of Collections Identified
1-5 items             100-300
6-100 items           50
101-999 items         20-35
1000+ items           10-25
Multi-dimensional Categorization
• Language: English, Spanish
• Topic: Algorithms, Java, Multimedia
• Quality: Identified by crawl, Nominated, Editor reviewed, Peer reviewed
CITIDEL Collection Sources
[Diagram: collections included in CITIDEL as metadata or full text: JERIC, CSTC, ACM DL, SIGCSE proceedings, IEEE-CS, …, ResearchIndex (NEC’s data; data processed w. R.I.), NCSTRL, experts’ finding aids, Borner’s info viz software repository]
CITIDEL Collection Building
[Diagram: collection-building activities (Creating, Composing, Submitting, Nominating, Crawling, Classifying, Searching, Browsing) and tools (VIADUCT, GetSmart, Crawlifier)]
Overview of CITIDEL architecture
[Diagram: three layers: user portals, digital library services, repositories]
Distributed repository structure
[Diagram: Laboratories, Applets, Papers, Syllabi, … repositories; Union Metadata Repository; OAI Data Provider; OAI Data Harvester; Digital Library Services]
Digital library architecture for local and interoperable CITIDEL services
• Portals: educators, administrators, learners
• Services: multilingual searching, revising, annotating, filtering, browsing, administering
• Repositories: union metadata, annotations, filtering profiles, user profiles
• OAI Data Provider and OAI Data Harvester connect to remote and peer digital libraries (e.g. NSDL-CIS)
National Science Digital Library (NSDL)
• Domain: undergraduate and K-12 education, etc.
• Genre: educational resources
• Submission & Collection: sites of 90 projects → www.nsdl.org
NSDL Information Architecture
Developed by the Technical Infrastructure Workgroup
[Diagram: components attached to the core NSDL “bus”]
• User Interfaces: portals & clients
• Usage Enhancement: CI services (annotation, discussion, personalization, authentication, browsing); core services: information retrieval; other NSDL services
• Collection Building: core collection-building services (harvesting, protocols); core services: metadata gathering
• NSDL collections; referenced items & collections; special databases
Digital Library in a Box
• Domain: helping DL projects
• Genre: any domain, but especially those involved in NSDL (since it is funded in part through NSDL – with U. FL, NCSA)
• Software and Documentation: http://dlbox.nudl.org
Open Archives Initiative
OAI
www.openarchives.org
openarchives@openarchives.org
[Diagram: Service Providers (Discovery, Current Awareness, Preservation) and Data Providers, linked by metadata harvesting]
The World According to OAI
Technical Umbrella for Practical Interoperability…
…that can be exploited by different communities
[Diagram labels: Reference Libraries, Publishers, E-Print Archives, Museums]
Tiered Model of Interoperability
Mediator services
Metadata harvesting
Document models
OAI – Black Box Perspective
[Diagram: three layers]
• Services: Browse, Summarize, Search, Visualize
• Metadata: OA 1 to OA 7
• Docs: one DO per archive
Aggregation through OAI Harvesting
[Diagram labels: Archive; Lite Sites; NCSTRL; Eprints; IEEE-CS, ACM, …; Own: History, ResearchIndex, CSTC, …; CITIDEL; Active]
Protocol for Metadata Harvesting
• Service Requests
  • Identify
  • ListMetadataFormats
  • ListSets
  • GetRecord
  • ListIdentifiers
  • ListRecords
• Metadata Multiplicity
• Date/Time Ranges
• Sets (with semantics depending on local data providers)
• Resumption Tokens
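The request verbs, date ranges, sets, and resumption tokens listed above can be illustrated with a small sketch. It assumes the OAI-PMH v2.0 argument names (verb, metadataPrefix, from, until, set, resumptionToken) and the v2.0 response namespace; the base URL, the sample responses, and the helper names build_request and next_token are hypothetical and make no network calls.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace used by OAI-PMH v2.0 responses (assumed protocol version).
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

def build_request(base_url, verb, **params):
    """Return a request URL for one of the six service requests."""
    args = {"verb": verb}
    for key, value in params.items():
        if value is not None:
            # "from" is a Python keyword, so callers pass from_ instead.
            args[key.rstrip("_")] = value
    return base_url + "?" + urlencode(args)

def next_token(response_xml):
    """Return the resumption token of a response, or None if the list is complete."""
    root = ET.fromstring(response_xml)
    token = root.find(f".//{OAI_NS}resumptionToken")
    if token is None or not (token.text or "").strip():
        return None
    return token.text.strip()

# Hypothetical partial response: more identifiers remain to be fetched.
SAMPLE_PAGE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListIdentifiers>
    <header>
      <identifier>oai:example.org:1</identifier>
      <datestamp>2002-12-01</datestamp>
    </header>
    <resumptionToken>page2</resumptionToken>
  </ListIdentifiers>
</OAI-PMH>"""

# Hypothetical final response: an empty token marks the end of the list.
LAST_PAGE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListIdentifiers>
    <header>
      <identifier>oai:example.org:2</identifier>
      <datestamp>2002-12-02</datestamp>
    </header>
    <resumptionToken/>
  </ListIdentifiers>
</OAI-PMH>"""

first = build_request("http://example.org/oai", "ListIdentifiers",
                      metadataPrefix="oai_dc", from_="2002-01-01", set="cs")
follow_up = build_request("http://example.org/oai", "ListIdentifiers",
                          resumptionToken=next_token(SAMPLE_PAGE))
```

A harvester would repeat the follow-up request until next_token returns None.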
NDLTD OAI Example
[Diagram]
• NDLTD Site / Member: Student Entry, Local DB, Local Search / Browse, OAI Server
• NDLTD Central: OAI Harvester, MARIAN Union Catalog, Name Authority Service (e.g. OCLC)
• VTLS Union Catalog: Conversion, MARC DB, Virtua; Alternate MARC Transport (ftp? tapes?)
• Librarian Verification / Validation / Enrichment / Maintenance
Open Digital Library (ODL) Hypothesis (Hussein Suleman)
• Can we leverage the successful model of the OAI Protocol for Metadata Harvesting to alleviate our architectural problems?
Maybe … if
Digital Libraries can be modeled as
• networks of extended Open Archives, where
• each extended Open Archive is a
• source of data and/or a provider of services.
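One way to picture the hypothesis is a toy model in which every node is both a source of records and a consumer of other nodes. This is only a sketch under stated assumptions: the class name ExtendedOpenArchive, its methods, and the sample records are hypothetical and are not taken from the ODL software.

```python
from dataclasses import dataclass, field

@dataclass
class ExtendedOpenArchive:
    """Hypothetical node in a network of extended Open Archives."""
    name: str
    records: dict = field(default_factory=dict)  # identifier -> metadata
    sources: list = field(default_factory=list)  # upstream archives

    def list_records(self):
        """Act as a source of data: expose every record held here."""
        return dict(self.records)

    def harvest(self):
        """Act as a service: pull records from each upstream archive."""
        for source in self.sources:
            self.records.update(source.list_records())
        return len(self.records)

# Two data sources feed a union node, which in turn feeds a search node.
site_a = ExtendedOpenArchive("site-a", {"etd-1": "Thesis A"})
site_b = ExtendedOpenArchive("site-b", {"etd-2": "Thesis B"})
union = ExtendedOpenArchive("union", sources=[site_a, site_b])
search = ExtendedOpenArchive("search", sources=[union])

union.harvest()
search.harvest()
```

Because every node answers the same request, a service such as search can sit on top of a union exactly as the union sits on top of the original archives.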
Example Architecture (NDLTD)
[Diagram: archives at Virginia Tech, MIT (with an MIT Filter), CalTech, Humboldt, Duisburg, Dresden, and PhysNet; a Union Catalog; Search, Browse, and Recent components; User Interface. Legend: OAI/ODL archive; OAI/ODL protocol]
ODL Demonstration - FrontPage
ODL Demonstration - Search
ODL Demonstration - Browse
Hussein Suleman’s Thesis Summary
• Open Digital Libraries (DLs)
• Open Archives Initiative (OAI)
• Protocol for Metadata Harvesting (PMH)
• Extending OAI-PMH provides the glue for building componentized DLs.
• Lightweight protocols connect the components to support modular systems with good efficiency.
Research in a Nutshell
• We build extensible modular systems with customizable services.
• This supports interoperability and allows distributed development.
• This is in use in www.cstc.org, AmericanSouth.org, www.citidel.org, …
• Components include search, browse, annotate, editorial support, union, filter, whats-new, submit, rate, recommend, …
[Figure sequence]
• users and digital objects (programs, documents, images, videos), with “?” between them
• componentized digital library: components between users and digital objects, with the connections marked “?”
• open digital library: the components are Open Archives (OA), connected by PMH and XPMH
ODL Component Requirements
• Search
  • Retrieve a list of items
  • Index new items
• Annotate
  • Add annotation to item
  • Retrieve a list of annotations for an item
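The two requirement sets above map naturally onto two small interfaces. The following in-memory sketch is an illustration only; the class and method names are hypothetical, and the real ODL components expose these operations through protocol requests rather than Python calls.

```python
from collections import defaultdict

class InMemorySearch:
    """Hypothetical search component: index new items, retrieve a list of items."""

    def __init__(self):
        self._index = defaultdict(set)  # term -> identifiers of items containing it

    def index_item(self, item_id, text):
        for term in text.lower().split():
            self._index[term].add(item_id)

    def search(self, query):
        terms = query.lower().split()
        if not terms:
            return []
        # An item matches only if it contains every query term.
        matches = set.intersection(*[self._index.get(t, set()) for t in terms])
        return sorted(matches)

class InMemoryAnnotate:
    """Hypothetical annotation component: add and list annotations per item."""

    def __init__(self):
        self._notes = defaultdict(list)  # item identifier -> annotations

    def add_annotation(self, item_id, note):
        self._notes[item_id].append(note)

    def list_annotations(self, item_id):
        return list(self._notes.get(item_id, []))

searcher = InMemorySearch()
searcher.index_item("etd-1", "Open digital libraries")
searcher.index_item("etd-2", "Digital video")

annotator = InMemoryAnnotate()
annotator.add_annotation("etd-1", "Useful overview")
```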
Open Digital Library Components
• Running now
  • XML-File (data provider from file system)
  • Union, search, browse, recent, filter
  • E-journal/review, Submit, Edit, Annotation
• Class projects
  • High performance multilingual search
  • Recommender, Rating; Mirroring (see JCDL’02)
  • Working with NCSA: from DB, unstructured text
• Others discussed
  • Classification/categorization
  • DL-Viz interconnection (VIDI – Jun Wang ETD)
[Diagram: basic Open Digital Library]
• XML File Coll. & Data Provider 1, 2, 3
• Harvest from data providers
• DBUnion: Archive Merger Component
• IRDB-1: Search Engine, as Metadata Search Service Provider
• DBBrowse: Browse Engine, as Metadata Browse Service Provider
Open Digital Library: Extended
• What’s New Engine, as What’s New Service Provider
• OAI-PMH Data Provider
• Submit Archive
• OAIB (NCSA: from RDBMS)
• Filter
• Recommend, Rate Engine, as Recommend & Rate Service Provider
• Annotation Engine
• IRDB-2 Search Engine, as Annotation Search Service Provider
Example Open Digital Library
Digital Library for the Networked Digital Library of Theses and Dissertations (www.ndltd.org)
[Diagram: ETD collections (ETD-1 to ETD-4, among other digital objects) are reached via PMH; Filter components and ODLUnion (Union) sit between the collections and the ODLSearch (Search), ODLBrowse (Browse), and ODLRecent (Recent) components, which serve the user interface for students and researchers]
Example Open Digital Library
Digital Library for the Computer Science Teaching Center (www.cstc.org)
[Diagram: the CSTC user interface on top of OAI/ODL components: DBUnion (Metadata Union; Legacy Metadata), IRDB, DBBrowse, DBReview, DBRate, Thread, Suggest, and Box components (Reviews, Resources under Review, Accepted Resources, Users). Legend: User Interface; OAI/ODL component; OAI/ODL protocol]
[Diagram layers: Open Digital Library Component; Extended Open Archive; Open Archive]
Layer 1 : OAI-PMH
• Protocol for Metadata Harvesting
  • Transfer stream of metadata from one archive or component to another
• Service Requests
  • Identify, ListSets, ListMetadataFormats
  • GetRecord, ListIdentifiers, ListRecords
Layer 2 : Extended OAI-PMH
• OAI-PMH + extensions for general-purpose inter-component communication
  • Added generic containers in every response for additional information
  • Added “PutRecord” to submit a record
  • Increased granularity to support times as well as dates (same as OAI-PMH v2.0)
  • Ignored DC requirement
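Two of these extensions can be sketched briefly. The datestamp formats follow OAI-PMH v2.0 (YYYY-MM-DD for days, YYYY-MM-DDThh:mm:ssZ for seconds, in UTC). The PutRecord call is purely hypothetical: the slides name the verb but not its arguments, so the argument names identifier, metadataPrefix, and metadata are assumptions made for illustration.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

def datestamp(moment, granularity="day"):
    """Format a timezone-aware datetime at day or second granularity (UTC)."""
    moment = moment.astimezone(timezone.utc)
    if granularity == "day":
        return moment.strftime("%Y-%m-%d")
    if granularity == "second":
        return moment.strftime("%Y-%m-%dT%H:%M:%SZ")
    raise ValueError(f"unknown granularity: {granularity}")

def put_record_request(base_url, identifier, metadata_prefix, record_xml):
    """Build a PutRecord submission; the argument names are assumptions."""
    body = urlencode({
        "verb": "PutRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
        "metadata": record_xml,
    })
    return base_url, body  # target URL and form-encoded POST body

when = datetime(2002, 12, 11, 9, 30, 0, tzinfo=timezone.utc)
day_stamp = datestamp(when)
second_stamp = datestamp(when, "second")
url, body = put_record_request("http://example.org/odl", "oai:example.org:1",
                               "oai_dc", "<record/>")
```

Second-level datestamps let one component harvest changes made by another within the same day, which matters once components exchange records continuously.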
Layer 3 : ODL Protocols
• Specialized protocol semantics for different components, e.g.:
  • Search component uses ODLSearch protocol
    • ListRecords and ListIdentifiers embed query terms in “set” parameter
  • Annotation component uses ODLAnnotate protocol
    • ListRecords and ListIdentifiers specify the item for which annotations are requested in the “set” parameter
    • PutRecord adds an annotation to an item
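Embedding a query in the “set” parameter can be sketched as follows. The layout used here, a scheme name followed by the query and a result range separated by slashes, is an assumption made for illustration; the slides do not give the exact ODLSearch syntax, and the function names are hypothetical.

```python
from urllib.parse import quote, unquote, urlencode

SCHEME = "odlsearch1"  # assumed scheme label, not confirmed by the slides

def search_set(query, start=1, stop=10):
    """Pack a query and a result range into a single set value."""
    # Escape "/" inside the query so it cannot be mistaken for a separator.
    return f"{SCHEME}/{quote(query, safe='')}/{start}/{stop}"

def parse_search_set(set_value):
    """Unpack a set value produced by search_set."""
    scheme, query, start, stop = set_value.split("/")
    if scheme != SCHEME:
        raise ValueError(f"not a search set: {set_value}")
    return unquote(query), int(start), int(stop)

def search_request(base_url, query, start=1, stop=10):
    """Build a ListIdentifiers request that carries the query in 'set'."""
    return base_url + "?" + urlencode({
        "verb": "ListIdentifiers",
        "metadataPrefix": "oai_dc",
        "set": search_set(query, start, stop),
    })

packed = search_set("digital libraries")
request = search_request("http://example.org/search", "digital libraries")
```

The design reuses an existing argument instead of adding a new verb, so a component that only understands harvesting requests still sees a well-formed request.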
Performance Optimizations
• Caching of responses
• Persistent CGI mechanisms
  • FastCGI
  • SpeedyCGI
• Request multiple records in a single operation (proposed)
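Caching of responses, the first optimization above, can be sketched as a small time-limited cache keyed by request URL. This is a minimal illustration, not the mechanism used in the ODL components; the class name, the time-to-live policy, and the fetch callback are assumptions.

```python
import time

class ResponseCache:
    """Hypothetical cache that reuses a response until it is ttl_seconds old."""

    def __init__(self, fetch, ttl_seconds=300, clock=time.monotonic):
        self._fetch = fetch      # function that performs the real request
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so the cache can be tested
        self._entries = {}       # request URL -> (time stored, response)

    def get(self, request_url):
        now = self._clock()
        entry = self._entries.get(request_url)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]      # fresh enough: skip the component call
        response = self._fetch(request_url)
        self._entries[request_url] = (now, response)
        return response

# Usage with a stand-in fetch function and a manually advanced clock.
calls = []
fake_time = [0.0]

def fake_fetch(url):
    calls.append(url)
    return f"response to {url}"

cache = ResponseCache(fake_fetch, ttl_seconds=60, clock=lambda: fake_time[0])
first = cache.get("http://example.org/oai?verb=Identify")
second = cache.get("http://example.org/oai?verb=Identify")  # served from cache
fake_time[0] = 61.0
third = cache.get("http://example.org/oai?verb=Identify")   # expired, refetched
```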
What have we accomplished?
• Complete protocol-level separation among components within the DL
• Seamless integration with little “glue”
• Simple extensions of OAI-PMH
• Modular and portable components
• Efficient in speed – but not as efficient in storage
Digital Library In A Box
• http://dlbox.nudl.org
• Part of NSF’s National Science Digital Library (www.nsdl.org)
• Offers “Shrink-wrap” Open Digital Library Components – Open Source Software
• Users install ready-made digital library solutions, or build their own from snap-together components.
OCKHAM
• Simplicity (à la Occam’s razor)
• Support by Mellon and DLF
• Next meeting in Atlanta Jan. 8, 2003
• Four main ideas:
1. Components
2. Lightweight protocols
3. Open reference models (e.g., 5S, OAIS)
4. Community perspective and involvement
5S Layers
Societies
Scenarios
Spaces
Structures
Streams
Summary and Conclusion
• It is possible to build DLs easily.
• The ODL approach to this has been developed and validated in a number of settings.
• Everyone is invited to:
• Use ODL components
• Refine or add ODL components, protocols
• Join ODL and OCKHAM
• For more information see:
(Somewhat) Open Issues
• Is this scalable? Portable? Extensible?
• Can we define all popular DL services using such a methodology? (completeness problem)
• Can we define DLs as configurations of ODL components? (composition problem)
• Is OAI-PMH a good baseline protocol? Can we design a better baseline protocol upon which to base harvesting and repository access?
• To what degree is an ODL network equivalent to a monolithic system? (comparison problem)
Ultimate Goal
• Package different configurations into instant DL systems or subsystems
• DL building = component configuration
• All DLs speak the same language(s)
• Basic services are trivial to provide so more effort is spent on advanced capabilities of DLs
Selected Links
• CITIDEL – www.citidel.org
• NCSTRL – www.ncstrl.org
• NDLTD – www.ndltd.org
• NSDL – www.nsdl.org
• Open Archives Initiative
  • www.openarchives.org
  • www.openarchives.org/OAI/openarchivesprotocol.htm
  • www.dlib.vt.edu/projects/OAI/
More Links
• Hussein Suleman’s Dissertation
  • http://purl.org/net/hsdiss/odl.pdf
• Repository Explorer
  • http://purl.org/net/oai_explorer
• DL Courseware – http://ei.cs.vt.edu/~dlib
• Virginia Tech Digital Library Research Laboratory (DLRL) – www.dlib.vt.edu
• Listservs
  • dl-in-a-box-l@listserv.vt.edu
  • ockham-sys@listserv.cc.emory.edu