The Technical Infrastructure of the NSDL

Dean Krafft, Cornell University
dean@cs.cornell.edu



NSDL Technical Overview

Structure of the talk:

NSDL 1.0 Overview

The Fedora-based NSDL Data Repository (NDR) and NSDL 2.0

Inspiring Contribution and Collaboration: Expert Voices

Other NSDL 2.0 Services and Tools

Q&A


What is the NSDL?

An NSF-funded $20 million/year program in Science, Technology, Engineering and Mathematics (STEM) education

A digital library describing nearly two million carefully selected online STEM resources from well over 100 collections (at http://nsdl.org)

A core integration team (Cornell, UCAR, Columbia) working with 9 “pathways” portals and over 200 NSF grantees

A large community of researchers, librarians, content providers, developers, students, and teachers


NSDL 1.0

Create a “union catalog” of Dublin Core metadata records for STEM resources

Harvest those records from collections using OAI-PMH (openarchives.org)

Store records in an Oracle DB and re-serve qualified DC through OAI-PMH

Build a search index using metadata plus full-text of available content pages

Create a web portal at nsdl.org for K-gray access to NSDL resources
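
The harvesting step above can be sketched in a few lines of Python. This is a minimal illustration, not the NSDL implementation: the base URL and the sample response are made up for the example, and only the request arguments (verb, metadataPrefix, resumptionToken) and the XML namespaces are taken from the OAI-PMH and Dublin Core specifications.

```python
import urllib.parse
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token:
        # resumptionToken is an exclusive argument: it replaces metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urllib.parse.urlencode(params)


def parse_records(xml_text):
    """Return (records, resumption_token) from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iter(f"{{{OAI_NS}}}record"):
        identifier = rec.findtext(f"{{{OAI_NS}}}header/{{{OAI_NS}}}identifier")
        titles = [t.text for t in rec.iter(f"{{{DC_NS}}}title")]
        records.append({"identifier": identifier, "titles": titles})
    token = root.findtext(f".//{{{OAI_NS}}}resumptionToken")
    return records, (token or None)


# Hypothetical response from a hypothetical collection, for illustration only.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example resource</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url("http://example.org/oai"))
    print(parse_records(SAMPLE_RESPONSE))
```

A harvester would fetch the URL from list_records_url, pass the response body to parse_records, and repeat the request with the returned resumption token until none is given.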


Infrastructure overview: NSDL 1.0

[Diagram: STEM Collections on the Web, the Central Metadata Repository, the Search Service, the Archive Service, the Collection Registration System, and the NSDL.org Portal, connected by links labeled by protocol: OAI-PMH, HTTP, REST, SQL.]


NSDL 1.0 Lessons

The Metadata Repository was quick to implement using known technologies, but it had a limited model:

Metadata-centric orientation

No content, only metadata

Limited relationships: collection/item

Limits on context, structure, and access

Severe limits on contribution and collaboration

One-way data flow: NSDL → Users

Rather than one portal for everyone, support communities with common interests: Pathways now provide discipline and area-specific portals


NSDL 2.0

Create an NSDL that guides not just resource discovery, but resource selection, use, organization, annotation and contribution:

Supports creating “context” for resources

Presents resources in context: linked to related concepts; with user ratings; with codes and data

Supports creating a permanent archive of resources

Enables community tools for structuring, evaluation, annotation, contribution, collaboration

Provides two-way data flow: NSDL ↔ users

Goal: Create a dynamic, living library


Creating the NSDL Data Repository

Supports storing both content and metadata

Allows arbitrary relationships among resource and metadata objects: organization, annotation, citation

Accessible through web service architecture of remixable data sources and transformations


Fedora: the NDR middleware

A Flexible, Extensible Digital Object Repository Architecture (http://www.fedora.info)

Open source project with $2.2 million in Mellon funding 2002-2007

Collaboration of Cornell and Univ. of Virginia

Key funded users include:

eSciDoc project (collaboration of the Max Planck Society and FIZ Karlsruhe)

Public Library of Science (Topaz Foundation)

VTLS Corp., Harris Corp., Library of Congress

Australian Research Repositories Online to the World

Royal Library Denmark, National Library, and DTU


What is Fedora?

An architecture, toolkit, and implementation: middleware, not a vertical application

DSpace in contrast: a vertical application with a fixed workflow targeted at users

Stores arbitrary internal and external digital objects, disseminations (transformations and combinations), relationships among objects

Entirely SOAP/REST based, disseminations are URLs

XML data store; RDBMS cache; RDF triplestore supports relationship queries
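
The idea of relationship queries over a triplestore can be illustrated with a toy in-memory store. This sketch is not Fedora's API or query language: the class, the object identifiers, and the predicate names (metadataFor, memberOf, cites) are hypothetical, chosen only to mirror the kinds of relationships discussed in this talk.

```python
from collections import namedtuple

# A triple is one statement: subject --predicate--> object.
Triple = namedtuple("Triple", ["subject", "predicate", "obj"])


class TinyTripleStore:
    """Toy triplestore: a set of triples plus wildcard pattern matching."""

    def __init__(self):
        self._triples = set()

    def add(self, subject, predicate, obj):
        self._triples.add(Triple(subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return matching triples in sorted order; None is a wildcard."""
        return sorted(
            t for t in self._triples
            if (subject is None or t.subject == subject)
            and (predicate is None or t.predicate == predicate)
            and (obj is None or t.obj == obj)
        )


def build_example_store():
    """Populate a store with made-up objects and relationships."""
    store = TinyTripleStore()
    store.add("md:paper-1", "metadataFor", "res:paper")
    store.add("res:paper", "memberOf", "coll:soft-matter")
    store.add("res:data", "memberOf", "coll:soft-matter")
    store.add("res:paper", "cites", "res:data")
    return store


if __name__ == "__main__":
    store = build_example_store()
    # Which objects are members of the collection?
    for t in store.match(predicate="memberOf", obj="coll:soft-matter"):
        print(t.subject)
```

Leaving any position of the pattern as None turns it into a wildcard, so the same match call answers "what does this resource cite?" and "what is in this collection?" alike.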


NSDL Data Repository (NDR)

References to roughly 2 million selected STEM resources on the web

Sourced metadata statements about those resources

A REST API to allow authenticated access by Pathways and providers

Support for annotation, aggregation, and other relationships


Sample NDR Objects & Relationships

[Diagram: sample NDR objects (a Publication Resource, Publication Metadata, a Data Set Resource, Data Set Metadata, a Code Resource, the MatForge Metadata Provider, the Soft Matter Collection, Cornell CCMR, and the MatDL Pathway) linked by “Cites”, “Metadata for”, “Member of”, and “Selector for” relationships.]


An Information Network Overlay

Think of the NDR as a lens for viewing science content on the net

Content can be:

Local: stored directly in the NDR

Remote: accessed through a URL

Computed: derived from a database or web service

Archived: an older version stored at SDSC

It all has a repository-based URL
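
One way to picture the overlay is as a single addressing scheme over four kinds of content. The sketch below is an illustration under stated assumptions, not the NDR design: the base URL, the object identifiers, and all class and function names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class ContentKind(Enum):
    LOCAL = "local"        # stored directly in the repository
    REMOTE = "remote"      # accessed through an external URL
    COMPUTED = "computed"  # derived from a database or web service
    ARCHIVED = "archived"  # an older version held in an archive


@dataclass(frozen=True)
class ContentRef:
    object_id: str
    kind: ContentKind
    source: str  # where the bytes really come from (path, URL, query, ...)


# Hypothetical base URL; the real repository address is not given in the talk.
REPOSITORY_BASE = "http://repository.example.org/objects"


def repository_url(ref: ContentRef) -> str:
    """Every object gets the same URL shape, whatever its kind."""
    return f"{REPOSITORY_BASE}/{ref.object_id}"


def describe_source(ref: ContentRef) -> str:
    """Say how the repository would obtain the content behind the URL."""
    actions = {
        ContentKind.LOCAL: "read from repository storage",
        ContentKind.REMOTE: "fetch from the web",
        ContentKind.COMPUTED: "call a database or web service",
        ContentKind.ARCHIVED: "retrieve from the archive",
    }
    return f"{actions[ref.kind]}: {ref.source}"


if __name__ == "__main__":
    ref = ContentRef("obj-2", ContentKind.REMOTE, "http://example.org/page")
    print(repository_url(ref))
    print(describe_source(ref))
```

The point of the design is that callers only ever see repository_url; how the content is actually obtained stays behind that address.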


Network Overlay View

[Diagram: layered view showing the User View (API/UI), the Repository View with Relations & Annotations, and Resources on the Web.]


How should we use the NDR?

The NDR provides powerful capabilities for:

Creating context around resources

Enabling the NSDL community to directly contribute resources and context

Representing a web of relationships among science resources and information about those resources

How do we use it? Here’s one specific example …


Soft Matter Wiki: Planned NDR Integration

A community of approved contributors (e.g., teachers, librarians, materials scientists) is granted edit access to the Soft Matter wiki

New resources and metadata are created as wiki pages and reflected into the NDR

Relevant non-wiki-based NDR resources and metadata are displayed as read-only wiki pages, subject to comment and linking

User and project pages organize NDR resources

Will work with MatDL on integrating these capabilities into Soft Matter Wiki


NDR Entry for Soft Matter Wiki

[Diagram: a Wiki Entry with New Metadata and New Audience MD, a Referenced New Resource 1, and a Referenced Existing Resource 2, linked by “Annotates”, “Metadata for”, and “Member of” relationships to Metadata Providers, an Existing Collection, and the Soft Matter Wiki, with an inferred relationship between resources.]


But an NDR-integrated wiki is just the beginning …


Expert Voices

A system using blogging technology to:

Support STEM conversations among scientists, teachers and students

Tie NSDL resources to real-world science news

Create context for resources to enhance discovery, selection and use

Enable NSDL community members to become NSDL contributors of resources, questions, reviews, annotations, and metadata

Expert Voices ≠ LiveJournal

Contributors are carefully selected; contributions are about science, the process of science, and education


Expert Voices Implementation

Open source multi-user blogging system

Published entries become NSDL resources

Owner controls publication of entries and visibility of comments

Entries can contain linked references to NSDL resources, references to URLs that should become resources, and new resource metadata

Integrated with NSDL Shibboleth-based community sign-on


MyNSDL: NDR-integrated tagging, bookmarking, and recommendation

Based on Connotea open-source folksonomic tagging/bookmarking system

Tags and bookmarking structure are reflected back into the NDR

Authorized users can “automatically” recommend new NSDL resources simply by tagging them

Gives user a personal view of NSDL resources
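
The tagging-as-recommendation behavior described above can be sketched as follows. This is a hypothetical model, not MyNSDL or Connotea code: the class, its attributes, and the example users and URLs are invented for illustration, and the rule "an authorized user tagging an unknown URL recommends it" is one reading of the slide.

```python
from collections import defaultdict


class TagStore:
    """Toy model of tagging that doubles as resource recommendation."""

    def __init__(self, authorized_users):
        self.authorized_users = set(authorized_users)
        self.tags = defaultdict(set)   # url -> {(user, tag), ...}
        self.known_resources = set()   # URLs already in the library
        self.recommended = []          # URLs proposed as new resources

    def tag(self, user, url, tag):
        """Record a tag; an authorized user's tag on an unknown URL recommends it."""
        self.tags[url].add((user, tag))
        is_new = url not in self.known_resources
        if is_new and user in self.authorized_users and url not in self.recommended:
            self.recommended.append(url)

    def personal_view(self, user):
        """Return, sorted, every URL this user has tagged."""
        return sorted(
            url for url, pairs in self.tags.items()
            if any(u == user for u, _ in pairs)
        )


if __name__ == "__main__":
    store = TagStore(authorized_users={"alice"})
    store.known_resources.add("http://example.org/known")
    store.tag("alice", "http://example.org/new", "polymers")
    store.tag("bob", "http://example.org/other", "colloids")
    print(store.recommended)
    print(store.personal_view("alice"))
```

In this sketch a tag from an unauthorized user is still recorded and still shapes that user's personal view; it simply does not nominate the URL as a new resource.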


Other proposed applications

iVia-based Expert-Guided crawl: Tool for Pathways and others to turn websites into resource collections (in development at UC Riverside)

Moodle Course Management System – courses integrated with NSDL resources

Electronic lab notebook – integrating lab notes with code, data sets, and reference materials within the library archival framework


NSDL 2.0 Ecosystem

[Diagram: STEM Collections, the Search Service, the Archive Service, and the Fedora-based NDR, connected by links labeled by protocol: OAI-PMH, HTTP, REST, NDR API.]


What does this mean for the user?

NSDL 2.0 applications situate resources in context, aiding both discovery and use

Users become contributors, adding new resources, ratings, annotations, and organizational structure – frequently as a side effect of using the library

Specialized portals, tagging, and powerful relationship queries and filtering support user-specific “views” into the library


Summary

NSDL 1.0 created a large, production digital library of STEM resources for education.

NSDL 2.0 and its tools allow scientists, mathematicians, teachers, engineers, librarians, and students to create a unique web of context, contribution, and collaboration around the high-quality STEM education resources at the core of the NSDL.


Acknowledgements

NSDL NSF Program Officers: Lee Zia, David McArthur

NSDL Core Integration Team:

UCAR: Kaye Howe, PI and Executive Director

Cornell: Dean Krafft, PI

Columbia: Kate Wittenberg, PI

Fedora Development Team:

Cornell: Sandy Payette & Carl Lagoze

Univ. of Virginia: Thornton Staples


Questions?


Contact Information

Dean B. Krafft
Cornell Information Science
301 College Ave.
Ithaca, NY
dean@cs.cornell.edu

This work is licensed under the Creative Commons Attribution-NoDerivs 2.5 License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nd/2.5/ or send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.