Cloud Computing in Libraries and Web-scale Library Management and Discovery




Marshall Breeding
Independent Consultant, Author, Speaker
Founder and Publisher, Library Technology Guides
http://www.librarytechnology.org/
http://twitter.com/mbreeding

15 March 2013 SENYLRC

Abstract: This is an introduction to the concepts of cloud computing and how this suite of technologies is positioned to reshape the ways that libraries use strategic applications such as discovery and management applications. The instructor will describe the evolution of discovery systems from next-generation library catalogs, which provided some improvements in the interfaces and performance of established online catalogs, toward the current wave of index-based or Web-scale discovery services. Major changes are also underway in the applications that libraries use to manage their operations and collections, with a new slate of library services platforms coming on the scene as an alternative to the integrated library systems that have been available for many decades.

Cloud Computing for Libraries

Volume 11 in The Tech Set

Published by Neal-Schuman / ALA TechSource

ISBN: 9781555707859

http://www.neal-schuman.com/ccl


Appropriate Automation Infrastructure

Current automation products out of step with current realities

Majority of library collection funds spent on electronic content

Majority of automation efforts support print activities

New discovery solutions help with access to e-content

Management of e-content continues with inadequate supporting infrastructure

Key Context: Libraries in Transition

Academic: shift from print > electronic

E-journal transition largely complete

Circulation of print collections slowing

E-books now in play (consultation > reading)

All libraries: need better tools for access to complex multi-format collections

Strong emphasis on digitizing local collections

Demands for enterprise integration and interoperability

Key Text: Changed expectations in metadata management

Moving away from individual record-by-record creation

Life cycle of metadata: metadata follows the supply chain, improved and enhanced along the way as needed

Manage metadata in bulk when possible: e-book collections

Highly shared metadata: e-journal knowledge bases, e.g.

Great interest in moving toward the semantic web and open linked data

Very little progress in linked data for operational systems

AACR2 > RDA

MARC > Bibframe (http://bibframe.org/)

Fundamental technology shift: mainframe computing > client/server > Web-based and cloud computing

Image sources: http://www.flickr.com/photos/carrick/61952845/, http://soacloudcomputing.blogspot.com/2008/10/cloud-computing.html, http://www.javaworld.com/javaworld/jw-10-2001/jw-1019-jxta.html

Local Computing

Traditional model: locally owned and managed

Shifting from departmental to enterprise: departmental servers co-located in central IT data centers

Increasingly virtualized

Virtualization: the ability for multiple computing images to exist simultaneously on one physical server

Physical hardware partitioned into multiple instances using virtual machine management tools such as VMware

Applicable to local, remote, and cloud models

Cloud Computing: a major trend in information technology

The term “in the cloud” has devolved into marketing hype, but cloud computing in the form of multi-tenant software as a service offers libraries opportunities to break out of individual silos of automation and engage in widely shared cooperative systems

Opportunities for libraries to leverage their combined efforts into large-scale systems with more end-user impact and organizational efficiencies

Beyond “Cloudwashing”

Cloud as marketing hype: the term cloud computing is used very freely, tagged to almost any virtualized environment

Any arrangement where the library relies on some kind of remote hosting environment for major automation components

Includes almost any vendor-hosted product offering

Example: ASP now marketed as Software-as-a-Service

Cloud computing – characteristics

Web-based interfaces

Externally hosted

Pricing: subscription or utility

Highly abstracted computing model

Provisioned on demand

Scaled according to variable needs

Elastic: consumption of resources can contract and expand according to demand
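The elasticity point can be made concrete with a small calculation. The Python sketch below compares fixed provisioning, where enough servers for the busiest hour run all day, with elastic provisioning, where capacity follows demand hour by hour. The hourly demand figures, the per-instance capacity, and the helper names are hypothetical illustrations, not measurements from any real library or provider.

```python
import math


def instances_needed(requests_per_hour: int, capacity_per_instance: int) -> int:
    """Smallest number of instances that can absorb the load (at least one stays up)."""
    return max(1, math.ceil(requests_per_hour / capacity_per_instance))


def instance_hours(demand: list[int], capacity_per_instance: int, elastic: bool) -> int:
    """Total instance-hours consumed over the period.

    Fixed provisioning sizes for the peak hour and runs that many all day;
    elastic provisioning expands and contracts hour by hour.
    """
    if elastic:
        return sum(instances_needed(d, capacity_per_instance) for d in demand)
    peak = instances_needed(max(demand), capacity_per_instance)
    return peak * len(demand)


# Hypothetical hourly search traffic for a library service over one day (24 values).
demand = [50] * 8 + [400] * 4 + [900] * 4 + [300] * 4 + [50] * 4
fixed = instance_hours(demand, capacity_per_instance=200, elastic=False)
elastic = instance_hours(demand, capacity_per_instance=200, elastic=True)
print(fixed, elastic)  # 120 48
```

Under utility pricing, this made-up scenario would bill about 48 instance-hours instead of 120; the real difference depends on how uneven a library's actual demand is.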

Gartner Hype Cycle charts: 2009, 2010, 2011, 2012

Budget Allocations

Local computing: server purchase, server maintenance, application software license, data center overhead, energy costs, facility costs

Cloud computing: annual subscription (measured service? fixed fees?); factors: hosting, software licenses, optional modules

Infrastructure-as-a-Service: provisioning of equipment (servers, storage)

Virtual server provisioning

Examples: Amazon Elastic Compute Cloud (EC2), Amazon Simple Storage Service (S3), Rackspace Cloud (http://www.rackspacecloud.com/), EMC2 Atmos (http://www.atmosonline.com/)

Amazon EC2: Amazon Machine Images (AMI)

Red Hat Enterprise Linux, Debian, Fedora, Ubuntu Linux, OpenSolaris, Windows Server 2003/2008

Storage-as-a-Service: provisioned, on-demand storage, bundled with or separate from other cloud services

Software as a Service: multi-tenant SaaS is the modern approach

One copy of the code base serves multiple sites

Software functionality delivered entirely through Web interfaces; no workstation clients

Upgrades and fixes deployed universally, usually in small increments
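A minimal Python sketch of the multi-tenant idea, assuming a single in-memory service object stands in for the shared code base: every library (tenant) calls the same code, data is partitioned by a tenant identifier, and one deployment upgrades everyone. The class and method names are hypothetical; production SaaS systems add authentication, persistent storage, and much stronger isolation.

```python
from dataclasses import dataclass, field


@dataclass
class MultiTenantService:
    """One running copy of the code base shared by every subscribing library."""

    version: str = "1.0"
    # Records are partitioned by tenant ID so each library sees only its own data.
    records: dict[str, list[str]] = field(default_factory=dict)

    def add_record(self, tenant_id: str, record: str) -> None:
        self.records.setdefault(tenant_id, []).append(record)

    def search(self, tenant_id: str, term: str) -> list[str]:
        # Only the requesting tenant's partition is searched.
        return [r for r in self.records.get(tenant_id, []) if term.lower() in r.lower()]

    def deploy_upgrade(self, new_version: str) -> None:
        # A single deployment upgrades every tenant at once; there is no
        # per-library installation or workstation client to update.
        self.version = new_version


service = MultiTenantService()
service.add_record("library-a", "Cloud Computing for Libraries")
service.add_record("library-b", "Next-Gen Library Catalogs")
print(service.search("library-a", "cloud"))  # ['Cloud Computing for Libraries']
print(service.search("library-b", "cloud"))  # []
service.deploy_upgrade("1.1")
print(service.version)  # 1.1
```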

Data as a Service: SaaS provides the opportunity for highly shared data models

Bibliographic knowledge base: one globally shared copy that serves all libraries

Discovery indexes: article- and object-level indexes for resource discovery

E-resource knowledge bases: shared authoritative repository of e-journal holdings

General opportunity to move away from library-by-library metadata management to globally shared workflows
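The shared knowledge base model can be sketched in a few lines of Python. The assumption here is that each work has one shared record and each library stores only links to those records plus local data such as copy counts, so an enhancement made once is visible to every library at read time. The identifiers, sample titles, and function names are hypothetical.

```python
# Hypothetical globally shared knowledge base: one record per work, used by every library.
shared_kb: dict[str, dict[str, str]] = {
    "kb:1": {"title": "Cloud Computing for Libraries"},
    "kb:2": {"title": "Next-Gen Library Catalogs"},
}

# Each library keeps only links to shared records plus local data (here, copy counts).
local_holdings: dict[str, dict[str, int]] = {
    "library-a": {"kb:1": 2},
    "library-b": {"kb:1": 1, "kb:2": 1},
}


def enhance_shared_record(kb_id: str, element: str, value: str) -> None:
    """Improve metadata once, centrally, instead of library by library."""
    shared_kb[kb_id][element] = value


def catalog_view(library_id: str) -> list[dict]:
    """Join a library's holdings to the shared records at read time."""
    return [
        {**shared_kb[kb_id], "copies": copies}
        for kb_id, copies in sorted(local_holdings[library_id].items())
    ]


enhance_shared_record("kb:1", "subject", "Cloud computing")
for library in ("library-a", "library-b"):
    print(library, catalog_view(library))
```

One edit to the shared record shows up in both libraries' views, which is the efficiency the globally shared model is after.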

Software-as-a-Service: complete software application, customized for customer use

Software delivered through cloud infrastructure; data stored in the cloud

E.g., Salesforce.com: widely used business infrastructure

Multi-tenant: all organizations that use the service share the same instance (code base, hardware resources, etc.), often partitioned to separate some groups of subscribers

Application Service Provider: legacy business applications hosted by the software vendor

Standalone application on discrete or virtualized hardware

Staff and public clients accessed via the Internet

Same user interfaces and functionality as if installed locally

Established as a deployment model in the 1990s

Can be implemented through Infrastructure-as-a-Service: individual instances of a legacy system hosted in EC2

ASP vs SaaS

From: THINKstrategies: CIO’s Guide to Software-as-a-Service

Platform-as-a-Service: virtualized computing environment for deployment of software

Application engine, no specific server provisioning

Examples:

Google App Engine: SDKs for Java, Python

Heroku: Ruby platform

Amazon Web Services

Library-specific platforms: WorldShare Platform

Library Context

Cloud Computing

Library automation through SaaS: almost all library automation products are offered through hosted options. SaaS or ASP?

ILS products offered as SaaS (mostly ASP):

SirsiDynix Symphony, SirsiDynix Horizon, Innovative Interfaces Millennium, Ex Libris Aleph, EOS International EOS.Web, Evergreen (Equinox Software), Koha (LibLime, ByWater, many others internationally) …many other examples …

Multi-tenant SaaS

Serials Solutions: Summon, Intota (announced for 2012-13), 360 Search, 360 Link, KnowledgeWorks

Ex Libris: Alma, Primo Central

BiblioCommons

OCLC WorldShare Management Services

Platform as a Service

OCLC WorldShare Platform: WorldShare Management Services, WorldShare License Manager, library-created applications

Library Management in the Cloud

Almost all library automation vendors offer some form of “cloud-based” services

Server management moves from library to vendor

Subscription-based business model: comprehensive annual subscription payment

Offsets local server purchase and maintenance

Offsets some local technology support

Leveraging the Cloud

Moving legacy systems to hosted services provides some savings to individual institutions but does not result in dramatic transformation

Globally shared data and metadata models have the potential to achieve new levels of operational efficiencies and more powerful discovery and automation scenarios that improve the position of libraries overall.

Transition to Web-scale Technologies

Web-scale: a characterization or marketing tag that denotes a comprehensive, highly-scalable, globally shared model

Web-scale: One of the key characteristics of emerging library management and discovery services

Displaces applications or data models targeting individual libraries in isolation

Discovery: index-based search

Management: Library Services Platforms

Repositories in the cloud

DSpace: institutional repository application

Fedora: generalized repository platform

DuraSpace: organization now overseeing both DSpace and Fedora

DuraCloud: shared, hosted repository platform; pilot since 2009, production in early 2011 (http://www.duraspace.org/duracloud.php)

Caveats and concerns with SaaS

Libraries must have adequate bandwidth to support access to remote applications without excessive latency

Quality of service agreements that guarantee performance and reliability factors

Configurability and customizability limitations

Access to APIs: ability to interoperate with third-party applications, e.g., connecting a SaaS ILS with a discovery product from another vendor

Benefits of Cloud Computing

For libraries:

Elimination of capital expenses for equipment

Lower annual costs

Redeployment of technical staff to more meaningful activities

For providers / vendors:

Higher revenues relative to software-only arrangements

Provision of infrastructure at scale with lower unit costs

Longer-term relationships with customers

Cost implications

Total cost of ownership: do all cost components result in increased or decreased expense?

Personnel costs: less technical administration needed

Hardware: server hardware eliminated

Software costs: subscription, license, maintenance/support

Indirect costs: energy costs associated with power and cooling of servers in the data center

IaaS: balance elimination of hardware investments against ongoing usage fees; especially attractive for development and prototyping
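The total-cost-of-ownership question can be framed as a simple annualized comparison. The Python sketch below spreads the server purchase over its expected lifetime, adds the recurring local costs named in the cost components, and compares the result with a subscription plus whatever local administration remains. Every figure is a hypothetical placeholder; the outcome depends entirely on a library's real numbers, and the sketch only shows which components belong on each side.

```python
from dataclasses import dataclass


@dataclass
class LocalCosts:
    server_purchase: float           # one-time capital expense
    server_lifetime_years: int       # years over which the purchase is spread
    hardware_maintenance: float      # per year
    software_license_support: float  # per year: license, maintenance/support
    staff_administration: float      # per year: share of technical staff time
    energy_and_facility: float       # per year: power, cooling, data center space

    def annual(self) -> float:
        return (
            self.server_purchase / self.server_lifetime_years
            + self.hardware_maintenance
            + self.software_license_support
            + self.staff_administration
            + self.energy_and_facility
        )


@dataclass
class CloudCosts:
    subscription: float          # per year: hosting and software bundled together
    staff_administration: float  # per year: local administration that still remains

    def annual(self) -> float:
        return self.subscription + self.staff_administration


def total_cost_of_ownership(annual_cost: float, years: int) -> float:
    return annual_cost * years


# Hypothetical figures for illustration only.
local = LocalCosts(25_000, 5, 3_000, 30_000, 20_000, 2_000)
cloud = CloudCosts(subscription=48_000, staff_administration=5_000)
print(total_cost_of_ownership(local.annual(), 5))  # 300000.0
print(total_cost_of_ownership(cloud.annual(), 5))  # 265000
```

With other inputs, such as a higher subscription or a library that already has spare server capacity, the comparison can easily reverse.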

Risks and concerns

Privacy of data: policies, regulations, jurisdictions

Ownership of data: avoid vendor lock-in

Integrity of data: backups and disaster recovery

Security issues: most providers implement safeguards stronger than local institutions can provide

Virtual instances are as susceptible to poor security practices as local computing

Cloud computing trends for libraries

Increased migration away from local computing toward some form of remote / hosted / virtualized alternative

Cloud computing especially attractive to libraries with few technology support personnel

Adequate bandwidth will continue to be a limiting factor

Increased pressure: library automation vendors promoting SaaS offerings; some companies already exclusively SaaS

Software pricing increasingly favorable to SaaS

Caveat: technologies are promoted by companies and organizations that have a vested interest in their adoption

Critically assess the viability of the technology and its appropriateness for your organization

A New Generation of Resource Discovery

Next-Gen Library Catalogs

Marshall Breeding, Neal-Schuman Publishers, March 2010

Volume 1 of The Tech Set

Online Catalog (figure): a search box returns search results from ILS data. Scope of search: books, journals, and media at the title level. Not in scope: articles, book chapters, digital objects.

Next-gen Catalogs or Discovery Interface

Single search box

Query tools: Did you mean, type-ahead

Relevance-ranked results

Faceted navigation

Enhanced visual displays: cover art, summaries, reviews

Recommendation services

Scope of search: books, journals, and media at the title level; other local and open access content. Not in scope: articles, book chapters, digital objects.

Discovery from Local to Web-scale

Initial products focused on interface improvements: AquaBrowser, Endeca, Primo, Encore, VuFind, LIBERO Uno, Civica Sorcer, Axiell Arena; mostly locally installed software

Current phase is focused on pre-populated indexes that aim to deliver Web-scale discovery: Primo Central (Ex Libris), Summon (Serials Solutions), WorldCat Local (OCLC), EBSCO Discovery Service (EBSCO), Encore Synergy (no index, though)
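The difference between the metasearch model and the pre-populated index model is mostly a question of when the work happens. The Python sketch below, using made-up titles from three hypothetical sources, builds a small inverted index ahead of time (standing in for pre-built harvesting and indexing) and answers queries from it, alongside a metasearch function that consults every source at query time. Real services use search engines such as Lucene or Solr and far richer records; this only illustrates the two architectures.

```python
from collections import defaultdict

# Hypothetical content sources with made-up titles.
SOURCES = {
    "ils": ["Cloud computing for libraries", "Library catalogs and discovery"],
    "ejournals": ["Web-scale discovery services in academic libraries"],
    "repository": ["Cloud hosting of digital repositories"],
}


def tokenize(text: str) -> list[str]:
    return text.lower().split()


def build_consolidated_index(sources):
    """Harvest every source ahead of time into one inverted index (term -> record IDs)."""
    index = defaultdict(set)
    records = {}
    for source, titles in sources.items():
        for n, title in enumerate(titles):
            record_id = f"{source}:{n}"
            records[record_id] = title
            for term in tokenize(title):
                index[term].add(record_id)
    return index, records


def search_index(index, records, query):
    """Answer from the single index: records must contain every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    return sorted(records[r] for r in hits)


def metasearch(sources, query):
    """Contrast: send the query to every source at search time and merge the replies."""
    terms = tokenize(query)
    results = []
    for titles in sources.values():  # in practice, one real-time round trip per source
        results += [t for t in titles if all(term in tokenize(t) for term in terms)]
    return sorted(results)


index, records = build_consolidated_index(SOURCES)
print(search_index(index, records, "cloud"))
print(metasearch(SOURCES, "cloud"))
```

Both functions return the same hits here; the consolidated index simply does its expensive work before anyone searches, while metasearch waits on every remote source for every query.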

Discovery interface search model (figure). Components shown: search box and search results; a local index of ILS data; a metasearch engine with real-time query and responses; sources: digital collections, ProQuest, EBSCOhost, MLA Bibliography, ABC-CLIO, …

Web-scale index-based discovery, 2009-present (figure): ILS data, digital collections, website content, institutional repositories, …, e-journals, reference sources, and aggregated content packages feed pre-built harvesting and indexing into a consolidated index, which returns search results

Public library information portal (figure): LMS data, digital collections, website content, community information, …, customer-provided content, reference sources, aggregated content packages, archives, usage-generated data, and customer profiles feed pre-built harvesting and indexing into a consolidated index, which returns search results

Web-scale search problem (figure): ILS data, digital collections, website content, institutional repositories, …, e-journals, and aggregated content packages feed pre-built harvesting and indexing into a consolidated index, which returns search results; non-participating content sources remain outside the index (???)

Problem: how to deal with resources that are not provided for ingest into the consolidated index

Discovery Products

http://www.librarytechnology.org/discovery.pl

Challenge for Relevancy

Technically feasible to index hundreds of millions or billions of records through Lucene or Solr

Difficult to order records in ways that make sense

Many fairly equivalent candidates returned for any given query

Must rely on use-based and social factors to improve relevancy rankings
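One way to picture how use-based factors help: when many records match a query about equally well, a usage signal can break the ties. The Python sketch below blends a crude text-match score with log-scaled usage counts. The 0.3 weight, the log scaling, the `uses` field, and the sample records are all assumptions for illustration; discovery services do not publish their ranking formulas in this form.

```python
import math


def text_score(query: str, title: str) -> float:
    """Fraction of query terms found in the title (a stand-in for a real engine's score)."""
    terms = query.lower().split()
    words = set(title.lower().split())
    return sum(term in words for term in terms) / len(terms)


def rank(query: str, records: list[dict], usage_weight: float = 0.3) -> list[str]:
    """Order records by a weighted blend of text match and normalized usage."""
    # Log scaling keeps a few very popular items from swamping everything else.
    max_usage = max(math.log1p(r["uses"]) for r in records) or 1.0

    def score(record: dict) -> float:
        usage = math.log1p(record["uses"]) / max_usage
        return (1 - usage_weight) * text_score(query, record["title"]) + usage_weight * usage

    return [r["title"] for r in sorted(records, key=score, reverse=True)]


# Hypothetical records: the first three match the query equally well on text alone.
records = [
    {"title": "Introduction to cloud computing", "uses": 3},
    {"title": "Cloud computing for libraries", "uses": 250},
    {"title": "Cloud computing handbook", "uses": 40},
    {"title": "Cloud hosting of digital repositories", "uses": 1000},
]
print(rank("cloud computing", records))
```

The heavily used but only partly matching record still ranks last, so text relevance dominates and usage mainly reorders the near-ties.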

Challenges for Collection Coverage

To work effectively, discovery services need comprehensive coverage of the body of content represented in library collections

What about publishers that do not participate?

Is content indexed at the citation or full-text level?

What are the restrictions for non-authenticated users?

How can libraries understand the differences in coverage among competing services?
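These questions lend themselves to a simple comparison once coverage data is available. The Python sketch below assumes a library has its own list of subscribed titles and that each discovery service can report, per title, whether content is indexed at the full-text or citation level; it then tallies the share of the library's titles at each level. The title names, service names, and report format are hypothetical.

```python
from collections import Counter

# Hypothetical: titles the library subscribes to.
subscribed = {"Journal A", "Journal B", "Journal C", "Journal D"}

# Hypothetical coverage reports: title -> level at which the service indexes it.
services = {
    "Service X": {"Journal A": "full-text", "Journal B": "citation", "Journal E": "full-text"},
    "Service Y": {"Journal A": "citation", "Journal B": "full-text", "Journal C": "full-text"},
}


def coverage_report(subscribed: set[str], coverage: dict[str, str]) -> dict[str, float]:
    """Percentage of the library's own titles at each indexing level."""
    # Titles a service indexes but the library does not hold (Journal E) are ignored.
    levels = Counter(coverage.get(title, "not indexed") for title in subscribed)
    return {level: round(100 * n / len(subscribed), 1) for level, n in sorted(levels.items())}


for name, coverage in services.items():
    print(name, coverage_report(subscribed, coverage))
```

Measured against this library's titles, Service Y covers more at the full-text level even though both services index the same number of titles.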

Open Discovery Initiative: NISO Work Group to Develop Standards and Recommended Practices for Library Discovery Services Based on Indexed Search

Informal meeting called at ALA Annual 2011

Co-chaired by Marshall Breeding and Jenny Walker

Term: Dec 2011 – May 2013 (http://www.niso.org/workrooms/odi/)

Balance of Constituents

Libraries: Marshall Breeding, Vanderbilt University; Jamene Brooks-Kieffer, Kansas State University; Laura Morse, Harvard University; Ken Varnum, University of Michigan; Anya Arnold, Orbis Cascade Alliance; Sara Brownmiller, University of Oregon; Lucy Harrison, College Center for Library Automation (D2D liaison/observer); Michele Newberry, Florida Virtual Campus

Publishers: Lettie Conrad, SAGE Publications; Beth LaPensee, ITHAKA/JSTOR/Portico; Jeff Lang, Thomson Reuters; Linda Beebe, American Psychological Assoc; Aaron Wood, Alexander Street Press; Roger Schonfeld, JSTOR, Ithaka

Service Providers: Jenny Walker, Ex Libris Group; John Law, Serials Solutions; Michael Gorrell, EBSCO Information Services; David Lindahl, University of Rochester (XC); Jeff Penka, OCLC (D2D liaison/observer)

ODI Project Goals:

Identify … needs and requirements of the three stakeholder groups in this area of work.

Create recommendations and tools to streamline the process by which information providers, discovery service providers, and librarians work together to better serve libraries and their users.

Provide effective means for librarians to assess the level of participation by information providers in discovery services, to evaluate the breadth and depth of content indexed and the degree to which this content is made available to the user.

New-generation Library Management

Fragmented Library Management

LMS for management of (mostly) print

Duplicative financial systems between library and local government or other parent organization

E-book lending platform (multiple?)

Interlibrary loan (borrowing and lending)

Self-service and AMH infrastructure

Electronic Resource Management

PC scheduling and print management

Event scheduling

Digital collections management platforms (CONTENTdm, DigiTool, etc.)

Discovery-layer services for broader access to library collections

No effective integration services / interoperability among disconnected systems; non-aligned metadata schemes

Integrated (for print) Library System (figure), in three layers:

Interfaces: staff interfaces for Circulation, Cataloging, Acquisitions, and Serials; public interface: Online Catalog

Business logic

Data stores: BIB, Holdings / Items, Circ Transactions, Users, Vendors, Policies, Funds ($$$)

LMS / ERM: Fragmented Model (figure): the ILS (staff interfaces for Circulation, Cataloging, Acquisitions, and Serials; public Online Catalog; data stores for BIB, Holdings / Items, Circ Transactions, Users, Vendors, Policies, Funds) sits alongside a separate ERM for License Management, License Terms, E-resource Procurement, Vendors, and E-Journal Titles, connected through application programming interfaces. Protocols: CORE

Common approach for ERM (figure): ILS staff and public interfaces (Circulation, Cataloging, Acquisitions, Serials, Online Catalog) and data stores (BIB, Holdings / Items, Circ Transactions, Users, Vendors, Policies, Funds), with application programming interfaces to ERM data: budget, license terms, titles / holdings, vendors, access details

Comprehensive Resource Management

No longer sensible to use different software platforms for managing different types of library materials

ILS + ERM + OpenURL resolver + digital asset management, etc.: a very inefficient model

Flexible platform capable of managing multiple types of library materials and multiple metadata formats, with appropriate workflows

Academic Libraries need a new model of library management

Not an Integrated Library System or Library Management System

The ILS/LMS was designed to help libraries manage print collections

Generally did not evolve to manage electronic collections

Other library automation products evolved: Electronic Resource Management Systems, OpenURL Link Resolvers, Digital Library Management Systems, Institutional Repositories

Library Services Platform: library-specific software designed to help libraries automate their internal operations, manage collections, fulfill requests, and deliver services

Services: service-oriented architecture; exposes Web services and other APIs; facilitates the services libraries offer to their users

Platform: general infrastructure for library automation, consistent with the concept of Platform-as-a-Service; library programmers address the APIs of the platform to extend functionality, create connections with other systems, and dynamically interact with data
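The platform idea, where library programmers build on published APIs rather than on a system's internal database, can be sketched with a Python `Protocol`. The `LibraryPlatform` interface, its two methods, and the in-memory stand-in are hypothetical and do not correspond to any vendor's actual API; a real extension would call the platform's documented Web services over HTTP with authentication.

```python
from typing import Protocol


class LibraryPlatform(Protocol):
    """Hypothetical slice of a library services platform API (not a real vendor API)."""

    def holdings_for(self, record_id: str) -> list[str]: ...

    def item_status(self, item_id: str) -> str: ...


class InMemoryPlatform:
    """Stand-in implementation so the example runs without a network connection."""

    def __init__(self) -> None:
        self._holdings = {"rec-1": ["item-1", "item-2"]}
        self._status = {"item-1": "on loan", "item-2": "available"}

    def holdings_for(self, record_id: str) -> list[str]:
        return self._holdings.get(record_id, [])

    def item_status(self, item_id: str) -> str:
        return self._status[item_id]


def availability_summary(platform: LibraryPlatform, record_id: str) -> str:
    """Library-written extension that uses only the platform's published API."""
    items = platform.holdings_for(record_id)
    available = sum(platform.item_status(i) == "available" for i in items)
    return f"{available} of {len(items)} copies available"


print(availability_summary(InMemoryPlatform(), "rec-1"))  # 1 of 2 copies available
```

Because the extension depends only on the interface, the same code could in principle run against any platform offering equivalent calls, which is the interoperability argument for open APIs.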

Library Services Platform Characteristics

Highly shared data models: knowledge base architecture

Some may take a hybrid approach to accommodate local data stores

Delivered through software as a service: multi-tenant

Unified workflows across formats and media

Flexible metadata management: MARC, Dublin Core, VRA, MODS, ONIX, Bibframe, new structures not yet invented

Open APIs for extensibility and interoperability
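Flexible metadata management implies moving records between schemas. As a rough illustration, the Python sketch below maps a few fields of a simplified MARC record to Dublin Core elements, loosely following common MARC-to-Dublin Core crosswalk practice (245 to title, 100 to creator, 260 to publisher and date, 650 to subject, 020 to identifier). The record layout keyed by (tag, subfield) is a simplification that cannot hold repeated fields, and the sample values are hypothetical; real conversions would use a MARC library and the full published crosswalk.

```python
# Simplified, partial MARC-to-Dublin Core mapping: (tag, subfield code) -> element.
MARC_TO_DC = {
    ("245", "a"): "title",
    ("100", "a"): "creator",
    ("260", "b"): "publisher",
    ("260", "c"): "date",
    ("650", "a"): "subject",
    ("020", "a"): "identifier",
}

# Hypothetical record with typical trailing ISBD punctuation.
marc_record = {
    ("100", "a"): "Breeding, Marshall.",
    ("245", "a"): "Cloud computing for libraries /",
    ("260", "b"): "Neal-Schuman,",
    ("650", "a"): "Cloud computing.",
}


def marc_to_dublin_core(record: dict[tuple[str, str], str]) -> dict[str, list[str]]:
    dc: dict[str, list[str]] = {}
    for key, value in record.items():
        element = MARC_TO_DC.get(key)
        if element is None:
            continue  # fields not in this simplified mapping are skipped
        # Strip trailing ISBD punctuation that Dublin Core values do not need.
        dc.setdefault(element, []).append(value.rstrip(" ./:;,"))
    return dc


print(marc_to_dublin_core(marc_record))
```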

Open Systems

Achieving openness has risen as the key driver behind library technology strategies

Libraries need to do more with their data: ability to improve customer experience and operational efficiencies

Demand for interoperability

Open source: full access to the application's source code

Open APIs: expose programmatic interfaces to data and functionality

New Library Management Model (figure): a unified presentation layer with a search box over a consolidated index (digital collections, ProQuest, EBSCO, …, JSTOR, other resources); a Library Services Platform whose API layer connects to learning management, enterprise resource planning, stock management, self-check / automated return, authentication service, smart card / payment systems, and the discovery service

Library Services Platforms

WorldShare Management Services. Responsible organization: OCLC. Key precepts: global network-level approach to management and discovery. Software model: proprietary

Alma. Responsible organization: Ex Libris. Key precepts: consolidated workflows, unified management of print, electronic, and digital; hybrid data model. Software model: proprietary

Intota. Responsible organization: Serials Solutions. Key precepts: knowledge base driven; pure multi-tenant SaaS. Software model: proprietary

Sierra Services Platform. Responsible organization: Innovative Interfaces, Inc. Key precepts: service-oriented architecture; technology uplift for the Millennium ILS; more open source components; consolidated modules and workflows. Software model: proprietary

Kuali OLE. Responsible organization: Kuali Foundation. Key precepts: manage library resources in a format-agnostic approach; integration into the broader academic enterprise infrastructure. Software model: open source


Development / Deployment perspective

Beginning of a new cycle of transition: over the course of the next decade, academic libraries will replace their current legacy products with new platforms

Not just a change of technology but a substantial change in the ways that libraries manage their resources and deliver their services

Traditional proprietary commercial ILS: Aleph, Voyager, Millennium, Symphony, Polaris, BOOK-IT, DDELibra, Libra.se, LIBERO, Amlib, Spydus, TOTALS II, Talis Alto, OpenGalaxy

Traditional open source ILS: Evergreen, Koha

New-generation library services platforms: Ex Libris Alma, Kuali OLE (enterprise, not cloud), OCLC WorldShare Management Services, Serials Solutions Intota, Innovative Interfaces Sierra (evolving)

Competing Models of Library Automation

Convergence: discovery and management solutions will increasingly be implemented as matched sets

Ex Libris: Primo / Alma

Serials Solutions: Summon / Intota

OCLC: WorldCat Local / WorldShare Platform

Exceptions: Kuali OLE, EBSCO Discovery Service

Both depend on an ecosystem of interrelated knowledge bases

APIs are exposed to allow mixing and matching, but efficiencies and synergies are lost

How do libraries make the transition?

Migrating to the Cloud

Infrastructure: move existing applications to cloud hosting? Infrastructure as a service yields marginal gains

Create platforms designed for cloud deployment: multi-tenant software as a service

Transition of services: identify specific library services as candidates

What activities performed by individual libraries could be done more effectively collaboratively?

Candidate services: bibliographic support, reference / research support, resource sharing, e-resource management, resource discovery, library management

Organizational strategy: individual institutions make gains by moving legacy applications to hosted services

Amplify impact as new collaborative services are built that span organizations

Partnership opportunities: when to partner with existing service providers? When to create services for a specific country or sector?

More than a technical transition

Transforming infrastructure

Transform resources: work toward shared infrastructure; identify areas where libraries can collaborate to share resources

Infrastructure transformation: bandwidth, shared services

Refocus development from stand-alone applications to platforms

Platform development: APIs that allow individual libraries or campuses to consume content or services according to local needs

New conceptual models: think beyond moving existing functionality

Re-evaluate the way that technical and information infrastructure supports the library in its strategic services to its parent institution

Candidates for Cloud-based Services

Identify services that can be provided at the national or international level

Resource sharing: document delivery, interlibrary loan

E-resource knowledge base

Index-based discovery

Infrastructure: robust interconnectivity, development and support capacity, distributed data centers

Organization and personnel issues: refocus efforts of technologists and technicians away from redundant local implementations and toward collaborative, broad-based cross-institutional services

Deployment and maintenance of conventional systems consumes all available resources

Library-by-library model least efficient

From software development to Platform development

Multi-tenant software as a service platforms that scale to meet the needs of the largest organizations or clusters of organizations

Consume platform services when available and appropriate

Create strategic platforms

Progressive consolidation of library services

Centralization of technical infrastructure of multiple libraries within a campus

Resource sharing support: direct borrowing among partner institutions

Shared infrastructure between institutions. Examples: 2CUL (Columbia University / Cornell University); Orbis Cascade Alliance (37 independent colleges and universities to merge into a shared LSP)

Consolidation of library automation services

Centralized library services within institutions

Strategic cooperation between institutions

From software development to platform development

Refocus efforts of technology personnel: less attention to deployment of conventional systems, more attention to broad-based services

Library-by-library automation model least efficient

Open source and open access

Open source development of platform services

Open source infrastructure components

Open APIs to expose platform services

Knowledge base components: open access, community maintained, adequately resourced

Reassess expectations of Technology

Many previous assumptions no longer apply

Technology platforms scale infinitely

No technical limits on how libraries share technical infrastructure

Cloud technologies enable new ways of sharing metadata

Build flexible systems not hardwired to any given set of workflows

Reassess workflow and organizational options

ILS model shaped library organizations

New library services platforms may enable new ways to organize how resource management and service delivery are performed

New technologies more able to support strategic priorities and initiatives

Time to engage

Transition to new technology models just underway

More transformative development than in previous phases of library automation

Opportunities to partner and collaborate

Vendors want to create systems with long-term value

Question previously held assumptions regarding the shape of technology infrastructure and services

Provide leadership in defining expectations

Questions and discussion
