HATHITRUST: A Shared Digital Repository

Your Library, Now Online! Putting HathiTrust in the Context of Traditional (and New) Library Services

MCLS Webinar, February 6, 2012
Jeremy York, Project Librarian, HathiTrust

Unless otherwise noted, these slides and their contents are licensed under a Creative Commons Attribution Unported License.


Outline

• The Big Idea
  – Mission and Goals
• What we're doing to get there
  – Repository and Content
  – Making content available
  – Organizational structure
• How HathiTrust can change the way we work

The Big Idea

Partnership

• Arizona State University
• Baylor University
• Boston College
• Boston University
• Brandeis University
• California Digital Library
• Carnegie Mellon University
• Columbia University
• Cornell University
• Dartmouth College
• Duke University
• Emory University
• Florida State University
• Getty Research Institute
• Harvard University Library
• Indiana University
• Iowa State University
• Johns Hopkins University
• Kansas State University
• Lafayette College
• Library of Congress
• Massachusetts Institute of Technology
• McGill University
• Michigan State University
• New York Public Library
• New York University
• North Carolina Central University
• North Carolina State University
• Northwestern University
• The Ohio State University
• The Pennsylvania State University
• Princeton University
• Purdue University
• Stanford University
• Syracuse University
• Texas A&M University
• Universidad Complutense de Madrid
• University of Arizona
• University of Calgary
• University of California: Berkeley, Davis, Irvine, Los Angeles, Merced, Riverside, San Diego, San Francisco, Santa Barbara, Santa Cruz
• The University of Chicago
• University of Connecticut
• University of Delaware
• University of Florida
• University of Illinois
• University of Illinois at Chicago
• The University of Iowa
• University of Kansas
• University of Maryland
• University of Miami
• University of Michigan
• University of Minnesota
• University of Missouri
• University of Nebraska-Lincoln
• The University of North Carolina at Chapel Hill
• University of Notre Dame
• University of Pennsylvania
• University of Pittsburgh
• University of Utah
• University of Vermont
• University of Virginia
• University of Washington
• University of Wisconsin-Madison
• Utah State University
• Vanderbilt University
• Virginia Tech
• Wake Forest University
• Washington University
• Yale University Library

Digital Repository

• Launched 2008
• Initial focus on digitized book and journal content
  – 10.6 million total volumes
  – 5.58 million book titles
  – 276,000 serial titles
  – 3.2 million public domain (~31%)

The Name

• The meaning behind the name
  – Hathi (hah-tee): Hindi for elephant
  – Big, strong
  – Never forgets, wise
  – Secure
  – Trustworthy

Mission

• To contribute to the common good by collecting, organizing, preserving, communicating, and sharing the record of human knowledge

Universal Library

Common Goal

Single Entity, Many Partners

HathiTrust

Collections and Collaboration

• Comprehensive collection: Preservation… with Access
• Shared strategies
  – Copyright
  – Collection management, development
  – Preservation
  – Discovery, Use
  – Bibliographic Indeterminacy
  – Efficient user services
• Public Good

What we are doing to get there

• Cost-effective, long-term preservation and access for digitized content
• Facilitate decision-making about digitization and print collection management
• Facilitate activities such as discovery, copyright review, and use of materials

Repository and Content

Content Sources

Michigan 43%, California 32%, Wisconsin 5%, Cornell 4%, NYPL 2%, Princeton 2%, Indiana 2%, Columbia 1%, Harvard 2%, LoC 1%, Madrid 1%, Minnesota 1%, Illinois 1%

Language Distribution (1)

English 48%, German 9%, French 7%, Spanish 5%, Chinese 4%, Russian 4%, Japanese 3%, Italian 2%, Arabic 2%, Latin 1%, remaining languages 14%

The top 10 languages make up ~86% of all content.

Language Distribution (2)

Undetermined 7%, Polish 7%, Portuguese 7%, Dutch 5%, Hebrew 5%, Hindi 5%, Indonesian 4%, Korean 4%, Swedish 3%, Urdu 3%, Turkish 3%, Thai 3%, Danish 3%, Czech 3%, Unknown 3%, Croatian 2%, Persian 2%, Tamil 2%, Bengali 2%, Music 2%, Hungarian 2%, Norwegian 2%, Sanskrit 2%, Vietnamese 1%, Ukrainian 1%, Greek 1%, Bulgarian 1%, Serbian 1%, Armenian 1%, Romanian 1%, Marathi 1%, Ancient Greek 1%, Panjabi 1%, Telugu 1%, Malay 1%, Catalan 1%, Malayalam 1%, Multiple 1%, Finnish 1%, Slovak 1%

The next 40 languages make up ~13% of the total; the percentages listed are shares within this group.

Dates

Copyright Distribution

• In-copyright or undetermined: 69%
• Public domain (worldwide): 15%
• US federal government documents (worldwide): 4%
• Public domain (US): 11%
• Open access: 1%
• Creative Commons: 0.4%

[Chart: contributing institutions over time, quarterly from October 2008 to January 2013, y-axis 0 to 100: Boston College, Florida, Yale, Utah State, UNC-Chapel Hill, Purdue, Penn State, Northwestern, NCSU, Illinois, Duke, Chicago, Virginia, Minnesota, Madrid, LoC, Harvard, Columbia, Indiana, Princeton, NYPL, Cornell, Wisconsin, California, Michigan]

[Diagram: HathiTrust repository architecture, built up over several slides. Labels: Source; Bibliographic Data; Content Package; Ingest; Data Management; Bib Data; Rights Data; Storage; Holdings Data; Michigan; Indiana; TDR; Access (Catalog, Full-text Search, PageTurner, APIs, Collections, Datasets)]

We engage in preservation for purposes of access.

Making Content Available

[Diagram: Access: Catalog, Full-text Search, PageTurner, APIs, Collections, Datasets]

• Descriptive headings added (hidden from GUI with CSS)
• Info about SSD service & link to accessibility page
• Images used for style are in CSS, so no need to use alt tags
• Skip navigation link
• Access keys for navigating pages with keyboard
• Added labels & descriptive titles to forms & ToC table

APIs

• Data API
  – Volume and rights information
  – Page images
  – OCR
• Bibliographic API
  – Volume and rights information
  – MARC records
• OAI
• "Hathifiles"
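The Bibliographic API listed above can be exercised with a plain HTTP GET. A minimal Python sketch follows, assuming the request pattern `https://catalog.hathitrust.org/api/volumes/{brief|full}/{id type}/{id}.json` and the response fields `items`, `htid` and `usRightsString` as we understand the public documentation; verify both before relying on them. The sample response and its volume ids are made up so the parser can be tried offline.

```python
import json
from urllib.parse import quote

# Assumed base URL of the HathiTrust Bibliographic API; check current docs.
BIB_API_BASE = "https://catalog.hathitrust.org/api/volumes"


def bib_api_url(id_type: str, id_value: str, level: str = "brief") -> str:
    """Build a single-identifier request URL.

    id_type is an identifier scheme such as "oclc", "lccn", "isbn" or "issn";
    level is "brief" or "full" (the full response also carries MARC records).
    """
    if level not in ("brief", "full"):
        raise ValueError("level must be 'brief' or 'full'")
    return f"{BIB_API_BASE}/{level}/{id_type}/{quote(id_value, safe='')}.json"


def full_view_htids(response_text: str) -> list[str]:
    """Return the volume ids whose US rights string is 'Full view' (assumed field names)."""
    data = json.loads(response_text)
    return [
        item["htid"]
        for item in data.get("items", [])
        if item.get("usRightsString") == "Full view"
    ]


# Offline example shaped like a brief response; ids are hypothetical.
sample = """{"records": {}, "items": [
  {"htid": "mdp.001", "usRightsString": "Full view"},
  {"htid": "mdp.002", "usRightsString": "Limited (search-only)"}]}"""
print(bib_api_url("oclc", "424023"))
print(full_view_htids(sample))
```

Separating URL construction from response parsing keeps the parsing testable without network access; fetching the URL itself can be done with any HTTP client.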

Datasets

• Google-digitized: ~2.8 million texts
  – Requires proposal to HathiTrust
  – Agreement with Google
  – Statement on use/management
• Non-Google-digitized: ~370,000 texts
  – Freely available
  – Statement on management

Research Center

• Environment to perform research on the HathiTrust corpus
• http://www.hathitrust.org/htrc

• http://lib.umich.edu/mpach
• Package of tools to enable publication of open access, born-digital journal content directly into HathiTrust
  – Including accompanying data and media files
• Allows integration with popular journal publishing tools such as Open Journal Systems (OJS)

Source Archive

Editorial Market

Higher Education

Access Determinations

• Automated
• Manual

Automatic Rights Determination

• Conducted on all works at time of ingest and when records are modified
  – Public domain worldwide
    • US works published before 1923
    • US federal government publications
    • Non-US works published prior to 1873
  – Public domain in the United States
    • Non-US works published prior to 1923
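The automatic rules above amount to a small decision procedure on publication year, place of publication, and government-document status. The sketch below encodes only the cutoffs listed on this slide; the return labels `pd` and `pdus` borrow HathiTrust rights-code names for illustration, and anything the rules do not settle is returned as `None` (left restricted pending manual review). It is a simplification, not the production rights algorithm.

```python
from typing import Optional


def automatic_rights(pub_year: int, published_in_us: bool,
                     us_federal_doc: bool = False) -> Optional[str]:
    """Classify a work using only the ingest-time rules listed on the slide.

    Returns "pd" (public domain worldwide), "pdus" (public domain in the US
    only), or None when these rules do not decide. The 1923 and 1873 cutoffs
    are the ones shown on the slide; they move forward over time.
    """
    if published_in_us and us_federal_doc:
        return "pd"    # US federal government publications
    if published_in_us and pub_year < 1923:
        return "pd"    # US works published before 1923
    if not published_in_us and pub_year < 1873:
        return "pd"    # non-US works published prior to 1873
    if not published_in_us and pub_year < 1923:
        return "pdus"  # non-US works published prior to 1923
    return None        # needs manual review (e.g. CRMS) or stays restricted


print(automatic_rights(1890, published_in_us=True))   # pd
print(automatic_rights(1900, published_in_us=False))  # pdus
print(automatic_rights(1950, published_in_us=True))   # None
```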

Manual Rights Determination

• IMLS-funded CRMS project
  – CRMS-US
    • 2008: US-published works, 1923–1963
    • Staff at 4 partner institutions
  – CRMS-World
    • 2011: Expanded to non-US works
    • Staff at 16 partner institutions
  – Double review, with additional expert review for conflicts
  – Compliance with copyright formalities
  – As of January 2013: 241,541 reviewed, more than 132,644 opened
• Rights Holder Permissions
• System of Precedence
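The double-review step can be pictured as follows: two independent determinations per work, accepted when they agree and escalated to an expert when they conflict. This is a hypothetical sketch; the function name, the determination labels, and the `expert` callback are illustrative and do not reproduce the actual CRMS interface.

```python
from typing import Callable, Iterable


def reconcile_reviews(
    pairs: Iterable[tuple[str, str]],
    expert: Callable[[str, str], str],
) -> tuple[list[str], int]:
    """Resolve (reviewer A, reviewer B) determinations for a batch of works.

    Matching determinations are accepted as-is; a conflict goes to the
    expert callback. Returns the final determinations and the number of
    conflicts that were escalated.
    """
    final: list[str] = []
    escalated = 0
    for first, second in pairs:
        if first == second:
            final.append(first)
        else:
            final.append(expert(first, second))
            escalated += 1
    return final, escalated


# Hypothetical expert that records "und" (undetermined) for any conflict.
results, conflicts = reconcile_reviews(
    [("pd", "pd"), ("ic", "pd"), ("ic", "ic")],
    expert=lambda a, b: "und",
)
print(results, conflicts)  # ['pd', 'und', 'ic'] 1
```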

Rights Database

Bibliographic (automatic)

Manual
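A system of precedence decides which determination governs when a volume has both automatic (bibliographic) and manual entries. The sketch assumes, as a hypothetical rule, that manual determinations outrank automatic ones and that the most recent entry wins within the same source; the `PRECEDENCE` table and the tuple layout are ours, not the rights database schema.

```python
# Hypothetical ranking of determination sources: a higher number wins.
# Assumes manual determinations outrank automatic bibliographic ones.
PRECEDENCE = {"bibliographic": 1, "manual": 2}


def effective_rights(determinations: list[tuple[str, str, int]]) -> str:
    """Pick the governing rights code for one volume.

    Each determination is (source, rights_code, timestamp). The entry from the
    highest-precedence source wins; within one source the latest timestamp wins.
    """
    if not determinations:
        raise ValueError("no rights determinations recorded")
    _, code, _ = max(determinations, key=lambda d: (PRECEDENCE[d[0]], d[2]))
    return code


history = [("bibliographic", "ic", 1), ("manual", "pd", 2), ("bibliographic", "und", 3)]
print(effective_rights(history))  # pd
```

The later bibliographic entry does not override the manual one here, which is the point of a precedence system as opposed to "last write wins".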

Lawful uses

• Users who have print disabilities
  – All in-copyright works in HathiTrust currently owned (or owned previously) by the partner institution
  – Must be authenticated
  – Must be on US soil
  – One simultaneous access per copy owned
  – http://www.hathitrust.org/accessibility
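The "one simultaneous access per copy owned" condition behaves like a lending limit: a volume can be open to as many users at once as the institution holds print copies. A minimal sketch, with a hypothetical `CopyLimitedAccess` class standing in for whatever the real service does; authentication and US-location checks are assumed to happen before it is called.

```python
class CopyLimitedAccess:
    """Grant at most one simultaneous session per print copy owned (hypothetical)."""

    def __init__(self, copies_owned: dict[str, int]):
        self.copies_owned = copies_owned        # volume id -> print copies held
        self.active: dict[str, set[str]] = {}   # volume id -> users reading now

    def checkout(self, volume_id: str, user_id: str) -> bool:
        readers = self.active.setdefault(volume_id, set())
        if user_id in readers:
            return True   # this user already holds a session
        if len(readers) >= self.copies_owned.get(volume_id, 0):
            return False  # every owned copy is in use, or none is owned
        readers.add(user_id)
        return True

    def release(self, volume_id: str, user_id: str) -> None:
        self.active.get(volume_id, set()).discard(user_id)


access = CopyLimitedAccess({"vol-a": 1})
print(access.checkout("vol-a", "alice"))  # True
print(access.checkout("vol-a", "bob"))    # False: the only copy is in use
access.release("vol-a", "alice")
print(access.checkout("vol-a", "bob"))    # True
```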

Lawful uses (2)

• Out of print and brittle/missing
  – Works must be currently owned (or owned previously) by the partner institution
  – Must be authenticated or accessing work from library premises
  – Must be on US soil
  – One simultaneous access per copy owned
  – http://www.hathitrust.org/out-of-print-brittle
• Access and use statements
  – http://www.hathitrust.org/access_use
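The out-of-print and brittle/missing conditions differ from the print-disability ones mainly in accepting on-premises access as an alternative to authentication. A hypothetical sketch of that eligibility test; the `OpbRequest` fields are illustrative, and the one-session-per-copy limit is left to a separate check.

```python
from dataclasses import dataclass


@dataclass
class OpbRequest:
    """Facts about one request for an out-of-print and brittle/missing work."""
    held_by_institution: bool   # print copy currently or previously owned
    authenticated: bool
    on_library_premises: bool
    in_united_states: bool


def opb_eligible(req: OpbRequest) -> bool:
    """Apply the slide's conditions (ownership, authentication or premises, US)."""
    return (
        req.held_by_institution
        and (req.authenticated or req.on_library_premises)
        and req.in_united_states
    )


walk_in = OpbRequest(held_by_institution=True, authenticated=False,
                     on_library_premises=True, in_united_states=True)
print(opb_eligible(walk_in))  # True
```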

Outline

• The Big Idea
  – Mission and Goals
• What we're doing to get there
  – Repository and Content
  – Making content available
  – Organizational structure
• How HathiTrust can change the way we work

HathiTrust Functional Framework

• e-Commerce
  – Print on Demand
• Content Ingest
  – Transformation
  – Validation
• Content Access
  – PageTurner
  – Collection Builder
  – Large-scale Search
  – Bibliographic Catalog
  – Research Center
  – APIs
• Quality Assurance
  – Quality Review
  – Content Certification
• User Services
  – Usability
  – User support (helpdesk)
• Outreach
  – Project website
  – Monthly newsletter
  – Papers and presentations
  – Communication with potential partners
  – Surveys, general inquiries
• Repository evaluation and audit (e.g., DRAMBORA, TRAC)
• Legal
  – Risk management (use of materials)
  – Partner agreements
  – Advocacy
• Governance
  – Budget, Finances
  – Decision-making
  – Policy
  – Planning
• Enterprise Management
  – Communication and coordination with partner institutions
  – Project management
• Repository Administration
  – Hardware configuration and maintenance
  – Web and application server configuration and maintenance
  – Security
  – Permissions
  – Logging
  – Data management (content storage, backup, integrity checks, deletion)
  – Hardware selection and replacement
  – Content and metadata specifications
  – Disaster recovery
  – Processes for ensuring content integrity
• Rights Management
  – Copyright determination
  – Copyright review
  – Copyright information management (database)
  – Rightsholder permissions
• Bibliographic Data Management
  – Entity description (record-level)
  – Object identification (item-level)
  – Data availability
• Collection Development
  – Digital: expansion beyond books and journals (born-digital, images and maps, audio); selection of content (for non-Google volume ingest and pilot projects)
  – Print: Cloud Library (effect of digital on print)
• Financial contributions of partners
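Several functions in this framework (data management with integrity checks, processes for ensuring content integrity) rest on fixity checking: recompute each file's checksum and compare it with the value recorded at ingest. A minimal sketch, assuming a manifest that maps relative paths to SHA-256 digests; the choice of SHA-256 and the manifest format are illustrative, not HathiTrust's actual packaging.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large files never need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_failures(manifest: dict[str, str], root: Path) -> list[str]:
    """Return manifest paths that are missing or whose current digest differs.

    manifest maps a path relative to root to the hex digest recorded at ingest.
    """
    failures = []
    for rel_path, expected in sorted(manifest.items()):
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            failures.append(rel_path)
    return failures
```

Run periodically against each stored package, a non-empty result would flag files to restore from a backup copy (a step not shown).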

HathiTrust

Strategic Advisory Board

Budget, Finances, Decision-making

Guidance on Policy, Planning

• Driven by needs of institutions
• Leverage across the partnership
• Projects: Print on Demand, Grant Work, Ingest Specifications, PageTurner, Bibliographic Data Management

Executive Committee

Collective Work: Working Groups and Committees

• Operational
  – Communications
  – User Support
  – User Experience
• Strategic
  – Collections
  – Discovery Interface
  – Full-text Search

Distributed work

Constitutional Convention

• October 2011
• 52 partners
• 3-year review overseen by SAB
• Ballot proposals
  – Print monograph storage
  – Approval process for development initiatives
  – US Government Documents
  – Fee-for-service content deposit
  – Governance

HathiTrust

Executive Committee

Strategic Advisory Board

Budget, Finances, Decision-making

Guidance on Policy, Planning

• 12-member Board of Governors
• Chief Executive Officer
• Executive Committee

Governance

• Efficient, practical
• Inclusive, collective

Outline

• The Big Idea
  – Mission and Goals
• What we're doing to get there
  – Repository and Content
  – Making content available
  – Organizational structure
• How HathiTrust can change the way we work

How HathiTrust Can Change the Way We Work

Seeing collective problems as collective

[Chart: Breakdown of HathiTrust book corpus by publication date; segments of 42%, 19%, 20%, 19%. Source: "Bibliographic Indeterminacy and the Scale of Problems and Opportunities of Rights in Digital Collection Building", 2/2011]

[Chart: Copyright status of books published pre-1923 and US works published 1923–1963; segments of 42% (labeled "In Print"), 19%, 20%, 19%]

Relationships

• Identification
• Description
• Rights
• Relationships
  – Bibliographic records
  – Bib records and objects
  – Digital objects
  – Digital and print
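The relationships listed above can be made concrete as a small data model: records describe entities, items identify digital objects and carry their own rights, and each item may point back to the print copy it was scanned from. A hypothetical sketch; class and field names are illustrative, and it covers the record-to-record, record-to-object, and digital-to-print relationships.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PrintCopy:
    institution: str
    barcode: str


@dataclass
class DigitalObject:
    htid: str                                  # item-level identification
    rights: str                                # rights are tracked per item
    scanned_from: Optional[PrintCopy] = None   # digital-to-print relationship


@dataclass
class BibRecord:
    record_id: str                                                # record-level description
    title: str
    items: list[DigitalObject] = field(default_factory=list)     # record-to-object
    related_records: list[str] = field(default_factory=list)     # record-to-record


def print_holders(record: BibRecord) -> set[str]:
    """Institutions whose print copies were the sources of this record's items."""
    return {
        item.scanned_from.institution
        for item in record.items
        if item.scanned_from is not None
    }


rec = BibRecord("001", "Example title", items=[
    DigitalObject("mdp.1", "pd", PrintCopy("Michigan", "b1")),
    DigitalObject("uc1.2", "ic"),
])
print(print_holders(rec))  # {'Michigan'}
```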

Understanding the relationship between the collective and local

1st model: Price per GB

                 2008        2009        2010        2011        2012 (Oct)
Total Volumes    2,477,871   5,221,092   7,836,698   9,966,572   10,531,566
Public Domain      372,085     758,947   1,959,223   2,712,626    3,218,132

[Chart: Total Volumes and Public Domain volumes, 2008 to 2012 (Oct)]

[Chart: % of Titles in Local Collection (0 to 60) against Rank in 2008 ARL Investment Index (0 to 120)]
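The volume counts for 2008 through October 2012 show the public-domain share growing over time. A short calculation of that share per year, using only those counts:

```python
# Volume counts from the slide (the 2012 figures are as of October).
totals = {2008: 2_477_871, 2009: 5_221_092, 2010: 7_836_698,
          2011: 9_966_572, 2012: 10_531_566}
public_domain = {2008: 372_085, 2009: 758_947, 2010: 1_959_223,
                 2011: 2_712_626, 2012: 3_218_132}


def pd_share(year: int) -> float:
    """Public-domain volumes as a percentage of all volumes in a given year."""
    return 100 * public_domain[year] / totals[year]


for year in totals:
    print(year, f"{pd_share(year):.1f}%")
```

This gives roughly 15.0% for 2008 and 30.6% for October 2012, in line with the "~31%" public-domain figure on the Digital Repository slide.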

A global change in the library environment

June 2010: median duplication 31%

June 2009: median duplication 19%

Academic print book collection already substantially duplicated in mass-digitized book corpus

Courtesy of Constance Malpas, OCLC Research
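Duplication figures like these come from matching each library's print titles against the digitized corpus and then taking the median across libraries. A simplified sketch, assuming each title is already reduced to one shared identifier (for example an OCLC number); real matching is much messier, and this does not reproduce the OCLC Research methodology. Library names and identifiers are hypothetical.

```python
from statistics import median


def duplication_rate(local_titles: set[str], digitized_titles: set[str]) -> float:
    """Percent of a library's print titles that also appear in the digitized corpus."""
    if not local_titles:
        return 0.0
    return 100 * len(local_titles & digitized_titles) / len(local_titles)


def median_duplication(libraries: dict[str, set[str]], digitized_titles: set[str]) -> float:
    """Median of the per-library duplication rates."""
    return median(duplication_rate(t, digitized_titles) for t in libraries.values())


corpus = {"ocm1", "ocm2", "ocm3"}
libraries = {
    "Library A": {"ocm1", "ocm4"},                  # 50% duplicated
    "Library B": {"ocm1", "ocm2", "ocm3", "ocm4"},  # 75% duplicated
    "Library C": {"ocm5"},                          # 0% duplicated
}
print(median_duplication(libraries, corpus))  # 50.0
```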

Digitized Books in Shared Repositories

[Chart: Unique titles (0 to 3,500,000), September 2009 to June 2010: mass-digitized books in the Hathi digital repository vs. mass-digitized books in shared print repositories; annotated ~3.5M titles and ~2.5M]

~75% of the mass-digitized corpus is "backed up" in one or more shared print repositories.

Courtesy of Constance Malpas, OCLC Research

Collection Overlap

• More than 50% median overlap with ARL institutions; higher for small liberal arts colleges
• New pricing model based on print holdings
  – http://www.hathitrust.org/cost
  – Requires print holdings database
  – Also supports expansion of legal uses and efforts in de-duplication
  – Facilitates individual and collaborative collection development and management operations
• Print monographs archiving
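A pricing model based on print holdings needs, at minimum, a way to tie each digital volume to the partners holding the same title in print, which is why the print holdings database is a prerequisite. The following is a deliberately simplified, hypothetical allocation (each volume's cost split equally among its print holders, with unheld volumes set aside); it is not the formula documented at http://www.hathitrust.org/cost.

```python
from collections import defaultdict


def allocate_by_holdings(
    volume_holders: dict[str, set[str]], cost_per_volume: float
) -> tuple[dict[str, float], float]:
    """Split each volume's cost equally among institutions holding a print copy.

    Volumes that no partner holds in print are summed into an unallocated
    total, to be covered some other way. Hypothetical model for illustration.
    """
    shares: dict[str, float] = defaultdict(float)
    unallocated = 0.0
    for holders in volume_holders.values():
        if holders:
            for institution in holders:
                shares[institution] += cost_per_volume / len(holders)
        else:
            unallocated += cost_per_volume
    return dict(shares), unallocated


holdings = {"v1": {"A", "B"}, "v2": {"A"}, "v3": set()}
shares, unallocated = allocate_by_holdings(holdings, 10.0)
print(shares, unallocated)
```

Here institution A pays 15.0 (half of v1 plus all of v2), B pays 5.0, and 10.0 for v3 remains unallocated.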

Sourcing and Scaling (http://orweblog.oclc.org/archives/002058.html)

• Scale
  – Institution-scale
  – Group-scale
  – Web-scale
• Sourcing
  – Institutional
  – Collaborative
  – Third-party

A new kind of library

Thank you

How to find out more

• About: http://www.hathitrust.org/about
• Twitter: http://twitter.com/hathitrust
• Facebook: http://www.facebook.com/hathitrust
• Monthly newsletter
  – http://www.hathitrust.org/updates
  – RSS: http://www.hathitrust.org/updates_rss
• Contact us: feedback@issues.hathitrust.org
• Blogs: http://www.hathitrust.org/blogs
  – Large-scale Search
  – Perspectives from HathiTrust

  • Your Library Now Online Putting HathiTrust in the Context of
  • Outline
  • The Big Idea
  • Partnership
  • Digital Repository
  • The Name
  • Mission
  • HathiTrust
  • Collections and Collaboration
  • What we are doing to get there
  • Slide 11
  • Slide 12
  • Repository and Content
  • Content Sources
  • Language Distribution (1)
  • Language Distribution (2)
  • Dates
  • Copyright Distribution
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Making Content Available
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • APIs
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Datasets
  • Research Center
  • Slide 54
  • Slide 55
  • Access Determinations
  • Automatic Rights Determination
  • Manual Rights Determination
  • Rights Database
  • Lawful uses
  • Lawful uses (2)
  • Outline (2)
  • Slide 63
  • Slide 64
  • Constitutional Convention
  • Slide 66
  • Governance
  • Outline (3)
  • How HathiTrust Can Change the Way We Work
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Relationships
  • Relationships (2)
  • Relationships (3)
  • Relationships (4)
  • Relationships (5)
  • Slide 81
  • Slide 82
  • Slide 83
  • A global change in the library environment
  • Digitized Books in Shared Repositories
  • Collection Overlap
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Thank you
  • How to find out more
Page 2: HATHITRUST A Shared Digital Repository Your Library, Now Online! Putting HathiTrust in the Context of Traditional (and New) Library Services MCLS Webinar

Outline

bull The Big Ideandash Mission and Goals

bull What wersquore doing to get therendash Repository and Contentndash Making content availablendash Organizational structure

bull How HathiTrust can change the way we work

The Big Idea

PartnershipArizona State UniversityBaylor UniversityBoston CollegeBoston UniversityBrandeis UniversityCalifornia Digital LibraryCarnegie Mellon UniversityColumbia UniversityCornell UniversityDartmouth CollegeDuke UniversityEmory UniversityFlorida State UniversityGetty Research InstituteHarvard University LibraryIndiana UniversityIowa State UniversityJohns Hopkins UniversityKansas State UniversityLafayette CollegeLibrary of CongressMassachusetts Institute of

TechnologyMcGill University`Michigan State UniversityNew York Public LibraryNew York UniversityNorth Carolina Central

University

North Carolina StateUniversity

Northwestern UniversityThe Ohio State UniversityThe Pennsylvania State

UniversityPrinceton UniversityPurdue UniversityStanford UniversitySyracuse UniversityTexas AampM UniversityUniversidad Complutense

de MadridUniversity of ArizonaUniversity of CalgaryUniversity of California

BerkeleyDavisIrvineLos AngelesMercedRiversideSan DiegoSan FranciscoSanta BarbaraSanta Cruz

The University of ChicagoUniversity of ConnecticutUniversity of DelawareUniversity of Florida

University of IllinoisUniversity of Illinois at ChicagoThe University of IowaUniversity of KansasUniversity of MarylandUniversity of MiamiUniversity of MichiganUniversity of MinnesotaUniversity of MissouriUniversity of Nebraska-LincolnThe University of North

Carolina at Chapel HillUniversity of Notre DameUniversity of PennsylvaniaUniversity of PittsburghUniversity of UtahUniversity of VermontUniversity of VirginiaUniversity of WashingtonUniversity of Wisconsin-

MadisonUtah State UniversityVanderbilt UniversityVirginia TechWake Forest UniversityWashington UniversityYale University Library

Digital Repository

bull Launched 2008bull Initial focus on digitized book and journal

contentndash 106 million total volumes ndash 558 million book titlesndash 276000 serial titlesndash 32 million public domain (~31)

The Name

bull The meaning behind the namendash Hathi (hah-tee)--Hindi for elephantndash Big strongndash Never forgets wisendash Securendash Trustworthy

Mission

bull To contribute to the common good by collecting organizing preserving communicating and sharing the record of human knowledge

Universal Library

Common Goal

Single Entity Many Partners

HathiTrust

Collections and Collaboration

bull Comprehensive collection- Preservationhellipwith Access

bull Shared strategiesndash Copyrightndash Collection management developmentndash Preservationndash Discovery Usendash Bibliographic Indeterminacyndash Efficient user services

bull Public Good

What we are doing to get there

Cost-effective long-term preservation and access for digitized content

bull Facilitate decision-making about digitization and print collection management

bull Facilitate activities such as discovery copyright review use of materials

Repository and Content

Michigan 43

California 32

Wisconsin 5

Cornell 4NYPL 2

Princeton 2Indiana 2

Columbia 1

Harvard 2LoC 1 Madrid 1 Minnesota 1

Illinois 1

Content Sources

English48

German9

French7

Spanish5

Chinese4

Russian4

Japanese3

Italian2

Arabic2

Latin1

Remaining Languages

14

Language Distribution (1)

The top 10 languages make up ~86 of all content

Undetermined7

Polish7

Portuguese7

Dutch5

Hebrew5

Hindi5

Indonesian4

Korean4Swedish

3

Urdu3

Turkish3

Thai3Danish

3

Czech3

Unknown3

Croatian2

Persian2

Tamil2

Bengali2

Music2

Hungarian2

Norwegian2

Sanskrit2

Vietnamese1

Ukrainian1

Greek1

Bulgarian1Serbian

1Armenian

1Romanian

1Marathi

1

Ancient-Greek1 Panjabi

1

Telugu1Malay

1

Catalan1

Malayalam1

Multiple1

Finnish1

Slovak1

Language Distribution (2)

The next 40 languages make up ~13 of total

Dates

Copyright Distribution

In-copyright or unde-termined

69

Public Domain (worldwide)

15

US Federal Government Documents (worldwide)

4

Public Domain(US)11

Open Access1

Creative Commons 04

1010

8

110

9

410

9

710

9

1010

9

111

0

411

0

711

0

1011

0

111

1

411

1

711

1

1011

1

111

2

411

2

711

2

1011

2

111

30

10

20

30

40

50

60

70

80

90

100

Boston CollegeFloridaYaleUtah StateUNC-Chapel HillPurduePenn StateNorthwesternNCSUIllinoisDukeChicagoVirginiaMinnesotaMadridLoCHarvardColumbiaIndianaPrincetonNYPLCornellWisconsinCaliforniaMichigan

Source

Bibliographic Data

Content Package

MichiganIndiana

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

Datasets

Source

Bibliographic Data

Content Package

MichiganIndiana

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

Datasets

Source

Bibliographic Data

Content Package

MichiganIndiana

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

DatasetsTDR

Source

Bibliographic Data

Content Package

MichiganIndiana

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

Datasets

Source

Bibliographic Data

Content Package

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

DatasetsMichigan

Indiana

Source

Bibliographic Data

Content Package

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

DatasetsMichigan

Indiana

Source

Bibliographic Data

Content Package

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

DatasetsMichigan

Indiana

We engage in preservationfor purposes of access

Source

Bibliographic Data

Content Package

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

DatasetsMichigan

Indiana

Making Content Available

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Descriptive headings added (hidden from GUI with CSS)

Info about SSD service amp link to accessibility page

Images used for style are in css so no need to use alt tags

Skip navigation link

Access keys for navigating pages with keyboard

Added labels amp descriptive titles to forms amp ToC table

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

APIs

bull Data APIndash Volume and rights informationndash Page imagesndash OCR

bull Bibliographic APIndash Volume and rights informationndash MARC records

bull OAIbull ldquoHathifilesrdquo

Datasets

bull Google-digitized ~28 million texts Requires proposal to HathiTrust Agreement with Google Statement on usemanagement

bull Non-Google-digitized ~370000 texts Freely available Statement on management

Research Center

bull Environment to perform research on HathiTrust corpus

bull httpwwwhathitrustorghtrc

bull httplibumichedumpachbull Package of tools to enable publication of open

access born-digital journal content directly into HathiTrustndash Including accompanying data and media files

bull Allows integration with popular journal publishing tools such as Open Journal Systems (OJS)

Source Archive

Editorial Market

Higher Education

Access Determinations

bull Automatedbull Manual

Automatic Rights Determination

bull Conducted on all works at time of ingest and when records are modifiedndash Public domain worldwide

bull US works published before 1923 US federal government publications non-US works published prior to 1873

ndash Public domain in the United Statesbull Non-US works published prior to 1923

Manual Rights Determination

bull IMLS-funded CRMS projectndash CRMS-US

bull 2008 US-published works 1923-1963bull Staff at 4 partner institutions

ndash CRMS-Worldbull 2011 Expanded to non-US worksbull Staff at 16 partner institutions

ndash Double review with additional expert review for conflictsndash Compliance with copyright formalitiesndash As of January 2013 241541 reviewed more than 132644

openedbull Rights Holder Permissions

bull System of Precedence

Rights Database

Bibliographic (automatic)

Manual

Lawful uses

bull Users who have print disabilitiesndash All in-copyright works in HathiTrust currently

owned (or owned previously) by the partner institution

ndash Must be authenticatedndash Must be on US soilndash One simultaneous access per copy ownedndash httpwwwhathitrustorgaccessibility

Lawful uses (2)

bull Out of print and brittle missingndash Works must be currently owned (or owned

previously) by the partner institutionndash Must be authenticated or accessing work from

library premisesndash Must be on US soilndash One simultaneous access per copy ownedndash httpwwwhathitrustorgout-of-print-brittle

bull Access and use statementsndash httpwwwhathitrustorgaccess_use

Outline

bull The Big Idea ndash Mission and Goals

bull What wersquore doing to get there ndash Repository and Content ndash Making content available ndash Organizational structure

bull How HathiTrust can change the way we work

HathiTrust Functional Framework

e-Commerce
• Print on Demand

Content Ingest
• Transformation
• Validation

Content Access
• PageTurner
• Collection Builder
• Large-scale Search
• Bibliographic Catalog
• Research Center
• APIs

Quality Assurance
• Quality Review
• Content Certification

User Services
• Usability
• User support (helpdesk)

Outreach
• Project website
• Monthly newsletter
• Papers and presentations
• Communication with potential partners
• Surveys, general inquiries
• Repository evaluation and audit (e.g., DRAMBORA, TRAC)

Legal
• Risk management (use of materials)
• Partner agreements
• Advocacy

Governance
• Budget/Finances
• Decision-making
• Policy
• Planning

Enterprise Management
• Communication and coordination with partner institutions
• Project management

Repository Administration
• Hardware configuration and maintenance
• Web and application server configuration and maintenance
• Security
• Permissions
• Logging
• Data management (content storage, backup, integrity checks, deletion)
• Hardware selection and replacement
• Content and metadata specifications
• Disaster recovery
• Processes for ensuring content integrity

Rights Management
• Copyright determination
• Copyright review
• Copyright information management (database)
• Rightsholder permissions

Bibliographic Data Management
• Entity description (record-level)
• Object identification (item-level)
• Data availability

Collection Development
• Digital
  – Expansion beyond books and journals (born-digital, images and maps, audio)
  – Selection of content (for non-Google volume ingest and pilot projects)
• Print
  – Cloud Library (effect of digital on print)

Financial contributions of partners

HathiTrust

Strategic Advisory Board
Budget/Finances, Decision-making
Guidance on Policy, Planning

Executive Committee

Distributed work
• Driven by needs of institutions
• Leverage across the partnership
• Projects: Print on Demand, Grant Work, Ingest Specifications, PageTurner, Bibliographic Data Management

Collective work: Working Groups and Committees
• Operational: Communications, User Support, User Experience
• Strategic: Collections, Discovery Interface, Full-text Search

Constitutional Convention

• October 2011
• 52 partners
• 3-year review overseen by SAB
• Ballot proposals
  – Print monograph storage
  – Approval process for development initiatives
  – US Government Documents
  – Fee-for-service content deposit
  – Governance

HathiTrust

Executive Committee
Strategic Advisory Board
Budget/Finances, Decision-making
Guidance on Policy, Planning

• 12-member Board of Governors
• Chief Executive Officer
• Executive Committee

Governance

• Efficient, practical
• Inclusive, collective

Outline

• The Big Idea
  – Mission and Goals
• What we're doing to get there
  – Repository and Content
  – Making content available
  – Organizational structure
• How HathiTrust can change the way we work

How HathiTrust Can Change the Way We Work

Seeing collective problems as collective

[Chart: Breakdown of HathiTrust book corpus by publication date; segments of 42%, 19%, 20%, and 19%. Source: Bibliographic Indeterminacy and the Scale of Problems and Opportunities of Rights in Digital Collection Building, 2/2011]

[Chart: Copyright status of books published pre-1923 and US works published 1923-1963; one build labels the 42% segment "In Print"]

Relationships

• Identification
• Description
• Rights
• Relationships
  – Bibliographic records
  – Bib records and objects
  – Digital objects
  – Digital and print
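The Relationships slide names links among bibliographic records, between records and digital objects, and between digital and print copies. One way to picture them as data, assuming hypothetical class names and fields that are not drawn from HathiTrust's actual schema, is a record that points to its digital objects, with print holdings keyed to those objects:

```python
from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    item_id: str  # item-level identifier (hypothetical format)
    rights: str   # illustrative rights code, e.g. "pd" or "ic"


@dataclass
class PrintHolding:
    institution: str
    item_id: str  # the digital object this print copy corresponds to


@dataclass
class BibRecord:
    record_id: str
    title: str
    # Bib record -> digital objects (record-level to item-level link).
    items: list[DigitalObject] = field(default_factory=list)
    # Bib record -> other bib records (e.g. editions of the same work).
    related: list[str] = field(default_factory=list)


def holders_of(record: BibRecord, holdings: list[PrintHolding]) -> set[str]:
    """Institutions holding print copies of any digital object on this record."""
    ids = {obj.item_id for obj in record.items}
    return {h.institution for h in holdings if h.item_id in ids}
```

Keeping the digital-to-print link on the holding rather than on the record lets many print copies at different institutions point to one digital object.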

Understanding the relationship between the collective and local

1st model: Price per GB

Total volumes and public domain volumes by year:

Year        Total volumes  Public domain
2008            2,477,871        372,085
2009            5,221,092        758,947
2010            7,836,698      1,959,223
2011            9,966,572      2,712,626
2012 (Oct)     10,531,566      3,218,132
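The table makes it easy to track how the public-domain portion of the collection changed over time. A short calculation over the figures above (the dictionary simply restates the table; nothing else is assumed):

```python
# Figures restated from the table above: (total volumes, public domain volumes).
volumes = {
    "2008": (2_477_871, 372_085),
    "2009": (5_221_092, 758_947),
    "2010": (7_836_698, 1_959_223),
    "2011": (9_966_572, 2_712_626),
    "2012 (Oct)": (10_531_566, 3_218_132),
}


def public_domain_share(total: int, pd: int) -> float:
    """Public-domain volumes as a percentage of all volumes."""
    return 100.0 * pd / total


for year, (total, pd) in volumes.items():
    print(f"{year:>10}: {public_domain_share(total, pd):5.1f}% public domain")
```

Run as written, this shows the public-domain share moving from about 15% in 2008 to about 31% in October 2012.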

[Chart: titles in local collection (y-axis, 0 to 60) plotted against rank in 2008 ARL Investment Index (x-axis, 0 to 120)]

A global change in the library environment

• June 2010: median duplication 31%
• June 2009: median duplication 19%

Academic print book collections are already substantially duplicated in the mass-digitized book corpus.

Courtesy of Constance Malpas, OCLC Research
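The duplication figures are medians across libraries. Assuming duplication means the share of a library's print titles that also appear in the mass-digitized corpus (an assumption; the slide does not define the measure), the calculation could be sketched with title identifiers held in sets. The function names here are hypothetical.

```python
from statistics import median


def duplication_rate(local_titles: set[str], digitized_titles: set[str]) -> float:
    """Percent of a library's print titles that also appear in the digitized corpus."""
    if not local_titles:
        return 0.0
    return 100.0 * len(local_titles & digitized_titles) / len(local_titles)


def median_duplication(libraries: dict[str, set[str]], digitized_titles: set[str]) -> float:
    """Median duplication rate across a group of libraries (e.g. ARL members)."""
    return median(duplication_rate(titles, digitized_titles) for titles in libraries.values())
```

Matching print and digital titles this way presumes a shared identifier (for example, an OCLC number) on both sides; the set intersection does the rest.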

Digitized Books in Shared Repositories

[Chart: unique titles (y-axis, 0 to 3,500,000), Sep-09 to Jun-10, for two series: mass-digitized books in the Hathi digital repository and mass-digitized books in shared print repositories; annotations "~3.5M titles" and "~2.5M"]

~75% of the mass-digitized corpus is "backed up" in one or more shared print repositories.

Courtesy of Constance Malpas, OCLC Research

Collection Overlap

• More than 50% median overlap with ARL institutions; higher for small liberal arts colleges
• New pricing model based on print holdings
  – http://www.hathitrust.org/cost
  – Requires a print holdings database
  – Also supports expansion of legal uses and efforts in de-duplication
  – Facilitates individual and collaborative collection development and management operations
• Print monographs archiving
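The slides contrast a first pricing model based on price per GB with a new model based on print holdings. The sketch below is a simplified, hypothetical comparison: the rates, the equal split of each volume's cost among its print holders, and the function names are illustrative assumptions, not HathiTrust's published cost formula.

```python
from collections import defaultdict


def price_per_gb(stored_gb: float, rate_per_gb: float) -> float:
    """First-model style charge: pay for the storage your deposits occupy."""
    return stored_gb * rate_per_gb


def holdings_based_shares(holders_by_volume: dict[str, set[str]],
                          cost_per_volume: float) -> dict[str, float]:
    """Split each volume's cost equally among institutions holding it in print."""
    shares: dict[str, float] = defaultdict(float)
    for holders in holders_by_volume.values():
        if not holders:
            continue  # volumes with no print holder are ignored in this sketch
        for institution in holders:
            shares[institution] += cost_per_volume / len(holders)
    return dict(shares)
```

Under the per-GB approach a library pays for what it deposited; under the holdings-based approach it pays in proportion to how much of the shared collection it also holds in print, which is why the model depends on a print holdings database.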

Sourcing and Scaling (http://orweblog.oclc.org/archives/002058.html)

• Scale
  – Institution-scale
  – Group-scale
  – Web-scale
• Sourcing
  – Institutional
  – Collaborative
  – Third-party

A new kind of library

Thank you

How to find out more

• About: http://www.hathitrust.org/about
• Twitter: http://twitter.com/hathitrust
• Facebook: http://www.facebook.com/hathitrust
• Monthly newsletter
  – http://www.hathitrust.org/updates
  – RSS: http://www.hathitrust.org/updates_rss
• Contact us: feedback@issues.hathitrust.org
• Blogs: http://www.hathitrust.org/blogs
  – Large-scale Search
  – Perspectives from HathiTrust

PartnershipArizona State UniversityBaylor UniversityBoston CollegeBoston UniversityBrandeis UniversityCalifornia Digital LibraryCarnegie Mellon UniversityColumbia UniversityCornell UniversityDartmouth CollegeDuke UniversityEmory UniversityFlorida State UniversityGetty Research InstituteHarvard University LibraryIndiana UniversityIowa State UniversityJohns Hopkins UniversityKansas State UniversityLafayette CollegeLibrary of CongressMassachusetts Institute of

TechnologyMcGill University`Michigan State UniversityNew York Public LibraryNew York UniversityNorth Carolina Central

University

North Carolina StateUniversity

Northwestern UniversityThe Ohio State UniversityThe Pennsylvania State

UniversityPrinceton UniversityPurdue UniversityStanford UniversitySyracuse UniversityTexas AampM UniversityUniversidad Complutense

de MadridUniversity of ArizonaUniversity of CalgaryUniversity of California

BerkeleyDavisIrvineLos AngelesMercedRiversideSan DiegoSan FranciscoSanta BarbaraSanta Cruz

The University of ChicagoUniversity of ConnecticutUniversity of DelawareUniversity of Florida

University of IllinoisUniversity of Illinois at ChicagoThe University of IowaUniversity of KansasUniversity of MarylandUniversity of MiamiUniversity of MichiganUniversity of MinnesotaUniversity of MissouriUniversity of Nebraska-LincolnThe University of North

Carolina at Chapel HillUniversity of Notre DameUniversity of PennsylvaniaUniversity of PittsburghUniversity of UtahUniversity of VermontUniversity of VirginiaUniversity of WashingtonUniversity of Wisconsin-

MadisonUtah State UniversityVanderbilt UniversityVirginia TechWake Forest UniversityWashington UniversityYale University Library

Digital Repository

bull Launched 2008bull Initial focus on digitized book and journal

contentndash 106 million total volumes ndash 558 million book titlesndash 276000 serial titlesndash 32 million public domain (~31)

The Name

bull The meaning behind the namendash Hathi (hah-tee)--Hindi for elephantndash Big strongndash Never forgets wisendash Securendash Trustworthy

Mission

bull To contribute to the common good by collecting organizing preserving communicating and sharing the record of human knowledge

Universal Library

Common Goal

Single Entity Many Partners

HathiTrust

Collections and Collaboration

bull Comprehensive collection- Preservationhellipwith Access

bull Shared strategiesndash Copyrightndash Collection management developmentndash Preservationndash Discovery Usendash Bibliographic Indeterminacyndash Efficient user services

bull Public Good

What we are doing to get there

Cost-effective long-term preservation and access for digitized content

bull Facilitate decision-making about digitization and print collection management

bull Facilitate activities such as discovery copyright review use of materials

Repository and Content

Michigan 43

California 32

Wisconsin 5

Cornell 4NYPL 2

Princeton 2Indiana 2

Columbia 1

Harvard 2LoC 1 Madrid 1 Minnesota 1

Illinois 1

Content Sources

English48

German9

French7

Spanish5

Chinese4

Russian4

Japanese3

Italian2

Arabic2

Latin1

Remaining Languages

14

Language Distribution (1)

The top 10 languages make up ~86 of all content

Undetermined7

Polish7

Portuguese7

Dutch5

Hebrew5

Hindi5

Indonesian4

Korean4Swedish

3

Urdu3

Turkish3

Thai3Danish

3

Czech3

Unknown3

Croatian2

Persian2

Tamil2

Bengali2

Music2

Hungarian2

Norwegian2

Sanskrit2

Vietnamese1

Ukrainian1

Greek1

Bulgarian1Serbian

1Armenian

1Romanian

1Marathi

1

Ancient-Greek1 Panjabi

1

Telugu1Malay

1

Catalan1

Malayalam1

Multiple1

Finnish1

Slovak1

Language Distribution (2)

The next 40 languages make up ~13 of total

Dates

Copyright Distribution

In-copyright or unde-termined

69

Public Domain (worldwide)

15

US Federal Government Documents (worldwide)

4

Public Domain(US)11

Open Access1

Creative Commons 04

1010

8

110

9

410

9

710

9

1010

9

111

0

411

0

711

0

1011

0

111

1

411

1

711

1

1011

1

111

2

411

2

711

2

1011

2

111

30

10

20

30

40

50

60

70

80

90

100

Boston CollegeFloridaYaleUtah StateUNC-Chapel HillPurduePenn StateNorthwesternNCSUIllinoisDukeChicagoVirginiaMinnesotaMadridLoCHarvardColumbiaIndianaPrincetonNYPLCornellWisconsinCaliforniaMichigan

Source

Bibliographic Data

Content Package

MichiganIndiana

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

Datasets

Source

Bibliographic Data

Content Package

MichiganIndiana

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

Datasets

Source

Bibliographic Data

Content Package

MichiganIndiana

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

DatasetsTDR

Source

Bibliographic Data

Content Package

MichiganIndiana

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

Datasets

Source

Bibliographic Data

Content Package

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

DatasetsMichigan

Indiana

Source

Bibliographic Data

Content Package

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

DatasetsMichigan

Indiana

Source

Bibliographic Data

Content Package

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

DatasetsMichigan

Indiana

We engage in preservationfor purposes of access

Source

Bibliographic Data

Content Package

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

DatasetsMichigan

Indiana

Making Content Available

[Figure: access services, highlighted in turn across several slides: Catalog, Full-text Search, PageTurner, APIs, Collections, Datasets]

• Descriptive headings added (hidden from the GUI with CSS)
• Info about SSD service & link to accessibility page
• Images used for styling are in CSS, so no alt tags are needed
• Skip navigation link
• Access keys for navigating pages with the keyboard
• Added labels & descriptive titles to forms & ToC table

APIs

• Data API
  – Volume and rights information
  – Page images
  – OCR
• Bibliographic API
  – Volume and rights information
  – MARC records
• OAI
• “Hathifiles”
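As a rough illustration of how a client might use the Bibliographic API to get volume and rights information, the Python sketch below builds a lookup URL by OCLC number and extracts volume IDs and rights codes from a JSON response. The URL pattern (`https://catalog.hathitrust.org/api/volumes/brief/<type>/<id>.json`) and the field names `items`, `htid`, and `rightsCode` are assumptions based on later public HathiTrust documentation, not on this presentation. The sample response is invented, and the sketch makes no network call.

```python
import json
from typing import List, Tuple
from urllib.parse import quote

# Assumed base URL for Bibliographic API lookups; verify against current docs.
BASE = "https://catalog.hathitrust.org/api/volumes"

def bib_api_url(id_type: str, identifier: str, level: str = "brief") -> str:
    """Build a single-identifier lookup URL (e.g. id_type="oclc")."""
    if level not in ("brief", "full"):  # "full" is assumed to include MARC records
        raise ValueError("level must be 'brief' or 'full'")
    return f"{BASE}/{level}/{quote(id_type)}/{quote(identifier)}.json"

def summarize_items(payload: str) -> List[Tuple[str, str]]:
    """Return (volume id, rights code) pairs from a JSON response body."""
    data = json.loads(payload)
    return [(item["htid"], item["rightsCode"]) for item in data.get("items", [])]

# Hypothetical, trimmed response used only for illustration.
SAMPLE = """{"records": {"000000001": {"titles": ["Example title"]}},
 "items": [{"htid": "mdp.000000000001", "rightsCode": "pd"},
           {"htid": "mdp.000000000002", "rightsCode": "ic"}]}"""

if __name__ == "__main__":
    print(bib_api_url("oclc", "424023"))
    print(summarize_items(SAMPLE))
```

Keeping URL construction and response parsing separate lets the parser be tested offline against saved responses.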

Datasets

• Google-digitized: ~2.8 million texts
  – Requires proposal to HathiTrust
  – Agreement with Google
  – Statement on use/management
• Non-Google-digitized: ~370,000 texts
  – Freely available
  – Statement on management

Research Center

• Environment to perform research on HathiTrust corpus
• http://www.hathitrust.org/htrc

• http://lib.umich.edu/mpach
• Package of tools to enable publication of open access born-digital journal content directly into HathiTrust
  – Including accompanying data and media files
• Allows integration with popular journal publishing tools such as Open Journal Systems (OJS)

[Figure: diagram labels: Source Archive, Editorial Market, Higher Education]

Access Determinations

• Automated
• Manual

Automatic Rights Determination

• Conducted on all works at time of ingest and when records are modified
  – Public domain worldwide
    • US works published before 1923, US federal government publications, non-US works published prior to 1873
  – Public domain in the United States
    • Non-US works published prior to 1923
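The automatic rules on this slide amount to a small decision procedure over publication date, place of publication, and government-document status. Below is a minimal sketch, assuming a simplified, hypothetical `Work` record. It uses `pd`, `pdus`, and `und` as labels modeled on HathiTrust rights codes, and anything the rules do not cover falls through to `und` for manual review.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Work:
    pub_year: Optional[int]        # None when no usable date is recorded
    us_published: bool
    us_federal_doc: bool = False

def automatic_rights(work: Work) -> str:
    """Apply the automatic rules from the slide.

    Returns "pd" (public domain worldwide), "pdus" (public domain in the
    United States only) or "und" (not decided automatically; left for
    manual review).
    """
    if work.us_federal_doc:
        return "pd"                      # US federal government publications
    if work.pub_year is None:
        return "und"
    if work.us_published:
        return "pd" if work.pub_year < 1923 else "und"
    if work.pub_year < 1873:
        return "pd"                      # non-US works published before 1873
    if work.pub_year < 1923:
        return "pdus"                    # non-US works before 1923: US only
    return "und"

if __name__ == "__main__":
    print(automatic_rights(Work(1901, us_published=False)))
```

The cutoff years are hard-coded as they appear on the slide. The US cutoff has moved forward each year since 2019, so a production version would compute it from the current date rather than fixing it at 1923.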

Manual Rights Determination

• IMLS-funded CRMS project
  – CRMS-US
    • 2008: US-published works, 1923-1963
    • Staff at 4 partner institutions
  – CRMS-World
    • 2011: Expanded to non-US works
    • Staff at 16 partner institutions
  – Double review, with additional expert review for conflicts
  – Compliance with copyright formalities
  – As of January 2013: 241,541 reviewed; more than 132,644 opened
• Rights Holder Permissions
• System of Precedence
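The double-review workflow can be sketched as follows. Two independent determinations that agree stand as final, and a disagreement is escalated to an expert. The determination labels and the `cautious_expert` function are hypothetical stand-ins for illustration. A real CRMS expert weighs the evidence rather than applying a fixed rule.

```python
from collections import Counter
from typing import Callable, Iterable, Tuple

Determination = str  # illustrative labels such as "pd", "ic", "und"
Expert = Callable[[Determination, Determination], Determination]

def resolve(first: Determination, second: Determination, expert: Expert) -> Determination:
    """Agreeing reviews stand; a conflict goes to an expert reviewer."""
    if first == second:
        return first
    return expert(first, second)

def run_reviews(pairs: Iterable[Tuple[Determination, Determination]],
                expert: Expert) -> Counter:
    """Resolve every (first, second) review pair and tally the final outcomes."""
    return Counter(resolve(a, b, expert) for a, b in pairs)

def cautious_expert(a: Determination, b: Determination) -> Determination:
    """Hypothetical expert: sides with "ic" if either reviewer chose it, else with the first reviewer."""
    return "ic" if "ic" in (a, b) else a

if __name__ == "__main__":
    reviews = [("pd", "pd"), ("pd", "ic"), ("ic", "ic"), ("pd", "und")]
    print(run_reviews(reviews, cautious_expert))
```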

Rights Database

• Bibliographic (automatic)
• Manual
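The slides name a system of precedence without spelling out its order. One way to model it, purely as an assumption for illustration, is to rank each determination's source and let the highest-ranked, most recent entry win. The source names, their ranking (rights holder permission over manual review over bibliographic rules), and the `RightsEntry` structure are all hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical ranking of determination sources: a higher number takes precedence.
PRECEDENCE = {"bibliographic": 1, "manual": 2, "rights_holder": 3}

@dataclass(frozen=True)
class RightsEntry:
    code: str        # illustrative rights label, e.g. "pd", "ic", "und"
    source: str      # one of the PRECEDENCE keys
    timestamp: int   # breaks ties between entries from the same source

def current_rights(entries: List[RightsEntry]) -> RightsEntry:
    """Return the entry from the highest-ranked source, newest first on ties."""
    if not entries:
        raise ValueError("no rights entries for this volume")
    return max(entries, key=lambda e: (PRECEDENCE[e.source], e.timestamp))

if __name__ == "__main__":
    history = [
        RightsEntry("ic", "bibliographic", timestamp=1),
        RightsEntry("pd", "manual", timestamp=2),
        RightsEntry("ic", "bibliographic", timestamp=3),  # a record edit re-runs the rules
    ]
    print(current_rights(history).code)  # the manual "pd" still wins
```

Ranking by source first means a later automatic re-run (for example, after a bibliographic record is modified) cannot silently undo a manual review.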

Lawful uses

• Users who have print disabilities
  – All in-copyright works in HathiTrust currently owned (or owned previously) by the partner institution
  – Must be authenticated
  – Must be on US soil
  – One simultaneous access per copy owned
  – http://www.hathitrust.org/accessibility

Lawful uses (2)

• Out of print and brittle/missing
  – Works must be currently owned (or owned previously) by the partner institution
  – Must be authenticated or accessing work from library premises
  – Must be on US soil
  – One simultaneous access per copy owned
  – http://www.hathitrust.org/out-of-print-brittle
• Access and use statements
  – http://www.hathitrust.org/access_use
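Both lawful-use slides combine an eligibility test with a concurrency limit of one simultaneous reader per owned copy. Below is a minimal sketch of that check. It assumes hypothetical `Request` and `Volume` records and tracks active readers in memory. A real service would also need session expiry and persistent state.

```python
from dataclasses import dataclass, field
from typing import Set

@dataclass
class Request:
    user_id: str
    authenticated: bool
    in_us: bool                      # request originates on US soil
    on_library_premises: bool = False

@dataclass
class Volume:
    htid: str
    copies_owned: int                # print copies the partner owns or previously owned
    active_users: Set[str] = field(default_factory=set)

def may_open(vol: Volume, req: Request, use: str) -> bool:
    """Apply the slide conditions for one of the two lawful uses."""
    if use == "print_disability":
        identity_ok = req.authenticated
    elif use == "out_of_print_brittle":
        identity_ok = req.authenticated or req.on_library_premises
    else:
        raise ValueError(f"unknown use: {use}")
    if not (identity_ok and req.in_us and vol.copies_owned > 0):
        return False
    if req.user_id in vol.active_users:
        return True                  # this reader already holds a slot
    return len(vol.active_users) < vol.copies_owned

def open_volume(vol: Volume, req: Request, use: str) -> bool:
    """Grant access and occupy one slot if the request qualifies."""
    if may_open(vol, req, use):
        vol.active_users.add(req.user_id)
        return True
    return False

def close_volume(vol: Volume, user_id: str) -> None:
    """Release the reader's slot so another user can open the volume."""
    vol.active_users.discard(user_id)
```

With one copy owned, a second qualifying reader is refused until the first one closes the volume.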

Outline

• The Big Idea
  – Mission and Goals
• What we're doing to get there
  – Repository and Content
  – Making content available
  – Organizational structure
• How HathiTrust can change the way we work

HathiTrust Functional Framework

• e-Commerce
  – Print on Demand
• Content Ingest
  – Transformation
  – Validation
• Content Access
  – PageTurner
  – Collection Builder
  – Large-scale Search
  – Bibliographic Catalog
  – Research Center
  – APIs
• Quality Assurance
  – Quality Review
  – Content Certification
• User Services
  – Usability
  – User support (helpdesk)
• Outreach
  – Project website
  – Monthly newsletter
  – Papers and presentations
  – Communication with potential partners
  – Surveys, general inquiries
• Repository evaluation and audit (e.g., DRAMBORA, TRAC)
• Legal
  – Risk management (use of materials)
  – Partner agreements
  – Advocacy
• Governance
  – Budget, Finances
  – Decision-making
  – Policy
  – Planning
• Enterprise Management
  – Communication and coordination with partner institutions
  – Project management
• Repository Administration
  – Hardware configuration and maintenance
  – Web and application server configuration and maintenance
  – Security
  – Permissions
  – Logging
  – Data management (content storage, backup, integrity checks, deletion)
  – Hardware selection and replacement
  – Content and metadata specifications
  – Disaster recovery
  – Processes for ensuring content integrity
• Rights Management
  – Copyright determination
  – Copyright review
  – Copyright information management (database)
  – Rightsholder permissions
• Bibliographic Data Management
  – Entity description (record-level)
  – Object identification (item-level)
  – Data availability
• Collection Development
  – Digital: expansion beyond books and journals (born-digital, images and maps, audio); selection of content (for non-Google volume ingest and pilot projects)
  – Print: Cloud Library (effect of digital on print)
• Financial contributions of partners

[Figure: HathiTrust organizational structure. Labels: HathiTrust; Strategic Advisory Board; Budget, Finances; Decision-making; Guidance on Policy; Planning; Executive Committee]

• Driven by needs of institutions
• Leverage across the partnership
• Projects: Print on Demand, Grant Work, Ingest Specifications, PageTurner, Bibliographic Data Management

Collective Work: Working Groups and Committees
• Operational: Communications, User Support, User Experience
• Strategic: Collections, Discovery Interface, Full-text Search

Distributed work

Constitutional Convention

• October 2011
• 52 partners
• 3-year review overseen by SAB
• Ballot Proposals
  – Print monograph storage
  – Approval Process for development initiatives
  – US Government Documents
  – Fee-for-service content deposit
  – Governance

[Figure: HathiTrust governance structure. Labels: HathiTrust; Executive Committee; Strategic Advisory Board; Budget, Finances, Decision-making; Guidance on Policy, Planning]

• 12-member Board of Governors
• Chief Executive Officer
• Executive Committee

Governance

• Efficient, practical
• Inclusive, collective

Outline

• The Big Idea
  – Mission and Goals
• What we're doing to get there
  – Repository and Content
  – Making content available
  – Organizational structure
• How HathiTrust can change the way we work

How HathiTrust Can Change the Way We Work

Seeing collective problems as collective

Breakdown of HathiTrust book corpus by publication date

[Figure: chart segments of 42%, 19%, 20%, and 19%. Source: Bibliographic Indeterminacy and the Scale of Problems and Opportunities of Rights in Digital Collection Building, 2/2011]

Copyright status of books published pre-1923 and US works published 1923-1963

[Figure: chart segments of 42% (labeled "In Print"), 19%, 20%, and 19%]

Relationships

• Identification
• Description
• Rights
• Relationships
  – Bibliographic records
  – Bib records and objects
  – Digital objects
  – Digital and print

Understanding the relationship between the collective and local

1st model: Price per GB

Total volumes and public domain volumes by year:

Year          Total Volumes   Public Domain
2008              2,477,871         372,085
2009              5,221,092         758,947
2010              7,836,698       1,959,223
2011              9,966,572       2,712,626
2012 (Oct)       10,531,566       3,218,132
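The figures in the table make it easy to check how the public-domain share of the collection changed over time. Below is a short Python sketch that uses only those numbers.

```python
from typing import Dict, Tuple

# Figures copied from the table above (2012 is as of October).
TOTALS: Dict[str, Tuple[int, int]] = {
    "2008": (2_477_871, 372_085),
    "2009": (5_221_092, 758_947),
    "2010": (7_836_698, 1_959_223),
    "2011": (9_966_572, 2_712_626),
    "2012 (Oct)": (10_531_566, 3_218_132),
}

def public_domain_share(total: int, public_domain: int) -> float:
    """Fraction of all volumes that are public domain."""
    return public_domain / total

if __name__ == "__main__":
    for year, (total, pd) in TOTALS.items():
        print(f"{year:>10}: {public_domain_share(total, pd):.1%} public domain")
```

Run as a script, it reports the share rising from about 15.0% in 2008 to about 30.6% by October 2012.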

A global change in the library environment

[Figure: scatter plot; x-axis: Rank in 2008 ARL Investment Index (0-120); y-axis: % of Titles in Local Collection (0-60)]

• June 2010: Median duplication 31%
• June 2009: Median duplication 19%

Academic print book collection already substantially duplicated in mass digitized book corpus

Courtesy of Constance Malpas, OCLC Research

Digitized Books in Shared Repositories

[Figure: unique titles (y-axis 0 to 3,500,000), Sep-09 to Jun-10; series: mass digitized books in Hathi digital repository, mass digitized books in shared print repositories; annotations: ~3.5M titles, ~2.5M]

~75% of mass digitized corpus is ‘backed up’ in one or more shared print repositories

Courtesy of Constance Malpas, OCLC Research

Collection Overlap

• More than 50% median overlap with ARL institutions; higher for small liberal arts colleges
• New pricing model based on print holdings
  – http://www.hathitrust.org/cost
  – Requires print holdings database
  – Also supports expansion of legal uses, efforts in de-duplication
  – Facilitates individual and collaborative collection development and management operations
• Print monographs archiving
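Overlap figures like these come from matching a library's holdings against the HathiTrust corpus and then taking the median across institutions. Below is a minimal sketch. It assumes titles are matched by a shared identifier such as an OCLC number (real-world matching is considerably messier) and that "overlap" means the share of a library's own titles found in HathiTrust. The library names and identifiers in the example are invented.

```python
from statistics import median
from typing import Dict, Set

def overlap(local_titles: Set[str], hathi_titles: Set[str]) -> float:
    """Share of a library's titles that also appear in the HathiTrust corpus."""
    if not local_titles:
        return 0.0
    return len(local_titles & hathi_titles) / len(local_titles)

def median_overlap(libraries: Dict[str, Set[str]], hathi_titles: Set[str]) -> float:
    """Median of the per-library overlap values."""
    return median(overlap(titles, hathi_titles) for titles in libraries.values())

if __name__ == "__main__":
    hathi = {"ocn1", "ocn2", "ocn3", "ocn4"}
    libraries = {
        "Library A": {"ocn1", "ocn2", "ocn9", "ocn10"},    # 50% overlap
        "Library B": {"ocn1", "ocn2", "ocn3", "ocn11"},    # 75% overlap
        "Library C": {"ocn4", "ocn12", "ocn13", "ocn14"},  # 25% overlap
    }
    print(f"median overlap: {median_overlap(libraries, hathi):.0%}")
```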

Sourcing and Scaling: http://orweblog.oclc.org/archives/002058.html

• Scale
  – Institution-scale
  – Group-scale
  – Web-scale
• Sourcing
  – Institutional
  – Collaborative
  – Third-party

A new kind of library

Thank you

How to find out more

• About: http://www.hathitrust.org/about
• Twitter: http://twitter.com/hathitrust
• Facebook: http://www.facebook.com/hathitrust
• Monthly newsletter
  – http://www.hathitrust.org/updates
  – RSS: http://www.hathitrust.org/updates_rss
• Contact us: feedback@issues.hathitrust.org
• Blogs: http://www.hathitrust.org/blogs
  – Large-scale Search
  – Perspectives from HathiTrust

Digital Repository

• Launched 2008
• Initial focus on digitized book and journal content
  – 10.6 million total volumes
  – 5.58 million book titles
  – 276,000 serial titles
  – 3.2 million public domain (~31%)

The Name

• The meaning behind the name
  – Hathi (hah-tee): Hindi for elephant
  – Big, strong
  – Never forgets, wise
  – Secure
  – Trustworthy

Mission

• To contribute to the common good by collecting, organizing, preserving, communicating, and sharing the record of human knowledge

Universal Library

Common Goal

Single Entity, Many Partners

HathiTrust

Collections and Collaboration

• Comprehensive collection: Preservation… with Access
• Shared strategies
  – Copyright
  – Collection management/development
  – Preservation
  – Discovery/Use
  – Bibliographic Indeterminacy
  – Efficient user services
• Public Good

What we are doing to get there

Cost-effective long-term preservation and access for digitized content

bull Facilitate decision-making about digitization and print collection management

• Facilitate activities such as discovery, copyright review, and use of materials

Repository and Content

Content Sources

[Figure: share of content by contributing institution (%): Michigan 43, California 32, Wisconsin 5, Cornell 4, NYPL 2, Princeton 2, Indiana 2, Harvard 2, Columbia 1, LoC 1, Madrid 1, Minnesota 1, Illinois 1]

Language Distribution (1)

[Figure: share of content by language (%): English 48, German 9, French 7, Spanish 5, Chinese 4, Russian 4, Japanese 3, Italian 2, Arabic 2, Latin 1, Remaining Languages 14]

The top 10 languages make up ~86% of all content

Language Distribution (2)

[Figure: the next 40 languages, with values as labeled on the chart: Undetermined 7, Polish 7, Portuguese 7, Dutch 5, Hebrew 5, Hindi 5, Indonesian 4, Korean 4, Swedish 3, Urdu 3, Turkish 3, Thai 3, Danish 3, Czech 3, Unknown 3, Croatian 2, Persian 2, Tamil 2, Bengali 2, Music 2, Hungarian 2, Norwegian 2, Sanskrit 2, Vietnamese 1, Ukrainian 1, Greek 1, Bulgarian 1, Serbian 1, Armenian 1, Romanian 1, Marathi 1, Ancient-Greek 1, Panjabi 1, Telugu 1, Malay 1, Catalan 1, Malayalam 1, Multiple 1, Finnish 1, Slovak 1]

The next 40 languages make up ~13% of total

Dates

Copyright Distribution

[Figure: copyright distribution (%): In-copyright or undetermined 69; Public Domain (worldwide) 15; US Federal Government Documents (worldwide) 4; Public Domain (US) 11; Open Access 1; Creative Commons 0.4]

[Figure: chart over time; x-axis: quarterly dates from 10/1/08 to 1/1/13; y-axis: 0-100; series by contributing institution: Boston College, Florida, Yale, Utah State, UNC-Chapel Hill, Purdue, Penn State, Northwestern, NCSU, Illinois, Duke, Chicago, Virginia, Minnesota, Madrid, LoC, Harvard, Columbia, Indiana, Princeton, NYPL, Cornell, Wisconsin, California, Michigan]

Source

Bibliographic Data

Content Package

MichiganIndiana

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

Datasets

Source

Bibliographic Data

Content Package

MichiganIndiana

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

Datasets

Source

Bibliographic Data

Content Package

MichiganIndiana

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

DatasetsTDR

Source

Bibliographic Data

Content Package

MichiganIndiana

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

Datasets

Source

Bibliographic Data

Content Package

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

DatasetsMichigan

Indiana

Source

Bibliographic Data

Content Package

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

DatasetsMichigan

Indiana

Source

Bibliographic Data

Content Package

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

DatasetsMichigan

Indiana

We engage in preservationfor purposes of access

Source

Bibliographic Data

Content Package

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

DatasetsMichigan

Indiana

Making Content Available

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Descriptive headings added (hidden from GUI with CSS)

Info about SSD service amp link to accessibility page

Images used for style are in css so no need to use alt tags

Skip navigation link

Access keys for navigating pages with keyboard

Added labels amp descriptive titles to forms amp ToC table

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

APIs

bull Data APIndash Volume and rights informationndash Page imagesndash OCR

bull Bibliographic APIndash Volume and rights informationndash MARC records

bull OAIbull ldquoHathifilesrdquo

Datasets

bull Google-digitized ~28 million texts Requires proposal to HathiTrust Agreement with Google Statement on usemanagement

bull Non-Google-digitized ~370000 texts Freely available Statement on management

Research Center

bull Environment to perform research on HathiTrust corpus

bull httpwwwhathitrustorghtrc

bull httplibumichedumpachbull Package of tools to enable publication of open

access born-digital journal content directly into HathiTrustndash Including accompanying data and media files

bull Allows integration with popular journal publishing tools such as Open Journal Systems (OJS)

Source Archive

Editorial Market

Higher Education

Access Determinations

bull Automatedbull Manual

Automatic Rights Determination

bull Conducted on all works at time of ingest and when records are modifiedndash Public domain worldwide

bull US works published before 1923 US federal government publications non-US works published prior to 1873

ndash Public domain in the United Statesbull Non-US works published prior to 1923

Manual Rights Determination

bull IMLS-funded CRMS projectndash CRMS-US

bull 2008 US-published works 1923-1963bull Staff at 4 partner institutions

ndash CRMS-Worldbull 2011 Expanded to non-US worksbull Staff at 16 partner institutions

ndash Double review with additional expert review for conflictsndash Compliance with copyright formalitiesndash As of January 2013 241541 reviewed more than 132644

openedbull Rights Holder Permissions

bull System of Precedence

Rights Database

Bibliographic (automatic)

Manual

Lawful uses

bull Users who have print disabilitiesndash All in-copyright works in HathiTrust currently

owned (or owned previously) by the partner institution

ndash Must be authenticatedndash Must be on US soilndash One simultaneous access per copy ownedndash httpwwwhathitrustorgaccessibility

Lawful uses (2)

bull Out of print and brittle missingndash Works must be currently owned (or owned

previously) by the partner institutionndash Must be authenticated or accessing work from

library premisesndash Must be on US soilndash One simultaneous access per copy ownedndash httpwwwhathitrustorgout-of-print-brittle

bull Access and use statementsndash httpwwwhathitrustorgaccess_use

Outline

bull The Big Idea ndash Mission and Goals

bull What wersquore doing to get there ndash Repository and Content ndash Making content available ndash Organizational structure

bull How HathiTrust can change the way we work

e-Commerce

Print on Demand

Content Ingest

Transformation

Validation

Content Access

PageTurner

Collection Builder

Large-scale Search

Bibliographic Catalog

Research Center

APIs

Quality Assurance

Quality Review

Content Certification

User Services

Usability

User support (helpdesk)

Outreach

Project website

Monthly newsletter

Papers and presentations

Communication with potential

partners

Surveys general inquiries

Repository evaluation and

audit (eg DRAMBORA

TRAC)

Legal

Risk management (use of materials)

Partner agreements

Advocacy

Governance

Budget Finances

Decision-making

Policy

Planning

Enterprise Management

Communication and Coordination

with partner institutions

Project management

Repository Administration

Hardware configuration and

maintenance

Web and application server configuration and

maintenance

Security

Permissions

Logging

Repository Administration

Data management (content storage backup integrity checks deletion)

Hardware selection and replacement

Content and Metadata

specifications

Disaster Recovery

Processes for ensuring content

integrity

Rights Management

Copyright determination

Copyright review

Copyright information

management (database)

Rightsholder permissions

Bibliographic Data

Management

Entity description (record-level)

Object identification (item-

level)

Data availability

Collection Development

Digitalbull Expansion beyond

books and journals (born-digital images and maps audio)

bull Selection of content (for non-Google volume ingest and pilots projects)

Printbull Cloud Library (effect

of digital on print)

Financial contributions of partners

HathiTrust Functional Framework

HathiTrust

Strategic Advisory BoardBudgetFinances Decision-making

Guidance on Policy Planning

bull Driven by needs of institutionsbull Leverage across the partnershipbull Projects Print on Demand Grant Work Ingest Specifications

PageTurner Bibliographic Data Management

Executive Committee

Collective Work Working Groups and Committees

Operationalbull Communicationsbull User Supportbull User Experience

Operationalbull Communicationsbull User Supportbull User Experience

Strategicbull Collectionsbull Discovery Interfacebull Full-text Search

Distributed work

Constitutional Convention

bull October 2011bull 52 partnersbull 3-year review overseen by SABbull Ballot Proposals

ndash Print monograph storagendash Approval Process for development initiativesndash US Government Documentsndash Fee-for-service content depositndash Governance

HathiTrust

Executive Committee

Strategic Advisory

Board

BudgetFinancesDecision-making

Guidance on Policy Planning

bull 12-member Board of Governors

bull Chief Executive Officer bull Executive Committee

Governance

bull Efficient practicalbull Inclusive collective

Outline

bull The Big Idea ndash Mission and Goals

bull What wersquore doing to get there ndash Repository and Content ndash Making content available ndash Organizational structure

bull How HathiTrust can change the way we work

How HathiTrust Can Change the Way We Work

Seeing collective problems as collective

Breakdown of HathiTrust book corpus by publication date

Bibliographic Indeterminacy and the Scale of Problems and Opportunities of Rights in Digital Collection Building ndash 22011

42

19

20

19

Breakdown of HathiTrust book corpus by publication date

42

19

20

19

Copyright status of books published pre-1923 and US works published 1923-1963

42

19

20

Copyright status of books published pre-1923 and US works published 1923-1963

42

19

20

19

Copyright status of books published pre-1923 and US works published 1923-1963

In Print 42

19

20

19

bull Identificationbull Descriptionbull Rights

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic records

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic recordsndash Bib records and objects

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic recordsndash Bib records and objectsndash Digital objects

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic recordsndash Bib records and objectsndash Digital objectsndash Digital and print

Relationships

Understanding the relationship between the collective and local

1st model Price per GB

2008 2009 2010 2011 2012 (Oct)Total Volumes 2477871 5221092 7836698 9966572 10531566

Public Domain 372085 758947 1959223 2712626 3218132

2008 2009 2010 2011 2012 (Oct)0

2000000

4000000

6000000

8000000

10000000

12000000

Total VolumesPublic Domain

0 20 40 60 80 100 1200

10

20

30

40

50

60

Rank in 2008 ARL Investment Index

o

f Tit

les

in L

ocal

Col

lecti

on

A global change in the library environment

June 2010Median duplication 31

June 2009Median duplication 19

Academic print book collection already substantially duplicated in mass digitized book corpus

Courtesy of Constance Malpas OCLC Research

Digitized Books in Shared Repositories

Sep-09 Oct-09 Nov-09 Dec-09 Jan-10 Feb-10 Mar-10 Apr-10 May-10 Jun-100

500000

1000000

1500000

2000000

2500000

3000000

3500000

Mass digitized books in Hathi digital repository Mass digitized books in shared print repositories

Uni

que

Titl

es

~75 of mass digitized corpus is lsquobacked uprsquo in one or more shared print repositories

~35M titles

~25M

Courtesy of Constance Malpas OCLC Research

Collection Overlap

bull More than 50 median overlap with ARL institutions higher for small liberal arts colleges

bull New Pricing model based on Print holdingsndash httpwwwhathitrustorgcostndash Requires print holdings databasendash Also support expansion of legal uses efforts in de-

duplicationndash Facilitate individual and collaborative collection

development and management operationsbull Print monographs archiving

Sourcing and Scalinghttporweblogoclcorgarchives002058html

bull Scalendash Institution-scalendash Group-scalendash Web-scale

bull Sourcingndash Institutionalndash Collaborativendash Third-party

A new kind of library

Thank you

How to find out more

bull About httpwwwhathitrustorgaboutbull Twitter httptwittercomhathitrustbull Facebook httpwwwfacebookcomhathitrustbull Monthly newsletter

ndash httpwwwhathitrustorgupdatesndash RSS httpwwwhathitrustorgupdates_rss

bull Contact us feedbackissueshathitrustorgbull Blogs httpwwwhathitrustorgblogs

ndash Large-scale Searchndash Perspectives from HathiTrust

  • Your Library Now Online Putting HathiTrust in the Context of
  • Outline
  • The Big Idea
  • Partnership
  • Digital Repository
  • The Name
  • Mission
  • HathiTrust
  • Collections and Collaboration
  • What we are doing to get there
  • Slide 11
  • Slide 12
  • Repository and Content
  • Content Sources
  • Language Distribution (1)
  • Language Distribution (2)
  • Dates
  • Copyright Distribution
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Making Content Available
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • APIs
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Datasets
  • Research Center
  • Slide 54
  • Slide 55
  • Access Determinations
  • Automatic Rights Determination
  • Manual Rights Determination
  • Rights Database
  • Lawful uses
  • Lawful uses (2)
  • Outline (2)
  • Slide 63
  • Slide 64
  • Constitutional Convention
  • Slide 66
  • Governance
  • Outline (3)
  • How HathiTrust Can Change the Way We Work
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Relationships
  • Relationships (2)
  • Relationships (3)
  • Relationships (4)
  • Relationships (5)
  • Slide 81
  • Slide 82
  • Slide 83
  • A global change in the library environment
  • Digitized Books in Shared Repositories
  • Collection Overlap
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Thank you
  • How to find out more
Page 6: HATHITRUST A Shared Digital Repository Your Library, Now Online! Putting HathiTrust in the Context of Traditional (and New) Library Services MCLS Webinar

The Name

bull The meaning behind the namendash Hathi (hah-tee)--Hindi for elephantndash Big strongndash Never forgets wisendash Securendash Trustworthy

Mission

bull To contribute to the common good by collecting organizing preserving communicating and sharing the record of human knowledge

Universal Library

Common Goal

Single Entity Many Partners

HathiTrust

Collections and Collaboration

bull Comprehensive collection- Preservationhellipwith Access

bull Shared strategiesndash Copyrightndash Collection management developmentndash Preservationndash Discovery Usendash Bibliographic Indeterminacyndash Efficient user services

bull Public Good

What we are doing to get there

Cost-effective long-term preservation and access for digitized content

bull Facilitate decision-making about digitization and print collection management

bull Facilitate activities such as discovery copyright review use of materials

Repository and Content

Michigan 43

California 32

Wisconsin 5

Cornell 4NYPL 2

Princeton 2Indiana 2

Columbia 1

Harvard 2LoC 1 Madrid 1 Minnesota 1

Illinois 1

Content Sources

English48

German9

French7

Spanish5

Chinese4

Russian4

Japanese3

Italian2

Arabic2

Latin1

Remaining Languages

14

Language Distribution (1)

The top 10 languages make up ~86 of all content

Undetermined7

Polish7

Portuguese7

Dutch5

Hebrew5

Hindi5

Indonesian4

Korean4Swedish

3

Urdu3

Turkish3

Thai3Danish

3

Czech3

Unknown3

Croatian2

Persian2

Tamil2

Bengali2

Music2

Hungarian2

Norwegian2

Sanskrit2

Vietnamese1

Ukrainian1

Greek1

Bulgarian1Serbian

1Armenian

1Romanian

1Marathi

1

Ancient-Greek1 Panjabi

1

Telugu1Malay

1

Catalan1

Malayalam1

Multiple1

Finnish1

Slovak1

Language Distribution (2)

The next 40 languages make up ~13 of total

Dates

Copyright Distribution

In-copyright or unde-termined

69

Public Domain (worldwide)

15

US Federal Government Documents (worldwide)

4

Public Domain(US)11

Open Access1

Creative Commons 04

1010

8

110

9

410

9

710

9

1010

9

111

0

411

0

711

0

1011

0

111

1

411

1

711

1

1011

1

111

2

411

2

711

2

1011

2

111

30

10

20

30

40

50

60

70

80

90

100

Boston CollegeFloridaYaleUtah StateUNC-Chapel HillPurduePenn StateNorthwesternNCSUIllinoisDukeChicagoVirginiaMinnesotaMadridLoCHarvardColumbiaIndianaPrincetonNYPLCornellWisconsinCaliforniaMichigan

[Diagram, built up over several slides. Source: Bibliographic Data, Content Package. Also shown: Ingest, Data Management, Bib Data, Rights Data, Holdings Data, Storage (Michigan and Indiana), TDR. Access: Catalog, Full-text Search, PageTurner, APIs, Collections, Datasets]

We engage in preservation for purposes of access.

Making Content Available

Access: Catalog, Full-text Search, PageTurner, APIs, Collections, Datasets

• Descriptive headings added (hidden from GUI with CSS)
• Info about SSD service & link to accessibility page
• Images used for style are in CSS, so no need for alt tags
• Skip navigation link
• Access keys for navigating pages with keyboard
• Added labels & descriptive titles to forms & ToC table

APIs

• Data API
  – Volume and rights information
  – Page images
  – OCR
• Bibliographic API
  – Volume and rights information
  – MARC records
• OAI
• "Hathifiles"
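As a rough illustration of the Bibliographic API listed above, the sketch below builds a lookup URL and pulls item-level rights out of a JSON response. The URL pattern (`https://catalog.hathitrust.org/api/volumes/brief/<id type>/<id>.json`) and the field names `items`, `htid` and `usRightsString` reflect our understanding of the API and should be checked against HathiTrust's current documentation; the sample response is made up for illustration.

```python
import json
from typing import List, Tuple
from urllib.parse import quote

# Assumed base URL for the Bibliographic API; verify against current documentation.
BIB_API_BASE = "https://catalog.hathitrust.org/api/volumes"


def bib_api_url(id_type: str, identifier: str, level: str = "brief") -> str:
    """Build a lookup URL such as .../brief/oclc/424023.json."""
    if level not in ("brief", "full"):  # "full" is assumed to add MARC records
        raise ValueError("level must be 'brief' or 'full'")
    return f"{BIB_API_BASE}/{level}/{id_type}/{quote(identifier)}.json"


def item_rights(response_text: str) -> List[Tuple[str, str]]:
    """Return (volume id, US rights string) pairs from a JSON response."""
    data = json.loads(response_text)
    return [(item["htid"], item["usRightsString"]) for item in data.get("items", [])]


# Made-up response used only to exercise the parser.
SAMPLE = '{"records": {}, "items": [{"htid": "example.0001", "usRightsString": "Full view"}]}'
print(bib_api_url("oclc", "424023"))
print(item_rights(SAMPLE))
```

Fetching the URL itself (with any HTTP client) is left out so the parsing logic can be tried offline.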

Datasets

• Google-digitized: ~2.8 million texts
  – Requires proposal to HathiTrust
  – Agreement with Google
  – Statement on use/management
• Non-Google-digitized: ~370,000 texts
  – Freely available
  – Statement on management

Research Center

• Environment to perform research on the HathiTrust corpus
• http://www.hathitrust.org/htrc

• http://lib.umich.edu/mpach
• Package of tools to enable publication of open access, born-digital journal content directly into HathiTrust
  – Including accompanying data and media files
• Allows integration with popular journal publishing tools such as Open Journal Systems (OJS)

Source Archive

Editorial Market

Higher Education

Access Determinations

• Automated
• Manual

Automatic Rights Determination

• Conducted on all works at time of ingest and when records are modified
  – Public domain worldwide
    • US works published before 1923; US federal government publications; non-US works published prior to 1873
  – Public domain in the United States
    • Non-US works published prior to 1923
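The automatic rules above amount to a small decision procedure over publication date, place of publication and government-document status. A minimal Python sketch follows; the function name and the category labels are hypothetical (they are not HathiTrust's actual rights codes), and a real record would first need its date and place parsed from the bibliographic data.

```python
from typing import Optional


def automatic_rights(pub_year: Optional[int], published_in_us: bool,
                     us_federal_gov_doc: bool) -> str:
    """Apply the slide's date-based rules; labels are illustrative, not official codes."""
    if us_federal_gov_doc:
        return "pd-worldwide"                      # US federal government publications
    if pub_year is None:
        return "in-copyright-or-undetermined"      # no usable date: stay closed
    if published_in_us:
        # US works published before 1923 are public domain worldwide
        return "pd-worldwide" if pub_year < 1923 else "in-copyright-or-undetermined"
    if pub_year < 1873:
        return "pd-worldwide"                      # non-US works published prior to 1873
    if pub_year < 1923:
        return "pd-us-only"                        # non-US works published 1873-1922
    return "in-copyright-or-undetermined"


print(automatic_rights(1900, published_in_us=False, us_federal_gov_doc=False))  # pd-us-only
```

Anything that falls through to the closed default is a candidate for manual review.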

Manual Rights Determination

• IMLS-funded CRMS project
  – CRMS-US
    • 2008: US-published works, 1923-1963
    • Staff at 4 partner institutions
  – CRMS-World
    • 2011: Expanded to non-US works
    • Staff at 16 partner institutions
  – Double review, with additional expert review for conflicts
  – Compliance with copyright formalities
  – As of January 2013: 241,541 reviewed; more than 132,644 opened
• Rights holder permissions
• System of precedence
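The CRMS rule of "double review, with additional expert review for conflicts" can be pictured as a simple consensus step. In this sketch the determination strings and the `expert` callback are hypothetical stand-ins; the real review workflow is richer than this.

```python
from typing import Callable, Iterable, Tuple


def resolve(first: str, second: str, expert: Callable[[str, str], str]) -> str:
    """Accept two matching independent reviews; escalate disagreements to an expert."""
    if first == second:
        return first
    return expert(first, second)


def summarize(pairs: Iterable[Tuple[str, str]], expert: Callable[[str, str], str],
              open_codes: Tuple[str, ...] = ("pd", "pdus")) -> Tuple[int, int]:
    """Count volumes reviewed and volumes whose final determination opens them."""
    reviewed = opened = 0
    for first, second in pairs:
        reviewed += 1
        if resolve(first, second, expert) in open_codes:
            opened += 1
    return reviewed, opened


def cautious_expert(a: str, b: str) -> str:
    """Hypothetical expert who errs on the side of caution when reviewers disagree."""
    return "ic"


print(summarize([("pd", "pd"), ("pd", "ic"), ("ic", "ic")], cautious_expert))  # (3, 1)
```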

Rights Database

Bibliographic (automatic)

Manual
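A system of precedence decides which determination wins when the rights database holds several for the same volume. The ranking in this sketch (rights-holder permission over manual review over the automatic bibliographic rule) is an assumption for illustration, not a statement of HathiTrust's actual policy.

```python
from typing import Dict, Optional

# Assumed ranking: a higher number wins. Illustrative only.
PRECEDENCE = {"bibliographic": 1, "manual": 2, "rightsholder": 3}


def effective_rights(determinations: Dict[str, str]) -> Optional[str]:
    """Return the determination from the highest-ranked source present, if any."""
    if not determinations:
        return None
    winning_source = max(determinations, key=PRECEDENCE.__getitem__)
    return determinations[winning_source]


print(effective_rights({"bibliographic": "ic", "manual": "pd"}))  # pd
```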

Lawful uses

• Users who have print disabilities
  – All in-copyright works in HathiTrust currently owned (or previously owned) by the partner institution
  – Must be authenticated
  – Must be on US soil
  – One simultaneous access per copy owned
  – http://www.hathitrust.org/accessibility
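Each condition on the print-disability slide is a gate that must pass. The sketch below expresses them as one boolean check for an in-copyright work; the `AccessRequest` fields are hypothetical names, and where those facts would come from (login, location, holdings records) is left open. The simultaneous-access limit is a separate, stateful check and is not modeled here.

```python
from dataclasses import dataclass


@dataclass
class AccessRequest:
    # Hypothetical fields; names are illustrative only.
    user_has_print_disability: bool
    authenticated: bool
    on_us_soil: bool
    institution_owns_or_owned_copy: bool


def print_disability_access_allowed(req: AccessRequest) -> bool:
    """All of the slide's conditions must hold at once."""
    return (req.user_has_print_disability
            and req.institution_owns_or_owned_copy
            and req.authenticated
            and req.on_us_soil)


print(print_disability_access_allowed(AccessRequest(True, True, True, True)))   # True
print(print_disability_access_allowed(AccessRequest(True, False, True, True)))  # False
```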

Lawful uses (2)

• Out-of-print and brittle or missing works
  – Works must be currently owned (or previously owned) by the partner institution
  – Must be authenticated or accessing the work from library premises
  – Must be on US soil
  – One simultaneous access per copy owned
  – http://www.hathitrust.org/out-of-print-brittle
• Access and use statements
  – http://www.hathitrust.org/access_use
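Both lawful-use slides cap access at one simultaneous user per print copy owned. One way to model that cap is a per-volume set of active readers, sketched below. The class name and the in-memory storage are assumptions; a real service would also need shared state and session timeouts.

```python
from collections import defaultdict
from typing import Dict, Set


class SimultaneousAccessLimiter:
    """Allow at most one active reader per print copy the institution owns."""

    def __init__(self, copies_owned: Dict[str, int]) -> None:
        self.copies_owned = copies_owned                      # volume id -> copies held
        self.active: Dict[str, Set[str]] = defaultdict(set)   # volume id -> current readers

    def checkout(self, volume_id: str, user_id: str) -> bool:
        readers = self.active[volume_id]
        if user_id in readers:
            return True                  # already reading; no extra copy consumed
        if len(readers) >= self.copies_owned.get(volume_id, 0):
            return False                 # every owned copy is in use (or none owned)
        readers.add(user_id)
        return True

    def release(self, volume_id: str, user_id: str) -> None:
        self.active[volume_id].discard(user_id)


limiter = SimultaneousAccessLimiter({"vol-1": 1})
print(limiter.checkout("vol-1", "alice"))  # True
print(limiter.checkout("vol-1", "bob"))    # False until alice releases
```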

Outline

• The Big Idea
  – Mission and Goals
• What we're doing to get there
  – Repository and Content
  – Making content available
  – Organizational structure
• How HathiTrust can change the way we work

HathiTrust Functional Framework
• e-Commerce: Print on Demand
• Content Ingest: Transformation; Validation
• Content Access: PageTurner; Collection Builder; Large-scale Search; Bibliographic Catalog; Research Center; APIs
• Quality Assurance: Quality Review; Content Certification
• User Services: Usability; User support (helpdesk)
• Outreach: Project website; Monthly newsletter; Papers and presentations; Communication with potential partners; Surveys, general inquiries; Repository evaluation and audit (e.g., DRAMBORA, TRAC)
• Legal: Risk management (use of materials); Partner agreements; Advocacy
• Governance: Budget, Finances; Decision-making; Policy; Planning
• Enterprise Management: Communication and coordination with partner institutions; Project management
• Repository Administration: Hardware configuration and maintenance; Web and application server configuration and maintenance; Security; Permissions; Logging; Data management (content storage, backup, integrity checks, deletion); Hardware selection and replacement; Content and metadata specifications; Disaster recovery; Processes for ensuring content integrity
• Rights Management: Copyright determination; Copyright review; Copyright information management (database); Rightsholder permissions
• Bibliographic Data Management: Entity description (record-level); Object identification (item-level); Data availability
• Collection Development
  – Digital: Expansion beyond books and journals (born-digital, images and maps, audio); Selection of content (for non-Google volume ingest and pilot projects)
  – Print: Cloud Library (effect of digital on print)
• Financial contributions of partners
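The Repository Administration tasks above include integrity checks. A common way to implement them is fixity checking: record a checksum for every file in a content package and recompute it later. The sketch uses MD5 from Python's `hashlib` purely for illustration; the manifest format, file names and choice of algorithm are assumptions, not HathiTrust's specification.

```python
import hashlib
from typing import Dict, List


def fixity_failures(files: Dict[str, bytes], manifest: Dict[str, str]) -> List[str]:
    """Return file names that are missing, altered, or not listed in the manifest."""
    failures = []
    for name, expected in manifest.items():
        data = files.get(name)
        if data is None or hashlib.md5(data).hexdigest() != expected:
            failures.append(name)                                    # missing or mismatched
    failures.extend(name for name in files if name not in manifest)  # unexpected extras
    return sorted(failures)


manifest = {"00000001.txt": hashlib.md5(b"page one").hexdigest()}
print(fixity_failures({"00000001.txt": b"page one"}, manifest))  # []
print(fixity_failures({"00000001.txt": b"page 1!!"}, manifest))  # ['00000001.txt']
```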

HathiTrust
• Executive Committee: Budget, Finances, Decision-making
• Strategic Advisory Board: Guidance on Policy, Planning
• Driven by needs of institutions
• Leverage across the partnership
• Projects: Print on Demand, Grant Work, Ingest Specifications, PageTurner, Bibliographic Data Management

Collective Work: Working Groups and Committees
• Operational: Communications, User Support, User Experience
• Strategic: Collections, Discovery Interface, Full-text Search

Distributed work

Constitutional Convention

• October 2011
• 52 partners
• 3-year review overseen by SAB
• Ballot proposals
  – Print monograph storage
  – Approval process for development initiatives
  – US Government Documents
  – Fee-for-service content deposit
  – Governance

HathiTrust
• Executive Committee: Budget, Finances, Decision-making
• Strategic Advisory Board: Guidance on Policy, Planning

Governance
• 12-member Board of Governors
• Chief Executive Officer
• Executive Committee
• Efficient, practical
• Inclusive, collective

Outline

• The Big Idea
  – Mission and Goals
• What we're doing to get there
  – Repository and Content
  – Making content available
  – Organizational structure
• How HathiTrust can change the way we work

How HathiTrust Can Change the Way We Work

Seeing collective problems as collective

[Chart: Breakdown of HathiTrust book corpus by publication date; segments of 42%, 19%, 20% and 19%. Source: "Bibliographic Indeterminacy and the Scale of Problems and Opportunities of Rights in Digital Collection Building", 2/2011]

[Chart: Copyright status of books published pre-1923 and US works published 1923-1963; same segments of 42%, 19%, 20% and 19%, with a later build adding the label "In Print"]

Relationships
• Identification
• Description
• Rights
• Relationships
  – Bibliographic records
  – Bib records and objects
  – Digital objects
  – Digital and print
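The relationships listed above (bibliographic records to digital objects, and digital objects to the print copies behind them) can be sketched as a small data model. The class and field names are hypothetical; the point is that one record fans out to several volumes, each of which can be tied to print copies at different institutions.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class PrintCopy:
    institution: str   # holding library
    barcode: str       # identifies the physical item


@dataclass
class DigitalObject:
    htid: str                                   # hypothetical volume identifier
    rights: str                                 # determination for this volume
    print_copies: List[PrintCopy] = field(default_factory=list)


@dataclass
class BibRecord:
    record_id: str
    title: str
    items: List[DigitalObject] = field(default_factory=list)

    def holding_institutions(self) -> Set[str]:
        """Institutions holding print copies of any volume attached to this record."""
        return {pc.institution for item in self.items for pc in item.print_copies}


record = BibRecord("rec-1", "Example title", [
    DigitalObject("ex.1", "pd", [PrintCopy("Michigan", "b1")]),
    DigitalObject("ex.2", "ic", [PrintCopy("Indiana", "b2"), PrintCopy("Michigan", "b3")]),
])
print(sorted(record.holding_institutions()))  # ['Indiana', 'Michigan']
```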

Understanding the relationship between the collective and local

1st model: Price per GB

Year          Total volumes    Public domain
2008              2,477,871          372,085
2009              5,221,092          758,947
2010              7,836,698        1,959,223
2011              9,966,572        2,712,626
2012 (Oct)       10,531,566        3,218,132

[Chart: Total Volumes and Public Domain, 2008 to 2012 (Oct)]
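The volume counts in the table can be turned into a public-domain share per year with simple arithmetic. This is a worked check on the figures above, not part of the original cost model.

```python
# Volume counts copied from the table above: year -> (total volumes, public domain).
VOLUMES = {
    "2008": (2477871, 372085),
    "2009": (5221092, 758947),
    "2010": (7836698, 1959223),
    "2011": (9966572, 2712626),
    "2012 (Oct)": (10531566, 3218132),
}


def public_domain_share(total: int, public_domain: int) -> float:
    """Public-domain volumes as a percentage of all volumes, to one decimal place."""
    return round(100 * public_domain / total, 1)


for year, (total, pd) in VOLUMES.items():
    print(year, public_domain_share(total, pd))
```

On these figures the public-domain share roughly doubles, from about 15.0% in 2008 to about 30.6% by October 2012.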

[Chart: % of titles in local collection (vertical axis, 0 to 60) against rank in 2008 ARL Investment Index (horizontal axis, 0 to 120)]

A global change in the library environment

• June 2010: median duplication 31%
• June 2009: median duplication 19%

Academic print book collection already substantially duplicated in mass digitized book corpus

Courtesy of Constance Malpas, OCLC Research
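Median duplication summarizes, across many libraries, the share of each library's print titles that also appear in the digitized corpus. The sketch below computes it from sets of title identifiers; matching on a shared identifier (for example an OCLC number) is an assumption, and real overlap studies have to cope with much messier matching.

```python
from statistics import median
from typing import Dict, Set


def duplication_rate(local_titles: Set[str], digitized_titles: Set[str]) -> float:
    """Percentage of a library's titles that also appear in the digitized corpus."""
    if not local_titles:
        return 0.0
    return 100 * len(local_titles & digitized_titles) / len(local_titles)


def median_duplication(collections: Dict[str, Set[str]], digitized_titles: Set[str]) -> float:
    """Median of the per-library duplication rates."""
    rates = [duplication_rate(titles, digitized_titles) for titles in collections.values()]
    return median(rates)


digitized = {"ocm1", "ocm2", "ocm3"}
libraries = {
    "Library A": {"ocm1", "ocm9"},                  # 50% duplicated
    "Library B": {"ocm1", "ocm2", "ocm7", "ocm8"},  # 50% duplicated
    "Library C": {"ocm1", "ocm2", "ocm3", "ocm4"},  # 75% duplicated
}
print(median_duplication(libraries, digitized))  # 50.0
```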

Digitized Books in Shared Repositories

[Chart: unique titles, Sep-09 to Jun-10 (vertical axis 0 to 3,500,000). Series: mass digitized books in Hathi digital repository (~3.5M titles); mass digitized books in shared print repositories (~2.5M)]

~75% of mass digitized corpus is ‘backed up’ in one or more shared print repositories

Courtesy of Constance Malpas, OCLC Research

Collection Overlap

• More than 50% median overlap with ARL institutions; higher for small liberal arts colleges
• New pricing model based on print holdings
  – http://www.hathitrust.org/cost
  – Requires print holdings database
  – Also supports expansion of legal uses and efforts in de-duplication
  – Facilitates individual and collaborative collection development and management operations
• Print monographs archiving
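A pricing model based on print holdings implies splitting repository costs according to what each member holds. The sketch assumes one such rule (public-domain volumes shared by all members, each in-copyright volume shared only by members holding it in print) and a flat cost per volume; both the rule and the numbers are illustrative, and the actual model is described at http://www.hathitrust.org/cost.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def allocate_costs(volumes: Iterable[Tuple[bool, Set[str]]], members: Set[str],
                   cost_per_volume: float) -> Dict[str, float]:
    """volumes: (is_public_domain, print holders) pairs. Returns cost owed per member."""
    owed: Dict[str, float] = defaultdict(float)
    for is_public_domain, holders in volumes:
        payers = members if is_public_domain else holders & members
        if not payers:
            continue                     # no member holds it in print: left unallocated here
        share = cost_per_volume / len(payers)
        for member in payers:
            owed[member] += share
    return dict(owed)


members = {"A", "B"}
volumes = [(True, set()), (False, {"A"}), (False, set())]
print(allocate_costs(volumes, members, 1.0))  # A owes 1.5, B owes 0.5
```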

Sourcing and Scaling (http://orweblog.oclc.org/archives/002058.html)
• Scale
  – Institution-scale
  – Group-scale
  – Web-scale
• Sourcing
  – Institutional
  – Collaborative
  – Third-party

A new kind of library

Thank you

How to find out more

• About: http://www.hathitrust.org/about
• Twitter: http://twitter.com/hathitrust
• Facebook: http://www.facebook.com/hathitrust
• Monthly newsletter
  – http://www.hathitrust.org/updates
  – RSS: http://www.hathitrust.org/updates_rss
• Contact us: feedback@issues.hathitrust.org
• Blogs: http://www.hathitrust.org/blogs
  – Large-scale Search
  – Perspectives from HathiTrust

Page 8: HATHITRUST A Shared Digital Repository Your Library, Now Online! Putting HathiTrust in the Context of Traditional (and New) Library Services MCLS Webinar

Universal Library

Common Goal

Single Entity Many Partners

HathiTrust

Collections and Collaboration

bull Comprehensive collection- Preservationhellipwith Access

bull Shared strategiesndash Copyrightndash Collection management developmentndash Preservationndash Discovery Usendash Bibliographic Indeterminacyndash Efficient user services

bull Public Good

What we are doing to get there

Cost-effective long-term preservation and access for digitized content

bull Facilitate decision-making about digitization and print collection management

bull Facilitate activities such as discovery copyright review use of materials

Repository and Content

Michigan 43

California 32

Wisconsin 5

Cornell 4NYPL 2

Princeton 2Indiana 2

Columbia 1

Harvard 2LoC 1 Madrid 1 Minnesota 1

Illinois 1

Content Sources

English48

German9

French7

Spanish5

Chinese4

Russian4

Japanese3

Italian2

Arabic2

Latin1

Remaining Languages

14

Language Distribution (1)

The top 10 languages make up ~86 of all content

Language Distribution (2)

The next 40 languages make up ~13% of the total. Shares within this group:

Undetermined 7%, Polish 7%, Portuguese 7%, Dutch 5%, Hebrew 5%, Hindi 5%, Indonesian 4%, Korean 4%, Swedish 3%, Urdu 3%, Turkish 3%, Thai 3%, Danish 3%, Czech 3%, Unknown 3%, Croatian 2%, Persian 2%, Tamil 2%, Bengali 2%, Music 2%, Hungarian 2%, Norwegian 2%, Sanskrit 2%, Vietnamese 1%, Ukrainian 1%, Greek 1%, Bulgarian 1%, Serbian 1%, Armenian 1%, Romanian 1%, Marathi 1%, Ancient Greek 1%, Panjabi 1%, Telugu 1%, Malay 1%, Catalan 1%, Malayalam 1%, Multiple 1%, Finnish 1%, Slovak 1%

Dates

Copyright Distribution

In copyright or undetermined: 69%
Public Domain (worldwide): 15%
US Federal Government Documents (worldwide): 4%
Public Domain (US): 11%
Open Access: 1%
Creative Commons: 0.4%

[Chart: content by contributing institution over time, October 2008 to January 2013 (scale 0 to 100). Institutions: Michigan, California, Wisconsin, Cornell, NYPL, Princeton, Indiana, Columbia, Harvard, LoC, Madrid, Minnesota, Virginia, Chicago, Duke, Illinois, NCSU, Northwestern, Penn State, Purdue, UNC-Chapel Hill, Utah State, Yale, Florida, Boston College]

[Diagram, shown in several builds, of the HathiTrust repository: Source (Bibliographic Data, Content Package); Ingest; Data Management, Bib Data, Rights Data, Holdings Data; Storage; Access (Catalog, Full-text Search, PageTurner, APIs, Collections, Datasets); sites: Michigan, Indiana. One build adds the label TDR.]

We engage in preservation for purposes of access.

Making Content Available

[Diagram, shown in several builds: Access, comprising Catalog, Full-text Search, PageTurner, APIs, Collections, Datasets]

Accessibility improvements:
• Descriptive headings added (hidden from GUI with CSS)
• Info about SSD service & link to accessibility page
• Images used for style are in CSS, so no need to use alt tags
• Skip navigation link
• Access keys for navigating pages with keyboard
• Added labels & descriptive titles to forms & ToC table

APIs

• Data API
  – Volume and rights information
  – Page images
  – OCR
• Bibliographic API
  – Volume and rights information
  – MARC records
• OAI
• "Hathifiles"
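The Bibliographic API can be called over plain HTTPS and returns JSON. The sketch below is minimal and rests on two assumptions. The first is the brief-record endpoint pattern `https://catalog.hathitrust.org/api/volumes/brief/<id type>/<id>.json`. The second is that the response contains the fields `items`, `htid` and `usRightsString`. Check both against the current API documentation. The function names are hypothetical.

```python
"""Minimal client sketch for the HathiTrust Bibliographic API.

Assumptions (verify against the current API documentation): the brief
endpoint is .../api/volumes/brief/<id type>/<id>.json, and each entry in
the response's "items" list carries "htid" and "usRightsString" fields.
"""
import json
from typing import Any, Dict, List
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "https://catalog.hathitrust.org/api/volumes"


def brief_url(id_type: str, value: str) -> str:
    """Build a brief-record lookup URL, e.g. for an OCLC number or ISBN."""
    return f"{BASE_URL}/brief/{quote(id_type)}/{quote(value)}.json"


def fetch_brief(id_type: str, value: str, timeout: float = 10.0) -> Dict[str, Any]:
    """Fetch and decode one brief record (performs a network request)."""
    with urlopen(brief_url(id_type, value), timeout=timeout) as response:
        return json.load(response)


def full_view_ids(record: Dict[str, Any]) -> List[str]:
    """Return the volume IDs (htids) that the response marks as full view."""
    return [
        item["htid"]
        for item in record.get("items", [])
        if item.get("usRightsString") == "Full view"
    ]
```

Calling `full_view_ids(fetch_brief("oclc", "<OCLC number>"))` should list the full-view volumes attached to one OCLC number. The URL builder and the response filter are kept apart from the network call, so both can be tested offline.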

Datasets

• Google-digitized
  – ~2.8 million texts
  – Requires proposal to HathiTrust
  – Agreement with Google
  – Statement on use/management
• Non-Google-digitized
  – ~370,000 texts
  – Freely available
  – Statement on management

Research Center

• Environment to perform research on HathiTrust corpus
• http://www.hathitrust.org/htrc

• http://lib.umich.edu/mpach
• Package of tools to enable publication of open access born-digital journal content directly into HathiTrust
  – Including accompanying data and media files
• Allows integration with popular journal publishing tools such as Open Journal Systems (OJS)

Source Archive

Editorial Market

Higher Education

Access Determinations

• Automated
• Manual

Automatic Rights Determination

• Conducted on all works at time of ingest and when records are modified
  – Public domain worldwide
    • US works published before 1923
    • US federal government publications
    • Non-US works published prior to 1873
  – Public domain in the United States
    • Non-US works published prior to 1923
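The automatic rules above form a small decision procedure over a few bibliographic facts: publication year, place of publication, and federal-document status. The sketch below is minimal. It assumes those facts have already been extracted from the catalog record. The `Work` fields and the `Rights` labels are hypothetical names, and the year cutoffs are copied from the slide as a snapshot.

```python
"""Sketch of the automatic rights rules listed on the slide.

The year cutoffs are hard-coded exactly as the slide states them (1923 and
1873); they are a snapshot, not a statement of current law.
"""
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Rights(Enum):
    PD_WORLDWIDE = "public domain worldwide"
    PD_US = "public domain in the United States"
    UNDETERMINED = "in copyright or undetermined"  # left for manual review


@dataclass(frozen=True)
class Work:
    pub_year: Optional[int]  # None when the record lacks a usable date
    published_in_us: bool
    us_federal_document: bool = False


def automatic_rights(work: Work) -> Rights:
    """Apply the bibliographic (automatic) rules to one work."""
    if work.us_federal_document:
        return Rights.PD_WORLDWIDE  # US federal publications: PD worldwide
    if work.pub_year is None:
        return Rights.UNDETERMINED  # no date, so no automatic decision
    if work.published_in_us:
        # US works published before 1923: PD worldwide; later ones need review
        return Rights.PD_WORLDWIDE if work.pub_year < 1923 else Rights.UNDETERMINED
    if work.pub_year < 1873:
        return Rights.PD_WORLDWIDE  # non-US works published before 1873
    if work.pub_year < 1923:
        return Rights.PD_US  # non-US works published before 1923: PD in the US only
    return Rights.UNDETERMINED
```

Every path that ends in `UNDETERMINED` is a candidate for the manual review described next.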

Manual Rights Determination

• IMLS-funded CRMS (Copyright Review Management System) project
  – CRMS-US
    • 2008: US-published works, 1923-1963
    • Staff at 4 partner institutions
  – CRMS-World
    • 2011: Expanded to non-US works
    • Staff at 16 partner institutions
  – Double review, with additional expert review for conflicts
  – Compliance with copyright formalities
  – As of January 2013: 241,541 reviewed; more than 132,644 opened
• Rights Holder Permissions
• System of Precedence
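Under double review, each volume gets two independent verdicts, and an expert resolves any disagreement. The sketch below is simplified. The verdict labels (such as "pd" and "ic") and the `Review` fields are hypothetical. Real review records carry more detail than a single verdict.

```python
"""Sketch of double review with expert adjudication of conflicts.

Verdict labels ("pd", "ic", ...) and the Review fields are hypothetical.
"""
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Review:
    volume_id: str
    first: str                    # verdict from the first reviewer
    second: str                   # independent verdict from the second reviewer
    expert: Optional[str] = None  # set only after an expert resolves a conflict


def final_verdict(review: Review) -> Optional[str]:
    """Agreement stands; a conflict defers to the expert (None while pending)."""
    if review.first == review.second:
        return review.first
    return review.expert


def summarize(reviews: Iterable[Review]) -> Counter:
    """Count final verdicts, tallying unresolved conflicts as 'pending'."""
    counts: Counter = Counter()
    for review in reviews:
        verdict = final_verdict(review)
        counts["pending" if verdict is None else verdict] += 1
    return counts
```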

Rights Database

Bibliographic (automatic)

Manual
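The rights database takes entries from both automatic and manual sources, so a system of precedence has to decide which entry governs. The sketch below is minimal and its ranking is an assumption: rights holder permission beats manual review, manual review beats the automatic bibliographic rule, and the newest entry wins within a source. The names and rights codes are hypothetical.

```python
"""Sketch of a rights database that resolves competing entries by precedence.

The ranking (permission > manual > automatic, then most recent) is an
assumption for illustration, not a documented HathiTrust rule.
"""
from dataclasses import dataclass
from datetime import date
from typing import Iterable

PRECEDENCE = {"automatic": 1, "manual": 2, "permission": 3}  # hypothetical ranks


@dataclass(frozen=True)
class RightsEntry:
    source: str     # "automatic", "manual" or "permission"
    rights: str     # e.g. "pd" or "ic" (hypothetical codes)
    recorded: date


def current_rights(entries: Iterable[RightsEntry]) -> str:
    """Pick the governing entry: highest-precedence source, newest within it."""
    entries = list(entries)
    if not entries:
        raise ValueError("no rights entries for this volume")
    winner = max(entries, key=lambda e: (PRECEDENCE[e.source], e.recorded))
    return winner.rights
```

Consider a volume with three entries: an automatic "ic" entry from 2010, a manual "pd" entry from 2011, and a new automatic "ic" entry written in 2012 when the record was modified. For that volume, `current_rights` returns "pd". Under this assumed ranking, re-running the automatic rules does not undo a manual review.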

Lawful uses

• Users who have print disabilities
  – All in-copyright works in HathiTrust currently owned (or owned previously) by the partner institution
  – Must be authenticated
  – Must be on US soil
  – One simultaneous access per copy owned
  – http://www.hathitrust.org/accessibility
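The rule "one simultaneous access per copy owned" works like a lending counter. The sketch below is minimal and treats authentication and location as booleans supplied by the caller. The class and method names are hypothetical. The ownership counts would come from the partner's own holdings data.

```python
"""Sketch of the access conditions for users with print disabilities.

Names are hypothetical; identity, location and ownership data would come
from the partner institution's own systems.
"""
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class PrintDisabledAccess:
    copies_owned: Dict[str, int]  # volume id -> copies owned now or previously
    active: Dict[str, int] = field(default_factory=dict)  # open sessions per volume

    def start_session(self, volume_id: str, authenticated: bool, in_us: bool) -> bool:
        """Grant a session only if every condition on the slide holds."""
        if not (authenticated and in_us):
            return False
        owned = self.copies_owned.get(volume_id, 0)
        in_use = self.active.get(volume_id, 0)
        if in_use >= owned:  # one simultaneous access per copy owned
            return False
        self.active[volume_id] = in_use + 1
        return True

    def end_session(self, volume_id: str) -> None:
        """End a session, freeing one copy for the next user."""
        if self.active.get(volume_id, 0) > 0:
            self.active[volume_id] -= 1
```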

Lawful uses (2)

• Out of print and brittle/missing
  – Works must be currently owned (or owned previously) by the partner institution
  – Must be authenticated or accessing work from library premises
  – Must be on US soil
  – One simultaneous access per copy owned
  – http://www.hathitrust.org/out-of-print-brittle
• Access and use statements
  – http://www.hathitrust.org/access_use
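The out-of-print and brittle/missing rule differs from the print-disability rule in one respect. Use on library premises can stand in for authentication. The sketch below is minimal and encodes the slide's conditions as a single predicate. All field names are hypothetical.

```python
"""Sketch of the eligibility check for out-of-print and brittle/missing works.

All names are hypothetical; each field mirrors one condition on the slide.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class OpbRequest:
    owned_now_or_before: bool  # partner holds, or held, a print copy
    out_of_print: bool
    brittle_or_missing: bool
    authenticated: bool
    on_library_premises: bool
    in_us: bool
    sessions_in_use: int       # current simultaneous accesses to this work
    copies_owned: int          # copies owned now or previously


def may_view(req: OpbRequest) -> bool:
    """Return True only when every condition on the slide is satisfied."""
    return (
        req.owned_now_or_before
        and req.out_of_print
        and req.brittle_or_missing
        and (req.authenticated or req.on_library_premises)
        and req.in_us
        and req.sessions_in_use < req.copies_owned
    )
```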

Outline

• The Big Idea
  – Mission and Goals
• What we're doing to get there
  – Repository and Content
  – Making content available
  – Organizational structure
• How HathiTrust can change the way we work

HathiTrust Functional Framework

e-Commerce
• Print on Demand

Content Ingest
• Transformation
• Validation

Content Access
• PageTurner
• Collection Builder
• Large-scale Search
• Bibliographic Catalog
• Research Center
• APIs

Quality Assurance
• Quality Review
• Content Certification

User Services
• Usability
• User support (helpdesk)

Outreach
• Project website
• Monthly newsletter
• Papers and presentations
• Communication with potential partners
• Surveys, general inquiries
• Repository evaluation and audit (e.g., DRAMBORA, TRAC)

Legal
• Risk management (use of materials)
• Partner agreements
• Advocacy

Governance
• Budget, Finances
• Decision-making
• Policy
• Planning

Enterprise Management
• Communication and coordination with partner institutions
• Project management

Repository Administration
• Hardware configuration and maintenance
• Web and application server configuration and maintenance
• Security
• Permissions
• Logging

Repository Administration
• Data management (content storage, backup, integrity checks, deletion)
• Hardware selection and replacement
• Content and metadata specifications
• Disaster Recovery
• Processes for ensuring content integrity

Rights Management
• Copyright determination
• Copyright review
• Copyright information management (database)
• Rightsholder permissions

Bibliographic Data Management
• Entity description (record-level)
• Object identification (item-level)
• Data availability

Collection Development
Digital
• Expansion beyond books and journals (born-digital, images and maps, audio)
• Selection of content (for non-Google volume ingest and pilot projects)
Print
• Cloud Library (effect of digital on print)

Financial contributions of partners

HathiTrust

Strategic Advisory Board

Budget, Finances, Decision-making

Guidance on Policy, Planning

• Driven by needs of institutions
• Leverage across the partnership
• Projects: Print on Demand, Grant Work, Ingest Specifications, PageTurner, Bibliographic Data Management

Executive Committee

Collective Work: Working Groups and Committees

Operational
• Communications
• User Support
• User Experience

Strategic
• Collections
• Discovery Interface
• Full-text Search

Distributed work

Constitutional Convention

• October 2011
• 52 partners
• 3-year review overseen by SAB
• Ballot Proposals
  – Print monograph storage
  – Approval process for development initiatives
  – US Government Documents
  – Fee-for-service content deposit
  – Governance

HathiTrust

Executive Committee

Strategic Advisory Board

Budget, Finances, Decision-making

Guidance on Policy, Planning

• 12-member Board of Governors
• Chief Executive Officer
• Executive Committee

Governance

• Efficient, practical
• Inclusive, collective

Outline

• The Big Idea
  – Mission and Goals
• What we're doing to get there
  – Repository and Content
  – Making content available
  – Organizational structure
• How HathiTrust can change the way we work

How HathiTrust Can Change the Way We Work

Seeing collective problems as collective

[Chart: Breakdown of HathiTrust book corpus by publication date; segments of 42%, 19%, 20% and 19%. Source: "Bibliographic Indeterminacy and the Scale of Problems and Opportunities of Rights in Digital Collection Building", 2/2011]

[Chart, shown in several builds: Copyright status of books published pre-1923 and US works published 1923-1963; same segment values, with a later build adding the label "In Print"]

Relationships

• Identification
• Description
• Rights
• Relationships
  – Bibliographic records
  – Bib records and objects
  – Digital objects
  – Digital and print
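The four kinds of relationships listed above can be read as links in a small data model. The sketch below is minimal and every class and field name in it is hypothetical. It only shows which links exist: record to record, record to digital object, digital object to digital object, and digital object to print copy. It also shows one traversal across them.

```python
"""Sketch of the relationships on the slide as a small data model.

All class and field names are hypothetical.
"""
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class BibRecord:
    record_id: str
    title: str
    related_records: List[str] = field(default_factory=list)  # record <-> record


@dataclass
class DigitalObject:
    object_id: str
    record_id: str                      # record <-> object
    rights: str                         # rights attach to the object
    duplicate_of: Optional[str] = None  # object <-> object, e.g. a duplicate scan


@dataclass
class PrintCopy:
    barcode: str
    institution: str
    digitized_as: Optional[str] = None  # print <-> digital


def print_copies_for_record(record_id: str,
                            objects: Dict[str, DigitalObject],
                            copies: List[PrintCopy]) -> List[PrintCopy]:
    """Follow record -> digital objects -> print copies, in input order."""
    object_ids = {oid for oid, obj in objects.items() if obj.record_id == record_id}
    return [c for c in copies if c.digitized_as in object_ids]
```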

Understanding the relationship between the collective and local

1st model: Price per GB

Total volumes and public-domain volumes:
2008: 2,477,871 total; 372,085 public domain
2009: 5,221,092 total; 758,947 public domain
2010: 7,836,698 total; 1,959,223 public domain
2011: 9,966,572 total; 2,712,626 public domain
2012 (Oct): 10,531,566 total; 3,218,132 public domain

[Chart: Total Volumes and Public Domain, 2008 to October 2012]
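The yearly totals above show how the open share of the collection grew. The short calculation below uses only the figures on the slide. The variable and function names are illustrative.

```python
"""Public-domain share of volumes by year, from the figures on the slide."""
from typing import Dict

# Figures transcribed from the slide; the 2012 values are as of October.
TOTAL_VOLUMES = {2008: 2_477_871, 2009: 5_221_092, 2010: 7_836_698,
                 2011: 9_966_572, 2012: 10_531_566}
PUBLIC_DOMAIN = {2008: 372_085, 2009: 758_947, 2010: 1_959_223,
                 2011: 2_712_626, 2012: 3_218_132}


def public_domain_share(total: Dict[int, int], public: Dict[int, int]) -> Dict[int, float]:
    """Return the fraction of volumes in the public domain, keyed by year."""
    return {year: public[year] / total[year] for year in sorted(total)}


if __name__ == "__main__":
    for year, share in public_domain_share(TOTAL_VOLUMES, PUBLIC_DOMAIN).items():
        print(f"{year}: {share:.1%}")
```

Run as a script, it prints the public-domain share for each year. The share rises from 15.0% in 2008 to 30.6% in October 2012, while total volumes grow roughly fourfold.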

[Chart: Rank in 2008 ARL Investment Index (x-axis, 0 to 120) against % of Titles in Local Collection (y-axis, 0 to 60)]

A global change in the library environment

June 2010: Median duplication 31%

June 2009: Median duplication 19%

Academic print book collection already substantially duplicated in mass-digitized book corpus

Courtesy of Constance Malpas, OCLC Research

Digitized Books in Shared Repositories

[Chart: Unique titles, September 2009 to June 2010 (scale 0 to 3,500,000): mass digitized books in Hathi digital repository vs. mass digitized books in shared print repositories]

~75% of mass digitized corpus is 'backed up' in one or more shared print repositories

~3.5M titles

~2.5M

Courtesy of Constance Malpas, OCLC Research

Collection Overlap

• More than 50% median overlap with ARL institutions, higher for small liberal arts colleges
• New pricing model based on print holdings
  – http://www.hathitrust.org/cost
  – Requires print holdings database
  – Also support expansion of legal uses, efforts in de-duplication
  – Facilitate individual and collaborative collection development and management operations
• Print monographs archiving
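Once holdings share a match key, the median-overlap figure is a simple set computation. The sketch below is minimal and assumes titles are matched on a normalized OCLC-number string. The function names and sample data are hypothetical, and matching real holdings records is more involved.

```python
"""Sketch of the collection-overlap arithmetic behind a holdings-based model.

Assumes holdings are already normalized to a shared match key (here an
OCLC-number string).
"""
from statistics import median
from typing import Dict, Set


def overlap_share(local: Set[str], shared: Set[str]) -> float:
    """Fraction of a library's titles that also appear in the shared corpus."""
    if not local:
        return 0.0
    return len(local & shared) / len(local)


def median_overlap(holdings: Dict[str, Set[str]], shared: Set[str]) -> float:
    """Median overlap across institutions (raises StatisticsError if empty)."""
    return median(overlap_share(titles, shared) for titles in holdings.values())
```

With overlaps of 0.5, 0.75 and 0.0 for three libraries, `median_overlap` returns 0.5. Using the median keeps one unusually small or specialized collection from skewing the summary figure.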

Sourcing and Scaling: http://orweblog.oclc.org/archives/002058.html

• Scale
  – Institution-scale
  – Group-scale
  – Web-scale
• Sourcing
  – Institutional
  – Collaborative
  – Third-party

A new kind of library

Thank you

How to find out more

• About: http://www.hathitrust.org/about
• Twitter: http://twitter.com/hathitrust
• Facebook: http://www.facebook.com/hathitrust
• Monthly newsletter
  – http://www.hathitrust.org/updates
  – RSS: http://www.hathitrust.org/updates_rss
• Contact us: feedback@issues.hathitrust.org
• Blogs: http://www.hathitrust.org/blogs
  – Large-scale Search
  – Perspectives from HathiTrust

  • Your Library Now Online Putting HathiTrust in the Context of
  • Outline
  • The Big Idea
  • Partnership
  • Digital Repository
  • The Name
  • Mission
  • HathiTrust
  • Collections and Collaboration
  • What we are doing to get there
  • Slide 11
  • Slide 12
  • Repository and Content
  • Content Sources
  • Language Distribution (1)
  • Language Distribution (2)
  • Dates
  • Copyright Distribution
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Making Content Available
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • APIs
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Datasets
  • Research Center
  • Slide 54
  • Slide 55
  • Access Determinations
  • Automatic Rights Determination
  • Manual Rights Determination
  • Rights Database
  • Lawful uses
  • Lawful uses (2)
  • Outline (2)
  • Slide 63
  • Slide 64
  • Constitutional Convention
  • Slide 66
  • Governance
  • Outline (3)
  • How HathiTrust Can Change the Way We Work
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Relationships
  • Relationships (2)
  • Relationships (3)
  • Relationships (4)
  • Relationships (5)
  • Slide 81
  • Slide 82
  • Slide 83
  • A global change in the library environment
  • Digitized Books in Shared Repositories
  • Collection Overlap
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Thank you
  • How to find out more
Page 9: HATHITRUST A Shared Digital Repository Your Library, Now Online! Putting HathiTrust in the Context of Traditional (and New) Library Services MCLS Webinar

Collections and Collaboration

bull Comprehensive collection- Preservationhellipwith Access

bull Shared strategiesndash Copyrightndash Collection management developmentndash Preservationndash Discovery Usendash Bibliographic Indeterminacyndash Efficient user services

bull Public Good

What we are doing to get there

Cost-effective long-term preservation and access for digitized content

bull Facilitate decision-making about digitization and print collection management

bull Facilitate activities such as discovery copyright review use of materials

Repository and Content

Michigan 43

California 32

Wisconsin 5

Cornell 4NYPL 2

Princeton 2Indiana 2

Columbia 1

Harvard 2LoC 1 Madrid 1 Minnesota 1

Illinois 1

Content Sources

English48

German9

French7

Spanish5

Chinese4

Russian4

Japanese3

Italian2

Arabic2

Latin1

Remaining Languages

14

Language Distribution (1)

The top 10 languages make up ~86 of all content

Undetermined7

Polish7

Portuguese7

Dutch5

Hebrew5

Hindi5

Indonesian4

Korean4Swedish

3

Urdu3

Turkish3

Thai3Danish

3

Czech3

Unknown3

Croatian2

Persian2

Tamil2

Bengali2

Music2

Hungarian2

Norwegian2

Sanskrit2

Vietnamese1

Ukrainian1

Greek1

Bulgarian1Serbian

1Armenian

1Romanian

1Marathi

1

Ancient-Greek1 Panjabi

1

Telugu1Malay

1

Catalan1

Malayalam1

Multiple1

Finnish1

Slovak1

Language Distribution (2)

The next 40 languages make up ~13 of total

Dates

Copyright Distribution

In-copyright or unde-termined

69

Public Domain (worldwide)

15

US Federal Government Documents (worldwide)

4

Public Domain(US)11

Open Access1

Creative Commons 04

1010

8

110

9

410

9

710

9

1010

9

111

0

411

0

711

0

1011

0

111

1

411

1

711

1

1011

1

111

2

411

2

711

2

1011

2

111

30

10

20

30

40

50

60

70

80

90

100

Boston CollegeFloridaYaleUtah StateUNC-Chapel HillPurduePenn StateNorthwesternNCSUIllinoisDukeChicagoVirginiaMinnesotaMadridLoCHarvardColumbiaIndianaPrincetonNYPLCornellWisconsinCaliforniaMichigan

Source

Bibliographic Data

Content Package

MichiganIndiana

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

Datasets

Source

Bibliographic Data

Content Package

MichiganIndiana

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

Datasets

Source

Bibliographic Data

Content Package

MichiganIndiana

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

DatasetsTDR

Source

Bibliographic Data

Content Package

MichiganIndiana

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

Datasets

Source

Bibliographic Data

Content Package

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

DatasetsMichigan

Indiana

Source

Bibliographic Data

Content Package

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

DatasetsMichigan

Indiana

Source

Bibliographic Data

Content Package

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

DatasetsMichigan

Indiana

We engage in preservationfor purposes of access

Source

Bibliographic Data

Content Package

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

DatasetsMichigan

Indiana

Making Content Available

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Descriptive headings added (hidden from GUI with CSS)

Info about SSD service amp link to accessibility page

Images used for style are in css so no need to use alt tags

Skip navigation link

Access keys for navigating pages with keyboard

Added labels amp descriptive titles to forms amp ToC table

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

APIs

bull Data APIndash Volume and rights informationndash Page imagesndash OCR

bull Bibliographic APIndash Volume and rights informationndash MARC records

bull OAIbull ldquoHathifilesrdquo

Datasets

bull Google-digitized ~28 million texts Requires proposal to HathiTrust Agreement with Google Statement on usemanagement

bull Non-Google-digitized ~370000 texts Freely available Statement on management

Research Center

bull Environment to perform research on HathiTrust corpus

bull httpwwwhathitrustorghtrc

bull httplibumichedumpachbull Package of tools to enable publication of open

access born-digital journal content directly into HathiTrustndash Including accompanying data and media files

bull Allows integration with popular journal publishing tools such as Open Journal Systems (OJS)

Source Archive

Editorial Market

Higher Education

Access Determinations

bull Automatedbull Manual

Automatic Rights Determination

bull Conducted on all works at time of ingest and when records are modifiedndash Public domain worldwide

bull US works published before 1923 US federal government publications non-US works published prior to 1873

ndash Public domain in the United Statesbull Non-US works published prior to 1923

Manual Rights Determination

bull IMLS-funded CRMS projectndash CRMS-US

bull 2008 US-published works 1923-1963bull Staff at 4 partner institutions

ndash CRMS-Worldbull 2011 Expanded to non-US worksbull Staff at 16 partner institutions

ndash Double review with additional expert review for conflictsndash Compliance with copyright formalitiesndash As of January 2013 241541 reviewed more than 132644

openedbull Rights Holder Permissions

bull System of Precedence

Rights Database

Bibliographic (automatic)

Manual

Lawful uses

bull Users who have print disabilitiesndash All in-copyright works in HathiTrust currently

owned (or owned previously) by the partner institution

ndash Must be authenticatedndash Must be on US soilndash One simultaneous access per copy ownedndash httpwwwhathitrustorgaccessibility

Lawful uses (2)

bull Out of print and brittle missingndash Works must be currently owned (or owned

previously) by the partner institutionndash Must be authenticated or accessing work from

library premisesndash Must be on US soilndash One simultaneous access per copy ownedndash httpwwwhathitrustorgout-of-print-brittle

bull Access and use statementsndash httpwwwhathitrustorgaccess_use

Outline

bull The Big Idea ndash Mission and Goals

bull What wersquore doing to get there ndash Repository and Content ndash Making content available ndash Organizational structure

bull How HathiTrust can change the way we work

e-Commerce

Print on Demand

Content Ingest

Transformation

Validation

Content Access

PageTurner

Collection Builder

Large-scale Search

Bibliographic Catalog

Research Center

APIs

Quality Assurance

Quality Review

Content Certification

User Services

Usability

User support (helpdesk)

Outreach

Project website

Monthly newsletter

Papers and presentations

Communication with potential

partners

Surveys general inquiries

Repository evaluation and

audit (eg DRAMBORA

TRAC)

Legal

Risk management (use of materials)

Partner agreements

Advocacy

Governance

Budget Finances

Decision-making

Policy

Planning

Enterprise Management

Communication and Coordination

with partner institutions

Project management

Repository Administration

Hardware configuration and

maintenance

Web and application server configuration and

maintenance

Security

Permissions

Logging

Repository Administration

Data management (content storage backup integrity checks deletion)

Hardware selection and replacement

Content and Metadata

specifications

Disaster Recovery

Processes for ensuring content

integrity

Rights Management

Copyright determination

Copyright review

Copyright information

management (database)

Rightsholder permissions

Bibliographic Data

Management

Entity description (record-level)

Object identification (item-

level)

Data availability

Collection Development

Digitalbull Expansion beyond

books and journals (born-digital images and maps audio)

bull Selection of content (for non-Google volume ingest and pilots projects)

Printbull Cloud Library (effect

of digital on print)

Financial contributions of partners

HathiTrust Functional Framework

HathiTrust

Strategic Advisory BoardBudgetFinances Decision-making

Guidance on Policy Planning

bull Driven by needs of institutionsbull Leverage across the partnershipbull Projects Print on Demand Grant Work Ingest Specifications

PageTurner Bibliographic Data Management

Executive Committee

Collective Work Working Groups and Committees

Operationalbull Communicationsbull User Supportbull User Experience

Operationalbull Communicationsbull User Supportbull User Experience

Strategicbull Collectionsbull Discovery Interfacebull Full-text Search

Distributed work

Constitutional Convention

bull October 2011bull 52 partnersbull 3-year review overseen by SABbull Ballot Proposals

ndash Print monograph storagendash Approval Process for development initiativesndash US Government Documentsndash Fee-for-service content depositndash Governance

HathiTrust

Executive Committee

Strategic Advisory

Board

BudgetFinancesDecision-making

Guidance on Policy Planning

bull 12-member Board of Governors

bull Chief Executive Officer bull Executive Committee

Governance

bull Efficient practicalbull Inclusive collective

Outline

bull The Big Idea ndash Mission and Goals

bull What wersquore doing to get there ndash Repository and Content ndash Making content available ndash Organizational structure

bull How HathiTrust can change the way we work

How HathiTrust Can Change the Way We Work

Seeing collective problems as collective

Breakdown of HathiTrust book corpus by publication date

Bibliographic Indeterminacy and the Scale of Problems and Opportunities of Rights in Digital Collection Building ndash 22011

42

19

20

19

Breakdown of HathiTrust book corpus by publication date

42

19

20

19

Copyright status of books published pre-1923 and US works published 1923-1963

42

19

20

Copyright status of books published pre-1923 and US works published 1923-1963

42

19

20

19

Copyright status of books published pre-1923 and US works published 1923-1963

In Print 42

19

20

19

bull Identificationbull Descriptionbull Rights

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic records

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic recordsndash Bib records and objects

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic recordsndash Bib records and objectsndash Digital objects

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic recordsndash Bib records and objectsndash Digital objectsndash Digital and print

Relationships

Understanding the relationship between the collective and local

1st model Price per GB

2008 2009 2010 2011 2012 (Oct)Total Volumes 2477871 5221092 7836698 9966572 10531566

Public Domain 372085 758947 1959223 2712626 3218132

2008 2009 2010 2011 2012 (Oct)0

2000000

4000000

6000000

8000000

10000000

12000000

Total VolumesPublic Domain

0 20 40 60 80 100 1200

10

20

30

40

50

60

Rank in 2008 ARL Investment Index

o

f Tit

les

in L

ocal

Col

lecti

on

A global change in the library environment

June 2010Median duplication 31

June 2009Median duplication 19

Academic print book collection already substantially duplicated in mass digitized book corpus

Courtesy of Constance Malpas OCLC Research

Digitized Books in Shared Repositories

Sep-09 Oct-09 Nov-09 Dec-09 Jan-10 Feb-10 Mar-10 Apr-10 May-10 Jun-100

500000

1000000

1500000

2000000

2500000

3000000

3500000

Mass digitized books in Hathi digital repository Mass digitized books in shared print repositories

Uni

que

Titl

es

~75 of mass digitized corpus is lsquobacked uprsquo in one or more shared print repositories

~35M titles

~25M

Courtesy of Constance Malpas OCLC Research

Collection Overlap

bull More than 50 median overlap with ARL institutions higher for small liberal arts colleges

bull New Pricing model based on Print holdingsndash httpwwwhathitrustorgcostndash Requires print holdings databasendash Also support expansion of legal uses efforts in de-

duplicationndash Facilitate individual and collaborative collection

development and management operationsbull Print monographs archiving

Sourcing and Scalinghttporweblogoclcorgarchives002058html

bull Scalendash Institution-scalendash Group-scalendash Web-scale

bull Sourcingndash Institutionalndash Collaborativendash Third-party

A new kind of library

Thank you

How to find out more

bull About httpwwwhathitrustorgaboutbull Twitter httptwittercomhathitrustbull Facebook httpwwwfacebookcomhathitrustbull Monthly newsletter

ndash httpwwwhathitrustorgupdatesndash RSS httpwwwhathitrustorgupdates_rss

bull Contact us feedbackissueshathitrustorgbull Blogs httpwwwhathitrustorgblogs

ndash Large-scale Searchndash Perspectives from HathiTrust

  • Your Library Now Online Putting HathiTrust in the Context of
  • Outline
  • The Big Idea
  • Partnership
  • Digital Repository
  • The Name
  • Mission
  • HathiTrust
  • Collections and Collaboration
  • What we are doing to get there
  • Slide 11
  • Slide 12
  • Repository and Content
  • Content Sources
  • Language Distribution (1)
  • Language Distribution (2)
  • Dates
  • Copyright Distribution
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Making Content Available
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • APIs
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Datasets
  • Research Center
  • Slide 54
  • Slide 55
  • Access Determinations
  • Automatic Rights Determination
  • Manual Rights Determination
  • Rights Database
  • Lawful uses
  • Lawful uses (2)
  • Outline (2)
  • Slide 63
  • Slide 64
  • Constitutional Convention
  • Slide 66
  • Governance
  • Outline (3)
  • How HathiTrust Can Change the Way We Work
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Relationships
  • Relationships (2)
  • Relationships (3)
  • Relationships (4)
  • Relationships (5)
  • Slide 81
  • Slide 82
  • Slide 83
  • A global change in the library environment
  • Digitized Books in Shared Repositories
  • Collection Overlap
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Thank you
  • How to find out more
Page 10: HATHITRUST A Shared Digital Repository Your Library, Now Online! Putting HathiTrust in the Context of Traditional (and New) Library Services MCLS Webinar

What we are doing to get there

Cost-effective long-term preservation and access for digitized content

bull Facilitate decision-making about digitization and print collection management

bull Facilitate activities such as discovery copyright review use of materials

Repository and Content

Content Sources

Michigan 43%, California 32%, Wisconsin 5%, Cornell 4%, NYPL 2%, Princeton 2%, Indiana 2%, Columbia 1%, Harvard 2%, LoC 1%, Madrid 1%, Minnesota 1%, Illinois 1%

Language Distribution (1)

The top 10 languages make up ~86% of all content.

English 48%, German 9%, French 7%, Spanish 5%, Chinese 4%, Russian 4%, Japanese 3%, Italian 2%, Arabic 2%, Latin 1%, remaining languages 14%

Language Distribution (2)

The next 40 languages make up ~13% of the total (the percentages below are shares within this group).

Undetermined 7%, Polish 7%, Portuguese 7%, Dutch 5%, Hebrew 5%, Hindi 5%, Indonesian 4%, Korean 4%, Swedish 3%, Urdu 3%, Turkish 3%, Thai 3%, Danish 3%, Czech 3%, Unknown 3%, Croatian 2%, Persian 2%, Tamil 2%, Bengali 2%, Music 2%, Hungarian 2%, Norwegian 2%, Sanskrit 2%, Vietnamese 1%, Ukrainian 1%, Greek 1%, Bulgarian 1%, Serbian 1%, Armenian 1%, Romanian 1%, Marathi 1%, Ancient Greek 1%, Panjabi 1%, Telugu 1%, Malay 1%, Catalan 1%, Malayalam 1%, Multiple 1%, Finnish 1%, Slovak 1%

Dates

Copyright Distribution

In copyright or undetermined 69%, Public domain (worldwide) 15%, US federal government documents (worldwide) 4%, Public domain (US) 11%, Open access 1%, Creative Commons 0.4%

[Chart: share of volumes (0 to 100%) by contributing institution over time, 2008 to 2013. Contributors: Michigan, California, Wisconsin, Cornell, NYPL, Princeton, Indiana, Columbia, Harvard, LoC, Madrid, Minnesota, Virginia, Chicago, Duke, Illinois, NCSU, Northwestern, Penn State, Purdue, UNC-Chapel Hill, Utah State, Yale, Florida, Boston College.]

[Diagram: HathiTrust repository architecture, built up over several slides. Components: Source (Bibliographic Data, Content Package); Ingest; Data Management; Bib Data; Rights Data; Holdings Data; Storage (Michigan, Indiana); Access (Catalog, Full-text Search, PageTurner, APIs, Collections, Datasets). One build labels the repository as a TDR (trusted digital repository).]

We engage in preservation for purposes of access.

Making Content Available

[Diagram: Access: Catalog, Full-text Search, PageTurner, APIs, Collections, Datasets]

• Descriptive headings added (hidden from the GUI with CSS)
• Info about SSD service & link to accessibility page
• Images used for style are in CSS, so no need to use alt tags
• Skip navigation link
• Access keys for navigating pages with the keyboard
• Added labels & descriptive titles to forms & ToC table

APIs

• Data API
  – Volume and rights information
  – Page images
  – OCR
• Bibliographic API
  – Volume and rights information
  – MARC records
• OAI
• "Hathifiles"
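As a rough illustration of how a client might use the Bibliographic API, the sketch below builds a single-identifier request URL and pulls volume identifiers and rights codes out of a JSON reply. The base URL, the `brief`/`full` path segment, the identifier types, and the `items`/`htid`/`rightsCode` field names reflect our understanding of the public API documentation and should be treated as assumptions to verify; `bib_api_url` and `summarize_items` are hypothetical helper names.

```python
import json
from urllib.parse import quote

# Assumed endpoint layout: <base>/<level>/<id type>/<id value>.json
BIB_API_BASE = "https://catalog.hathitrust.org/api/volumes"
ID_TYPES = {"oclc", "lccn", "issn", "isbn", "htid", "recordnumber"}
LEVELS = {"brief", "full"}  # "full" is assumed to add MARC records


def bib_api_url(id_type: str, id_value: str, level: str = "brief") -> str:
    """Build a single-identifier Bibliographic API request URL."""
    if id_type not in ID_TYPES:
        raise ValueError(f"unsupported identifier type: {id_type}")
    if level not in LEVELS:
        raise ValueError(f"unsupported response level: {level}")
    # Percent-encode the identifier in case it contains reserved characters.
    return f"{BIB_API_BASE}/{level}/{id_type}/{quote(id_value, safe='')}.json"


def summarize_items(response_text: str) -> list[tuple[str, str]]:
    """Return (htid, rightsCode) pairs from a JSON response body.

    Assumes the reply carries an "items" list whose entries each have
    "htid" and "rightsCode" keys; a missing list yields no pairs.
    """
    payload = json.loads(response_text)
    return [(item["htid"], item["rightsCode"]) for item in payload.get("items", [])]
```

Fetching is deliberately left to the caller (for example with `urllib.request.urlopen(url).read()`), so the URL building and parsing can be checked offline.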

Datasets

• Google-digitized: ~2.8 million texts; requires a proposal to HathiTrust, an agreement with Google, and a statement on use/management

• Non-Google-digitized: ~370,000 texts; freely available; statement on management

Research Center

• An environment for performing research on the HathiTrust corpus

• http://www.hathitrust.org/htrc

• http://lib.umich.edu/mpach
• Package of tools to enable publication of open access, born-digital journal content directly into HathiTrust
  – Including accompanying data and media files

• Allows integration with popular journal publishing tools such as Open Journal Systems (OJS)

[Diagram: Source Archive; Editorial Market; Higher Education]

Access Determinations

• Automated
• Manual

Automatic Rights Determination

• Conducted on all works at time of ingest and when records are modified
  – Public domain worldwide
    • US works published before 1923, US federal government publications, non-US works published prior to 1873
  – Public domain in the United States
    • Non-US works published prior to 1923
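The cutoffs above amount to a small decision procedure. The sketch below encodes them, assuming each record exposes a publication year, a US-publication flag, and a US federal government document flag. The `Work` type and the `pd`/`pdus`/`ic`/`und` labels are illustrative stand-ins modeled on rights-database codes, and treating every remaining work as `ic` until a manual review says otherwise is our simplification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Work:
    """Minimal, hypothetical view of the fields the rules need."""
    pub_year: Optional[int]     # None when no usable date is recorded
    us_published: bool          # True if published in the United States
    us_federal_gov_doc: bool    # True for US federal government publications


def automatic_rights(work: Work) -> str:
    """Apply the 1923/1873 cutoffs and return an illustrative rights label.

    "pd"   public domain worldwide
    "pdus" public domain in the United States only
    "ic"   in copyright (default until a manual review says otherwise)
    "und"  undetermined (no usable publication date)
    """
    if work.us_federal_gov_doc:
        return "pd"
    if work.pub_year is None:
        return "und"
    if work.us_published:
        return "pd" if work.pub_year < 1923 else "ic"
    if work.pub_year < 1873:
        return "pd"
    if work.pub_year < 1923:
        return "pdus"
    return "ic"
```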

Manual Rights Determination

• IMLS-funded CRMS project
  – CRMS-US
    • 2008: US-published works, 1923-1963
    • Staff at 4 partner institutions
  – CRMS-World
    • 2011: Expanded to non-US works
    • Staff at 16 partner institutions
  – Double review, with additional expert review for conflicts
  – Compliance with copyright formalities
  – As of January 2013: 241,541 reviewed; more than 132,644 opened
• Rights Holder Permissions
• System of Precedence
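The double-review step can be pictured as follows: two independent determinations that agree are accepted, and any disagreement is escalated to an expert. This is a hypothetical sketch of that flow, not CRMS code; `adjudicate`, the tuple layout, and the `expert` callback are assumptions made for illustration.

```python
from typing import Callable, Iterable

# (volume_id, first reviewer's determination, second reviewer's determination)
Review = tuple[str, str, str]


def adjudicate(
    reviews: Iterable[Review],
    expert: Callable[[str, str, str], str],
) -> tuple[dict[str, str], list[str]]:
    """Accept agreeing double reviews and send conflicts to an expert.

    Returns the final determination per volume and the list of
    volume IDs that needed expert review.
    """
    final: dict[str, str] = {}
    escalated: list[str] = []
    for volume_id, first, second in reviews:
        if first == second:
            final[volume_id] = first
        else:
            escalated.append(volume_id)
            final[volume_id] = expert(volume_id, first, second)
    return final, escalated
```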

Rights Database

Bibliographic (automatic)

Manual
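The slides do not spell out the precedence rules, so the following is only a sketch under an assumed ordering: a rights-holder permission outranks a manual review, which outranks an automatic bibliographic determination, and within one tier the most recent entry wins. `Determination`, `PRECEDENCE`, and `current_rights` are hypothetical names.

```python
from dataclasses import dataclass

# Assumed ordering of determination sources; a higher number wins.
PRECEDENCE = {
    "bibliographic": 0,            # automatic, derived from the catalog record
    "manual_review": 1,            # e.g. a CRMS determination
    "rightsholder_permission": 2,  # explicit permission from the rights holder
}


@dataclass(frozen=True)
class Determination:
    rights: str     # e.g. "pd", "ic"
    source: str     # a key of PRECEDENCE
    timestamp: int  # when the determination was recorded


def current_rights(history: list[Determination]) -> str:
    """Pick the governing determination: highest tier first, then newest."""
    if not history:
        raise ValueError("no determinations recorded")
    winner = max(history, key=lambda d: (PRECEDENCE[d.source], d.timestamp))
    return winner.rights
```

Under this ordering, a newer automatic determination never overrides an older manual one, which is the main property a precedence system of this kind is meant to guarantee.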

Lawful uses

• Users who have print disabilities
  – All in-copyright works in HathiTrust currently owned (or previously owned) by the partner institution
  – Must be authenticated
  – Must be on US soil
  – One simultaneous access per copy owned
  – http://www.hathitrust.org/accessibility

Lawful uses (2)

• Out of print and brittle or missing
  – Works must be currently owned (or previously owned) by the partner institution
  – Must be authenticated or accessing the work from library premises
  – Must be on US soil
  – One simultaneous access per copy owned
  – http://www.hathitrust.org/out-of-print-brittle

• Access and use statements
  – http://www.hathitrust.org/access_use
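Both lawful-use services share the same gate: an eligible user on US soil, and no more simultaneous sessions for a work than the institution holds print copies. A minimal in-memory sketch of that check follows; `AccessLedger` and its methods are hypothetical, and a production service would need persistent, concurrency-safe state.

```python
from collections import defaultdict


class AccessLedger:
    """Allow at most one simultaneous session per print copy owned."""

    def __init__(self, copies_owned: dict[str, int]) -> None:
        self._copies = dict(copies_owned)
        self._active = defaultdict(int)  # volume ID -> open sessions

    def checkout(self, volume_id: str, authorized: bool, in_us: bool) -> bool:
        """Open a session if the user qualifies and a copy is free.

        `authorized` stands for whichever condition the service requires
        (authentication, or being on library premises for brittle works).
        """
        if not (authorized and in_us):
            return False
        if self._active[volume_id] >= self._copies.get(volume_id, 0):
            return False
        self._active[volume_id] += 1
        return True

    def release(self, volume_id: str) -> None:
        """Close a session, freeing one copy for the next user."""
        if self._active[volume_id] == 0:
            raise ValueError(f"no active session for {volume_id}")
        self._active[volume_id] -= 1
```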

Outline

• The Big Idea
  – Mission and Goals
• What we're doing to get there
  – Repository and Content
  – Making content available
  – Organizational structure
• How HathiTrust can change the way we work

HathiTrust Functional Framework

• e-Commerce: Print on Demand
• Content Ingest: Transformation; Validation
• Content Access: PageTurner; Collection Builder; Large-scale Search; Bibliographic Catalog; Research Center; APIs
• Quality Assurance: Quality Review; Content Certification
• User Services: Usability; User support (helpdesk)
• Outreach: Project website; Monthly newsletter; Papers and presentations; Communication with potential partners; Surveys, general inquiries
• Repository evaluation and audit (e.g., DRAMBORA, TRAC)
• Legal: Risk management (use of materials); Partner agreements; Advocacy
• Governance: Budget/Finances; Decision-making; Policy; Planning
• Enterprise Management: Communication and coordination with partner institutions; Project management
• Repository Administration: Hardware configuration and maintenance; Web and application server configuration and maintenance; Security; Permissions; Logging; Data management (content storage, backup, integrity checks, deletion); Hardware selection and replacement; Content and metadata specifications; Disaster recovery; Processes for ensuring content integrity
• Rights Management: Copyright determination; Copyright review; Copyright information management (database); Rightsholder permissions
• Bibliographic Data Management: Entity description (record-level); Object identification (item-level); Data availability
• Collection Development
  – Digital: expansion beyond books and journals (born-digital, images and maps, audio); selection of content (for non-Google volume ingest and pilot projects)
  – Print: Cloud Library (effect of digital on print)
• Financial contributions of partners

HathiTrust
Executive Committee: Budget/Finances; Decision-making
Strategic Advisory Board: Guidance on Policy; Planning

Collective Work: Working Groups and Committees
• Operational: Communications; User Support; User Experience
• Strategic: Collections; Discovery Interface; Full-text Search

Distributed work
• Driven by needs of institutions
• Leverage across the partnership
• Projects: Print on Demand, Grant Work, Ingest Specifications, PageTurner, Bibliographic Data Management

Constitutional Convention

• October 2011
• 52 partners
• 3-year review overseen by SAB
• Ballot proposals
  – Print monograph storage
  – Approval process for development initiatives
  – US Government Documents
  – Fee-for-service content deposit
  – Governance

HathiTrust
Executive Committee: Budget/Finances; Decision-making
Strategic Advisory Board: Guidance on Policy; Planning

• 12-member Board of Governors
• Chief Executive Officer
• Executive Committee

Governance

• Efficient, practical
• Inclusive, collective

Outline

• The Big Idea
  – Mission and Goals
• What we're doing to get there
  – Repository and Content
  – Making content available
  – Organizational structure
• How HathiTrust can change the way we work

How HathiTrust Can Change the Way We Work

Seeing collective problems as collective

[Charts: "Breakdown of HathiTrust book corpus by publication date" and "Copyright status of books published pre-1923 and US works published 1923-1963"; segment values 42%, 19%, 20%, 19%, with the 42% segment labeled "In Print" in the final build.]

Source: Bibliographic Indeterminacy and the Scale of Problems and Opportunities of Rights in Digital Collection Building, 2/2011

Relationships

• Identification
• Description
• Rights
• Relationships
  – Bibliographic records
  – Bib records and objects
  – Digital objects
  – Digital and print

Understanding the relationship between the collective and local

1st model Price per GB

Year        Total volumes   Public domain
2008            2,477,871         372,085
2009            5,221,092         758,947
2010            7,836,698       1,959,223
2011            9,966,572       2,712,626
2012 (Oct)     10,531,566       3,218,132

[Chart: y-axis "% of Titles in Local Collection" (0 to 60); x-axis "Rank in 2008 ARL Investment Index" (0 to 120).]
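The volume counts for 2008 to 2012 (Oct) also show the public-domain share of the collection roughly doubling, from about 15% to about 31%. The short calculation below derives that share and the overall growth from the listed figures; the helper names are ours.

```python
YEARS = ["2008", "2009", "2010", "2011", "2012 (Oct)"]
TOTAL = [2477871, 5221092, 7836698, 9966572, 10531566]
PUBLIC_DOMAIN = [372085, 758947, 1959223, 2712626, 3218132]


def public_domain_share(total: list[int], public_domain: list[int]) -> list[float]:
    """Fraction of all volumes that are public domain, year by year."""
    return [pd / t for t, pd in zip(total, public_domain)]


def growth_factor(series: list[int]) -> float:
    """Ratio of the last value in a series to the first."""
    return series[-1] / series[0]


for year, share in zip(YEARS, public_domain_share(TOTAL, PUBLIC_DOMAIN)):
    print(f"{year}: {share:.1%} public domain")
print(f"Total volumes grew {growth_factor(TOTAL):.2f}x; "
      f"public-domain volumes grew {growth_factor(PUBLIC_DOMAIN):.2f}x")
```

Over the period, total volumes grow about 4.25x while public-domain volumes grow about 8.65x, which is why the public-domain share rises even as the collection expands.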

A global change in the library environment

June 2010: median duplication 31%

June 2009: median duplication 19%

Academic print book collection already substantially duplicated in mass digitized book corpus

Courtesy of Constance Malpas, OCLC Research

Digitized Books in Shared Repositories

[Chart: unique titles (0 to 3,500,000), September 2009 to June 2010, comparing mass digitized books in the Hathi digital repository with mass digitized books in shared print repositories; annotations ~3.5M titles and ~2.5M.]

~75% of mass digitized corpus is ‘backed up’ in one or more shared print repositories

Courtesy of Constance Malpas, OCLC Research

Collection Overlap

• More than 50% median overlap with ARL institutions; higher for small liberal arts colleges

• New pricing model based on print holdings
  – http://www.hathitrust.org/cost
  – Requires print holdings database
  – Also supports expansion of legal uses and efforts in de-duplication
  – Facilitates individual and collaborative collection development and management operations
• Print monographs archiving
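Median overlap figures like the one above can be computed by matching each institution's print holdings against the digitized corpus and taking the median of the per-institution overlap fractions. The sketch assumes holdings are matched on a shared identifier such as an OCLC number; real holdings matching is considerably messier, and all names here are illustrative.

```python
from statistics import median


def overlap_fraction(local_holdings: set[str], shared_corpus: set[str]) -> float:
    """Share of one institution's titles that also appear in the shared corpus."""
    if not local_holdings:
        return 0.0
    return len(local_holdings & shared_corpus) / len(local_holdings)


def median_overlap(institutions: dict[str, set[str]], shared_corpus: set[str]) -> float:
    """Median of the per-institution overlap fractions."""
    return median(overlap_fraction(h, shared_corpus) for h in institutions.values())
```

Using the median rather than the mean keeps a few very large or very small collections from dominating the summary figure.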

Sourcing and Scaling: http://orweblog.oclc.org/archives/002058.html

• Scale
  – Institution-scale
  – Group-scale
  – Web-scale
• Sourcing
  – Institutional
  – Collaborative
  – Third-party

A new kind of library

Thank you

How to find out more

• About: http://www.hathitrust.org/about
• Twitter: http://twitter.com/hathitrust
• Facebook: http://www.facebook.com/hathitrust
• Monthly newsletter
  – http://www.hathitrust.org/updates
  – RSS: http://www.hathitrust.org/updates_rss
• Contact us: feedback@issues.hathitrust.org
• Blogs: http://www.hathitrust.org/blogs
  – Large-scale Search
  – Perspectives from HathiTrust

  • Your Library Now Online Putting HathiTrust in the Context of
  • Outline
  • The Big Idea
  • Partnership
  • Digital Repository
  • The Name
  • Mission
  • HathiTrust
  • Collections and Collaboration
  • What we are doing to get there
  • Slide 11
  • Slide 12
  • Repository and Content
  • Content Sources
  • Language Distribution (1)
  • Language Distribution (2)
  • Dates
  • Copyright Distribution
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Making Content Available
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • APIs
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Datasets
  • Research Center
  • Slide 54
  • Slide 55
  • Access Determinations
  • Automatic Rights Determination
  • Manual Rights Determination
  • Rights Database
  • Lawful uses
  • Lawful uses (2)
  • Outline (2)
  • Slide 63
  • Slide 64
  • Constitutional Convention
  • Slide 66
  • Governance
  • Outline (3)
  • How HathiTrust Can Change the Way We Work
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Relationships
  • Relationships (2)
  • Relationships (3)
  • Relationships (4)
  • Relationships (5)
  • Slide 81
  • Slide 82
  • Slide 83
  • A global change in the library environment
  • Digitized Books in Shared Repositories
  • Collection Overlap
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Thank you
  • How to find out more
Page 11: HATHITRUST A Shared Digital Repository Your Library, Now Online! Putting HathiTrust in the Context of Traditional (and New) Library Services MCLS Webinar

Cost-effective long-term preservation and access for digitized content

bull Facilitate decision-making about digitization and print collection management

bull Facilitate activities such as discovery copyright review use of materials

Repository and Content

Michigan 43

California 32

Wisconsin 5

Cornell 4NYPL 2

Princeton 2Indiana 2

Columbia 1

Harvard 2LoC 1 Madrid 1 Minnesota 1

Illinois 1

Content Sources

English48

German9

French7

Spanish5

Chinese4

Russian4

Japanese3

Italian2

Arabic2

Latin1

Remaining Languages

14

Language Distribution (1)

The top 10 languages make up ~86 of all content

Undetermined7

Polish7

Portuguese7

Dutch5

Hebrew5

Hindi5

Indonesian4

Korean4Swedish

3

Urdu3

Turkish3

Thai3Danish

3

Czech3

Unknown3

Croatian2

Persian2

Tamil2

Bengali2

Music2

Hungarian2

Norwegian2

Sanskrit2

Vietnamese1

Ukrainian1

Greek1

Bulgarian1Serbian

1Armenian

1Romanian

1Marathi

1

Ancient-Greek1 Panjabi

1

Telugu1Malay

1

Catalan1

Malayalam1

Multiple1

Finnish1

Slovak1

Language Distribution (2)

The next 40 languages make up ~13 of total

Dates

Copyright Distribution

In-copyright or unde-termined

69

Public Domain (worldwide)

15

US Federal Government Documents (worldwide)

4

Public Domain(US)11

Open Access1

Creative Commons 04

1010

8

110

9

410

9

710

9

1010

9

111

0

411

0

711

0

1011

0

111

1

411

1

711

1

1011

1

111

2

411

2

711

2

1011

2

111

30

10

20

30

40

50

60

70

80

90

100

Boston CollegeFloridaYaleUtah StateUNC-Chapel HillPurduePenn StateNorthwesternNCSUIllinoisDukeChicagoVirginiaMinnesotaMadridLoCHarvardColumbiaIndianaPrincetonNYPLCornellWisconsinCaliforniaMichigan

Source

Bibliographic Data

Content Package

MichiganIndiana

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

Datasets

Source

Bibliographic Data

Content Package

MichiganIndiana

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

Datasets

Source

Bibliographic Data

Content Package

MichiganIndiana

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

DatasetsTDR

Source

Bibliographic Data

Content Package

MichiganIndiana

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

Datasets

Source

Bibliographic Data

Content Package

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

DatasetsMichigan

Indiana

Source

Bibliographic Data

Content Package

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

DatasetsMichigan

Indiana

Source

Bibliographic Data

Content Package

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

DatasetsMichigan

Indiana

We engage in preservationfor purposes of access

Source

Bibliographic Data

Content Package

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

DatasetsMichigan

Indiana

Making Content Available

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Descriptive headings added (hidden from GUI with CSS)

Info about SSD service amp link to accessibility page

Images used for style are in css so no need to use alt tags

Skip navigation link

Access keys for navigating pages with keyboard

Added labels amp descriptive titles to forms amp ToC table

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

APIs

bull Data APIndash Volume and rights informationndash Page imagesndash OCR

bull Bibliographic APIndash Volume and rights informationndash MARC records

bull OAIbull ldquoHathifilesrdquo

Datasets

bull Google-digitized ~28 million texts Requires proposal to HathiTrust Agreement with Google Statement on usemanagement

bull Non-Google-digitized ~370000 texts Freely available Statement on management

Research Center

bull Environment to perform research on HathiTrust corpus

bull httpwwwhathitrustorghtrc

bull httplibumichedumpachbull Package of tools to enable publication of open

access born-digital journal content directly into HathiTrustndash Including accompanying data and media files

bull Allows integration with popular journal publishing tools such as Open Journal Systems (OJS)

Source Archive

Editorial Market

Higher Education

Access Determinations

bull Automatedbull Manual

Automatic Rights Determination

bull Conducted on all works at time of ingest and when records are modifiedndash Public domain worldwide

bull US works published before 1923 US federal government publications non-US works published prior to 1873

ndash Public domain in the United Statesbull Non-US works published prior to 1923

Manual Rights Determination

bull IMLS-funded CRMS projectndash CRMS-US

bull 2008 US-published works 1923-1963bull Staff at 4 partner institutions

ndash CRMS-Worldbull 2011 Expanded to non-US worksbull Staff at 16 partner institutions

ndash Double review with additional expert review for conflictsndash Compliance with copyright formalitiesndash As of January 2013 241541 reviewed more than 132644

openedbull Rights Holder Permissions

bull System of Precedence

Rights Database

Bibliographic (automatic)

Manual

Lawful uses

bull Users who have print disabilitiesndash All in-copyright works in HathiTrust currently

owned (or owned previously) by the partner institution

ndash Must be authenticatedndash Must be on US soilndash One simultaneous access per copy ownedndash httpwwwhathitrustorgaccessibility

Lawful uses (2)

bull Out of print and brittle missingndash Works must be currently owned (or owned

previously) by the partner institutionndash Must be authenticated or accessing work from

library premisesndash Must be on US soilndash One simultaneous access per copy ownedndash httpwwwhathitrustorgout-of-print-brittle

bull Access and use statementsndash httpwwwhathitrustorgaccess_use

Outline

bull The Big Idea ndash Mission and Goals

bull What wersquore doing to get there ndash Repository and Content ndash Making content available ndash Organizational structure

bull How HathiTrust can change the way we work

e-Commerce

Print on Demand

Content Ingest

Transformation

Validation

Content Access

PageTurner

Collection Builder

Large-scale Search

Bibliographic Catalog

Research Center

APIs

Quality Assurance

Quality Review

Content Certification

User Services

Usability

User support (helpdesk)

Outreach

Project website

Monthly newsletter

Papers and presentations

Communication with potential

partners

Surveys general inquiries

Repository evaluation and

audit (eg DRAMBORA

TRAC)

Legal

Risk management (use of materials)

Partner agreements

Advocacy

Governance

Budget Finances

Decision-making

Policy

Planning

Enterprise Management

Communication and Coordination

with partner institutions

Project management

Repository Administration

Hardware configuration and

maintenance

Web and application server configuration and

maintenance

Security

Permissions

Logging

Repository Administration

Data management (content storage backup integrity checks deletion)

Hardware selection and replacement

Content and Metadata

specifications

Disaster Recovery

Processes for ensuring content

integrity

Rights Management

Copyright determination

Copyright review

Copyright information

management (database)

Rightsholder permissions

Bibliographic Data

Management

Entity description (record-level)

Object identification (item-

level)

Data availability

Collection Development

Digitalbull Expansion beyond

books and journals (born-digital images and maps audio)

bull Selection of content (for non-Google volume ingest and pilots projects)

Printbull Cloud Library (effect

of digital on print)

Financial contributions of partners

HathiTrust Functional Framework

HathiTrust

Strategic Advisory BoardBudgetFinances Decision-making

Guidance on Policy Planning

bull Driven by needs of institutionsbull Leverage across the partnershipbull Projects Print on Demand Grant Work Ingest Specifications

PageTurner Bibliographic Data Management

Executive Committee

Collective Work Working Groups and Committees

Operationalbull Communicationsbull User Supportbull User Experience

Operationalbull Communicationsbull User Supportbull User Experience

Strategicbull Collectionsbull Discovery Interfacebull Full-text Search

Distributed work

Constitutional Convention

bull October 2011bull 52 partnersbull 3-year review overseen by SABbull Ballot Proposals

ndash Print monograph storagendash Approval Process for development initiativesndash US Government Documentsndash Fee-for-service content depositndash Governance

HathiTrust

Executive Committee

Strategic Advisory

Board

BudgetFinancesDecision-making

Guidance on Policy Planning

bull 12-member Board of Governors

bull Chief Executive Officer bull Executive Committee

Governance

bull Efficient practicalbull Inclusive collective

Outline

bull The Big Idea ndash Mission and Goals

bull What wersquore doing to get there ndash Repository and Content ndash Making content available ndash Organizational structure

bull How HathiTrust can change the way we work

How HathiTrust Can Change the Way We Work

Seeing collective problems as collective

Breakdown of HathiTrust book corpus by publication date

Bibliographic Indeterminacy and the Scale of Problems and Opportunities of Rights in Digital Collection Building ndash 22011

42

19

20

19

Breakdown of HathiTrust book corpus by publication date

42

19

20

19

Copyright status of books published pre-1923 and US works published 1923-1963

42

19

20

Copyright status of books published pre-1923 and US works published 1923-1963

42

19

20

19

Copyright status of books published pre-1923 and US works published 1923-1963

In Print 42

19

20

19

bull Identificationbull Descriptionbull Rights

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic records

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic recordsndash Bib records and objects

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic recordsndash Bib records and objectsndash Digital objects

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic recordsndash Bib records and objectsndash Digital objectsndash Digital and print

Relationships

Understanding the relationship between the collective and local

1st model Price per GB

2008 2009 2010 2011 2012 (Oct)Total Volumes 2477871 5221092 7836698 9966572 10531566

Public Domain 372085 758947 1959223 2712626 3218132

2008 2009 2010 2011 2012 (Oct)0

2000000

4000000

6000000

8000000

10000000

12000000

Total VolumesPublic Domain

0 20 40 60 80 100 1200

10

20

30

40

50

60

Rank in 2008 ARL Investment Index

o

f Tit

les

in L

ocal

Col

lecti

on

A global change in the library environment

June 2010Median duplication 31

June 2009Median duplication 19

Academic print book collection already substantially duplicated in mass digitized book corpus

Courtesy of Constance Malpas OCLC Research

Digitized Books in Shared Repositories

Sep-09 Oct-09 Nov-09 Dec-09 Jan-10 Feb-10 Mar-10 Apr-10 May-10 Jun-100

500000

1000000

1500000

2000000

2500000

3000000

3500000

Mass digitized books in Hathi digital repository Mass digitized books in shared print repositories

Uni

que

Titl

es

~75 of mass digitized corpus is lsquobacked uprsquo in one or more shared print repositories

~35M titles

~25M

Courtesy of Constance Malpas OCLC Research

Collection Overlap

bull More than 50 median overlap with ARL institutions higher for small liberal arts colleges

bull New Pricing model based on Print holdingsndash httpwwwhathitrustorgcostndash Requires print holdings databasendash Also support expansion of legal uses efforts in de-

duplicationndash Facilitate individual and collaborative collection

development and management operationsbull Print monographs archiving

Sourcing and Scalinghttporweblogoclcorgarchives002058html

bull Scalendash Institution-scalendash Group-scalendash Web-scale

bull Sourcingndash Institutionalndash Collaborativendash Third-party

A new kind of library

Thank you

How to find out more

bull About httpwwwhathitrustorgaboutbull Twitter httptwittercomhathitrustbull Facebook httpwwwfacebookcomhathitrustbull Monthly newsletter

ndash httpwwwhathitrustorgupdatesndash RSS httpwwwhathitrustorgupdates_rss

bull Contact us feedbackissueshathitrustorgbull Blogs httpwwwhathitrustorgblogs

ndash Large-scale Searchndash Perspectives from HathiTrust

  • Your Library Now Online Putting HathiTrust in the Context of
  • Outline
  • The Big Idea
  • Partnership
  • Digital Repository
  • The Name
  • Mission
  • HathiTrust
  • Collections and Collaboration
  • What we are doing to get there
  • Slide 11
  • Slide 12
  • Repository and Content
  • Content Sources
  • Language Distribution (1)
  • Language Distribution (2)
  • Dates
  • Copyright Distribution
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Making Content Available
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • APIs
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Datasets
  • Research Center
  • Slide 54
  • Slide 55
  • Access Determinations
  • Automatic Rights Determination
  • Manual Rights Determination
  • Rights Database
  • Lawful uses
  • Lawful uses (2)
  • Outline (2)
  • Slide 63
  • Slide 64
  • Constitutional Convention
  • Slide 66
  • Governance
  • Outline (3)
  • How HathiTrust Can Change the Way We Work
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Relationships
  • Relationships (2)
  • Relationships (3)
  • Relationships (4)
  • Relationships (5)
  • Slide 81
  • Slide 82
  • Slide 83
  • A global change in the library environment
  • Digitized Books in Shared Repositories
  • Collection Overlap
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Thank you
  • How to find out more
Page 12: HATHITRUST A Shared Digital Repository Your Library, Now Online! Putting HathiTrust in the Context of Traditional (and New) Library Services MCLS Webinar

bull Facilitate decision-making about digitization and print collection management

bull Facilitate activities such as discovery copyright review use of materials

Repository and Content

Content Sources

Percentage of content by contributing institution: Michigan 43%, California 32%, Wisconsin 5%, Cornell 4%, NYPL 2%, Princeton 2%, Indiana 2%, Harvard 2%, Columbia 1%, LoC 1%, Madrid 1%, Minnesota 1%, Illinois 1%

Language Distribution (1)

The top 10 languages make up ~86% of all content:

English 48%, German 9%, French 7%, Spanish 5%, Chinese 4%, Russian 4%, Japanese 3%, Italian 2%, Arabic 2%, Latin 1%; remaining languages 14%

Language Distribution (2)

The next 40 languages make up ~13% of the total. Within that group:

Undetermined 7%, Polish 7%, Portuguese 7%, Dutch 5%, Hebrew 5%, Hindi 5%, Indonesian 4%, Korean 4%, Swedish 3%, Urdu 3%, Turkish 3%, Thai 3%, Danish 3%, Czech 3%, Unknown 3%, Croatian 2%, Persian 2%, Tamil 2%, Bengali 2%, Music 2%, Hungarian 2%, Norwegian 2%, Sanskrit 2%, Vietnamese 1%, Ukrainian 1%, Greek 1%, Bulgarian 1%, Serbian 1%, Armenian 1%, Romanian 1%, Marathi 1%, Ancient Greek 1%, Panjabi 1%, Telugu 1%, Malay 1%, Catalan 1%, Malayalam 1%, Multiple 1%, Finnish 1%, Slovak 1%

Dates

Copyright Distribution

• In copyright or undetermined: 69%
• Public domain (worldwide): 15%
• US federal government documents (worldwide): 4%
• Public domain (US): 11%
• Open access: 1%
• Creative Commons: 0.4%

[Chart: contributions by partner institution over time, October 2008 to January 2013 (y-axis 0–100%). Institutions shown: Michigan, California, Wisconsin, Cornell, NYPL, Princeton, Indiana, Columbia, Harvard, LoC, Madrid, Minnesota, Virginia, Chicago, Duke, Illinois, NCSU, Northwestern, Penn State, Purdue, UNC-Chapel Hill, Utah State, Yale, Florida, Boston College.]

[Diagram, built up over several slides: the HathiTrust repository. Labels: Source; Bibliographic Data; Content Package; Ingest; Bib Data; Rights Data; Holdings Data; Data Management; Storage; Michigan; Indiana; Access; Catalog; Full-text Search; PageTurner; APIs; Collections; Datasets. One build adds the label TDR.]

We engage in preservation for purposes of access.

Making Content Available

[Diagram, built up over several slides: Access, comprising Catalog, Full-text Search, PageTurner, APIs, Collections, and Datasets.]

• Descriptive headings added (hidden from GUI with CSS)
• Info about SSD service & link to accessibility page
• Images used for style are in CSS, so no need to use alt tags
• Skip navigation link
• Access keys for navigating pages with keyboard
• Added labels & descriptive titles to forms & ToC table


APIs

• Data API
  – Volume and rights information
  – Page images
  – OCR

• Bibliographic API
  – Volume and rights information
  – MARC records

• OAI
• “Hathifiles”
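The Bibliographic API is a read-only HTTP lookup keyed by a standard identifier. The sketch below builds a lookup URL and pulls volume IDs and rights statements out of a JSON reply. It assumes the brief-record URL pattern `https://catalog.hathitrust.org/api/volumes/brief/<type>/<id>.json`, the set of identifier types in `VALID_ID_TYPES`, and the response keys `items`, `htid`, and `usRightsString`; all of these reflect the service as we understand it and should be checked against current documentation. The sample response and volume IDs are invented, and no network call is made.

```python
import json
from urllib.parse import quote

# Assumed URL pattern for brief-record lookups; verify against current docs.
BIB_API_BASE = "https://catalog.hathitrust.org/api/volumes/brief"
# Identifier types we assume the service accepts.
VALID_ID_TYPES = {"oclc", "lccn", "issn", "isbn", "htid", "recordnumber"}


def bib_api_url(id_type: str, identifier: str) -> str:
    """Build the lookup URL for a single identifier."""
    if id_type not in VALID_ID_TYPES:
        raise ValueError(f"unsupported identifier type: {id_type}")
    return f"{BIB_API_BASE}/{id_type}/{quote(identifier, safe='')}.json"


def summarize_items(response_text: str) -> list[tuple[str, str]]:
    """Return (volume id, US rights statement) pairs from a JSON response.

    The "items", "htid" and "usRightsString" keys are assumed field names.
    """
    data = json.loads(response_text)
    return [(item["htid"], item.get("usRightsString", "unknown"))
            for item in data.get("items", [])]


# Hypothetical, abbreviated response used only to exercise the parser.
SAMPLE = json.dumps({
    "records": {"000000001": {"titles": ["Example title"]}},
    "items": [
        {"htid": "mdp.000000000000001", "usRightsString": "Full view"},
        {"htid": "uc1.b0000001", "usRightsString": "Limited (search-only)"},
    ],
})

if __name__ == "__main__":
    print(bib_api_url("oclc", "424023"))
    for htid, rights in summarize_items(SAMPLE):
        print(htid, rights)
```

Fetching the URL with any HTTP client and passing the body to `summarize_items` would list the matching volumes; the Data API (page images, OCR) is not sketched here.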

Datasets

• Google-digitized: ~2.8 million texts
  – Requires proposal to HathiTrust
  – Agreement with Google
  – Statement on use/management

• Non-Google-digitized: ~370,000 texts
  – Freely available
  – Statement on management

Research Center

• Environment to perform research on HathiTrust corpus

• http://www.hathitrust.org/htrc

• http://lib.umich.edu/mpach

• Package of tools to enable publication of open-access, born-digital journal content directly into HathiTrust
  – Including accompanying data and media files

• Allows integration with popular journal publishing tools such as Open Journal Systems (OJS)

Source Archive

Editorial Market

Higher Education

Access Determinations

• Automated
• Manual

Automatic Rights Determination

• Conducted on all works at time of ingest and when records are modified
  – Public domain worldwide
    • US works published before 1923; US federal government publications; non-US works published prior to 1873
  – Public domain in the United States
    • Non-US works published prior to 1923
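The automatic rules above amount to a small decision function over the publication year, place of publication, and a government-document flag. A simplified sketch, assuming those three facts have already been extracted from the bibliographic record; the short labels `pd`, `pdus`, `ic`, and `und` are used for illustration, and real records raise cases this ignores, such as date ranges and missing country codes.

```python
from typing import Optional

# Illustrative labels: "pd" = public domain worldwide,
# "pdus" = public domain in the US only, "ic" = treated as in copyright,
# "und" = cannot be decided automatically.


def automatic_rights(pub_year: Optional[int], published_in_us: bool,
                     us_federal_doc: bool = False) -> str:
    """Apply the date cutoffs listed on the slide to one bibliographic record."""
    if us_federal_doc:
        return "pd"            # US federal government publication
    if pub_year is None:
        return "und"           # no usable date: leave for manual review
    if published_in_us:
        return "pd" if pub_year < 1923 else "ic"
    if pub_year < 1873:
        return "pd"            # non-US work old enough to be PD worldwide
    if pub_year < 1923:
        return "pdus"          # non-US work, PD in the US only
    return "ic"
```

US works from 1923 onward fall through to `ic` here; those are the works the manual review described next takes up.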

Manual Rights Determination

• IMLS-funded CRMS project
  – CRMS-US
    • 2008: US-published works, 1923–1963
    • Staff at 4 partner institutions
  – CRMS-World
    • 2011: Expanded to non-US works
    • Staff at 16 partner institutions
  – Double review, with additional expert review for conflicts
  – Compliance with copyright formalities
  – As of January 2013: 241,541 reviewed; more than 132,644 opened

• Rights Holder Permissions

• System of Precedence
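The double-review step can be read as a simple resolution rule: when two independent reviewers agree, their determination stands; when they disagree, the volume goes to an expert. A minimal sketch with hypothetical function names and rights labels; it models only the agreement logic, not the CRMS interface or its checks on formalities.

```python
from typing import Callable, Optional

# Hypothetical signature for an expert who settles a disagreement.
Expert = Callable[[str, str], str]


def resolve_review(first: str, second: str,
                   expert: Optional[Expert] = None) -> Optional[str]:
    """Combine two independent determinations for one volume.

    Matching determinations are accepted. A conflict is settled by the
    expert if one is supplied; otherwise None means "awaiting expert".
    """
    if first == second:
        return first
    if expert is None:
        return None
    return expert(first, second)


def review_queue(pairs: dict[str, tuple[str, str]]) -> dict:
    """Split volumes into resolved determinations and ones pending expert review."""
    resolved: dict[str, str] = {}
    pending: list[str] = []
    for volume_id, (first, second) in pairs.items():
        outcome = resolve_review(first, second)
        if outcome is None:
            pending.append(volume_id)
        else:
            resolved[volume_id] = outcome
    return {"resolved": resolved, "pending": pending}
```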

Rights Database

Bibliographic (automatic)

Manual
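Because the rights database receives determinations from more than one source, some rule must decide which one is in force. The slides name a "system of precedence" without spelling out its order, so the sketch below simply assumes that manual determinations outrank automatic (bibliographic) ones and that, between equal sources, the most recent wins. The source names and ranking are hypothetical.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical ranking: higher number wins. Not a published HathiTrust order.
SOURCE_RANK = {"bibliographic": 1, "manual": 2}


@dataclass(frozen=True)
class Determination:
    rights: str      # e.g. "pd", "pdus", "ic", "und"
    source: str      # key into SOURCE_RANK
    recorded: date   # when the determination was made


def effective_rights(determinations: list[Determination]) -> str:
    """Pick the determination from the highest-ranked source, newest on ties."""
    if not determinations:
        raise ValueError("no determinations recorded")
    best = max(determinations,
               key=lambda d: (SOURCE_RANK[d.source], d.recorded))
    return best.rights
```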

Lawful uses

• Users who have print disabilities
  – All in-copyright works in HathiTrust currently owned (or owned previously) by the partner institution
  – Must be authenticated
  – Must be on US soil
  – One simultaneous access per copy owned
  – http://www.hathitrust.org/accessibility
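The "one simultaneous access per copy owned" condition behaves like a per-volume counter capped at the number of print copies an institution holds. A minimal sketch, assuming copy counts are already known per volume ID (for example, from a print holdings database); the class and method names are hypothetical.

```python
import threading


class CopyLimitedAccess:
    """Allow at most one simultaneous session per print copy owned."""

    def __init__(self, copies_owned: dict[str, int]):
        # copies_owned maps a volume id to the number of print copies held.
        self._limits = dict(copies_owned)
        self._active: dict[str, int] = {}
        self._lock = threading.Lock()

    def checkout(self, volume_id: str) -> bool:
        """Open a session if a copy is free; return False otherwise."""
        with self._lock:
            limit = self._limits.get(volume_id, 0)
            in_use = self._active.get(volume_id, 0)
            if in_use >= limit:
                return False
            self._active[volume_id] = in_use + 1
            return True

    def release(self, volume_id: str) -> None:
        """Close a session, freeing one copy for the next user."""
        with self._lock:
            in_use = self._active.get(volume_id, 0)
            if in_use == 0:
                raise ValueError(f"no open session for {volume_id}")
            self._active[volume_id] = in_use - 1
```

A volume the institution does not hold has a limit of zero, so `checkout` refuses it outright.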

Lawful uses (2)

• Out of print and brittle/missing
  – Works must be currently owned (or owned previously) by the partner institution
  – Must be authenticated or accessing work from library premises
  – Must be on US soil
  – One simultaneous access per copy owned
  – http://www.hathitrust.org/out-of-print-brittle

• Access and use statements
  – http://www.hathitrust.org/access_use
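The two lawful-use services share most of their conditions but differ in how a user qualifies: the print-disabilities service requires authentication, while out-of-print and brittle works may also be read from library premises. A sketch of both checks as boolean predicates; the `AccessRequest` fields are hypothetical stand-ins for whatever the authentication and location systems actually report, and the per-copy limit is left to a separate check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRequest:
    owns_copy: bool                  # institution owns (or previously owned) a copy
    authenticated: bool
    on_premises: bool
    in_us: bool
    user_print_disabled: bool = False
    work_out_of_print_brittle: bool = False


def print_disabled_eligible(req: AccessRequest) -> bool:
    """Print-disabilities service: qualifying user, owned copy, login, US."""
    return (req.user_print_disabled and req.owns_copy
            and req.authenticated and req.in_us)


def out_of_print_brittle_eligible(req: AccessRequest) -> bool:
    """Out-of-print/brittle: being on premises substitutes for login."""
    return (req.work_out_of_print_brittle and req.owns_copy
            and (req.authenticated or req.on_premises) and req.in_us)
```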

Outline

• The Big Idea
  – Mission and Goals

• What we’re doing to get there
  – Repository and Content
  – Making content available
  – Organizational structure

• How HathiTrust can change the way we work

e-Commerce
• Print on Demand

Content Ingest
• Transformation
• Validation

Content Access
• PageTurner
• Collection Builder
• Large-scale Search
• Bibliographic Catalog
• Research Center
• APIs

Quality Assurance
• Quality Review
• Content Certification

User Services
• Usability
• User support (helpdesk)

Outreach
• Project website
• Monthly newsletter
• Papers and presentations
• Communication with potential partners
• Surveys, general inquiries

Repository evaluation and audit (e.g., DRAMBORA, TRAC)

Legal
• Risk management (use of materials)
• Partner agreements
• Advocacy

Governance
• Budget/Finances
• Decision-making
• Policy
• Planning

Enterprise Management
• Communication and coordination with partner institutions
• Project management

Repository Administration
• Hardware configuration and maintenance
• Web and application server configuration and maintenance
• Security
• Permissions
• Logging

Repository Administration
• Data management (content storage, backup, integrity checks, deletion)
• Hardware selection and replacement
• Content and metadata specifications
• Disaster recovery
• Processes for ensuring content integrity

Rights Management
• Copyright determination
• Copyright review
• Copyright information management (database)
• Rightsholder permissions

Bibliographic Data Management
• Entity description (record-level)
• Object identification (item-level)
• Data availability

Collection Development
• Digital: expansion beyond books and journals (born-digital, images and maps, audio); selection of content (for non-Google volume ingest and pilot projects)
• Print: Cloud Library (effect of digital on print)

Financial contributions of partners

HathiTrust Functional Framework
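The framework lists integrity checks and processes for ensuring content integrity under repository administration. One common approach, shown here as an assumption rather than HathiTrust's documented procedure, is to record a checksum for every stored file at ingest and periodically recompute it at each storage site. The sketch uses MD5 from Python's `hashlib`; the manifest layout (relative path to hex digest) and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute a file's MD5 digest, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit_site(root: Path, manifest: dict[str, str]) -> list[str]:
    """List manifest entries that are missing under root or fail their checksum.

    manifest maps a path relative to root to the MD5 recorded at ingest.
    """
    failures = []
    for rel_path, expected in sorted(manifest.items()):
        target = root / rel_path
        if not target.is_file() or md5_of(target) != expected:
            failures.append(rel_path)
    return failures
```

Running `audit_site` against each storage copy and restoring any failed file from a copy that passes is the general pattern; schedule, checksum algorithm, and manifest format are choices this sketch leaves open.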

HathiTrust

Strategic Advisory Board

Budget/Finances, Decision-making

Guidance on Policy, Planning

• Driven by needs of institutions
• Leverage across the partnership
• Projects: Print on Demand, Grant Work, Ingest Specifications, PageTurner, Bibliographic Data Management

Executive Committee

Collective Work: Working Groups and Committees

Operational
• Communications
• User Support
• User Experience

Strategic
• Collections
• Discovery Interface
• Full-text Search

Distributed work

Constitutional Convention

• October 2011
• 52 partners
• 3-year review overseen by SAB
• Ballot proposals
  – Print monograph storage
  – Approval process for development initiatives
  – US Government Documents
  – Fee-for-service content deposit
  – Governance

HathiTrust

Executive Committee

Strategic Advisory Board

Budget/Finances, Decision-making

Guidance on Policy, Planning

• 12-member Board of Governors
• Chief Executive Officer
• Executive Committee

Governance

• Efficient, practical
• Inclusive, collective

Outline

• The Big Idea
  – Mission and Goals

• What we’re doing to get there
  – Repository and Content
  – Making content available
  – Organizational structure

• How HathiTrust can change the way we work

How HathiTrust Can Change the Way We Work

Seeing collective problems as collective

Breakdown of HathiTrust book corpus by publication date (source: Bibliographic Indeterminacy and the Scale of Problems and Opportunities of Rights in Digital Collection Building, 2/2011)

[Chart: four segments of 42%, 19%, 20%, and 19%.]

Copyright status of books published pre-1923 and US works published 1923–1963

[Chart, built up over several slides: the same four segments, with the 42% segment labeled "In Print".]

Relationships

• Identification
• Description
• Rights
• Relationships
  – Bibliographic records
  – Bib records and objects
  – Digital objects
  – Digital and print

Understanding the relationship between the collective and local

1st model Price per GB

Year            2008        2009        2010        2011        2012 (Oct)
Total Volumes   2,477,871   5,221,092   7,836,698   9,966,572   10,531,566
Public Domain     372,085     758,947   1,959,223   2,712,626    3,218,132

[Chart: Total Volumes and Public Domain plotted by year, 0 to 12,000,000 volumes.]
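The growth table also implies how the public-domain share of the corpus changed. The arithmetic below works directly from the figures above (2012 as of October); nothing beyond those numbers is assumed.

```python
# Volume counts from the table above (2012 figure as of October).
TOTAL = {2008: 2_477_871, 2009: 5_221_092, 2010: 7_836_698,
         2011: 9_966_572, 2012: 10_531_566}
PUBLIC_DOMAIN = {2008: 372_085, 2009: 758_947, 2010: 1_959_223,
                 2011: 2_712_626, 2012: 3_218_132}


def public_domain_share(year: int) -> float:
    """Public-domain volumes as a percentage of all volumes in one year."""
    return 100 * PUBLIC_DOMAIN[year] / TOTAL[year]


def growth(series: dict[int, int], start: int, end: int) -> float:
    """Ratio of the value in the end year to the value in the start year."""
    return series[end] / series[start]


for year in sorted(TOTAL):
    print(year, f"{public_domain_share(year):.1f}%")
```

The loop prints 15.0%, 14.5%, 25.0%, 27.2%, and 30.6% for 2008 through 2012; over the same span the corpus grew about 4.25-fold and public-domain volumes about 8.6-fold.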

[Chart: % of titles in local collection (y-axis, 0–60) plotted against rank in 2008 ARL Investment Index (x-axis, 0–120).]

A global change in the library environment

June 2010: median duplication 31%

June 2009: median duplication 19%

Academic print book collection already substantially duplicated in mass digitized book corpus

Courtesy of Constance Malpas OCLC Research

Digitized Books in Shared Repositories

[Chart: unique titles (0 to 3,500,000), September 2009 to June 2010. Series: Mass digitized books in Hathi digital repository; Mass digitized books in shared print repositories.]

~75% of mass digitized corpus is ‘backed up’ in one or more shared print repositories

~3.5M titles

~2.5M

Courtesy of Constance Malpas OCLC Research

Collection Overlap

• More than 50% median overlap with ARL institutions; higher for small liberal arts colleges

• New pricing model based on print holdings
  – http://www.hathitrust.org/cost
  – Requires print holdings database
  – Also supports expansion of legal uses and efforts in de-duplication
  – Facilitates individual and collaborative collection development and management operations

• Print monographs archiving
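Overlap here means the share of an institution's print titles that also appear in the HathiTrust corpus, and the slide reports the median of that figure across institutions. A minimal sketch, assuming both sides are keyed by OCLC number (a common match point, though the slide does not say which identifier the print holdings database uses); the library names and title sets are invented.

```python
from statistics import median


def overlap_pct(local_titles: set[str], corpus_titles: set[str]) -> float:
    """Percent of a library's print titles that also appear in the corpus."""
    if not local_titles:
        return 0.0
    return 100 * len(local_titles & corpus_titles) / len(local_titles)


def median_overlap(libraries: dict[str, set[str]], corpus_titles: set[str]) -> float:
    """Median of per-library overlap percentages across a group of libraries."""
    return median(overlap_pct(titles, corpus_titles) for titles in libraries.values())


if __name__ == "__main__":
    # Invented OCLC-number sets, for illustration only.
    corpus = {"101", "102", "103", "104"}
    libraries = {
        "Library A": {"101", "102", "901", "902"},   # 2 of 4 -> 50%
        "Library B": {"101", "102", "103", "903"},   # 3 of 4 -> 75%
        "Library C": {"904", "905"},                 # 0 of 2 -> 0%
    }
    print(median_overlap(libraries, corpus))
```

Using the median rather than the mean keeps a few very large or very small collections from skewing the group figure.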

Sourcing and Scaling: http://orweblog.oclc.org/archives/002058.html

• Scale
  – Institution-scale
  – Group-scale
  – Web-scale

• Sourcing
  – Institutional
  – Collaborative
  – Third-party

A new kind of library

Thank you

How to find out more

• About: http://www.hathitrust.org/about
• Twitter: http://twitter.com/hathitrust
• Facebook: http://www.facebook.com/hathitrust
• Monthly newsletter
  – http://www.hathitrust.org/updates
  – RSS: http://www.hathitrust.org/updates_rss
• Contact us: feedback@issues.hathitrust.org
• Blogs: http://www.hathitrust.org/blogs
  – Large-scale Search
  – Perspectives from HathiTrust

  • Your Library Now Online Putting HathiTrust in the Context of
  • Outline
  • The Big Idea
  • Partnership
  • Digital Repository
  • The Name
  • Mission
  • HathiTrust
  • Collections and Collaboration
  • What we are doing to get there
  • Slide 11
  • Slide 12
  • Repository and Content
  • Content Sources
  • Language Distribution (1)
  • Language Distribution (2)
  • Dates
  • Copyright Distribution
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Making Content Available
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • APIs
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Datasets
  • Research Center
  • Slide 54
  • Slide 55
  • Access Determinations
  • Automatic Rights Determination
  • Manual Rights Determination
  • Rights Database
  • Lawful uses
  • Lawful uses (2)
  • Outline (2)
  • Slide 63
  • Slide 64
  • Constitutional Convention
  • Slide 66
  • Governance
  • Outline (3)
  • How HathiTrust Can Change the Way We Work
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Relationships
  • Relationships (2)
  • Relationships (3)
  • Relationships (4)
  • Relationships (5)
  • Slide 81
  • Slide 82
  • Slide 83
  • A global change in the library environment
  • Digitized Books in Shared Repositories
  • Collection Overlap
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Thank you
  • How to find out more
Page 13: HATHITRUST A Shared Digital Repository Your Library, Now Online! Putting HathiTrust in the Context of Traditional (and New) Library Services MCLS Webinar

Repository and Content

Michigan 43

California 32

Wisconsin 5

Cornell 4NYPL 2

Princeton 2Indiana 2

Columbia 1

Harvard 2LoC 1 Madrid 1 Minnesota 1

Illinois 1

Content Sources

English48

German9

French7

Spanish5

Chinese4

Russian4

Japanese3

Italian2

Arabic2

Latin1

Remaining Languages

14

Language Distribution (1)

The top 10 languages make up ~86 of all content

Undetermined7

Polish7

Portuguese7

Dutch5

Hebrew5

Hindi5

Indonesian4

Korean4Swedish

3

Urdu3

Turkish3

Thai3Danish

3

Czech3

Unknown3

Croatian2

Persian2

Tamil2

Bengali2

Music2

Hungarian2

Norwegian2

Sanskrit2

Vietnamese1

Ukrainian1

Greek1

Bulgarian1Serbian

1Armenian

1Romanian

1Marathi

1

Ancient-Greek1 Panjabi

1

Telugu1Malay

1

Catalan1

Malayalam1

Multiple1

Finnish1

Slovak1

Language Distribution (2)

The next 40 languages make up ~13 of total

Dates

Copyright Distribution

In-copyright or unde-termined

69

Public Domain (worldwide)

15

US Federal Government Documents (worldwide)

4

Public Domain(US)11

Open Access1

Creative Commons 04

1010

8

110

9

410

9

710

9

1010

9

111

0

411

0

711

0

1011

0

111

1

411

1

711

1

1011

1

111

2

411

2

711

2

1011

2

111

30

10

20

30

40

50

60

70

80

90

100

Boston CollegeFloridaYaleUtah StateUNC-Chapel HillPurduePenn StateNorthwesternNCSUIllinoisDukeChicagoVirginiaMinnesotaMadridLoCHarvardColumbiaIndianaPrincetonNYPLCornellWisconsinCaliforniaMichigan

Source

Bibliographic Data

Content Package

MichiganIndiana

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

Datasets

Source

Bibliographic Data

Content Package

MichiganIndiana

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

Datasets

Source

Bibliographic Data

Content Package

MichiganIndiana

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

DatasetsTDR

Source

Bibliographic Data

Content Package

MichiganIndiana

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

Datasets

Source

Bibliographic Data

Content Package

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

DatasetsMichigan

Indiana

Source

Bibliographic Data

Content Package

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

DatasetsMichigan

Indiana

Source

Bibliographic Data

Content Package

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

DatasetsMichigan

Indiana

We engage in preservationfor purposes of access

Source

Bibliographic Data

Content Package

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

DatasetsMichigan

Indiana

Making Content Available

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Descriptive headings added (hidden from GUI with CSS)

Info about SSD service amp link to accessibility page

Images used for style are in css so no need to use alt tags

Skip navigation link

Access keys for navigating pages with keyboard

Added labels amp descriptive titles to forms amp ToC table

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

APIs

bull Data APIndash Volume and rights informationndash Page imagesndash OCR

bull Bibliographic APIndash Volume and rights informationndash MARC records

bull OAIbull ldquoHathifilesrdquo

Datasets

bull Google-digitized ~28 million texts Requires proposal to HathiTrust Agreement with Google Statement on usemanagement

bull Non-Google-digitized ~370000 texts Freely available Statement on management

Research Center

bull Environment to perform research on HathiTrust corpus

bull httpwwwhathitrustorghtrc

bull httplibumichedumpachbull Package of tools to enable publication of open

access born-digital journal content directly into HathiTrustndash Including accompanying data and media files

bull Allows integration with popular journal publishing tools such as Open Journal Systems (OJS)

Source Archive

Editorial Market

Higher Education

Access Determinations

bull Automatedbull Manual

Automatic Rights Determination

bull Conducted on all works at time of ingest and when records are modifiedndash Public domain worldwide

bull US works published before 1923 US federal government publications non-US works published prior to 1873

ndash Public domain in the United Statesbull Non-US works published prior to 1923

Manual Rights Determination

bull IMLS-funded CRMS projectndash CRMS-US

bull 2008 US-published works 1923-1963bull Staff at 4 partner institutions

ndash CRMS-Worldbull 2011 Expanded to non-US worksbull Staff at 16 partner institutions

ndash Double review with additional expert review for conflictsndash Compliance with copyright formalitiesndash As of January 2013 241541 reviewed more than 132644

openedbull Rights Holder Permissions

bull System of Precedence

Rights Database

Bibliographic (automatic)

Manual

Lawful uses

bull Users who have print disabilitiesndash All in-copyright works in HathiTrust currently

owned (or owned previously) by the partner institution

ndash Must be authenticatedndash Must be on US soilndash One simultaneous access per copy ownedndash httpwwwhathitrustorgaccessibility

Lawful uses (2)

bull Out of print and brittle missingndash Works must be currently owned (or owned

previously) by the partner institutionndash Must be authenticated or accessing work from

library premisesndash Must be on US soilndash One simultaneous access per copy ownedndash httpwwwhathitrustorgout-of-print-brittle

bull Access and use statementsndash httpwwwhathitrustorgaccess_use

Outline

bull The Big Idea ndash Mission and Goals

bull What wersquore doing to get there ndash Repository and Content ndash Making content available ndash Organizational structure

bull How HathiTrust can change the way we work

e-Commerce

Print on Demand

Content Ingest

Transformation

Validation

Content Access

PageTurner

Collection Builder

Large-scale Search

Bibliographic Catalog

Research Center

APIs

Quality Assurance

Quality Review

Content Certification

User Services

Usability

User support (helpdesk)

Outreach

Project website

Monthly newsletter

Papers and presentations

Communication with potential

partners

Surveys general inquiries

Repository evaluation and

audit (eg DRAMBORA

TRAC)

Legal

Risk management (use of materials)

Partner agreements

Advocacy

Governance

Budget Finances

Decision-making

Policy

Planning

Enterprise Management

Communication and Coordination

with partner institutions

Project management

Repository Administration

Hardware configuration and

maintenance

Web and application server configuration and

maintenance

Security

Permissions

Logging

Repository Administration

Data management (content storage backup integrity checks deletion)

Hardware selection and replacement

Content and Metadata

specifications

Disaster Recovery

Processes for ensuring content

integrity

Rights Management

Copyright determination

Copyright review

Copyright information

management (database)

Rightsholder permissions

Bibliographic Data

Management

Entity description (record-level)

Object identification (item-

level)

Data availability

Collection Development

Digitalbull Expansion beyond

books and journals (born-digital images and maps audio)

bull Selection of content (for non-Google volume ingest and pilots projects)

Printbull Cloud Library (effect

of digital on print)

Financial contributions of partners

HathiTrust Functional Framework

HathiTrust

Strategic Advisory BoardBudgetFinances Decision-making

Guidance on Policy Planning

bull Driven by needs of institutionsbull Leverage across the partnershipbull Projects Print on Demand Grant Work Ingest Specifications

PageTurner Bibliographic Data Management

Executive Committee

Collective Work Working Groups and Committees

Operationalbull Communicationsbull User Supportbull User Experience

Operationalbull Communicationsbull User Supportbull User Experience

Strategicbull Collectionsbull Discovery Interfacebull Full-text Search

Distributed work

Constitutional Convention

bull October 2011bull 52 partnersbull 3-year review overseen by SABbull Ballot Proposals

ndash Print monograph storagendash Approval Process for development initiativesndash US Government Documentsndash Fee-for-service content depositndash Governance

HathiTrust

Executive Committee

Strategic Advisory

Board

BudgetFinancesDecision-making

Guidance on Policy Planning

bull 12-member Board of Governors

bull Chief Executive Officer bull Executive Committee

Governance

bull Efficient practicalbull Inclusive collective

Outline

bull The Big Idea ndash Mission and Goals

bull What wersquore doing to get there ndash Repository and Content ndash Making content available ndash Organizational structure

bull How HathiTrust can change the way we work

How HathiTrust Can Change the Way We Work

Seeing collective problems as collective

Breakdown of HathiTrust book corpus by publication date

Bibliographic Indeterminacy and the Scale of Problems and Opportunities of Rights in Digital Collection Building ndash 22011

42

19

20

19

Breakdown of HathiTrust book corpus by publication date

42

19

20

19

Copyright status of books published pre-1923 and US works published 1923-1963

42

19

20

Copyright status of books published pre-1923 and US works published 1923-1963

42

19

20

19

Copyright status of books published pre-1923 and US works published 1923-1963

In Print 42

19

20

19

bull Identificationbull Descriptionbull Rights

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic records

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic recordsndash Bib records and objects

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic recordsndash Bib records and objectsndash Digital objects

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic recordsndash Bib records and objectsndash Digital objectsndash Digital and print

Relationships

Understanding the relationship between the collective and local

1st model Price per GB

2008 2009 2010 2011 2012 (Oct)Total Volumes 2477871 5221092 7836698 9966572 10531566

Public Domain 372085 758947 1959223 2712626 3218132

2008 2009 2010 2011 2012 (Oct)0

2000000

4000000

6000000

8000000

10000000

12000000

Total VolumesPublic Domain

0 20 40 60 80 100 1200

10

20

30

40

50

60

Rank in 2008 ARL Investment Index

o

f Tit

les

in L

ocal

Col

lecti

on

A global change in the library environment

June 2010Median duplication 31

June 2009Median duplication 19

Academic print book collection already substantially duplicated in mass digitized book corpus

Courtesy of Constance Malpas OCLC Research

Digitized Books in Shared Repositories

Sep-09 Oct-09 Nov-09 Dec-09 Jan-10 Feb-10 Mar-10 Apr-10 May-10 Jun-100

500000

1000000

1500000

2000000

2500000

3000000

3500000

Mass digitized books in Hathi digital repository Mass digitized books in shared print repositories

Uni

que

Titl

es

~75 of mass digitized corpus is lsquobacked uprsquo in one or more shared print repositories

~35M titles

~25M

Courtesy of Constance Malpas OCLC Research

Collection Overlap

bull More than 50 median overlap with ARL institutions higher for small liberal arts colleges

bull New Pricing model based on Print holdingsndash httpwwwhathitrustorgcostndash Requires print holdings databasendash Also support expansion of legal uses efforts in de-

duplicationndash Facilitate individual and collaborative collection

development and management operationsbull Print monographs archiving

Sourcing and Scalinghttporweblogoclcorgarchives002058html

bull Scalendash Institution-scalendash Group-scalendash Web-scale

bull Sourcingndash Institutionalndash Collaborativendash Third-party

A new kind of library

Thank you

How to find out more

bull About httpwwwhathitrustorgaboutbull Twitter httptwittercomhathitrustbull Facebook httpwwwfacebookcomhathitrustbull Monthly newsletter

ndash httpwwwhathitrustorgupdatesndash RSS httpwwwhathitrustorgupdates_rss

bull Contact us feedbackissueshathitrustorgbull Blogs httpwwwhathitrustorgblogs

ndash Large-scale Searchndash Perspectives from HathiTrust

  • Your Library Now Online Putting HathiTrust in the Context of
  • Outline
  • The Big Idea
  • Partnership
  • Digital Repository
  • The Name
  • Mission
  • HathiTrust
  • Collections and Collaboration
  • What we are doing to get there
  • Slide 11
  • Slide 12
  • Repository and Content
  • Content Sources
  • Language Distribution (1)
  • Language Distribution (2)
  • Dates
  • Copyright Distribution
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Making Content Available
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • APIs
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Datasets
  • Research Center
  • Slide 54
  • Slide 55
  • Access Determinations
  • Automatic Rights Determination
  • Manual Rights Determination
  • Rights Database
  • Lawful uses
  • Lawful uses (2)
  • Outline (2)
  • Slide 63
  • Slide 64
  • Constitutional Convention
  • Slide 66
  • Governance
  • Outline (3)
  • How HathiTrust Can Change the Way We Work
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Relationships
  • Relationships (2)
  • Relationships (3)
  • Relationships (4)
  • Relationships (5)
  • Slide 81
  • Slide 82
  • Slide 83
  • A global change in the library environment
  • Digitized Books in Shared Repositories
  • Collection Overlap
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Thank you
  • How to find out more
Page 14: HATHITRUST A Shared Digital Repository Your Library, Now Online! Putting HathiTrust in the Context of Traditional (and New) Library Services MCLS Webinar

Michigan 43

California 32

Wisconsin 5

Cornell 4NYPL 2

Princeton 2Indiana 2

Columbia 1

Harvard 2LoC 1 Madrid 1 Minnesota 1

Illinois 1

Content Sources

Language Distribution (1)

[Chart: English 48%, German 9%, French 7%, Spanish 5%, Chinese 4%, Russian 4%, Japanese 3%, Italian 2%, Arabic 2%, Latin 1%, remaining languages 14%]

The top 10 languages make up ~86% of all content.

Language Distribution (2)

[Chart: Undetermined 0.7%, Polish 0.7%, Portuguese 0.7%, Dutch 0.5%, Hebrew 0.5%, Hindi 0.5%, Indonesian 0.4%, Korean 0.4%, Swedish 0.3%, Urdu 0.3%, Turkish 0.3%, Thai 0.3%, Danish 0.3%, Czech 0.3%, Unknown 0.3%, Croatian 0.2%, Persian 0.2%, Tamil 0.2%, Bengali 0.2%, Music 0.2%, Hungarian 0.2%, Norwegian 0.2%, Sanskrit 0.2%, Vietnamese 0.1%, Ukrainian 0.1%, Greek 0.1%, Bulgarian 0.1%, Serbian 0.1%, Armenian 0.1%, Romanian 0.1%, Marathi 0.1%, Ancient Greek 0.1%, Panjabi 0.1%, Telugu 0.1%, Malay 0.1%, Catalan 0.1%, Malayalam 0.1%, Multiple 0.1%, Finnish 0.1%, Slovak 0.1%]

The next 40 languages make up ~13% of the total.

Dates

Copyright Distribution

[Chart: In copyright or undetermined 69%; Public domain (worldwide) 15%; US federal government documents (worldwide) 4%; Public domain (US) 11%; Open access 1%; Creative Commons 0.4%]

[Chart: share of content by contributing institution over time, October 2008 to January 2013, on a 0 to 100% scale. Institutions shown: Michigan, California, Wisconsin, Cornell, NYPL, Princeton, Indiana, Columbia, Harvard, LoC, Madrid, Minnesota, Virginia, Chicago, Duke, Illinois, NCSU, Northwestern, Penn State, Purdue, UNC-Chapel Hill, Utah State, Yale, Florida, Boston College]

[Diagram, built up over several slides: HathiTrust repository architecture. Components shown: Source; Bibliographic Data; Content Package; Ingest; Data Management; Bib Data; Rights Data; Holdings Data; Storage (Michigan, Indiana), labeled TDR (trusted digital repository); Access (Catalog, Full-text Search, PageTurner, APIs, Collections, Datasets)]

We engage in preservation for purposes of access.
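Preservation at this scale depends on routinely confirming that stored files have not changed. A common way to do that is with checksums. The sketch below is a minimal illustration, assuming a hypothetical content package laid out as a flat directory of files plus a manifest that maps each file name to its SHA-256 digest. It is not HathiTrust's actual ingest format or code.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_package(package_dir: Path, manifest: dict[str, str]) -> list[str]:
    """Compare a (hypothetical) flat content package against its manifest.

    Returns a list of problems; an empty list means every listed file is
    present, its checksum matches, and no unlisted files are present.
    """
    problems = []
    for name, expected in manifest.items():
        path = package_dir / name
        if not path.is_file():
            problems.append(f"missing: {name}")
        elif sha256_of(path) != expected:
            problems.append(f"checksum mismatch: {name}")
    # Files on disk that the manifest does not mention are flagged too.
    for path in sorted(package_dir.iterdir()):
        if path.is_file() and path.name not in manifest:
            problems.append(f"unlisted: {path.name}")
    return problems


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp)
        (pkg / "00000001.txt").write_bytes(b"page one OCR")
        manifest = {"00000001.txt": hashlib.sha256(b"page one OCR").hexdigest()}
        print(verify_package(pkg, manifest))
```

The same check can run at ingest, as a validation step, and again on a schedule against stored copies, as a fixity audit.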

Making Content Available

[Diagram, built up over several slides: Access layer with Catalog, Full-text Search, PageTurner, APIs, Collections, Datasets]

Accessibility notes:
• Descriptive headings added (hidden from GUI with CSS)
• Info about SSD service & link to accessibility page
• Images used for style are in CSS, so no need to use alt tags
• Skip navigation link
• Access keys for navigating pages with keyboard
• Added labels & descriptive titles to forms & ToC table

APIs

• Data API
  – Volume and rights information
  – Page images
  – OCR
• Bibliographic API
  – Volume and rights information
  – MARC records
• OAI
• “Hathifiles”
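The Hathifiles are tab-delimited metadata dumps, one row per volume. As a minimal sketch, this example tallies volumes by rights code. It assumes that the first three columns of each row are the volume identifier (htid), an access flag, and a rights code. Check that column layout against HathiTrust's current Hathifiles description before relying on it. The sample rows are invented for illustration.

```python
import csv
import io
from collections import Counter


def tally_rights(lines) -> Counter:
    """Count volumes per rights code in Hathifile-style TSV rows.

    Assumes column 0 = htid, column 1 = access flag, column 2 = rights code.
    QUOTE_NONE keeps stray quotation marks in titles from confusing the parser.
    """
    counts = Counter()
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 3:
            continue  # skip blank or truncated lines
        counts[row[2]] += 1
    return counts


# Invented sample rows; real files carry many more columns per row.
SAMPLE = (
    "mdp.00000000000001\tallow\tpd\n"
    "mdp.00000000000002\tdeny\tic\n"
    "uc1.b000000003\tallow\tpdus\n"
    "inu.00000000000004\tallow\tpd\n"
)

if __name__ == "__main__":
    print(tally_rights(io.StringIO(SAMPLE)).most_common())
```

Because `csv.reader` accepts any iterable of lines, the same function works on an opened (or gzip-opened) file without loading it all into memory.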

Datasets

• Google-digitized: ~2.8 million texts
  – Requires proposal to HathiTrust
  – Agreement with Google
  – Statement on use/management
• Non-Google-digitized: ~370,000 texts
  – Freely available
  – Statement on management

Research Center

• Environment to perform research on the HathiTrust corpus
• http://www.hathitrust.org/htrc

• http://lib.umich.edu/mpach
• Package of tools to enable publication of open access, born-digital journal content directly into HathiTrust
  – Including accompanying data and media files
• Allows integration with popular journal publishing tools such as Open Journal Systems (OJS)

Source Archive

Editorial Market

Higher Education

Access Determinations

• Automated
• Manual

Automatic Rights Determination

• Conducted on all works at time of ingest and when records are modified
  – Public domain worldwide
    • US works published before 1923; US federal government publications; non-US works published prior to 1873
  – Public domain in the United States
    • Non-US works published prior to 1923
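These automatic rules amount to a small decision procedure over publication date, place of publication, and whether the item is a US federal document. The sketch below encodes only the cutoffs listed on this slide, which reflect the 1923 US cutoff in force when the deck was written. The function and enum names are hypothetical. It also skips the hard part in practice, which is getting reliable dates and places out of bibliographic records.

```python
from enum import Enum


class Rights(Enum):
    PD_WORLDWIDE = "public domain worldwide"
    PD_US = "public domain in the United States"
    UNDETERMINED = "in copyright or undetermined"


def automatic_rights(pub_year: int, published_in_us: bool,
                     us_federal_document: bool) -> Rights:
    """Apply the slide's automatic rules; anything else awaits manual review."""
    if us_federal_document:
        return Rights.PD_WORLDWIDE
    if published_in_us and pub_year < 1923:
        return Rights.PD_WORLDWIDE
    if not published_in_us and pub_year < 1873:
        return Rights.PD_WORLDWIDE
    if not published_in_us and pub_year < 1923:
        # Public domain in the US only; access elsewhere stays restricted.
        return Rights.PD_US
    return Rights.UNDETERMINED


if __name__ == "__main__":
    print(automatic_rights(1900, published_in_us=False, us_federal_document=False))
```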

Manual Rights Determination

• IMLS-funded CRMS project
  – CRMS-US
    • 2008: US-published works, 1923-1963
    • Staff at 4 partner institutions
  – CRMS-World
    • 2011: Expanded to non-US works
    • Staff at 16 partner institutions
  – Double review, with additional expert review for conflicts
  – Compliance with copyright formalities
  – As of January 2013: 241,541 reviewed; more than 132,644 opened
• Rights holder permissions
• System of precedence
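The double-review step can be pictured like this: two independent reviewers each record a determination. If they agree, it stands. If they conflict, an expert decides. The sketch assumes that simplified flow and uses hypothetical names, because the slide does not show CRMS's actual interface or status codes. It also computes the January 2013 ratio of opened to reviewed volumes. Because the slide says "more than" 132,644 opened, the result is a lower bound of roughly 55%.

```python
from typing import Callable


def resolve_review(first: str, second: str,
                   expert: Callable[[str, str], str]) -> tuple[str, bool]:
    """Combine two independent determinations.

    Returns (determination, escalated). `expert` is consulted only when
    the two reviewers disagree.
    """
    if first == second:
        return first, False
    return expert(first, second), True


def percent_opened(opened: int, reviewed: int) -> float:
    """Share of reviewed volumes that were opened, as a percentage."""
    return 100.0 * opened / reviewed


if __name__ == "__main__":
    def cautious_expert(a: str, b: str) -> str:
        # Hypothetical tie-breaker: keep the volume closed when in doubt.
        return "ic" if "ic" in (a, b) else a

    print(resolve_review("pd", "pd", cautious_expert))
    print(resolve_review("pd", "ic", cautious_expert))
    print(round(percent_opened(132_644, 241_541), 1))
```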

Rights Database

Bibliographic (automatic)

Manual
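A "system of precedence" means the rights database can hold several determinations for one volume and must pick the one that governs. The sketch below assumes a hypothetical ranking. Rights-holder permissions outrank manual (CRMS) review, which in turn outranks the automatic bibliographic determination. Within a tier, the most recent entry wins. The slides do not spell out HathiTrust's actual precedence rules, so treat the ranking as illustrative only.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical precedence tiers: a higher number wins.
PRECEDENCE = {"automatic": 1, "manual": 2, "permission": 3}


@dataclass(frozen=True)
class Determination:
    rights_code: str   # e.g. "pd", "pdus", "ic", "und"
    source: str        # one of the PRECEDENCE keys
    recorded: date


def effective_rights(determinations: list[Determination]) -> Determination:
    """Return the determination that governs access for one volume."""
    if not determinations:
        raise ValueError("no determinations recorded for this volume")
    # Rank by tier first, then by recency within the tier.
    return max(determinations,
               key=lambda d: (PRECEDENCE[d.source], d.recorded))


if __name__ == "__main__":
    history = [
        Determination("und", "automatic", date(2010, 3, 1)),
        Determination("pd", "manual", date(2011, 6, 15)),
        Determination("ic", "automatic", date(2012, 1, 1)),
    ]
    print(effective_rights(history).rights_code)
```

With this ranking, a later automatic re-run cannot undo a manual review. Only a higher tier, or a newer entry in the same tier, replaces it.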

Lawful uses

• Users who have print disabilities
  – All in-copyright works in HathiTrust currently owned (or owned previously) by the partner institution
  – Must be authenticated
  – Must be on US soil
  – One simultaneous access per copy owned
  – http://www.hathitrust.org/accessibility

Lawful uses (2)

• Out of print and brittle, missing
  – Works must be currently owned (or owned previously) by the partner institution
  – Must be authenticated or accessing work from library premises
  – Must be on US soil
  – One simultaneous access per copy owned
  – http://www.hathitrust.org/out-of-print-brittle
• Access and use statements
  – http://www.hathitrust.org/access_use
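The conditions on these two slides can be read as an access check that runs for each request. The sketch below makes several assumptions. The request fields, the `owned_copies` count, and the in-memory session counter are all hypothetical. Authentication and geolocation are reduced to booleans, and session expiry is omitted.

```python
from dataclasses import dataclass


@dataclass
class Request:
    authenticated: bool
    in_us: bool
    on_library_premises: bool = False
    print_disabled: bool = False


@dataclass
class Volume:
    htid: str
    owned_copies: int              # print copies the partner owns or owned
    out_of_print_brittle: bool = False
    active_sessions: int = 0


def may_open(volume: Volume, req: Request) -> bool:
    """Apply the slides' conditions to an in-copyright volume."""
    if volume.owned_copies < 1 or not req.in_us:
        return False
    if req.print_disabled:
        eligible = req.authenticated
    elif volume.out_of_print_brittle:
        eligible = req.authenticated or req.on_library_premises
    else:
        eligible = False
    # One simultaneous access per copy owned.
    return eligible and volume.active_sessions < volume.owned_copies


def open_volume(volume: Volume, req: Request) -> bool:
    """Check access and, if granted, count the new session."""
    if may_open(volume, req):
        volume.active_sessions += 1
        return True
    return False


if __name__ == "__main__":
    brittle = Volume("mdp.example", owned_copies=1, out_of_print_brittle=True)
    onsite = Request(authenticated=False, in_us=True, on_library_premises=True)
    print(open_volume(brittle, onsite), open_volume(brittle, onsite))
```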

Outline

• The Big Idea
  – Mission and Goals
• What we’re doing to get there
  – Repository and Content
  – Making content available
  – Organizational structure
• How HathiTrust can change the way we work

HathiTrust Functional Framework

• e-Commerce
  – Print on Demand
• Content Ingest
  – Transformation
  – Validation
• Content Access
  – PageTurner
  – Collection Builder
  – Large-scale Search
  – Bibliographic Catalog
  – Research Center
  – APIs
• Quality Assurance
  – Quality Review
  – Content Certification
• User Services
  – Usability
  – User support (helpdesk)
• Outreach
  – Project website
  – Monthly newsletter
  – Papers and presentations
  – Communication with potential partners
  – Surveys, general inquiries
• Repository evaluation and audit (e.g., DRAMBORA, TRAC)
• Legal
  – Risk management (use of materials)
  – Partner agreements
  – Advocacy
• Governance
  – Budget, finances
  – Decision-making
  – Policy
  – Planning
• Enterprise Management
  – Communication and coordination with partner institutions
  – Project management
• Repository Administration
  – Hardware configuration and maintenance
  – Web and application server configuration and maintenance
  – Security
  – Permissions
  – Logging
  – Data management (content storage, backup, integrity checks, deletion)
  – Hardware selection and replacement
  – Content and metadata specifications
  – Disaster recovery
  – Processes for ensuring content integrity
• Rights Management
  – Copyright determination
  – Copyright review
  – Copyright information management (database)
  – Rightsholder permissions
• Bibliographic Data Management
  – Entity description (record-level)
  – Object identification (item-level)
  – Data availability
• Collection Development
  – Digital: expansion beyond books and journals (born-digital, images and maps, audio); selection of content (for non-Google volume ingest and pilot projects)
  – Print: Cloud Library (effect of digital on print)
• Financial contributions of partners

HathiTrust
• Executive Committee: budget, finances, decision-making
• Strategic Advisory Board: guidance on policy, planning

Collective work: working groups and committees
• Operational: Communications, User Support, User Experience
• Strategic: Collections, Discovery Interface, Full-text Search

Distributed work
• Driven by needs of institutions
• Leverage across the partnership
• Projects: Print on Demand, Grant Work, Ingest Specifications, PageTurner, Bibliographic Data Management

Constitutional Convention

• October 2011
• 52 partners
• 3-year review overseen by SAB
• Ballot proposals
  – Print monograph storage
  – Approval process for development initiatives
  – US government documents
  – Fee-for-service content deposit
  – Governance

HathiTrust
• Executive Committee: budget, finances, decision-making
• Strategic Advisory Board: guidance on policy, planning

• 12-member Board of Governors
• Chief Executive Officer
• Executive Committee

Governance

• Efficient, practical
• Inclusive, collective

Outline

• The Big Idea
  – Mission and Goals
• What we’re doing to get there
  – Repository and Content
  – Making content available
  – Organizational structure
• How HathiTrust can change the way we work

How HathiTrust Can Change the Way We Work

Seeing collective problems as collective

[Chart: Breakdown of HathiTrust book corpus by publication date; segments of 42%, 19%, 20%, and 19%]

Bibliographic Indeterminacy and the Scale of Problems and Opportunities of Rights in Digital Collection Building – 2/2011

[Chart, built up over several slides: Copyright status of books published pre-1923 and US works published 1923-1963; segments of 42%, 19%, 20%, and 19%, with an “In Print” label]

Relationships

• Identification
• Description
• Rights
• Relationships
  – Bibliographic records
  – Bib records and objects
  – Digital objects
  – Digital and print
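The four kinds of relationship on this slide (record to record, record to object, object to object, digital to print) can be sketched as a small data model. The class and field names are hypothetical. They show one bibliographic record linking to several digitized volumes, volumes in a set distinguished by enumeration, and each volume linking back to the print copies that partners hold.

```python
from dataclasses import dataclass, field


@dataclass
class PrintHolding:
    institution: str
    local_id: str


@dataclass
class DigitalObject:
    htid: str
    enumeration: str = ""            # e.g. "v.2" within a multivolume set
    print_holdings: list[PrintHolding] = field(default_factory=list)


@dataclass
class BibRecord:
    record_id: str
    title: str
    objects: list[DigitalObject] = field(default_factory=list)
    related_records: list[str] = field(default_factory=list)  # other record_ids

    def holding_institutions(self) -> set[str]:
        """Institutions holding print for any volume of this record."""
        return {h.institution
                for obj in self.objects
                for h in obj.print_holdings}


if __name__ == "__main__":
    rec = BibRecord("000001", "Example multivolume set")
    rec.objects.append(DigitalObject("mdp.a", "v.1", [PrintHolding("Michigan", "b1")]))
    rec.objects.append(DigitalObject("mdp.b", "v.2", [PrintHolding("Indiana", "b2")]))
    print(sorted(rec.holding_institutions()))
```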

Understanding the relationship between the collective and local

1st model: price per GB

[Chart: Total Volumes and Public Domain volumes, 2008 to October 2012]
• 2008: 2,477,871 total volumes; 372,085 public domain
• 2009: 5,221,092 total volumes; 758,947 public domain
• 2010: 7,836,698 total volumes; 1,959,223 public domain
• 2011: 9,966,572 total volumes; 2,712,626 public domain
• 2012 (Oct): 10,531,566 total volumes; 3,218,132 public domain
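As a worked check on these figures, the public-domain share of the corpus can be computed year by year. It roughly doubles, from about 15% of volumes in 2008 to about 31% in October 2012. The counts are copied from the chart, and nothing else is assumed.

```python
# Volume counts copied from the chart: (total volumes, public domain volumes).
VOLUMES = {
    "2008": (2_477_871, 372_085),
    "2009": (5_221_092, 758_947),
    "2010": (7_836_698, 1_959_223),
    "2011": (9_966_572, 2_712_626),
    "2012 (Oct)": (10_531_566, 3_218_132),
}


def public_domain_share(total: int, public_domain: int) -> float:
    """Public-domain volumes as a percentage of all volumes."""
    return 100.0 * public_domain / total


if __name__ == "__main__":
    for year, (total, pd_count) in VOLUMES.items():
        print(f"{year}: {public_domain_share(total, pd_count):.1f}%")
```

The shares come out to 15.0%, 14.5%, 25.0%, 27.2%, and 30.6%. Most of the jump falls between 2009 and 2010.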

[Chart: x-axis Rank in 2008 ARL Investment Index (0 to 120); y-axis % of titles in local collection (0 to 60)]

A global change in the library environment

• June 2010: median duplication 31%
• June 2009: median duplication 19%

Academic print book collection already substantially duplicated in mass digitized book corpus.

Courtesy of Constance Malpas, OCLC Research

Digitized Books in Shared Repositories

[Chart: unique titles, September 2009 to June 2010 (0 to 3,500,000); series: mass digitized books in Hathi digital repository, mass digitized books in shared print repositories; labels ~3.5M titles and ~2.5M]

~75% of mass digitized corpus is ‘backed up’ in one or more shared print repositories.

Courtesy of Constance Malpas, OCLC Research

Collection Overlap

• More than 50% median overlap with ARL institutions; higher for small liberal arts colleges
• New pricing model based on print holdings
  – http://www.hathitrust.org/cost
  – Requires print holdings database
  – Also supports expansion of legal uses and efforts in de-duplication
  – Facilitates individual and collaborative collection development and management operations
• Print monographs archiving
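To make a holdings-based pricing model concrete, here is a deliberately simplified sketch. The cost of each in-copyright volume is split equally among the partners that hold it in print. A partner's overlap is the share of its print titles that are also in the repository. The equal-split rule, the per-volume cost, and all names are assumptions for illustration. The actual formula is described at http://www.hathitrust.org/cost and is not reproduced here.

```python
from collections import defaultdict


def allocate_costs(holdings: dict[str, set[str]],
                   cost_per_volume: float) -> dict[str, float]:
    """Split each volume's cost equally among the partners holding it.

    `holdings` maps a volume id to the set of partners with a print copy.
    Volumes that no partner holds are left unallocated in this sketch.
    """
    shares: dict[str, float] = defaultdict(float)
    for holders in holdings.values():
        if not holders:
            continue
        for partner in holders:
            shares[partner] += cost_per_volume / len(holders)
    return dict(shares)


def overlap_percent(local_titles: set[str], repository_titles: set[str]) -> float:
    """Share of a library's print titles that are also in the repository."""
    if not local_titles:
        return 0.0
    return 100.0 * len(local_titles & repository_titles) / len(local_titles)


if __name__ == "__main__":
    holdings = {"v1": {"A", "B"}, "v2": {"A"}, "v3": set()}
    print(sorted(allocate_costs(holdings, 1.0).items()))
    print(overlap_percent({"t1", "t2", "t3", "t4"}, {"t1", "t2", "t9"}))
```

Both functions depend on the same input, a print holdings database that matches local records to repository volumes. That dependency is why the slide lists such a database as a requirement.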

Sourcing and Scaling (http://orweblog.oclc.org/archives/002058.html)

• Scale
  – Institution-scale
  – Group-scale
  – Web-scale
• Sourcing
  – Institutional
  – Collaborative
  – Third-party

A new kind of library

Thank you

How to find out more

• About: http://www.hathitrust.org/about
• Twitter: http://twitter.com/hathitrust
• Facebook: http://www.facebook.com/hathitrust
• Monthly newsletter
  – http://www.hathitrust.org/updates
  – RSS: http://www.hathitrust.org/updates_rss
• Contact us: feedback@issues.hathitrust.org
• Blogs: http://www.hathitrust.org/blogs
  – Large-scale Search
  – Perspectives from HathiTrust

  • Your Library Now Online Putting HathiTrust in the Context of
  • Outline
  • The Big Idea
  • Partnership
  • Digital Repository
  • The Name
  • Mission
  • HathiTrust
  • Collections and Collaboration
  • What we are doing to get there
  • Slide 11
  • Slide 12
  • Repository and Content
  • Content Sources
  • Language Distribution (1)
  • Language Distribution (2)
  • Dates
  • Copyright Distribution
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Making Content Available
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • APIs
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Datasets
  • Research Center
  • Slide 54
  • Slide 55
  • Access Determinations
  • Automatic Rights Determination
  • Manual Rights Determination
  • Rights Database
  • Lawful uses
  • Lawful uses (2)
  • Outline (2)
  • Slide 63
  • Slide 64
  • Constitutional Convention
  • Slide 66
  • Governance
  • Outline (3)
  • How HathiTrust Can Change the Way We Work
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Relationships
  • Relationships (2)
  • Relationships (3)
  • Relationships (4)
  • Relationships (5)
  • Slide 81
  • Slide 82
  • Slide 83
  • A global change in the library environment
  • Digitized Books in Shared Repositories
  • Collection Overlap
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Thank you
  • How to find out more
Page 15: HATHITRUST A Shared Digital Repository Your Library, Now Online! Putting HathiTrust in the Context of Traditional (and New) Library Services MCLS Webinar

English48

German9

French7

Spanish5

Chinese4

Russian4

Japanese3

Italian2

Arabic2

Latin1

Remaining Languages

14

Language Distribution (1)

The top 10 languages make up ~86 of all content

Undetermined7

Polish7

Portuguese7

Dutch5

Hebrew5

Hindi5

Indonesian4

Korean4Swedish

3

Urdu3

Turkish3

Thai3Danish

3

Czech3

Unknown3

Croatian2

Persian2

Tamil2

Bengali2

Music2

Hungarian2

Norwegian2

Sanskrit2

Vietnamese1

Ukrainian1

Greek1

Bulgarian1Serbian

1Armenian

1Romanian

1Marathi

1

Ancient-Greek1 Panjabi

1

Telugu1Malay

1

Catalan1

Malayalam1

Multiple1

Finnish1

Slovak1

Language Distribution (2)

The next 40 languages make up ~13 of total

Dates

Copyright Distribution

In-copyright or unde-termined

69

Public Domain (worldwide)

15

US Federal Government Documents (worldwide)

4

Public Domain(US)11

Open Access1

Creative Commons 04

1010

8

110

9

410

9

710

9

1010

9

111

0

411

0

711

0

1011

0

111

1

411

1

711

1

1011

1

111

2

411

2

711

2

1011

2

111

30

10

20

30

40

50

60

70

80

90

100

Boston CollegeFloridaYaleUtah StateUNC-Chapel HillPurduePenn StateNorthwesternNCSUIllinoisDukeChicagoVirginiaMinnesotaMadridLoCHarvardColumbiaIndianaPrincetonNYPLCornellWisconsinCaliforniaMichigan

Source

Bibliographic Data

Content Package

MichiganIndiana

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

Datasets

Source

Bibliographic Data

Content Package

MichiganIndiana

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

Datasets

Source

Bibliographic Data

Content Package

MichiganIndiana

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

DatasetsTDR

Source

Bibliographic Data

Content Package

MichiganIndiana

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

Datasets

Source

Bibliographic Data

Content Package

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

DatasetsMichigan

Indiana

Source

Bibliographic Data

Content Package

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

DatasetsMichigan

Indiana

Source

Bibliographic Data

Content Package

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

DatasetsMichigan

Indiana

We engage in preservationfor purposes of access

Source

Bibliographic Data

Content Package

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

DatasetsMichigan

Indiana

Making Content Available

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Descriptive headings added (hidden from GUI with CSS)

Info about SSD service amp link to accessibility page

Images used for style are in css so no need to use alt tags

Skip navigation link

Access keys for navigating pages with keyboard

Added labels amp descriptive titles to forms amp ToC table

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

APIs

bull Data APIndash Volume and rights informationndash Page imagesndash OCR

bull Bibliographic APIndash Volume and rights informationndash MARC records

bull OAIbull ldquoHathifilesrdquo

Datasets

bull Google-digitized ~28 million texts Requires proposal to HathiTrust Agreement with Google Statement on usemanagement

bull Non-Google-digitized ~370000 texts Freely available Statement on management

Research Center

bull Environment to perform research on HathiTrust corpus

bull httpwwwhathitrustorghtrc

bull httplibumichedumpachbull Package of tools to enable publication of open

access born-digital journal content directly into HathiTrustndash Including accompanying data and media files

bull Allows integration with popular journal publishing tools such as Open Journal Systems (OJS)

Source Archive

Editorial Market

Higher Education

Access Determinations

bull Automatedbull Manual

Automatic Rights Determination

bull Conducted on all works at time of ingest and when records are modifiedndash Public domain worldwide

bull US works published before 1923 US federal government publications non-US works published prior to 1873

ndash Public domain in the United Statesbull Non-US works published prior to 1923

Manual Rights Determination

bull IMLS-funded CRMS projectndash CRMS-US

bull 2008 US-published works 1923-1963bull Staff at 4 partner institutions

ndash CRMS-Worldbull 2011 Expanded to non-US worksbull Staff at 16 partner institutions

ndash Double review with additional expert review for conflictsndash Compliance with copyright formalitiesndash As of January 2013 241541 reviewed more than 132644

openedbull Rights Holder Permissions

bull System of Precedence

Rights Database

Bibliographic (automatic)

Manual

Lawful uses

bull Users who have print disabilitiesndash All in-copyright works in HathiTrust currently

owned (or owned previously) by the partner institution

ndash Must be authenticatedndash Must be on US soilndash One simultaneous access per copy ownedndash httpwwwhathitrustorgaccessibility

Lawful uses (2)

bull Out of print and brittle missingndash Works must be currently owned (or owned

previously) by the partner institutionndash Must be authenticated or accessing work from

library premisesndash Must be on US soilndash One simultaneous access per copy ownedndash httpwwwhathitrustorgout-of-print-brittle

bull Access and use statementsndash httpwwwhathitrustorgaccess_use

Outline

bull The Big Idea ndash Mission and Goals

bull What wersquore doing to get there ndash Repository and Content ndash Making content available ndash Organizational structure

bull How HathiTrust can change the way we work

e-Commerce

Print on Demand

Content Ingest

Transformation

Validation

Content Access

PageTurner

Collection Builder

Large-scale Search

Bibliographic Catalog

Research Center

APIs

Quality Assurance

Quality Review

Content Certification

User Services

Usability

User support (helpdesk)

Outreach

Project website

Monthly newsletter

Papers and presentations

Communication with potential

partners

Surveys general inquiries

Repository evaluation and

audit (eg DRAMBORA

TRAC)

Legal

Risk management (use of materials)

Partner agreements

Advocacy

Governance

Budget Finances

Decision-making

Policy

Planning

Enterprise Management

Communication and Coordination

with partner institutions

Project management

Repository Administration

Hardware configuration and

maintenance

Web and application server configuration and

maintenance

Security

Permissions

Logging

Repository Administration

Data management (content storage backup integrity checks deletion)

Hardware selection and replacement

Content and Metadata

specifications

Disaster Recovery

Processes for ensuring content

integrity

Rights Management

Copyright determination

Copyright review

Copyright information

management (database)

Rightsholder permissions

Bibliographic Data

Management

Entity description (record-level)

Object identification (item-

level)

Data availability

Collection Development

Digitalbull Expansion beyond

books and journals (born-digital images and maps audio)

bull Selection of content (for non-Google volume ingest and pilots projects)

Printbull Cloud Library (effect

of digital on print)

Financial contributions of partners

HathiTrust Functional Framework

HathiTrust

Strategic Advisory BoardBudgetFinances Decision-making

Guidance on Policy Planning

bull Driven by needs of institutionsbull Leverage across the partnershipbull Projects Print on Demand Grant Work Ingest Specifications

PageTurner Bibliographic Data Management

Executive Committee

Collective Work Working Groups and Committees

Operationalbull Communicationsbull User Supportbull User Experience

Operationalbull Communicationsbull User Supportbull User Experience

Strategicbull Collectionsbull Discovery Interfacebull Full-text Search

Distributed work

Constitutional Convention

bull October 2011bull 52 partnersbull 3-year review overseen by SABbull Ballot Proposals

ndash Print monograph storagendash Approval Process for development initiativesndash US Government Documentsndash Fee-for-service content depositndash Governance

HathiTrust

Executive Committee

Strategic Advisory

Board

BudgetFinancesDecision-making

Guidance on Policy Planning

bull 12-member Board of Governors

bull Chief Executive Officer bull Executive Committee

Governance

bull Efficient practicalbull Inclusive collective

Outline

bull The Big Idea ndash Mission and Goals

bull What wersquore doing to get there ndash Repository and Content ndash Making content available ndash Organizational structure

bull How HathiTrust can change the way we work

How HathiTrust Can Change the Way We Work

Seeing collective problems as collective

Breakdown of HathiTrust book corpus by publication date

Bibliographic Indeterminacy and the Scale of Problems and Opportunities of Rights in Digital Collection Building ndash 22011

42

19

20

19

Breakdown of HathiTrust book corpus by publication date

42

19

20

19

Copyright status of books published pre-1923 and US works published 1923-1963

42

19

20

Copyright status of books published pre-1923 and US works published 1923-1963

42

19

20

19

Copyright status of books published pre-1923 and US works published 1923-1963

In Print 42

19

20

19

bull Identificationbull Descriptionbull Rights

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic records

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic recordsndash Bib records and objects

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic recordsndash Bib records and objectsndash Digital objects

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic recordsndash Bib records and objectsndash Digital objectsndash Digital and print

Relationships

Understanding the relationship between the collective and local

1st model Price per GB

2008 2009 2010 2011 2012 (Oct)Total Volumes 2477871 5221092 7836698 9966572 10531566

Public Domain 372085 758947 1959223 2712626 3218132

2008 2009 2010 2011 2012 (Oct)0

2000000

4000000

6000000

8000000

10000000

12000000

Total VolumesPublic Domain

0 20 40 60 80 100 1200

10

20

30

40

50

60

Rank in 2008 ARL Investment Index

o

f Tit

les

in L

ocal

Col

lecti

on

A global change in the library environment

June 2010Median duplication 31

June 2009Median duplication 19

Academic print book collection already substantially duplicated in mass digitized book corpus

Courtesy of Constance Malpas OCLC Research

Digitized Books in Shared Repositories

Sep-09 Oct-09 Nov-09 Dec-09 Jan-10 Feb-10 Mar-10 Apr-10 May-10 Jun-100

500000

1000000

1500000

2000000

2500000

3000000

3500000

Mass digitized books in Hathi digital repository Mass digitized books in shared print repositories

Uni

que

Titl

es

~75 of mass digitized corpus is lsquobacked uprsquo in one or more shared print repositories

~35M titles

~25M

Courtesy of Constance Malpas OCLC Research

Collection Overlap

bull More than 50 median overlap with ARL institutions higher for small liberal arts colleges

bull New Pricing model based on Print holdingsndash httpwwwhathitrustorgcostndash Requires print holdings databasendash Also support expansion of legal uses efforts in de-

duplicationndash Facilitate individual and collaborative collection

development and management operationsbull Print monographs archiving

Sourcing and Scalinghttporweblogoclcorgarchives002058html

bull Scalendash Institution-scalendash Group-scalendash Web-scale

bull Sourcingndash Institutionalndash Collaborativendash Third-party

A new kind of library

Thank you

How to find out more

bull About httpwwwhathitrustorgaboutbull Twitter httptwittercomhathitrustbull Facebook httpwwwfacebookcomhathitrustbull Monthly newsletter

ndash httpwwwhathitrustorgupdatesndash RSS httpwwwhathitrustorgupdates_rss

bull Contact us feedbackissueshathitrustorgbull Blogs httpwwwhathitrustorgblogs

ndash Large-scale Searchndash Perspectives from HathiTrust

  • Your Library Now Online Putting HathiTrust in the Context of
  • Outline
  • The Big Idea
  • Partnership
  • Digital Repository
  • The Name
  • Mission
  • HathiTrust
  • Collections and Collaboration
  • What we are doing to get there
  • Slide 11
  • Slide 12
  • Repository and Content
  • Content Sources
  • Language Distribution (1)
  • Language Distribution (2)
  • Dates
  • Copyright Distribution
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Making Content Available
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • APIs
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Datasets
  • Research Center
  • Slide 54
  • Slide 55
  • Access Determinations
  • Automatic Rights Determination
  • Manual Rights Determination
  • Rights Database
  • Lawful uses
  • Lawful uses (2)
  • Outline (2)
  • Slide 63
  • Slide 64
  • Constitutional Convention
  • Slide 66
  • Governance
  • Outline (3)
  • How HathiTrust Can Change the Way We Work
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Relationships
  • Relationships (2)
  • Relationships (3)
  • Relationships (4)
  • Relationships (5)
  • Slide 81
  • Slide 82
  • Slide 83
  • A global change in the library environment
  • Digitized Books in Shared Repositories
  • Collection Overlap
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Thank you
  • How to find out more
Page 16: HATHITRUST A Shared Digital Repository Your Library, Now Online! Putting HathiTrust in the Context of Traditional (and New) Library Services MCLS Webinar

Undetermined7

Polish7

Portuguese7

Dutch5

Hebrew5

Hindi5

Indonesian4

Korean4Swedish

3

Urdu3

Turkish3

Thai3Danish

3

Czech3

Unknown3

Croatian2

Persian2

Tamil2

Bengali2

Music2

Hungarian2

Norwegian2

Sanskrit2

Vietnamese1

Ukrainian1

Greek1

Bulgarian1Serbian

1Armenian

1Romanian

1Marathi

1

Ancient-Greek1 Panjabi

1

Telugu1Malay

1

Catalan1

Malayalam1

Multiple1

Finnish1

Slovak1

Language Distribution (2)

The next 40 languages make up ~13 of total

Dates

Copyright Distribution

In-copyright or unde-termined

69

Public Domain (worldwide)

15

US Federal Government Documents (worldwide)

4

Public Domain(US)11

Open Access1

Creative Commons 04

Chart (title not recoverable): quarterly data points from 10/1/08 to 1/1/13 on a 0 to 100 scale, with series for Boston College, Florida, Yale, Utah State, UNC-Chapel Hill, Purdue, Penn State, Northwestern, NCSU, Illinois, Duke, Chicago, Virginia, Minnesota, Madrid, LoC, Harvard, Columbia, Indiana, Princeton, NYPL, Cornell, Wisconsin, California, and Michigan.

Diagram (repository architecture, shown in several successive builds). Components: Source, Bibliographic Data, Content Package, Ingest, Bib Data, Rights Data, Holdings Data, Data Management, Storage, Michigan, Indiana, Access, Catalog, Full-text Search, PageTurner, APIs, Collections, Datasets. One build adds the label "TDR".

We engage in preservation for purposes of access.

Making Content Available

Diagram (access services, shown in several successive builds): Access, with Catalog, Full-text Search, PageTurner, APIs, Collections, and Datasets.

• Descriptive headings added (hidden from the GUI with CSS)
• Info about the SSD service and a link to the accessibility page
• Images used for style are in CSS, so no alt tags are needed
• Skip navigation link
• Access keys for navigating pages with the keyboard
• Labels and descriptive titles added to forms and the ToC table


APIs

• Data API
  – Volume and rights information
  – Page images
  – OCR
• Bibliographic API
  – Volume and rights information
  – MARC records
• OAI
• "Hathifiles"
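A rough sketch of how a client might call the Bibliographic API: the Python below builds a single-identifier request URL and pulls volume IDs and US rights strings out of a brief-level JSON response. The endpoint layout (`https://catalog.hathitrust.org/api/volumes/<brief|full>/<id type>/<value>.json`), the identifier types, and the `items`, `htid`, and `usRightsString` field names reflect our understanding of the public API and are assumptions to verify against current HathiTrust documentation; the sample response and volume IDs are invented.

```python
import json
from urllib.parse import quote

# Assumed endpoint layout: <base>/<level>/<id type>/<id value>.json
BIB_API_BASE = "https://catalog.hathitrust.org/api/volumes"
ID_TYPES = {"oclc", "lccn", "issn", "isbn", "htid", "recordnumber"}
LEVELS = {"brief", "full"}  # "full" is understood to add MARC records


def bib_api_url(id_type: str, id_value: str, level: str = "brief") -> str:
    """Build a single-identifier Bibliographic API request URL."""
    if id_type not in ID_TYPES:
        raise ValueError(f"unsupported identifier type: {id_type!r}")
    if level not in LEVELS:
        raise ValueError(f"unsupported level: {level!r}")
    return f"{BIB_API_BASE}/{level}/{id_type}/{quote(id_value, safe='')}.json"


def item_rights(response_text: str) -> list[tuple[str, str]]:
    """Return (volume id, US rights string) pairs from a JSON response body."""
    data = json.loads(response_text)
    return [(item["htid"], item["usRightsString"]) for item in data.get("items", [])]


# Hypothetical response body; a live one would come from urllib.request.urlopen(url).
SAMPLE = json.dumps({
    "records": {"000000001": {"titles": ["Example title"]}},
    "items": [
        {"htid": "mdp.00000000000001", "usRightsString": "Full view"},
        {"htid": "mdp.00000000000002", "usRightsString": "Limited (search-only)"},
    ],
})

if __name__ == "__main__":
    print(bib_api_url("oclc", "424023"))
    print(item_rights(SAMPLE))
```

To fetch live data, pass the URL to `urllib.request.urlopen`, decode the body, and hand it to `item_rights`; keeping URL building and parsing separate makes both easy to test offline.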

Datasets

• Google-digitized: ~2.8 million texts
  – Requires proposal to HathiTrust
  – Agreement with Google
  – Statement on use/management
• Non-Google-digitized: ~370,000 texts
  – Freely available
  – Statement on management

Research Center

• Environment to perform research on the HathiTrust corpus
• http://www.hathitrust.org/htrc

• http://lib.umich.edu/mpach
• Package of tools to enable publication of open access, born-digital journal content directly into HathiTrust
  – Including accompanying data and media files
• Allows integration with popular journal publishing tools such as Open Journal Systems (OJS)

Diagram: Source Archive, Editorial Market, Higher Education.

Access Determinations

• Automated
• Manual

Automatic Rights Determination

• Conducted on all works at time of ingest and when records are modified
  – Public domain worldwide
    • US works published before 1923; US federal government publications; non-US works published prior to 1873
  – Public domain in the United States
    • Non-US works published prior to 1923
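The automatic rules listed above amount to a small decision procedure. A minimal sketch in Python, assuming each record supplies a publication year, a US/non-US place of publication, and a federal-document flag (hypothetical field names). The returned labels borrow the style of HathiTrust's rights codes but are simplified here, and the year cut-offs are the ones on the slide, which reflect the date of the presentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Work:
    pub_year: int
    us_published: bool
    us_federal_doc: bool = False  # US federal government publication


def automatic_rights(work: Work) -> str:
    """Apply the slide's automatic rules.

    Returns "pd" (public domain worldwide), "pdus" (public domain in the
    US only) or "ic" (not opened automatically: in copyright or undetermined
    until a manual review says otherwise).
    """
    # Public domain worldwide
    if work.us_federal_doc:
        return "pd"
    if work.us_published and work.pub_year < 1923:
        return "pd"
    if not work.us_published and work.pub_year < 1873:
        return "pd"
    # Public domain in the United States only
    if not work.us_published and work.pub_year < 1923:
        return "pdus"
    return "ic"


if __name__ == "__main__":
    for w in (Work(1900, True), Work(1860, False), Work(1900, False),
              Work(1950, True), Work(1950, True, us_federal_doc=True)):
        print(w, automatic_rights(w))
```

Anything that falls through to `"ic"` is exactly the pool that the manual review process works through.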

Manual Rights Determination

• IMLS-funded CRMS project
  – CRMS-US
    • 2008: US-published works, 1923-1963
    • Staff at 4 partner institutions
  – CRMS-World
    • 2011: Expanded to non-US works
    • Staff at 16 partner institutions
  – Double review, with additional expert review for conflicts
  – Compliance with copyright formalities
  – As of January 2013: 241,541 reviewed; more than 132,644 opened
• Rights Holder Permissions
• System of Precedence
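The double-review step can be sketched as follows: each volume gets two independent determinations, agreement is accepted as-is, and any conflict goes to an expert. The labels, the volume IDs, and the `expert` callback are hypothetical stand-ins for illustration, not the CRMS project's actual interface.

```python
from typing import Callable, Dict, Tuple

Expert = Callable[[str, str], str]  # settles a conflict between two determinations


def resolve(first: str, second: str, expert: Expert) -> Tuple[str, bool]:
    """Return (final determination, whether the expert was needed)."""
    if first == second:
        return first, False
    return expert(first, second), True


def review_batch(reviews: Dict[str, Tuple[str, str]], expert: Expert):
    """Resolve every volume; report results, escalation count and opened count."""
    results: Dict[str, str] = {}
    escalated = 0
    for volume, (first, second) in reviews.items():
        final, needed_expert = resolve(first, second, expert)
        results[volume] = final
        if needed_expert:
            escalated += 1
    opened = sum(1 for d in results.values() if d == "pd")
    return results, escalated, opened


if __name__ == "__main__":
    reviews = {"vol-a": ("pd", "pd"), "vol-b": ("pd", "ic"), "vol-c": ("ic", "ic")}
    cautious_expert = lambda a, b: "ic"  # hypothetical: keep closed when unsure
    print(review_batch(reviews, cautious_expert))
```

Counting escalations alongside openings gives the kind of running totals ("reviewed" versus "opened") reported on the slide.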

Rights Database

• Bibliographic (automatic)
• Manual

Lawful uses

• Users who have print disabilities
  – All in-copyright works in HathiTrust currently owned (or owned previously) by the partner institution
  – Must be authenticated
  – Must be on US soil
  – One simultaneous access per copy owned
  – http://www.hathitrust.org/accessibility
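The conditions above (a print-disabled user who is authenticated, on US soil, and at most one simultaneous reader per copy the institution owns or previously owned) can be modeled as a small access gate. This is a hypothetical in-memory sketch for illustration; the class, its method names, and the copy counts are assumptions, not HathiTrust's implementation.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class AccessGate:
    # volume id -> number of print copies the institution owns or previously owned
    copies_owned: Dict[str, int]
    # volume id -> users currently holding a session on that volume
    active: Dict[str, Set[str]] = field(default_factory=lambda: defaultdict(set))

    def open(self, volume: str, user: str, *, print_disabled: bool,
             authenticated: bool, in_us: bool) -> bool:
        """Grant a session only if every condition on the slide holds."""
        if not (print_disabled and authenticated and in_us):
            return False
        sessions = self.active[volume]
        if user in sessions:
            return True  # already reading; no extra copy consumed
        if len(sessions) >= self.copies_owned.get(volume, 0):
            return False  # every owned copy is already in use
        sessions.add(user)
        return True

    def close(self, volume: str, user: str) -> None:
        self.active[volume].discard(user)


if __name__ == "__main__":
    gate = AccessGate({"vol-1": 1})
    ok = dict(print_disabled=True, authenticated=True, in_us=True)
    print(gate.open("vol-1", "alice", **ok))  # True: first copy granted
    print(gate.open("vol-1", "bob", **ok))    # False: only one copy owned
    gate.close("vol-1", "alice")
    print(gate.open("vol-1", "bob", **ok))    # True: granted after alice closes
```

A production system would also need session timeouts and shared state across servers; the sketch only shows the counting rule.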

Lawful uses (2)

• Out of print and brittle/missing
  – Works must be currently owned (or owned previously) by the partner institution
  – Must be authenticated or accessing the work from library premises
  – Must be on US soil
  – One simultaneous access per copy owned
  – http://www.hathitrust.org/out-of-print-brittle
• Access and use statements
  – http://www.hathitrust.org/access_use

Outline

• The Big Idea
  – Mission and Goals
• What we're doing to get there
  – Repository and Content
  – Making content available
  – Organizational structure
• How HathiTrust can change the way we work

HathiTrust Functional Framework

e-Commerce

Print on Demand

Content Ingest

Transformation

Validation

Content Access

PageTurner

Collection Builder

Large-scale Search

Bibliographic Catalog

Research Center

APIs

Quality Assurance

Quality Review

Content Certification

User Services

Usability

User support (helpdesk)

Outreach

Project website

Monthly newsletter

Papers and presentations

Communication with potential partners

Surveys, general inquiries

Repository evaluation and audit (e.g., DRAMBORA, TRAC)

Legal

Risk management (use of materials)

Partner agreements

Advocacy

Governance

Budget/Finances

Decision-making

Policy

Planning

Enterprise Management

Communication and Coordination with partner institutions

Project management

Repository Administration

Hardware configuration and maintenance

Web and application server configuration and maintenance

Security

Permissions

Logging

Repository Administration

Data management (content storage, backup, integrity checks, deletion)

Hardware selection and replacement

Content and Metadata specifications

Disaster Recovery

Processes for ensuring content integrity

Rights Management

Copyright determination

Copyright review

Copyright information management (database)

Rightsholder permissions

Bibliographic Data Management

Entity description (record-level)

Object identification (item-level)

Data availability

Collection Development

Digital
• Expansion beyond books and journals (born-digital, images and maps, audio)
• Selection of content (for non-Google volume ingest and pilot projects)

Print
• Cloud Library (effect of digital on print)

Financial contributions of partners

HathiTrust

Strategic Advisory Board

Budget/Finances, Decision-making

Guidance on Policy, Planning

• Driven by needs of institutions
• Leverage across the partnership
• Projects: Print on Demand, Grant Work, Ingest Specifications, PageTurner, Bibliographic Data Management

Executive Committee

Collective Work: Working Groups and Committees

Operational
• Communications
• User Support
• User Experience

Strategic
• Collections
• Discovery Interface
• Full-text Search

Distributed work

Constitutional Convention

• October 2011
• 52 partners
• 3-year review overseen by SAB
• Ballot proposals
  – Print monograph storage
  – Approval process for development initiatives
  – US Government Documents
  – Fee-for-service content deposit
  – Governance

HathiTrust

Executive Committee

Strategic Advisory Board

Budget/Finances, Decision-making

Guidance on Policy, Planning

• 12-member Board of Governors
• Chief Executive Officer
• Executive Committee

Governance

• Efficient, practical
• Inclusive, collective

Outline

• The Big Idea
  – Mission and Goals
• What we're doing to get there
  – Repository and Content
  – Making content available
  – Organizational structure
• How HathiTrust can change the way we work

How HathiTrust Can Change the Way We Work

Seeing collective problems as collective

Chart: Breakdown of HathiTrust book corpus by publication date (segments of 42%, 19%, 20%, and 19%). Source: Bibliographic Indeterminacy and the Scale of Problems and Opportunities of Rights in Digital Collection Building, 2/2011.

Chart: Copyright status of books published pre-1923 and US works published 1923-1963, drawn over the same breakdown; the 42% segment is labeled "In Print".

Relationships

• Identification
• Description
• Rights
• Relationships
  – Bibliographic records
  – Bib records and objects
  – Digital objects
  – Digital and print

Understanding the relationship between the collective and local

1st model: Price per GB

Total volumes and public-domain volumes by year (charted on a 0 to 12,000,000 scale):

• 2008: 2,477,871 total; 372,085 public domain
• 2009: 5,221,092 total; 758,947 public domain
• 2010: 7,836,698 total; 1,959,223 public domain
• 2011: 9,966,572 total; 2,712,626 public domain
• 2012 (Oct): 10,531,566 total; 3,218,132 public domain
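The public-domain share implied by the yearly totals above is easy to derive. The short Python below copies those figures and computes the percentage for each year; rounding to one decimal place is our choice.

```python
# Total volumes and public-domain volumes per year, as given on the slide.
VOLUMES = {
    "2008": (2_477_871, 372_085),
    "2009": (5_221_092, 758_947),
    "2010": (7_836_698, 1_959_223),
    "2011": (9_966_572, 2_712_626),
    "2012 (Oct)": (10_531_566, 3_218_132),
}


def public_domain_share(total: int, public_domain: int) -> float:
    """Percentage of all volumes that are public domain."""
    return 100 * public_domain / total


def shares_by_year() -> dict[str, float]:
    """Public-domain share per year, rounded to one decimal place."""
    return {year: round(public_domain_share(t, pd), 1)
            for year, (t, pd) in VOLUMES.items()}


if __name__ == "__main__":
    for year, share in shares_by_year().items():
        print(f"{year}: {share}% public domain")
```

Run as-is, this gives 15.0%, 14.5%, 25.0%, 27.2%, and 30.6%: the open share roughly doubles between 2008 and October 2012 even as the collection itself grows more than fourfold.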

Chart: % of titles in local collection (y-axis, 0 to 60) plotted against rank in the 2008 ARL Investment Index (x-axis, 0 to 120).

A global change in the library environment

• June 2010: median duplication 31%
• June 2009: median duplication 19%

Academic print book collection already substantially duplicated in mass digitized book corpus

Courtesy of Constance Malpas, OCLC Research

Digitized Books in Shared Repositories

Chart: unique titles, September 2009 to June 2010 (y-axis 0 to 3,500,000), comparing mass digitized books in the Hathi digital repository with mass digitized books in shared print repositories; annotated "~3.5M titles" and "~2.5M".

~75% of mass digitized corpus is "backed up" in one or more shared print repositories

Courtesy of Constance Malpas, OCLC Research

Collection Overlap

• More than 50% median overlap with ARL institutions; higher for small liberal arts colleges
• New pricing model based on print holdings
  – http://www.hathitrust.org/cost
  – Requires print holdings database
  – Also supports expansion of legal uses and efforts in de-duplication
  – Facilitates individual and collaborative collection development and management operations
• Print monographs archiving
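Overlap figures like these come from matching each library's print holdings against the shared corpus. A minimal sketch, assuming both sides have already been reduced to comparable title keys (for example OCLC numbers; real matching is messier) and using made-up holdings: compute each institution's percentage of titles also in the shared corpus, then take the median across institutions.

```python
from statistics import median
from typing import Dict, Set


def overlap_pct(local: Set[str], shared: Set[str]) -> float:
    """Percent of a library's titles that also appear in the shared corpus."""
    if not local:
        return 0.0
    return 100 * len(local & shared) / len(local)


def median_overlap(holdings: Dict[str, Set[str]], shared: Set[str]) -> float:
    """Median of per-institution overlap percentages."""
    return median([overlap_pct(titles, shared) for titles in holdings.values()])


if __name__ == "__main__":
    shared = {"ocm001", "ocm002", "ocm003", "ocm004"}  # hypothetical title keys
    holdings = {
        "Library A": {"ocm001", "ocm002", "ocm101", "ocm102"},  # 50% overlap
        "Library B": {"ocm001", "ocm002", "ocm003", "ocm103"},  # 75% overlap
        "Library C": {"ocm001", "ocm104", "ocm105", "ocm106"},  # 25% overlap
    }
    print(median_overlap(holdings, shared))  # 50.0
```

The median is used rather than the mean so that a few very large or very small collections do not dominate the summary figure.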

Sourcing and Scaling (http://orweblog.oclc.org/archives/002058.html)

• Scale
  – Institution-scale
  – Group-scale
  – Web-scale
• Sourcing
  – Institutional
  – Collaborative
  – Third-party

A new kind of library

Thank you

How to find out more

• About: http://www.hathitrust.org/about
• Twitter: http://twitter.com/hathitrust
• Facebook: http://www.facebook.com/hathitrust
• Monthly newsletter
  – http://www.hathitrust.org/updates
  – RSS: http://www.hathitrust.org/updates_rss
• Contact us: feedback@issues.hathitrust.org
• Blogs: http://www.hathitrust.org/blogs
  – Large-scale Search
  – Perspectives from HathiTrust

  • Your Library Now Online Putting HathiTrust in the Context of
  • Outline
  • The Big Idea
  • Partnership
  • Digital Repository
  • The Name
  • Mission
  • HathiTrust
  • Collections and Collaboration
  • What we are doing to get there
  • Slide 11
  • Slide 12
  • Repository and Content
  • Content Sources
  • Language Distribution (1)
  • Language Distribution (2)
  • Dates
  • Copyright Distribution
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Making Content Available
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • APIs
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Datasets
  • Research Center
  • Slide 54
  • Slide 55
  • Access Determinations
  • Automatic Rights Determination
  • Manual Rights Determination
  • Rights Database
  • Lawful uses
  • Lawful uses (2)
  • Outline (2)
  • Slide 63
  • Slide 64
  • Constitutional Convention
  • Slide 66
  • Governance
  • Outline (3)
  • How HathiTrust Can Change the Way We Work
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Relationships
  • Relationships (2)
  • Relationships (3)
  • Relationships (4)
  • Relationships (5)
  • Slide 81
  • Slide 82
  • Slide 83
  • A global change in the library environment
  • Digitized Books in Shared Repositories
  • Collection Overlap
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Thank you
  • How to find out more
Page 17: HATHITRUST A Shared Digital Repository Your Library, Now Online! Putting HathiTrust in the Context of Traditional (and New) Library Services MCLS Webinar

Dates

Copyright Distribution

In-copyright or unde-termined

69

Public Domain (worldwide)

15

US Federal Government Documents (worldwide)

4

Public Domain(US)11

Open Access1

Creative Commons 04

1010

8

110

9

410

9

710

9

1010

9

111

0

411

0

711

0

1011

0

111

1

411

1

711

1

1011

1

111

2

411

2

711

2

1011

2

111

30

10

20

30

40

50

60

70

80

90

100

Boston CollegeFloridaYaleUtah StateUNC-Chapel HillPurduePenn StateNorthwesternNCSUIllinoisDukeChicagoVirginiaMinnesotaMadridLoCHarvardColumbiaIndianaPrincetonNYPLCornellWisconsinCaliforniaMichigan

Source

Bibliographic Data

Content Package

MichiganIndiana

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

Datasets

Source

Bibliographic Data

Content Package

MichiganIndiana

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

Datasets

Source

Bibliographic Data

Content Package

MichiganIndiana

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

DatasetsTDR

Source

Bibliographic Data

Content Package

MichiganIndiana

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

Datasets

Source

Bibliographic Data

Content Package

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

DatasetsMichigan

Indiana

Source

Bibliographic Data

Content Package

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

DatasetsMichigan

Indiana

Source

Bibliographic Data

Content Package

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

DatasetsMichigan

Indiana

We engage in preservationfor purposes of access

Source

Bibliographic Data

Content Package

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

DatasetsMichigan

Indiana

Making Content Available

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Descriptive headings added (hidden from GUI with CSS)

Info about SSD service amp link to accessibility page

Images used for style are in css so no need to use alt tags

Skip navigation link

Access keys for navigating pages with keyboard

Added labels amp descriptive titles to forms amp ToC table

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

APIs

bull Data APIndash Volume and rights informationndash Page imagesndash OCR

bull Bibliographic APIndash Volume and rights informationndash MARC records

bull OAIbull ldquoHathifilesrdquo

Datasets

bull Google-digitized ~28 million texts Requires proposal to HathiTrust Agreement with Google Statement on usemanagement

bull Non-Google-digitized ~370000 texts Freely available Statement on management

Research Center

bull Environment to perform research on HathiTrust corpus

bull httpwwwhathitrustorghtrc

bull httplibumichedumpachbull Package of tools to enable publication of open

access born-digital journal content directly into HathiTrustndash Including accompanying data and media files

bull Allows integration with popular journal publishing tools such as Open Journal Systems (OJS)

Source Archive

Editorial Market

Higher Education

Access Determinations

bull Automatedbull Manual

Automatic Rights Determination

bull Conducted on all works at time of ingest and when records are modifiedndash Public domain worldwide

bull US works published before 1923 US federal government publications non-US works published prior to 1873

ndash Public domain in the United Statesbull Non-US works published prior to 1923

Manual Rights Determination

bull IMLS-funded CRMS projectndash CRMS-US

bull 2008 US-published works 1923-1963bull Staff at 4 partner institutions

ndash CRMS-Worldbull 2011 Expanded to non-US worksbull Staff at 16 partner institutions

ndash Double review with additional expert review for conflictsndash Compliance with copyright formalitiesndash As of January 2013 241541 reviewed more than 132644

openedbull Rights Holder Permissions

bull System of Precedence

Rights Database

Bibliographic (automatic)

Manual

Lawful uses

bull Users who have print disabilitiesndash All in-copyright works in HathiTrust currently

owned (or owned previously) by the partner institution

ndash Must be authenticatedndash Must be on US soilndash One simultaneous access per copy ownedndash httpwwwhathitrustorgaccessibility

Lawful uses (2)

bull Out of print and brittle missingndash Works must be currently owned (or owned

previously) by the partner institutionndash Must be authenticated or accessing work from

library premisesndash Must be on US soilndash One simultaneous access per copy ownedndash httpwwwhathitrustorgout-of-print-brittle

bull Access and use statementsndash httpwwwhathitrustorgaccess_use

Outline

bull The Big Idea ndash Mission and Goals

bull What wersquore doing to get there ndash Repository and Content ndash Making content available ndash Organizational structure

bull How HathiTrust can change the way we work

e-Commerce

Print on Demand

Content Ingest

Transformation

Validation

Content Access

PageTurner

Collection Builder

Large-scale Search

Bibliographic Catalog

Research Center

APIs

Quality Assurance

Quality Review

Content Certification

User Services

Usability

User support (helpdesk)

Outreach

Project website

Monthly newsletter

Papers and presentations

Communication with potential

partners

Surveys general inquiries

Repository evaluation and

audit (eg DRAMBORA

TRAC)

Legal

Risk management (use of materials)

Partner agreements

Advocacy

Governance

Budget Finances

Decision-making

Policy

Planning

Enterprise Management

Communication and Coordination

with partner institutions

Project management

Repository Administration

Hardware configuration and

maintenance

Web and application server configuration and

maintenance

Security

Permissions

Logging

Repository Administration

Data management (content storage backup integrity checks deletion)

Hardware selection and replacement

Content and Metadata

specifications

Disaster Recovery

Processes for ensuring content

integrity

Rights Management

Copyright determination

Copyright review

Copyright information

management (database)

Rightsholder permissions

Bibliographic Data

Management

Entity description (record-level)

Object identification (item-

level)

Data availability

Collection Development

Digitalbull Expansion beyond

books and journals (born-digital images and maps audio)

bull Selection of content (for non-Google volume ingest and pilots projects)

Printbull Cloud Library (effect

of digital on print)

Financial contributions of partners

HathiTrust Functional Framework

HathiTrust

Strategic Advisory BoardBudgetFinances Decision-making

Guidance on Policy Planning

bull Driven by needs of institutionsbull Leverage across the partnershipbull Projects Print on Demand Grant Work Ingest Specifications

PageTurner Bibliographic Data Management

Executive Committee

Collective Work Working Groups and Committees

Operationalbull Communicationsbull User Supportbull User Experience

Operationalbull Communicationsbull User Supportbull User Experience

Strategicbull Collectionsbull Discovery Interfacebull Full-text Search

Distributed work

Constitutional Convention

bull October 2011bull 52 partnersbull 3-year review overseen by SABbull Ballot Proposals

ndash Print monograph storagendash Approval Process for development initiativesndash US Government Documentsndash Fee-for-service content depositndash Governance

HathiTrust

Executive Committee

Strategic Advisory

Board

BudgetFinancesDecision-making

Guidance on Policy Planning

bull 12-member Board of Governors

bull Chief Executive Officer bull Executive Committee

Governance

bull Efficient practicalbull Inclusive collective

Outline

bull The Big Idea ndash Mission and Goals

bull What wersquore doing to get there ndash Repository and Content ndash Making content available ndash Organizational structure

bull How HathiTrust can change the way we work

How HathiTrust Can Change the Way We Work

Seeing collective problems as collective

Breakdown of HathiTrust book corpus by publication date

Bibliographic Indeterminacy and the Scale of Problems and Opportunities of Rights in Digital Collection Building ndash 22011

42

19

20

19

Breakdown of HathiTrust book corpus by publication date

42

19

20

19

Copyright status of books published pre-1923 and US works published 1923-1963

42

19

20

Copyright status of books published pre-1923 and US works published 1923-1963

42

19

20

19

Copyright status of books published pre-1923 and US works published 1923-1963

In Print 42

19

20

19

bull Identificationbull Descriptionbull Rights

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic records

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic recordsndash Bib records and objects

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic recordsndash Bib records and objectsndash Digital objects

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic recordsndash Bib records and objectsndash Digital objectsndash Digital and print

Relationships

Understanding the relationship between the collective and local

1st model Price per GB

2008 2009 2010 2011 2012 (Oct)Total Volumes 2477871 5221092 7836698 9966572 10531566

Public Domain 372085 758947 1959223 2712626 3218132

2008 2009 2010 2011 2012 (Oct)0

2000000

4000000

6000000

8000000

10000000

12000000

Total VolumesPublic Domain

0 20 40 60 80 100 1200

10

20

30

40

50

60

Rank in 2008 ARL Investment Index

o

f Tit

les

in L

ocal

Col

lecti

on

A global change in the library environment

June 2010Median duplication 31

June 2009Median duplication 19

Academic print book collection already substantially duplicated in mass digitized book corpus

Courtesy of Constance Malpas OCLC Research

Digitized Books in Shared Repositories

Sep-09 Oct-09 Nov-09 Dec-09 Jan-10 Feb-10 Mar-10 Apr-10 May-10 Jun-100

500000

1000000

1500000

2000000

2500000

3000000

3500000

Mass digitized books in Hathi digital repository Mass digitized books in shared print repositories

Uni

que

Titl

es

~75 of mass digitized corpus is lsquobacked uprsquo in one or more shared print repositories

~35M titles

~25M

Courtesy of Constance Malpas OCLC Research

Collection Overlap

bull More than 50 median overlap with ARL institutions higher for small liberal arts colleges

bull New Pricing model based on Print holdingsndash httpwwwhathitrustorgcostndash Requires print holdings databasendash Also support expansion of legal uses efforts in de-

duplicationndash Facilitate individual and collaborative collection

development and management operationsbull Print monographs archiving

Sourcing and Scalinghttporweblogoclcorgarchives002058html

bull Scalendash Institution-scalendash Group-scalendash Web-scale

bull Sourcingndash Institutionalndash Collaborativendash Third-party

A new kind of library

Thank you

How to find out more

bull About httpwwwhathitrustorgaboutbull Twitter httptwittercomhathitrustbull Facebook httpwwwfacebookcomhathitrustbull Monthly newsletter

ndash httpwwwhathitrustorgupdatesndash RSS httpwwwhathitrustorgupdates_rss

bull Contact us feedbackissueshathitrustorgbull Blogs httpwwwhathitrustorgblogs

ndash Large-scale Searchndash Perspectives from HathiTrust

  • Your Library Now Online Putting HathiTrust in the Context of
  • Outline
  • The Big Idea
  • Partnership
  • Digital Repository
  • The Name
  • Mission
  • HathiTrust
  • Collections and Collaboration
  • What we are doing to get there
  • Slide 11
  • Slide 12
  • Repository and Content
  • Content Sources
  • Language Distribution (1)
  • Language Distribution (2)
  • Dates
  • Copyright Distribution
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Making Content Available
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • APIs
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Datasets
  • Research Center
  • Slide 54
  • Slide 55
  • Access Determinations
  • Automatic Rights Determination
  • Manual Rights Determination
  • Rights Database
  • Lawful uses
  • Lawful uses (2)
  • Outline (2)
  • Slide 63
  • Slide 64
  • Constitutional Convention
  • Slide 66
  • Governance
  • Outline (3)
  • How HathiTrust Can Change the Way We Work
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Relationships
  • Relationships (2)
  • Relationships (3)
  • Relationships (4)
  • Relationships (5)
  • Slide 81
  • Slide 82
  • Slide 83
  • A global change in the library environment
  • Digitized Books in Shared Repositories
  • Collection Overlap
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Thank you
  • How to find out more
Page 18: HATHITRUST A Shared Digital Repository Your Library, Now Online! Putting HathiTrust in the Context of Traditional (and New) Library Services MCLS Webinar

Copyright Distribution

In-copyright or unde-termined

69

Public Domain (worldwide)

15

US Federal Government Documents (worldwide)

4

Public Domain(US)11

Open Access1

Creative Commons 04

1010

8

110

9

410

9

710

9

1010

9

111

0

411

0

711

0

1011

0

111

1

411

1

711

1

1011

1

111

2

411

2

711

2

1011

2

111

30

10

20

30

40

50

60

70

80

90

100

Boston CollegeFloridaYaleUtah StateUNC-Chapel HillPurduePenn StateNorthwesternNCSUIllinoisDukeChicagoVirginiaMinnesotaMadridLoCHarvardColumbiaIndianaPrincetonNYPLCornellWisconsinCaliforniaMichigan

Page 20: HATHITRUST A Shared Digital Repository Your Library, Now Online! Putting HathiTrust in the Context of Traditional (and New) Library Services MCLS Webinar

Source

Bibliographic Data

Content Package

MichiganIndiana

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

Datasets

Source

Bibliographic Data

Content Package

MichiganIndiana

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

Datasets

Source

Bibliographic Data

Content Package

MichiganIndiana

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

DatasetsTDR

Source

Bibliographic Data

Content Package

MichiganIndiana

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

Datasets

Source

Bibliographic Data

Content Package

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

DatasetsMichigan

Indiana

Source

Bibliographic Data

Content Package

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

DatasetsMichigan

Indiana

Source

Bibliographic Data

Content Package

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

DatasetsMichigan

Indiana

We engage in preservationfor purposes of access

Source

Bibliographic Data

Content Package

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

DatasetsMichigan

Indiana

Making Content Available

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Descriptive headings added (hidden from GUI with CSS)

Info about SSD service amp link to accessibility page

Images used for style are in css so no need to use alt tags

Skip navigation link

Access keys for navigating pages with keyboard

Added labels amp descriptive titles to forms amp ToC table

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

APIs

bull Data APIndash Volume and rights informationndash Page imagesndash OCR

bull Bibliographic APIndash Volume and rights informationndash MARC records

bull OAIbull ldquoHathifilesrdquo

Datasets

bull Google-digitized ~28 million texts Requires proposal to HathiTrust Agreement with Google Statement on usemanagement

bull Non-Google-digitized ~370000 texts Freely available Statement on management

Research Center

bull Environment to perform research on HathiTrust corpus

bull httpwwwhathitrustorghtrc

bull httplibumichedumpachbull Package of tools to enable publication of open

access born-digital journal content directly into HathiTrustndash Including accompanying data and media files

bull Allows integration with popular journal publishing tools such as Open Journal Systems (OJS)

Source Archive

Editorial Market

Higher Education

Access Determinations

bull Automatedbull Manual

Automatic Rights Determination

bull Conducted on all works at time of ingest and when records are modifiedndash Public domain worldwide

bull US works published before 1923 US federal government publications non-US works published prior to 1873

ndash Public domain in the United Statesbull Non-US works published prior to 1923

Manual Rights Determination

• IMLS-funded CRMS project
  – CRMS-US
    • 2008: US-published works 1923-1963
    • Staff at 4 partner institutions
  – CRMS-World
    • 2011: Expanded to non-US works
    • Staff at 16 partner institutions
  – Double review, with additional expert review for conflicts
  – Compliance with copyright formalities
  – As of January 2013, 241,541 reviewed; more than 132,644 opened
• Rights Holder Permissions
• System of Precedence
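The CRMS double-review workflow can be sketched as two independent determinations that stand when they agree, with an expert deciding when they conflict. This is a hypothetical model, not CRMS code; the `Review` fields and the `"pd"`/`"ic"` strings are illustrative. For scale, the slide's own figures mean that at least about 55% of reviewed volumes were opened (132,644 of 241,541).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Review:
    volume_id: str
    first: str                    # reviewer 1's determination, e.g. "pd" or "ic"
    second: str                   # reviewer 2's independent determination
    expert: Optional[str] = None  # set only after an expert resolves a conflict


def final_status(review: Review) -> Optional[str]:
    """Agreement stands; a conflict defers to the expert (None = still pending)."""
    if review.first == review.second:
        return review.first
    return review.expert


def share_opened(reviews: list[Review]) -> float:
    """Fraction of resolved reviews that ended in a public-domain ("pd") finding."""
    resolved = [s for s in (final_status(r) for r in reviews) if s is not None]
    if not resolved:
        return 0.0
    return sum(1 for s in resolved if s == "pd") / len(resolved)


reviews = [
    Review("vol-a", "pd", "pd"),
    Review("vol-b", "pd", "ic", expert="ic"),
    Review("vol-c", "ic", "pd"),               # conflict still awaiting an expert
    Review("vol-d", "pd", "ic", expert="pd"),
]
print([final_status(r) for r in reviews])   # ['pd', 'ic', None, 'pd']
print(round(share_opened(reviews), 3))      # 0.667
```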

Rights Database

Bibliographic (automatic)

Manual

Lawful uses

• Users who have print disabilities
  – All in-copyright works in HathiTrust currently owned (or owned previously) by the partner institution
  – Must be authenticated
  – Must be on US soil
  – One simultaneous access per copy owned
  – http://www.hathitrust.org/accessibility
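The conditions for print-disabled access (authenticated, on US soil, one simultaneous access per copy owned) behave like a loan counter. Below is a minimal sketch assuming a per-institution table of print copies owned; `AccessGate` and its methods are hypothetical and say nothing about how HathiTrust actually enforces these rules.

```python
from collections import defaultdict


class AccessGate:
    """Hypothetical per-institution gate: one simultaneous access per owned copy."""

    def __init__(self, copies_owned: dict[str, int]):
        self.copies_owned = dict(copies_owned)  # volume id -> print copies held
        self.active = defaultdict(int)          # volume id -> open sessions

    def try_open(self, volume_id: str, authenticated: bool, in_us: bool) -> bool:
        # Both user conditions must hold before any copy is counted.
        if not (authenticated and in_us):
            return False
        # Refuse when every owned copy is already in use (or none is owned).
        if self.active[volume_id] >= self.copies_owned.get(volume_id, 0):
            return False
        self.active[volume_id] += 1
        return True

    def close(self, volume_id: str) -> None:
        if self.active[volume_id] > 0:
            self.active[volume_id] -= 1


gate = AccessGate({"vol-1": 1, "vol-2": 2})
print(gate.try_open("vol-1", True, True))    # True: the single copy is free
print(gate.try_open("vol-1", True, True))    # False: already in use
gate.close("vol-1")
print(gate.try_open("vol-1", True, True))    # True again after release
print(gate.try_open("vol-2", True, False))   # False: outside the US
```

The same counter would fit the out-of-print and brittle case if on-premises access were accepted as an alternative to authentication.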

Lawful uses (2)

• Out-of-print and brittle or missing works
  – Works must be currently owned (or owned previously) by the partner institution
  – Must be authenticated or accessing work from library premises
  – Must be on US soil
  – One simultaneous access per copy owned
  – http://www.hathitrust.org/out-of-print-brittle

• Access and use statements
  – http://www.hathitrust.org/access_use

Outline

• The Big Idea
  – Mission and Goals
• What we’re doing to get there
  – Repository and Content
  – Making content available
  – Organizational structure
• How HathiTrust can change the way we work

e-Commerce

Print on Demand

Content Ingest

Transformation

Validation

Content Access

PageTurner

Collection Builder

Large-scale Search

Bibliographic Catalog

Research Center

APIs

Quality Assurance

Quality Review

Content Certification

User Services

Usability

User support (helpdesk)

Outreach

Project website

Monthly newsletter

Papers and presentations

Communication with potential partners

Surveys, general inquiries

Repository evaluation and audit (e.g., DRAMBORA, TRAC)

Legal

Risk management (use of materials)

Partner agreements

Advocacy

Governance

Budget/Finances

Decision-making

Policy

Planning

Enterprise Management

Communication and Coordination with partner institutions

Project management

Repository Administration

Hardware configuration and maintenance

Web and application server configuration and maintenance

Security

Permissions

Logging

Repository Administration

Data management (content storage, backup, integrity checks, deletion)

Hardware selection and replacement

Content and Metadata specifications

Disaster Recovery

Processes for ensuring content integrity

Rights Management

Copyright determination

Copyright review

Copyright information management (database)

Rightsholder permissions

Bibliographic Data Management

Entity description (record-level)

Object identification (item-level)

Data availability

Collection Development

Digital
• Expansion beyond books and journals (born-digital, images and maps, audio)
• Selection of content (for non-Google volume ingest and pilot projects)

Print
• Cloud Library (effect of digital on print)

Financial contributions of partners

HathiTrust Functional Framework

HathiTrust

Strategic Advisory Board

Budget/Finances, Decision-making

Guidance on Policy, Planning

• Driven by needs of institutions
• Leverage across the partnership
• Projects: Print on Demand, Grant Work, Ingest Specifications, PageTurner, Bibliographic Data Management

Executive Committee

Collective Work: Working Groups and Committees

Operational
• Communications
• User Support
• User Experience

Strategic
• Collections
• Discovery Interface
• Full-text Search

Distributed work

Constitutional Convention

• October 2011
• 52 partners
• 3-year review overseen by SAB
• Ballot Proposals
  – Print monograph storage
  – Approval Process for development initiatives
  – US Government Documents
  – Fee-for-service content deposit
  – Governance

HathiTrust

Executive Committee

Strategic Advisory Board

Budget/Finances, Decision-making

Guidance on Policy, Planning

• 12-member Board of Governors
• Chief Executive Officer
• Executive Committee

Governance

• Efficient, practical
• Inclusive, collective

Outline

• The Big Idea
  – Mission and Goals
• What we’re doing to get there
  – Repository and Content
  – Making content available
  – Organizational structure
• How HathiTrust can change the way we work

How HathiTrust Can Change the Way We Work

Seeing collective problems as collective

Breakdown of HathiTrust book corpus by publication date

[Chart: four segments of 42%, 19%, 20%, 19%. Source: Bibliographic Indeterminacy and the Scale of Problems and Opportunities of Rights in Digital Collection Building, 2/2011]

Copyright status of books published pre-1923 and US works published 1923-1963

[Chart: segments of 42% (labeled “In Print”), 19%, 20%, 19%]

Relationships

• Identification
• Description
• Rights
• Relationships
  – Bibliographic records
  – Bib records and objects
  – Digital objects
  – Digital and print
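The four kinds of relationships on this slide can be modeled as links between record-level descriptions, item-level digital objects, and print holdings. The sketch below is a hypothetical data model: it assumes OCLC numbers are the join key both between bibliographic records and between digital and print copies, and the class names, identifiers, and institution names are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    htid: str      # item-level identifier of a scanned volume (illustrative)
    rights: str    # rights code attached to the object, e.g. "pd"


@dataclass
class BibRecord:
    record_id: str                 # record-level description
    oclc_numbers: set[str]         # assumed join key to other records and to print
    items: list[DigitalObject] = field(default_factory=list)  # bib record -> objects


def related_records(records: list[BibRecord]) -> dict[str, set[str]]:
    """Record-to-record links: records that share at least one OCLC number."""
    by_oclc: dict[str, set[str]] = {}
    for rec in records:
        for num in rec.oclc_numbers:
            by_oclc.setdefault(num, set()).add(rec.record_id)
    links = {rec.record_id: set() for rec in records}
    for ids in by_oclc.values():
        for rid in ids:
            links[rid] |= ids - {rid}
    return links


def held_in_print(record: BibRecord, holdings: dict[str, set[str]]) -> set[str]:
    """Digital-to-print links: institutions whose print holdings match the record."""
    return set().union(*(holdings.get(n, set()) for n in record.oclc_numbers))


a = BibRecord("r1", {"111"}, [DigitalObject("mdp.001", "pd")])
b = BibRecord("r2", {"111", "222"}, [DigitalObject("uc1.002", "ic")])
c = BibRecord("r3", {"333"})
print(related_records([a, b, c]))   # {'r1': {'r2'}, 'r2': {'r1'}, 'r3': set()}
holdings = {"111": {"Inst A"}, "222": {"Inst B"}}
print(sorted(held_in_print(b, holdings)))   # ['Inst A', 'Inst B']
```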

Understanding the relationship between the collective and local

1st model: Price per GB

Year            2008       2009       2010       2011       2012 (Oct)
Total Volumes   2,477,871  5,221,092  7,836,698  9,966,572  10,531,566
Public Domain   372,085    758,947    1,959,223  2,712,626  3,218,132

[Chart: Total Volumes and Public Domain, 2008 to 2012 (Oct); vertical axis 0 to 12,000,000]
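The volume counts on this slide also show the public-domain share of the collection growing. This short Python calculation uses only the slide's figures; the helper name is hypothetical.

```python
# Total and public-domain volume counts transcribed from this slide.
volumes = {
    "2008": (2_477_871, 372_085),
    "2009": (5_221_092, 758_947),
    "2010": (7_836_698, 1_959_223),
    "2011": (9_966_572, 2_712_626),
    "2012 (Oct)": (10_531_566, 3_218_132),
}


def public_domain_share(total: int, public_domain: int) -> float:
    """Fraction of all volumes that are public domain."""
    return public_domain / total


for year, (total, pd) in volumes.items():
    print(f"{year}: {public_domain_share(total, pd):.1%}")
```

It prints roughly 15.0%, 14.5%, 25.0%, 27.2%, and 30.6%, so the open share about doubled between 2008 and October 2012 while the total grew more than fourfold.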

[Chart: % of Titles in Local Collection (0 to 60) vs. Rank in 2008 ARL Investment Index (0 to 120)]

A global change in the library environment

June 2010: Median duplication 31%

June 2009: Median duplication 19%

Academic print book collection already substantially duplicated in mass digitized book corpus

Courtesy of Constance Malpas, OCLC Research

Digitized Books in Shared Repositories

[Chart: Unique Titles (0 to 3,500,000), Sep-09 to Jun-10. Series: mass digitized books in Hathi digital repository; mass digitized books in shared print repositories]

~75% of mass digitized corpus is ‘backed up’ in one or more shared print repositories

~3.5M titles

~2.5M

Courtesy of Constance Malpas, OCLC Research

Collection Overlap

• More than 50% median overlap with ARL institutions; higher for small liberal arts colleges
• New pricing model based on print holdings
  – http://www.hathitrust.org/cost
  – Requires print holdings database
  – Also supports expansion of legal uses and efforts in de-duplication
  – Facilitates individual and collaborative collection development and management operations
• Print monographs archiving
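Both ideas on this slide, overlap with a local collection and pricing based on print holdings, reduce to set operations on shared identifiers. The following is a simplified, hypothetical sketch: it assumes holdings are matched by OCLC number, that public-domain costs are shared by all members, and that in-copyright costs are shared by the institutions holding the work in print. The actual model published at http://www.hathitrust.org/cost may differ.

```python
from collections import defaultdict


def overlap_share(local_oclc: set[str], corpus_oclc: set[str]) -> float:
    """Fraction of a library's titles that also appear in the shared corpus."""
    if not local_oclc:
        return 0.0
    return len(local_oclc & corpus_oclc) / len(local_oclc)


def allocate_costs(volumes: dict[str, tuple[bool, set[str]]],
                   members: set[str], cost_per_volume: float) -> dict[str, float]:
    """Hypothetical holdings-based split of a flat per-volume cost.

    volumes maps a volume id to (is_public_domain, institutions holding it in print).
    """
    shares: dict[str, float] = defaultdict(float)
    for is_pd, holders in volumes.values():
        payers = members if is_pd else (holders & members)
        if not payers:  # assumption: unheld in-copyright volumes are spread across all
            payers = members
        for inst in payers:
            shares[inst] += cost_per_volume / len(payers)
    return dict(shares)


print(overlap_share({"1", "2", "3", "4"}, {"2", "3", "9"}))   # 0.5
members = {"A", "B", "C"}
vols = {"v1": (True, set()), "v2": (False, {"A"}), "v3": (False, {"A", "B"})}
print(sorted(allocate_costs(vols, members, 3.0).items()))
# [('A', 5.5), ('B', 2.5), ('C', 1.0)]
```

The split always sums to the total cost, so the design choice is only about who carries each volume, which is why a reliable print holdings database is a prerequisite.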

Sourcing and Scaling (http://orweblog.oclc.org/archives/002058.html)

• Scale
  – Institution-scale
  – Group-scale
  – Web-scale

• Sourcing
  – Institutional
  – Collaborative
  – Third-party

A new kind of library

Thank you

How to find out more

• About: http://www.hathitrust.org/about
• Twitter: http://twitter.com/hathitrust
• Facebook: http://www.facebook.com/hathitrust
• Monthly newsletter
  – http://www.hathitrust.org/updates
  – RSS: http://www.hathitrust.org/updates_rss
• Contact us: feedback@issues.hathitrust.org
• Blogs: http://www.hathitrust.org/blogs
  – Large-scale Search
  – Perspectives from HathiTrust

  • Your Library Now Online Putting HathiTrust in the Context of
  • Outline
  • The Big Idea
  • Partnership
  • Digital Repository
  • The Name
  • Mission
  • HathiTrust
  • Collections and Collaboration
  • What we are doing to get there
  • Slide 11
  • Slide 12
  • Repository and Content
  • Content Sources
  • Language Distribution (1)
  • Language Distribution (2)
  • Dates
  • Copyright Distribution
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Making Content Available
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • APIs
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Datasets
  • Research Center
  • Slide 54
  • Slide 55
  • Access Determinations
  • Automatic Rights Determination
  • Manual Rights Determination
  • Rights Database
  • Lawful uses
  • Lawful uses (2)
  • Outline (2)
  • Slide 63
  • Slide 64
  • Constitutional Convention
  • Slide 66
  • Governance
  • Outline (3)
  • How HathiTrust Can Change the Way We Work
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Relationships
  • Relationships (2)
  • Relationships (3)
  • Relationships (4)
  • Relationships (5)
  • Slide 81
  • Slide 82
  • Slide 83
  • A global change in the library environment
  • Digitized Books in Shared Repositories
  • Collection Overlap
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Thank you
  • How to find out more

Source

Bibliographic Data

Content Package

Michigan

Indiana

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

Datasets

TDR

We engage in preservation for purposes of access
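The ingest-to-access flow in this diagram depends on validating packages on the way in and checking their integrity once stored. The sketch below shows a generic fixity check with SHA-256 manifests using Python's standard `hashlib`; `ContentPackage`, `Repository`, and the file names are hypothetical, and this is not HathiTrust's ingest software.

```python
import hashlib
from dataclasses import dataclass


def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


@dataclass
class ContentPackage:
    volume_id: str
    files: dict[str, bytes]    # e.g. page images and OCR text
    manifest: dict[str, str]   # file name -> expected SHA-256, supplied by the source


class Repository:
    """Toy store: validate at ingest, keep checksums, re-verify on demand."""

    def __init__(self) -> None:
        self.storage: dict[str, dict[str, bytes]] = {}
        self.checksums: dict[str, dict[str, str]] = {}

    def ingest(self, pkg: ContentPackage) -> list[str]:
        """Return validation errors; the package is stored only if there are none."""
        errors = [f"missing file: {name}" for name in pkg.manifest
                  if name not in pkg.files]
        errors += [f"bad or unlisted checksum: {name}"
                   for name, data in pkg.files.items()
                   if pkg.manifest.get(name) != sha256(data)]
        if not errors:
            self.storage[pkg.volume_id] = dict(pkg.files)
            self.checksums[pkg.volume_id] = dict(pkg.manifest)
        return errors

    def verify(self, volume_id: str) -> bool:
        """Periodic integrity check against the checksums recorded at ingest."""
        stored = self.storage[volume_id]
        return all(name in stored and sha256(stored[name]) == digest
                   for name, digest in self.checksums[volume_id].items())


files = {"00000001.jp2": b"page image bytes", "00000001.txt": b"OCR text"}
pkg = ContentPackage("vol-1", files, {n: sha256(d) for n, d in files.items()})
repo = Repository()
print(repo.ingest(pkg))                          # []
repo.storage["vol-1"]["00000001.txt"] = b"corrupted"
print(repo.verify("vol-1"))                      # False
```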

Making Content Available

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets



1st model Price per GB

2008 2009 2010 2011 2012 (Oct)Total Volumes 2477871 5221092 7836698 9966572 10531566

Public Domain 372085 758947 1959223 2712626 3218132

2008 2009 2010 2011 2012 (Oct)0

2000000

4000000

6000000

8000000

10000000

12000000

Total VolumesPublic Domain

0 20 40 60 80 100 1200

10

20

30

40

50

60

Rank in 2008 ARL Investment Index

o

f Tit

les

in L

ocal

Col

lecti

on

A global change in the library environment

June 2010Median duplication 31

June 2009Median duplication 19

Academic print book collection already substantially duplicated in mass digitized book corpus

Courtesy of Constance Malpas OCLC Research

Digitized Books in Shared Repositories

Sep-09 Oct-09 Nov-09 Dec-09 Jan-10 Feb-10 Mar-10 Apr-10 May-10 Jun-100

500000

1000000

1500000

2000000

2500000

3000000

3500000

Mass digitized books in Hathi digital repository Mass digitized books in shared print repositories

Uni

que

Titl

es

~75 of mass digitized corpus is lsquobacked uprsquo in one or more shared print repositories

~35M titles

~25M

Courtesy of Constance Malpas OCLC Research

Collection Overlap

bull More than 50 median overlap with ARL institutions higher for small liberal arts colleges

bull New Pricing model based on Print holdingsndash httpwwwhathitrustorgcostndash Requires print holdings databasendash Also support expansion of legal uses efforts in de-

duplicationndash Facilitate individual and collaborative collection

development and management operationsbull Print monographs archiving

Sourcing and Scalinghttporweblogoclcorgarchives002058html

bull Scalendash Institution-scalendash Group-scalendash Web-scale

bull Sourcingndash Institutionalndash Collaborativendash Third-party

A new kind of library

Thank you

How to find out more

bull About httpwwwhathitrustorgaboutbull Twitter httptwittercomhathitrustbull Facebook httpwwwfacebookcomhathitrustbull Monthly newsletter

ndash httpwwwhathitrustorgupdatesndash RSS httpwwwhathitrustorgupdates_rss

bull Contact us feedbackissueshathitrustorgbull Blogs httpwwwhathitrustorgblogs

ndash Large-scale Searchndash Perspectives from HathiTrust

  • Your Library Now Online Putting HathiTrust in the Context of
  • Outline
  • The Big Idea
  • Partnership
  • Digital Repository
  • The Name
  • Mission
  • HathiTrust
  • Collections and Collaboration
  • What we are doing to get there
  • Slide 11
  • Slide 12
  • Repository and Content
  • Content Sources
  • Language Distribution (1)
  • Language Distribution (2)
  • Dates
  • Copyright Distribution
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Making Content Available
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • APIs
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Datasets
  • Research Center
  • Slide 54
  • Slide 55
  • Access Determinations
  • Automatic Rights Determination
  • Manual Rights Determination
  • Rights Database
  • Lawful uses
  • Lawful uses (2)
  • Outline (2)
  • Slide 63
  • Slide 64
  • Constitutional Convention
  • Slide 66
  • Governance
  • Outline (3)
  • How HathiTrust Can Change the Way We Work
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Relationships
  • Relationships (2)
  • Relationships (3)
  • Relationships (4)
  • Relationships (5)
  • Slide 81
  • Slide 82
  • Slide 83
  • A global change in the library environment
  • Digitized Books in Shared Repositories
  • Collection Overlap
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Thank you
  • How to find out more
Page 25: HATHITRUST A Shared Digital Repository Your Library, Now Online! Putting HathiTrust in the Context of Traditional (and New) Library Services MCLS Webinar

[Diagram: HathiTrust repository architecture; labels: Source (Bibliographic Data, Content Package), Ingest, Data Management (Bib Data, Rights Data, Holdings Data), Storage (Michigan, Indiana), Access (Catalog, Full-text Search, PageTurner, APIs, Collections, Datasets)]

We engage in preservation for purposes of access

Making Content Available

Access: Catalog, Full-text Search, PageTurner, APIs, Collections, Datasets

• Descriptive headings added (hidden from GUI with CSS)
• Info about SSD service & link to accessibility page
• Images used for style are in CSS, so no need for alt tags
• Skip navigation link
• Access keys for navigating pages with keyboard
• Added labels & descriptive titles to forms & ToC table


APIs

• Data API
  - Volume and rights information
  - Page images
  - OCR
• Bibliographic API
  - Volume and rights information
  - MARC records
• OAI
• "Hathifiles"
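The Bibliographic API returns volume and rights information for a record. The minimal Python sketch below builds a lookup URL and reads rights codes from a response. It assumes the URL pattern `https://catalog.hathitrust.org/api/volumes/{brief|full}/{id_type}/{id}.json` and the `items`, `htid`, and `rightsCode` response fields as an illustration; the 2012-era service may have differed. The sample response and volume ID are made up.

```python
import json
from urllib.parse import quote

# Assumed URL pattern for the HathiTrust Bibliographic API (check current documentation).
BIB_API_BASE = "https://catalog.hathitrust.org/api/volumes"


def bib_api_url(id_type: str, identifier: str, level: str = "brief") -> str:
    """Build a lookup URL; id_type is e.g. 'oclc', 'isbn', 'lccn', or 'htid'."""
    if level not in ("brief", "full"):
        raise ValueError("level must be 'brief' or 'full'")
    return f"{BIB_API_BASE}/{level}/{id_type}/{quote(identifier, safe='')}.json"


def items_with_rights(response_text: str) -> list[tuple[str, str]]:
    """Extract (volume id, rights code) pairs from a JSON response body."""
    data = json.loads(response_text)
    return [(item["htid"], item["rightsCode"]) for item in data.get("items", [])]


# Hypothetical response body, trimmed to the fields used above.
sample = '{"records": {}, "items": [{"htid": "mdp.39015000000001", "rightsCode": "pd"}]}'
print(bib_api_url("oclc", "424023"))
print(items_with_rights(sample))
```

To query a live service, the URL could be opened with `urllib.request.urlopen` and the decoded body passed to `items_with_rights`. The Data API (page images, OCR) is not covered by this sketch.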

Datasets

• Google-digitized
  - ~2.8 million texts
  - Requires proposal to HathiTrust
  - Agreement with Google
  - Statement on use/management
• Non-Google-digitized
  - ~370,000 texts
  - Freely available
  - Statement on management

Research Center

• Environment to perform research on the HathiTrust corpus
• http://www.hathitrust.org/htrc

• http://lib.umich.edu/mpach
• Package of tools to enable publication of open-access, born-digital journal content directly into HathiTrust
  - Including accompanying data and media files
• Allows integration with popular journal publishing tools such as Open Journal Systems (OJS)

[Diagram labels: Source Archive; Editorial Market; Higher Education]

Access Determinations

• Automated
• Manual

Automatic Rights Determination

• Conducted on all works at time of ingest and when records are modified
  - Public domain worldwide
    • US works published before 1923; US federal government publications; non-US works published prior to 1873
  - Public domain in the United States
    • Non-US works published prior to 1923
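The date rules above amount to a small decision procedure. The Python sketch below encodes them as stated on this slide. The `BibRecord` fields and the output labels `pd` (worldwide), `pdus` (US only), and `ic` (in copyright) are illustrative assumptions modeled on HathiTrust-style rights codes. Treating a missing date as in copyright is also an assumption, and the fixed 1923/1873 cutoffs are the slide's, not current rules.

```python
from dataclasses import dataclass
from typing import Optional

# Cutoff years as stated on the slide; in practice such cutoffs move forward over time.
US_PD_CUTOFF = 1923
NON_US_WORLDWIDE_PD_CUTOFF = 1873


@dataclass
class BibRecord:
    pub_year: Optional[int]   # None when the record has no usable date
    published_in_us: bool
    us_federal_document: bool


def automatic_rights(rec: BibRecord) -> str:
    """Return 'pd' (worldwide), 'pdus' (US only), or 'ic' (treated as in copyright)."""
    if rec.us_federal_document:
        return "pd"
    if rec.pub_year is None:
        return "ic"  # assumption: no date means no automatic opening
    if rec.published_in_us:
        return "pd" if rec.pub_year < US_PD_CUTOFF else "ic"
    if rec.pub_year < NON_US_WORLDWIDE_PD_CUTOFF:
        return "pd"
    if rec.pub_year < US_PD_CUTOFF:
        return "pdus"
    return "ic"
```

Anything that falls through to `ic` here, such as a US work published 1923-1963, is the kind of record the manual CRMS review is meant to address.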

Manual Rights Determination

• IMLS-funded CRMS project
  - CRMS-US
    • 2008: US-published works, 1923-1963
    • Staff at 4 partner institutions
  - CRMS-World
    • 2011: Expanded to non-US works
    • Staff at 16 partner institutions
  - Double review, with additional expert review for conflicts
  - Compliance with copyright formalities
  - As of January 2013: 241,541 reviewed; more than 132,644 opened
• Rights Holder Permissions
• System of Precedence
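The double-review step can be sketched as a function that accepts two independent determinations and defers to an expert only when they disagree. This is a simplified Python illustration, not the actual CRMS workflow. The string rights codes and the `adjudicate` callback are hypothetical. The block also converts the slide's January 2013 counts into a percentage.

```python
from typing import Callable


def double_review(first: str, second: str, adjudicate: Callable[[str, str], str]) -> str:
    """Two independent reviewer determinations; an expert resolves any conflict."""
    if first == second:
        return first
    return adjudicate(first, second)


def percent_opened(reviewed: int, opened: int) -> float:
    return 100 * opened / reviewed


# Figures from the slide (as of January 2013).
print(f"at least {percent_opened(241_541, 132_644):.1f}% of reviewed volumes opened")
```

Because the slide says *more than* 132,644 volumes were opened, 54.9% is a lower bound.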

Rights Database

Bibliographic (automatic)

Manual
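One way to picture the system of precedence is a rights database that keeps every determination for a volume and reports the one from the highest-ranked source. In the Python sketch below, the ranking is an assumption for illustration: rights holder permission outranks manual review, which outranks the automatic bibliographic determination. The slides do not spell out the actual order. The source labels are hypothetical.

```python
# Hypothetical ranking: a higher number overrides a lower one.
PRECEDENCE = {"bibliographic": 1, "manual": 2, "rightsholder": 3}


def effective_rights(determinations: list[tuple[str, str]]) -> str:
    """Return the rights code from the highest-precedence (source, code) pair."""
    if not determinations:
        raise ValueError("no determinations recorded")
    _, code = max(determinations, key=lambda d: PRECEDENCE[d[0]])
    return code
```

Keeping every determination, instead of overwriting one field, lets a later manual review or permission change the effective status without losing the audit trail.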

Lawful uses

• Users who have print disabilities
  - All in-copyright works in HathiTrust currently owned (or previously owned) by the partner institution
  - Must be authenticated
  - Must be on US soil
  - One simultaneous access per copy owned
  - http://www.hathitrust.org/accessibility
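The "one simultaneous access per copy owned" rule behaves like a per-volume checkout counter. This minimal Python sketch assumes the institution's holdings are available as a mapping from a volume ID to the number of print copies it owns or previously owned. That structure, and the class and method names, are hypothetical.

```python
from collections import defaultdict


class CopyLimitedAccess:
    """Allow at most one simultaneous session per print copy held (sketch)."""

    def __init__(self, copies_owned: dict[str, int]):
        self.copies_owned = copies_owned      # volume id -> print copies owned
        self.active = defaultdict(set)        # volume id -> active session ids

    def checkout(self, volume_id: str, session_id: str) -> bool:
        sessions = self.active[volume_id]
        if session_id in sessions:
            return True  # this session already holds the volume
        if len(sessions) >= self.copies_owned.get(volume_id, 0):
            return False  # every owned copy is in use (or none is owned)
        sessions.add(session_id)
        return True

    def release(self, volume_id: str, session_id: str) -> None:
        self.active[volume_id].discard(session_id)
```

Authentication and the US-location check would run before `checkout`. They are left out here to keep the counter focused.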

Lawful uses (2)

• Out of print and brittle/missing
  - Works must be currently owned (or previously owned) by the partner institution
  - Must be authenticated or accessing the work from library premises
  - Must be on US soil
  - One simultaneous access per copy owned
  - http://www.hathitrust.org/out-of-print-brittle
• Access and use statements
  - http://www.hathitrust.org/access_use
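The two lawful uses differ mainly in who may open a work and from where. The Python sketch below shows that eligibility check. The `AccessRequest` fields and the `"print-disabled"`/`"brittle"` labels are hypothetical. The per-copy simultaneous-access limit is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class AccessRequest:
    authenticated: bool
    on_library_premises: bool
    in_united_states: bool
    print_disabled: bool


def may_open(req: AccessRequest, lawful_use: str, institution_holds_copy: bool) -> bool:
    """lawful_use is 'print-disabled' or 'brittle' (hypothetical labels)."""
    # Both uses require a currently or previously owned copy and a US location.
    if not (institution_holds_copy and req.in_united_states):
        return False
    if lawful_use == "print-disabled":
        return req.authenticated and req.print_disabled
    if lawful_use == "brittle":
        # Out of print and brittle/missing: authentication OR library premises suffices.
        return req.authenticated or req.on_library_premises
    raise ValueError(f"unknown lawful use: {lawful_use}")
```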

Outline

• The Big Idea
  - Mission and Goals
• What we're doing to get there
  - Repository and Content
  - Making content available
  - Organizational structure
• How HathiTrust can change the way we work

HathiTrust Functional Framework

• e-Commerce
  - Print on Demand
• Content Ingest
  - Transformation
  - Validation
• Content Access
  - PageTurner
  - Collection Builder
  - Large-scale Search
  - Bibliographic Catalog
  - Research Center
  - APIs
• Quality Assurance
  - Quality Review
  - Content Certification
• User Services
  - Usability
  - User support (helpdesk)
• Outreach
  - Project website
  - Monthly newsletter
  - Papers and presentations
  - Communication with potential partners
  - Surveys, general inquiries
  - Repository evaluation and audit (e.g., DRAMBORA, TRAC)
• Legal
  - Risk management (use of materials)
  - Partner agreements
  - Advocacy
• Governance
  - Budget, Finances
  - Decision-making
  - Policy
  - Planning
• Enterprise Management
  - Communication and coordination with partner institutions
  - Project management
• Repository Administration
  - Hardware configuration and maintenance
  - Web and application server configuration and maintenance
  - Security
  - Permissions
  - Logging
  - Data management (content storage, backup, integrity checks, deletion)
  - Hardware selection and replacement
  - Content and metadata specifications
  - Disaster recovery
  - Processes for ensuring content integrity
• Rights Management
  - Copyright determination
  - Copyright review
  - Copyright information management (database)
  - Rightsholder permissions
• Bibliographic Data Management
  - Entity description (record-level)
  - Object identification (item-level)
  - Data availability
• Collection Development
  - Digital
    • Expansion beyond books and journals (born-digital, images and maps, audio)
    • Selection of content (for non-Google volume ingest and pilot projects)
  - Print
    • Cloud Library (effect of digital on print)
• Financial contributions of partners

HathiTrust
Executive Committee: Budget, Finances, Decision-making
Strategic Advisory Board: Guidance on Policy, Planning

Distributed work
• Driven by needs of institutions
• Leverage across the partnership
• Projects: Print on Demand, Grant Work, Ingest Specifications, PageTurner, Bibliographic Data Management

Collective Work: Working Groups and Committees
• Operational: Communications, User Support, User Experience
• Strategic: Collections, Discovery Interface, Full-text Search

Constitutional Convention

• October 2011
• 52 partners
• 3-year review overseen by SAB
• Ballot proposals
  - Print monograph storage
  - Approval process for development initiatives
  - US Government Documents
  - Fee-for-service content deposit
  - Governance

HathiTrust
Executive Committee: Budget, Finances, Decision-making
Strategic Advisory Board: Guidance on Policy, Planning

• 12-member Board of Governors
• Chief Executive Officer
• Executive Committee

Governance

• Efficient, practical
• Inclusive, collective

Outline

• The Big Idea
  - Mission and Goals
• What we're doing to get there
  - Repository and Content
  - Making content available
  - Organizational structure
• How HathiTrust can change the way we work

How HathiTrust Can Change the Way We Work

Seeing collective problems as collective

[Chart: Breakdown of HathiTrust book corpus by publication date; values shown: 42%, 19%, 20%, 19%. Source: Bibliographic Indeterminacy and the Scale of Problems and Opportunities of Rights in Digital Collection Building (2/2011)]

[Chart: Copyright status of books published pre-1923 and US works published 1923-1963; values shown: 42%, 19%, 20%, 19%; the final build adds the label "In Print"]

Relationships

• Identification
• Description
• Rights
• Relationships
  - Bibliographic records
  - Bib records and objects
  - Digital objects
  - Digital and print

Understanding the relationship between the collective and local

1st model: Price per GB

Year          Total Volumes    Public Domain
2008              2,477,871          372,085
2009              5,221,092          758,947
2010              7,836,698        1,959,223
2011              9,966,572        2,712,626
2012 (Oct)       10,531,566        3,218,132
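The growth figures can be reduced to the public-domain share of the collection each year. This short Python calculation uses only the numbers transcribed from the chart. It shows the share rising from about 15% in 2008 to about 31% by October 2012, while total volumes grew roughly fourfold.

```python
# Figures transcribed from the slide's growth chart: (total volumes, public domain volumes).
volumes = {
    "2008": (2_477_871, 372_085),
    "2009": (5_221_092, 758_947),
    "2010": (7_836_698, 1_959_223),
    "2011": (9_966_572, 2_712_626),
    "2012 (Oct)": (10_531_566, 3_218_132),
}


def public_domain_share(total: int, pd: int) -> float:
    """Percentage of all volumes that are public domain."""
    return 100 * pd / total


for year, (total, pd) in volumes.items():
    print(f"{year}: {public_domain_share(total, pd):.1f}% public domain")
```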

[Chart: % of Titles in Local Collection (y-axis, 0-60) against Rank in 2008 ARL Investment Index (x-axis, 0-120)]

A global change in the library environment

June 2010: median duplication 31%

June 2009: median duplication 19%

Academic print book collection already substantially duplicated in mass digitized book corpus

Courtesy of Constance Malpas, OCLC Research

Digitized Books in Shared Repositories

[Chart: Unique Titles (0 to 3,500,000), Sep-09 to Jun-10; series: mass digitized books in Hathi digital repository; mass digitized books in shared print repositories; annotations: ~3.5M titles, ~2.5M]

~75% of the mass digitized corpus is 'backed up' in one or more shared print repositories

Courtesy of Constance Malpas, OCLC Research

Collection Overlap

• More than 50% median overlap with ARL institutions; higher for small liberal arts colleges
• New pricing model based on print holdings
  - http://www.hathitrust.org/cost
  - Requires print holdings database
  - Also supports expansion of legal uses and efforts in de-duplication
  - Facilitates individual and collaborative collection development and management operations
• Print monographs archiving
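A pricing model based on print holdings depends on measuring how much of each library's print collection HathiTrust already holds. The Python sketch below computes per-library overlap and the median across libraries. Matching on OCLC numbers is an assumption, and the holdings sets are invented for illustration, not real data.

```python
from statistics import median


def overlap_percent(local: set[str], hathitrust: set[str]) -> float:
    """Share of a library's print titles (keyed by OCLC number) that HathiTrust also holds."""
    if not local:
        return 0.0
    return 100 * len(local & hathitrust) / len(local)


# Hypothetical holdings; real matching would draw on a print holdings database.
hathitrust = {"101", "102", "103", "104", "105", "106"}
libraries = {
    "Library A": {"101", "102", "103", "900"},
    "Library B": {"101", "901", "902", "903"},
    "Library C": {"102", "104", "904", "905", "906"},
}
overlaps = {name: overlap_percent(held, hathitrust) for name, held in libraries.items()}
print(overlaps)
print(f"median overlap: {median(overlaps.values()):.1f}%")
```

The median, rather than the mean, matches the slide's framing and keeps one unusually large or small collection from skewing the summary.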

Sourcing and Scaling: http://orweblog.oclc.org/archives/002058.html

• Scale
  - Institution-scale
  - Group-scale
  - Web-scale
• Sourcing
  - Institutional
  - Collaborative
  - Third-party

A new kind of library

Thank you

How to find out more

• About: http://www.hathitrust.org/about
• Twitter: http://twitter.com/hathitrust
• Facebook: http://www.facebook.com/hathitrust
• Monthly newsletter
  - http://www.hathitrust.org/updates
  - RSS: http://www.hathitrust.org/updates_rss
• Contact us: feedback@issues.hathitrust.org
• Blogs: http://www.hathitrust.org/blogs
  - Large-scale Search
  - Perspectives from HathiTrust

  • Your Library Now Online Putting HathiTrust in the Context of
  • Outline
  • The Big Idea
  • Partnership
  • Digital Repository
  • The Name
  • Mission
  • HathiTrust
  • Collections and Collaboration
  • What we are doing to get there
  • Slide 11
  • Slide 12
  • Repository and Content
  • Content Sources
  • Language Distribution (1)
  • Language Distribution (2)
  • Dates
  • Copyright Distribution
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Making Content Available
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • APIs
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Datasets
  • Research Center
  • Slide 54
  • Slide 55
  • Access Determinations
  • Automatic Rights Determination
  • Manual Rights Determination
  • Rights Database
  • Lawful uses
  • Lawful uses (2)
  • Outline (2)
  • Slide 63
  • Slide 64
  • Constitutional Convention
  • Slide 66
  • Governance
  • Outline (3)
  • How HathiTrust Can Change the Way We Work
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Relationships
  • Relationships (2)
  • Relationships (3)
  • Relationships (4)
  • Relationships (5)
  • Slide 81
  • Slide 82
  • Slide 83
  • A global change in the library environment
  • Digitized Books in Shared Repositories
  • Collection Overlap
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Thank you
  • How to find out more
Page 26: HATHITRUST A Shared Digital Repository Your Library, Now Online! Putting HathiTrust in the Context of Traditional (and New) Library Services MCLS Webinar

Source

Bibliographic Data

Content Package

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

DatasetsMichigan

Indiana

We engage in preservationfor purposes of access

Source

Bibliographic Data

Content Package

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

DatasetsMichigan

Indiana

Making Content Available

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Descriptive headings added (hidden from GUI with CSS)

Info about SSD service amp link to accessibility page

Images used for style are in css so no need to use alt tags

Skip navigation link

Access keys for navigating pages with keyboard

Added labels amp descriptive titles to forms amp ToC table

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

APIs

bull Data APIndash Volume and rights informationndash Page imagesndash OCR

bull Bibliographic APIndash Volume and rights informationndash MARC records

bull OAIbull ldquoHathifilesrdquo

Datasets

bull Google-digitized ~28 million texts Requires proposal to HathiTrust Agreement with Google Statement on usemanagement

bull Non-Google-digitized ~370000 texts Freely available Statement on management

Research Center

bull Environment to perform research on HathiTrust corpus

bull httpwwwhathitrustorghtrc

bull httplibumichedumpachbull Package of tools to enable publication of open

access born-digital journal content directly into HathiTrustndash Including accompanying data and media files

bull Allows integration with popular journal publishing tools such as Open Journal Systems (OJS)

Source Archive

Editorial Market

Higher Education

Access Determinations

bull Automatedbull Manual

Automatic Rights Determination

bull Conducted on all works at time of ingest and when records are modifiedndash Public domain worldwide

bull US works published before 1923 US federal government publications non-US works published prior to 1873

ndash Public domain in the United Statesbull Non-US works published prior to 1923

Manual Rights Determination

bull IMLS-funded CRMS projectndash CRMS-US

bull 2008 US-published works 1923-1963bull Staff at 4 partner institutions

ndash CRMS-Worldbull 2011 Expanded to non-US worksbull Staff at 16 partner institutions

ndash Double review with additional expert review for conflictsndash Compliance with copyright formalitiesndash As of January 2013 241541 reviewed more than 132644

openedbull Rights Holder Permissions

bull System of Precedence

Rights Database

Bibliographic (automatic)

Manual

Lawful uses

bull Users who have print disabilitiesndash All in-copyright works in HathiTrust currently

owned (or owned previously) by the partner institution

ndash Must be authenticatedndash Must be on US soilndash One simultaneous access per copy ownedndash httpwwwhathitrustorgaccessibility

Lawful uses (2)

bull Out of print and brittle missingndash Works must be currently owned (or owned

previously) by the partner institutionndash Must be authenticated or accessing work from

library premisesndash Must be on US soilndash One simultaneous access per copy ownedndash httpwwwhathitrustorgout-of-print-brittle

bull Access and use statementsndash httpwwwhathitrustorgaccess_use

Outline

bull The Big Idea ndash Mission and Goals

bull What wersquore doing to get there ndash Repository and Content ndash Making content available ndash Organizational structure

bull How HathiTrust can change the way we work

e-Commerce

Print on Demand

Content Ingest

Transformation

Validation

Content Access

PageTurner

Collection Builder

Large-scale Search

Bibliographic Catalog

Research Center

APIs

Quality Assurance

Quality Review

Content Certification

User Services

Usability

User support (helpdesk)

Outreach

Project website

Monthly newsletter

Papers and presentations

Communication with potential

partners

Surveys general inquiries

Repository evaluation and

audit (eg DRAMBORA

TRAC)

Legal

Risk management (use of materials)

Partner agreements

Advocacy

Governance

Budget Finances

Decision-making

Policy

Planning

Enterprise Management

Communication and Coordination

with partner institutions

Project management

Repository Administration

Hardware configuration and

maintenance

Web and application server configuration and

maintenance

Security

Permissions

Logging

Repository Administration

Data management (content storage backup integrity checks deletion)

Hardware selection and replacement

Content and Metadata

specifications

Disaster Recovery

Processes for ensuring content

integrity

Rights Management

Copyright determination

Copyright review

Copyright information

management (database)

Rightsholder permissions

Bibliographic Data

Management

Entity description (record-level)

Object identification (item-

level)

Data availability

Collection Development

Digitalbull Expansion beyond

books and journals (born-digital images and maps audio)

bull Selection of content (for non-Google volume ingest and pilots projects)

Printbull Cloud Library (effect

of digital on print)

Financial contributions of partners

HathiTrust Functional Framework

HathiTrust

Strategic Advisory BoardBudgetFinances Decision-making

Guidance on Policy Planning

bull Driven by needs of institutionsbull Leverage across the partnershipbull Projects Print on Demand Grant Work Ingest Specifications

PageTurner Bibliographic Data Management

Executive Committee

Collective Work Working Groups and Committees

Operationalbull Communicationsbull User Supportbull User Experience

Operationalbull Communicationsbull User Supportbull User Experience

Strategicbull Collectionsbull Discovery Interfacebull Full-text Search

Distributed work

Constitutional Convention

bull October 2011bull 52 partnersbull 3-year review overseen by SABbull Ballot Proposals

ndash Print monograph storagendash Approval Process for development initiativesndash US Government Documentsndash Fee-for-service content depositndash Governance

HathiTrust

Executive Committee

Strategic Advisory

Board

BudgetFinancesDecision-making

Guidance on Policy Planning

bull 12-member Board of Governors

bull Chief Executive Officer bull Executive Committee

Governance

bull Efficient practicalbull Inclusive collective

Outline

bull The Big Idea ndash Mission and Goals

bull What wersquore doing to get there ndash Repository and Content ndash Making content available ndash Organizational structure

bull How HathiTrust can change the way we work

How HathiTrust Can Change the Way We Work

Seeing collective problems as collective

Breakdown of HathiTrust book corpus by publication date

Bibliographic Indeterminacy and the Scale of Problems and Opportunities of Rights in Digital Collection Building ndash 22011

42

19

20

19

Breakdown of HathiTrust book corpus by publication date

42

19

20

19

Copyright status of books published pre-1923 and US works published 1923-1963

42

19

20

Copyright status of books published pre-1923 and US works published 1923-1963

42

19

20

19

Copyright status of books published pre-1923 and US works published 1923-1963

In Print 42

19

20

19

bull Identificationbull Descriptionbull Rights

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic records

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic recordsndash Bib records and objects

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic recordsndash Bib records and objectsndash Digital objects

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic recordsndash Bib records and objectsndash Digital objectsndash Digital and print

Relationships

Understanding the relationship between the collective and local

1st model Price per GB

2008 2009 2010 2011 2012 (Oct)Total Volumes 2477871 5221092 7836698 9966572 10531566

Public Domain 372085 758947 1959223 2712626 3218132

2008 2009 2010 2011 2012 (Oct)0

2000000

4000000

6000000

8000000

10000000

12000000

Total VolumesPublic Domain

0 20 40 60 80 100 1200

10

20

30

40

50

60

Rank in 2008 ARL Investment Index

o

f Tit

les

in L

ocal

Col

lecti

on

A global change in the library environment

June 2010Median duplication 31

June 2009Median duplication 19

Academic print book collection already substantially duplicated in mass digitized book corpus

Courtesy of Constance Malpas OCLC Research

Digitized Books in Shared Repositories

Sep-09 Oct-09 Nov-09 Dec-09 Jan-10 Feb-10 Mar-10 Apr-10 May-10 Jun-100

500000

1000000

1500000

2000000

2500000

3000000

3500000

Mass digitized books in Hathi digital repository Mass digitized books in shared print repositories

Uni

que

Titl

es

~75 of mass digitized corpus is lsquobacked uprsquo in one or more shared print repositories

~35M titles

~25M

Courtesy of Constance Malpas OCLC Research

Collection Overlap

bull More than 50 median overlap with ARL institutions higher for small liberal arts colleges

bull New Pricing model based on Print holdingsndash httpwwwhathitrustorgcostndash Requires print holdings databasendash Also support expansion of legal uses efforts in de-

duplicationndash Facilitate individual and collaborative collection

development and management operationsbull Print monographs archiving

Sourcing and Scalinghttporweblogoclcorgarchives002058html

bull Scalendash Institution-scalendash Group-scalendash Web-scale

bull Sourcingndash Institutionalndash Collaborativendash Third-party

A new kind of library

Thank you

How to find out more

bull About httpwwwhathitrustorgaboutbull Twitter httptwittercomhathitrustbull Facebook httpwwwfacebookcomhathitrustbull Monthly newsletter

ndash httpwwwhathitrustorgupdatesndash RSS httpwwwhathitrustorgupdates_rss

bull Contact us feedbackissueshathitrustorgbull Blogs httpwwwhathitrustorgblogs

ndash Large-scale Searchndash Perspectives from HathiTrust

  • Your Library Now Online Putting HathiTrust in the Context of
  • Outline
  • The Big Idea
  • Partnership
  • Digital Repository
  • The Name
  • Mission
  • HathiTrust
  • Collections and Collaboration
  • What we are doing to get there
  • Slide 11
  • Slide 12
  • Repository and Content
  • Content Sources
  • Language Distribution (1)
  • Language Distribution (2)
  • Dates
  • Copyright Distribution
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Making Content Available
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • APIs
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Datasets
  • Research Center
  • Slide 54
  • Slide 55
  • Access Determinations
  • Automatic Rights Determination
  • Manual Rights Determination
  • Rights Database
  • Lawful uses
  • Lawful uses (2)
  • Outline (2)
  • Slide 63
  • Slide 64
  • Constitutional Convention
  • Slide 66
  • Governance
  • Outline (3)
  • How HathiTrust Can Change the Way We Work
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Relationships
  • Relationships (2)
  • Relationships (3)
  • Relationships (4)
  • Relationships (5)
  • Slide 81
  • Slide 82
  • Slide 83
  • A global change in the library environment
  • Digitized Books in Shared Repositories
  • Collection Overlap
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Thank you
  • How to find out more
Page 27: HATHITRUST A Shared Digital Repository Your Library, Now Online! Putting HathiTrust in the Context of Traditional (and New) Library Services MCLS Webinar

We engage in preservationfor purposes of access

Source

Bibliographic Data

Content Package

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

DatasetsMichigan

Indiana

Making Content Available

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Descriptive headings added (hidden from GUI with CSS)

Info about SSD service amp link to accessibility page

Images used for style are in css so no need to use alt tags

Skip navigation link

Access keys for navigating pages with keyboard

Added labels amp descriptive titles to forms amp ToC table

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

APIs

bull Data APIndash Volume and rights informationndash Page imagesndash OCR

bull Bibliographic APIndash Volume and rights informationndash MARC records

bull OAIbull ldquoHathifilesrdquo

Datasets

bull Google-digitized ~28 million texts Requires proposal to HathiTrust Agreement with Google Statement on usemanagement

bull Non-Google-digitized ~370000 texts Freely available Statement on management

Research Center

bull Environment to perform research on HathiTrust corpus

bull httpwwwhathitrustorghtrc

bull httplibumichedumpachbull Package of tools to enable publication of open

access born-digital journal content directly into HathiTrustndash Including accompanying data and media files

bull Allows integration with popular journal publishing tools such as Open Journal Systems (OJS)

Source Archive

Editorial Market

Higher Education

Access Determinations

bull Automatedbull Manual

Automatic Rights Determination

bull Conducted on all works at time of ingest and when records are modifiedndash Public domain worldwide

bull US works published before 1923 US federal government publications non-US works published prior to 1873

ndash Public domain in the United Statesbull Non-US works published prior to 1923

Manual Rights Determination

bull IMLS-funded CRMS projectndash CRMS-US

bull 2008 US-published works 1923-1963bull Staff at 4 partner institutions

ndash CRMS-Worldbull 2011 Expanded to non-US worksbull Staff at 16 partner institutions

ndash Double review with additional expert review for conflictsndash Compliance with copyright formalitiesndash As of January 2013 241541 reviewed more than 132644

openedbull Rights Holder Permissions

bull System of Precedence

Rights Database

Bibliographic (automatic)

Manual

Lawful uses

bull Users who have print disabilitiesndash All in-copyright works in HathiTrust currently

owned (or owned previously) by the partner institution

ndash Must be authenticatedndash Must be on US soilndash One simultaneous access per copy ownedndash httpwwwhathitrustorgaccessibility

Lawful uses (2)

• Out of print and brittle/missing
  – Works must be currently owned (or owned previously) by the partner institution
  – Must be authenticated or accessing work from library premises
  – Must be on US soil
  – One simultaneous access per copy owned
  – http://www.hathitrust.org/out-of-print-brittle

• Access and use statements
  – http://www.hathitrust.org/access_use
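Both lawful-use services reduce to a few checks per request: who the user is, where they are, and whether a copy is free. The sketch below encodes the conditions listed on these two slides; `Request`, `may_open`, and the use labels are hypothetical, and a real service would also need authentication and geolocation infrastructure that the slides do not describe.

```python
from dataclasses import dataclass


@dataclass
class Request:
    authenticated: bool        # signed in through the partner institution
    on_library_premises: bool  # connecting from inside the library
    in_us: bool                # located on US soil


def may_open(use: str, req: Request, copies_owned: int, active_sessions: int) -> bool:
    """Check one request against the conditions listed for each lawful use.

    copies_owned counts copies the partner currently owns or previously owned;
    active_sessions is how many of those are already in use.
    """
    if use == "print_disability":
        identity_ok = req.authenticated
    elif use == "out_of_print_brittle":
        identity_ok = req.authenticated or req.on_library_premises
    else:
        raise ValueError(f"unknown use: {use!r}")
    if not (identity_ok and req.in_us) or copies_owned < 1:
        return False
    # One simultaneous access per copy owned.
    return active_sessions < copies_owned


walk_in = Request(authenticated=False, on_library_premises=True, in_us=True)
print(may_open("out_of_print_brittle", walk_in, copies_owned=2, active_sessions=1))  # True
print(may_open("print_disability", walk_in, copies_owned=2, active_sessions=1))      # False
```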

Outline

• The Big Idea
  – Mission and Goals

• What we're doing to get there
  – Repository and Content
  – Making content available
  – Organizational structure

• How HathiTrust can change the way we work

e-Commerce

Print on Demand

Content Ingest

Transformation

Validation

Content Access

PageTurner

Collection Builder

Large-scale Search

Bibliographic Catalog

Research Center

APIs

Quality Assurance

Quality Review

Content Certification

User Services

Usability

User support (helpdesk)

Outreach

Project website

Monthly newsletter

Papers and presentations

Communication with potential partners

Surveys, general inquiries

Repository evaluation and audit (e.g., DRAMBORA, TRAC)

Legal

Risk management (use of materials)

Partner agreements

Advocacy

Governance

Budget/Finances

Decision-making

Policy

Planning

Enterprise Management

Communication and Coordination with partner institutions

Project management

Repository Administration

Hardware configuration and maintenance

Web and application server configuration and maintenance

Security

Permissions

Logging

Repository Administration

Data management (content storage, backup, integrity checks, deletion)

Hardware selection and replacement

Content and Metadata specifications

Disaster Recovery

Processes for ensuring content integrity

Rights Management

Copyright determination

Copyright review

Copyright information management (database)

Rightsholder permissions

Bibliographic Data Management

Entity description (record-level)

Object identification (item-level)

Data availability

Collection Development

Digital
• Expansion beyond books and journals (born-digital, images and maps, audio)
• Selection of content (for non-Google volume ingest and pilot projects)

Print
• Cloud Library (effect of digital on print)

Financial contributions of partners

HathiTrust Functional Framework

HathiTrust

Strategic Advisory Board

Budget/Finances, Decision-making

Guidance on Policy Planning

• Driven by needs of institutions
• Leverage across the partnership
• Projects: Print on Demand, Grant Work, Ingest Specifications, PageTurner, Bibliographic Data Management

Executive Committee

Collective Work Working Groups and Committees

Operational
• Communications
• User Support
• User Experience

Strategic
• Collections
• Discovery Interface
• Full-text Search

Distributed work

Constitutional Convention

• October 2011
• 52 partners
• 3-year review overseen by SAB
• Ballot Proposals
  – Print monograph storage
  – Approval Process for development initiatives
  – US Government Documents
  – Fee-for-service content deposit
  – Governance

HathiTrust

Executive Committee

Strategic Advisory Board

Budget/Finances, Decision-making

Guidance on Policy Planning

• 12-member Board of Governors

• Chief Executive Officer
• Executive Committee

Governance

• Efficient, practical
• Inclusive, collective

Outline

• The Big Idea
  – Mission and Goals

• What we're doing to get there
  – Repository and Content
  – Making content available
  – Organizational structure

• How HathiTrust can change the way we work

How HathiTrust Can Change the Way We Work

Seeing collective problems as collective

Breakdown of HathiTrust book corpus by publication date

Source: Bibliographic Indeterminacy and the Scale of Problems and Opportunities of Rights in Digital Collection Building – 2/2011

[Chart: four segments of 42%, 19%, 20%, and 19%]

Copyright status of books published pre-1923 and US works published 1923-1963

[Chart: same segments (42%, 19%, 20%, 19%); the final build adds an "In Print" label beside the 42% figure]

Relationships

• Identification
• Description
• Rights
• Relationships
  – Bibliographic records
  – Bib records and objects
  – Digital objects
  – Digital and print

Understanding the relationship between the collective and local

1st model: Price per GB

Total volumes and public domain volumes, 2008-2012 (Oct)

Year          Total Volumes   Public Domain
2008              2,477,871         372,085
2009              5,221,092         758,947
2010              7,836,698       1,959,223
2011              9,966,572       2,712,626
2012 (Oct)       10,531,566       3,218,132
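The volume counts on this slide support a quick calculation of how the public-domain share of the collection changed over the period. The Python below only restates the slide's figures; `VOLUMES` and `public_domain_share` are illustrative names.

```python
# Volume counts transcribed from the slide: year -> (total, public domain).
VOLUMES = {
    "2008": (2_477_871, 372_085),
    "2009": (5_221_092, 758_947),
    "2010": (7_836_698, 1_959_223),
    "2011": (9_966_572, 2_712_626),
    "2012 (Oct)": (10_531_566, 3_218_132),
}


def public_domain_share(total: int, public_domain: int) -> float:
    """Fraction of all volumes that are open as public domain."""
    return public_domain / total


for year, (total, pd) in VOLUMES.items():
    print(f"{year:<11} {public_domain_share(total, pd):6.1%}")
```

Run as written, this prints shares of 15.0%, 14.5%, 25.0%, 27.2%, and 30.6%: the collection grew roughly fourfold while the public-domain share roughly doubled.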

[Chart: % of titles in local collection (y-axis, 0 to 60) vs. rank in 2008 ARL Investment Index (x-axis, 0 to 120)]

A global change in the library environment

June 2010: Median duplication 31%

June 2009: Median duplication 19%

Academic print book collection already substantially duplicated in mass digitized book corpus

Courtesy of Constance Malpas, OCLC Research

Digitized Books in Shared Repositories

[Chart: unique titles (0 to 3,500,000), Sep-09 to Jun-10; series: mass digitized books in Hathi digital repository, mass digitized books in shared print repositories]

~75% of mass digitized corpus is 'backed up' in one or more shared print repositories

~3.5M titles

~2.5M

Courtesy of Constance Malpas, OCLC Research

Collection Overlap

• More than 50% median overlap with ARL institutions; higher for small liberal arts colleges

• New pricing model based on print holdings
  – http://www.hathitrust.org/cost
  – Requires print holdings database
  – Also supports expansion of legal uses, efforts in de-duplication
  – Facilitates individual and collaborative collection development and management operations
• Print monographs archiving
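Median overlap figures like these come from comparing each institution's print holdings against the digitized corpus, which is also why a print holdings database is needed. A minimal sketch, assuming both sides have already been reduced to sets of matched title identifiers (for example OCLC numbers); matching records across catalogs is the hard part in practice and is not shown. `overlap_rate`, `median_overlap`, and the toy data are hypothetical.

```python
from statistics import median
from typing import Dict, Set


def overlap_rate(local_titles: Set[str], corpus_titles: Set[str]) -> float:
    """Share of one library's print titles that also appear in the corpus."""
    if not local_titles:
        return 0.0
    return len(local_titles & corpus_titles) / len(local_titles)


def median_overlap(holdings: Dict[str, Set[str]], corpus_titles: Set[str]) -> float:
    """Median of the per-institution overlap rates."""
    return median(overlap_rate(titles, corpus_titles) for titles in holdings.values())


# Toy data: identifiers stand in for matched title keys (e.g. OCLC numbers).
corpus = {"t1", "t2", "t3", "t4"}
holdings = {
    "library_a": {"t1", "t2", "x1", "x2"},   # 2 of 4 titles overlap
    "library_b": {"t1", "t2", "t3", "x3"},   # 3 of 4
    "library_c": {"x4", "x5"},               # none
}
print(median_overlap(holdings, corpus))  # 0.5
```

The median is used rather than the mean so that a few very large or very small collections do not dominate the summary figure.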

Sourcing and Scaling (http://orweblog.oclc.org/archives/002058.html)

• Scale
  – Institution-scale
  – Group-scale
  – Web-scale

• Sourcing
  – Institutional
  – Collaborative
  – Third-party

A new kind of library

Thank you

How to find out more

• About: http://www.hathitrust.org/about
• Twitter: http://twitter.com/hathitrust
• Facebook: http://www.facebook.com/hathitrust
• Monthly newsletter
  – http://www.hathitrust.org/updates
  – RSS: http://www.hathitrust.org/updates_rss

• Contact us: feedback@issues.hathitrust.org
• Blogs: http://www.hathitrust.org/blogs
  – Large-scale Search
  – Perspectives from HathiTrust


Source

Bibliographic Data

Content Package

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

Datasets

Michigan

Indiana

Making Content Available

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets


Descriptive headings added (hidden from GUI with CSS)

Info about SSD service & link to accessibility page

Images used for style are in CSS, so alt tags are not needed

Skip navigation link

Access keys for navigating pages with keyboard

Added labels & descriptive titles to forms & ToC table


APIs

• Data API
  – Volume and rights information
  – Page images
  – OCR

• Bibliographic API
  – Volume and rights information
  – MARC records

• OAI
• "Hathifiles"
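As an illustration of the Bibliographic API, the sketch below builds request URLs following the brief/full JSON pattern HathiTrust has documented for that API (`/api/volumes/{brief|full}/{id type}/{id}.json`). Treat the base URL, the identifier types, and the `items` key in the response as assumptions to check against the current API documentation; `bib_api_url` and `fetch_items` are hypothetical helper names.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumed base URL and identifier types; verify against current API docs.
BASE_URL = "https://catalog.hathitrust.org/api/volumes"
ID_TYPES = {"oclc", "lccn", "issn", "isbn", "htid", "recordnumber"}


def bib_api_url(id_type: str, id_value: str, full: bool = False) -> str:
    """Build a single-identifier request URL (brief or full record)."""
    if id_type not in ID_TYPES:
        raise ValueError(f"unsupported identifier type: {id_type!r}")
    level = "full" if full else "brief"
    return f"{BASE_URL}/{level}/{id_type}/{quote(id_value, safe='')}.json"


def fetch_items(url: str) -> list:
    """Fetch a response and return its item list (assumed 'items' key)."""
    with urlopen(url, timeout=30) as resp:
        data = json.load(resp)
    return data.get("items", [])


print(bib_api_url("oclc", "424023"))
# https://catalog.hathitrust.org/api/volumes/brief/oclc/424023.json
```

`fetch_items` is not called above because it needs network access; the fields on each returned item (rights, item URL, and so on) should be read from a live response rather than assumed from this sketch.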

Datasets

bull Google-digitized ~28 million texts Requires proposal to HathiTrust Agreement with Google Statement on usemanagement

bull Non-Google-digitized ~370000 texts Freely available Statement on management

Research Center

bull Environment to perform research on HathiTrust corpus

bull httpwwwhathitrustorghtrc

bull httplibumichedumpachbull Package of tools to enable publication of open

access born-digital journal content directly into HathiTrustndash Including accompanying data and media files

bull Allows integration with popular journal publishing tools such as Open Journal Systems (OJS)

Source Archive

Editorial Market

Higher Education

Access Determinations

bull Automatedbull Manual

Automatic Rights Determination

bull Conducted on all works at time of ingest and when records are modifiedndash Public domain worldwide

bull US works published before 1923 US federal government publications non-US works published prior to 1873

ndash Public domain in the United Statesbull Non-US works published prior to 1923

Manual Rights Determination

bull IMLS-funded CRMS projectndash CRMS-US

bull 2008 US-published works 1923-1963bull Staff at 4 partner institutions

ndash CRMS-Worldbull 2011 Expanded to non-US worksbull Staff at 16 partner institutions

ndash Double review with additional expert review for conflictsndash Compliance with copyright formalitiesndash As of January 2013 241541 reviewed more than 132644

openedbull Rights Holder Permissions

bull System of Precedence

Rights Database

Bibliographic (automatic)

Manual

Lawful uses

bull Users who have print disabilitiesndash All in-copyright works in HathiTrust currently

owned (or owned previously) by the partner institution

ndash Must be authenticatedndash Must be on US soilndash One simultaneous access per copy ownedndash httpwwwhathitrustorgaccessibility

Lawful uses (2)

bull Out of print and brittle missingndash Works must be currently owned (or owned

previously) by the partner institutionndash Must be authenticated or accessing work from

library premisesndash Must be on US soilndash One simultaneous access per copy ownedndash httpwwwhathitrustorgout-of-print-brittle

bull Access and use statementsndash httpwwwhathitrustorgaccess_use

Outline

bull The Big Idea ndash Mission and Goals

bull What wersquore doing to get there ndash Repository and Content ndash Making content available ndash Organizational structure

bull How HathiTrust can change the way we work

e-Commerce

Print on Demand

Content Ingest

Transformation

Validation

Content Access

PageTurner

Collection Builder

Large-scale Search

Bibliographic Catalog

Research Center

APIs

Quality Assurance

Quality Review

Content Certification

User Services

Usability

User support (helpdesk)

Outreach

Project website

Monthly newsletter

Papers and presentations

Communication with potential

partners

Surveys general inquiries

Repository evaluation and

audit (eg DRAMBORA

TRAC)

Legal

Risk management (use of materials)

Partner agreements

Advocacy

Governance

Budget Finances

Decision-making

Policy

Planning

Enterprise Management

Communication and Coordination

with partner institutions

Project management

Repository Administration

Hardware configuration and

maintenance

Web and application server configuration and

maintenance

Security

Permissions

Logging

Repository Administration

Data management (content storage backup integrity checks deletion)

Hardware selection and replacement

Content and Metadata

specifications

Disaster Recovery

Processes for ensuring content

integrity

Rights Management

Copyright determination

Copyright review

Copyright information

management (database)

Rightsholder permissions

Bibliographic Data

Management

Entity description (record-level)

Object identification (item-

level)

Data availability

Collection Development

Digitalbull Expansion beyond

books and journals (born-digital images and maps audio)

bull Selection of content (for non-Google volume ingest and pilots projects)

Printbull Cloud Library (effect

of digital on print)

Financial contributions of partners

HathiTrust Functional Framework

HathiTrust

Strategic Advisory BoardBudgetFinances Decision-making

Guidance on Policy Planning

bull Driven by needs of institutionsbull Leverage across the partnershipbull Projects Print on Demand Grant Work Ingest Specifications

PageTurner Bibliographic Data Management

Executive Committee

Collective Work Working Groups and Committees

Operationalbull Communicationsbull User Supportbull User Experience

Operationalbull Communicationsbull User Supportbull User Experience

Strategicbull Collectionsbull Discovery Interfacebull Full-text Search

Distributed work

Constitutional Convention

bull October 2011bull 52 partnersbull 3-year review overseen by SABbull Ballot Proposals

ndash Print monograph storagendash Approval Process for development initiativesndash US Government Documentsndash Fee-for-service content depositndash Governance

HathiTrust

Executive Committee

Strategic Advisory

Board

BudgetFinancesDecision-making

Guidance on Policy Planning

bull 12-member Board of Governors

bull Chief Executive Officer bull Executive Committee

Governance

bull Efficient practicalbull Inclusive collective

Outline

bull The Big Idea ndash Mission and Goals

bull What wersquore doing to get there ndash Repository and Content ndash Making content available ndash Organizational structure

bull How HathiTrust can change the way we work

How HathiTrust Can Change the Way We Work

Seeing collective problems as collective

Breakdown of HathiTrust book corpus by publication date

Bibliographic Indeterminacy and the Scale of Problems and Opportunities of Rights in Digital Collection Building ndash 22011

42

19

20

19

Breakdown of HathiTrust book corpus by publication date

42

19

20

19

Copyright status of books published pre-1923 and US works published 1923-1963

42

19

20

Copyright status of books published pre-1923 and US works published 1923-1963

42

19

20

19

Copyright status of books published pre-1923 and US works published 1923-1963

In Print 42

19

20

19

bull Identificationbull Descriptionbull Rights

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic records

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic recordsndash Bib records and objects

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic recordsndash Bib records and objectsndash Digital objects

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic recordsndash Bib records and objectsndash Digital objectsndash Digital and print

Relationships

Understanding the relationship between the collective and local

1st model Price per GB

2008 2009 2010 2011 2012 (Oct)Total Volumes 2477871 5221092 7836698 9966572 10531566

Public Domain 372085 758947 1959223 2712626 3218132

2008 2009 2010 2011 2012 (Oct)0

2000000

4000000

6000000

8000000

10000000

12000000

Total VolumesPublic Domain

0 20 40 60 80 100 1200

10

20

30

40

50

60

Rank in 2008 ARL Investment Index

o

f Tit

les

in L

ocal

Col

lecti

on

A global change in the library environment

June 2010Median duplication 31

June 2009Median duplication 19

Academic print book collection already substantially duplicated in mass digitized book corpus

Courtesy of Constance Malpas OCLC Research

Digitized Books in Shared Repositories

Sep-09 Oct-09 Nov-09 Dec-09 Jan-10 Feb-10 Mar-10 Apr-10 May-10 Jun-100

500000

1000000

1500000

2000000

2500000

3000000

3500000

Mass digitized books in Hathi digital repository Mass digitized books in shared print repositories

Uni

que

Titl

es

~75 of mass digitized corpus is lsquobacked uprsquo in one or more shared print repositories

~35M titles

~25M

Courtesy of Constance Malpas OCLC Research

Collection Overlap

bull More than 50 median overlap with ARL institutions higher for small liberal arts colleges

bull New Pricing model based on Print holdingsndash httpwwwhathitrustorgcostndash Requires print holdings databasendash Also support expansion of legal uses efforts in de-

duplicationndash Facilitate individual and collaborative collection

development and management operationsbull Print monographs archiving

Sourcing and Scalinghttporweblogoclcorgarchives002058html

bull Scalendash Institution-scalendash Group-scalendash Web-scale

bull Sourcingndash Institutionalndash Collaborativendash Third-party

A new kind of library

Thank you

How to find out more

bull About httpwwwhathitrustorgaboutbull Twitter httptwittercomhathitrustbull Facebook httpwwwfacebookcomhathitrustbull Monthly newsletter

ndash httpwwwhathitrustorgupdatesndash RSS httpwwwhathitrustorgupdates_rss

bull Contact us feedbackissueshathitrustorgbull Blogs httpwwwhathitrustorgblogs

ndash Large-scale Searchndash Perspectives from HathiTrust

  • Your Library Now Online Putting HathiTrust in the Context of
  • Outline
  • The Big Idea
  • Partnership
  • Digital Repository
  • The Name
  • Mission
  • HathiTrust
  • Collections and Collaboration
  • What we are doing to get there
  • Slide 11
  • Slide 12
  • Repository and Content
  • Content Sources
  • Language Distribution (1)
  • Language Distribution (2)
  • Dates
  • Copyright Distribution
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Making Content Available
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • APIs
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Datasets
  • Research Center
  • Slide 54
  • Slide 55
  • Access Determinations
  • Automatic Rights Determination
  • Manual Rights Determination
  • Rights Database
  • Lawful uses
  • Lawful uses (2)
  • Outline (2)
  • Slide 63
  • Slide 64
  • Constitutional Convention
  • Slide 66
  • Governance
  • Outline (3)
  • How HathiTrust Can Change the Way We Work
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Relationships
  • Relationships (2)
  • Relationships (3)
  • Relationships (4)
  • Relationships (5)
  • Slide 81
  • Slide 82
  • Slide 83
  • A global change in the library environment
  • Digitized Books in Shared Repositories
  • Collection Overlap
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Thank you
  • How to find out more
Page 29: HATHITRUST A Shared Digital Repository Your Library, Now Online! Putting HathiTrust in the Context of Traditional (and New) Library Services MCLS Webinar

Making Content Available

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Descriptive headings added (hidden from GUI with CSS)

Info about SSD service amp link to accessibility page

Images used for style are in css so no need to use alt tags

Skip navigation link

Access keys for navigating pages with keyboard

Added labels amp descriptive titles to forms amp ToC table

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

APIs

bull Data APIndash Volume and rights informationndash Page imagesndash OCR

bull Bibliographic APIndash Volume and rights informationndash MARC records

bull OAIbull ldquoHathifilesrdquo

Datasets

bull Google-digitized ~28 million texts Requires proposal to HathiTrust Agreement with Google Statement on usemanagement

bull Non-Google-digitized ~370000 texts Freely available Statement on management

Research Center

bull Environment to perform research on HathiTrust corpus

bull httpwwwhathitrustorghtrc

bull httplibumichedumpachbull Package of tools to enable publication of open

access born-digital journal content directly into HathiTrustndash Including accompanying data and media files

bull Allows integration with popular journal publishing tools such as Open Journal Systems (OJS)

Source Archive

Editorial Market

Higher Education

Access Determinations

bull Automatedbull Manual

Automatic Rights Determination

bull Conducted on all works at time of ingest and when records are modifiedndash Public domain worldwide

bull US works published before 1923 US federal government publications non-US works published prior to 1873

ndash Public domain in the United Statesbull Non-US works published prior to 1923

Manual Rights Determination

bull IMLS-funded CRMS projectndash CRMS-US

bull 2008 US-published works 1923-1963bull Staff at 4 partner institutions

ndash CRMS-Worldbull 2011 Expanded to non-US worksbull Staff at 16 partner institutions

ndash Double review with additional expert review for conflictsndash Compliance with copyright formalitiesndash As of January 2013 241541 reviewed more than 132644

openedbull Rights Holder Permissions

bull System of Precedence

Rights Database

Bibliographic (automatic)

Manual

Lawful uses

bull Users who have print disabilitiesndash All in-copyright works in HathiTrust currently

owned (or owned previously) by the partner institution

ndash Must be authenticatedndash Must be on US soilndash One simultaneous access per copy ownedndash httpwwwhathitrustorgaccessibility

Lawful uses (2)

bull Out of print and brittle missingndash Works must be currently owned (or owned

previously) by the partner institutionndash Must be authenticated or accessing work from

library premisesndash Must be on US soilndash One simultaneous access per copy ownedndash httpwwwhathitrustorgout-of-print-brittle

bull Access and use statementsndash httpwwwhathitrustorgaccess_use

Outline

bull The Big Idea ndash Mission and Goals

bull What wersquore doing to get there ndash Repository and Content ndash Making content available ndash Organizational structure

bull How HathiTrust can change the way we work

e-Commerce

Print on Demand

Content Ingest

Transformation

Validation

Content Access

PageTurner

Collection Builder

Large-scale Search

Bibliographic Catalog

Research Center

APIs

Quality Assurance

Quality Review

Content Certification

User Services

Usability

User support (helpdesk)

Outreach

Project website

Monthly newsletter

Papers and presentations

Communication with potential

partners

Surveys general inquiries

Repository evaluation and

audit (eg DRAMBORA

TRAC)

Legal

Risk management (use of materials)

Partner agreements

Advocacy

Governance

Budget Finances

Decision-making

Policy

Planning

Enterprise Management

Communication and Coordination

with partner institutions

Project management

Repository Administration

Hardware configuration and

maintenance

Web and application server configuration and

maintenance

Security

Permissions

Logging

Repository Administration

Data management (content storage backup integrity checks deletion)

Hardware selection and replacement

Content and Metadata

specifications

Disaster Recovery

Processes for ensuring content

integrity

Rights Management

Copyright determination

Copyright review

Copyright information

management (database)

Rightsholder permissions

Bibliographic Data

Management

Entity description (record-level)

Object identification (item-

level)

Data availability

Collection Development

Digitalbull Expansion beyond

books and journals (born-digital images and maps audio)

bull Selection of content (for non-Google volume ingest and pilots projects)

Printbull Cloud Library (effect

of digital on print)

Financial contributions of partners

HathiTrust Functional Framework

HathiTrust

Strategic Advisory BoardBudgetFinances Decision-making

Guidance on Policy Planning

bull Driven by needs of institutionsbull Leverage across the partnershipbull Projects Print on Demand Grant Work Ingest Specifications

PageTurner Bibliographic Data Management

Executive Committee

Collective Work Working Groups and Committees

Operationalbull Communicationsbull User Supportbull User Experience

Operationalbull Communicationsbull User Supportbull User Experience

Strategicbull Collectionsbull Discovery Interfacebull Full-text Search

Distributed work

Constitutional Convention

bull October 2011bull 52 partnersbull 3-year review overseen by SABbull Ballot Proposals

ndash Print monograph storagendash Approval Process for development initiativesndash US Government Documentsndash Fee-for-service content depositndash Governance

HathiTrust

Executive Committee

Strategic Advisory

Board

BudgetFinancesDecision-making

Guidance on Policy Planning

bull 12-member Board of Governors

bull Chief Executive Officer bull Executive Committee

Governance

bull Efficient practicalbull Inclusive collective

Outline

bull The Big Idea ndash Mission and Goals

bull What wersquore doing to get there ndash Repository and Content ndash Making content available ndash Organizational structure

bull How HathiTrust can change the way we work

How HathiTrust Can Change the Way We Work

Seeing collective problems as collective

Breakdown of HathiTrust book corpus by publication date

Bibliographic Indeterminacy and the Scale of Problems and Opportunities of Rights in Digital Collection Building ndash 22011

42

19

20

19

Breakdown of HathiTrust book corpus by publication date

42

19

20

19

Copyright status of books published pre-1923 and US works published 1923-1963

42

19

20

Copyright status of books published pre-1923 and US works published 1923-1963

42

19

20

19

Copyright status of books published pre-1923 and US works published 1923-1963

In Print 42

19

20

19

bull Identificationbull Descriptionbull Rights

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic records

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic recordsndash Bib records and objects

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic recordsndash Bib records and objectsndash Digital objects

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic recordsndash Bib records and objectsndash Digital objectsndash Digital and print

Relationships

Understanding the relationship between the collective and local

1st model Price per GB

2008 2009 2010 2011 2012 (Oct)Total Volumes 2477871 5221092 7836698 9966572 10531566

Public Domain 372085 758947 1959223 2712626 3218132

2008 2009 2010 2011 2012 (Oct)0

2000000

4000000

6000000

8000000

10000000

12000000

Total VolumesPublic Domain

0 20 40 60 80 100 1200

10

20

30

40

50

60

Rank in 2008 ARL Investment Index

o

f Tit

les

in L

ocal

Col

lecti

on

A global change in the library environment

June 2010Median duplication 31

June 2009Median duplication 19

Academic print book collection already substantially duplicated in mass digitized book corpus

Courtesy of Constance Malpas OCLC Research

Digitized Books in Shared Repositories

Sep-09 Oct-09 Nov-09 Dec-09 Jan-10 Feb-10 Mar-10 Apr-10 May-10 Jun-100

500000

1000000

1500000

2000000

2500000

3000000

3500000

Mass digitized books in Hathi digital repository Mass digitized books in shared print repositories

Uni

que

Titl

es

~75 of mass digitized corpus is lsquobacked uprsquo in one or more shared print repositories

~35M titles

~25M

Courtesy of Constance Malpas OCLC Research

Collection Overlap

bull More than 50 median overlap with ARL institutions higher for small liberal arts colleges

bull New Pricing model based on Print holdingsndash httpwwwhathitrustorgcostndash Requires print holdings databasendash Also support expansion of legal uses efforts in de-

duplicationndash Facilitate individual and collaborative collection

development and management operationsbull Print monographs archiving

Sourcing and Scaling (http://orweblog.oclc.org/archives/002058.html)

• Scale
  – Institution-scale
  – Group-scale
  – Web-scale

• Sourcing
  – Institutional
  – Collaborative
  – Third-party

A new kind of library

Thank you

How to find out more

• About: http://www.hathitrust.org/about
• Twitter: http://twitter.com/hathitrust
• Facebook: http://www.facebook.com/hathitrust
• Monthly newsletter
  – http://www.hathitrust.org/updates
  – RSS: http://www.hathitrust.org/updates_rss
• Contact us: feedback@issues.hathitrust.org
• Blogs: http://www.hathitrust.org/blogs
  – Large-scale Search
  – Perspectives from HathiTrust


Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets


Descriptive headings added (hidden from GUI with CSS)

Info about SSD service & link to accessibility page

Images used for style are in CSS, so no need to use alt tags

Skip navigation link

Access keys for navigating pages with keyboard

Added labels & descriptive titles to forms & ToC table


APIs

• Data API
  – Volume and rights information
  – Page images
  – OCR
• Bibliographic API
  – Volume and rights information
  – MARC records
• OAI
• “Hathifiles”
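Here is a minimal Python sketch of working with the Bibliographic API listed on the slide. It makes two assumptions:

* The request URL follows the pattern `https://catalog.hathitrust.org/api/volumes/{brief|full}/{id type}/{id}.json`, where the "full" form includes MARC.
* Each entry in the response's `items` list carries `htid` and `rightsCode` fields.

Check both against the current API documentation. The sample response is invented. The sketch parses it offline rather than making a network request; with network access, the URL could be fetched with `urllib.request.urlopen`.

```python
import json
from urllib.parse import quote

# Assumed base URL of the HathiTrust Bibliographic API.
BASE_URL = "https://catalog.hathitrust.org/api/volumes"


def bib_api_url(id_type: str, identifier: str, full: bool = False) -> str:
    """Build a lookup URL; 'full' is assumed to add MARC records to the response."""
    level = "full" if full else "brief"
    return f"{BASE_URL}/{level}/{quote(id_type)}/{quote(identifier)}.json"


def items_with_rights(response_text: str) -> list[tuple[str, str]]:
    """Extract (volume id, rights code) pairs from a JSON response."""
    data = json.loads(response_text)
    return [(item["htid"], item["rightsCode"]) for item in data.get("items", [])]


# Invented sample response, shaped like the assumed JSON format.
SAMPLE_RESPONSE = """{
  "records": {"000000001": {"titles": ["Example title"], "oclcs": ["424023"]}},
  "items": [
    {"htid": "mdp.00000000000001", "rightsCode": "pd"},
    {"htid": "mdp.00000000000002", "rightsCode": "ic"}
  ]
}"""

print(bib_api_url("oclc", "424023"))
print(items_with_rights(SAMPLE_RESPONSE))
```

Separating URL construction from parsing keeps the sketch testable without network access. It also makes it easy to swap in the "full" variant when MARC records are needed.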

Datasets

• Google-digitized: ~2.8 million texts
  – Requires proposal to HathiTrust
  – Agreement with Google
  – Statement on use/management
• Non-Google-digitized: ~370,000 texts
  – Freely available
  – Statement on management

Research Center

• Environment to perform research on HathiTrust corpus
• http://www.hathitrust.org/htrc

• http://lib.umich.edu/mpach
• Package of tools to enable publication of open access born-digital journal content directly into HathiTrust
  – Including accompanying data and media files
• Allows integration with popular journal publishing tools such as Open Journal Systems (OJS)

Source Archive

Editorial Market

Higher Education

Access Determinations

• Automated
• Manual

Automatic Rights Determination

• Conducted on all works at time of ingest and when records are modified
  – Public domain worldwide
    • US works published before 1923
    • US federal government publications
    • Non-US works published prior to 1873
  – Public domain in the United States
    • Non-US works published prior to 1923
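The automatic rules amount to a small decision procedure over publication year, place of publication and government-document status. Here is a minimal Python sketch that encodes only the rules listed on the slide. It uses the rights codes `pd` (public domain worldwide), `pdus` (public domain in the US) and `ic` (in copyright). Returning `ic` when no rule applies is an assumption: in practice such works may be queued for manual review instead.

```python
# Cutoff years as given on the slide; kept as constants because they change over time.
US_PD_BEFORE = 1923
NON_US_WORLDWIDE_PD_BEFORE = 1873


def automatic_rights(pub_year: int, us_published: bool, us_federal_doc: bool = False) -> str:
    """Return a rights code using only the automatic rules listed on the slide.

    "pd"   = public domain worldwide
    "pdus" = public domain in the United States only
    "ic"   = no automatic determination; treated as in copyright (assumption)
    """
    # Public domain worldwide: US federal government publications.
    if us_federal_doc:
        return "pd"
    # Public domain worldwide: US works published before the US cutoff.
    if us_published and pub_year < US_PD_BEFORE:
        return "pd"
    # Public domain worldwide: non-US works published before 1873.
    if not us_published and pub_year < NON_US_WORLDWIDE_PD_BEFORE:
        return "pd"
    # Public domain in the US only: other non-US works published before 1923.
    if not us_published and pub_year < US_PD_BEFORE:
        return "pdus"
    return "ic"


print(automatic_rights(1900, us_published=False))  # non-US work from 1900
```

The order of the checks matters. The worldwide rule for pre-1873 non-US works must run before the US-only rule, otherwise those works would be under-classified as `pdus`.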

Manual Rights Determination

• IMLS-funded CRMS project
  – CRMS-US
    • 2008: US-published works, 1923-1963
    • Staff at 4 partner institutions
  – CRMS-World
    • 2011: expanded to non-US works
    • Staff at 16 partner institutions
  – Double review, with additional expert review for conflicts
  – Compliance with copyright formalities
  – As of January 2013, 241,541 reviewed; more than 132,644 opened
• Rights Holder Permissions
• System of Precedence
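The double-review step works like this: two independent determinations are accepted when they agree, and a conflict is escalated to an expert. Here is a minimal Python sketch of that flow. The `cautious_expert` function is a hypothetical stand-in for a human reviewer, and its "prefer the more restrictive call" policy is an assumption made only so the example runs.

```python
from typing import Callable

Expert = Callable[[str, str], str]


def reconcile(first: str, second: str, expert: Expert) -> str:
    """Accept matching reviews; send conflicting ones to an expert reviewer."""
    if first == second:
        return first
    return expert(first, second)


def review_batch(pairs: list[tuple[str, str]], expert: Expert) -> tuple[list[str], int]:
    """Reconcile many double reviews and count how many needed expert review."""
    results: list[str] = []
    escalated = 0
    for first, second in pairs:
        if first != second:
            escalated += 1
        results.append(reconcile(first, second, expert))
    return results, escalated


def cautious_expert(first: str, second: str) -> str:
    """Hypothetical stand-in for a human expert: prefer the more restrictive call."""
    return "ic" if "ic" in (first, second) else first


print(review_batch([("pd", "pd"), ("pd", "ic"), ("ic", "ic")], cautious_expert))
```

Passing the expert in as a function keeps the agreement logic separate from whatever escalation process a project actually uses.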

Rights Database

Bibliographic (automatic)

Manual
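The rights database holds both automatic (bibliographic) and manual determinations. The slide mentions a "System of Precedence" for deciding between them without spelling it out. Here is a minimal Python sketch of one way such a precedence could work. It assumes a hypothetical ranking in which rights holder permissions outrank manual review, which in turn outranks automatic determination, with ties broken by recency. The ranking and the source labels are illustrative, not HathiTrust's documented rules.

```python
from dataclasses import dataclass

# Hypothetical precedence ranking: a higher number overrides a lower one.
SOURCE_PRECEDENCE = {"automatic": 1, "manual": 2, "rightsholder": 3}


@dataclass(frozen=True)
class Determination:
    source: str     # "automatic", "manual" or "rightsholder" (hypothetical labels)
    rights: str     # e.g. "pd", "pdus", "ic"
    timestamp: int  # larger means more recent


def effective_rights(determinations: list[Determination]) -> str:
    """Pick the winning determination: source precedence first, then recency."""
    if not determinations:
        raise ValueError("no rights determinations recorded")
    winner = max(determinations, key=lambda d: (SOURCE_PRECEDENCE[d.source], d.timestamp))
    return winner.rights


history = [Determination("automatic", "ic", 1), Determination("manual", "pd", 2)]
print(effective_rights(history))
```

Keeping every determination and computing the effective one on demand preserves the audit trail. A later automatic re-run, triggered for example by a modified record, then cannot silently overwrite a manual review.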

Lawful uses

• Users who have print disabilities
  – All in-copyright works in HathiTrust currently owned (or owned previously) by the partner institution
  – Must be authenticated
  – Must be on US soil
  – One simultaneous access per copy owned
  – http://www.hathitrust.org/accessibility
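The conditions above combine an eligibility check with a counting rule: one simultaneous access per copy owned. Here is a minimal Python sketch of how such a service could enforce both. The class and method names are hypothetical, and the sketch assumes that the institution's copy counts per volume are already known and that print-disability status is established elsewhere (for example, at login).

```python
class CopyLimitedAccess:
    """Enforce 'one simultaneous access per copy owned' for eligible users."""

    def __init__(self, copies_owned: dict[str, int]) -> None:
        # volume id -> number of print copies the institution owns (or owned)
        self.copies_owned = copies_owned
        # volume id -> number of sessions currently open
        self.active: dict[str, int] = {}

    def start_session(self, volume_id: str, print_disabled: bool,
                      authenticated: bool, on_us_soil: bool) -> bool:
        """Open a session if the user is eligible and a copy is free."""
        if not (print_disabled and authenticated and on_us_soil):
            return False
        in_use = self.active.get(volume_id, 0)
        if in_use >= self.copies_owned.get(volume_id, 0):
            return False  # every owned copy is already in use (or none is owned)
        self.active[volume_id] = in_use + 1
        return True

    def end_session(self, volume_id: str) -> None:
        """Release one session for the volume, if any is open."""
        if self.active.get(volume_id, 0) > 0:
            self.active[volume_id] -= 1


access = CopyLimitedAccess({"vol1": 1})
print(access.start_session("vol1", print_disabled=True, authenticated=True, on_us_soil=True))
print(access.start_session("vol1", print_disabled=True, authenticated=True, on_us_soil=True))
```

In this sketch, a volume with no recorded copies is always refused. That mirrors the rule that access is tied to copies the partner institution owns or owned.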

Lawful uses (2)

• Out of print and brittle, missing
  – Works must be currently owned (or owned previously) by the partner institution
  – Must be authenticated or accessing work from library premises
  – Must be on US soil
  – One simultaneous access per copy owned
  – http://www.hathitrust.org/out-of-print-brittle
• Access and use statements
  – http://www.hathitrust.org/access_use
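The out-of-print and brittle rule differs from the print-disability rule mainly in its access condition: authentication or presence on library premises both qualify. Here is a minimal Python sketch of that eligibility test alone. The field names are hypothetical, and the per-copy simultaneous-access limit would still need to be enforced separately.

```python
from dataclasses import dataclass


@dataclass
class OPBRequest:
    # Hypothetical request attributes; a real service would derive these
    # from holdings data, login state and network location.
    institution_owns_or_owned_copy: bool
    authenticated: bool
    on_library_premises: bool
    on_us_soil: bool


def opb_eligible(req: OPBRequest) -> bool:
    """Apply the out-of-print/brittle access conditions listed on the slide."""
    return (
        req.institution_owns_or_owned_copy
        and (req.authenticated or req.on_library_premises)
        and req.on_us_soil
    )


walk_in = OPBRequest(True, authenticated=False, on_library_premises=True, on_us_soil=True)
print(opb_eligible(walk_in))  # an unauthenticated user in the library qualifies
```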

Outline

• The Big Idea
  – Mission and Goals
• What we're doing to get there
  – Repository and Content
  – Making content available
  – Organizational structure
• How HathiTrust can change the way we work

e-Commerce
Print on Demand
Content Ingest
Transformation
Validation
Content Access
PageTurner
Collection Builder
Large-scale Search
Bibliographic Catalog
Research Center
APIs
Quality Assurance
Quality Review
Content Certification
User Services
Usability
User support (helpdesk)
Outreach
Project website
Monthly newsletter
Papers and presentations
Communication with potential partners
Surveys, general inquiries
Repository evaluation and audit (e.g., DRAMBORA, TRAC)
Legal
Risk management (use of materials)
Partner agreements
Advocacy
Governance
Budget/Finances
Decision-making
Policy
Planning
Enterprise Management
Communication and Coordination with partner institutions
Project management
Repository Administration
Hardware configuration and maintenance
Web and application server configuration and maintenance
Security
Permissions
Logging
Repository Administration
Data management (content storage, backup, integrity checks, deletion)
Hardware selection and replacement
Content and Metadata specifications
Disaster Recovery
Processes for ensuring content integrity
Rights Management
Copyright determination
Copyright review
Copyright information management (database)
Rightsholder permissions
Bibliographic Data Management
Entity description (record-level)
Object identification (item-level)
Data availability
Collection Development
Digital
• Expansion beyond books and journals (born-digital, images and maps, audio)
• Selection of content (for non-Google volume ingest and pilot projects)
Print
• Cloud Library (effect of digital on print)
Financial contributions of partners

HathiTrust Functional Framework

HathiTrust

Strategic Advisory Board

Budget/Finances, Decision-making

Guidance on Policy Planning

• Driven by needs of institutions
• Leverage across the partnership
• Projects: Print on Demand, Grant Work, Ingest Specifications, PageTurner, Bibliographic Data Management

Executive Committee

Collective Work: Working Groups and Committees

Operational
• Communications
• User Support
• User Experience

Strategic
• Collections
• Discovery Interface
• Full-text Search

Distributed work

Constitutional Convention

• October 2011
• 52 partners
• 3-year review overseen by SAB
• Ballot proposals
  – Print monograph storage
  – Approval process for development initiatives
  – US Government Documents
  – Fee-for-service content deposit
  – Governance

HathiTrust

Executive Committee

Strategic Advisory Board

Budget/Finances, Decision-making

Guidance on Policy Planning

• 12-member Board of Governors
• Chief Executive Officer
• Executive Committee

Governance

• Efficient, practical
• Inclusive, collective

Outline

• The Big Idea
  – Mission and Goals
• What we're doing to get there
  – Repository and Content
  – Making content available
  – Organizational structure
• How HathiTrust can change the way we work

How HathiTrust Can Change the Way We Work

Seeing collective problems as collective

[Charts, built up over several slides: “Breakdown of HathiTrust book corpus by publication date” and “Copyright status of books published pre-1923 and US works published 1923-1963”; segment values 42%, 19%, 20% and 19%; the final build adds the label “In Print” (shown with 42%). Source: Bibliographic Indeterminacy and the Scale of Problems and Opportunities of Rights in Digital Collection Building, 2/2011]

• Identification
• Description
• Rights
• Relationships
  – Bibliographic records
  – Bib records and objects

Relationships

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic recordsndash Bib records and objectsndash Digital objects

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic recordsndash Bib records and objectsndash Digital objectsndash Digital and print

Relationships

Understanding the relationship between the collective and local

1st model Price per GB

2008 2009 2010 2011 2012 (Oct)Total Volumes 2477871 5221092 7836698 9966572 10531566

Public Domain 372085 758947 1959223 2712626 3218132

2008 2009 2010 2011 2012 (Oct)0

2000000

4000000

6000000

8000000

10000000

12000000

Total VolumesPublic Domain

0 20 40 60 80 100 1200

10

20

30

40

50

60

Rank in 2008 ARL Investment Index

o

f Tit

les

in L

ocal

Col

lecti

on

A global change in the library environment

June 2010Median duplication 31

June 2009Median duplication 19

Academic print book collection already substantially duplicated in mass digitized book corpus

Courtesy of Constance Malpas OCLC Research

Digitized Books in Shared Repositories

Sep-09 Oct-09 Nov-09 Dec-09 Jan-10 Feb-10 Mar-10 Apr-10 May-10 Jun-100

500000

1000000

1500000

2000000

2500000

3000000

3500000

Mass digitized books in Hathi digital repository Mass digitized books in shared print repositories

Uni

que

Titl

es

~75 of mass digitized corpus is lsquobacked uprsquo in one or more shared print repositories

~35M titles

~25M

Courtesy of Constance Malpas OCLC Research

Collection Overlap

bull More than 50 median overlap with ARL institutions higher for small liberal arts colleges

bull New Pricing model based on Print holdingsndash httpwwwhathitrustorgcostndash Requires print holdings databasendash Also support expansion of legal uses efforts in de-

duplicationndash Facilitate individual and collaborative collection

development and management operationsbull Print monographs archiving

Sourcing and Scalinghttporweblogoclcorgarchives002058html

bull Scalendash Institution-scalendash Group-scalendash Web-scale

bull Sourcingndash Institutionalndash Collaborativendash Third-party

A new kind of library

Thank you

How to find out more

bull About httpwwwhathitrustorgaboutbull Twitter httptwittercomhathitrustbull Facebook httpwwwfacebookcomhathitrustbull Monthly newsletter

ndash httpwwwhathitrustorgupdatesndash RSS httpwwwhathitrustorgupdates_rss

bull Contact us feedbackissueshathitrustorgbull Blogs httpwwwhathitrustorgblogs

ndash Large-scale Searchndash Perspectives from HathiTrust

  • Your Library Now Online Putting HathiTrust in the Context of
  • Outline
  • The Big Idea
  • Partnership
  • Digital Repository
  • The Name
  • Mission
  • HathiTrust
  • Collections and Collaboration
  • What we are doing to get there
  • Slide 11
  • Slide 12
  • Repository and Content
  • Content Sources
  • Language Distribution (1)
  • Language Distribution (2)
  • Dates
  • Copyright Distribution
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Making Content Available
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • APIs
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Datasets
  • Research Center
  • Slide 54
  • Slide 55
  • Access Determinations
  • Automatic Rights Determination
  • Manual Rights Determination
  • Rights Database
  • Lawful uses
  • Lawful uses (2)
  • Outline (2)
  • Slide 63
  • Slide 64
  • Constitutional Convention
  • Slide 66
  • Governance
  • Outline (3)
  • How HathiTrust Can Change the Way We Work
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Relationships
  • Relationships (2)
  • Relationships (3)
  • Relationships (4)
  • Relationships (5)
  • Slide 81
  • Slide 82
  • Slide 83
  • A global change in the library environment
  • Digitized Books in Shared Repositories
  • Collection Overlap
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Thank you
  • How to find out more
Page 31: HATHITRUST A Shared Digital Repository Your Library, Now Online! Putting HathiTrust in the Context of Traditional (and New) Library Services MCLS Webinar

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Descriptive headings added (hidden from GUI with CSS)

Info about SSD service amp link to accessibility page

Images used for style are in css so no need to use alt tags

Skip navigation link

Access keys for navigating pages with keyboard

Added labels amp descriptive titles to forms amp ToC table

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

APIs

bull Data APIndash Volume and rights informationndash Page imagesndash OCR

bull Bibliographic APIndash Volume and rights informationndash MARC records

bull OAIbull ldquoHathifilesrdquo

Datasets

bull Google-digitized ~28 million texts Requires proposal to HathiTrust Agreement with Google Statement on usemanagement

bull Non-Google-digitized ~370000 texts Freely available Statement on management

Research Center

bull Environment to perform research on HathiTrust corpus

bull httpwwwhathitrustorghtrc

bull httplibumichedumpachbull Package of tools to enable publication of open

access born-digital journal content directly into HathiTrustndash Including accompanying data and media files

bull Allows integration with popular journal publishing tools such as Open Journal Systems (OJS)

Source Archive

Editorial Market

Higher Education

Access Determinations

bull Automatedbull Manual

Automatic Rights Determination

bull Conducted on all works at time of ingest and when records are modifiedndash Public domain worldwide

bull US works published before 1923 US federal government publications non-US works published prior to 1873

ndash Public domain in the United Statesbull Non-US works published prior to 1923

Manual Rights Determination

bull IMLS-funded CRMS projectndash CRMS-US

bull 2008 US-published works 1923-1963bull Staff at 4 partner institutions

ndash CRMS-Worldbull 2011 Expanded to non-US worksbull Staff at 16 partner institutions

ndash Double review with additional expert review for conflictsndash Compliance with copyright formalitiesndash As of January 2013 241541 reviewed more than 132644

openedbull Rights Holder Permissions

bull System of Precedence

Rights Database

Bibliographic (automatic)

Manual

Lawful uses

bull Users who have print disabilitiesndash All in-copyright works in HathiTrust currently

owned (or owned previously) by the partner institution

ndash Must be authenticatedndash Must be on US soilndash One simultaneous access per copy ownedndash httpwwwhathitrustorgaccessibility

Lawful uses (2)

bull Out of print and brittle missingndash Works must be currently owned (or owned

previously) by the partner institutionndash Must be authenticated or accessing work from

library premisesndash Must be on US soilndash One simultaneous access per copy ownedndash httpwwwhathitrustorgout-of-print-brittle

bull Access and use statementsndash httpwwwhathitrustorgaccess_use

Outline

bull The Big Idea ndash Mission and Goals

bull What wersquore doing to get there ndash Repository and Content ndash Making content available ndash Organizational structure

bull How HathiTrust can change the way we work

e-Commerce

Print on Demand

Content Ingest

Transformation

Validation

Content Access

PageTurner

Collection Builder

Large-scale Search

Bibliographic Catalog

Research Center

APIs

Quality Assurance

Quality Review

Content Certification

User Services

Usability

User support (helpdesk)

Outreach

Project website

Monthly newsletter

Papers and presentations

Communication with potential

partners

Surveys general inquiries

Repository evaluation and

audit (eg DRAMBORA

TRAC)

Legal

Risk management (use of materials)

Partner agreements

Advocacy

Governance

Budget Finances

Decision-making

Policy

Planning

Enterprise Management

Communication and Coordination

with partner institutions

Project management

Repository Administration

Hardware configuration and

maintenance

Web and application server configuration and

maintenance

Security

Permissions

Logging

Repository Administration

Data management (content storage backup integrity checks deletion)

Hardware selection and replacement

Content and Metadata

specifications

Disaster Recovery

Processes for ensuring content

integrity

Rights Management

Copyright determination

Copyright review

Copyright information

management (database)

Rightsholder permissions

Bibliographic Data

Management

Entity description (record-level)

Object identification (item-

level)

Data availability

Collection Development

Digitalbull Expansion beyond

books and journals (born-digital images and maps audio)

bull Selection of content (for non-Google volume ingest and pilots projects)

Printbull Cloud Library (effect

of digital on print)

Financial contributions of partners

HathiTrust Functional Framework

HathiTrust

Strategic Advisory BoardBudgetFinances Decision-making

Guidance on Policy Planning

bull Driven by needs of institutionsbull Leverage across the partnershipbull Projects Print on Demand Grant Work Ingest Specifications

PageTurner Bibliographic Data Management

Executive Committee

Collective Work Working Groups and Committees

Operationalbull Communicationsbull User Supportbull User Experience

Operationalbull Communicationsbull User Supportbull User Experience

Strategicbull Collectionsbull Discovery Interfacebull Full-text Search

Distributed work

Constitutional Convention

bull October 2011bull 52 partnersbull 3-year review overseen by SABbull Ballot Proposals

ndash Print monograph storagendash Approval Process for development initiativesndash US Government Documentsndash Fee-for-service content depositndash Governance

HathiTrust

Executive Committee

Strategic Advisory

Board

BudgetFinancesDecision-making

Guidance on Policy Planning

bull 12-member Board of Governors

bull Chief Executive Officer bull Executive Committee

Governance

bull Efficient practicalbull Inclusive collective

Outline

bull The Big Idea ndash Mission and Goals

bull What wersquore doing to get there ndash Repository and Content ndash Making content available ndash Organizational structure

bull How HathiTrust can change the way we work

How HathiTrust Can Change the Way We Work

Seeing collective problems as collective

Breakdown of HathiTrust book corpus by publication date

Bibliographic Indeterminacy and the Scale of Problems and Opportunities of Rights in Digital Collection Building ndash 22011

42

19

20

19

Breakdown of HathiTrust book corpus by publication date

42

19

20

19

Copyright status of books published pre-1923 and US works published 1923-1963

42

19

20

Copyright status of books published pre-1923 and US works published 1923-1963

42

19

20

19

Copyright status of books published pre-1923 and US works published 1923-1963

In Print 42

19

20

19

bull Identificationbull Descriptionbull Rights

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic records

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic recordsndash Bib records and objects

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic recordsndash Bib records and objectsndash Digital objects

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic recordsndash Bib records and objectsndash Digital objectsndash Digital and print

Relationships

Understanding the relationship between the collective and local

1st model Price per GB

2008 2009 2010 2011 2012 (Oct)Total Volumes 2477871 5221092 7836698 9966572 10531566

Public Domain 372085 758947 1959223 2712626 3218132

2008 2009 2010 2011 2012 (Oct)0

2000000

4000000

6000000

8000000

10000000

12000000

Total VolumesPublic Domain

0 20 40 60 80 100 1200

10

20

30

40

50

60

Rank in 2008 ARL Investment Index

o

f Tit

les

in L

ocal

Col

lecti

on

A global change in the library environment

June 2010Median duplication 31

June 2009Median duplication 19

Academic print book collection already substantially duplicated in mass digitized book corpus

Courtesy of Constance Malpas OCLC Research

Digitized Books in Shared Repositories

Sep-09 Oct-09 Nov-09 Dec-09 Jan-10 Feb-10 Mar-10 Apr-10 May-10 Jun-100

500000

1000000

1500000

2000000

2500000

3000000

3500000

Mass digitized books in Hathi digital repository Mass digitized books in shared print repositories

Uni

que

Titl

es

~75 of mass digitized corpus is lsquobacked uprsquo in one or more shared print repositories

~35M titles

~25M

Courtesy of Constance Malpas OCLC Research

Collection Overlap

bull More than 50 median overlap with ARL institutions higher for small liberal arts colleges

bull New Pricing model based on Print holdingsndash httpwwwhathitrustorgcostndash Requires print holdings databasendash Also support expansion of legal uses efforts in de-

duplicationndash Facilitate individual and collaborative collection

development and management operationsbull Print monographs archiving

Sourcing and Scalinghttporweblogoclcorgarchives002058html

bull Scalendash Institution-scalendash Group-scalendash Web-scale

bull Sourcingndash Institutionalndash Collaborativendash Third-party

A new kind of library

Thank you

How to find out more

bull About httpwwwhathitrustorgaboutbull Twitter httptwittercomhathitrustbull Facebook httpwwwfacebookcomhathitrustbull Monthly newsletter

ndash httpwwwhathitrustorgupdatesndash RSS httpwwwhathitrustorgupdates_rss

bull Contact us feedbackissueshathitrustorgbull Blogs httpwwwhathitrustorgblogs

ndash Large-scale Searchndash Perspectives from HathiTrust

  • Your Library Now Online Putting HathiTrust in the Context of
  • Outline
  • The Big Idea
  • Partnership
  • Digital Repository
  • The Name
  • Mission
  • HathiTrust
  • Collections and Collaboration
  • What we are doing to get there
  • Slide 11
  • Slide 12
  • Repository and Content
  • Content Sources
  • Language Distribution (1)
  • Language Distribution (2)
  • Dates
  • Copyright Distribution
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Making Content Available
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • APIs
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Datasets
  • Research Center
  • Slide 54
  • Slide 55
  • Access Determinations
  • Automatic Rights Determination
  • Manual Rights Determination
  • Rights Database
  • Lawful uses
  • Lawful uses (2)
  • Outline (2)
  • Slide 63
  • Slide 64
  • Constitutional Convention
  • Slide 66
  • Governance
  • Outline (3)
  • How HathiTrust Can Change the Way We Work
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Relationships
  • Relationships (2)
  • Relationships (3)
  • Relationships (4)
  • Relationships (5)
  • Slide 81
  • Slide 82
  • Slide 83
  • A global change in the library environment
  • Digitized Books in Shared Repositories
  • Collection Overlap
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Thank you
  • How to find out more
Page 32: HATHITRUST A Shared Digital Repository Your Library, Now Online! Putting HathiTrust in the Context of Traditional (and New) Library Services MCLS Webinar

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Descriptive headings added (hidden from GUI with CSS)

Info about SSD service amp link to accessibility page

Images used for style are in css so no need to use alt tags

Skip navigation link

Access keys for navigating pages with keyboard

Added labels amp descriptive titles to forms amp ToC table

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

APIs

bull Data APIndash Volume and rights informationndash Page imagesndash OCR

bull Bibliographic APIndash Volume and rights informationndash MARC records

bull OAIbull ldquoHathifilesrdquo

Datasets

bull Google-digitized ~28 million texts Requires proposal to HathiTrust Agreement with Google Statement on usemanagement

bull Non-Google-digitized ~370000 texts Freely available Statement on management

Research Center

bull Environment to perform research on HathiTrust corpus

bull httpwwwhathitrustorghtrc

bull httplibumichedumpachbull Package of tools to enable publication of open

access born-digital journal content directly into HathiTrustndash Including accompanying data and media files

bull Allows integration with popular journal publishing tools such as Open Journal Systems (OJS)

Source Archive

Editorial Market

Higher Education

Access Determinations

bull Automatedbull Manual

Automatic Rights Determination

bull Conducted on all works at time of ingest and when records are modifiedndash Public domain worldwide

bull US works published before 1923 US federal government publications non-US works published prior to 1873

ndash Public domain in the United Statesbull Non-US works published prior to 1923

Manual Rights Determination

bull IMLS-funded CRMS projectndash CRMS-US

bull 2008 US-published works 1923-1963bull Staff at 4 partner institutions

ndash CRMS-Worldbull 2011 Expanded to non-US worksbull Staff at 16 partner institutions

ndash Double review with additional expert review for conflictsndash Compliance with copyright formalitiesndash As of January 2013 241541 reviewed more than 132644

openedbull Rights Holder Permissions

bull System of Precedence

Rights Database

Bibliographic (automatic)

Manual

Lawful uses

bull Users who have print disabilitiesndash All in-copyright works in HathiTrust currently

owned (or owned previously) by the partner institution

ndash Must be authenticatedndash Must be on US soilndash One simultaneous access per copy ownedndash httpwwwhathitrustorgaccessibility

Lawful uses (2)

bull Out of print and brittle missingndash Works must be currently owned (or owned

previously) by the partner institutionndash Must be authenticated or accessing work from

library premisesndash Must be on US soilndash One simultaneous access per copy ownedndash httpwwwhathitrustorgout-of-print-brittle

bull Access and use statementsndash httpwwwhathitrustorgaccess_use

Outline

bull The Big Idea ndash Mission and Goals

bull What wersquore doing to get there ndash Repository and Content ndash Making content available ndash Organizational structure

bull How HathiTrust can change the way we work

HathiTrust Functional Framework

e-Commerce
  – Print on Demand
Content Ingest
  – Transformation
  – Validation
Content Access
  – PageTurner
  – Collection Builder
  – Large-scale Search
  – Bibliographic Catalog
  – Research Center
  – APIs
Quality Assurance
  – Quality Review
  – Content Certification
User Services
  – Usability
  – User support (helpdesk)
Outreach
  – Project website
  – Monthly newsletter
  – Papers and presentations
  – Communication with potential partners
  – Surveys, general inquiries
  – Repository evaluation and audit (e.g., DRAMBORA, TRAC)
Legal
  – Risk management (use of materials)
  – Partner agreements
  – Advocacy
Governance
  – Budget/Finances
  – Decision-making
  – Policy
  – Planning
Enterprise Management
  – Communication and Coordination with partner institutions
  – Project management
Repository Administration
  – Hardware configuration and maintenance
  – Web and application server configuration and maintenance
  – Security
  – Permissions
  – Logging
  – Data management (content storage, backup, integrity checks, deletion)
  – Hardware selection and replacement
  – Content and Metadata specifications
  – Disaster Recovery
  – Processes for ensuring content integrity
Rights Management
  – Copyright determination
  – Copyright review
  – Copyright information management (database)
  – Rightsholder permissions
Bibliographic Data Management
  – Entity description (record-level)
  – Object identification (item-level)
  – Data availability
Collection Development
  – Digital
      • Expansion beyond books and journals (born-digital, images and maps, audio)
      • Selection of content (for non-Google volume ingest and pilot projects)
  – Print
      • Cloud Library (effect of digital on print)
Financial contributions of partners
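Several framework items (data management with integrity checks, processes for ensuring content integrity) rest on fixity checking: a checksum is recorded when content is stored and re-verified later. This is a minimal sketch, assuming a manifest that maps relative file paths to SHA-256 digests. Both the choice of SHA-256 and the manifest layout are assumptions, not HathiTrust's documented practice.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large page images need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_fixity(root: Path, manifest: Dict[str, str]) -> List[str]:
    """Return manifest entries (relative paths) that are missing or whose checksum changed."""
    failures = []
    for rel_path, expected in sorted(manifest.items()):
        candidate = root / rel_path
        if not candidate.is_file() or sha256_of(candidate) != expected:
            failures.append(rel_path)
    return failures


if __name__ == "__main__":
    # Tiny self-check in a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "00000001.txt").write_bytes(b"page text")
        manifest = {"00000001.txt": sha256_of(root / "00000001.txt")}
        print(check_fixity(root, manifest))  # [] while the file is unchanged
        (root / "00000001.txt").write_bytes(b"bit rot")
        print(check_fixity(root, manifest))  # ['00000001.txt'] after corruption
```

A scheduled job running a check like this over stored content, and restoring failures from a backup copy, is one common way to combine the integrity-check and disaster-recovery items.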

HathiTrust
Strategic Advisory Board
Budget/Finances, Decision-making
Guidance on Policy, Planning

• Driven by needs of institutions
• Leverage across the partnership
• Projects: Print on Demand, Grant Work, Ingest Specifications, PageTurner, Bibliographic Data Management

Executive Committee

Collective Work: Working Groups and Committees

Operational
  • Communications
  • User Support
  • User Experience

Strategic
  • Collections
  • Discovery Interface
  • Full-text Search

Distributed work

Constitutional Convention

• October 2011
• 52 partners
• 3-year review overseen by SAB
• Ballot Proposals
  – Print monograph storage
  – Approval Process for development initiatives
  – US Government Documents
  – Fee-for-service content deposit
  – Governance

HathiTrust
Executive Committee
Strategic Advisory Board
Budget/Finances, Decision-making
Guidance on Policy, Planning

• 12-member Board of Governors
• Chief Executive Officer
• Executive Committee

Governance

• Efficient, practical
• Inclusive, collective

Outline

• The Big Idea
  – Mission and Goals
• What we’re doing to get there
  – Repository and Content
  – Making content available
  – Organizational structure
• How HathiTrust can change the way we work

How HathiTrust Can Change the Way We Work

Seeing collective problems as collective

Breakdown of HathiTrust book corpus by publication date
(Chart: segments of 42%, 19%, 20%, and 19%. Source: Bibliographic Indeterminacy and the Scale of Problems and Opportunities of Rights in Digital Collection Building, 2/2011)

Copyright status of books published pre-1923 and US works published 1923-1963
(Chart labels: In Print; segments of 42%, 19%, 20%, and 19%)

Relationships

• Identification
• Description
• Rights
• Relationships
  – Bibliographic records
  – Bib records and objects
  – Digital objects
  – Digital and print
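The four kinds of relationships listed above (record to record, record to object, object to object, digital to print) can be pictured as a small data model. This is a hypothetical sketch. The class names, fields and the `holding_institutions` helper are assumptions for illustration and do not describe HathiTrust's actual schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PrintCopy:
    institution: str  # library holding the physical volume
    barcode: str      # local item identifier


@dataclass
class DigitalObject:
    item_id: str            # item-level identifier (object identification)
    rights: str             # rights label attached to this digital object
    source_copy: PrintCopy  # print copy that was digitized (digital-to-print link)


@dataclass
class BibRecord:
    record_id: str                                             # record-level identifier
    title: str                                                 # entity description
    items: List[DigitalObject] = field(default_factory=list)  # bib record to objects
    related_records: List[str] = field(default_factory=list)  # bib record to bib record

    def holding_institutions(self) -> List[str]:
        """Institutions whose print copies stand behind this record's digital objects."""
        return sorted({item.source_copy.institution for item in self.items})

    def items_with_rights(self, rights: str) -> List[str]:
        """Identifiers of the digital objects carrying a given rights label."""
        return [item.item_id for item in self.items if item.rights == rights]
```

Making the print copy an explicit field of each digital object is what lets questions such as "which libraries back this record in print?" be answered without a separate lookup.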

Understanding the relationship between the collective and local

1st model: Price per GB

                2008       2009       2010       2011       2012 (Oct)
Total Volumes   2,477,871  5,221,092  7,836,698  9,966,572  10,531,566
Public Domain   372,085    758,947    1,959,223  2,712,626  3,218,132

(Chart: Total Volumes and Public Domain volumes, 2008 to 2012 (Oct))

(Chart: % of titles in local collection vs. rank in 2008 ARL Investment Index)
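The Total Volumes and Public Domain figures above can be turned into a public domain share per year with simple arithmetic. The sketch only restates the numbers from the table; the function names are illustrative.

```python
# Volume counts from the table above; 2012 figures are as of October.
TOTAL_VOLUMES = {2008: 2_477_871, 2009: 5_221_092, 2010: 7_836_698,
                 2011: 9_966_572, 2012: 10_531_566}
PUBLIC_DOMAIN = {2008: 372_085, 2009: 758_947, 2010: 1_959_223,
                 2011: 2_712_626, 2012: 3_218_132}


def pd_share(year: int) -> float:
    """Public domain volumes as a fraction of all volumes for one year."""
    return PUBLIC_DOMAIN[year] / TOTAL_VOLUMES[year]


if __name__ == "__main__":
    for year in sorted(TOTAL_VOLUMES):
        print(f"{year}: {pd_share(year):.1%} public domain")
```

Run as-is, it reports roughly 15.0%, 14.5%, 25.0%, 27.2% and 30.6% for 2008 through 2012 (Oct).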

A global change in the library environment

June 2010: Median duplication 31%

June 2009: Median duplication 19%

Academic print book collection already substantially duplicated in mass digitized book corpus

Courtesy of Constance Malpas, OCLC Research

Digitized Books in Shared Repositories

(Chart: unique titles, Sep-09 to Jun-10; series: mass digitized books in Hathi digital repository, mass digitized books in shared print repositories)

~75% of mass digitized corpus is ‘backed up’ in one or more shared print repositories

~3.5M titles

~2.5M

Courtesy of Constance Malpas, OCLC Research

Collection Overlap

• More than 50% median overlap with ARL institutions, higher for small liberal arts colleges
• New pricing model based on print holdings
  – http://www.hathitrust.org/cost
  – Requires print holdings database
  – Also supports expansion of legal uses and efforts in de-duplication
  – Facilitates individual and collaborative collection development and management operations
• Print monographs archiving
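Median duplication and median overlap are both medians of a per-library percentage: the share of each library's titles that also appear in the digitized corpus. This is a minimal sketch, assuming titles can be matched by a shared identifier (an OCLC number, say; that is an assumption here). Real title matching is considerably messier. The sample data is made up.

```python
from statistics import median
from typing import Dict, Set


def overlap_share(local_titles: Set[str], digitized: Set[str]) -> float:
    """Fraction of one library's titles that also appear in the digitized corpus."""
    if not local_titles:
        return 0.0
    return len(local_titles & digitized) / len(local_titles)


def median_overlap(holdings: Dict[str, Set[str]], digitized: Set[str]) -> float:
    """Median of the per-library overlap shares across a group of libraries."""
    return median(overlap_share(titles, digitized) for titles in holdings.values())


if __name__ == "__main__":
    # Toy data with made-up title identifiers, not real holdings.
    corpus = {"t1", "t2", "t3", "t4"}
    libraries = {
        "lib_a": {"t1", "t2", "x1", "x2"},  # 2 of 4 titles digitized -> 0.5
        "lib_b": {"t1", "t2", "t3", "x3"},  # 3 of 4 -> 0.75
        "lib_c": {"x4"},                    # 0 of 1 -> 0.0
    }
    print(median_overlap(libraries, corpus))  # 0.5
```

The same per-library share, computed against a print holdings database, is the kind of figure a holdings-based pricing model would need.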

Sourcing and Scaling (http://orweblog.oclc.org/archives/002058.html)

• Scale
  – Institution-scale
  – Group-scale
  – Web-scale
• Sourcing
  – Institutional
  – Collaborative
  – Third-party

A new kind of library

Thank you

How to find out more

• About: http://www.hathitrust.org/about
• Twitter: http://twitter.com/hathitrust
• Facebook: http://www.facebook.com/hathitrust
• Monthly newsletter
  – http://www.hathitrust.org/updates
  – RSS: http://www.hathitrust.org/updates_rss
• Contact us: feedback@issues.hathitrust.org
• Blogs: http://www.hathitrust.org/blogs
  – Large-scale Search
  – Perspectives from HathiTrust

Access
• Catalog
• Full-text Search
• PageTurner
• APIs
• Collections
• Datasets

Descriptive headings added (hidden from GUI with CSS)

Info about SSD service & link to accessibility page

Images used for style are in CSS, so no need to use alt tags

Skip navigation link

Access keys for navigating pages with keyboard

Added labels & descriptive titles to forms & ToC table

APIs

• Data API
  – Volume and rights information
  – Page images
  – OCR
• Bibliographic API
  – Volume and rights information
  – MARC records
• OAI
• “Hathifiles”
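To illustrate consuming the Hathifiles, the parser below reads tab-delimited lines and keeps the first three columns. It assumes those columns are the volume identifier, an access flag and a rights code. Check the current Hathifiles description before relying on that order. The sample identifiers are made up.

```python
import csv
import io
from collections import Counter
from typing import Iterable, Iterator, NamedTuple


class HathiRow(NamedTuple):
    htid: str    # volume identifier (assumed first column)
    access: str  # access flag, e.g. "allow" or "deny" (assumed second column)
    rights: str  # rights code, e.g. "pd" or "ic" (assumed third column)


def parse_hathifile(lines: Iterable[str]) -> Iterator[HathiRow]:
    """Yield the first three tab-separated columns of each non-empty line."""
    # QUOTE_NONE: treat quote characters in titles as ordinary text.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for fields in reader:
        if len(fields) < 3:
            continue  # skip blank or truncated lines
        yield HathiRow(fields[0], fields[1], fields[2])


if __name__ == "__main__":
    # Made-up sample lines; a real file has many more columns per line.
    sample = io.StringIO(
        "mdp.0001\tallow\tpd\tExtra columns ignored\n"
        "mdp.0002\tdeny\tic\n"
        "uc1.0003\tallow\tpdus\n"
    )
    counts = Counter(row.rights for row in parse_hathifile(sample))
    print(sorted(counts.items()))  # [('ic', 1), ('pd', 1), ('pdus', 1)]
```

If a file is distributed compressed, opening it with `gzip.open(path, "rt", encoding="utf-8")` and passing the file object to `parse_hathifile` should work the same way, assuming the column layout above holds.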

Datasets

• Google-digitized: ~2.8 million texts
  – Requires proposal to HathiTrust
  – Agreement with Google
  – Statement on use/management
• Non-Google-digitized: ~370,000 texts
  – Freely available
  – Statement on management

Research Center

• Environment to perform research on HathiTrust corpus
• http://www.hathitrust.org/htrc

• http://lib.umich.edu/mpach
• Package of tools to enable publication of open access born-digital journal content directly into HathiTrust
  – Including accompanying data and media files
• Allows integration with popular journal publishing tools such as Open Journal Systems (OJS)

Source Archive

Editorial Market

Higher Education

Access Determinations

• Automated
• Manual

Automatic Rights Determination

bull Conducted on all works at time of ingest and when records are modifiedndash Public domain worldwide

bull US works published before 1923 US federal government publications non-US works published prior to 1873

ndash Public domain in the United Statesbull Non-US works published prior to 1923

Manual Rights Determination

bull IMLS-funded CRMS projectndash CRMS-US

bull 2008 US-published works 1923-1963bull Staff at 4 partner institutions

ndash CRMS-Worldbull 2011 Expanded to non-US worksbull Staff at 16 partner institutions

ndash Double review with additional expert review for conflictsndash Compliance with copyright formalitiesndash As of January 2013 241541 reviewed more than 132644

openedbull Rights Holder Permissions

bull System of Precedence

Rights Database

Bibliographic (automatic)

Manual

Lawful uses

bull Users who have print disabilitiesndash All in-copyright works in HathiTrust currently

owned (or owned previously) by the partner institution

ndash Must be authenticatedndash Must be on US soilndash One simultaneous access per copy ownedndash httpwwwhathitrustorgaccessibility

Lawful uses (2)

bull Out of print and brittle missingndash Works must be currently owned (or owned

previously) by the partner institutionndash Must be authenticated or accessing work from

library premisesndash Must be on US soilndash One simultaneous access per copy ownedndash httpwwwhathitrustorgout-of-print-brittle

bull Access and use statementsndash httpwwwhathitrustorgaccess_use

Outline

bull The Big Idea ndash Mission and Goals

bull What wersquore doing to get there ndash Repository and Content ndash Making content available ndash Organizational structure

bull How HathiTrust can change the way we work

e-Commerce

Print on Demand

Content Ingest

Transformation

Validation

Content Access

PageTurner

Collection Builder

Large-scale Search

Bibliographic Catalog

Research Center

APIs

Quality Assurance

Quality Review

Content Certification

User Services

Usability

User support (helpdesk)

Outreach

Project website

Monthly newsletter

Papers and presentations

Communication with potential

partners

Surveys general inquiries

Repository evaluation and

audit (eg DRAMBORA

TRAC)

Legal

Risk management (use of materials)

Partner agreements

Advocacy

Governance

Budget Finances

Decision-making

Policy

Planning

Enterprise Management

Communication and Coordination

with partner institutions

Project management

Repository Administration

Hardware configuration and

maintenance

Web and application server configuration and

maintenance

Security

Permissions

Logging

Repository Administration

Data management (content storage backup integrity checks deletion)

Hardware selection and replacement

Content and Metadata

specifications

Disaster Recovery

Processes for ensuring content

integrity

Rights Management

Copyright determination

Copyright review

Copyright information

management (database)

Rightsholder permissions

Bibliographic Data

Management

Entity description (record-level)

Object identification (item-

level)

Data availability

Collection Development

Digitalbull Expansion beyond

books and journals (born-digital images and maps audio)

bull Selection of content (for non-Google volume ingest and pilots projects)

Printbull Cloud Library (effect

of digital on print)

Financial contributions of partners

HathiTrust Functional Framework

HathiTrust

Strategic Advisory BoardBudgetFinances Decision-making

Guidance on Policy Planning

bull Driven by needs of institutionsbull Leverage across the partnershipbull Projects Print on Demand Grant Work Ingest Specifications

PageTurner Bibliographic Data Management

Executive Committee

Collective Work Working Groups and Committees

Operationalbull Communicationsbull User Supportbull User Experience

Operationalbull Communicationsbull User Supportbull User Experience

Strategicbull Collectionsbull Discovery Interfacebull Full-text Search

Distributed work

Constitutional Convention

bull October 2011bull 52 partnersbull 3-year review overseen by SABbull Ballot Proposals

ndash Print monograph storagendash Approval Process for development initiativesndash US Government Documentsndash Fee-for-service content depositndash Governance

HathiTrust

Executive Committee

Strategic Advisory

Board

BudgetFinancesDecision-making

Guidance on Policy Planning

bull 12-member Board of Governors

bull Chief Executive Officer bull Executive Committee

Governance

bull Efficient practicalbull Inclusive collective

Outline

bull The Big Idea ndash Mission and Goals

bull What wersquore doing to get there ndash Repository and Content ndash Making content available ndash Organizational structure

bull How HathiTrust can change the way we work

How HathiTrust Can Change the Way We Work

Seeing collective problems as collective

Breakdown of HathiTrust book corpus by publication date

Bibliographic Indeterminacy and the Scale of Problems and Opportunities of Rights in Digital Collection Building ndash 22011

42

19

20

19

Breakdown of HathiTrust book corpus by publication date

42

19

20

19

Copyright status of books published pre-1923 and US works published 1923-1963

42

19

20

Copyright status of books published pre-1923 and US works published 1923-1963

42

19

20

19

Copyright status of books published pre-1923 and US works published 1923-1963

In Print 42

19

20

19

bull Identificationbull Descriptionbull Rights

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic records

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic recordsndash Bib records and objects

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic recordsndash Bib records and objectsndash Digital objects

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic recordsndash Bib records and objectsndash Digital objectsndash Digital and print

Relationships

Understanding the relationship between the collective and local

1st model Price per GB

2008 2009 2010 2011 2012 (Oct)Total Volumes 2477871 5221092 7836698 9966572 10531566

Public Domain 372085 758947 1959223 2712626 3218132

2008 2009 2010 2011 2012 (Oct)0

2000000

4000000

6000000

8000000

10000000

12000000

Total VolumesPublic Domain

0 20 40 60 80 100 1200

10

20

30

40

50

60

Rank in 2008 ARL Investment Index

o

f Tit

les

in L

ocal

Col

lecti

on

A global change in the library environment

June 2010Median duplication 31

June 2009Median duplication 19

Academic print book collection already substantially duplicated in mass digitized book corpus

Courtesy of Constance Malpas OCLC Research

Digitized Books in Shared Repositories

Sep-09 Oct-09 Nov-09 Dec-09 Jan-10 Feb-10 Mar-10 Apr-10 May-10 Jun-100

500000

1000000

1500000

2000000

2500000

3000000

3500000

Mass digitized books in Hathi digital repository Mass digitized books in shared print repositories

Uni

que

Titl

es

~75 of mass digitized corpus is lsquobacked uprsquo in one or more shared print repositories

~35M titles

~25M

Courtesy of Constance Malpas OCLC Research

Collection Overlap

bull More than 50 median overlap with ARL institutions higher for small liberal arts colleges

bull New Pricing model based on Print holdingsndash httpwwwhathitrustorgcostndash Requires print holdings databasendash Also support expansion of legal uses efforts in de-

duplicationndash Facilitate individual and collaborative collection

development and management operationsbull Print monographs archiving

Sourcing and Scalinghttporweblogoclcorgarchives002058html

bull Scalendash Institution-scalendash Group-scalendash Web-scale

bull Sourcingndash Institutionalndash Collaborativendash Third-party

A new kind of library

Thank you

How to find out more

bull About httpwwwhathitrustorgaboutbull Twitter httptwittercomhathitrustbull Facebook httpwwwfacebookcomhathitrustbull Monthly newsletter

ndash httpwwwhathitrustorgupdatesndash RSS httpwwwhathitrustorgupdates_rss

bull Contact us feedbackissueshathitrustorgbull Blogs httpwwwhathitrustorgblogs

ndash Large-scale Searchndash Perspectives from HathiTrust

  • Your Library Now Online Putting HathiTrust in the Context of
  • Outline
  • The Big Idea
  • Partnership
  • Digital Repository
  • The Name
  • Mission
  • HathiTrust
  • Collections and Collaboration
  • What we are doing to get there
  • Slide 11
  • Slide 12
  • Repository and Content
  • Content Sources
  • Language Distribution (1)
  • Language Distribution (2)
  • Dates
  • Copyright Distribution
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Making Content Available
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • APIs
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Datasets
  • Research Center
  • Slide 54
  • Slide 55
  • Access Determinations
  • Automatic Rights Determination
  • Manual Rights Determination
  • Rights Database
  • Lawful uses
  • Lawful uses (2)
  • Outline (2)
  • Slide 63
  • Slide 64
  • Constitutional Convention
  • Slide 66
  • Governance
  • Outline (3)
  • How HathiTrust Can Change the Way We Work
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Relationships
  • Relationships (2)
  • Relationships (3)
  • Relationships (4)
  • Relationships (5)
  • Slide 81
  • Slide 82
  • Slide 83
  • A global change in the library environment
  • Digitized Books in Shared Repositories
  • Collection Overlap
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Thank you
  • How to find out more
Page 34: HATHITRUST A Shared Digital Repository Your Library, Now Online! Putting HathiTrust in the Context of Traditional (and New) Library Services MCLS Webinar

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Descriptive headings added (hidden from GUI with CSS)

Info about SSD service amp link to accessibility page

Images used for style are in css so no need to use alt tags

Skip navigation link

Access keys for navigating pages with keyboard

Added labels amp descriptive titles to forms amp ToC table

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

APIs

bull Data APIndash Volume and rights informationndash Page imagesndash OCR

bull Bibliographic APIndash Volume and rights informationndash MARC records

bull OAIbull ldquoHathifilesrdquo

Datasets

bull Google-digitized ~28 million texts Requires proposal to HathiTrust Agreement with Google Statement on usemanagement

bull Non-Google-digitized ~370000 texts Freely available Statement on management

Research Center

bull Environment to perform research on HathiTrust corpus

bull httpwwwhathitrustorghtrc

bull httplibumichedumpachbull Package of tools to enable publication of open

access born-digital journal content directly into HathiTrustndash Including accompanying data and media files

bull Allows integration with popular journal publishing tools such as Open Journal Systems (OJS)

Source Archive

Editorial Market

Higher Education

Access Determinations

bull Automatedbull Manual

Automatic Rights Determination

bull Conducted on all works at time of ingest and when records are modifiedndash Public domain worldwide

bull US works published before 1923 US federal government publications non-US works published prior to 1873

ndash Public domain in the United Statesbull Non-US works published prior to 1923

Manual Rights Determination

bull IMLS-funded CRMS projectndash CRMS-US

bull 2008 US-published works 1923-1963bull Staff at 4 partner institutions

ndash CRMS-Worldbull 2011 Expanded to non-US worksbull Staff at 16 partner institutions

ndash Double review with additional expert review for conflictsndash Compliance with copyright formalitiesndash As of January 2013 241541 reviewed more than 132644

openedbull Rights Holder Permissions

bull System of Precedence

Rights Database

Bibliographic (automatic)

Manual

Lawful uses

bull Users who have print disabilitiesndash All in-copyright works in HathiTrust currently

owned (or owned previously) by the partner institution

ndash Must be authenticatedndash Must be on US soilndash One simultaneous access per copy ownedndash httpwwwhathitrustorgaccessibility

Lawful uses (2)

bull Out of print and brittle missingndash Works must be currently owned (or owned

previously) by the partner institutionndash Must be authenticated or accessing work from

library premisesndash Must be on US soilndash One simultaneous access per copy ownedndash httpwwwhathitrustorgout-of-print-brittle

bull Access and use statementsndash httpwwwhathitrustorgaccess_use

Outline

bull The Big Idea ndash Mission and Goals

bull What wersquore doing to get there ndash Repository and Content ndash Making content available ndash Organizational structure

bull How HathiTrust can change the way we work

e-Commerce

Print on Demand

Content Ingest

Transformation

Validation

Content Access

PageTurner

Collection Builder

Large-scale Search

Bibliographic Catalog

Research Center

APIs

Quality Assurance

Quality Review

Content Certification

User Services

Usability

User support (helpdesk)

Outreach

Project website

Monthly newsletter

Papers and presentations

Communication with potential

partners

Surveys general inquiries

Repository evaluation and

audit (eg DRAMBORA

TRAC)

Legal

Risk management (use of materials)

Partner agreements

Advocacy

Governance

Budget Finances

Decision-making

Policy

Planning

Enterprise Management

Communication and Coordination

with partner institutions

Project management

Repository Administration

Hardware configuration and

maintenance

Web and application server configuration and

maintenance

Security

Permissions

Logging

Repository Administration

Data management (content storage backup integrity checks deletion)

Hardware selection and replacement

Content and Metadata

specifications

Disaster Recovery

Processes for ensuring content

integrity

Rights Management

Copyright determination

Copyright review

Copyright information

management (database)

Rightsholder permissions

Bibliographic Data

Management

Entity description (record-level)

Object identification (item-

level)

Data availability

Collection Development

Digitalbull Expansion beyond

books and journals (born-digital images and maps audio)

bull Selection of content (for non-Google volume ingest and pilots projects)

Printbull Cloud Library (effect

of digital on print)

Financial contributions of partners

HathiTrust Functional Framework

HathiTrust

Strategic Advisory BoardBudgetFinances Decision-making

Guidance on Policy Planning

bull Driven by needs of institutionsbull Leverage across the partnershipbull Projects Print on Demand Grant Work Ingest Specifications

PageTurner Bibliographic Data Management

Executive Committee

Collective Work Working Groups and Committees

Operationalbull Communicationsbull User Supportbull User Experience

Operationalbull Communicationsbull User Supportbull User Experience

Strategicbull Collectionsbull Discovery Interfacebull Full-text Search

Distributed work

Constitutional Convention

bull October 2011bull 52 partnersbull 3-year review overseen by SABbull Ballot Proposals

ndash Print monograph storagendash Approval Process for development initiativesndash US Government Documentsndash Fee-for-service content depositndash Governance

HathiTrust

Executive Committee

Strategic Advisory

Board

BudgetFinancesDecision-making

Guidance on Policy Planning

bull 12-member Board of Governors

bull Chief Executive Officer bull Executive Committee

Governance

bull Efficient practicalbull Inclusive collective

Outline

bull The Big Idea ndash Mission and Goals

bull What wersquore doing to get there ndash Repository and Content ndash Making content available ndash Organizational structure

bull How HathiTrust can change the way we work

How HathiTrust Can Change the Way We Work

Seeing collective problems as collective

Breakdown of HathiTrust book corpus by publication date

Bibliographic Indeterminacy and the Scale of Problems and Opportunities of Rights in Digital Collection Building ndash 22011

42

19

20

19

Breakdown of HathiTrust book corpus by publication date

42

19

20

19

Copyright status of books published pre-1923 and US works published 1923-1963

42

19

20

Copyright status of books published pre-1923 and US works published 1923-1963

42

19

20

19

Copyright status of books published pre-1923 and US works published 1923-1963

In Print 42

19

20

19

bull Identificationbull Descriptionbull Rights

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic records

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic recordsndash Bib records and objects

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic recordsndash Bib records and objectsndash Digital objects

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic recordsndash Bib records and objectsndash Digital objectsndash Digital and print

Relationships

Understanding the relationship between the collective and local

1st model Price per GB

2008 2009 2010 2011 2012 (Oct)Total Volumes 2477871 5221092 7836698 9966572 10531566

Public Domain 372085 758947 1959223 2712626 3218132

2008 2009 2010 2011 2012 (Oct)0

2000000

4000000

6000000

8000000

10000000

12000000

Total VolumesPublic Domain

0 20 40 60 80 100 1200

10

20

30

40

50

60

Rank in 2008 ARL Investment Index

o

f Tit

les

in L

ocal

Col

lecti

on

A global change in the library environment

June 2010Median duplication 31

June 2009Median duplication 19

Academic print book collection already substantially duplicated in mass digitized book corpus

Courtesy of Constance Malpas OCLC Research

Digitized Books in Shared Repositories

Sep-09 Oct-09 Nov-09 Dec-09 Jan-10 Feb-10 Mar-10 Apr-10 May-10 Jun-100

500000

1000000

1500000

2000000

2500000

3000000

3500000

Mass digitized books in Hathi digital repository Mass digitized books in shared print repositories

Uni

que

Titl

es

~75 of mass digitized corpus is lsquobacked uprsquo in one or more shared print repositories

~35M titles

~25M


Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

• Descriptive headings added (hidden from GUI with CSS)
• Info about SSD service & link to accessibility page
• Images used for style are in CSS, so no alt tags are needed
• Skip navigation link
• Access keys for navigating pages with the keyboard
• Added labels & descriptive titles to forms & ToC table

APIs

• Data API
  – Volume and rights information
  – Page images
  – OCR
• Bibliographic API
  – Volume and rights information
  – MARC records
• OAI
• "Hathifiles"
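The Bibliographic API listed above returns volume and rights information for a catalog record. The following is a minimal Python sketch under stated assumptions: it assumes the JSON URL pattern `https://catalog.hathitrust.org/api/volumes/brief/<id-type>/<id>.json` and an `items` list whose entries carry `htid` and `usRightsString` keys. Check the current API documentation before relying on these names. The network call is kept separate so the parsing can be tried offline.

```python
import json
import urllib.request

# Assumed URL pattern for brief records from the Bibliographic API.
BIB_API = "https://catalog.hathitrust.org/api/volumes/brief/{id_type}/{id_value}.json"


def bib_api_url(id_type: str, id_value: str) -> str:
    """Build a Bibliographic API URL, e.g. for an OCLC number."""
    return BIB_API.format(id_type=id_type, id_value=id_value)


def full_view_items(response: dict) -> list[str]:
    """Return item IDs whose US rights string reads 'Full view'.

    Assumes an 'items' list of dicts with 'htid' and 'usRightsString' keys.
    """
    return [
        item["htid"]
        for item in response.get("items", [])
        if item.get("usRightsString") == "Full view"
    ]


def fetch(id_type: str, id_value: str) -> dict:
    """Fetch and decode one response (requires network access)."""
    with urllib.request.urlopen(bib_api_url(id_type, id_value)) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Offline example with a hand-made response; the IDs are placeholders.
    sample = {
        "items": [
            {"htid": "example.001", "usRightsString": "Full view"},
            {"htid": "example.002", "usRightsString": "Limited (search-only)"},
        ]
    }
    print(full_view_items(sample))  # ['example.001']
    print(bib_api_url("oclc", "424023"))
```

Keeping `fetch` apart from `full_view_items` means the rights filtering can be tested on saved responses without touching the live service.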

Datasets

• Google-digitized: ~2.8 million texts
  – Requires proposal to HathiTrust
  – Agreement with Google
  – Statement on use/management
• Non-Google-digitized: ~370,000 texts
  – Freely available
  – Statement on management

Research Center

• Environment to perform research on the HathiTrust corpus
• http://www.hathitrust.org/htrc

• http://lib.umich.edu/mpach
• Package of tools to enable publication of open access born-digital journal content directly into HathiTrust
  – Including accompanying data and media files
• Allows integration with popular journal publishing tools such as Open Journal Systems (OJS)

Source Archive

Editorial Market

Higher Education

Access Determinations

• Automated
• Manual

Automatic Rights Determination

• Conducted on all works at time of ingest and when records are modified
  – Public domain worldwide
    • US works published before 1923, US federal government publications, non-US works published prior to 1873
  – Public domain in the United States
    • Non-US works published prior to 1923
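The automatic rules above can be read as a short decision procedure. Here is a minimal Python sketch that assumes each record supplies a publication year, a US/non-US flag and a federal-document flag. The return codes `pd`, `pdus` and `und` are illustrative labels, not a claim about HathiTrust's internal rights codes, and the cut-off years are hard-coded exactly as given on the slide.

```python
from typing import Optional


def automatic_rights(pub_year: Optional[int], us_published: bool,
                     us_federal_gov_doc: bool = False) -> str:
    """Apply the automatic rights rules listed on the slide.

    Returns 'pd' (public domain worldwide), 'pdus' (public domain in the
    United States only) or 'und' (undetermined, left for manual review).
    The code names are illustrative only.
    """
    if us_federal_gov_doc:
        return "pd"    # US federal government publications
    if pub_year is None:
        return "und"   # no usable date in the record
    if us_published and pub_year < 1923:
        return "pd"    # US works published before 1923
    if not us_published and pub_year < 1873:
        return "pd"    # non-US works published prior to 1873
    if not us_published and pub_year < 1923:
        return "pdus"  # non-US works published prior to 1923
    return "und"


if __name__ == "__main__":
    print(automatic_rights(1900, us_published=True))   # pd
    print(automatic_rights(1900, us_published=False))  # pdus
    print(automatic_rights(1950, us_published=True))   # und
```

Anything falling through to `und` is what the manual review described next would pick up.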

Manual Rights Determination

• IMLS-funded CRMS project
  – CRMS-US
    • 2008: US-published works, 1923-1963
    • Staff at 4 partner institutions
  – CRMS-World
    • 2011: Expanded to non-US works
    • Staff at 16 partner institutions
  – Double review, with additional expert review for conflicts
  – Compliance with copyright formalities
  – As of January 2013: 241,541 reviewed, more than 132,644 opened
• Rights Holder Permissions
• System of Precedence
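The double-review step above can be sketched as a small function. This is a minimal Python sketch, not CRMS's actual workflow. The determination labels (`pd`, `ic`) and the `expert` callback, which stands in for the additional expert review, are hypothetical.

```python
from typing import Callable


def resolve_review(first: str, second: str,
                   expert: Callable[[str, str], str]) -> str:
    """Combine two independent reviews of the same volume.

    If the two reviewers agree, their determination stands; if they
    conflict, the expert callback decides.
    """
    if first == second:
        return first
    return expert(first, second)


if __name__ == "__main__":
    # Hypothetical expert that keeps the more restrictive outcome.
    def cautious_expert(a: str, b: str) -> str:
        return "ic" if "ic" in (a, b) else a

    print(resolve_review("pd", "pd", cautious_expert))  # pd
    print(resolve_review("pd", "ic", cautious_expert))  # ic
```

Passing the expert step in as a function keeps the agreement check separate from however conflicts are actually adjudicated.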

Rights Database

Bibliographic (automatic)

Manual

Lawful uses

• Users who have print disabilities
  – All in-copyright works in HathiTrust currently owned (or owned previously) by the partner institution
  – Must be authenticated
  – Must be on US soil
  – One simultaneous access per copy owned
  – http://www.hathitrust.org/accessibility

Lawful uses (2)

• Out of print and brittle/missing
  – Works must be currently owned (or owned previously) by the partner institution
  – Must be authenticated or accessing the work from library premises
  – Must be on US soil
  – One simultaneous access per copy owned
  – http://www.hathitrust.org/out-of-print-brittle
• Access and use statements
  – http://www.hathitrust.org/access_use
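The lawful-use conditions above (US location, authentication or on-premises access, one simultaneous access per copy owned) can be sketched as an access check with a per-work session count. This is a minimal Python sketch. The `Holding` model, the `premises_allowed` switch and the session handling are hypothetical and do not describe HathiTrust's implementation. `copies_owned` is assumed to count both current and previously owned copies.

```python
from dataclasses import dataclass, field


@dataclass
class Holding:
    """Hypothetical record of one in-copyright work held by a partner."""
    copies_owned: int  # currently owned or previously owned copies
    active_sessions: set[str] = field(default_factory=set)


def may_open(holding: Holding, user_id: str, *, authenticated: bool,
             in_us: bool, on_library_premises: bool = False,
             premises_allowed: bool = False) -> bool:
    """Check the lawful-use conditions and reserve a copy if one is free.

    premises_allowed=False models the print-disability use (authentication
    required); premises_allowed=True models the out-of-print and brittle use
    (authentication or on-premises access is enough).
    """
    if not in_us:
        return False  # must be on US soil
    if not (authenticated or (premises_allowed and on_library_premises)):
        return False
    if user_id in holding.active_sessions:
        return True  # this user already holds a copy
    if len(holding.active_sessions) >= holding.copies_owned:
        return False  # one simultaneous access per copy owned
    holding.active_sessions.add(user_id)
    return True


def release(holding: Holding, user_id: str) -> None:
    """End a user's session so the copy becomes available again."""
    holding.active_sessions.discard(user_id)


if __name__ == "__main__":
    work = Holding(copies_owned=1)
    print(may_open(work, "reader-a", authenticated=True, in_us=True))  # True
    print(may_open(work, "reader-b", authenticated=True, in_us=True))  # False
    release(work, "reader-a")
    print(may_open(work, "reader-b", authenticated=True, in_us=True))  # True
```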

Outline

• The Big Idea
  – Mission and Goals
• What we're doing to get there
  – Repository and Content
  – Making content available
  – Organizational structure
• How HathiTrust can change the way we work

HathiTrust Functional Framework

e-Commerce
• Print on Demand

Content Ingest
• Transformation
• Validation

Content Access
• PageTurner
• Collection Builder
• Large-scale Search
• Bibliographic Catalog
• Research Center
• APIs

Quality Assurance
• Quality Review
• Content Certification

User Services
• Usability
• User support (helpdesk)

Outreach
• Project website
• Monthly newsletter
• Papers and presentations
• Communication with potential partners
• Surveys, general inquiries
• Repository evaluation and audit (e.g., DRAMBORA, TRAC)

Legal
• Risk management (use of materials)
• Partner agreements
• Advocacy

Governance
• Budget, Finances
• Decision-making
• Policy
• Planning

Enterprise Management
• Communication and coordination with partner institutions
• Project management

Repository Administration
• Hardware configuration and maintenance
• Web and application server configuration and maintenance
• Security
• Permissions
• Logging
• Data management (content storage, backup, integrity checks, deletion)
• Hardware selection and replacement
• Content and Metadata specifications
• Disaster Recovery
• Processes for ensuring content integrity

Rights Management
• Copyright determination
• Copyright review
• Copyright information management (database)
• Rightsholder permissions

Bibliographic Data Management
• Entity description (record-level)
• Object identification (item-level)
• Data availability

Collection Development
• Digital
  – Expansion beyond books and journals (born-digital, images and maps, audio)
  – Selection of content (for non-Google volume ingest and pilot projects)
• Print
  – Cloud Library (effect of digital on print)

Financial contributions of partners

HathiTrust

Strategic Advisory Board

Budget/Finances, Decision-making

Guidance on Policy, Planning

• Driven by needs of institutions
• Leverage across the partnership
• Projects: Print on Demand, Grant Work, Ingest Specifications, PageTurner, Bibliographic Data Management

Executive Committee

Collective Work: Working Groups and Committees

Operational
• Communications
• User Support
• User Experience

Strategic
• Collections
• Discovery Interface
• Full-text Search

Distributed work

Constitutional Convention

• October 2011
• 52 partners
• 3-year review overseen by SAB
• Ballot Proposals
  – Print monograph storage
  – Approval process for development initiatives
  – US Government Documents
  – Fee-for-service content deposit
  – Governance

HathiTrust

Executive Committee

Strategic Advisory Board

Budget/Finances, Decision-making

Guidance on Policy, Planning

• 12-member Board of Governors
• Chief Executive Officer
• Executive Committee

Governance

• Efficient, practical
• Inclusive, collective

Outline

• The Big Idea
  – Mission and Goals
• What we're doing to get there
  – Repository and Content
  – Making content available
  – Organizational structure
• How HathiTrust can change the way we work

How HathiTrust Can Change the Way We Work

Seeing collective problems as collective

Breakdown of HathiTrust book corpus by publication date
Source: Bibliographic Indeterminacy and the Scale of Problems and Opportunities of Rights in Digital Collection Building, 2/2011
[Chart values: 42, 19, 20, 19]

Copyright status of books published pre-1923 and US works published 1923-1963
[Chart values: 42 (In Print), 19, 20, 19]

Relationships

• Identification
• Description
• Rights
• Relationships
  – Bibliographic records
  – Bib records and objects
  – Digital objects
  – Digital and print

Understanding the relationship between the collective and local

1st model: Price per GB

                2008       2009       2010       2011       2012 (Oct)
Total Volumes   2,477,871  5,221,092  7,836,698  9,966,572  10,531,566
Public Domain   372,085    758,947    1,959,223  2,712,626  3,218,132

[Chart: Total Volumes and Public Domain, 2008 to 2012 (Oct); y-axis 0 to 12,000,000]
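The volume counts listed above make it easy to check how the public-domain share of the collection changed over the period. This Python sketch only divides the two rows and uses no data beyond those figures.

```python
# Volume counts as listed above.
YEARS = ["2008", "2009", "2010", "2011", "2012 (Oct)"]
TOTAL = [2_477_871, 5_221_092, 7_836_698, 9_966_572, 10_531_566]
PUBLIC_DOMAIN = [372_085, 758_947, 1_959_223, 2_712_626, 3_218_132]


def pd_share(total: int, public_domain: int) -> float:
    """Public-domain volumes as a percentage of all volumes."""
    return 100 * public_domain / total


for year, total, pd in zip(YEARS, TOTAL, PUBLIC_DOMAIN):
    print(f"{year}: {pd_share(total, pd):.1f}% public domain")
```

Run as is, it reports about 15.0% for 2008, rising to about 30.6% for 2012 (Oct).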

[Chart: titles in local collection (y-axis, 0 to 60) plotted against Rank in 2008 ARL Investment Index (x-axis, 0 to 120)]

A global change in the library environment

June 2010: Median duplication 31%

June 2009: Median duplication 19%

Academic print book collection already substantially duplicated in mass digitized book corpus

Courtesy of Constance Malpas, OCLC Research

Digitized Books in Shared Repositories

[Chart: Unique titles, Sep-09 to Jun-10 (y-axis 0 to 3,500,000); series: Mass digitized books in Hathi digital repository; Mass digitized books in shared print repositories]

~75% of mass digitized corpus is 'backed up' in one or more shared print repositories

~3.5M titles

~2.5M

Courtesy of Constance Malpas, OCLC Research

Collection Overlap

• More than 50% median overlap with ARL institutions, higher for small liberal arts colleges
• New pricing model based on print holdings
  – http://www.hathitrust.org/cost
  – Requires print holdings database
  – Also supports expansion of legal uses, efforts in de-duplication
  – Facilitates individual and collaborative collection development and management operations
• Print monographs archiving
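The overlap figure above is a median, taken across institutions, of the share of each library's print titles that also appear in the digitized corpus. Below is a minimal sketch of that calculation. It assumes titles are matched on a shared identifier (the `ocm`-prefixed strings below are hypothetical). The slides do not say how the print holdings database actually matches records.

```python
from statistics import median


def overlap_pct(local_titles: set[str], repository_titles: set[str]) -> float:
    """Percent of a library's print titles that also appear in the repository."""
    if not local_titles:
        return 0.0
    return 100 * len(local_titles & repository_titles) / len(local_titles)


def median_overlap(libraries: dict[str, set[str]],
                   repository_titles: set[str]) -> float:
    """Median overlap across a group of libraries."""
    return median(overlap_pct(t, repository_titles) for t in libraries.values())


if __name__ == "__main__":
    repository = {"ocm1", "ocm2", "ocm3", "ocm4"}  # hypothetical IDs
    libraries = {
        "Library A": {"ocm1", "ocm2", "ocm9"},
        "Library B": {"ocm1", "ocm5"},
        "Library C": {"ocm1", "ocm2", "ocm3", "ocm4"},
    }
    print(round(median_overlap(libraries, repository), 1))  # 66.7
```

Using the median rather than the mean keeps one unusually large or small collection from skewing the group figure.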

Sourcing and Scaling
http://orweblog.oclc.org/archives/002058.html

• Scale
  – Institution-scale
  – Group-scale
  – Web-scale
• Sourcing
  – Institutional
  – Collaborative
  – Third-party

A new kind of library

Thank you

How to find out more

• About: http://www.hathitrust.org/about
• Twitter: http://twitter.com/hathitrust
• Facebook: http://www.facebook.com/hathitrust
• Monthly newsletter
  – http://www.hathitrust.org/updates
  – RSS: http://www.hathitrust.org/updates_rss
• Contact us: feedback@issues.hathitrust.org
• Blogs: http://www.hathitrust.org/blogs
  – Large-scale Search
  – Perspectives from HathiTrust

Page 36: HATHITRUST A Shared Digital Repository Your Library, Now Online! Putting HathiTrust in the Context of Traditional (and New) Library Services MCLS Webinar

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Descriptive headings added (hidden from GUI with CSS)

Info about SSD service amp link to accessibility page

Images used for style are in css so no need to use alt tags

Skip navigation link

Access keys for navigating pages with keyboard

Added labels amp descriptive titles to forms amp ToC table

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

APIs

bull Data APIndash Volume and rights informationndash Page imagesndash OCR

bull Bibliographic APIndash Volume and rights informationndash MARC records

bull OAIbull ldquoHathifilesrdquo

Datasets

bull Google-digitized ~28 million texts Requires proposal to HathiTrust Agreement with Google Statement on usemanagement

bull Non-Google-digitized ~370000 texts Freely available Statement on management

Research Center

bull Environment to perform research on HathiTrust corpus

bull httpwwwhathitrustorghtrc

bull httplibumichedumpachbull Package of tools to enable publication of open

access born-digital journal content directly into HathiTrustndash Including accompanying data and media files

bull Allows integration with popular journal publishing tools such as Open Journal Systems (OJS)

Source Archive

Editorial Market

Higher Education

Access Determinations

bull Automatedbull Manual

Automatic Rights Determination

bull Conducted on all works at time of ingest and when records are modifiedndash Public domain worldwide

bull US works published before 1923 US federal government publications non-US works published prior to 1873

ndash Public domain in the United Statesbull Non-US works published prior to 1923

Manual Rights Determination

bull IMLS-funded CRMS projectndash CRMS-US

bull 2008 US-published works 1923-1963bull Staff at 4 partner institutions

ndash CRMS-Worldbull 2011 Expanded to non-US worksbull Staff at 16 partner institutions

ndash Double review with additional expert review for conflictsndash Compliance with copyright formalitiesndash As of January 2013 241541 reviewed more than 132644

openedbull Rights Holder Permissions

bull System of Precedence

Rights Database

Bibliographic (automatic)

Manual

Lawful uses

bull Users who have print disabilitiesndash All in-copyright works in HathiTrust currently

owned (or owned previously) by the partner institution

ndash Must be authenticatedndash Must be on US soilndash One simultaneous access per copy ownedndash httpwwwhathitrustorgaccessibility

Lawful uses (2)

bull Out of print and brittle missingndash Works must be currently owned (or owned

previously) by the partner institutionndash Must be authenticated or accessing work from

library premisesndash Must be on US soilndash One simultaneous access per copy ownedndash httpwwwhathitrustorgout-of-print-brittle

bull Access and use statementsndash httpwwwhathitrustorgaccess_use

Outline

bull The Big Idea ndash Mission and Goals

bull What wersquore doing to get there ndash Repository and Content ndash Making content available ndash Organizational structure

bull How HathiTrust can change the way we work

e-Commerce

Print on Demand

Content Ingest

Transformation

Validation

Content Access

PageTurner

Collection Builder

Large-scale Search

Bibliographic Catalog

Research Center

APIs

Quality Assurance

Quality Review

Content Certification

User Services

Usability

User support (helpdesk)

Outreach

Project website

Monthly newsletter

Papers and presentations

Communication with potential

partners

Surveys general inquiries

Repository evaluation and

audit (eg DRAMBORA

TRAC)

Legal

Risk management (use of materials)

Partner agreements

Advocacy

Governance

Budget Finances

Decision-making

Policy

Planning

Enterprise Management

Communication and Coordination

with partner institutions

Project management

Repository Administration

Hardware configuration and

maintenance

Web and application server configuration and

maintenance

Security

Permissions

Logging

Repository Administration

Data management (content storage backup integrity checks deletion)

Hardware selection and replacement

Content and Metadata

specifications

Disaster Recovery

Processes for ensuring content

integrity

Rights Management

Copyright determination

Copyright review

Copyright information

management (database)

Rightsholder permissions

Bibliographic Data

Management

Entity description (record-level)

Object identification (item-

level)

Data availability

Collection Development

Digitalbull Expansion beyond

books and journals (born-digital images and maps audio)

bull Selection of content (for non-Google volume ingest and pilots projects)

Printbull Cloud Library (effect

of digital on print)

Financial contributions of partners

HathiTrust Functional Framework

HathiTrust

Strategic Advisory BoardBudgetFinances Decision-making

Guidance on Policy Planning

bull Driven by needs of institutionsbull Leverage across the partnershipbull Projects Print on Demand Grant Work Ingest Specifications

PageTurner Bibliographic Data Management

Executive Committee

Collective Work Working Groups and Committees

Operationalbull Communicationsbull User Supportbull User Experience

Operationalbull Communicationsbull User Supportbull User Experience

Strategicbull Collectionsbull Discovery Interfacebull Full-text Search

Distributed work

Constitutional Convention

bull October 2011bull 52 partnersbull 3-year review overseen by SABbull Ballot Proposals

ndash Print monograph storagendash Approval Process for development initiativesndash US Government Documentsndash Fee-for-service content depositndash Governance

HathiTrust

Executive Committee

Strategic Advisory

Board

BudgetFinancesDecision-making

Guidance on Policy Planning

bull 12-member Board of Governors

bull Chief Executive Officer bull Executive Committee

Governance

bull Efficient practicalbull Inclusive collective

Outline

bull The Big Idea ndash Mission and Goals

bull What wersquore doing to get there ndash Repository and Content ndash Making content available ndash Organizational structure

bull How HathiTrust can change the way we work

How HathiTrust Can Change the Way We Work

Seeing collective problems as collective

Breakdown of HathiTrust book corpus by publication date

Bibliographic Indeterminacy and the Scale of Problems and Opportunities of Rights in Digital Collection Building ndash 22011

42

19

20

19

Breakdown of HathiTrust book corpus by publication date

42

19

20

19

Copyright status of books published pre-1923 and US works published 1923-1963

42

19

20

Copyright status of books published pre-1923 and US works published 1923-1963

42

19

20

19

Copyright status of books published pre-1923 and US works published 1923-1963

In Print 42

19

20

19

bull Identificationbull Descriptionbull Rights

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic records

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic recordsndash Bib records and objects

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic recordsndash Bib records and objectsndash Digital objects

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic recordsndash Bib records and objectsndash Digital objectsndash Digital and print

Relationships

Understanding the relationship between the collective and local

1st model Price per GB

2008 2009 2010 2011 2012 (Oct)Total Volumes 2477871 5221092 7836698 9966572 10531566

Public Domain 372085 758947 1959223 2712626 3218132

2008 2009 2010 2011 2012 (Oct)0

2000000

4000000

6000000

8000000

10000000

12000000

Total VolumesPublic Domain

0 20 40 60 80 100 1200

10

20

30

40

50

60

Rank in 2008 ARL Investment Index

o

f Tit

les

in L

ocal

Col

lecti

on

A global change in the library environment

June 2010Median duplication 31

June 2009Median duplication 19

Academic print book collection already substantially duplicated in mass digitized book corpus

Courtesy of Constance Malpas OCLC Research

Digitized Books in Shared Repositories

Sep-09 Oct-09 Nov-09 Dec-09 Jan-10 Feb-10 Mar-10 Apr-10 May-10 Jun-100

500000

1000000

1500000

2000000

2500000

3000000

3500000

Mass digitized books in Hathi digital repository Mass digitized books in shared print repositories

Uni

que

Titl

es

~75 of mass digitized corpus is lsquobacked uprsquo in one or more shared print repositories

~35M titles

~25M

Courtesy of Constance Malpas OCLC Research

Collection Overlap

bull More than 50 median overlap with ARL institutions higher for small liberal arts colleges

bull New Pricing model based on Print holdingsndash httpwwwhathitrustorgcostndash Requires print holdings databasendash Also support expansion of legal uses efforts in de-

duplicationndash Facilitate individual and collaborative collection

development and management operationsbull Print monographs archiving

Sourcing and Scalinghttporweblogoclcorgarchives002058html

bull Scalendash Institution-scalendash Group-scalendash Web-scale

bull Sourcingndash Institutionalndash Collaborativendash Third-party

A new kind of library

Thank you

How to find out more

bull About httpwwwhathitrustorgaboutbull Twitter httptwittercomhathitrustbull Facebook httpwwwfacebookcomhathitrustbull Monthly newsletter

ndash httpwwwhathitrustorgupdatesndash RSS httpwwwhathitrustorgupdates_rss

bull Contact us feedbackissueshathitrustorgbull Blogs httpwwwhathitrustorgblogs

ndash Large-scale Searchndash Perspectives from HathiTrust

  • Your Library Now Online Putting HathiTrust in the Context of
  • Outline
  • The Big Idea
  • Partnership
  • Digital Repository
  • The Name
  • Mission
  • HathiTrust
  • Collections and Collaboration
  • What we are doing to get there
  • Slide 11
  • Slide 12
  • Repository and Content
  • Content Sources
  • Language Distribution (1)
  • Language Distribution (2)
  • Dates
  • Copyright Distribution
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Making Content Available
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • APIs
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Datasets
  • Research Center
  • Slide 54
  • Slide 55
  • Access Determinations
  • Automatic Rights Determination
  • Manual Rights Determination
  • Rights Database
  • Lawful uses
  • Lawful uses (2)
  • Outline (2)
  • Slide 63
  • Slide 64
  • Constitutional Convention
  • Slide 66
  • Governance
  • Outline (3)
  • How HathiTrust Can Change the Way We Work
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Relationships
  • Relationships (2)
  • Relationships (3)
  • Relationships (4)
  • Relationships (5)
  • Slide 81
  • Slide 82
  • Slide 83
  • A global change in the library environment
  • Digitized Books in Shared Repositories
  • Collection Overlap
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Thank you
  • How to find out more
Page 37: HATHITRUST A Shared Digital Repository Your Library, Now Online! Putting HathiTrust in the Context of Traditional (and New) Library Services MCLS Webinar

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Descriptive headings added (hidden from GUI with CSS)

Info about SSD service amp link to accessibility page

Images used for style are in css so no need to use alt tags

Skip navigation link

Access keys for navigating pages with keyboard

Added labels amp descriptive titles to forms amp ToC table

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

APIs

bull Data APIndash Volume and rights informationndash Page imagesndash OCR

bull Bibliographic APIndash Volume and rights informationndash MARC records

bull OAIbull ldquoHathifilesrdquo

Datasets

bull Google-digitized ~28 million texts Requires proposal to HathiTrust Agreement with Google Statement on usemanagement

bull Non-Google-digitized ~370000 texts Freely available Statement on management

Research Center

bull Environment to perform research on HathiTrust corpus

bull httpwwwhathitrustorghtrc

bull httplibumichedumpachbull Package of tools to enable publication of open

access born-digital journal content directly into HathiTrustndash Including accompanying data and media files

bull Allows integration with popular journal publishing tools such as Open Journal Systems (OJS)

Source Archive

Editorial Market

Higher Education

Access Determinations

bull Automatedbull Manual

Automatic Rights Determination

bull Conducted on all works at time of ingest and when records are modifiedndash Public domain worldwide

bull US works published before 1923 US federal government publications non-US works published prior to 1873

ndash Public domain in the United Statesbull Non-US works published prior to 1923

Manual Rights Determination

bull IMLS-funded CRMS projectndash CRMS-US

bull 2008 US-published works 1923-1963bull Staff at 4 partner institutions

ndash CRMS-Worldbull 2011 Expanded to non-US worksbull Staff at 16 partner institutions

ndash Double review with additional expert review for conflictsndash Compliance with copyright formalitiesndash As of January 2013 241541 reviewed more than 132644

openedbull Rights Holder Permissions

bull System of Precedence

Rights Database

Bibliographic (automatic)

Manual

Lawful uses

bull Users who have print disabilitiesndash All in-copyright works in HathiTrust currently

owned (or owned previously) by the partner institution

ndash Must be authenticatedndash Must be on US soilndash One simultaneous access per copy ownedndash httpwwwhathitrustorgaccessibility

Lawful uses (2)

bull Out of print and brittle missingndash Works must be currently owned (or owned

previously) by the partner institutionndash Must be authenticated or accessing work from

library premisesndash Must be on US soilndash One simultaneous access per copy ownedndash httpwwwhathitrustorgout-of-print-brittle

bull Access and use statementsndash httpwwwhathitrustorgaccess_use

Outline

bull The Big Idea ndash Mission and Goals

bull What wersquore doing to get there ndash Repository and Content ndash Making content available ndash Organizational structure

bull How HathiTrust can change the way we work

e-Commerce

Print on Demand

Content Ingest

Transformation

Validation

Content Access

PageTurner

Collection Builder

Large-scale Search

Bibliographic Catalog

Research Center

APIs

Quality Assurance

Quality Review

Content Certification

User Services

Usability

User support (helpdesk)

Outreach

Project website

Monthly newsletter

Papers and presentations

Communication with potential

partners

Surveys general inquiries

Repository evaluation and

audit (eg DRAMBORA

TRAC)

Legal

Risk management (use of materials)

Partner agreements

Advocacy

Governance

Budget Finances

Decision-making

Policy

Planning

Enterprise Management

Communication and Coordination

with partner institutions

Project management

Repository Administration

Hardware configuration and

maintenance

Web and application server configuration and

maintenance

Security

Permissions

Logging

Repository Administration

Data management (content storage backup integrity checks deletion)

Hardware selection and replacement

Content and Metadata

specifications

Disaster Recovery

Processes for ensuring content

integrity

Rights Management

Copyright determination

Copyright review

Copyright information

management (database)

Rightsholder permissions

Bibliographic Data

Management

Entity description (record-level)

Object identification (item-

level)

Data availability

Collection Development

Digitalbull Expansion beyond

books and journals (born-digital images and maps audio)

bull Selection of content (for non-Google volume ingest and pilots projects)

Printbull Cloud Library (effect

of digital on print)

Financial contributions of partners

HathiTrust Functional Framework

HathiTrust

Strategic Advisory BoardBudgetFinances Decision-making

Guidance on Policy Planning

bull Driven by needs of institutionsbull Leverage across the partnershipbull Projects Print on Demand Grant Work Ingest Specifications

PageTurner Bibliographic Data Management

Executive Committee

Collective Work Working Groups and Committees

Operationalbull Communicationsbull User Supportbull User Experience

Operationalbull Communicationsbull User Supportbull User Experience

Strategicbull Collectionsbull Discovery Interfacebull Full-text Search

Distributed work

Constitutional Convention

bull October 2011bull 52 partnersbull 3-year review overseen by SABbull Ballot Proposals

ndash Print monograph storagendash Approval Process for development initiativesndash US Government Documentsndash Fee-for-service content depositndash Governance

HathiTrust

Executive Committee

Strategic Advisory

Board

BudgetFinancesDecision-making

Guidance on Policy Planning

bull 12-member Board of Governors

bull Chief Executive Officer bull Executive Committee

Governance

bull Efficient practicalbull Inclusive collective

Outline

• The Big Idea
  – Mission and Goals
• What we’re doing to get there
  – Repository and Content
  – Making content available
  – Organizational structure
• How HathiTrust can change the way we work

How HathiTrust Can Change the Way We Work

Seeing collective problems as collective

Breakdown of HathiTrust book corpus by publication date [chart: four segments of 42%, 19%, 20%, and 19%]

Source: Bibliographic Indeterminacy and the Scale of Problems and Opportunities of Rights in Digital Collection Building – 2/2011

Copyright status of books published pre-1923 and US works published 1923-1963 [chart: segments of 42%, 19%, 20%, and 19%; one segment labeled "In Print"]

Relationships

• Identification
• Description
• Rights
• Relationships
  – Bibliographic records
  – Bib records and objects
  – Digital objects
  – Digital and print

Understanding the relationship between the collective and local

1st model: price per GB

HathiTrust volumes, 2008–2012 [chart: total volumes vs. public domain]

Year          Total volumes   Public domain
2008              2,477,871         372,085
2009              5,221,092         758,947
2010              7,836,698       1,959,223
2011              9,966,572       2,712,626
2012 (Oct)       10,531,566       3,218,132
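The two series can be read together by computing the public domain share for each year. A minimal Python sketch using only the volume figures above (the dictionary and function names are illustrative):

```python
"""Public domain share of HathiTrust volumes, from the slide's figures."""

# Counts as given on the slide; the 2012 figure is as of October.
TOTAL_VOLUMES = {
    "2008": 2_477_871,
    "2009": 5_221_092,
    "2010": 7_836_698,
    "2011": 9_966_572,
    "2012 (Oct)": 10_531_566,
}
PUBLIC_DOMAIN = {
    "2008": 372_085,
    "2009": 758_947,
    "2010": 1_959_223,
    "2011": 2_712_626,
    "2012 (Oct)": 3_218_132,
}


def public_domain_share(total: dict[str, int], pd: dict[str, int]) -> dict[str, float]:
    """Return the public domain percentage per year, rounded to one decimal."""
    return {year: round(100 * pd[year] / total[year], 1) for year in total}


if __name__ == "__main__":
    for year, pct in public_domain_share(TOTAL_VOLUMES, PUBLIC_DOMAIN).items():
        print(f"{year}: {pct}% public domain")
```

On these figures the public domain share roughly doubles, from about 15% in 2008 to about 31% in October 2012.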

[Chart: % of titles in local collection (y-axis, 0–60) by rank in 2008 ARL Investment Index (x-axis, 0–120)]

A global change in the library environment

June 2010: median duplication 31%

June 2009: median duplication 19%

Academic print book collection already substantially duplicated in mass digitized book corpus

Courtesy of Constance Malpas, OCLC Research

Digitized Books in Shared Repositories

[Chart: unique titles, Sep 2009 – Jun 2010 (y-axis 0–3,500,000). Series: mass digitized books in Hathi digital repository (~3.5M titles); mass digitized books in shared print repositories (~2.5M).]

~75% of mass digitized corpus is ‘backed up’ in one or more shared print repositories

Courtesy of Constance Malpas, OCLC Research

Collection Overlap

• More than 50% median overlap with ARL institutions; higher for small liberal arts colleges
• New pricing model based on print holdings
  – http://www.hathitrust.org/cost
  – Requires print holdings database
  – Also supports expansion of legal uses and efforts in de-duplication
  – Facilitates individual and collaborative collection development and management operations
• Print monographs archiving
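The overlap figure is, in essence, the share of a library's titles that also appear in the shared corpus, summarized by the median across institutions. A minimal sketch, assuming titles have already been matched to shared identifiers such as OCLC numbers (in practice that matching is the hard part); the function names and identifiers are hypothetical:

```python
from statistics import median


def overlap_fraction(local_titles: set[str], corpus_titles: set[str]) -> float:
    """Fraction of one library's titles that also appear in the shared corpus."""
    if not local_titles:
        return 0.0
    return len(local_titles & corpus_titles) / len(local_titles)


def median_overlap(holdings: dict[str, set[str]], corpus_titles: set[str]) -> float:
    """Median overlap across libraries (raises StatisticsError if holdings is empty)."""
    return median(overlap_fraction(titles, corpus_titles) for titles in holdings.values())


if __name__ == "__main__":
    # Hypothetical identifiers, for illustration only.
    corpus = {"ocm001", "ocm002", "ocm003", "ocm004"}
    holdings = {
        "Library A": {"ocm001", "ocm009"},
        "Library B": {"ocm001", "ocm002"},
        "Library C": {"ocm009"},
    }
    print(median_overlap(holdings, corpus))
```

The median is used rather than the mean so that a few libraries with unusually high or low overlap do not skew the summary.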

Sourcing and Scaling (http://orweblog.oclc.org/archives/002058.html)

• Scale
  – Institution-scale
  – Group-scale
  – Web-scale
• Sourcing
  – Institutional
  – Collaborative
  – Third-party

A new kind of library

Thank you

How to find out more

• About: http://www.hathitrust.org/about
• Twitter: http://twitter.com/hathitrust
• Facebook: http://www.facebook.com/hathitrust
• Monthly newsletter
  – http://www.hathitrust.org/updates
  – RSS: http://www.hathitrust.org/updates_rss
• Contact us: feedback@issues.hathitrust.org
• Blogs: http://www.hathitrust.org/blogs
  – Large-scale Search
  – Perspectives from HathiTrust


Access
• Catalog
• Full-text Search
• PageTurner
• APIs
• Collections
• Datasets


Descriptive headings added (hidden from GUI with CSS)

Info about SSD service & link to accessibility page

Images used for style are in CSS, so no need to use alt tags

Skip navigation link

Access keys for navigating pages with keyboard

Added labels & descriptive titles to forms & ToC table


APIs

• Data API
  – Volume and rights information
  – Page images
  – OCR
• Bibliographic API
  – Volume and rights information
  – MARC records
• OAI
• “Hathifiles”
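As an illustration of the Bibliographic API, the sketch below builds a lookup URL and pulls volume IDs and rights codes out of a parsed response. The URL pattern (`/api/volumes/brief/<id type>/<id>.json`) and the `items`, `htid`, and `rightsCode` field names follow HathiTrust's Bib API documentation as we understand it; treat them as assumptions and check the current documentation before relying on them.

```python
import json
from urllib.request import urlopen

# Assumed base URL of the HathiTrust Bibliographic API; verify against current docs.
BIB_API_BASE = "https://catalog.hathitrust.org/api/volumes"


def bib_api_url(id_type: str, identifier: str, level: str = "brief") -> str:
    """Build a lookup URL, e.g. id_type="oclc" with an OCLC number."""
    if level not in ("brief", "full"):
        raise ValueError("level must be 'brief' or 'full'")
    return f"{BIB_API_BASE}/{level}/{id_type}/{identifier}.json"


def item_rights(response: dict) -> list[tuple[str, str]]:
    """Return (volume ID, rights code) pairs from a parsed API response."""
    return [(item["htid"], item["rightsCode"]) for item in response.get("items", [])]


def lookup(id_type: str, identifier: str) -> dict:
    """Fetch and parse one record (requires network access)."""
    with urlopen(bib_api_url(id_type, identifier)) as resp:
        return json.load(resp)
```

For example, `item_rights(lookup("oclc", some_oclc_number))` would list each digitized volume attached to that record together with its rights code. Keeping URL building and parsing separate from the network call makes both easy to test offline.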

Datasets

• Google-digitized
  – ~2.8 million texts
  – Requires proposal to HathiTrust
  – Agreement with Google
  – Statement on use/management
• Non-Google-digitized
  – ~370,000 texts
  – Freely available
  – Statement on management

Research Center

• Environment to perform research on HathiTrust corpus

• http://www.hathitrust.org/htrc

• http://lib.umich.edu/mpach
• Package of tools to enable publication of open access born-digital journal content directly into HathiTrust
  – Including accompanying data and media files
• Allows integration with popular journal publishing tools such as Open Journal Systems (OJS)

Source Archive

Editorial Market

Higher Education

Access Determinations

• Automated
• Manual

Automatic Rights Determination

• Conducted on all works at time of ingest and when records are modified
  – Public domain worldwide
    • US works published before 1923
    • US federal government publications
    • Non-US works published prior to 1873
  – Public domain in the United States
    • Non-US works published prior to 1923
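The automatic rules above amount to a small decision procedure over bibliographic data. A minimal sketch, assuming the catalog record supplies a publication year, a US/non-US flag, and a federal-document flag (the `Work` fields are hypothetical). The `pd` and `pdus` labels follow HathiTrust's rights-code naming; `None` stands for "no automatic determination", leaving the work for review. The cutoff years are the ones stated on the slide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Work:
    pub_year: int
    us_published: bool
    us_federal_gov_doc: bool = False  # hypothetical flag derived from the record


def automatic_rights(work: Work) -> Optional[str]:
    """Apply the slide's bibliographic rules to one work."""
    # Public domain worldwide
    if work.us_published and work.pub_year < 1923:
        return "pd"
    if work.us_federal_gov_doc:
        return "pd"
    if not work.us_published and work.pub_year < 1873:
        return "pd"
    # Public domain in the United States only
    if not work.us_published and work.pub_year < 1923:
        return "pdus"
    # No automatic determination; candidate for manual review
    return None
```

Because the rule is rerun whenever a record is modified, a corrected publication date or place can change the outcome without any manual step.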

Manual Rights Determination

• IMLS-funded CRMS project
  – CRMS-US
    • 2008: US-published works, 1923-1963
    • Staff at 4 partner institutions
  – CRMS-World
    • 2011: Expanded to non-US works
    • Staff at 16 partner institutions
  – Double review, with additional expert review for conflicts
  – Compliance with copyright formalities
  – As of January 2013: 241,541 reviewed; more than 132,644 opened
• Rights Holder Permissions
• System of Precedence
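The double-review step can be pictured as follows: two independent determinations are accepted when they agree, and conflicts go to an expert. A minimal sketch; the function name and the `expert` callback are hypothetical stand-ins, not the CRMS software itself.

```python
from typing import Callable


def resolve_double_review(first: str, second: str,
                          expert: Callable[[str, str], str]) -> str:
    """Return the agreed determination, or the expert's ruling on a conflict.

    Determinations are rights codes such as "pd" or "ic"; `expert` is a
    hypothetical callback standing in for the additional expert review.
    """
    if first == second:
        return first
    return expert(first, second)


def placeholder_expert(first: str, second: str) -> str:
    """Demo-only expert that marks every conflict as undetermined."""
    return "und"


if __name__ == "__main__":
    print(resolve_double_review("pd", "pd", placeholder_expert))  # reviewers agree
    print(resolve_double_review("pd", "ic", placeholder_expert))  # escalated
```

For scale, the January 2013 figures on the slide mean that at least 132,644 of 241,541 reviewed volumes, roughly 55%, were opened.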

Rights Database

Bibliographic (automatic)

Manual
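The slide names a system of precedence between bibliographic (automatic) and manual determinations but does not spell out its details. One way to picture it, assuming purely for illustration that a manual determination overrides an automatic one for the same volume:

```python
from typing import Optional

# Assumed ranking for illustration only: higher values take precedence.
SOURCE_PRECEDENCE = {"bibliographic": 1, "manual": 2}


def effective_rights(determinations: list[tuple[str, str]]) -> Optional[str]:
    """Pick the rights code from the highest-precedence source.

    `determinations` is a list of (source, rights_code) pairs for one volume,
    where source is a key of SOURCE_PRECEDENCE.
    """
    if not determinations:
        return None
    _source, code = max(determinations, key=lambda d: SOURCE_PRECEDENCE[d[0]])
    return code
```

Keeping every determination and resolving at read time, rather than overwriting, preserves the history of how a volume's status was reached.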

Lawful uses

• Users who have print disabilities
  – All in-copyright works in HathiTrust currently owned (or previously owned) by the partner institution
  – Must be authenticated
  – Must be on US soil
  – One simultaneous access per copy owned
  – http://www.hathitrust.org/accessibility

Lawful uses (2)

• Out of print and brittle or missing
  – Works must be currently owned (or previously owned) by the partner institution
  – Must be authenticated or accessing work from library premises
  – Must be on US soil
  – One simultaneous access per copy owned
  – http://www.hathitrust.org/out-of-print-brittle
• Access and use statements
  – http://www.hathitrust.org/access_use
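Both lawful-use services combine the same kinds of checks. A minimal sketch of the access decision, assuming the caller has already established that the work itself qualifies (in copyright for the print-disability service; out of print and brittle or missing for the second) and tracks how many sessions are open on the volume. The `Request` fields and policy names are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Request:
    authenticated: bool
    on_library_premises: bool
    in_us: bool
    institution_owns_copy: bool  # currently or previously owned


def may_access(req: Request, policy: str, copies_owned: int, active_sessions: int) -> bool:
    """Apply the slide's conditions for one of two hypothetical policy names."""
    if policy == "print_disability":
        identity_ok = req.authenticated
    elif policy == "out_of_print_brittle":
        # Authentication is not required when the user is on library premises.
        identity_ok = req.authenticated or req.on_library_premises
    else:
        raise ValueError(f"unknown policy: {policy}")
    return (
        identity_ok
        and req.in_us
        and req.institution_owns_copy
        and active_sessions < copies_owned  # one simultaneous access per copy owned
    )
```

Tying concurrency to `copies_owned` mirrors the print model: a library holding two copies can serve two readers at once, and no more.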

Outline

• The Big Idea
  – Mission and Goals
• What we’re doing to get there
  – Repository and Content
  – Making content available
  – Organizational structure
• How HathiTrust can change the way we work

e-Commerce

Print on Demand

Content Ingest

Transformation

Validation

Content Access

PageTurner

Collection Builder

Large-scale Search

Bibliographic Catalog

Research Center

APIs

Quality Assurance

Quality Review

Content Certification

User Services

Usability

User support (helpdesk)

Outreach

Project website

Monthly newsletter

Papers and presentations

Communication with potential partners

Surveys, general inquiries

Repository evaluation and audit (e.g., DRAMBORA, TRAC)

Legal

Risk management (use of materials)

Partner agreements

Advocacy

Governance

Budget, finances

Decision-making

Policy

Planning

Enterprise Management

Communication and coordination with partner institutions

Project management

Repository Administration

Hardware configuration and maintenance

Web and application server configuration and maintenance

Security

Permissions

Logging

Repository Administration

Data management (content storage, backup, integrity checks, deletion)

Page 40: HATHITRUST A Shared Digital Repository Your Library, Now Online! Putting HathiTrust in the Context of Traditional (and New) Library Services MCLS Webinar

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Descriptive headings added (hidden from GUI with CSS)

Info about SSD service amp link to accessibility page

Images used for style are in css so no need to use alt tags

Skip navigation link

Access keys for navigating pages with keyboard

Added labels amp descriptive titles to forms amp ToC table

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

APIs

bull Data APIndash Volume and rights informationndash Page imagesndash OCR

bull Bibliographic APIndash Volume and rights informationndash MARC records

bull OAIbull ldquoHathifilesrdquo

Datasets

bull Google-digitized ~28 million texts Requires proposal to HathiTrust Agreement with Google Statement on usemanagement

bull Non-Google-digitized ~370000 texts Freely available Statement on management

Research Center

bull Environment to perform research on HathiTrust corpus

bull httpwwwhathitrustorghtrc

bull httplibumichedumpachbull Package of tools to enable publication of open

access born-digital journal content directly into HathiTrustndash Including accompanying data and media files

bull Allows integration with popular journal publishing tools such as Open Journal Systems (OJS)

Source Archive

Editorial Market

Higher Education

Access Determinations

bull Automatedbull Manual

Automatic Rights Determination

bull Conducted on all works at time of ingest and when records are modifiedndash Public domain worldwide

bull US works published before 1923 US federal government publications non-US works published prior to 1873

ndash Public domain in the United Statesbull Non-US works published prior to 1923

Manual Rights Determination

bull IMLS-funded CRMS projectndash CRMS-US

bull 2008 US-published works 1923-1963bull Staff at 4 partner institutions

ndash CRMS-Worldbull 2011 Expanded to non-US worksbull Staff at 16 partner institutions

ndash Double review with additional expert review for conflictsndash Compliance with copyright formalitiesndash As of January 2013 241541 reviewed more than 132644

openedbull Rights Holder Permissions

bull System of Precedence

Rights Database

Bibliographic (automatic)

Manual
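The rights database receives determinations from more than one source (bibliographic/automatic and manual), so some "system of precedence" must pick the one that governs access. The sketch below assumes a hypothetical ranking: rights-holder permission over manual review over the automatic bibliographic rule. Ties within a source go to the most recent assertion. The slide does not specify the real ordering, and all names here are illustrative.

```python
from dataclasses import dataclass
from datetime import date
from typing import List

# Hypothetical precedence ranking (higher wins); the slide names a
# "System of Precedence" but this particular order is an assumption.
PRECEDENCE = {"bibliographic": 1, "manual": 2, "rightsholder": 3}


@dataclass(frozen=True)
class RightsAssertion:
    code: str        # e.g. "pd", "pdus", "ic"
    source: str      # "bibliographic" (automatic), "manual", or "rightsholder"
    asserted: date   # when the determination was recorded


def current_rights(assertions: List[RightsAssertion]) -> RightsAssertion:
    """Return the assertion that governs access for one volume.

    The highest-precedence source wins; among assertions from the same
    source, the most recent one wins.
    """
    if not assertions:
        raise ValueError("no rights assertions recorded for this volume")
    return max(assertions, key=lambda a: (PRECEDENCE[a.source], a.asserted))
```

Keeping every assertion and choosing at read time, rather than overwriting, preserves the history needed to audit how a volume's status changed.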

Lawful uses

• Users who have print disabilities
  – All in-copyright works in HathiTrust currently owned (or owned previously) by the partner institution
  – Must be authenticated
  – Must be on US soil
  – One simultaneous access per copy owned
  – http://www.hathitrust.org/accessibility
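The per-request conditions for print-disabled access are a conjunction: failing any one denies access. A minimal sketch follows, with hypothetical field names standing in for whatever a partner's authentication and geolocation systems actually supply. The one-simultaneous-access-per-copy limit is left out because it needs state shared across requests rather than a per-request check.

```python
from dataclasses import dataclass


@dataclass
class AccessRequest:
    # All fields are hypothetical inputs a partner's systems would supply.
    print_disabled: bool      # user certified by the partner institution
    authenticated: bool       # logged in through the partner institution
    in_us: bool               # request originates on US soil
    partner_holds_copy: bool  # partner currently owns, or previously owned, the work


def may_read_in_copyright(req: AccessRequest) -> bool:
    """All conditions on the slide must hold; failing any one denies access."""
    return (req.print_disabled and req.authenticated
            and req.in_us and req.partner_holds_copy)
```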

Lawful uses (2)

• Out of print and brittle, missing
  – Works must be currently owned (or owned previously) by the partner institution
  – Must be authenticated or accessing work from library premises
  – Must be on US soil
  – One simultaneous access per copy owned
  – http://www.hathitrust.org/out-of-print-brittle
• Access and use statements
  – http://www.hathitrust.org/access_use
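"One simultaneous access per copy owned" needs a counter shared across requests. Below is a minimal in-memory sketch: a lock-protected count of open sessions keyed by (institution, volume). Alongside it is the brittle-book eligibility rule, where being on library premises can substitute for authentication. The class, function, and identifiers are hypothetical. A production service would persist this state and expire abandoned sessions.

```python
import threading
from collections import defaultdict
from typing import Dict, Tuple

Key = Tuple[str, str]  # (institution, volume id)


class CopyLimiter:
    """At most one open session per print copy the institution owns."""

    def __init__(self, copies_owned: Dict[Key, int]):
        self._copies = dict(copies_owned)
        self._active: Dict[Key, int] = defaultdict(int)
        self._lock = threading.Lock()

    def acquire(self, institution: str, volume_id: str) -> bool:
        key = (institution, volume_id)
        with self._lock:
            if self._active[key] >= self._copies.get(key, 0):
                return False          # every owned copy is already in use
            self._active[key] += 1
            return True

    def release(self, institution: str, volume_id: str) -> None:
        key = (institution, volume_id)
        with self._lock:
            if self._active[key] > 0:
                self._active[key] -= 1


def brittle_access_allowed(owned: bool, in_us: bool,
                           authenticated: bool, on_premises: bool) -> bool:
    """Out-of-print/brittle rule: owned, in the US, and logged in or on site."""
    return owned and in_us and (authenticated or on_premises)
```

A volume the institution does not own has zero copies, so `acquire` refuses it without a separate ownership check.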

Outline

• The Big Idea
  – Mission and Goals
• What we're doing to get there
  – Repository and Content
  – Making content available
  – Organizational structure
• How HathiTrust can change the way we work

HathiTrust Functional Framework

e-Commerce: Print on Demand
Content Ingest: Transformation; Validation
Content Access: PageTurner; Collection Builder; Large-scale Search; Bibliographic Catalog; Research Center; APIs
Quality Assurance: Quality Review; Content Certification
User Services: Usability; User support (helpdesk)
Outreach: Project website; Monthly newsletter; Papers and presentations; Communication with potential partners; Surveys, general inquiries
Repository evaluation and audit (e.g., DRAMBORA, TRAC)
Legal: Risk management (use of materials); Partner agreements; Advocacy
Governance: Budget/Finances; Decision-making; Policy; Planning
Enterprise Management: Communication and coordination with partner institutions; Project management
Repository Administration: Hardware configuration and maintenance; Web and application server configuration and maintenance; Security; Permissions; Logging
Repository Administration: Data management (content storage, backup, integrity checks, deletion); Hardware selection and replacement; Content and metadata specifications; Disaster Recovery; Processes for ensuring content integrity
Rights Management: Copyright determination; Copyright review; Copyright information management (database); Rightsholder permissions
Bibliographic Data Management: Entity description (record-level); Object identification (item-level); Data availability
Collection Development:
• Digital
  – Expansion beyond books and journals (born-digital, images and maps, audio)
  – Selection of content (for non-Google volume ingest and pilot projects)
• Print
  – Cloud Library (effect of digital on print)
Financial contributions of partners

HathiTrust

Strategic Advisory Board: Budget/Finances; Decision-making; Guidance on Policy; Planning

• Driven by needs of institutions
• Leverage across the partnership
• Projects: Print on Demand, Grant Work, Ingest Specifications, PageTurner, Bibliographic Data Management

Executive Committee

Collective Work: Working Groups and Committees
• Operational: Communications, User Support, User Experience
• Strategic: Collections, Discovery Interface, Full-text Search

Distributed work

Constitutional Convention

• October 2011
• 52 partners
• 3-year review overseen by SAB
• Ballot proposals
  – Print monograph storage
  – Approval process for development initiatives
  – US Government Documents
  – Fee-for-service content deposit
  – Governance

HathiTrust

Executive Committee

Strategic Advisory Board: Budget/Finances; Decision-making; Guidance on Policy; Planning

• 12-member Board of Governors
• Chief Executive Officer
• Executive Committee

Governance

• Efficient, practical
• Inclusive, collective

Outline

• The Big Idea
  – Mission and Goals
• What we're doing to get there
  – Repository and Content
  – Making content available
  – Organizational structure
• How HathiTrust can change the way we work

How HathiTrust Can Change the Way We Work

Seeing collective problems as collective

[Figure, built up over several slides: "Breakdown of HathiTrust book corpus by publication date", then "Copyright status of books published pre-1923 and US works published 1923–1963". Segment values: 42%, 19%, 20%, 19%; the final build adds the label "In Print" (42%).]

Source: Bibliographic Indeterminacy and the Scale of Problems and Opportunities of Rights in Digital Collection Building – 2/2011

Relationships

• Identification
• Description
• Rights
• Relationships
  – Bibliographic records
  – Bib records and objects
  – Digital objects
  – Digital and print

Understanding the relationship between the collective and local

1st model: Price per GB

Total volumes and public domain volumes in HathiTrust:

Year          Total Volumes   Public Domain
2008              2,477,871         372,085
2009              5,221,092         758,947
2010              7,836,698       1,959,223
2011              9,966,572       2,712,626
2012 (Oct)       10,531,566       3,218,132
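The volume counts on this slide also show how the public domain share of the collection changed over time. A short calculation using the slide's figures follows; only the dictionary layout and function name are illustrative.

```python
# Volume counts as reported on the slide (the 2012 figure is as of October).
VOLUMES = {
    "2008": (2_477_871, 372_085),
    "2009": (5_221_092, 758_947),
    "2010": (7_836_698, 1_959_223),
    "2011": (9_966_572, 2_712_626),
    "2012 (Oct)": (10_531_566, 3_218_132),
}


def public_domain_share(total: int, public_domain: int) -> float:
    """Percentage of volumes that are public domain."""
    return 100.0 * public_domain / total


if __name__ == "__main__":
    for year, (total, pd) in VOLUMES.items():
        print(f"{year:<11} {public_domain_share(total, pd):5.1f}% public domain")
```

By these figures, the public domain share roughly doubled, from about 15% in 2008 to about 31% in October 2012. Total volumes grew more than fourfold over the same period.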

[Figure: % of titles in local collection vs. rank in 2008 ARL Investment Index]

A global change in the library environment

• June 2010: Median duplication 31%
• June 2009: Median duplication 19%

Academic print book collection already substantially duplicated in mass digitized book corpus

Courtesy of Constance Malpas, OCLC Research
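"Median duplication" is a per-library statistic. For each library, compute the share of its print titles that also appear in the digitized corpus, then take the median across libraries. A sketch follows with made-up data, assuming titles are matched on OCLC numbers. That matching is a simplification; the OCLC Research analysis may have matched titles differently. The function names are illustrative.

```python
from statistics import median
from typing import Dict, Set


def duplication_rate(local_titles: Set[int], digitized_titles: Set[int]) -> float:
    """Fraction of a library's print titles that also appear in the digitized corpus."""
    if not local_titles:
        return 0.0
    return len(local_titles & digitized_titles) / len(local_titles)


def median_duplication(collections: Dict[str, Set[int]],
                       digitized_titles: Set[int]) -> float:
    """Median of the per-library duplication rates."""
    rates = [duplication_rate(t, digitized_titles) for t in collections.values()]
    return median(rates)
```

Using the median rather than the mean keeps a few very large or very small collections from dominating the summary figure.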

Digitized Books in Shared Repositories

[Chart, Sep 2009 – Jun 2010: unique titles among mass digitized books in the Hathi digital repository and mass digitized books in shared print repositories]

• ~75% of mass digitized corpus is 'backed up' in one or more shared print repositories
• ~3.5M titles
• ~2.5M

Courtesy of Constance Malpas, OCLC Research

Collection Overlap

• More than 50% median overlap with ARL institutions; higher for small liberal arts colleges
• New pricing model based on print holdings
  – http://www.hathitrust.org/cost
  – Requires print holdings database
  – Also supports expansion of legal uses and efforts in de-duplication
  – Facilitates individual and collaborative collection development and management operations
• Print monographs archiving
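A pricing model based on print holdings needs a table mapping each volume to the partners that hold it in print. The sketch below shows one hypothetical way such a table could drive a cost split: each volume's cost is divided equally among its holders. The actual HathiTrust formula is described at http://www.hathitrust.org/cost and may differ. The `cost_per_volume` parameter, function name, and data are illustrative only.

```python
from collections import defaultdict
from typing import Dict, Set


def allocate_costs(holdings: Dict[str, Set[str]],
                   cost_per_volume: float) -> Dict[str, float]:
    """Split each volume's cost equally among partners that hold it in print.

    holdings maps a volume identifier to the set of partner institutions
    whose print holdings match it; volumes with no holders are skipped.
    """
    shares: Dict[str, float] = defaultdict(float)
    for holders in holdings.values():
        if not holders:
            continue
        part = cost_per_volume / len(holders)
        for institution in holders:
            shares[institution] += part
    return dict(shares)
```

Under this kind of scheme, a widely held volume costs each holder little and a uniquely held one costs its holder the full amount. That is one reason an accurate holdings database matters.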

Sourcing and Scaling: http://orweblog.oclc.org/archives/002058.html

• Scale
  – Institution-scale
  – Group-scale
  – Web-scale
• Sourcing
  – Institutional
  – Collaborative
  – Third-party

A new kind of library

Thank you

How to find out more

• About: http://www.hathitrust.org/about
• Twitter: http://twitter.com/hathitrust
• Facebook: http://www.facebook.com/hathitrust
• Monthly newsletter
  – http://www.hathitrust.org/updates
  – RSS: http://www.hathitrust.org/updates_rss
• Contact us: feedback@issues.hathitrust.org
• Blogs: http://www.hathitrust.org/blogs
  – Large-scale Search
  – Perspectives from HathiTrust

Page 41: HATHITRUST A Shared Digital Repository Your Library, Now Online! Putting HathiTrust in the Context of Traditional (and New) Library Services MCLS Webinar

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Descriptive headings added (hidden from GUI with CSS)

Info about SSD service amp link to accessibility page

Images used for style are in css so no need to use alt tags

Skip navigation link

Access keys for navigating pages with keyboard

Added labels amp descriptive titles to forms amp ToC table

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

APIs

bull Data APIndash Volume and rights informationndash Page imagesndash OCR

bull Bibliographic APIndash Volume and rights informationndash MARC records

bull OAIbull ldquoHathifilesrdquo

Datasets

bull Google-digitized ~28 million texts Requires proposal to HathiTrust Agreement with Google Statement on usemanagement

bull Non-Google-digitized ~370000 texts Freely available Statement on management

Research Center

bull Environment to perform research on HathiTrust corpus

bull httpwwwhathitrustorghtrc

bull httplibumichedumpachbull Package of tools to enable publication of open

access born-digital journal content directly into HathiTrustndash Including accompanying data and media files

bull Allows integration with popular journal publishing tools such as Open Journal Systems (OJS)

Source Archive

Editorial Market

Higher Education

Access Determinations

bull Automatedbull Manual

Automatic Rights Determination

bull Conducted on all works at time of ingest and when records are modifiedndash Public domain worldwide

bull US works published before 1923 US federal government publications non-US works published prior to 1873

ndash Public domain in the United Statesbull Non-US works published prior to 1923

Manual Rights Determination

bull IMLS-funded CRMS projectndash CRMS-US

bull 2008 US-published works 1923-1963bull Staff at 4 partner institutions

ndash CRMS-Worldbull 2011 Expanded to non-US worksbull Staff at 16 partner institutions

ndash Double review with additional expert review for conflictsndash Compliance with copyright formalitiesndash As of January 2013 241541 reviewed more than 132644

openedbull Rights Holder Permissions

bull System of Precedence

Rights Database

Bibliographic (automatic)

Manual

Lawful uses

bull Users who have print disabilitiesndash All in-copyright works in HathiTrust currently

owned (or owned previously) by the partner institution

ndash Must be authenticatedndash Must be on US soilndash One simultaneous access per copy ownedndash httpwwwhathitrustorgaccessibility

Lawful uses (2)

bull Out of print and brittle missingndash Works must be currently owned (or owned

previously) by the partner institutionndash Must be authenticated or accessing work from

library premisesndash Must be on US soilndash One simultaneous access per copy ownedndash httpwwwhathitrustorgout-of-print-brittle

bull Access and use statementsndash httpwwwhathitrustorgaccess_use

Outline

bull The Big Idea ndash Mission and Goals

bull What wersquore doing to get there ndash Repository and Content ndash Making content available ndash Organizational structure

bull How HathiTrust can change the way we work

e-Commerce

Print on Demand

Content Ingest

Transformation

Validation

Content Access

PageTurner

Collection Builder

Large-scale Search

Bibliographic Catalog

Research Center

APIs

Quality Assurance

Quality Review

Content Certification

User Services

Usability

User support (helpdesk)

Outreach

Project website

Monthly newsletter

Papers and presentations

Communication with potential

partners

Surveys general inquiries

Repository evaluation and

audit (eg DRAMBORA

TRAC)

Legal

Risk management (use of materials)

Partner agreements

Advocacy

Governance

Budget Finances

Decision-making

Policy

Planning

Enterprise Management

Communication and Coordination

with partner institutions

Project management

Repository Administration

Hardware configuration and

maintenance

Web and application server configuration and

maintenance

Security

Permissions

Logging

Repository Administration

Data management (content storage backup integrity checks deletion)

Hardware selection and replacement

Content and Metadata

specifications

Disaster Recovery

Processes for ensuring content

integrity

Rights Management

Copyright determination

Copyright review

Copyright information

management (database)

Rightsholder permissions

Bibliographic Data

Management

Entity description (record-level)

Object identification (item-

level)

Data availability

Collection Development

Digitalbull Expansion beyond

books and journals (born-digital images and maps audio)

bull Selection of content (for non-Google volume ingest and pilots projects)

Printbull Cloud Library (effect

of digital on print)

Financial contributions of partners

HathiTrust Functional Framework

HathiTrust

Strategic Advisory BoardBudgetFinances Decision-making

Guidance on Policy Planning

bull Driven by needs of institutionsbull Leverage across the partnershipbull Projects Print on Demand Grant Work Ingest Specifications

PageTurner Bibliographic Data Management

Executive Committee

Collective Work Working Groups and Committees

Operationalbull Communicationsbull User Supportbull User Experience

Operationalbull Communicationsbull User Supportbull User Experience

Strategicbull Collectionsbull Discovery Interfacebull Full-text Search

Distributed work

Constitutional Convention

bull October 2011bull 52 partnersbull 3-year review overseen by SABbull Ballot Proposals

ndash Print monograph storagendash Approval Process for development initiativesndash US Government Documentsndash Fee-for-service content depositndash Governance

HathiTrust

Executive Committee

Strategic Advisory

Board

BudgetFinancesDecision-making

Guidance on Policy Planning

bull 12-member Board of Governors

bull Chief Executive Officer bull Executive Committee

Governance

bull Efficient practicalbull Inclusive collective

Outline

bull The Big Idea ndash Mission and Goals

bull What wersquore doing to get there ndash Repository and Content ndash Making content available ndash Organizational structure

bull How HathiTrust can change the way we work

How HathiTrust Can Change the Way We Work

Seeing collective problems as collective

Breakdown of HathiTrust book corpus by publication date

Bibliographic Indeterminacy and the Scale of Problems and Opportunities of Rights in Digital Collection Building ndash 22011

42

19

20

19

Breakdown of HathiTrust book corpus by publication date

42

19

20

19

Copyright status of books published pre-1923 and US works published 1923-1963

42

19

20

Copyright status of books published pre-1923 and US works published 1923-1963

42

19

20

19

Copyright status of books published pre-1923 and US works published 1923-1963

In Print 42

19

20

19

bull Identificationbull Descriptionbull Rights

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic records

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic recordsndash Bib records and objects

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic recordsndash Bib records and objectsndash Digital objects

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic recordsndash Bib records and objectsndash Digital objectsndash Digital and print

Relationships

Understanding the relationship between the collective and local

1st model Price per GB

2008 2009 2010 2011 2012 (Oct)Total Volumes 2477871 5221092 7836698 9966572 10531566

Public Domain 372085 758947 1959223 2712626 3218132

2008 2009 2010 2011 2012 (Oct)0

2000000

4000000

6000000

8000000

10000000

12000000

Total VolumesPublic Domain

0 20 40 60 80 100 1200

10

20

30

40

50

60

Rank in 2008 ARL Investment Index

o

f Tit

les

in L

ocal

Col

lecti

on

A global change in the library environment

June 2010Median duplication 31

June 2009Median duplication 19

Academic print book collection already substantially duplicated in mass digitized book corpus

Courtesy of Constance Malpas OCLC Research

Digitized Books in Shared Repositories

Sep-09 Oct-09 Nov-09 Dec-09 Jan-10 Feb-10 Mar-10 Apr-10 May-10 Jun-100

500000

1000000

1500000

2000000

2500000

3000000

3500000

Mass digitized books in Hathi digital repository Mass digitized books in shared print repositories

Uni

que

Titl

es

~75 of mass digitized corpus is lsquobacked uprsquo in one or more shared print repositories

~35M titles

~25M

Courtesy of Constance Malpas OCLC Research

Collection Overlap

bull More than 50 median overlap with ARL institutions higher for small liberal arts colleges

bull New Pricing model based on Print holdingsndash httpwwwhathitrustorgcostndash Requires print holdings databasendash Also support expansion of legal uses efforts in de-

duplicationndash Facilitate individual and collaborative collection

development and management operationsbull Print monographs archiving

Sourcing and Scalinghttporweblogoclcorgarchives002058html

bull Scalendash Institution-scalendash Group-scalendash Web-scale

bull Sourcingndash Institutionalndash Collaborativendash Third-party

A new kind of library

Thank you

How to find out more

bull About httpwwwhathitrustorgaboutbull Twitter httptwittercomhathitrustbull Facebook httpwwwfacebookcomhathitrustbull Monthly newsletter

ndash httpwwwhathitrustorgupdatesndash RSS httpwwwhathitrustorgupdates_rss

bull Contact us feedbackissueshathitrustorgbull Blogs httpwwwhathitrustorgblogs

ndash Large-scale Searchndash Perspectives from HathiTrust

  • Your Library Now Online Putting HathiTrust in the Context of
  • Outline
  • The Big Idea
  • Partnership
  • Digital Repository
  • The Name
  • Mission
  • HathiTrust
  • Collections and Collaboration
  • What we are doing to get there
  • Slide 11
  • Slide 12
  • Repository and Content
  • Content Sources
  • Language Distribution (1)
  • Language Distribution (2)
  • Dates
  • Copyright Distribution
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Making Content Available
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • APIs
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Datasets
  • Research Center
  • Slide 54
  • Slide 55
  • Access Determinations
  • Automatic Rights Determination
  • Manual Rights Determination
  • Rights Database
  • Lawful uses
  • Lawful uses (2)
  • Outline (2)
  • Slide 63
  • Slide 64
  • Constitutional Convention
  • Slide 66
  • Governance
  • Outline (3)
  • How HathiTrust Can Change the Way We Work
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Relationships
  • Relationships (2)
  • Relationships (3)
  • Relationships (4)
  • Relationships (5)
  • Slide 81
  • Slide 82
  • Slide 83
  • A global change in the library environment
  • Digitized Books in Shared Repositories
  • Collection Overlap
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Thank you
  • How to find out more
Page 42: HATHITRUST A Shared Digital Repository Your Library, Now Online! Putting HathiTrust in the Context of Traditional (and New) Library Services MCLS Webinar

Descriptive headings added (hidden from GUI with CSS)

Info about SSD service amp link to accessibility page

Images used for style are in css so no need to use alt tags

Skip navigation link

Access keys for navigating pages with keyboard

Added labels amp descriptive titles to forms amp ToC table

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

APIs

bull Data APIndash Volume and rights informationndash Page imagesndash OCR

bull Bibliographic APIndash Volume and rights informationndash MARC records

bull OAIbull ldquoHathifilesrdquo

Datasets

bull Google-digitized ~28 million texts Requires proposal to HathiTrust Agreement with Google Statement on usemanagement

bull Non-Google-digitized ~370000 texts Freely available Statement on management

Research Center

bull Environment to perform research on HathiTrust corpus

bull httpwwwhathitrustorghtrc

bull httplibumichedumpachbull Package of tools to enable publication of open

access born-digital journal content directly into HathiTrustndash Including accompanying data and media files

bull Allows integration with popular journal publishing tools such as Open Journal Systems (OJS)

Source Archive

Editorial Market

Higher Education

Access Determinations

bull Automatedbull Manual

Automatic Rights Determination

bull Conducted on all works at time of ingest and when records are modifiedndash Public domain worldwide

bull US works published before 1923 US federal government publications non-US works published prior to 1873

ndash Public domain in the United Statesbull Non-US works published prior to 1923

Manual Rights Determination

bull IMLS-funded CRMS projectndash CRMS-US

bull 2008 US-published works 1923-1963bull Staff at 4 partner institutions

ndash CRMS-Worldbull 2011 Expanded to non-US worksbull Staff at 16 partner institutions

ndash Double review with additional expert review for conflictsndash Compliance with copyright formalitiesndash As of January 2013 241541 reviewed more than 132644

openedbull Rights Holder Permissions

bull System of Precedence

Rights Database

Bibliographic (automatic)

Manual

Lawful uses

bull Users who have print disabilitiesndash All in-copyright works in HathiTrust currently

owned (or owned previously) by the partner institution

ndash Must be authenticatedndash Must be on US soilndash One simultaneous access per copy ownedndash httpwwwhathitrustorgaccessibility

Lawful uses (2)

bull Out of print and brittle missingndash Works must be currently owned (or owned

previously) by the partner institutionndash Must be authenticated or accessing work from

library premisesndash Must be on US soilndash One simultaneous access per copy ownedndash httpwwwhathitrustorgout-of-print-brittle

bull Access and use statementsndash httpwwwhathitrustorgaccess_use

Outline

bull The Big Idea ndash Mission and Goals

bull What wersquore doing to get there ndash Repository and Content ndash Making content available ndash Organizational structure

bull How HathiTrust can change the way we work

e-Commerce

Print on Demand

Content Ingest

Transformation

Validation

Content Access

PageTurner

Collection Builder

Large-scale Search

Bibliographic Catalog

Research Center

APIs

Quality Assurance

Quality Review

Content Certification

User Services

Usability

User support (helpdesk)

Outreach

Project website

Monthly newsletter

Papers and presentations

Communication with potential

partners

Surveys general inquiries

Repository evaluation and

audit (eg DRAMBORA

TRAC)

Legal

Risk management (use of materials)

Partner agreements

Advocacy

Governance

Budget Finances

Decision-making

Policy

Planning

Enterprise Management

Communication and Coordination

with partner institutions

Project management

Repository Administration

Hardware configuration and

maintenance

Web and application server configuration and

maintenance

Security

Permissions

Logging

Repository Administration

Data management (content storage backup integrity checks deletion)

Hardware selection and replacement

Content and Metadata

specifications

Disaster Recovery

Processes for ensuring content

integrity

Rights Management

Copyright determination

Copyright review

Copyright information

management (database)

Rightsholder permissions

Bibliographic Data

Management

Entity description (record-level)

Object identification (item-

level)

Data availability

Collection Development

Digitalbull Expansion beyond

books and journals (born-digital images and maps audio)

bull Selection of content (for non-Google volume ingest and pilots projects)

Printbull Cloud Library (effect

of digital on print)

Financial contributions of partners

HathiTrust Functional Framework

HathiTrust

Strategic Advisory BoardBudgetFinances Decision-making

Guidance on Policy Planning

bull Driven by needs of institutionsbull Leverage across the partnershipbull Projects Print on Demand Grant Work Ingest Specifications

PageTurner Bibliographic Data Management

Executive Committee

Collective Work Working Groups and Committees

Operationalbull Communicationsbull User Supportbull User Experience

Operationalbull Communicationsbull User Supportbull User Experience

Strategicbull Collectionsbull Discovery Interfacebull Full-text Search

Distributed work

Constitutional Convention

bull October 2011bull 52 partnersbull 3-year review overseen by SABbull Ballot Proposals

ndash Print monograph storagendash Approval Process for development initiativesndash US Government Documentsndash Fee-for-service content depositndash Governance

HathiTrust

Executive Committee

Strategic Advisory

Board

BudgetFinancesDecision-making

Guidance on Policy Planning

bull 12-member Board of Governors

bull Chief Executive Officer bull Executive Committee

Governance

bull Efficient practicalbull Inclusive collective

Outline

bull The Big Idea ndash Mission and Goals

bull What wersquore doing to get there ndash Repository and Content ndash Making content available ndash Organizational structure

bull How HathiTrust can change the way we work

How HathiTrust Can Change the Way We Work

Seeing collective problems as collective

Breakdown of HathiTrust book corpus by publication date

Bibliographic Indeterminacy and the Scale of Problems and Opportunities of Rights in Digital Collection Building ndash 22011

42

19

20

19

Breakdown of HathiTrust book corpus by publication date

42

19

20

19

Copyright status of books published pre-1923 and US works published 1923-1963

42

19

20

Copyright status of books published pre-1923 and US works published 1923-1963

42

19

20

19

Copyright status of books published pre-1923 and US works published 1923-1963

In Print 42

19

20

19

bull Identificationbull Descriptionbull Rights

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic records

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic recordsndash Bib records and objects

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic recordsndash Bib records and objectsndash Digital objects

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic recordsndash Bib records and objectsndash Digital objectsndash Digital and print

Relationships

Understanding the relationship between the collective and local

1st model Price per GB

2008 2009 2010 2011 2012 (Oct)Total Volumes 2477871 5221092 7836698 9966572 10531566

Public Domain 372085 758947 1959223 2712626 3218132

2008 2009 2010 2011 2012 (Oct)0

2000000

4000000

6000000

8000000

10000000

12000000

Total VolumesPublic Domain

0 20 40 60 80 100 1200

10

20

30

40

50

60

Rank in 2008 ARL Investment Index

o

f Tit

les

in L

ocal

Col

lecti

on

A global change in the library environment

June 2010Median duplication 31

June 2009Median duplication 19

Academic print book collection already substantially duplicated in mass digitized book corpus

Courtesy of Constance Malpas OCLC Research

Digitized Books in Shared Repositories

Sep-09 Oct-09 Nov-09 Dec-09 Jan-10 Feb-10 Mar-10 Apr-10 May-10 Jun-100

500000

1000000

1500000

2000000

2500000

3000000

3500000

Mass digitized books in Hathi digital repository Mass digitized books in shared print repositories

Uni

que

Titl

es

~75 of mass digitized corpus is lsquobacked uprsquo in one or more shared print repositories

~35M titles

~25M

Courtesy of Constance Malpas OCLC Research

Collection Overlap

bull More than 50 median overlap with ARL institutions higher for small liberal arts colleges

bull New Pricing model based on Print holdingsndash httpwwwhathitrustorgcostndash Requires print holdings databasendash Also support expansion of legal uses efforts in de-

duplicationndash Facilitate individual and collaborative collection

development and management operationsbull Print monographs archiving

Sourcing and Scalinghttporweblogoclcorgarchives002058html

bull Scalendash Institution-scalendash Group-scalendash Web-scale

bull Sourcingndash Institutionalndash Collaborativendash Third-party

A new kind of library

Thank you

How to find out more

bull About httpwwwhathitrustorgaboutbull Twitter httptwittercomhathitrustbull Facebook httpwwwfacebookcomhathitrustbull Monthly newsletter

ndash httpwwwhathitrustorgupdatesndash RSS httpwwwhathitrustorgupdates_rss

bull Contact us feedbackissueshathitrustorgbull Blogs httpwwwhathitrustorgblogs

ndash Large-scale Searchndash Perspectives from HathiTrust


Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

APIs

• Data API
  – Volume and rights information
  – Page images
  – OCR
• Bibliographic API
  – Volume and rights information
  – MARC records
• OAI
• "Hathifiles"
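As a rough illustration of the Bibliographic API, the sketch below builds a lookup URL and extracts per-volume rights from a response that has already been parsed. The URL layout (`/api/volumes/brief|full/<id type>/<id>.json`) and the `items`, `htid` and `usRightsString` field names are assumptions based on HathiTrust's public API documentation, not details taken from this slide. The sample response and volume IDs are invented, and the code makes no network request.

```python
from urllib.parse import quote

# Assumed base URL and path layout; check the current API documentation before use.
BIB_API_BASE = "https://catalog.hathitrust.org/api/volumes"

def bib_api_url(id_type: str, value: str, level: str = "brief") -> str:
    """Build a single-identifier lookup URL, e.g. .../brief/oclc/424023.json."""
    if level not in ("brief", "full"):  # "full" is assumed to add MARC records
        raise ValueError("level must be 'brief' or 'full'")
    return f"{BIB_API_BASE}/{level}/{quote(id_type)}/{quote(value)}.json"

def volume_rights(response: dict) -> list[tuple[str, str]]:
    """Return (volume id, rights string) pairs from a parsed JSON response.

    The key names are assumptions about the response shape.
    """
    return [(item["htid"], item.get("usRightsString", "unknown"))
            for item in response.get("items", [])]

# A made-up response fragment standing in for a real HTTP call.
sample = {"records": {}, "items": [
    {"htid": "example.0001", "usRightsString": "Full view"},
    {"htid": "example.0002"},
]}
print(bib_api_url("oclc", "424023"))
print(volume_rights(sample))
```

Keeping URL construction separate from response parsing lets the parser be tested against saved responses without touching the network.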

Datasets

• Google-digitized: ~2.8 million texts; requires a proposal to HathiTrust, an agreement with Google, and a statement on use/management

• Non-Google-digitized: ~370,000 texts; freely available; statement on management

Research Center

• Environment to perform research on the HathiTrust corpus
• http://www.hathitrust.org/htrc

• http://lib.umich.edu/mpach
• Package of tools to enable publication of open access, born-digital journal content directly into HathiTrust
  – Including accompanying data and media files
• Allows integration with popular journal publishing tools such as Open Journal Systems (OJS)

Source Archive

Editorial Market

Higher Education

Access Determinations

• Automated
• Manual

Automatic Rights Determination

• Conducted on all works at time of ingest and when records are modified
  – Public domain worldwide
    • US works published before 1923
    • US federal government publications
    • Non-US works published prior to 1873
  – Public domain in the United States
    • Non-US works published prior to 1923
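The date rules above amount to a small decision procedure. Here is a minimal sketch of it. The returned labels ("pd", "pdus", "ic", "und") are HathiTrust-style rights codes used here as assumed labels, and the treatment of a missing date is an assumption too. The cutoff years are parameters because the slide reflects the rules at the time it was written.

```python
from typing import Optional

def automatic_rights(pub_year: Optional[int], us_work: bool,
                     us_federal_doc: bool = False,
                     us_cutoff: int = 1923,
                     non_us_world_cutoff: int = 1873) -> str:
    """Apply the slide's date-based rules to one work.

    Returns 'pd' (public domain worldwide), 'pdus' (public domain in the US only),
    'und' (no usable date; assumed fallback) or 'ic' (presumed in copyright,
    left for manual review).
    """
    if us_work and us_federal_doc:
        return "pd"          # US federal government publication
    if pub_year is None:
        return "und"         # no date, so no date rule can apply
    if us_work:
        return "pd" if pub_year < us_cutoff else "ic"
    if pub_year < non_us_world_cutoff:
        return "pd"          # non-US work, old enough to be PD worldwide
    if pub_year < us_cutoff:
        return "pdus"        # non-US work, PD in the United States only
    return "ic"
```

Works that come out as "ic" are the ones the manual review process described next is meant to examine.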

Manual Rights Determination

• IMLS-funded CRMS project
  – CRMS-US
    • 2008: US-published works, 1923-1963
    • Staff at 4 partner institutions
  – CRMS-World
    • 2011: Expanded to non-US works
    • Staff at 16 partner institutions
  – Double review, with additional expert review for conflicts
  – Compliance with copyright formalities
  – As of January 2013: 241,541 reviewed; more than 132,644 opened
• Rights Holder Permissions
• System of Precedence
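The double-review step can be sketched as follows. The sketch assumes that each review yields a single rights code string and models the expert as a callback. The real CRMS workflow records much more, such as reasons, dates and reviewer identities. The sketch also works out the share implied by the January 2013 figures, which is a lower bound because the slide says "more than" 132,644.

```python
from typing import Callable, Optional

def resolve_review(first: str, second: str,
                   expert: Optional[Callable[[str, str], str]] = None
                   ) -> tuple[Optional[str], str]:
    """Combine two independent reviewers' rights determinations.

    Agreement is accepted as-is. A conflict goes to an expert reviewer,
    or is flagged as pending when no expert decision is available yet.
    """
    if first == second:
        return first, "consensus"
    if expert is None:
        return None, "needs expert review"
    return expert(first, second), "expert"

def share_opened(reviewed: int, opened: int) -> float:
    """Fraction of reviewed volumes whose determination opened them."""
    return opened / reviewed

print(f"{share_opened(241_541, 132_644):.1%}")  # prints 54.9%
```

So, at the January 2013 figures, a bit more than half of the reviewed volumes were opened.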

Rights Database

Bibliographic (automatic)

Manual
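The Rights Database slide shows determinations coming from bibliographic (automatic) rules and from manual review, and the previous slide lists rights holder permissions and a system of precedence. The sketch below shows one way such a precedence system could choose among competing determinations. The source names, their ranking and the tie-break on recency are all hypothetical, because the slide does not spell out the order.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical ranking: a higher number wins over a lower one.
SOURCE_PRECEDENCE = {"bibliographic": 1, "manual": 2, "rightsholder": 3}

@dataclass(frozen=True)
class Determination:
    code: str       # e.g. "pd", "ic"
    source: str     # key into SOURCE_PRECEDENCE
    made_on: date

def effective_rights(determinations: list[Determination]) -> str:
    """Pick the winner: highest-precedence source first, then the most recent."""
    if not determinations:
        raise ValueError("no determinations recorded")
    winner = max(determinations,
                 key=lambda d: (SOURCE_PRECEDENCE[d.source], d.made_on))
    return winner.code

history = [
    Determination("ic", "bibliographic", date(2009, 1, 5)),
    Determination("pd", "manual", date(2011, 3, 2)),
]
print(effective_rights(history))  # prints pd
```

Keeping every determination and computing the effective one on demand preserves the audit trail, rather than overwriting earlier decisions.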

Lawful uses

• Users who have print disabilities
  – All in-copyright works in HathiTrust currently owned (or owned previously) by the partner institution
  – Must be authenticated
  – Must be on US soil
  – One simultaneous access per copy owned
  – http://www.hathitrust.org/accessibility
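The "one simultaneous access per copy owned" rule behaves like a counter capped by the institution's holdings. Here is a minimal in-memory sketch of that cap. The volume IDs and copy counts are hypothetical, and a production service would also need session expiry and shared storage across servers.

```python
import threading

class CopyLimitedAccess:
    """Allow at most as many simultaneous sessions on a volume as copies owned."""

    def __init__(self, copies_owned: dict[str, int]):
        self._copies = dict(copies_owned)
        self._active: dict[str, int] = {}
        self._lock = threading.Lock()   # guards the counters across threads

    def try_open(self, volume_id: str) -> bool:
        with self._lock:
            in_use = self._active.get(volume_id, 0)
            if in_use >= self._copies.get(volume_id, 0):
                return False            # every owned copy is already in use
            self._active[volume_id] = in_use + 1
            return True

    def close(self, volume_id: str) -> None:
        with self._lock:
            if self._active.get(volume_id, 0) > 0:
                self._active[volume_id] -= 1

access = CopyLimitedAccess({"vol-1": 2})  # institution owns two print copies
print(access.try_open("vol-1"), access.try_open("vol-1"), access.try_open("vol-1"))
```

The third call fails until one of the first two sessions is closed.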

Lawful uses (2)

• Out of print and brittle, or missing
  – Works must be currently owned (or owned previously) by the partner institution
  – Must be authenticated or accessing the work from library premises
  – Must be on US soil
  – One simultaneous access per copy owned
  – http://www.hathitrust.org/out-of-print-brittle
• Access and use statements
  – http://www.hathitrust.org/access_use
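The two lawful-use slides share most of their conditions and differ mainly in how the user may connect. Below is a minimal eligibility check covering both. The use labels and field names are invented for this sketch. It leaves out steps the slides do not describe, such as how an institution verifies a print disability or determines a work's condition.

```python
from dataclasses import dataclass

@dataclass
class AccessRequest:
    authenticated: bool
    on_library_premises: bool
    in_us: bool
    institution_owns_or_owned: bool              # current or previous ownership
    out_of_print_brittle_or_missing: bool = False

def eligible(use: str, req: AccessRequest) -> bool:
    """Check the conditions listed on the two 'Lawful uses' slides.

    use: 'print_disability' or 'op_brittle' (labels invented for this sketch).
    """
    if not (req.in_us and req.institution_owns_or_owned):
        return False
    if use == "print_disability":
        return req.authenticated
    if use == "op_brittle":
        return req.out_of_print_brittle_or_missing and (
            req.authenticated or req.on_library_premises)
    raise ValueError(f"unknown use: {use}")

walk_in = AccessRequest(authenticated=False, on_library_premises=True, in_us=True,
                        institution_owns_or_owned=True,
                        out_of_print_brittle_or_missing=True)
print(eligible("op_brittle", walk_in), eligible("print_disability", walk_in))
```

An unauthenticated walk-in user on library premises qualifies under the out-of-print/brittle use but not under the print-disability use. In either case, the per-copy concurrency cap still applies on top of this check.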

Outline

• The Big Idea
  – Mission and Goals
• What we're doing to get there
  – Repository and Content
  – Making content available
  – Organizational structure
• How HathiTrust can change the way we work

HathiTrust Functional Framework

e-Commerce
• Print on Demand

Content Ingest
• Transformation
• Validation

Content Access
• PageTurner
• Collection Builder
• Large-scale Search
• Bibliographic Catalog
• Research Center
• APIs

Quality Assurance
• Quality Review
• Content Certification

User Services
• Usability
• User support (helpdesk)

Outreach
• Project website
• Monthly newsletter
• Papers and presentations
• Communication with potential partners
• Surveys, general inquiries

Repository evaluation and audit (e.g., DRAMBORA, TRAC)

Legal
• Risk management (use of materials)
• Partner agreements
• Advocacy

Governance
• Budget, Finances
• Decision-making
• Policy
• Planning

Enterprise Management
• Communication and Coordination with partner institutions
• Project management

Repository Administration
• Hardware configuration and maintenance
• Web and application server configuration and maintenance
• Security
• Permissions
• Logging

Repository Administration
• Data management (content storage, backup, integrity checks, deletion)
• Hardware selection and replacement
• Content and Metadata specifications
• Disaster Recovery
• Processes for ensuring content integrity

Rights Management
• Copyright determination
• Copyright review
• Copyright information management (database)
• Rightsholder permissions

Bibliographic Data Management
• Entity description (record-level)
• Object identification (item-level)
• Data availability

Collection Development
• Digital
  – Expansion beyond books and journals (born-digital, images and maps, audio)
  – Selection of content (for non-Google volume ingest and pilot projects)
• Print
  – Cloud Library (effect of digital on print)

Financial contributions of partners

HathiTrust

Strategic Advisory Board: Budget, Finances, Decision-making; Guidance on Policy, Planning

• Driven by needs of institutions
• Leverage across the partnership
• Projects: Print on Demand, Grant Work, Ingest Specifications, PageTurner, Bibliographic Data Management

Executive Committee

Collective Work: Working Groups and Committees

• Operational: Communications, User Support, User Experience
• Strategic: Collections, Discovery Interface, Full-text Search

Distributed work

Constitutional Convention

• October 2011
• 52 partners
• 3-year review overseen by SAB
• Ballot Proposals
  – Print monograph storage
  – Approval Process for development initiatives
  – US Government Documents
  – Fee-for-service content deposit
  – Governance

HathiTrust

Executive Committee

Strategic Advisory Board: Budget, Finances, Decision-making; Guidance on Policy, Planning

• 12-member Board of Governors
• Chief Executive Officer
• Executive Committee

Governance

• Efficient, practical
• Inclusive, collective

Outline

• The Big Idea
  – Mission and Goals
• What we're doing to get there
  – Repository and Content
  – Making content available
  – Organizational structure
• How HathiTrust can change the way we work

How HathiTrust Can Change the Way We Work

Seeing collective problems as collective

Breakdown of HathiTrust book corpus by publication date (chart; segment values 42%, 19%, 20%, 19%)

Source: Bibliographic Indeterminacy and the Scale of Problems and Opportunities of Rights in Digital Collection Building, 2/2011

Copyright status of books published pre-1923 and US works published 1923-1963 (chart; segment values 42%, 19%, 20%, 19%; one segment labeled "In Print")

Relationships

• Identification
• Description
• Rights
• Relationships
  – Bibliographic records
  – Bib records and objects
  – Digital objects
  – Digital and print
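Two of the relationship types listed above, record to record and record to digital object, can be pictured as simple data structures. The class names, fields and the rule that a shared OCLC number links two records are assumptions made for illustration. In practice, record matching has to deal with reprints, differing editions and records that carry no OCLC number at all.

```python
from collections import defaultdict
from dataclasses import dataclass, field

@dataclass
class BibRecord:
    record_id: str
    oclc_numbers: set[str] = field(default_factory=set)

@dataclass
class DigitalItem:
    item_id: str
    record_id: str      # link from digital object to its bibliographic record
    rights: str

def items_by_record(items: list[DigitalItem]) -> dict[str, list[str]]:
    """One record can describe many digital objects (e.g. volumes of a set)."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for item in items:
        grouped[item.record_id].append(item.item_id)
    return dict(grouped)

def related_records(records: list[BibRecord]) -> dict[str, set[str]]:
    """Link bibliographic records that share an OCLC number."""
    by_oclc: dict[str, set[str]] = defaultdict(set)
    for rec in records:
        for num in rec.oclc_numbers:
            by_oclc[num].add(rec.record_id)
    return {num: ids for num, ids in by_oclc.items() if len(ids) > 1}

# Hypothetical identifiers.
records = [BibRecord("r1", {"111"}), BibRecord("r2", {"111", "222"}),
           BibRecord("r3", {"333"})]
items = [DigitalItem("v1", "r1", "pd"), DigitalItem("v2", "r1", "ic"),
         DigitalItem("v3", "r3", "pd")]
print(items_by_record(items))
print(related_records(records))
```

The same OCLC-keyed index could also join digital items to institutions' print holdings, which is the "digital and print" relationship on the slide.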

Understanding the relationship between the collective and local

1st model: Price per GB

Year          Total Volumes    Public Domain
2008              2,477,871          372,085
2009              5,221,092          758,947
2010              7,836,698        1,959,223
2011              9,966,572        2,712,626
2012 (Oct)       10,531,566        3,218,132
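The table lends itself to a quick worked calculation. It shows that the public-domain share of the corpus roughly doubled, from about 15% in 2008 to about 31% by October 2012, even as total volumes grew more than fourfold.

```python
# Figures from the table above (2012 is as of October).
volumes = {
    "2008": (2_477_871, 372_085),
    "2009": (5_221_092, 758_947),
    "2010": (7_836_698, 1_959_223),
    "2011": (9_966_572, 2_712_626),
    "2012 (Oct)": (10_531_566, 3_218_132),
}

def public_domain_share(total: int, public_domain: int) -> float:
    """Fraction of all volumes that are in the public domain."""
    return public_domain / total

for year, (total, pd_count) in volumes.items():
    print(f"{year}: {public_domain_share(total, pd_count):.1%}")
# 2008: 15.0%, 2009: 14.5%, 2010: 25.0%, 2011: 27.2%, 2012 (Oct): 30.6%
```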

Chart: % of Titles in Local Collection (y-axis) vs. Rank in 2008 ARL Investment Index (x-axis)

A global change in the library environment

• June 2009: median duplication 19%
• June 2010: median duplication 31%
• Academic print book collection already substantially duplicated in mass digitized book corpus

Courtesy of Constance Malpas, OCLC Research

Digitized Books in Shared Repositories

Chart: Unique Titles, Sep-09 to Jun-10; series "Mass digitized books in Hathi digital repository" and "Mass digitized books in shared print repositories"; annotations ~3.5M titles and ~2.5M

~75% of mass digitized corpus is 'backed up' in one or more shared print repositories

Courtesy of Constance Malpas, OCLC Research

Collection Overlap

bull More than 50 median overlap with ARL institutions higher for small liberal arts colleges

bull New Pricing model based on Print holdingsndash httpwwwhathitrustorgcostndash Requires print holdings databasendash Also support expansion of legal uses efforts in de-

duplicationndash Facilitate individual and collaborative collection

development and management operationsbull Print monographs archiving

Sourcing and Scalinghttporweblogoclcorgarchives002058html

bull Scalendash Institution-scalendash Group-scalendash Web-scale

bull Sourcingndash Institutionalndash Collaborativendash Third-party

A new kind of library

Thank you

How to find out more

bull About httpwwwhathitrustorgaboutbull Twitter httptwittercomhathitrustbull Facebook httpwwwfacebookcomhathitrustbull Monthly newsletter

ndash httpwwwhathitrustorgupdatesndash RSS httpwwwhathitrustorgupdates_rss

bull Contact us feedbackissueshathitrustorgbull Blogs httpwwwhathitrustorgblogs

ndash Large-scale Searchndash Perspectives from HathiTrust

  • Your Library Now Online Putting HathiTrust in the Context of
  • Outline
  • The Big Idea
  • Partnership
  • Digital Repository
  • The Name
  • Mission
  • HathiTrust
  • Collections and Collaboration
  • What we are doing to get there
  • Slide 11
  • Slide 12
  • Repository and Content
  • Content Sources
  • Language Distribution (1)
  • Language Distribution (2)
  • Dates
  • Copyright Distribution
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Making Content Available
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • APIs
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Datasets
  • Research Center
  • Slide 54
  • Slide 55
  • Access Determinations
  • Automatic Rights Determination
  • Manual Rights Determination
  • Rights Database
  • Lawful uses
  • Lawful uses (2)
  • Outline (2)
  • Slide 63
  • Slide 64
  • Constitutional Convention
  • Slide 66
  • Governance
  • Outline (3)
  • How HathiTrust Can Change the Way We Work
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Relationships
  • Relationships (2)
  • Relationships (3)
  • Relationships (4)
  • Relationships (5)
  • Slide 81
  • Slide 82
  • Slide 83
  • A global change in the library environment
  • Digitized Books in Shared Repositories
  • Collection Overlap
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Thank you
  • How to find out more
Page 44: HATHITRUST A Shared Digital Repository Your Library, Now Online! Putting HathiTrust in the Context of Traditional (and New) Library Services MCLS Webinar

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

APIs

bull Data APIndash Volume and rights informationndash Page imagesndash OCR

bull Bibliographic APIndash Volume and rights informationndash MARC records

bull OAIbull ldquoHathifilesrdquo

Datasets

bull Google-digitized ~28 million texts Requires proposal to HathiTrust Agreement with Google Statement on usemanagement

bull Non-Google-digitized ~370000 texts Freely available Statement on management

Research Center

bull Environment to perform research on HathiTrust corpus

bull httpwwwhathitrustorghtrc

bull httplibumichedumpachbull Package of tools to enable publication of open

access born-digital journal content directly into HathiTrustndash Including accompanying data and media files

bull Allows integration with popular journal publishing tools such as Open Journal Systems (OJS)

Source Archive

Editorial Market

Higher Education

Access Determinations

bull Automatedbull Manual

Automatic Rights Determination

bull Conducted on all works at time of ingest and when records are modifiedndash Public domain worldwide

bull US works published before 1923 US federal government publications non-US works published prior to 1873

ndash Public domain in the United Statesbull Non-US works published prior to 1923

Manual Rights Determination

bull IMLS-funded CRMS projectndash CRMS-US

bull 2008 US-published works 1923-1963bull Staff at 4 partner institutions

ndash CRMS-Worldbull 2011 Expanded to non-US worksbull Staff at 16 partner institutions

ndash Double review with additional expert review for conflictsndash Compliance with copyright formalitiesndash As of January 2013 241541 reviewed more than 132644

openedbull Rights Holder Permissions

bull System of Precedence

Rights Database

Bibliographic (automatic)

Manual

Lawful uses

bull Users who have print disabilitiesndash All in-copyright works in HathiTrust currently

owned (or owned previously) by the partner institution

ndash Must be authenticatedndash Must be on US soilndash One simultaneous access per copy ownedndash httpwwwhathitrustorgaccessibility

Lawful uses (2)

bull Out of print and brittle missingndash Works must be currently owned (or owned

previously) by the partner institutionndash Must be authenticated or accessing work from

library premisesndash Must be on US soilndash One simultaneous access per copy ownedndash httpwwwhathitrustorgout-of-print-brittle

bull Access and use statementsndash httpwwwhathitrustorgaccess_use

Outline

bull The Big Idea ndash Mission and Goals

bull What wersquore doing to get there ndash Repository and Content ndash Making content available ndash Organizational structure

bull How HathiTrust can change the way we work

e-Commerce

Print on Demand

Content Ingest

Transformation

Validation

Content Access

PageTurner

Collection Builder

Large-scale Search

Bibliographic Catalog

Research Center

APIs

Quality Assurance

Quality Review

Content Certification

User Services

Usability

User support (helpdesk)

Outreach

Project website

Monthly newsletter

Papers and presentations

Communication with potential

partners

Surveys general inquiries

Repository evaluation and

audit (eg DRAMBORA

TRAC)

Legal

Risk management (use of materials)

Partner agreements

Advocacy

Governance

Budget Finances

Decision-making

Policy

Planning

Enterprise Management

Communication and Coordination

with partner institutions

Project management

Repository Administration

Hardware configuration and

maintenance

Web and application server configuration and

maintenance

Security

Permissions

Logging

Repository Administration

Data management (content storage backup integrity checks deletion)

Hardware selection and replacement

Content and Metadata

specifications

Disaster Recovery

Processes for ensuring content

integrity

Rights Management

Copyright determination

Copyright review

Copyright information

management (database)

Rightsholder permissions

Bibliographic Data

Management

Entity description (record-level)

Object identification (item-

level)

Data availability

Collection Development

Digitalbull Expansion beyond

books and journals (born-digital images and maps audio)

bull Selection of content (for non-Google volume ingest and pilots projects)

Printbull Cloud Library (effect

of digital on print)

Financial contributions of partners

HathiTrust Functional Framework

HathiTrust

Strategic Advisory BoardBudgetFinances Decision-making

Guidance on Policy Planning

bull Driven by needs of institutionsbull Leverage across the partnershipbull Projects Print on Demand Grant Work Ingest Specifications

PageTurner Bibliographic Data Management

Executive Committee

Collective Work Working Groups and Committees

Operationalbull Communicationsbull User Supportbull User Experience

Operationalbull Communicationsbull User Supportbull User Experience

Strategicbull Collectionsbull Discovery Interfacebull Full-text Search

Distributed work

Constitutional Convention

bull October 2011bull 52 partnersbull 3-year review overseen by SABbull Ballot Proposals

ndash Print monograph storagendash Approval Process for development initiativesndash US Government Documentsndash Fee-for-service content depositndash Governance

HathiTrust

Executive Committee

Strategic Advisory

Board

BudgetFinancesDecision-making

Guidance on Policy Planning

bull 12-member Board of Governors

bull Chief Executive Officer bull Executive Committee

Governance

bull Efficient practicalbull Inclusive collective

Outline

bull The Big Idea ndash Mission and Goals

bull What wersquore doing to get there ndash Repository and Content ndash Making content available ndash Organizational structure

bull How HathiTrust can change the way we work

How HathiTrust Can Change the Way We Work

Seeing collective problems as collective

Breakdown of HathiTrust book corpus by publication date

Bibliographic Indeterminacy and the Scale of Problems and Opportunities of Rights in Digital Collection Building ndash 22011

42

19

20

19

Breakdown of HathiTrust book corpus by publication date

42

19

20

19

Copyright status of books published pre-1923 and US works published 1923-1963

42

19

20

Copyright status of books published pre-1923 and US works published 1923-1963

42

19

20

19

Copyright status of books published pre-1923 and US works published 1923-1963

In Print 42

19

20

19

bull Identificationbull Descriptionbull Rights

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic records

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic recordsndash Bib records and objects

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic recordsndash Bib records and objectsndash Digital objects

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic recordsndash Bib records and objectsndash Digital objectsndash Digital and print

Relationships

Understanding the relationship between the collective and local

1st model Price per GB

2008 2009 2010 2011 2012 (Oct)Total Volumes 2477871 5221092 7836698 9966572 10531566

Public Domain 372085 758947 1959223 2712626 3218132

2008 2009 2010 2011 2012 (Oct)0

2000000

4000000

6000000

8000000

10000000

12000000

Total VolumesPublic Domain

0 20 40 60 80 100 1200

10

20

30

40

50

60

Rank in 2008 ARL Investment Index

o

f Tit

les

in L

ocal

Col

lecti

on

A global change in the library environment

June 2010Median duplication 31

June 2009Median duplication 19

Academic print book collection already substantially duplicated in mass digitized book corpus

Courtesy of Constance Malpas OCLC Research

Digitized Books in Shared Repositories

Sep-09 Oct-09 Nov-09 Dec-09 Jan-10 Feb-10 Mar-10 Apr-10 May-10 Jun-100

500000

1000000

1500000

2000000

2500000

3000000

3500000

Mass digitized books in Hathi digital repository Mass digitized books in shared print repositories

Uni

que

Titl

es

~75 of mass digitized corpus is lsquobacked uprsquo in one or more shared print repositories

~35M titles

~25M

Courtesy of Constance Malpas OCLC Research

Collection Overlap

bull More than 50 median overlap with ARL institutions higher for small liberal arts colleges

bull New Pricing model based on Print holdingsndash httpwwwhathitrustorgcostndash Requires print holdings databasendash Also support expansion of legal uses efforts in de-

duplicationndash Facilitate individual and collaborative collection

development and management operationsbull Print monographs archiving

Sourcing and Scalinghttporweblogoclcorgarchives002058html

bull Scalendash Institution-scalendash Group-scalendash Web-scale

bull Sourcingndash Institutionalndash Collaborativendash Third-party

A new kind of library

Thank you

How to find out more

bull About httpwwwhathitrustorgaboutbull Twitter httptwittercomhathitrustbull Facebook httpwwwfacebookcomhathitrustbull Monthly newsletter

ndash httpwwwhathitrustorgupdatesndash RSS httpwwwhathitrustorgupdates_rss

bull Contact us feedbackissueshathitrustorgbull Blogs httpwwwhathitrustorgblogs

ndash Large-scale Searchndash Perspectives from HathiTrust

  • Your Library Now Online Putting HathiTrust in the Context of
  • Outline
  • The Big Idea
  • Partnership
  • Digital Repository
  • The Name
  • Mission
  • HathiTrust
  • Collections and Collaboration
  • What we are doing to get there
  • Slide 11
  • Slide 12
  • Repository and Content
  • Content Sources
  • Language Distribution (1)
  • Language Distribution (2)
  • Dates
  • Copyright Distribution
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Making Content Available
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • APIs
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Datasets
  • Research Center
  • Slide 54
  • Slide 55
  • Access Determinations
  • Automatic Rights Determination
  • Manual Rights Determination
  • Rights Database
  • Lawful uses
  • Lawful uses (2)
  • Outline (2)
  • Slide 63
  • Slide 64
  • Constitutional Convention
  • Slide 66
  • Governance
  • Outline (3)
  • How HathiTrust Can Change the Way We Work
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Relationships
  • Relationships (2)
  • Relationships (3)
  • Relationships (4)
  • Relationships (5)
  • Slide 81
  • Slide 82
  • Slide 83
  • A global change in the library environment
  • Digitized Books in Shared Repositories
  • Collection Overlap
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Thank you
  • How to find out more
Page 45: HATHITRUST A Shared Digital Repository Your Library, Now Online! Putting HathiTrust in the Context of Traditional (and New) Library Services MCLS Webinar

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

APIs

bull Data APIndash Volume and rights informationndash Page imagesndash OCR

bull Bibliographic APIndash Volume and rights informationndash MARC records

bull OAIbull ldquoHathifilesrdquo

Datasets

bull Google-digitized ~28 million texts Requires proposal to HathiTrust Agreement with Google Statement on usemanagement

bull Non-Google-digitized ~370000 texts Freely available Statement on management

Research Center

bull Environment to perform research on HathiTrust corpus

bull httpwwwhathitrustorghtrc

bull httplibumichedumpachbull Package of tools to enable publication of open

access born-digital journal content directly into HathiTrustndash Including accompanying data and media files

bull Allows integration with popular journal publishing tools such as Open Journal Systems (OJS)

Source Archive

Editorial Market

Higher Education

Access Determinations

bull Automatedbull Manual

Automatic Rights Determination

bull Conducted on all works at time of ingest and when records are modifiedndash Public domain worldwide

bull US works published before 1923 US federal government publications non-US works published prior to 1873

ndash Public domain in the United Statesbull Non-US works published prior to 1923

Manual Rights Determination

bull IMLS-funded CRMS projectndash CRMS-US

bull 2008 US-published works 1923-1963bull Staff at 4 partner institutions

ndash CRMS-Worldbull 2011 Expanded to non-US worksbull Staff at 16 partner institutions

ndash Double review with additional expert review for conflictsndash Compliance with copyright formalitiesndash As of January 2013 241541 reviewed more than 132644

openedbull Rights Holder Permissions

bull System of Precedence

Rights Database

Bibliographic (automatic)

Manual

Lawful uses

bull Users who have print disabilitiesndash All in-copyright works in HathiTrust currently

owned (or owned previously) by the partner institution

ndash Must be authenticatedndash Must be on US soilndash One simultaneous access per copy ownedndash httpwwwhathitrustorgaccessibility

Lawful uses (2)

bull Out of print and brittle missingndash Works must be currently owned (or owned

previously) by the partner institutionndash Must be authenticated or accessing work from

library premisesndash Must be on US soilndash One simultaneous access per copy ownedndash httpwwwhathitrustorgout-of-print-brittle

bull Access and use statementsndash httpwwwhathitrustorgaccess_use

Outline

bull The Big Idea ndash Mission and Goals

bull What wersquore doing to get there ndash Repository and Content ndash Making content available ndash Organizational structure

bull How HathiTrust can change the way we work

e-Commerce

Print on Demand

Content Ingest

Transformation

Validation

Content Access

PageTurner

Collection Builder

Large-scale Search

Bibliographic Catalog

Research Center

APIs

Quality Assurance

Quality Review

Content Certification

User Services

Usability

User support (helpdesk)

Outreach

Project website

Monthly newsletter

Papers and presentations

Communication with potential

partners

Surveys general inquiries

Repository evaluation and

audit (eg DRAMBORA

TRAC)

Legal

Risk management (use of materials)

Partner agreements

Advocacy

Governance

Budget Finances

Decision-making

Policy

Planning

Enterprise Management

Communication and Coordination

with partner institutions

Project management

Repository Administration

Hardware configuration and

maintenance

Web and application server configuration and

maintenance

Security

Permissions

Logging

Repository Administration

Data management (content storage backup integrity checks deletion)

Hardware selection and replacement

Content and Metadata

specifications

Disaster Recovery

Processes for ensuring content

integrity

Rights Management

Copyright determination

Copyright review

Copyright information

management (database)

Rightsholder permissions

Bibliographic Data

Management

Entity description (record-level)

Object identification (item-

level)

Data availability

Collection Development

Digitalbull Expansion beyond

books and journals (born-digital images and maps audio)

bull Selection of content (for non-Google volume ingest and pilots projects)

Printbull Cloud Library (effect

of digital on print)

Financial contributions of partners

HathiTrust Functional Framework

HathiTrust

Strategic Advisory BoardBudgetFinances Decision-making

Guidance on Policy Planning

bull Driven by needs of institutionsbull Leverage across the partnershipbull Projects Print on Demand Grant Work Ingest Specifications

PageTurner Bibliographic Data Management

Executive Committee

Collective Work Working Groups and Committees

Operationalbull Communicationsbull User Supportbull User Experience

Operationalbull Communicationsbull User Supportbull User Experience

Strategicbull Collectionsbull Discovery Interfacebull Full-text Search

Distributed work

Constitutional Convention

bull October 2011bull 52 partnersbull 3-year review overseen by SABbull Ballot Proposals

ndash Print monograph storagendash Approval Process for development initiativesndash US Government Documentsndash Fee-for-service content depositndash Governance

HathiTrust

Executive Committee

Strategic Advisory

Board

BudgetFinancesDecision-making

Guidance on Policy Planning

bull 12-member Board of Governors

bull Chief Executive Officer bull Executive Committee

Governance

bull Efficient practicalbull Inclusive collective

Outline

bull The Big Idea ndash Mission and Goals

bull What wersquore doing to get there ndash Repository and Content ndash Making content available ndash Organizational structure

bull How HathiTrust can change the way we work

How HathiTrust Can Change the Way We Work

Seeing collective problems as collective

Breakdown of HathiTrust book corpus by publication date

Bibliographic Indeterminacy and the Scale of Problems and Opportunities of Rights in Digital Collection Building ndash 22011

42

19

20

19

Breakdown of HathiTrust book corpus by publication date

42

19

20

19

Copyright status of books published pre-1923 and US works published 1923-1963

42

19

20

Copyright status of books published pre-1923 and US works published 1923-1963

42

19

20

19

Copyright status of books published pre-1923 and US works published 1923-1963

In Print 42

19

20

19

bull Identificationbull Descriptionbull Rights

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic records

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic recordsndash Bib records and objects

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic recordsndash Bib records and objectsndash Digital objects

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic recordsndash Bib records and objectsndash Digital objectsndash Digital and print

Relationships

Understanding the relationship between the collective and local

1st model Price per GB

2008 2009 2010 2011 2012 (Oct)Total Volumes 2477871 5221092 7836698 9966572 10531566

Public Domain 372085 758947 1959223 2712626 3218132

2008 2009 2010 2011 2012 (Oct)0

2000000

4000000

6000000

8000000

10000000

12000000

Total VolumesPublic Domain

0 20 40 60 80 100 1200

10

20

30

40

50

60

Rank in 2008 ARL Investment Index

o

f Tit

les

in L

ocal

Col

lecti

on

A global change in the library environment

June 2010Median duplication 31

June 2009Median duplication 19

Academic print book collection already substantially duplicated in mass digitized book corpus

Courtesy of Constance Malpas OCLC Research

Digitized Books in Shared Repositories

[Chart: Unique Titles (y-axis, 0 to 3,500,000), Sep-09 to Jun-10 (x-axis); series: Mass digitized books in Hathi digital repository; Mass digitized books in shared print repositories]

~75% of mass digitized corpus is ‘backed up’ in one or more shared print repositories
~3.5M titles
~2.5M

Courtesy of Constance Malpas, OCLC Research

Collection Overlap

• More than 50% median overlap with ARL institutions; higher for small liberal arts colleges
• New pricing model based on print holdings
  – http://www.hathitrust.org/cost
  – Requires print holdings database
  – Also supports expansion of legal uses and efforts in de-duplication
  – Facilitates individual and collaborative collection development and management operations
• Print monographs archiving
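Both the overlap figure and a holdings-based pricing model rest on matching a library's print holdings against the digitized corpus. A minimal sketch, assuming holdings are already matched to shared identifiers (such as OCLC numbers) and assuming a simplified cost rule in which each volume's cost is divided evenly among the institutions that hold it in print. The identifiers, costs, and institution names are made up, and the actual model described at http://www.hathitrust.org/cost may differ.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, Set


def overlap_fraction(local_titles: Iterable[str], corpus_titles: Iterable[str]) -> float:
    """Share of a library's titles that also appear in the digitized corpus."""
    local = set(local_titles)
    if not local:
        return 0.0
    return len(local & set(corpus_titles)) / len(local)


def split_costs(holdings: Dict[str, Set[str]], cost_per_volume: float) -> Dict[str, float]:
    """Divide each volume's cost evenly among the institutions holding it.

    holdings maps institution -> set of volume identifiers held in print.
    Volumes with no print holder are left unallocated in this sketch.
    """
    holders = defaultdict(set)
    for inst, volumes in holdings.items():
        for vol in volumes:
            holders[vol].add(inst)
    bill = defaultdict(float)
    for vol, insts in holders.items():
        share = cost_per_volume / len(insts)
        for inst in insts:
            bill[inst] += share
    return dict(bill)


# Made-up corpus and print holdings
corpus = {"ocm1", "ocm2", "ocm3", "ocm4"}
holdings = {
    "Library A": {"ocm1", "ocm2", "ocm9"},
    "Library B": {"ocm2", "ocm3"},
    "College C": {"ocm2"},
}
overlaps = {inst: overlap_fraction(v, corpus) for inst, v in holdings.items()}
print(overlaps, "median:", median(overlaps.values()))

# Only volumes that are actually in the corpus generate costs.
bill = split_costs({inst: v & corpus for inst, v in holdings.items()}, cost_per_volume=1.0)
print(bill)
```

The even split is the simplest rule that makes a print holdings database necessary: without knowing who holds each volume, there is nothing to divide by.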

Sourcing and Scaling (http://orweblog.oclc.org/archives/002058.html)

• Scale
  – Institution-scale
  – Group-scale
  – Web-scale
• Sourcing
  – Institutional
  – Collaborative
  – Third-party

A new kind of library

Thank you

How to find out more

• About: http://www.hathitrust.org/about
• Twitter: http://twitter.com/hathitrust
• Facebook: http://www.facebook.com/hathitrust
• Monthly newsletter
  – http://www.hathitrust.org/updates
  – RSS: http://www.hathitrust.org/updates_rss
• Contact us: feedback@issues.hathitrust.org
• Blogs: http://www.hathitrust.org/blogs
  – Large-scale Search
  – Perspectives from HathiTrust

  • Your Library Now Online Putting HathiTrust in the Context of
  • Outline
  • The Big Idea
  • Partnership
  • Digital Repository
  • The Name
  • Mission
  • HathiTrust
  • Collections and Collaboration
  • What we are doing to get there
  • Slide 11
  • Slide 12
  • Repository and Content
  • Content Sources
  • Language Distribution (1)
  • Language Distribution (2)
  • Dates
  • Copyright Distribution
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Making Content Available
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • APIs
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Datasets
  • Research Center
  • Slide 54
  • Slide 55
  • Access Determinations
  • Automatic Rights Determination
  • Manual Rights Determination
  • Rights Database
  • Lawful uses
  • Lawful uses (2)
  • Outline (2)
  • Slide 63
  • Slide 64
  • Constitutional Convention
  • Slide 66
  • Governance
  • Outline (3)
  • How HathiTrust Can Change the Way We Work
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Relationships
  • Relationships (2)
  • Relationships (3)
  • Relationships (4)
  • Relationships (5)
  • Slide 81
  • Slide 82
  • Slide 83
  • A global change in the library environment
  • Digitized Books in Shared Repositories
  • Collection Overlap
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Thank you
  • How to find out more

Access

Catalog

Full-text Search

PageTurner

APIs

Collections

Datasets

APIs

• Data API
  – Volume and rights information
  – Page images
  – OCR
• Bibliographic API
  – Volume and rights information
  – MARC records
• OAI
• “Hathifiles”
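As an illustration of the Bibliographic API listed above, the sketch below builds a lookup URL by identifier type and pulls item IDs and rights codes out of a decoded JSON response. The endpoint pattern (`https://catalog.hathitrust.org/api/volumes/brief/<type>/<value>.json`, with `full` in place of `brief` to include MARC records) and the `items`/`htid`/`rightsCode` field names are given as I understand the published API and should be checked against its current documentation; the sample response is invented.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

BASE = "https://catalog.hathitrust.org/api/volumes"  # assumed endpoint; verify before use


def bib_api_url(id_type: str, id_value, level: str = "brief") -> str:
    """Build a lookup URL such as .../brief/oclc/424023.json."""
    if level not in ("brief", "full"):  # "full" is assumed to add MARC records
        raise ValueError("level must be 'brief' or 'full'")
    return f"{BASE}/{level}/{id_type}/{quote(str(id_value))}.json"


def fetch(url: str) -> dict:
    """Fetch and decode a JSON response (needs network access)."""
    with urlopen(url) as resp:
        return json.load(resp)


def summarize_items(response: dict):
    """Return (item id, rights code) pairs from a decoded response."""
    return [(item["htid"], item["rightsCode"]) for item in response.get("items", [])]


# Usage with a hand-made response shaped like the assumed format (values invented)
sample = {
    "records": {"000000001": {"titles": ["Example title"]}},
    "items": [
        {"htid": "example.0001", "rightsCode": "pd", "usRightsString": "Full view"},
        {"htid": "example.0002", "rightsCode": "ic", "usRightsString": "Limited (search-only)"},
    ],
}
print(bib_api_url("oclc", 424023))
print(summarize_items(sample))
```

Keeping URL construction, fetching, and parsing in separate functions lets the parsing step be exercised offline against saved responses.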

Datasets

• Google-digitized: ~2.8 million texts
  – Requires proposal to HathiTrust
  – Agreement with Google
  – Statement on use/management
• Non-Google-digitized: ~370,000 texts
  – Freely available
  – Statement on management

Research Center

• Environment to perform research on HathiTrust corpus
• http://www.hathitrust.org/htrc

• http://lib.umich.edu/mpach
• Package of tools to enable publication of open access born-digital journal content directly into HathiTrust
  – Including accompanying data and media files
• Allows integration with popular journal publishing tools such as Open Journal Systems (OJS)

Source Archive

Editorial Market

Higher Education

Access Determinations

• Automated
• Manual

Automatic Rights Determination

• Conducted on all works at time of ingest and when records are modified
  – Public domain worldwide
    • US works published before 1923, US federal government publications, non-US works published prior to 1873
  – Public domain in the United States
    • Non-US works published prior to 1923
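The automatic rules above amount to a small decision procedure over publication year, place of publication, and federal-document status. A minimal sketch, assuming those three fields have already been parsed from the bibliographic record. The codes `pd`, `pdus`, and `ic` follow the rights attribute names HathiTrust uses, but treat them as illustrative here; `ic` is returned as a conservative default for anything the rules do not open. The 1923 and 1873 cutoffs reflect this 2013 presentation and have since moved forward.

```python
from typing import Optional


def automatic_rights(pub_year: Optional[int], us_published: bool,
                     us_federal_doc: bool = False) -> str:
    """Bibliographic-only rights call using the cutoffs on this slide.

    Returns "pd" (public domain worldwide), "pdus" (public domain in the
    United States only), or "ic" (left closed pending manual review).
    """
    if us_published and us_federal_doc:
        return "pd"                      # US federal government publications
    if pub_year is None:
        return "ic"                      # no usable date: do not open automatically
    if us_published:
        return "pd" if pub_year < 1923 else "ic"
    if pub_year < 1873:
        return "pd"                      # non-US works published prior to 1873
    if pub_year < 1923:
        return "pdus"                    # non-US works published prior to 1923
    return "ic"


# Usage: (publication year, published in the US?)
for args in [(1900, True), (1850, False), (1900, False), (1950, True)]:
    print(args, automatic_rights(*args))
```

Everything that falls through to `ic` is the population the manual review process is designed to work through.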

Manual Rights Determination

• IMLS-funded CRMS project
  – CRMS-US
    • 2008: US-published works, 1923-1963
    • Staff at 4 partner institutions
  – CRMS-World
    • 2011: Expanded to non-US works
    • Staff at 16 partner institutions
  – Double review, with additional expert review for conflicts
  – Compliance with copyright formalities
  – As of January 2013: 241,541 reviewed; more than 132,644 opened
• Rights Holder Permissions
• System of Precedence
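The double-review step above (two independent reviews, with an expert resolving conflicts) can be sketched as follows. This is a hypothetical illustration, not CRMS code: the verdict strings and the `expert` callback are assumptions, and "opened" is counted here as any public-domain call.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Review = Tuple[str, str, str]  # (volume id, first reviewer's call, second reviewer's call)


def process_reviews(reviews: Iterable[Review],
                    expert: Callable[[str, str, str], str]):
    """Accept matching double reviews; send conflicts to an expert.

    Returns final calls per volume, the volumes that were escalated, and
    how many were opened (here, any call of "pd" or "pdus").
    """
    final: Dict[str, str] = {}
    escalated: List[str] = []
    for vol, first, second in reviews:
        if first == second:
            final[vol] = first
        else:
            escalated.append(vol)
            final[vol] = expert(vol, first, second)
    opened = sum(1 for call in final.values() if call in ("pd", "pdus"))
    return final, escalated, opened


# Usage: a cautious expert who keeps disputed volumes closed (hypothetical policy)
queue = [("vol-a", "pd", "pd"), ("vol-b", "pd", "ic"), ("vol-c", "ic", "ic")]
final, escalated, opened = process_reviews(queue, lambda vol, a, b: "ic")
print(final, escalated, opened)
```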

Rights Database

Bibliographic (automatic)

Manual
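One reading of the "System of Precedence" together with the two feeds into the Rights Database: every determination is stored with its source, and the effective rights come from the highest-ranked source, so re-running automatic determination when a record is modified does not undo a manual review. A minimal sketch; the source names, their ordering, and the rights codes are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable

# Assumed ordering for illustration only: a larger number overrides a smaller one.
PRECEDENCE = {"bibliographic": 1, "manual_review": 2, "rights_holder": 3}


@dataclass(frozen=True)
class RightsAssertion:
    source: str      # one of the PRECEDENCE keys
    attribute: str   # illustrative rights code, e.g. "pd" or "ic"
    recorded: date


def effective_rights(assertions: Iterable[RightsAssertion]) -> str:
    """Return the code from the highest-precedence source; newest wins a tie."""
    ranked = list(assertions)
    if not ranked:
        raise ValueError("no rights assertions recorded for this volume")
    best = max(ranked, key=lambda a: (PRECEDENCE[a.source], a.recorded))
    return best.attribute


# A record modified in 2013 re-triggers automatic determination ("ic"),
# but the 2012 manual review still takes precedence.
history = [
    RightsAssertion("bibliographic", "ic", date(2010, 1, 4)),
    RightsAssertion("manual_review", "pd", date(2012, 5, 1)),
    RightsAssertion("bibliographic", "ic", date(2013, 1, 15)),
]
print(effective_rights(history))
```

Storing every assertion, rather than overwriting a single field, keeps an audit trail of how each rights decision was reached.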

Lawful uses

• Users who have print disabilities
  – All in-copyright works in HathiTrust currently owned (or owned previously) by the partner institution
  – Must be authenticated
  – Must be on US soil
  – One simultaneous access per copy owned
  – http://www.hathitrust.org/accessibility

Lawful uses (2)

• Out of print and brittle or missing
  – Works must be currently owned (or owned previously) by the partner institution
  – Must be authenticated or accessing work from library premises
  – Must be on US soil
  – One simultaneous access per copy owned
  – http://www.hathitrust.org/out-of-print-brittle
• Access and use statements
  – http://www.hathitrust.org/access_use
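The lawful-use conditions on these two slides combine an eligibility check with a lending limit of one simultaneous access per copy owned. A minimal sketch of that logic; the service names, the `AccessLedger` class, and the in-memory bookkeeping are hypothetical, and a real deployment would rely on institutional authentication and geolocation rather than boolean flags.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


def eligible(service: str, authenticated: bool, on_premises: bool, in_us: bool) -> bool:
    """Conditions from the slides, keyed by hypothetical service names."""
    if not in_us:                          # "Must be on US soil"
        return False
    if service == "print_disabilities":
        return authenticated               # "Must be authenticated"
    if service == "out_of_print_brittle":
        return authenticated or on_premises
    return False                           # unknown service: deny by default


@dataclass
class AccessLedger:
    """Enforces one simultaneous access per print copy owned."""
    copies_owned: Dict[str, int]           # volume id -> copies owned (now or previously)
    active: Dict[str, Set[str]] = field(default_factory=dict)

    def request(self, vol: str, user: str, service: str,
                authenticated: bool, on_premises: bool, in_us: bool) -> bool:
        if not eligible(service, authenticated, on_premises, in_us):
            return False
        readers = self.active.setdefault(vol, set())
        if user in readers:                # user already holds one of the copies
            return True
        if len(readers) >= self.copies_owned.get(vol, 0):
            return False                   # every owned copy is in use (or none owned)
        readers.add(user)
        return True

    def release(self, vol: str, user: str) -> None:
        self.active.get(vol, set()).discard(user)


# Usage: one copy owned, so a second reader must wait for the first to release it
demo = AccessLedger({"vol-1": 1})
print(demo.request("vol-1", "alice", "print_disabilities", True, False, True))
print(demo.request("vol-1", "bob", "print_disabilities", True, False, True))
demo.release("vol-1", "alice")
print(demo.request("vol-1", "bob", "print_disabilities", True, False, True))
```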

Outline

• The Big Idea
  – Mission and Goals
• What we’re doing to get there
  – Repository and Content
  – Making content available
  – Organizational structure
• How HathiTrust can change the way we work

HathiTrust Functional Framework

e-Commerce
Print on Demand

Content Ingest
Transformation
Validation

Content Access
PageTurner
Collection Builder
Large-scale Search
Bibliographic Catalog
Research Center
APIs

Quality Assurance
Quality Review
Content Certification

User Services
Usability
User support (helpdesk)

Outreach
Project website
Monthly newsletter
Papers and presentations
Communication with potential partners
Surveys, general inquiries
Repository evaluation and audit (e.g., DRAMBORA, TRAC)

Legal
Risk management (use of materials)
Partner agreements
Advocacy

Governance
Budget/Finances
Decision-making
Policy
Planning

Enterprise Management
Communication and Coordination with partner institutions
Project management

Repository Administration
Hardware configuration and maintenance
Web and application server configuration and maintenance
Security
Permissions
Logging

Repository Administration
Data management (content storage, backup, integrity checks, deletion)
Hardware selection and replacement
Content and Metadata specifications
Disaster Recovery
Processes for ensuring content integrity

Rights Management
Copyright determination
Copyright review
Copyright information management (database)
Rightsholder permissions

Bibliographic Data Management
Entity description (record-level)
Object identification (item-level)
Data availability

Collection Development
Digital
• Expansion beyond books and journals (born-digital, images and maps, audio)
• Selection of content (for non-Google volume ingest and pilot projects)
Print
• Cloud Library (effect of digital on print)

Financial contributions of partners

HathiTrust

Strategic Advisory Board: Budget/Finances, Decision-making, Guidance on Policy, Planning

• Driven by needs of institutions
• Leverage across the partnership
• Projects: Print on Demand, Grant Work, Ingest Specifications, PageTurner, Bibliographic Data Management

Executive Committee

Collective Work: Working Groups and Committees

Operational
• Communications
• User Support
• User Experience

Strategic
• Collections
• Discovery Interface
• Full-text Search

Distributed work

Constitutional Convention

• October 2011
• 52 partners
• 3-year review overseen by SAB
• Ballot proposals
  – Print monograph storage
  – Approval process for development initiatives
  – US Government Documents
  – Fee-for-service content deposit
  – Governance

HathiTrust

Executive Committee

Strategic Advisory Board

Budget/Finances, Decision-making

Guidance on Policy, Planning

• 12-member Board of Governors
• Chief Executive Officer
• Executive Committee

Governance

• Efficient, practical
• Inclusive, collective

Outline

• The Big Idea
  – Mission and Goals
• What we’re doing to get there
  – Repository and Content
  – Making content available
  – Organizational structure
• How HathiTrust can change the way we work

How HathiTrust Can Change the Way We Work

Seeing collective problems as collective

Breakdown of HathiTrust book corpus by publication date

Bibliographic Indeterminacy and the Scale of Problems and Opportunities of Rights in Digital Collection Building – 2/2011

[Chart built over two slides: segments labeled 42%, 19%, 20%, and 19%]

Page 48: HATHITRUST A Shared Digital Repository Your Library, Now Online! Putting HathiTrust in the Context of Traditional (and New) Library Services MCLS Webinar

Datasets

bull Google-digitized ~28 million texts Requires proposal to HathiTrust Agreement with Google Statement on usemanagement

bull Non-Google-digitized ~370000 texts Freely available Statement on management

Research Center

bull Environment to perform research on HathiTrust corpus

bull httpwwwhathitrustorghtrc

bull httplibumichedumpachbull Package of tools to enable publication of open

access born-digital journal content directly into HathiTrustndash Including accompanying data and media files

bull Allows integration with popular journal publishing tools such as Open Journal Systems (OJS)

Source Archive

Editorial Market

Higher Education

Access Determinations

bull Automatedbull Manual

Automatic Rights Determination

bull Conducted on all works at time of ingest and when records are modifiedndash Public domain worldwide

bull US works published before 1923 US federal government publications non-US works published prior to 1873

ndash Public domain in the United Statesbull Non-US works published prior to 1923

Manual Rights Determination

bull IMLS-funded CRMS projectndash CRMS-US

bull 2008 US-published works 1923-1963bull Staff at 4 partner institutions

ndash CRMS-Worldbull 2011 Expanded to non-US worksbull Staff at 16 partner institutions

ndash Double review with additional expert review for conflictsndash Compliance with copyright formalitiesndash As of January 2013 241541 reviewed more than 132644

openedbull Rights Holder Permissions

bull System of Precedence

Rights Database

Bibliographic (automatic)

Manual

Lawful uses

bull Users who have print disabilitiesndash All in-copyright works in HathiTrust currently

owned (or owned previously) by the partner institution

ndash Must be authenticatedndash Must be on US soilndash One simultaneous access per copy ownedndash httpwwwhathitrustorgaccessibility

Lawful uses (2)

bull Out of print and brittle missingndash Works must be currently owned (or owned

previously) by the partner institutionndash Must be authenticated or accessing work from

library premisesndash Must be on US soilndash One simultaneous access per copy ownedndash httpwwwhathitrustorgout-of-print-brittle

bull Access and use statementsndash httpwwwhathitrustorgaccess_use

Outline

bull The Big Idea ndash Mission and Goals

bull What wersquore doing to get there ndash Repository and Content ndash Making content available ndash Organizational structure

bull How HathiTrust can change the way we work

e-Commerce

Print on Demand

Content Ingest

Transformation

Validation

Content Access

PageTurner

Collection Builder

Large-scale Search

Bibliographic Catalog

Research Center

APIs

Quality Assurance

Quality Review

Content Certification

User Services

Usability

User support (helpdesk)

Outreach

Project website

Monthly newsletter

Papers and presentations

Communication with potential

partners

Surveys general inquiries

Repository evaluation and

audit (eg DRAMBORA

TRAC)

Legal

Risk management (use of materials)

Partner agreements

Advocacy

Governance

Budget Finances

Decision-making

Policy

Planning

Enterprise Management

Communication and Coordination

with partner institutions

Project management

Repository Administration

Hardware configuration and

maintenance

Web and application server configuration and

maintenance

Security

Permissions

Logging

Repository Administration

Data management (content storage backup integrity checks deletion)

Hardware selection and replacement

Content and Metadata

specifications

Disaster Recovery

Processes for ensuring content

integrity

Rights Management

Copyright determination

Copyright review

Copyright information

management (database)

Rightsholder permissions

Bibliographic Data

Management

Entity description (record-level)

Object identification (item-

level)

Data availability

Collection Development

Digitalbull Expansion beyond

books and journals (born-digital images and maps audio)

bull Selection of content (for non-Google volume ingest and pilots projects)

Printbull Cloud Library (effect

of digital on print)

Financial contributions of partners

HathiTrust Functional Framework

HathiTrust

Strategic Advisory BoardBudgetFinances Decision-making

Guidance on Policy Planning

bull Driven by needs of institutionsbull Leverage across the partnershipbull Projects Print on Demand Grant Work Ingest Specifications

PageTurner Bibliographic Data Management

Executive Committee

Collective Work Working Groups and Committees

Operationalbull Communicationsbull User Supportbull User Experience

Operationalbull Communicationsbull User Supportbull User Experience

Strategicbull Collectionsbull Discovery Interfacebull Full-text Search

Distributed work

Constitutional Convention

bull October 2011bull 52 partnersbull 3-year review overseen by SABbull Ballot Proposals

ndash Print monograph storagendash Approval Process for development initiativesndash US Government Documentsndash Fee-for-service content depositndash Governance

HathiTrust

Executive Committee

Strategic Advisory

Board

BudgetFinancesDecision-making

Guidance on Policy Planning

bull 12-member Board of Governors

bull Chief Executive Officer bull Executive Committee

Governance

bull Efficient practicalbull Inclusive collective

Outline

bull The Big Idea ndash Mission and Goals

bull What wersquore doing to get there ndash Repository and Content ndash Making content available ndash Organizational structure

bull How HathiTrust can change the way we work

How HathiTrust Can Change the Way We Work

Seeing collective problems as collective

Breakdown of HathiTrust book corpus by publication date

Bibliographic Indeterminacy and the Scale of Problems and Opportunities of Rights in Digital Collection Building ndash 22011

42

19

20

19

Breakdown of HathiTrust book corpus by publication date

42

19

20

19

Copyright status of books published pre-1923 and US works published 1923-1963

42

19

20

Copyright status of books published pre-1923 and US works published 1923-1963

42

19

20

19

Copyright status of books published pre-1923 and US works published 1923-1963

In Print 42

19

20

19

Relationships

• Identification
• Description
• Rights
• Relationships
  – Bibliographic records
  – Bib records and objects
  – Digital objects
  – Digital and print
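The four kinds of relationship on the slide (among bibliographic records, between records and digital objects, among digital objects, and between digital and print) can be pictured as a small linked data model. The sketch below is only an illustration under assumed names: `BibRecord`, `DigitalObject`, `PrintCopy` and their fields are hypothetical and do not describe HathiTrust's actual schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class PrintCopy:
    # A physical copy held by a partner library (hypothetical model).
    institution: str
    barcode: str


@dataclass(frozen=True)
class DigitalObject:
    # A digitized volume, optionally linked to the print copy it was scanned from.
    volume_id: str
    source_copy: Optional[PrintCopy] = None


@dataclass
class BibRecord:
    # Record-level description grouping related records, digital objects and print copies.
    record_id: str
    title: str
    related_record_ids: List[str] = field(default_factory=list)
    digital_objects: List[DigitalObject] = field(default_factory=list)
    print_copies: List[PrintCopy] = field(default_factory=list)

    def holding_institutions(self) -> set:
        # Every institution holding this work in print, whether or not it supplied a scan.
        return {copy.institution for copy in self.print_copies}

    def unscanned_copies(self) -> List[PrintCopy]:
        # Print copies that no digital object in this record was derived from.
        scanned = {obj.source_copy for obj in self.digital_objects if obj.source_copy is not None}
        return [copy for copy in self.print_copies if copy not in scanned]


# Illustrative data only; identifiers and institutions are made up.
copy_a = PrintCopy("Library A", "A-0001")
copy_b = PrintCopy("Library B", "B-0002")
record = BibRecord(
    "rec-1",
    "Example title",
    digital_objects=[DigitalObject("vol-1", source_copy=copy_a)],
    print_copies=[copy_a, copy_b],
)
print(sorted(record.holding_institutions()))  # ['Library A', 'Library B']
print(record.unscanned_copies())
```

Keeping print copies as first-class objects, rather than as a holdings count, is what lets a model like this answer "digital and print" questions such as which holdings are already represented by a scan.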

Understanding the relationship between the collective and local

1st model Price per GB

Year        Total Volumes   Public Domain
2008            2,477,871         372,085
2009            5,221,092         758,947
2010            7,836,698       1,959,223
2011            9,966,572       2,712,626
2012 (Oct)     10,531,566       3,218,132

[Chart: Total Volumes and Public Domain volumes by year, 2008 to 2012 (Oct); vertical axis 0 to 12,000,000]
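The table's two series make the growth of openly viewable content easy to quantify. A short calculation over the figures above (no other data assumed) gives the public-domain share of the corpus for each year:

```python
# Volume counts from the table above: (total volumes, public-domain volumes).
COUNTS = {
    "2008": (2_477_871, 372_085),
    "2009": (5_221_092, 758_947),
    "2010": (7_836_698, 1_959_223),
    "2011": (9_966_572, 2_712_626),
    "2012 (Oct)": (10_531_566, 3_218_132),
}


def public_domain_share(total: int, public_domain: int) -> float:
    """Return the public-domain fraction of the corpus as a percentage."""
    return 100.0 * public_domain / total


for year, (total, pd) in COUNTS.items():
    print(f"{year}: {public_domain_share(total, pd):.1f}% public domain")
```

This prints roughly 15.0% for 2008, 14.5% for 2009, 25.0% for 2010, 27.2% for 2011 and 30.6% for October 2012: the corpus quadrupled while the open share roughly doubled.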

[Chart: "% of Titles in Local Collection" (vertical axis, 0 to 60) plotted against "Rank in 2008 ARL Investment Index" (horizontal axis, 0 to 120)]

A global change in the library environment

June 2010: median duplication 31%

June 2009: median duplication 19%

Academic print book collection already substantially duplicated in mass digitized book corpus

Courtesy of Constance Malpas, OCLC Research

Digitized Books in Shared Repositories

[Chart: Unique Titles (0 to 3,500,000), Sep-09 to Jun-10. Series: "Mass digitized books in Hathi digital repository" and "Mass digitized books in shared print repositories".]

~75% of mass digitized corpus is 'backed up' in one or more shared print repositories

~3.5M titles

~2.5M

Courtesy of Constance Malpas, OCLC Research

Collection Overlap

• More than 50% median overlap with ARL institutions; higher for small liberal arts colleges

• New pricing model based on print holdings
  – http://www.hathitrust.org/cost
  – Requires print holdings database
  – Also supports expansion of legal uses and de-duplication efforts
  – Facilitates individual and collaborative collection development and management operations

• Print monographs archiving
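The overlap figure above is a set comparison between a library's print titles and the digitized corpus, summarized by the median across institutions. A minimal sketch, assuming titles have already been matched to a shared identifier (OCLC numbers are a plausible key, but the matching step is not described on the slide); the data below is invented:

```python
from statistics import median
from typing import Dict, Set


def overlap_fraction(local_titles: Set[str], corpus_titles: Set[str]) -> float:
    """Share of a library's print titles that also appear in the digital corpus."""
    if not local_titles:
        return 0.0
    return len(local_titles & corpus_titles) / len(local_titles)


def median_overlap(holdings: Dict[str, Set[str]], corpus_titles: Set[str]) -> float:
    """Median of the per-institution overlap fractions."""
    return median(overlap_fraction(titles, corpus_titles) for titles in holdings.values())


# Toy data: title identifiers are invented for illustration.
corpus = {"1", "2", "3", "4", "5", "6"}
holdings = {
    "Library A": {"1", "2", "3", "9"},        # 3 of 4 titles overlap
    "Library B": {"1", "7", "8", "9"},        # 1 of 4
    "Library C": {"2", "4", "6", "8", "10"},  # 3 of 5
}
print(f"median overlap: {median_overlap(holdings, corpus):.0%}")
```

With the toy data the three overlaps are 75%, 25% and 60%, so the median is 60%. Real overlap studies depend heavily on how records are matched, which this sketch sidesteps.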

Sourcing and Scaling (http://orweblog.oclc.org/archives/002058.html)

• Scale
  – Institution-scale
  – Group-scale
  – Web-scale

• Sourcing
  – Institutional
  – Collaborative
  – Third-party

A new kind of library

Thank you

How to find out more

• About: http://www.hathitrust.org/about
• Twitter: http://twitter.com/hathitrust
• Facebook: http://www.facebook.com/hathitrust
• Monthly newsletter
  – http://www.hathitrust.org/updates
  – RSS: http://www.hathitrust.org/updates_rss
• Contact us: feedback@issues.hathitrust.org
• Blogs: http://www.hathitrust.org/blogs
  – Large-scale Search
  – Perspectives from HathiTrust


Research Center

• Environment to perform research on HathiTrust corpus
• http://www.hathitrust.org/htrc

• http://lib.umich.edu/mpach
• Package of tools to enable publication of open access born-digital journal content directly into HathiTrust
  – Including accompanying data and media files
• Allows integration with popular journal publishing tools such as Open Journal Systems (OJS)

Source Archive

Editorial Market

Higher Education

Access Determinations

• Automated
• Manual

Automatic Rights Determination

• Conducted on all works at time of ingest and when records are modified
  – Public domain worldwide
    • US works published before 1923; US federal government publications; non-US works published prior to 1873
  – Public domain in the United States
    • Non-US works published prior to 1923
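The automatic rules are essentially a decision procedure over a record's publication date, country of publication and government-document status. Below is a minimal sketch of that procedure; the function and enum names are hypothetical, the cutoff years are the ones on the slide (passed as parameters because they reflect the law at the time rather than a fixed rule), and returning "undetermined" for everything else is an assumption about what happens to works the rules do not cover.

```python
from enum import Enum
from typing import Optional


class Rights(Enum):
    PD_WORLDWIDE = "public domain worldwide"
    PD_US = "public domain in the United States"
    UNDETERMINED = "not determinable from bibliographic data"


def automatic_rights(
    pub_year: Optional[int],
    us_work: bool,
    us_federal_doc: bool = False,
    us_cutoff: int = 1923,
    non_us_worldwide_cutoff: int = 1873,
) -> Rights:
    """Apply the bibliographic rules from the slide to one record (sketch)."""
    if us_federal_doc:
        # US federal government publications are public domain regardless of date.
        return Rights.PD_WORLDWIDE
    if pub_year is None:
        # Assumption: a missing or unparseable date is left for manual review.
        return Rights.UNDETERMINED
    if us_work:
        return Rights.PD_WORLDWIDE if pub_year < us_cutoff else Rights.UNDETERMINED
    if pub_year < non_us_worldwide_cutoff:
        return Rights.PD_WORLDWIDE
    if pub_year < us_cutoff:
        # Non-US works from this window are treated as public domain in the US only.
        return Rights.PD_US
    return Rights.UNDETERMINED


# Because the function is pure, it can simply be re-run whenever a record is
# ingested or its bibliographic data changes.
print(automatic_rights(1890, us_work=False).value)  # public domain in the United States
```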

Manual Rights Determination

• IMLS-funded CRMS project
  – CRMS-US
    • 2008: US-published works 1923-1963
    • Staff at 4 partner institutions
  – CRMS-World
    • 2011: Expanded to non-US works
    • Staff at 16 partner institutions
  – Double review, with additional expert review for conflicts
  – Compliance with copyright formalities
  – As of January 2013: 241,541 reviewed; more than 132,644 opened
• Rights Holder Permissions
• System of Precedence
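The double-review workflow can be expressed as a small reconciliation step: two independent determinations are accepted when they agree and escalated to an expert when they do not. This sketch assumes a single string label per review and models the expert as a callback; the names are hypothetical, and CRMS's actual interface and categories are richer than this.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class Review:
    reviewer: str
    determination: str  # illustrative labels, e.g. "public domain" / "in copyright"


def resolve(first: Review, second: Review,
            expert: Callable[[Review, Review], str]) -> Tuple[str, bool]:
    """Return (final determination, whether an expert had to adjudicate)."""
    if first.reviewer == second.reviewer:
        raise ValueError("double review needs two independent reviewers")
    if first.determination == second.determination:
        return first.determination, False
    # Conflicting reviews are escalated rather than averaged or voted on.
    return expert(first, second), True


def cautious_expert(a: Review, b: Review) -> str:
    # Stand-in for a human expert: this placeholder simply keeps the work closed.
    return "in copyright"


print(resolve(Review("r1", "public domain"), Review("r2", "public domain"), cautious_expert))
# ('public domain', False)
print(resolve(Review("r1", "public domain"), Review("r2", "in copyright"), cautious_expert))
# ('in copyright', True)
```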

Rights Database

Bibliographic (automatic)

Manual
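The rights database holds determinations from both bibliographic (automatic) and manual sources, and a system of precedence decides which one governs access. The slides do not spell out the ordering, so the ranking below (rights-holder permission over manual review over bibliographic rules, with the newest entry winning a tie) is purely an assumption for illustration, as are all names.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable

# Assumed ranking (higher wins); the real precedence rules are not given in the slides.
PRECEDENCE = {"bibliographic": 1, "manual_review": 2, "rightsholder_permission": 3}


@dataclass(frozen=True)
class Determination:
    volume_id: str
    rights: str
    source: str  # one of the PRECEDENCE keys
    recorded: date


def current_rights(determinations: Iterable[Determination]) -> Determination:
    """Pick the highest-precedence determination; the most recent wins among equals."""
    candidates = list(determinations)
    if not candidates:
        raise ValueError("no determinations recorded for this volume")
    return max(candidates, key=lambda d: (PRECEDENCE[d.source], d.recorded))


history = [
    Determination("vol-1", "in copyright", "bibliographic", date(2009, 5, 1)),
    Determination("vol-1", "public domain", "manual_review", date(2011, 3, 14)),
]
print(current_rights(history).rights)  # public domain
```

Ranking by source before date means a later automatic re-run (for example after a record edit) cannot silently undo a manual review, which is the main reason to have an explicit precedence at all.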

Lawful uses

• Users who have print disabilities
  – All in-copyright works in HathiTrust currently owned (or owned previously) by the partner institution
  – Must be authenticated
  – Must be on US soil
  – One simultaneous access per copy owned
  – http://www.hathitrust.org/accessibility
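The conditions above amount to an eligibility check plus a per-copy concurrency limit, much like a circulation rule. The sketch below assumes that print-disability certification, authentication, location and the number of owned copies are already known to the caller; how HathiTrust establishes those facts is not shown, and every name here is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class AccessRequest:
    volume_id: str
    certified_print_disabled: bool
    authenticated: bool
    in_united_states: bool


@dataclass
class AccessLedger:
    # Copies of each volume the partner owns or previously owned (hypothetical input).
    copies_owned: Dict[str, int]
    open_sessions: Dict[str, int] = field(default_factory=dict)

    def try_open(self, req: AccessRequest) -> bool:
        """Open a session only if every condition listed on the slide holds."""
        owned = self.copies_owned.get(req.volume_id, 0)
        if not (req.certified_print_disabled and req.authenticated
                and req.in_united_states and owned > 0):
            return False
        in_use = self.open_sessions.get(req.volume_id, 0)
        if in_use >= owned:
            return False  # one simultaneous access per copy owned
        self.open_sessions[req.volume_id] = in_use + 1
        return True

    def close(self, volume_id: str) -> None:
        """Release a session so the copy can be used by someone else."""
        if self.open_sessions.get(volume_id, 0) > 0:
            self.open_sessions[volume_id] -= 1


ledger = AccessLedger(copies_owned={"vol-1": 1})
req = AccessRequest("vol-1", certified_print_disabled=True, authenticated=True, in_united_states=True)
print(ledger.try_open(req))  # True
print(ledger.try_open(req))  # False: the single copy is already in use
ledger.close("vol-1")
print(ledger.try_open(req))  # True
```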

Lawful uses (2)

• Out of print and brittle, missing
  – Works must be currently owned (or owned previously) by the partner institution
  – Must be authenticated or accessing work from library premises
  – Must be on US soil
  – One simultaneous access per copy owned
  – http://www.hathitrust.org/out-of-print-brittle
• Access and use statements
  – http://www.hathitrust.org/access_use
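The out-of-print and brittle rule differs from the print-disability rule mainly in who may read: an authenticated user or anyone on library premises. A minimal sketch of just that eligibility predicate follows; it assumes the slide's "brittle, missing" means the library's copy is brittle or missing, and the function name and parameters are hypothetical. The one-access-per-copy limit would still apply on top of this check.

```python
from itertools import product


def oop_brittle_eligible(out_of_print: bool, copy_brittle_or_missing: bool,
                         owned_now_or_before: bool, authenticated: bool,
                         on_library_premises: bool, in_united_states: bool) -> bool:
    """Eligibility under the out-of-print/brittle conditions listed on the slide (sketch)."""
    work_qualifies = out_of_print and copy_brittle_or_missing
    # Unlike the print-disability rule, being on library premises can stand in for login.
    user_qualifies = (authenticated or on_library_premises) and in_united_states
    return work_qualifies and owned_now_or_before and user_qualifies


# Enumerate the user-side combinations for a qualifying work the library owns.
for auth, premises, in_us in product([True, False], repeat=3):
    ok = oop_brittle_eligible(True, True, True, auth, premises, in_us)
    print(f"authenticated={auth!s:5} on_premises={premises!s:5} in_us={in_us!s:5} -> {ok}")
```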

Outline

• The Big Idea
  – Mission and Goals

• What we're doing to get there
  – Repository and Content
  – Making content available
  – Organizational structure

• How HathiTrust can change the way we work

e-Commerce

Print on Demand

Content Ingest

Transformation

Validation

Content Access

PageTurner

Collection Builder

Large-scale Search

Bibliographic Catalog

Research Center

APIs

Quality Assurance

Quality Review

Content Certification

User Services

Usability

User support (helpdesk)

Outreach

Project website

Monthly newsletter

Papers and presentations

Communication with potential partners

Surveys, general inquiries

Repository evaluation and audit (e.g., DRAMBORA, TRAC)

Legal

Risk management (use of materials)

Partner agreements

Advocacy

Governance

Budget, Finances

Decision-making

Policy

Planning

Enterprise Management

Communication and Coordination with partner institutions

Project management

Repository Administration

Hardware configuration and maintenance

Web and application server configuration and maintenance

Security

Permissions

Logging

Repository Administration

Data management (content storage, backup, integrity checks, deletion)

Hardware selection and replacement

Content and Metadata specifications

Disaster Recovery

Processes for ensuring content integrity

Rights Management

Copyright determination

Copyright review

Copyright information management (database)

Rightsholder permissions

Bibliographic Data Management

Entity description (record-level)

Object identification (item-level)

Data availability

Collection Development

Digital
• Expansion beyond books and journals (born-digital, images and maps, audio)
• Selection of content (for non-Google volume ingest and pilot projects)

Print
• Cloud Library (effect of digital on print)

Financial contributions of partners

HathiTrust Functional Framework

HathiTrust

Strategic Advisory Board

Budget, Finances, Decision-making

Guidance on Policy, Planning

• Driven by needs of institutions
• Leverage across the partnership
• Projects: Print on Demand, Grant Work, Ingest Specifications, PageTurner, Bibliographic Data Management

Executive Committee

Collective Work: Working Groups and Committees





Source Archive

Editorial Market

Higher Education

Access Determinations

bull Automatedbull Manual

Automatic Rights Determination

bull Conducted on all works at time of ingest and when records are modifiedndash Public domain worldwide

bull US works published before 1923 US federal government publications non-US works published prior to 1873

ndash Public domain in the United Statesbull Non-US works published prior to 1923

Manual Rights Determination

bull IMLS-funded CRMS projectndash CRMS-US

bull 2008 US-published works 1923-1963bull Staff at 4 partner institutions

ndash CRMS-Worldbull 2011 Expanded to non-US worksbull Staff at 16 partner institutions

ndash Double review with additional expert review for conflictsndash Compliance with copyright formalitiesndash As of January 2013 241541 reviewed more than 132644

openedbull Rights Holder Permissions

bull System of Precedence

Rights Database

Bibliographic (automatic)

Manual

Lawful uses

bull Users who have print disabilitiesndash All in-copyright works in HathiTrust currently

owned (or owned previously) by the partner institution

ndash Must be authenticatedndash Must be on US soilndash One simultaneous access per copy ownedndash httpwwwhathitrustorgaccessibility

Lawful uses (2)

bull Out of print and brittle missingndash Works must be currently owned (or owned

previously) by the partner institutionndash Must be authenticated or accessing work from

library premisesndash Must be on US soilndash One simultaneous access per copy ownedndash httpwwwhathitrustorgout-of-print-brittle

bull Access and use statementsndash httpwwwhathitrustorgaccess_use

Outline

bull The Big Idea ndash Mission and Goals

bull What wersquore doing to get there ndash Repository and Content ndash Making content available ndash Organizational structure

bull How HathiTrust can change the way we work

e-Commerce

Print on Demand

Content Ingest

Transformation

Validation

Content Access

PageTurner

Collection Builder

Large-scale Search

Bibliographic Catalog

Research Center

APIs

Quality Assurance

Quality Review

Content Certification

User Services

Usability

User support (helpdesk)

Outreach

Project website

Monthly newsletter

Papers and presentations

Communication with potential

partners

Surveys general inquiries

Repository evaluation and

audit (eg DRAMBORA

TRAC)

Legal

Risk management (use of materials)

Partner agreements

Advocacy

Governance

Budget Finances

Decision-making

Policy

Planning

Enterprise Management

Communication and Coordination

with partner institutions

Project management

Repository Administration

Hardware configuration and

maintenance

Web and application server configuration and

maintenance

Security

Permissions

Logging

Repository Administration

Data management (content storage backup integrity checks deletion)

Hardware selection and replacement

Content and Metadata

specifications

Disaster Recovery

Processes for ensuring content

integrity

Rights Management

Copyright determination

Copyright review

Copyright information

management (database)

Rightsholder permissions

Bibliographic Data

Management

Entity description (record-level)

Object identification (item-

level)

Data availability

Collection Development

Digitalbull Expansion beyond

books and journals (born-digital images and maps audio)

bull Selection of content (for non-Google volume ingest and pilots projects)

Printbull Cloud Library (effect

of digital on print)

Financial contributions of partners

HathiTrust Functional Framework

HathiTrust

Strategic Advisory BoardBudgetFinances Decision-making

Guidance on Policy Planning

bull Driven by needs of institutionsbull Leverage across the partnershipbull Projects Print on Demand Grant Work Ingest Specifications

PageTurner Bibliographic Data Management

Executive Committee

Collective Work Working Groups and Committees

Operationalbull Communicationsbull User Supportbull User Experience

Operationalbull Communicationsbull User Supportbull User Experience

Strategicbull Collectionsbull Discovery Interfacebull Full-text Search

Distributed work

Constitutional Convention

bull October 2011bull 52 partnersbull 3-year review overseen by SABbull Ballot Proposals

ndash Print monograph storagendash Approval Process for development initiativesndash US Government Documentsndash Fee-for-service content depositndash Governance

HathiTrust

Executive Committee

Strategic Advisory

Board

BudgetFinancesDecision-making

Guidance on Policy Planning

bull 12-member Board of Governors

bull Chief Executive Officer bull Executive Committee

Governance

bull Efficient practicalbull Inclusive collective

Outline

bull The Big Idea ndash Mission and Goals

bull What wersquore doing to get there ndash Repository and Content ndash Making content available ndash Organizational structure

bull How HathiTrust can change the way we work

How HathiTrust Can Change the Way We Work

Seeing collective problems as collective

Breakdown of HathiTrust book corpus by publication date

Bibliographic Indeterminacy and the Scale of Problems and Opportunities of Rights in Digital Collection Building ndash 22011

42

19

20

19

Breakdown of HathiTrust book corpus by publication date

42

19

20

19

Copyright status of books published pre-1923 and US works published 1923-1963

42

19

20

Copyright status of books published pre-1923 and US works published 1923-1963

42

19

20

19

Copyright status of books published pre-1923 and US works published 1923-1963

In Print 42

19

20

19

bull Identificationbull Descriptionbull Rights

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic records

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic recordsndash Bib records and objects

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic recordsndash Bib records and objectsndash Digital objects

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic recordsndash Bib records and objectsndash Digital objectsndash Digital and print

Relationships

Understanding the relationship between the collective and local

1st model Price per GB

Total volumes and public domain volumes, by year:

• 2008: 2,477,871 total; 372,085 public domain
• 2009: 5,221,092 total; 758,947 public domain
• 2010: 7,836,698 total; 1,959,223 public domain
• 2011: 9,966,572 total; 2,712,626 public domain
• 2012 (Oct): 10,531,566 total; 3,218,132 public domain
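As a quick worked check on the growth figures above, the public domain share of the corpus can be computed for each year. This is an illustrative calculation, not part of the original slides; the counts are transcribed from the slide's chart data.

```python
# (total volumes, public domain volumes), transcribed from the slide.
VOLUMES = {
    "2008": (2_477_871, 372_085),
    "2009": (5_221_092, 758_947),
    "2010": (7_836_698, 1_959_223),
    "2011": (9_966_572, 2_712_626),
    "2012 (Oct)": (10_531_566, 3_218_132),
}


def public_domain_share(total: int, public_domain: int) -> float:
    """Return public domain volumes as a percentage of total volumes."""
    return 100.0 * public_domain / total


for year, (total, pd) in VOLUMES.items():
    print(f"{year:>10}: {public_domain_share(total, pd):5.1f}% public domain")
```

The share roughly doubles, from about 15% in 2008 to about 31% in October 2012, while total volumes grow more than fourfold.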

[Chart: % of titles in local collection (y-axis) by rank in 2008 ARL Investment Index (x-axis)]

A global change in the library environment

June 2010: median duplication 31%

June 2009: median duplication 19%

Academic print book collection already substantially duplicated in mass digitized book corpus

Courtesy of Constance Malpas, OCLC Research

Digitized Books in Shared Repositories

[Chart: unique titles, Sep 2009 to Jun 2010; series: mass digitized books in Hathi digital repository, mass digitized books in shared print repositories]

~75% of mass digitized corpus is 'backed up' in one or more shared print repositories (~3.5M titles in the Hathi repository; ~2.5M in shared print repositories)

Courtesy of Constance Malpas, OCLC Research

Collection Overlap

• More than 50% median overlap with ARL institutions; higher for small liberal arts colleges
• New pricing model based on print holdings
  – http://www.hathitrust.org/cost
  – Requires print holdings database
  – Also supports expansion of legal uses, efforts in de-duplication
  – Facilitates individual and collaborative collection development and management operations
• Print monographs archiving
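The overlap figure above can be made concrete with a short sketch. It assumes each library's print holdings and the HathiTrust corpus have already been reduced to sets of a shared match key (OCLC numbers stand in here, an assumption; real matching must cope with editions and multi-volume sets), then computes each library's overlap and the median across libraries. Function names and sample data are hypothetical.

```python
from statistics import median


def overlap_fraction(print_holdings: set[str], corpus: set[str]) -> float:
    """Fraction of a library's print titles that also appear in the digital corpus."""
    if not print_holdings:
        return 0.0
    return len(print_holdings & corpus) / len(print_holdings)


def median_overlap(holdings_by_library: dict[str, set[str]], corpus: set[str]) -> float:
    """Median of the per-library overlap fractions."""
    return median([overlap_fraction(h, corpus) for h in holdings_by_library.values()])


# Hypothetical match keys (standing in for OCLC numbers) and holdings.
corpus = {"1001", "1002", "1003", "1004"}
libraries = {
    "Library A": {"1001", "1002", "2001"},          # 2 of 3 titles overlap
    "Library B": {"1001", "2002"},                  # 1 of 2
    "Library C": {"1001", "1002", "1003", "2003"},  # 3 of 4
}
# The median of 1/2, 2/3 and 3/4 is 2/3.
print(f"median overlap: {median_overlap(libraries, corpus):.0%}")
```

A print holdings database is what supplies the per-library sets; the same overlap figure is the natural input to a holdings-based pricing model, though the actual cost formula is not given here.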

Sourcing and Scaling (http://orweblog.oclc.org/archives/002058.html)

• Scale
  – Institution-scale
  – Group-scale
  – Web-scale
• Sourcing
  – Institutional
  – Collaborative
  – Third-party

A new kind of library

Thank you

How to find out more

• About: http://www.hathitrust.org/about
• Twitter: http://twitter.com/hathitrust
• Facebook: http://www.facebook.com/hathitrust
• Monthly newsletter
  – http://www.hathitrust.org/updates
  – RSS: http://www.hathitrust.org/updates_rss
• Contact us: feedback@issues.hathitrust.org
• Blogs: http://www.hathitrust.org/blogs
  – Large-scale Search
  – Perspectives from HathiTrust

Access Determinations

• Automated
• Manual

Automatic Rights Determination

• Conducted on all works at time of ingest and when records are modified
  – Public domain worldwide
    • US works published before 1923
    • US federal government publications
    • Non-US works published prior to 1873
  – Public domain in the United States
    • Non-US works published prior to 1923
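The automatic rules above amount to a small decision procedure over a few bibliographic fields. The sketch below encodes them as listed; the field names, the `Rights` labels, and the fallback of sending everything else to manual review are assumptions for illustration, not HathiTrust's actual implementation or rights codes.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Rights(Enum):
    """Outcomes of the automatic pass (illustrative labels only)."""
    PD_WORLDWIDE = "public domain worldwide"
    PD_US = "public domain in the United States"
    UNDETERMINED = "needs manual review"  # assumed fallback


@dataclass
class BibRecord:
    """Hypothetical subset of bibliographic fields the rules depend on."""
    pub_year: Optional[int]  # None when the date is missing or unparseable
    published_in_us: bool
    us_federal_gov_doc: bool = False


def automatic_rights(rec: BibRecord) -> Rights:
    # US federal government publications: public domain regardless of date.
    if rec.us_federal_gov_doc:
        return Rights.PD_WORLDWIDE
    if rec.pub_year is None:
        return Rights.UNDETERMINED
    if rec.published_in_us:
        # US works published before 1923.
        return Rights.PD_WORLDWIDE if rec.pub_year < 1923 else Rights.UNDETERMINED
    # Non-US works: worldwide before 1873, US-only before 1923.
    if rec.pub_year < 1873:
        return Rights.PD_WORLDWIDE
    if rec.pub_year < 1923:
        return Rights.PD_US
    return Rights.UNDETERMINED
```

Because the pass re-runs whenever a record is modified, a corrected publication date or place in the bibliographic record can change a volume's outcome without any manual step.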

Manual Rights Determination

• IMLS-funded CRMS (Copyright Review Management System) project
  – CRMS-US
    • 2008: US-published works, 1923-1963
    • Staff at 4 partner institutions
  – CRMS-World
    • 2011: Expanded to non-US works
    • Staff at 16 partner institutions
  – Double review, with additional expert review for conflicts
  – Compliance with copyright formalities
  – As of January 2013: 241,541 reviewed; more than 132,644 opened
• Rights Holder Permissions
• System of Precedence
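For US works published 1923-1963, compliance with copyright formalities comes down largely to whether the work carried a copyright notice and whether its copyright was renewed. The sketch below is a simplified, hypothetical model of the double review: each reviewer records those two findings independently, matching determinations are accepted, and conflicts go to an expert. The `Finding` fields and labels are assumptions; real reviews weigh more evidence than two flags.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Finding:
    """One reviewer's findings for a US work published 1923-1963 (hypothetical fields)."""
    has_notice: bool  # a copyright notice appears in the published work
    renewed: bool     # a renewal registration was found for it


def determination(f: Finding) -> str:
    # Simplification: these works needed both a notice and a timely renewal;
    # failing either formality leaves the work in the public domain.
    return "in copyright" if f.has_notice and f.renewed else "public domain"


def double_review(a: Finding, b: Finding,
                  expert: Callable[[Finding, Finding], str]) -> str:
    """Accept matching determinations; escalate conflicts to an expert reviewer."""
    first, second = determination(a), determination(b)
    if first == second:
        return first
    return expert(a, b)
```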

Rights Database

Bibliographic (automatic)

Manual
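When the rights database holds more than one determination for a volume (bibliographic/automatic, manual review, or a rights holder permission), a system of precedence decides which applies. The slides name the sources but not their order, so the ranking below (rights holder over manual over bibliographic, most recent entry winning within a source) is an assumption, as are the record fields and labels.

```python
from dataclasses import dataclass
from datetime import date

# Assumed ranking, higher wins; not taken from the slides.
PRECEDENCE = {"bibliographic": 1, "manual": 2, "rightsholder": 3}


@dataclass(frozen=True)
class Determination:
    volume_id: str
    rights: str    # e.g. "public domain" or "in copyright" (illustrative labels)
    source: str    # one of the PRECEDENCE keys
    recorded: date


def effective_rights(determinations: list[Determination]) -> dict[str, Determination]:
    """Pick, per volume, the determination from the highest-precedence source;
    among entries from the same source, the most recent one wins."""
    best: dict[str, Determination] = {}
    for d in determinations:
        current = best.get(d.volume_id)
        if current is None or (PRECEDENCE[d.source], d.recorded) > (
            PRECEDENCE[current.source], current.recorded
        ):
            best[d.volume_id] = d
    return best
```

Keeping every determination and resolving at read time, rather than overwriting, preserves the audit trail of who decided what.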

Lawful uses

• Users who have print disabilities
  – All in-copyright works in HathiTrust currently owned (or owned previously) by the partner institution
  – Must be authenticated
  – Must be on US soil
  – One simultaneous access per copy owned
  – http://www.hathitrust.org/accessibility

Lawful uses (2)

• Out of print and brittle or missing
  – Works must be currently owned (or owned previously) by the partner institution
  – Must be authenticated or accessing work from library premises
  – Must be on US soil
  – One simultaneous access per copy owned
  – http://www.hathitrust.org/out-of-print-brittle
• Access and use statements
  – http://www.hathitrust.org/access_use
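Both lawful-use services share one gatekeeping pattern: location and authentication checks, then a cap of one simultaneous access per copy owned. A minimal sketch under stated assumptions: how location, authentication, and print-disability status are established is out of scope, the use-type strings are hypothetical, `copies_owned` is assumed to list only works that qualify, and a single counter per volume is shared across both services.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    authenticated: bool
    in_us: bool                       # e.g. from IP geolocation (assumption)
    on_library_premises: bool = False
    print_disabled: bool = False      # verified by the partner institution (assumption)


@dataclass
class AccessController:
    # volume id -> print copies the partner owns or previously owned
    copies_owned: dict[str, int]
    # volume id -> sessions currently open
    active: dict[str, int] = field(default_factory=dict)

    def eligible(self, user: User, use: str) -> bool:
        if not user.in_us:
            return False  # both services: must be on US soil
        if use == "print_disability":
            return user.authenticated and user.print_disabled
        if use == "out_of_print_brittle":
            return user.authenticated or user.on_library_premises
        return False      # unknown use type

    def try_open(self, user: User, volume_id: str, use: str) -> bool:
        """Open a session if the user is eligible and an owned copy is free."""
        if not self.eligible(user, use):
            return False
        in_use = self.active.get(volume_id, 0)
        if in_use >= self.copies_owned.get(volume_id, 0):
            return False  # one simultaneous access per copy owned
        self.active[volume_id] = in_use + 1
        return True

    def close(self, volume_id: str) -> None:
        """Release a session so another user can open the volume."""
        if self.active.get(volume_id, 0) > 0:
            self.active[volume_id] -= 1
```

With one owned copy, a second concurrent `try_open` on the same volume returns `False` until `close` releases the first session.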

Outline

• The Big Idea
  – Mission and Goals
• What we're doing to get there
  – Repository and Content
  – Making content available
  – Organizational structure
• How HathiTrust can change the way we work

HathiTrust Functional Framework

e-Commerce
• Print on Demand

Content Ingest
• Transformation
• Validation

Content Access
• PageTurner
• Collection Builder
• Large-scale Search
• Bibliographic Catalog
• Research Center
• APIs

Quality Assurance
• Quality Review
• Content Certification

User Services
• Usability
• User support (helpdesk)

Outreach
• Project website
• Monthly newsletter
• Papers and presentations
• Communication with potential partners
• Surveys, general inquiries
• Repository evaluation and audit (e.g., DRAMBORA, TRAC)

Legal
• Risk management (use of materials)
• Partner agreements
• Advocacy

Governance
• Budget, finances
• Decision-making
• Policy
• Planning

Enterprise Management
• Communication and coordination with partner institutions
• Project management

Repository Administration
• Hardware configuration and maintenance
• Web and application server configuration and maintenance
• Security
• Permissions
• Logging
• Data management (content storage, backup, integrity checks, deletion)
• Hardware selection and replacement
• Content and metadata specifications
• Disaster recovery
• Processes for ensuring content integrity

Rights Management
• Copyright determination
• Copyright review
• Copyright information management (database)
• Rightsholder permissions

Bibliographic Data Management
• Entity description (record-level)
• Object identification (item-level)
• Data availability

Collection Development
• Digital
  – Expansion beyond books and journals (born-digital, images and maps, audio)
  – Selection of content (for non-Google volume ingest and pilot projects)
• Print
  – Cloud Library (effect of digital on print)
• Financial contributions of partners

HathiTrust

Strategic Advisory Board

Budget, finances, decision-making

Guidance on policy, planning

• Driven by needs of institutions
• Leverage across the partnership
• Projects: Print on Demand, Grant Work, Ingest Specifications, PageTurner, Bibliographic Data Management

Executive Committee

Collective work: Working Groups and Committees

Operational
• Communications
• User Support
• User Experience

Strategic
• Collections
• Discovery Interface
• Full-text Search

Distributed work

Constitutional Convention

• October 2011
• 52 partners
• 3-year review overseen by SAB
• Ballot Proposals
  – Print monograph storage
  – Approval Process for development initiatives
  – US Government Documents
  – Fee-for-service content deposit
  – Governance



Manual Rights Determination

bull IMLS-funded CRMS projectndash CRMS-US

bull 2008 US-published works 1923-1963bull Staff at 4 partner institutions

ndash CRMS-Worldbull 2011 Expanded to non-US worksbull Staff at 16 partner institutions

ndash Double review with additional expert review for conflictsndash Compliance with copyright formalitiesndash As of January 2013 241541 reviewed more than 132644

openedbull Rights Holder Permissions

bull System of Precedence

Rights Database

Bibliographic (automatic)

Manual

Lawful uses

bull Users who have print disabilitiesndash All in-copyright works in HathiTrust currently

owned (or owned previously) by the partner institution

ndash Must be authenticatedndash Must be on US soilndash One simultaneous access per copy ownedndash httpwwwhathitrustorgaccessibility

Lawful uses (2)

bull Out of print and brittle missingndash Works must be currently owned (or owned

previously) by the partner institutionndash Must be authenticated or accessing work from

library premisesndash Must be on US soilndash One simultaneous access per copy ownedndash httpwwwhathitrustorgout-of-print-brittle

bull Access and use statementsndash httpwwwhathitrustorgaccess_use

Outline

bull The Big Idea ndash Mission and Goals

bull What wersquore doing to get there ndash Repository and Content ndash Making content available ndash Organizational structure

bull How HathiTrust can change the way we work

e-Commerce

Print on Demand

Content Ingest

Transformation

Validation

Content Access

PageTurner

Collection Builder

Large-scale Search

Bibliographic Catalog

Research Center

APIs

Quality Assurance

Quality Review

Content Certification

User Services

Usability

User support (helpdesk)

Outreach

Project website

Monthly newsletter

Papers and presentations

Communication with potential

partners

Surveys general inquiries

Repository evaluation and

audit (eg DRAMBORA

TRAC)

Legal

Risk management (use of materials)

Partner agreements

Advocacy

Governance

Budget Finances

Decision-making

Policy

Planning

Enterprise Management

Communication and Coordination

with partner institutions

Project management

Repository Administration

Hardware configuration and

maintenance

Web and application server configuration and

maintenance

Security

Permissions

Logging

Repository Administration

Data management (content storage backup integrity checks deletion)

Hardware selection and replacement

Content and Metadata

specifications

Disaster Recovery

Processes for ensuring content

integrity

Rights Management

Copyright determination

Copyright review

Copyright information

management (database)

Rightsholder permissions

Bibliographic Data

Management

Entity description (record-level)

Object identification (item-

level)

Data availability

Collection Development

Digitalbull Expansion beyond

books and journals (born-digital images and maps audio)

bull Selection of content (for non-Google volume ingest and pilots projects)

Printbull Cloud Library (effect

of digital on print)

Financial contributions of partners

HathiTrust Functional Framework

HathiTrust

Strategic Advisory BoardBudgetFinances Decision-making

Guidance on Policy Planning

bull Driven by needs of institutionsbull Leverage across the partnershipbull Projects Print on Demand Grant Work Ingest Specifications

PageTurner Bibliographic Data Management

Executive Committee

Collective Work Working Groups and Committees

Operationalbull Communicationsbull User Supportbull User Experience

Operationalbull Communicationsbull User Supportbull User Experience

Strategicbull Collectionsbull Discovery Interfacebull Full-text Search

Distributed work

Constitutional Convention

bull October 2011bull 52 partnersbull 3-year review overseen by SABbull Ballot Proposals

ndash Print monograph storagendash Approval Process for development initiativesndash US Government Documentsndash Fee-for-service content depositndash Governance

HathiTrust

Executive Committee

Strategic Advisory

Board

BudgetFinancesDecision-making

Guidance on Policy Planning

bull 12-member Board of Governors

bull Chief Executive Officer bull Executive Committee

Governance

bull Efficient practicalbull Inclusive collective

Outline

bull The Big Idea ndash Mission and Goals

bull What wersquore doing to get there ndash Repository and Content ndash Making content available ndash Organizational structure

bull How HathiTrust can change the way we work

How HathiTrust Can Change the Way We Work

Seeing collective problems as collective

Breakdown of HathiTrust book corpus by publication date

Bibliographic Indeterminacy and the Scale of Problems and Opportunities of Rights in Digital Collection Building ndash 22011

42

19

20

19

Breakdown of HathiTrust book corpus by publication date

42

19

20

19

Copyright status of books published pre-1923 and US works published 1923-1963

42

19

20

Copyright status of books published pre-1923 and US works published 1923-1963

42

19

20

19

Copyright status of books published pre-1923 and US works published 1923-1963

In Print 42

19

20

19

bull Identificationbull Descriptionbull Rights

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic records

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic recordsndash Bib records and objects

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic recordsndash Bib records and objectsndash Digital objects

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic recordsndash Bib records and objectsndash Digital objectsndash Digital and print

Relationships

Understanding the relationship between the collective and local

1st model Price per GB

2008 2009 2010 2011 2012 (Oct)Total Volumes 2477871 5221092 7836698 9966572 10531566

Public Domain 372085 758947 1959223 2712626 3218132

2008 2009 2010 2011 2012 (Oct)0

2000000

4000000

6000000

8000000

10000000

12000000

Total VolumesPublic Domain

0 20 40 60 80 100 1200

10

20

30

40

50

60

Rank in 2008 ARL Investment Index

o

f Tit

les

in L

ocal

Col

lecti

on

A global change in the library environment

June 2010Median duplication 31

June 2009Median duplication 19

Academic print book collection already substantially duplicated in mass digitized book corpus

Courtesy of Constance Malpas OCLC Research

Digitized Books in Shared Repositories

Sep-09 Oct-09 Nov-09 Dec-09 Jan-10 Feb-10 Mar-10 Apr-10 May-10 Jun-100

500000

1000000

1500000

2000000

2500000

3000000

3500000

Mass digitized books in Hathi digital repository Mass digitized books in shared print repositories

Uni

que

Titl

es

~75 of mass digitized corpus is lsquobacked uprsquo in one or more shared print repositories

~35M titles

~25M

Courtesy of Constance Malpas OCLC Research

Collection Overlap

bull More than 50 median overlap with ARL institutions higher for small liberal arts colleges

bull New Pricing model based on Print holdingsndash httpwwwhathitrustorgcostndash Requires print holdings databasendash Also support expansion of legal uses efforts in de-

duplicationndash Facilitate individual and collaborative collection

development and management operationsbull Print monographs archiving

Sourcing and Scalinghttporweblogoclcorgarchives002058html

bull Scalendash Institution-scalendash Group-scalendash Web-scale

bull Sourcingndash Institutionalndash Collaborativendash Third-party

A new kind of library

Thank you

How to find out more

• About: http://www.hathitrust.org/about
• Twitter: http://twitter.com/hathitrust
• Facebook: http://www.facebook.com/hathitrust
• Monthly newsletter
  – http://www.hathitrust.org/updates
  – RSS: http://www.hathitrust.org/updates_rss
• Contact us: feedback@issues.hathitrust.org
• Blogs: http://www.hathitrust.org/blogs
  – Large-scale Search
  – Perspectives from HathiTrust


Rights Database

• System of precedence
  – Bibliographic (automatic)
  – Manual
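The slide shows two sources of rights determinations feeding the Rights Database, with a system of precedence between them. A minimal sketch of such a precedence rule, assuming (our assumption, not stated on the slide) that a manual determination overrides an automatic bibliographic one and that, within the same source, the later determination wins. The rights codes pd and ic are illustrative:

```python
from dataclasses import dataclass

# Assumed precedence: a higher rank wins. Manual review outranking the
# automatic bibliographic determination is our assumption for illustration.
PRECEDENCE = {"bibliographic": 1, "manual": 2}


@dataclass(frozen=True)
class Determination:
    volume_id: str  # hypothetical volume identifier
    attribute: str  # illustrative rights code, e.g. "pd" or "ic"
    source: str     # "bibliographic" (automatic) or "manual"


def effective_rights(determinations: list[Determination]) -> dict[str, str]:
    """Pick the winning rights attribute per volume.

    Determinations are assumed to arrive in chronological order, so among
    determinations of equal precedence the later one replaces the earlier.
    """
    winners: dict[str, Determination] = {}
    for det in determinations:
        current = winners.get(det.volume_id)
        if current is None or PRECEDENCE[det.source] >= PRECEDENCE[current.source]:
            winners[det.volume_id] = det
    return {vid: det.attribute for vid, det in winners.items()}


if __name__ == "__main__":
    history = [
        Determination("vol1", "ic", "bibliographic"),
        Determination("vol1", "pd", "manual"),
        Determination("vol2", "pd", "bibliographic"),
    ]
    print(effective_rights(history))  # {'vol1': 'pd', 'vol2': 'pd'}
```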

Lawful uses

• Users who have print disabilities
  – All in-copyright works in HathiTrust currently owned (or owned previously) by the partner institution
  – Must be authenticated
  – Must be on US soil
  – One simultaneous access per copy owned
  – http://www.hathitrust.org/accessibility
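The condition "one simultaneous access per copy owned" behaves like a lending limit. A minimal sketch of how it could be enforced, assuming a hypothetical table that maps each volume to the number of print copies the institution owns or previously owned; the authentication and US-location checks are assumed to happen before this step:

```python
class SimultaneousAccessLimiter:
    """Allow at most one open session per print copy owned (hypothetical sketch)."""

    def __init__(self, copies_owned: dict[str, int]) -> None:
        # volume id -> number of print copies currently or previously owned
        self.copies_owned = copies_owned
        # volume id -> ids of users who currently hold a session
        self.active: dict[str, set[str]] = {}

    def open(self, volume_id: str, user_id: str) -> bool:
        """Start a session if a copy is free; return False when none is available."""
        users = self.active.setdefault(volume_id, set())
        if user_id in users:
            return True  # the user already holds one of the copies
        if len(users) >= self.copies_owned.get(volume_id, 0):
            return False  # every owned copy is in use, or none is owned
        users.add(user_id)
        return True

    def close(self, volume_id: str, user_id: str) -> None:
        """End a session and free the copy for the next user."""
        self.active.get(volume_id, set()).discard(user_id)


if __name__ == "__main__":
    limiter = SimultaneousAccessLimiter({"vol1": 1})
    print(limiter.open("vol1", "alice"))  # True
    print(limiter.open("vol1", "bob"))    # False: the single copy is in use
    limiter.close("vol1", "alice")
    print(limiter.open("vol1", "bob"))    # True
```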

Lawful uses (2)

• Out of print and brittle/missing
  – Works must be currently owned (or owned previously) by the partner institution
  – Must be authenticated or accessing work from library premises
  – Must be on US soil
  – One simultaneous access per copy owned
  – http://www.hathitrust.org/out-of-print-brittle
• Access and use statements
  – http://www.hathitrust.org/access_use
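The two lawful uses share most conditions and differ mainly in who may use a work and from where: print-disabled users must be authenticated, while out-of-print and brittle works may also be read on library premises. A minimal sketch encoding both rule sets as one predicate; the field names and use labels are hypothetical, and the simultaneous-access limit is left to a separate check:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessContext:
    owned_by_partner: bool  # currently owned, or owned previously
    on_us_soil: bool
    authenticated: bool
    on_library_premises: bool
    user_print_disabled: bool = False
    work_out_of_print_and_brittle: bool = False  # condition determined by the partner


def lawful_use_allowed(use: str, ctx: AccessContext) -> bool:
    """Return True if the request satisfies the conditions listed for `use`."""
    common = ctx.owned_by_partner and ctx.on_us_soil
    if use == "print_disabled":
        return common and ctx.user_print_disabled and ctx.authenticated
    if use == "out_of_print_brittle":
        return (
            common
            and ctx.work_out_of_print_and_brittle
            and (ctx.authenticated or ctx.on_library_premises)
        )
    raise ValueError(f"unknown lawful use: {use!r}")
```

Under these rules a walk-in reader on library premises qualifies for an out-of-print and brittle work without logging in, but not under the print-disability use.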

Outline

• The Big Idea
  – Mission and Goals
• What we're doing to get there
  – Repository and Content
  – Making content available
  – Organizational structure
• How HathiTrust can change the way we work

HathiTrust Functional Framework

• e-Commerce: Print on Demand
• Content Ingest: Transformation; Validation
• Content Access: PageTurner; Collection Builder; Large-scale Search; Bibliographic Catalog; Research Center; APIs
• Quality Assurance: Quality Review; Content Certification
• User Services: Usability; User support (helpdesk)
• Outreach: Project website; Monthly newsletter; Papers and presentations; Communication with potential partners; Surveys, general inquiries
• Repository evaluation and audit (e.g. DRAMBORA, TRAC)
• Legal: Risk management (use of materials); Partner agreements; Advocacy
• Governance: Budget, Finances; Decision-making; Policy; Planning
• Enterprise Management: Communication and Coordination with partner institutions; Project management
• Repository Administration: Hardware configuration and maintenance; Web and application server configuration and maintenance; Security; Permissions; Logging
• Repository Administration: Data management (content storage, backup, integrity checks, deletion); Hardware selection and replacement; Content and Metadata specifications; Disaster Recovery; Processes for ensuring content integrity
• Rights Management: Copyright determination; Copyright review; Copyright information management (database); Rightsholder permissions
• Bibliographic Data Management: Entity description (record-level); Object identification (item-level); Data availability
• Collection Development
  – Digital: Expansion beyond books and journals (born-digital, images and maps, audio); Selection of content (for non-Google volume ingest and pilot projects)
  – Print: Cloud Library (effect of digital on print)
• Financial contributions of partners
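Several items in the framework (validation at ingest, integrity checks, processes for ensuring content integrity) come down to fixity checking: recomputing a file's checksum and comparing it with the value recorded at deposit. A generic Python sketch; the manifest format is hypothetical, and MD5 is used only because it is widely available, not as a claim about which digest HathiTrust uses:

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute a file's MD5 digest without loading the whole file into memory."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(manifest: dict[str, str], root: Path) -> list[str]:
    """Return manifest entries that are missing on disk or whose checksum changed."""
    failures = []
    for rel_path, expected in manifest.items():
        path = root / rel_path
        if not path.is_file() or md5_of(path) != expected.lower():
            failures.append(rel_path)
    return failures


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "00000001.txt").write_bytes(b"hello")
        manifest = {
            "00000001.txt": "5d41402abc4b2a76b9719d911017c592",  # MD5 of b"hello"
            "00000002.txt": "0" * 32,                            # never written
        }
        print(verify_fixity(manifest, root))  # ['00000002.txt']
```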

HathiTrust

• Executive Committee: Budget, Finances, Decision-making
• Strategic Advisory Board: Guidance on Policy, Planning

Distributed work
• Driven by needs of institutions
• Leverage across the partnership
• Projects: Print on Demand, Grant Work, Ingest Specifications, PageTurner, Bibliographic Data Management

Collective Work: Working Groups and Committees
• Operational: Communications, User Support, User Experience
• Strategic: Collections, Discovery Interface, Full-text Search

Constitutional Convention

• October 2011
• 52 partners
• 3-year review overseen by SAB
• Ballot proposals
  – Print monograph storage
  – Approval process for development initiatives
  – US Government Documents
  – Fee-for-service content deposit
  – Governance

HathiTrust

• Executive Committee: Budget, Finances, Decision-making
• Strategic Advisory Board: Guidance on Policy, Planning

• 12-member Board of Governors
• Chief Executive Officer
• Executive Committee

Governance

• Efficient, practical
• Inclusive, collective

Outline

• The Big Idea
  – Mission and Goals
• What we're doing to get there
  – Repository and Content
  – Making content available
  – Organizational structure
• How HathiTrust can change the way we work

How HathiTrust Can Change the Way We Work

Seeing collective problems as collective

Breakdown of HathiTrust book corpus by publication date

(Chart: segments of 42%, 19%, 20% and 19%. Source: Bibliographic Indeterminacy and the Scale of Problems and Opportunities of Rights in Digital Collection Building, 2/2011)

Copyright status of books published pre-1923 and US works published 1923-1963

(Chart: segments of 42%, 19%, 20% and 19%; label: In Print)
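The copyright-status chart splits works by publication date because US rules at the time hinged on it: works published before 1923 were in the public domain in the US, while US works published from 1923 to 1963 stayed in copyright only if the copyright was renewed. A deliberately simplified sketch of that triage as of 2012; it ignores notice requirements, unpublished works and non-US terms, and the status labels are ours:

```python
from typing import Optional


def us_status_2012(pub_year: int, published_in_us: bool,
                   renewed: Optional[bool] = None) -> str:
    """Simplified US copyright triage as of 2012.

    Returns "pd" (public domain), "ic" (in copyright) or "review"
    (a renewal search is still needed).
    """
    if pub_year < 1923:
        return "pd"
    if published_in_us and pub_year <= 1963:
        if renewed is None:
            return "review"  # depends on a renewal search not yet done
        return "ic" if renewed else "pd"
    # Later US works and post-1922 non-US works: treated as in copyright here.
    return "ic"
```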

Relationships

• Identification
• Description
• Rights
• Relationships
  – Bibliographic records
  – Bib records and objects
  – Digital objects
  – Digital and print
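The Relationships build moves from bibliographic records, to records and their digital objects, to links between digital and print copies. One way to model those links, as a hypothetical sketch (class and field names are ours, and print holdings are assumed to be matched on OCLC numbers):

```python
from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    item_id: str  # item-level identifier (hypothetical format)
    rights: str   # illustrative rights code, e.g. "pd" or "ic"


@dataclass
class BibRecord:
    record_id: str
    title: str
    oclc_numbers: set[str] = field(default_factory=set)
    items: list[DigitalObject] = field(default_factory=list)  # digitized copies


def print_holders(record: BibRecord, holdings: dict[str, set[str]]) -> set[str]:
    """Institutions whose print holdings share an OCLC number with the record.

    `holdings` maps an institution name to the OCLC numbers it holds in print.
    """
    return {inst for inst, numbers in holdings.items() if record.oclc_numbers & numbers}


if __name__ == "__main__":
    record = BibRecord("000001", "Example title", {"12345"},
                       [DigitalObject("inst.0001", "pd"), DigitalObject("inst.0002", "pd")])
    holdings = {"Library A": {"12345", "67890"}, "Library B": {"11111"}}
    print(print_holders(record, holdings))  # {'Library A'}
```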

Understanding the relationship between the collective and local

1st model Price per GB

2008 2009 2010 2011 2012 (Oct)Total Volumes 2477871 5221092 7836698 9966572 10531566

Public Domain 372085 758947 1959223 2712626 3218132

2008 2009 2010 2011 2012 (Oct)0

2000000

4000000

6000000

8000000

10000000

12000000

Total VolumesPublic Domain

0 20 40 60 80 100 1200

10

20

30

40

50

60

Rank in 2008 ARL Investment Index

o

f Tit

les

in L

ocal

Col

lecti

on

A global change in the library environment

June 2010Median duplication 31

June 2009Median duplication 19

Academic print book collection already substantially duplicated in mass digitized book corpus

Courtesy of Constance Malpas OCLC Research

Digitized Books in Shared Repositories

Sep-09 Oct-09 Nov-09 Dec-09 Jan-10 Feb-10 Mar-10 Apr-10 May-10 Jun-100

500000

1000000

1500000

2000000

2500000

3000000

3500000

Mass digitized books in Hathi digital repository Mass digitized books in shared print repositories

Uni

que

Titl

es

~75 of mass digitized corpus is lsquobacked uprsquo in one or more shared print repositories

~35M titles

~25M

Courtesy of Constance Malpas OCLC Research

Collection Overlap

bull More than 50 median overlap with ARL institutions higher for small liberal arts colleges

bull New Pricing model based on Print holdingsndash httpwwwhathitrustorgcostndash Requires print holdings databasendash Also support expansion of legal uses efforts in de-

duplicationndash Facilitate individual and collaborative collection

development and management operationsbull Print monographs archiving

Sourcing and Scalinghttporweblogoclcorgarchives002058html

bull Scalendash Institution-scalendash Group-scalendash Web-scale

bull Sourcingndash Institutionalndash Collaborativendash Third-party

A new kind of library

Thank you

How to find out more

bull About httpwwwhathitrustorgaboutbull Twitter httptwittercomhathitrustbull Facebook httpwwwfacebookcomhathitrustbull Monthly newsletter

ndash httpwwwhathitrustorgupdatesndash RSS httpwwwhathitrustorgupdates_rss

bull Contact us feedbackissueshathitrustorgbull Blogs httpwwwhathitrustorgblogs

ndash Large-scale Searchndash Perspectives from HathiTrust

  • Your Library Now Online Putting HathiTrust in the Context of
  • Outline
  • The Big Idea
  • Partnership
  • Digital Repository
  • The Name
  • Mission
  • HathiTrust
  • Collections and Collaboration
  • What we are doing to get there
  • Slide 11
  • Slide 12
  • Repository and Content
  • Content Sources
  • Language Distribution (1)
  • Language Distribution (2)
  • Dates
  • Copyright Distribution
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Making Content Available
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • APIs
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Datasets
  • Research Center
  • Slide 54
  • Slide 55
  • Access Determinations
  • Automatic Rights Determination
  • Manual Rights Determination
  • Rights Database
  • Lawful uses
  • Lawful uses (2)
  • Outline (2)
  • Slide 63
  • Slide 64
  • Constitutional Convention
  • Slide 66
  • Governance
  • Outline (3)
  • How HathiTrust Can Change the Way We Work
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Relationships
  • Relationships (2)
  • Relationships (3)
  • Relationships (4)
  • Relationships (5)
  • Slide 81
  • Slide 82
  • Slide 83
  • A global change in the library environment
  • Digitized Books in Shared Repositories
  • Collection Overlap
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Thank you
  • How to find out more
Page 56: HATHITRUST A Shared Digital Repository Your Library, Now Online! Putting HathiTrust in the Context of Traditional (and New) Library Services MCLS Webinar

Lawful uses

bull Users who have print disabilitiesndash All in-copyright works in HathiTrust currently

owned (or owned previously) by the partner institution

ndash Must be authenticatedndash Must be on US soilndash One simultaneous access per copy ownedndash httpwwwhathitrustorgaccessibility

Lawful uses (2)

bull Out of print and brittle missingndash Works must be currently owned (or owned

previously) by the partner institutionndash Must be authenticated or accessing work from

library premisesndash Must be on US soilndash One simultaneous access per copy ownedndash httpwwwhathitrustorgout-of-print-brittle

bull Access and use statementsndash httpwwwhathitrustorgaccess_use

Outline

bull The Big Idea ndash Mission and Goals

bull What wersquore doing to get there ndash Repository and Content ndash Making content available ndash Organizational structure

bull How HathiTrust can change the way we work

e-Commerce

Print on Demand

Content Ingest

Transformation

Validation

Content Access

PageTurner

Collection Builder

Large-scale Search

Bibliographic Catalog

Research Center

APIs

Quality Assurance

Quality Review

Content Certification

User Services

Usability

User support (helpdesk)

Outreach

Project website

Monthly newsletter

Papers and presentations

Communication with potential

partners

Surveys general inquiries

Repository evaluation and

audit (eg DRAMBORA

TRAC)

Legal

Risk management (use of materials)

Partner agreements

Advocacy

Governance

Budget Finances

Decision-making

Policy

Planning

Enterprise Management

Communication and Coordination

with partner institutions

Project management

Repository Administration

Hardware configuration and

maintenance

Web and application server configuration and

maintenance

Security

Permissions

Logging

Repository Administration

Data management (content storage backup integrity checks deletion)

Hardware selection and replacement

Content and Metadata

specifications

Disaster Recovery

Processes for ensuring content

integrity

Rights Management

Copyright determination

Copyright review

Copyright information

management (database)

Rightsholder permissions

Bibliographic Data

Management

Entity description (record-level)

Object identification (item-

level)

Data availability

Collection Development

Digitalbull Expansion beyond

books and journals (born-digital images and maps audio)

bull Selection of content (for non-Google volume ingest and pilots projects)

Printbull Cloud Library (effect

of digital on print)

Financial contributions of partners

HathiTrust Functional Framework

HathiTrust

Strategic Advisory BoardBudgetFinances Decision-making

Guidance on Policy Planning

bull Driven by needs of institutionsbull Leverage across the partnershipbull Projects Print on Demand Grant Work Ingest Specifications

PageTurner Bibliographic Data Management

Executive Committee

Collective Work Working Groups and Committees

Operationalbull Communicationsbull User Supportbull User Experience

Operationalbull Communicationsbull User Supportbull User Experience

Strategicbull Collectionsbull Discovery Interfacebull Full-text Search

Distributed work

Constitutional Convention

bull October 2011bull 52 partnersbull 3-year review overseen by SABbull Ballot Proposals

ndash Print monograph storagendash Approval Process for development initiativesndash US Government Documentsndash Fee-for-service content depositndash Governance

HathiTrust

Executive Committee

Strategic Advisory

Board

BudgetFinancesDecision-making

Guidance on Policy Planning

bull 12-member Board of Governors

bull Chief Executive Officer bull Executive Committee

Governance

bull Efficient practicalbull Inclusive collective

Outline

bull The Big Idea ndash Mission and Goals

bull What wersquore doing to get there ndash Repository and Content ndash Making content available ndash Organizational structure

bull How HathiTrust can change the way we work

How HathiTrust Can Change the Way We Work

Seeing collective problems as collective

Breakdown of HathiTrust book corpus by publication date

Bibliographic Indeterminacy and the Scale of Problems and Opportunities of Rights in Digital Collection Building ndash 22011

42

19

20

19

Breakdown of HathiTrust book corpus by publication date

42

19

20

19

Copyright status of books published pre-1923 and US works published 1923-1963

42

19

20

Copyright status of books published pre-1923 and US works published 1923-1963

42

19

20

19

Copyright status of books published pre-1923 and US works published 1923-1963

In Print 42

19

20

19

bull Identificationbull Descriptionbull Rights

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic records

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic recordsndash Bib records and objects

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic recordsndash Bib records and objectsndash Digital objects

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic recordsndash Bib records and objectsndash Digital objectsndash Digital and print

Relationships

Understanding the relationship between the collective and local

1st model Price per GB

2008 2009 2010 2011 2012 (Oct)Total Volumes 2477871 5221092 7836698 9966572 10531566

Public Domain 372085 758947 1959223 2712626 3218132

2008 2009 2010 2011 2012 (Oct)0

2000000

4000000

6000000

8000000

10000000

12000000

Total VolumesPublic Domain

0 20 40 60 80 100 1200

10

20

30

40

50

60

Rank in 2008 ARL Investment Index

o

f Tit

les

in L

ocal

Col

lecti

on

A global change in the library environment

June 2010Median duplication 31

June 2009Median duplication 19

Academic print book collection already substantially duplicated in mass digitized book corpus

Courtesy of Constance Malpas OCLC Research

Digitized Books in Shared Repositories

Sep-09 Oct-09 Nov-09 Dec-09 Jan-10 Feb-10 Mar-10 Apr-10 May-10 Jun-100

500000

1000000

1500000

2000000

2500000

3000000

3500000

Mass digitized books in Hathi digital repository Mass digitized books in shared print repositories

Uni

que

Titl

es

~75 of mass digitized corpus is lsquobacked uprsquo in one or more shared print repositories

~35M titles

~25M

Courtesy of Constance Malpas OCLC Research

Collection Overlap

bull More than 50 median overlap with ARL institutions higher for small liberal arts colleges

bull New Pricing model based on Print holdingsndash httpwwwhathitrustorgcostndash Requires print holdings databasendash Also support expansion of legal uses efforts in de-

duplicationndash Facilitate individual and collaborative collection

development and management operationsbull Print monographs archiving

Sourcing and Scalinghttporweblogoclcorgarchives002058html

bull Scalendash Institution-scalendash Group-scalendash Web-scale

bull Sourcingndash Institutionalndash Collaborativendash Third-party

A new kind of library

Thank you

How to find out more

bull About httpwwwhathitrustorgaboutbull Twitter httptwittercomhathitrustbull Facebook httpwwwfacebookcomhathitrustbull Monthly newsletter

ndash httpwwwhathitrustorgupdatesndash RSS httpwwwhathitrustorgupdates_rss

bull Contact us feedbackissueshathitrustorgbull Blogs httpwwwhathitrustorgblogs

ndash Large-scale Searchndash Perspectives from HathiTrust

  • Your Library Now Online Putting HathiTrust in the Context of
  • Outline
  • The Big Idea
  • Partnership
  • Digital Repository
  • The Name
  • Mission
  • HathiTrust
  • Collections and Collaboration
  • What we are doing to get there
  • Slide 11
  • Slide 12
  • Repository and Content
  • Content Sources
  • Language Distribution (1)
  • Language Distribution (2)
  • Dates
  • Copyright Distribution
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Making Content Available
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • APIs
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Datasets
  • Research Center
  • Slide 54
  • Slide 55
  • Access Determinations
  • Automatic Rights Determination
  • Manual Rights Determination
  • Rights Database
  • Lawful uses
  • Lawful uses (2)
  • Outline (2)
  • Slide 63
  • Slide 64
  • Constitutional Convention
  • Slide 66
  • Governance
  • Outline (3)
  • How HathiTrust Can Change the Way We Work
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Relationships
  • Relationships (2)
  • Relationships (3)
  • Relationships (4)
  • Relationships (5)
  • Slide 81
  • Slide 82
  • Slide 83
  • A global change in the library environment
  • Digitized Books in Shared Repositories
  • Collection Overlap
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Thank you
  • How to find out more
Page 57: HATHITRUST A Shared Digital Repository Your Library, Now Online! Putting HathiTrust in the Context of Traditional (and New) Library Services MCLS Webinar

Lawful uses (2)

bull Out of print and brittle missingndash Works must be currently owned (or owned

previously) by the partner institutionndash Must be authenticated or accessing work from

library premisesndash Must be on US soilndash One simultaneous access per copy ownedndash httpwwwhathitrustorgout-of-print-brittle

bull Access and use statementsndash httpwwwhathitrustorgaccess_use

Outline

bull The Big Idea ndash Mission and Goals

bull What wersquore doing to get there ndash Repository and Content ndash Making content available ndash Organizational structure

bull How HathiTrust can change the way we work

e-Commerce

Print on Demand

Content Ingest

Transformation

Validation

Content Access

PageTurner

Collection Builder

Large-scale Search

Bibliographic Catalog

Research Center

APIs

Quality Assurance

Quality Review

Content Certification

User Services

Usability

User support (helpdesk)

Outreach

Project website

Monthly newsletter

Papers and presentations

Communication with potential

partners

Surveys general inquiries

Repository evaluation and

audit (eg DRAMBORA

TRAC)

Legal

Risk management (use of materials)

Partner agreements

Advocacy

Governance

Budget Finances

Decision-making

Policy

Planning

Enterprise Management

Communication and Coordination

with partner institutions

Project management

Repository Administration

Hardware configuration and

maintenance

Web and application server configuration and

maintenance

Security

Permissions

Logging

Repository Administration

Data management (content storage backup integrity checks deletion)

Hardware selection and replacement

Content and Metadata

specifications

Disaster Recovery

Processes for ensuring content

integrity

Rights Management

Copyright determination

Copyright review

Copyright information

management (database)

Rightsholder permissions

Bibliographic Data

Management

Entity description (record-level)

Object identification (item-

level)

Data availability

Collection Development

Digitalbull Expansion beyond

books and journals (born-digital images and maps audio)

bull Selection of content (for non-Google volume ingest and pilots projects)

Printbull Cloud Library (effect

of digital on print)

Financial contributions of partners

HathiTrust Functional Framework

HathiTrust

Strategic Advisory BoardBudgetFinances Decision-making

Guidance on Policy Planning

bull Driven by needs of institutionsbull Leverage across the partnershipbull Projects Print on Demand Grant Work Ingest Specifications

PageTurner Bibliographic Data Management

Executive Committee

Collective Work Working Groups and Committees

Operationalbull Communicationsbull User Supportbull User Experience

Operationalbull Communicationsbull User Supportbull User Experience

Strategicbull Collectionsbull Discovery Interfacebull Full-text Search

Distributed work

Constitutional Convention

bull October 2011bull 52 partnersbull 3-year review overseen by SABbull Ballot Proposals

ndash Print monograph storagendash Approval Process for development initiativesndash US Government Documentsndash Fee-for-service content depositndash Governance

HathiTrust

Executive Committee

Strategic Advisory

Board

BudgetFinancesDecision-making

Guidance on Policy Planning

bull 12-member Board of Governors

bull Chief Executive Officer bull Executive Committee

Governance

bull Efficient practicalbull Inclusive collective

Outline

bull The Big Idea ndash Mission and Goals

bull What wersquore doing to get there ndash Repository and Content ndash Making content available ndash Organizational structure

bull How HathiTrust can change the way we work

How HathiTrust Can Change the Way We Work

Seeing collective problems as collective

Breakdown of HathiTrust book corpus by publication date

Bibliographic Indeterminacy and the Scale of Problems and Opportunities of Rights in Digital Collection Building ndash 22011

42

19

20

19

Breakdown of HathiTrust book corpus by publication date

42

19

20

19

Copyright status of books published pre-1923 and US works published 1923-1963

42

19

20

Copyright status of books published pre-1923 and US works published 1923-1963

42

19

20

19

Copyright status of books published pre-1923 and US works published 1923-1963

In Print 42

19

20

19

bull Identificationbull Descriptionbull Rights

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic records

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic recordsndash Bib records and objects

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic recordsndash Bib records and objectsndash Digital objects

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic recordsndash Bib records and objectsndash Digital objectsndash Digital and print

Relationships

Understanding the relationship between the collective and local

1st model Price per GB

2008 2009 2010 2011 2012 (Oct)Total Volumes 2477871 5221092 7836698 9966572 10531566

Public Domain 372085 758947 1959223 2712626 3218132

2008 2009 2010 2011 2012 (Oct)0

2000000

4000000

6000000

8000000

10000000

12000000

Total VolumesPublic Domain

0 20 40 60 80 100 1200

10

20

30

40

50

60

Rank in 2008 ARL Investment Index

o

f Tit

les

in L

ocal

Col

lecti

on

A global change in the library environment

June 2010Median duplication 31

June 2009Median duplication 19

Academic print book collection already substantially duplicated in mass digitized book corpus

Courtesy of Constance Malpas OCLC Research

Digitized Books in Shared Repositories

Sep-09 Oct-09 Nov-09 Dec-09 Jan-10 Feb-10 Mar-10 Apr-10 May-10 Jun-100

500000

1000000

1500000

2000000

2500000

3000000

3500000

Mass digitized books in Hathi digital repository Mass digitized books in shared print repositories

Uni

que

Titl

es

~75 of mass digitized corpus is lsquobacked uprsquo in one or more shared print repositories

~35M titles

~25M

Courtesy of Constance Malpas OCLC Research

Collection Overlap

bull More than 50 median overlap with ARL institutions higher for small liberal arts colleges

bull New Pricing model based on Print holdingsndash httpwwwhathitrustorgcostndash Requires print holdings databasendash Also support expansion of legal uses efforts in de-

duplicationndash Facilitate individual and collaborative collection

development and management operationsbull Print monographs archiving

Sourcing and Scalinghttporweblogoclcorgarchives002058html

bull Scalendash Institution-scalendash Group-scalendash Web-scale

bull Sourcingndash Institutionalndash Collaborativendash Third-party

A new kind of library

Thank you

How to find out more

bull About httpwwwhathitrustorgaboutbull Twitter httptwittercomhathitrustbull Facebook httpwwwfacebookcomhathitrustbull Monthly newsletter

ndash httpwwwhathitrustorgupdatesndash RSS httpwwwhathitrustorgupdates_rss

bull Contact us feedbackissueshathitrustorgbull Blogs httpwwwhathitrustorgblogs

ndash Large-scale Searchndash Perspectives from HathiTrust

  • Your Library Now Online Putting HathiTrust in the Context of
  • Outline
  • The Big Idea
  • Partnership
  • Digital Repository
  • The Name
  • Mission
  • HathiTrust
  • Collections and Collaboration
  • What we are doing to get there
  • Slide 11
  • Slide 12
  • Repository and Content
  • Content Sources
  • Language Distribution (1)
  • Language Distribution (2)
  • Dates
  • Copyright Distribution
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Making Content Available
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • APIs
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Datasets
  • Research Center
  • Slide 54
  • Slide 55
  • Access Determinations
  • Automatic Rights Determination
  • Manual Rights Determination
  • Rights Database
  • Lawful uses
  • Lawful uses (2)
  • Outline (2)
  • Slide 63
  • Slide 64
  • Constitutional Convention
  • Slide 66
  • Governance
  • Outline (3)
  • How HathiTrust Can Change the Way We Work
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Relationships
  • Relationships (2)
  • Relationships (3)
  • Relationships (4)
  • Relationships (5)
  • Slide 81
  • Slide 82
  • Slide 83
  • A global change in the library environment
  • Digitized Books in Shared Repositories
  • Collection Overlap
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Thank you
  • How to find out more
Page 58: HATHITRUST A Shared Digital Repository Your Library, Now Online! Putting HathiTrust in the Context of Traditional (and New) Library Services MCLS Webinar

Outline

bull The Big Idea ndash Mission and Goals

bull What wersquore doing to get there ndash Repository and Content ndash Making content available ndash Organizational structure

bull How HathiTrust can change the way we work

HathiTrust Functional Framework

e-Commerce
• Print on Demand

Content Ingest
• Transformation
• Validation

Content Access
• PageTurner
• Collection Builder
• Large-scale Search
• Bibliographic Catalog
• Research Center
• APIs

Quality Assurance
• Quality Review
• Content Certification

User Services
• Usability
• User support (helpdesk)

Outreach
• Project website
• Monthly newsletter
• Papers and presentations
• Communication with potential partners
• Surveys, general inquiries
• Repository evaluation and audit (e.g., DRAMBORA, TRAC)

Legal
• Risk management (use of materials)
• Partner agreements
• Advocacy

Governance
• Budget, finances
• Decision-making
• Policy
• Planning

Enterprise Management
• Communication and coordination with partner institutions
• Project management

Repository Administration
• Hardware configuration and maintenance
• Web and application server configuration and maintenance
• Security
• Permissions
• Logging

Repository Administration
• Data management (content storage, backup, integrity checks, deletion)
• Hardware selection and replacement
• Content and Metadata specifications
• Disaster Recovery
• Processes for ensuring content integrity

Rights Management
• Copyright determination
• Copyright review
• Copyright information management (database)
• Rightsholder permissions

Bibliographic Data Management
• Entity description (record-level)
• Object identification (item-level)
• Data availability

Collection Development
Digital
• Expansion beyond books and journals (born-digital, images and maps, audio)
• Selection of content (for non-Google volume ingest and pilot projects)
Print
• Cloud Library (effect of digital on print)

Financial contributions of partners

HathiTrust

Strategic Advisory Board

Budget, Finances, Decision-making

Guidance on Policy, Planning

• Driven by needs of institutions
• Leverage across the partnership
• Projects: Print on Demand, Grant Work, Ingest Specifications, PageTurner, Bibliographic Data Management

Executive Committee

Collective Work: Working Groups and Committees

Operational
• Communications
• User Support
• User Experience

Strategic
• Collections
• Discovery Interface
• Full-text Search

Distributed work

Constitutional Convention

• October 2011
• 52 partners
• 3-year review overseen by SAB
• Ballot Proposals
  – Print monograph storage
  – Approval process for development initiatives
  – US Government Documents
  – Fee-for-service content deposit
  – Governance

HathiTrust

Executive Committee

Strategic Advisory Board

Budget, Finances, Decision-making

Guidance on Policy, Planning

• 12-member Board of Governors
• Chief Executive Officer
• Executive Committee

Governance

• Efficient, practical
• Inclusive, collective

Outline

• The Big Idea
  – Mission and Goals
• What we’re doing to get there
  – Repository and Content
  – Making content available
  – Organizational structure
• How HathiTrust can change the way we work

How HathiTrust Can Change the Way We Work

Seeing collective problems as collective

Breakdown of HathiTrust book corpus by publication date

Bibliographic Indeterminacy and the Scale of Problems and Opportunities of Rights in Digital Collection Building – 2/2011

[Chart segments: 42%, 19%, 20%, 19%]

Copyright status of books published pre-1923 and US works published 1923-1963

[Chart segments: In Print 42%, 19%, 20%, 19%]
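The copyright-status chart singles out pre-1923 books and US works published 1923-1963 because they fall into different rights situations. A deliberately simplified Python sketch of that binning follows, assuming each record carries only a publication year and a US/non-US flag. `Record` and `rights_bin` are hypothetical names, the 1923 cutoff reflects US law as of this 2012 webinar, and real determinations (copyright notice, renewal records, place of publication) are far more involved.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical, minimal bibliographic record: publication year and place only."""
    pub_year: int
    us_publication: bool


def rights_bin(rec: Record) -> str:
    """Assign a record to one of the coarse bins used on the slide (simplified)."""
    if rec.pub_year < 1923:
        # Published before 1923: public domain in the US as of 2012.
        return "pre-1923: public domain"
    if rec.us_publication and rec.pub_year <= 1963:
        # Many US works from 1923-1963 entered the public domain because their
        # copyright was never renewed, so each one needs renewal research.
        return "US 1923-1963: needs renewal research"
    # Everything else is treated as in copyright in this simplified sketch.
    return "presumed in copyright"


if __name__ == "__main__":
    for rec in [Record(1900, False), Record(1950, True), Record(1950, False), Record(1970, True)]:
        print(rec, "->", rights_bin(rec))
```

The point of the sketch is that only the middle bin requires per-title research, which is why it matters at the scale of a shared corpus.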

Relationships

• Identification
• Description
• Rights
• Relationships
  – Bibliographic records
  – Bib records and objects
  – Digital objects
  – Digital and print

Understanding the relationship between the collective and local

1st model: Price per GB

Year          Total Volumes    Public Domain
2008              2,477,871          372,085
2009              5,221,092          758,947
2010              7,836,698        1,959,223
2011              9,966,572        2,712,626
2012 (Oct)       10,531,566        3,218,132

[Chart: Total Volumes and Public Domain, 2008 to 2012 (Oct)]
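As a quick check on the growth figures above, the public-domain share of the corpus can be computed directly from the table. This is a minimal Python sketch. The function names are illustrative only, and the counts are copied from the slide, including its "2012 (Oct)" label.

```python
# Volume counts copied from the slide: period -> (total volumes, public domain volumes).
VOLUMES: dict[str, tuple[int, int]] = {
    "2008": (2477871, 372085),
    "2009": (5221092, 758947),
    "2010": (7836698, 1959223),
    "2011": (9966572, 2712626),
    "2012 (Oct)": (10531566, 3218132),
}


def public_domain_share(total: int, public_domain: int) -> float:
    """Return the public-domain fraction of the corpus as a percentage."""
    return 100.0 * public_domain / total


def summarize(volumes: dict[str, tuple[int, int]]) -> list[tuple[str, float]]:
    """Percentage of volumes that are public domain, rounded to one decimal, per period."""
    return [(period, round(public_domain_share(total, pd), 1))
            for period, (total, pd) in volumes.items()]


if __name__ == "__main__":
    for period, pct in summarize(VOLUMES):
        print(f"{period}: {pct}% public domain")
```

On these figures the public-domain share rises from about 15% in 2008 to about 30.6% by "2012 (Oct)", even as the total corpus grows more than fourfold.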

[Chart: % of Titles in Local Collection vs. Rank in 2008 ARL Investment Index]

A global change in the library environment

June 2010: median duplication 31%

June 2009: median duplication 19%

Academic print book collection already substantially duplicated in mass digitized book corpus

Courtesy of Constance Malpas, OCLC Research

Digitized Books in Shared Repositories

[Chart: unique titles, Sep-09 to Jun-10; series: mass digitized books in Hathi digital repository, mass digitized books in shared print repositories]

~75% of mass digitized corpus is ‘backed up’ in one or more shared print repositories

~3.5M titles

~2.5M

Courtesy of Constance Malpas, OCLC Research

Collection Overlap

• More than 50% median overlap with ARL institutions, higher for small liberal arts colleges
• New pricing model based on print holdings
  – http://www.hathitrust.org/cost
  – Requires print holdings database
  – Also supports expansion of legal uses, efforts in de-duplication
  – Facilitates individual and collaborative collection development and management operations
• Print monographs archiving
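The overlap figure on this slide is a per-institution percentage summarized by a median. The Python sketch below shows that calculation under loud assumptions: each library's print holdings have already been matched to the same title identifiers used for the shared corpus, and the identifiers, library names, and function names are all hypothetical. Real matching against a print holdings database is the hard part and is not shown.

```python
from statistics import median


def overlap_percent(local_titles: set[str], shared_titles: set[str]) -> float:
    """Share (%) of one library's titles that also appear in the shared corpus."""
    if not local_titles:
        return 0.0  # avoid dividing by zero for an empty holdings list
    return 100.0 * len(local_titles & shared_titles) / len(local_titles)


def median_overlap(holdings: dict[str, set[str]], shared_titles: set[str]) -> float:
    """Median of the per-library overlap percentages across all libraries."""
    return median(overlap_percent(titles, shared_titles) for titles in holdings.values())


if __name__ == "__main__":
    # Hypothetical, already-matched title identifiers.
    shared = {"t1", "t2", "t3", "t4"}
    holdings = {
        "lib_a": {"t1", "t2", "t9"},        # 2 of 3 titles overlap
        "lib_b": {"t1", "t5"},              # 1 of 2 titles overlap
        "lib_c": {"t2", "t3", "t4", "t8"},  # 3 of 4 titles overlap
    }
    print(f"median overlap: {median_overlap(holdings, shared):.1f}%")
```

Using the median rather than the mean keeps a few very large or very small collections from dominating the summary figure.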

Sourcing and Scaling (http://orweblog.oclc.org/archives/002058.html)

• Scale
  – Institution-scale
  – Group-scale
  – Web-scale

• Sourcing
  – Institutional
  – Collaborative
  – Third-party

A new kind of library

Thank you

How to find out more

• About: http://www.hathitrust.org/about
• Twitter: http://twitter.com/hathitrust
• Facebook: http://www.facebook.com/hathitrust
• Monthly newsletter
  – http://www.hathitrust.org/updates
  – RSS: http://www.hathitrust.org/updates_rss
• Contact us: feedback@issues.hathitrust.org
• Blogs: http://www.hathitrust.org/blogs
  – Large-scale Search
  – Perspectives from HathiTrust

  • Your Library Now Online Putting HathiTrust in the Context of
  • Outline
  • The Big Idea
  • Partnership
  • Digital Repository
  • The Name
  • Mission
  • HathiTrust
  • Collections and Collaboration
  • What we are doing to get there
  • Slide 11
  • Slide 12
  • Repository and Content
  • Content Sources
  • Language Distribution (1)
  • Language Distribution (2)
  • Dates
  • Copyright Distribution
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Making Content Available
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • APIs
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Datasets
  • Research Center
  • Slide 54
  • Slide 55
  • Access Determinations
  • Automatic Rights Determination
  • Manual Rights Determination
  • Rights Database
  • Lawful uses
  • Lawful uses (2)
  • Outline (2)
  • Slide 63
  • Slide 64
  • Constitutional Convention
  • Slide 66
  • Governance
  • Outline (3)
  • How HathiTrust Can Change the Way We Work
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Relationships
  • Relationships (2)
  • Relationships (3)
  • Relationships (4)
  • Relationships (5)
  • Slide 81
  • Slide 82
  • Slide 83
  • A global change in the library environment
  • Digitized Books in Shared Repositories
  • Collection Overlap
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Thank you
  • How to find out more
Page 59: HATHITRUST A Shared Digital Repository Your Library, Now Online! Putting HathiTrust in the Context of Traditional (and New) Library Services MCLS Webinar

e-Commerce

Print on Demand

Content Ingest

Transformation

Validation

Content Access

PageTurner

Collection Builder

Large-scale Search

Bibliographic Catalog

Research Center

APIs

Quality Assurance

Quality Review

Content Certification

User Services

Usability

User support (helpdesk)

Outreach

Project website

Monthly newsletter

Papers and presentations

Communication with potential

partners

Surveys general inquiries

Repository evaluation and

audit (eg DRAMBORA

TRAC)

Legal

Risk management (use of materials)

Partner agreements

Advocacy

Governance

Budget Finances

Decision-making

Policy

Planning

Enterprise Management

Communication and Coordination

with partner institutions

Project management

Repository Administration

Hardware configuration and

maintenance

Web and application server configuration and

maintenance

Security

Permissions

Logging

Repository Administration

Data management (content storage backup integrity checks deletion)

Hardware selection and replacement

Content and Metadata

specifications

Disaster Recovery

Processes for ensuring content

integrity

Rights Management

Copyright determination

Copyright review

Copyright information

management (database)

Rightsholder permissions

Bibliographic Data

Management

Entity description (record-level)

Object identification (item-

level)

Data availability

Collection Development

Digitalbull Expansion beyond

books and journals (born-digital images and maps audio)

bull Selection of content (for non-Google volume ingest and pilots projects)

Printbull Cloud Library (effect

of digital on print)

Financial contributions of partners

HathiTrust Functional Framework

HathiTrust

Strategic Advisory BoardBudgetFinances Decision-making

Guidance on Policy Planning

bull Driven by needs of institutionsbull Leverage across the partnershipbull Projects Print on Demand Grant Work Ingest Specifications

PageTurner Bibliographic Data Management

Executive Committee

Collective Work Working Groups and Committees

Operationalbull Communicationsbull User Supportbull User Experience

Operationalbull Communicationsbull User Supportbull User Experience

Strategicbull Collectionsbull Discovery Interfacebull Full-text Search

Distributed work

Constitutional Convention

bull October 2011bull 52 partnersbull 3-year review overseen by SABbull Ballot Proposals

ndash Print monograph storagendash Approval Process for development initiativesndash US Government Documentsndash Fee-for-service content depositndash Governance

HathiTrust

Executive Committee

Strategic Advisory

Board

BudgetFinancesDecision-making

Guidance on Policy Planning

bull 12-member Board of Governors

bull Chief Executive Officer bull Executive Committee

Governance

bull Efficient practicalbull Inclusive collective

Outline

bull The Big Idea ndash Mission and Goals

bull What wersquore doing to get there ndash Repository and Content ndash Making content available ndash Organizational structure

bull How HathiTrust can change the way we work

How HathiTrust Can Change the Way We Work

Seeing collective problems as collective

Breakdown of HathiTrust book corpus by publication date

Bibliographic Indeterminacy and the Scale of Problems and Opportunities of Rights in Digital Collection Building ndash 22011

42

19

20

19

Breakdown of HathiTrust book corpus by publication date

42

19

20

19

Copyright status of books published pre-1923 and US works published 1923-1963

42

19

20

Copyright status of books published pre-1923 and US works published 1923-1963

42

19

20

19

Copyright status of books published pre-1923 and US works published 1923-1963

In Print 42

19

20

19

bull Identificationbull Descriptionbull Rights

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic records

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic recordsndash Bib records and objects

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic recordsndash Bib records and objectsndash Digital objects

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic recordsndash Bib records and objectsndash Digital objectsndash Digital and print

Relationships

Understanding the relationship between the collective and local

1st model Price per GB

2008 2009 2010 2011 2012 (Oct)Total Volumes 2477871 5221092 7836698 9966572 10531566

Public Domain 372085 758947 1959223 2712626 3218132

2008 2009 2010 2011 2012 (Oct)0

2000000

4000000

6000000

8000000

10000000

12000000

Total VolumesPublic Domain

0 20 40 60 80 100 1200

10

20

30

40

50

60

Rank in 2008 ARL Investment Index

o

f Tit

les

in L

ocal

Col

lecti

on

A global change in the library environment

June 2010Median duplication 31

June 2009Median duplication 19

Academic print book collection already substantially duplicated in mass digitized book corpus

Courtesy of Constance Malpas OCLC Research

Digitized Books in Shared Repositories

Sep-09 Oct-09 Nov-09 Dec-09 Jan-10 Feb-10 Mar-10 Apr-10 May-10 Jun-100

500000

1000000

1500000

2000000

2500000

3000000

3500000

Mass digitized books in Hathi digital repository Mass digitized books in shared print repositories

Uni

que

Titl

es

~75 of mass digitized corpus is lsquobacked uprsquo in one or more shared print repositories

~35M titles

~25M

Courtesy of Constance Malpas OCLC Research

Collection Overlap

bull More than 50 median overlap with ARL institutions higher for small liberal arts colleges

bull New Pricing model based on Print holdingsndash httpwwwhathitrustorgcostndash Requires print holdings databasendash Also support expansion of legal uses efforts in de-

duplicationndash Facilitate individual and collaborative collection

development and management operationsbull Print monographs archiving

Sourcing and Scalinghttporweblogoclcorgarchives002058html

bull Scalendash Institution-scalendash Group-scalendash Web-scale

bull Sourcingndash Institutionalndash Collaborativendash Third-party

A new kind of library

Thank you

How to find out more

bull About httpwwwhathitrustorgaboutbull Twitter httptwittercomhathitrustbull Facebook httpwwwfacebookcomhathitrustbull Monthly newsletter

ndash httpwwwhathitrustorgupdatesndash RSS httpwwwhathitrustorgupdates_rss

bull Contact us feedbackissueshathitrustorgbull Blogs httpwwwhathitrustorgblogs

ndash Large-scale Searchndash Perspectives from HathiTrust

  • Your Library Now Online Putting HathiTrust in the Context of
  • Outline
  • The Big Idea
  • Partnership
  • Digital Repository
  • The Name
  • Mission
  • HathiTrust
  • Collections and Collaboration
  • What we are doing to get there
  • Slide 11
  • Slide 12
  • Repository and Content
  • Content Sources
  • Language Distribution (1)
  • Language Distribution (2)
  • Dates
  • Copyright Distribution
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Making Content Available
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • APIs
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Datasets
  • Research Center
  • Slide 54
  • Slide 55
  • Access Determinations
  • Automatic Rights Determination
  • Manual Rights Determination
  • Rights Database
  • Lawful uses
  • Lawful uses (2)
  • Outline (2)
  • Slide 63
  • Slide 64
  • Constitutional Convention
  • Slide 66
  • Governance
  • Outline (3)
  • How HathiTrust Can Change the Way We Work
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Relationships
  • Relationships (2)
  • Relationships (3)
  • Relationships (4)
  • Relationships (5)
  • Slide 81
  • Slide 82
  • Slide 83
  • A global change in the library environment
  • Digitized Books in Shared Repositories
  • Collection Overlap
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Thank you
  • How to find out more
Page 60: HATHITRUST A Shared Digital Repository Your Library, Now Online! Putting HathiTrust in the Context of Traditional (and New) Library Services MCLS Webinar

HathiTrust

Strategic Advisory BoardBudgetFinances Decision-making

Guidance on Policy Planning

bull Driven by needs of institutionsbull Leverage across the partnershipbull Projects Print on Demand Grant Work Ingest Specifications

PageTurner Bibliographic Data Management

Executive Committee

Collective Work Working Groups and Committees

Operationalbull Communicationsbull User Supportbull User Experience

Operationalbull Communicationsbull User Supportbull User Experience

Strategicbull Collectionsbull Discovery Interfacebull Full-text Search

Distributed work

Constitutional Convention

bull October 2011bull 52 partnersbull 3-year review overseen by SABbull Ballot Proposals

ndash Print monograph storagendash Approval Process for development initiativesndash US Government Documentsndash Fee-for-service content depositndash Governance

HathiTrust

Executive Committee

Strategic Advisory

Board

BudgetFinancesDecision-making

Guidance on Policy Planning

bull 12-member Board of Governors

bull Chief Executive Officer bull Executive Committee

Governance

bull Efficient practicalbull Inclusive collective

Outline

bull The Big Idea ndash Mission and Goals

bull What wersquore doing to get there ndash Repository and Content ndash Making content available ndash Organizational structure

bull How HathiTrust can change the way we work

How HathiTrust Can Change the Way We Work

Seeing collective problems as collective

Breakdown of HathiTrust book corpus by publication date

Bibliographic Indeterminacy and the Scale of Problems and Opportunities of Rights in Digital Collection Building ndash 22011

42

19

20

19

Breakdown of HathiTrust book corpus by publication date

42

19

20

19

Copyright status of books published pre-1923 and US works published 1923-1963

42

19

20

Copyright status of books published pre-1923 and US works published 1923-1963

42

19

20

19

Copyright status of books published pre-1923 and US works published 1923-1963

In Print 42

19

20

19

bull Identificationbull Descriptionbull Rights

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic records

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic recordsndash Bib records and objects

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic recordsndash Bib records and objectsndash Digital objects

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic recordsndash Bib records and objectsndash Digital objectsndash Digital and print

Relationships

Understanding the relationship between the collective and local

1st model Price per GB

2008 2009 2010 2011 2012 (Oct)Total Volumes 2477871 5221092 7836698 9966572 10531566

Public Domain 372085 758947 1959223 2712626 3218132

2008 2009 2010 2011 2012 (Oct)0

2000000

4000000

6000000

8000000

10000000

12000000

Total VolumesPublic Domain

0 20 40 60 80 100 1200

10

20

30

40

50

60

Rank in 2008 ARL Investment Index

o

f Tit

les

in L

ocal

Col

lecti

on

A global change in the library environment

June 2010Median duplication 31

June 2009Median duplication 19

Academic print book collection already substantially duplicated in mass digitized book corpus

Courtesy of Constance Malpas OCLC Research

Digitized Books in Shared Repositories

Sep-09 Oct-09 Nov-09 Dec-09 Jan-10 Feb-10 Mar-10 Apr-10 May-10 Jun-100

500000

1000000

1500000

2000000

2500000

3000000

3500000

Mass digitized books in Hathi digital repository Mass digitized books in shared print repositories

Uni

que

Titl

es

~75 of mass digitized corpus is lsquobacked uprsquo in one or more shared print repositories

~35M titles

~25M

Courtesy of Constance Malpas OCLC Research

Collection Overlap

bull More than 50 median overlap with ARL institutions higher for small liberal arts colleges

bull New Pricing model based on Print holdingsndash httpwwwhathitrustorgcostndash Requires print holdings databasendash Also support expansion of legal uses efforts in de-

duplicationndash Facilitate individual and collaborative collection

development and management operationsbull Print monographs archiving

Sourcing and Scalinghttporweblogoclcorgarchives002058html

bull Scalendash Institution-scalendash Group-scalendash Web-scale

bull Sourcingndash Institutionalndash Collaborativendash Third-party

A new kind of library

Thank you

How to find out more

bull About httpwwwhathitrustorgaboutbull Twitter httptwittercomhathitrustbull Facebook httpwwwfacebookcomhathitrustbull Monthly newsletter

ndash httpwwwhathitrustorgupdatesndash RSS httpwwwhathitrustorgupdates_rss

bull Contact us feedbackissueshathitrustorgbull Blogs httpwwwhathitrustorgblogs

ndash Large-scale Searchndash Perspectives from HathiTrust

  • Your Library Now Online Putting HathiTrust in the Context of
  • Outline
  • The Big Idea
  • Partnership
  • Digital Repository
  • The Name
  • Mission
  • HathiTrust
  • Collections and Collaboration
  • What we are doing to get there
  • Slide 11
  • Slide 12
  • Repository and Content
  • Content Sources
  • Language Distribution (1)
  • Language Distribution (2)
  • Dates
  • Copyright Distribution
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Making Content Available
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • APIs
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Datasets
  • Research Center
  • Slide 54
  • Slide 55
  • Access Determinations
  • Automatic Rights Determination
  • Manual Rights Determination
  • Rights Database
  • Lawful uses
  • Lawful uses (2)
  • Outline (2)
  • Slide 63
  • Slide 64
  • Constitutional Convention
  • Slide 66
  • Governance
  • Outline (3)
  • How HathiTrust Can Change the Way We Work
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Relationships
  • Relationships (2)
  • Relationships (3)
  • Relationships (4)
  • Relationships (5)
  • Slide 81
  • Slide 82
  • Slide 83
  • A global change in the library environment
  • Digitized Books in Shared Repositories
  • Collection Overlap
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Thank you
  • How to find out more
Page 61: HATHITRUST A Shared Digital Repository Your Library, Now Online! Putting HathiTrust in the Context of Traditional (and New) Library Services MCLS Webinar

Constitutional Convention

bull October 2011bull 52 partnersbull 3-year review overseen by SABbull Ballot Proposals

ndash Print monograph storagendash Approval Process for development initiativesndash US Government Documentsndash Fee-for-service content depositndash Governance

HathiTrust

Executive Committee

Strategic Advisory

Board

BudgetFinancesDecision-making

Guidance on Policy Planning

bull 12-member Board of Governors

bull Chief Executive Officer bull Executive Committee

Governance

bull Efficient practicalbull Inclusive collective

Outline

bull The Big Idea ndash Mission and Goals

bull What wersquore doing to get there ndash Repository and Content ndash Making content available ndash Organizational structure

bull How HathiTrust can change the way we work

How HathiTrust Can Change the Way We Work

Seeing collective problems as collective

Breakdown of HathiTrust book corpus by publication date

Bibliographic Indeterminacy and the Scale of Problems and Opportunities of Rights in Digital Collection Building ndash 22011

42

19

20

19

Breakdown of HathiTrust book corpus by publication date

42

19

20

19

Copyright status of books published pre-1923 and US works published 1923-1963

42

19

20

Copyright status of books published pre-1923 and US works published 1923-1963

42

19

20

19

Copyright status of books published pre-1923 and US works published 1923-1963

In Print 42

19

20

19

bull Identificationbull Descriptionbull Rights

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic records

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic recordsndash Bib records and objects

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic recordsndash Bib records and objectsndash Digital objects

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic recordsndash Bib records and objectsndash Digital objectsndash Digital and print

Relationships

Understanding the relationship between the collective and local

1st model Price per GB

2008 2009 2010 2011 2012 (Oct)Total Volumes 2477871 5221092 7836698 9966572 10531566

Public Domain 372085 758947 1959223 2712626 3218132

2008 2009 2010 2011 2012 (Oct)0

2000000

4000000

6000000

8000000

10000000

12000000

Total VolumesPublic Domain

0 20 40 60 80 100 1200

10

20

30

40

50

60

Rank in 2008 ARL Investment Index

o

f Tit

les

in L

ocal

Col

lecti

on

A global change in the library environment

June 2010Median duplication 31

June 2009Median duplication 19

Academic print book collection already substantially duplicated in mass digitized book corpus

Courtesy of Constance Malpas OCLC Research

Digitized Books in Shared Repositories

Sep-09 Oct-09 Nov-09 Dec-09 Jan-10 Feb-10 Mar-10 Apr-10 May-10 Jun-100

500000

1000000

1500000

2000000

2500000

3000000

3500000

Mass digitized books in Hathi digital repository Mass digitized books in shared print repositories

Uni

que

Titl

es

~75 of mass digitized corpus is lsquobacked uprsquo in one or more shared print repositories

~35M titles

~25M

Courtesy of Constance Malpas OCLC Research

Collection Overlap

bull More than 50 median overlap with ARL institutions higher for small liberal arts colleges

bull New Pricing model based on Print holdingsndash httpwwwhathitrustorgcostndash Requires print holdings databasendash Also support expansion of legal uses efforts in de-

duplicationndash Facilitate individual and collaborative collection

development and management operationsbull Print monographs archiving

Sourcing and Scalinghttporweblogoclcorgarchives002058html

bull Scalendash Institution-scalendash Group-scalendash Web-scale

bull Sourcingndash Institutionalndash Collaborativendash Third-party

A new kind of library

Thank you

How to find out more

bull About httpwwwhathitrustorgaboutbull Twitter httptwittercomhathitrustbull Facebook httpwwwfacebookcomhathitrustbull Monthly newsletter

ndash httpwwwhathitrustorgupdatesndash RSS httpwwwhathitrustorgupdates_rss

bull Contact us feedbackissueshathitrustorgbull Blogs httpwwwhathitrustorgblogs

ndash Large-scale Searchndash Perspectives from HathiTrust

  • Your Library Now Online Putting HathiTrust in the Context of
  • Outline
  • The Big Idea
  • Partnership
  • Digital Repository
  • The Name
  • Mission
  • HathiTrust
  • Collections and Collaboration
  • What we are doing to get there
  • Slide 11
  • Slide 12
  • Repository and Content
  • Content Sources
  • Language Distribution (1)
  • Language Distribution (2)
  • Dates
  • Copyright Distribution
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Making Content Available
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • APIs
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Datasets
  • Research Center
  • Slide 54
  • Slide 55
  • Access Determinations
  • Automatic Rights Determination
  • Manual Rights Determination
  • Rights Database
  • Lawful uses
  • Lawful uses (2)
  • Outline (2)
  • Slide 63
  • Slide 64
  • Constitutional Convention
  • Slide 66
  • Governance
  • Outline (3)
  • How HathiTrust Can Change the Way We Work
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Relationships
  • Relationships (2)
  • Relationships (3)
  • Relationships (4)
  • Relationships (5)
  • Slide 81
  • Slide 82
  • Slide 83
  • A global change in the library environment
  • Digitized Books in Shared Repositories
  • Collection Overlap
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Thank you
  • How to find out more
Page 62: HATHITRUST A Shared Digital Repository Your Library, Now Online! Putting HathiTrust in the Context of Traditional (and New) Library Services MCLS Webinar

HathiTrust

Executive Committee

Strategic Advisory

Board

BudgetFinancesDecision-making

Guidance on Policy Planning

bull 12-member Board of Governors

bull Chief Executive Officer bull Executive Committee

Governance

bull Efficient practicalbull Inclusive collective

Outline

bull The Big Idea ndash Mission and Goals

bull What wersquore doing to get there ndash Repository and Content ndash Making content available ndash Organizational structure

bull How HathiTrust can change the way we work

How HathiTrust Can Change the Way We Work

Seeing collective problems as collective

Breakdown of HathiTrust book corpus by publication date

Bibliographic Indeterminacy and the Scale of Problems and Opportunities of Rights in Digital Collection Building ndash 22011

42

19

20

19

Breakdown of HathiTrust book corpus by publication date

42

19

20

19

Copyright status of books published pre-1923 and US works published 1923-1963

42

19

20

Copyright status of books published pre-1923 and US works published 1923-1963

42

19

20

19

Copyright status of books published pre-1923 and US works published 1923-1963

In Print 42

19

20

19

bull Identificationbull Descriptionbull Rights

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic records

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic recordsndash Bib records and objects

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

Governance

• Efficient, practical
• Inclusive, collective

Outline

• The Big Idea
– Mission and Goals
• What we’re doing to get there
– Repository and Content
– Making content available
– Organizational structure
• How HathiTrust can change the way we work

How HathiTrust Can Change the Way We Work

Seeing collective problems as collective

Breakdown of HathiTrust book corpus by publication date

Chart: four segments of 42%, 19%, 20%, and 19%.
Source: Bibliographic Indeterminacy and the Scale of Problems and Opportunities of Rights in Digital Collection Building, 2/2011

Copyright status of books published pre-1923 and US works published 1923-1963

Chart: the same four segments (42%, 19%, 20%, 19%), with the 42% segment labeled "In Print".

Relationships

• Identification
• Description
• Rights
• Relationships
– Bibliographic records
– Bib records and objects
– Digital objects
– Digital and print

Relationships

Understanding the relationship between the collective and local

1st model: Price per GB

Chart: Total Volumes and Public Domain volumes by year

Year         Total Volumes   Public Domain
2008         2,477,871       372,085
2009         5,221,092       758,947
2010         7,836,698       1,959,223
2011         9,966,572       2,712,626
2012 (Oct)   10,531,566      3,218,132

Chart: % of titles in local collection (y-axis, 0 to 60) plotted against rank in the 2008 ARL Investment Index (x-axis, 0 to 120)

A global change in the library environment

June 2010: median duplication 31%

June 2009: median duplication 19%

Academic print book collection already substantially duplicated in mass digitized book corpus

Courtesy of Constance Malpas, OCLC Research

Digitized Books in Shared Repositories

Chart: unique titles by month, Sep-09 to Jun-10, comparing mass digitized books in the Hathi digital repository with mass digitized books in shared print repositories (annotated at ~3.5M and ~2.5M titles).

~75% of the mass digitized corpus is ‘backed up’ in one or more shared print repositories.

Courtesy of Constance Malpas, OCLC Research

Collection Overlap

• More than 50% median overlap with ARL institutions; higher for small liberal arts colleges

• New pricing model based on print holdings
– http://www.hathitrust.org/cost
– Requires a print holdings database
– Also supports efforts to expand legal uses and de-duplication
– Facilitates individual and collaborative collection development and management operations
• Print monographs archiving

Sourcing and Scaling (http://orweblog.oclc.org/archives/002058.html)

• Scale
– Institution-scale
– Group-scale
– Web-scale

• Sourcing
– Institutional
– Collaborative
– Third-party

A new kind of library

Thank you

How to find out more

• About: http://www.hathitrust.org/about
• Twitter: http://twitter.com/hathitrust
• Facebook: http://www.facebook.com/hathitrust
• Monthly newsletter
– http://www.hathitrust.org/updates
– RSS: http://www.hathitrust.org/updates_rss
• Contact us: feedback@issues.hathitrust.org
• Blogs: http://www.hathitrust.org/blogs
– Large-scale Search
– Perspectives from HathiTrust

  • Your Library Now Online Putting HathiTrust in the Context of
  • Outline
  • The Big Idea
  • Partnership
  • Digital Repository
  • The Name
  • Mission
  • HathiTrust
  • Collections and Collaboration
  • What we are doing to get there
  • Slide 11
  • Slide 12
  • Repository and Content
  • Content Sources
  • Language Distribution (1)
  • Language Distribution (2)
  • Dates
  • Copyright Distribution
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Making Content Available
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • APIs
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Datasets
  • Research Center
  • Slide 54
  • Slide 55
  • Access Determinations
  • Automatic Rights Determination
  • Manual Rights Determination
  • Rights Database
  • Lawful uses
  • Lawful uses (2)
  • Outline (2)
  • Slide 63
  • Slide 64
  • Constitutional Convention
  • Slide 66
  • Governance
  • Outline (3)
  • How HathiTrust Can Change the Way We Work
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Relationships
  • Relationships (2)
  • Relationships (3)
  • Relationships (4)
  • Relationships (5)
  • Slide 81
  • Slide 82
  • Slide 83
  • A global change in the library environment
  • Digitized Books in Shared Repositories
  • Collection Overlap
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Thank you
  • How to find out more
Page 64: HATHITRUST A Shared Digital Repository Your Library, Now Online! Putting HathiTrust in the Context of Traditional (and New) Library Services MCLS Webinar

Outline

bull The Big Idea ndash Mission and Goals

bull What wersquore doing to get there ndash Repository and Content ndash Making content available ndash Organizational structure

bull How HathiTrust can change the way we work

How HathiTrust Can Change the Way We Work

Seeing collective problems as collective

Breakdown of HathiTrust book corpus by publication date

Bibliographic Indeterminacy and the Scale of Problems and Opportunities of Rights in Digital Collection Building ndash 22011

42

19

20

19

Breakdown of HathiTrust book corpus by publication date

42

19

20

19

Copyright status of books published pre-1923 and US works published 1923-1963

42

19

20

Copyright status of books published pre-1923 and US works published 1923-1963

42

19

20

19

Copyright status of books published pre-1923 and US works published 1923-1963

In Print 42

19

20

19

bull Identificationbull Descriptionbull Rights

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic records

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic recordsndash Bib records and objects

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic recordsndash Bib records and objectsndash Digital objects

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic recordsndash Bib records and objectsndash Digital objectsndash Digital and print

Relationships

Understanding the relationship between the collective and local

1st model Price per GB

2008 2009 2010 2011 2012 (Oct)Total Volumes 2477871 5221092 7836698 9966572 10531566

Public Domain 372085 758947 1959223 2712626 3218132

2008 2009 2010 2011 2012 (Oct)0

2000000

4000000

6000000

8000000

10000000

12000000

Total VolumesPublic Domain

0 20 40 60 80 100 1200

10

20

30

40

50

60

Rank in 2008 ARL Investment Index

o

f Tit

les

in L

ocal

Col

lecti

on

A global change in the library environment

June 2010Median duplication 31

June 2009Median duplication 19

Academic print book collection already substantially duplicated in mass digitized book corpus

Courtesy of Constance Malpas OCLC Research

Digitized Books in Shared Repositories

Sep-09 Oct-09 Nov-09 Dec-09 Jan-10 Feb-10 Mar-10 Apr-10 May-10 Jun-100

500000

1000000

1500000

2000000

2500000

3000000

3500000

Mass digitized books in Hathi digital repository Mass digitized books in shared print repositories

Uni

que

Titl

es

~75 of mass digitized corpus is lsquobacked uprsquo in one or more shared print repositories

~35M titles

~25M

Courtesy of Constance Malpas OCLC Research

Collection Overlap

bull More than 50 median overlap with ARL institutions higher for small liberal arts colleges

bull New Pricing model based on Print holdingsndash httpwwwhathitrustorgcostndash Requires print holdings databasendash Also support expansion of legal uses efforts in de-

duplicationndash Facilitate individual and collaborative collection

development and management operationsbull Print monographs archiving

Sourcing and Scalinghttporweblogoclcorgarchives002058html

bull Scalendash Institution-scalendash Group-scalendash Web-scale

bull Sourcingndash Institutionalndash Collaborativendash Third-party

A new kind of library

Thank you

How to find out more

bull About httpwwwhathitrustorgaboutbull Twitter httptwittercomhathitrustbull Facebook httpwwwfacebookcomhathitrustbull Monthly newsletter

ndash httpwwwhathitrustorgupdatesndash RSS httpwwwhathitrustorgupdates_rss

bull Contact us feedbackissueshathitrustorgbull Blogs httpwwwhathitrustorgblogs

ndash Large-scale Searchndash Perspectives from HathiTrust

  • Your Library Now Online Putting HathiTrust in the Context of
  • Outline
  • The Big Idea
  • Partnership
  • Digital Repository
  • The Name
  • Mission
  • HathiTrust
  • Collections and Collaboration
  • What we are doing to get there
  • Slide 11
  • Slide 12
  • Repository and Content
  • Content Sources
  • Language Distribution (1)
  • Language Distribution (2)
  • Dates
  • Copyright Distribution
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Making Content Available
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • APIs
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Datasets
  • Research Center
  • Slide 54
  • Slide 55
  • Access Determinations
  • Automatic Rights Determination
  • Manual Rights Determination
  • Rights Database
  • Lawful uses
  • Lawful uses (2)
  • Outline (2)
  • Slide 63
  • Slide 64
  • Constitutional Convention
  • Slide 66
  • Governance
  • Outline (3)
  • How HathiTrust Can Change the Way We Work
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Relationships
  • Relationships (2)
  • Relationships (3)
  • Relationships (4)
  • Relationships (5)
  • Slide 81
  • Slide 82
  • Slide 83
  • A global change in the library environment
  • Digitized Books in Shared Repositories
  • Collection Overlap
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Thank you
  • How to find out more
Page 65: HATHITRUST A Shared Digital Repository Your Library, Now Online! Putting HathiTrust in the Context of Traditional (and New) Library Services MCLS Webinar

How HathiTrust Can Change the Way We Work

Seeing collective problems as collective

Breakdown of HathiTrust book corpus by publication date

Bibliographic Indeterminacy and the Scale of Problems and Opportunities of Rights in Digital Collection Building ndash 22011

42

19

20

19

Breakdown of HathiTrust book corpus by publication date

42

19

20

19

Copyright status of books published pre-1923 and US works published 1923-1963

42

19

20

Copyright status of books published pre-1923 and US works published 1923-1963

42

19

20

19

Copyright status of books published pre-1923 and US works published 1923-1963

In Print 42

19

20

19

bull Identificationbull Descriptionbull Rights

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic records

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic recordsndash Bib records and objects

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic recordsndash Bib records and objectsndash Digital objects

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic recordsndash Bib records and objectsndash Digital objectsndash Digital and print

Relationships

Understanding the relationship between the collective and local

1st model Price per GB

2008 2009 2010 2011 2012 (Oct)Total Volumes 2477871 5221092 7836698 9966572 10531566

Public Domain 372085 758947 1959223 2712626 3218132

2008 2009 2010 2011 2012 (Oct)0

2000000

4000000

6000000

8000000

10000000

12000000

Total VolumesPublic Domain

0 20 40 60 80 100 1200

10

20

30

40

50

60

Rank in 2008 ARL Investment Index

o

f Tit

les

in L

ocal

Col

lecti

on

A global change in the library environment

June 2010Median duplication 31

June 2009Median duplication 19

Academic print book collection already substantially duplicated in mass digitized book corpus

Courtesy of Constance Malpas OCLC Research

Digitized Books in Shared Repositories

Sep-09 Oct-09 Nov-09 Dec-09 Jan-10 Feb-10 Mar-10 Apr-10 May-10 Jun-100

500000

1000000

1500000

2000000

2500000

3000000

3500000

Mass digitized books in Hathi digital repository Mass digitized books in shared print repositories

Uni

que

Titl

es

~75 of mass digitized corpus is lsquobacked uprsquo in one or more shared print repositories

~35M titles

~25M

Courtesy of Constance Malpas OCLC Research

Collection Overlap

bull More than 50 median overlap with ARL institutions higher for small liberal arts colleges

bull New Pricing model based on Print holdingsndash httpwwwhathitrustorgcostndash Requires print holdings databasendash Also support expansion of legal uses efforts in de-

duplicationndash Facilitate individual and collaborative collection

development and management operationsbull Print monographs archiving

Sourcing and Scalinghttporweblogoclcorgarchives002058html

bull Scalendash Institution-scalendash Group-scalendash Web-scale

bull Sourcingndash Institutionalndash Collaborativendash Third-party

A new kind of library

Thank you

How to find out more

bull About httpwwwhathitrustorgaboutbull Twitter httptwittercomhathitrustbull Facebook httpwwwfacebookcomhathitrustbull Monthly newsletter

ndash httpwwwhathitrustorgupdatesndash RSS httpwwwhathitrustorgupdates_rss

bull Contact us feedbackissueshathitrustorgbull Blogs httpwwwhathitrustorgblogs

ndash Large-scale Searchndash Perspectives from HathiTrust

  • Your Library Now Online Putting HathiTrust in the Context of
  • Outline
  • The Big Idea
  • Partnership
  • Digital Repository
  • The Name
  • Mission
  • HathiTrust
  • Collections and Collaboration
  • What we are doing to get there
  • Slide 11
  • Slide 12
  • Repository and Content
  • Content Sources
  • Language Distribution (1)
  • Language Distribution (2)
  • Dates
  • Copyright Distribution
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Making Content Available
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • APIs
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Datasets
  • Research Center
  • Slide 54
  • Slide 55
  • Access Determinations
  • Automatic Rights Determination
  • Manual Rights Determination
  • Rights Database
  • Lawful uses
  • Lawful uses (2)
  • Outline (2)
  • Slide 63
  • Slide 64
  • Constitutional Convention
  • Slide 66
  • Governance
  • Outline (3)
  • How HathiTrust Can Change the Way We Work
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Relationships
  • Relationships (2)
  • Relationships (3)
  • Relationships (4)
  • Relationships (5)
  • Slide 81
  • Slide 82
  • Slide 83
  • A global change in the library environment
  • Digitized Books in Shared Repositories
  • Collection Overlap
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Thank you
  • How to find out more
Page 66: HATHITRUST A Shared Digital Repository Your Library, Now Online! Putting HathiTrust in the Context of Traditional (and New) Library Services MCLS Webinar

Seeing collective problems as collective

Breakdown of HathiTrust book corpus by publication date

Bibliographic Indeterminacy and the Scale of Problems and Opportunities of Rights in Digital Collection Building ndash 22011

42

19

20

19

Breakdown of HathiTrust book corpus by publication date

42

19

20

19

Copyright status of books published pre-1923 and US works published 1923-1963

42

19

20

Copyright status of books published pre-1923 and US works published 1923-1963

42

19

20

19

Copyright status of books published pre-1923 and US works published 1923-1963

In Print 42

19

20

19

bull Identificationbull Descriptionbull Rights

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic records

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic recordsndash Bib records and objects

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic recordsndash Bib records and objectsndash Digital objects

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic recordsndash Bib records and objectsndash Digital objectsndash Digital and print

Relationships

Understanding the relationship between the collective and local

1st model Price per GB

2008 2009 2010 2011 2012 (Oct)Total Volumes 2477871 5221092 7836698 9966572 10531566

Public Domain 372085 758947 1959223 2712626 3218132

2008 2009 2010 2011 2012 (Oct)0

2000000

4000000

6000000

8000000

10000000

12000000

Total VolumesPublic Domain

0 20 40 60 80 100 1200

10

20

30

40

50

60

Rank in 2008 ARL Investment Index

o

f Tit

les

in L

ocal

Col

lecti

on

A global change in the library environment

June 2010Median duplication 31

June 2009Median duplication 19

Academic print book collection already substantially duplicated in mass digitized book corpus

Courtesy of Constance Malpas OCLC Research

Digitized Books in Shared Repositories

Sep-09 Oct-09 Nov-09 Dec-09 Jan-10 Feb-10 Mar-10 Apr-10 May-10 Jun-100

500000

1000000

1500000

2000000

2500000

3000000

3500000

Mass digitized books in Hathi digital repository Mass digitized books in shared print repositories

Uni

que

Titl

es

~75 of mass digitized corpus is lsquobacked uprsquo in one or more shared print repositories

~35M titles

~25M

Courtesy of Constance Malpas OCLC Research

Collection Overlap

bull More than 50 median overlap with ARL institutions higher for small liberal arts colleges

bull New Pricing model based on Print holdingsndash httpwwwhathitrustorgcostndash Requires print holdings databasendash Also support expansion of legal uses efforts in de-

duplicationndash Facilitate individual and collaborative collection

development and management operationsbull Print monographs archiving

Sourcing and Scalinghttporweblogoclcorgarchives002058html

bull Scalendash Institution-scalendash Group-scalendash Web-scale

bull Sourcingndash Institutionalndash Collaborativendash Third-party

A new kind of library

Thank you

How to find out more

bull About httpwwwhathitrustorgaboutbull Twitter httptwittercomhathitrustbull Facebook httpwwwfacebookcomhathitrustbull Monthly newsletter

ndash httpwwwhathitrustorgupdatesndash RSS httpwwwhathitrustorgupdates_rss

bull Contact us feedbackissueshathitrustorgbull Blogs httpwwwhathitrustorgblogs

ndash Large-scale Searchndash Perspectives from HathiTrust

  • Your Library Now Online Putting HathiTrust in the Context of
  • Outline
  • The Big Idea
  • Partnership
  • Digital Repository
  • The Name
  • Mission
  • HathiTrust
  • Collections and Collaboration
  • What we are doing to get there
  • Slide 11
  • Slide 12
  • Repository and Content
  • Content Sources
  • Language Distribution (1)
  • Language Distribution (2)
  • Dates
  • Copyright Distribution
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Making Content Available
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • APIs
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Datasets
  • Research Center
  • Slide 54
  • Slide 55
  • Access Determinations
  • Automatic Rights Determination
  • Manual Rights Determination
  • Rights Database
  • Lawful uses
  • Lawful uses (2)
  • Outline (2)
  • Slide 63
  • Slide 64
  • Constitutional Convention
  • Slide 66
  • Governance
  • Outline (3)
  • How HathiTrust Can Change the Way We Work
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Relationships
  • Relationships (2)
  • Relationships (3)
  • Relationships (4)
  • Relationships (5)
  • Slide 81
  • Slide 82
  • Slide 83
  • A global change in the library environment
  • Digitized Books in Shared Repositories
  • Collection Overlap
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Thank you
  • How to find out more
Page 67: HATHITRUST A Shared Digital Repository Your Library, Now Online! Putting HathiTrust in the Context of Traditional (and New) Library Services MCLS Webinar

Breakdown of HathiTrust book corpus by publication date

Bibliographic Indeterminacy and the Scale of Problems and Opportunities of Rights in Digital Collection Building ndash 22011

42

19

20

19

Breakdown of HathiTrust book corpus by publication date

42

19

20

19

Copyright status of books published pre-1923 and US works published 1923-1963

42

19

20

Copyright status of books published pre-1923 and US works published 1923-1963

42

19

20

19

Copyright status of books published pre-1923 and US works published 1923-1963

In Print 42

19

20

19

bull Identificationbull Descriptionbull Rights

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic records

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic recordsndash Bib records and objects

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic recordsndash Bib records and objectsndash Digital objects

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic recordsndash Bib records and objectsndash Digital objectsndash Digital and print

Relationships

Understanding the relationship between the collective and local

1st model Price per GB

2008 2009 2010 2011 2012 (Oct)Total Volumes 2477871 5221092 7836698 9966572 10531566

Public Domain 372085 758947 1959223 2712626 3218132

2008 2009 2010 2011 2012 (Oct)0

2000000

4000000

6000000

8000000

10000000

12000000

Total VolumesPublic Domain

0 20 40 60 80 100 1200

10

20

30

40

50

60

Rank in 2008 ARL Investment Index

o

f Tit

les

in L

ocal

Col

lecti

on

A global change in the library environment

June 2010Median duplication 31

June 2009Median duplication 19

Academic print book collection already substantially duplicated in mass digitized book corpus

Courtesy of Constance Malpas OCLC Research

Digitized Books in Shared Repositories

Sep-09 Oct-09 Nov-09 Dec-09 Jan-10 Feb-10 Mar-10 Apr-10 May-10 Jun-100

500000

1000000

1500000

2000000

2500000

3000000

3500000

Mass digitized books in Hathi digital repository Mass digitized books in shared print repositories

Uni

que

Titl

es

~75 of mass digitized corpus is lsquobacked uprsquo in one or more shared print repositories

~35M titles

~25M

Courtesy of Constance Malpas OCLC Research

Collection Overlap

bull More than 50 median overlap with ARL institutions higher for small liberal arts colleges

bull New Pricing model based on Print holdingsndash httpwwwhathitrustorgcostndash Requires print holdings databasendash Also support expansion of legal uses efforts in de-

duplicationndash Facilitate individual and collaborative collection

development and management operationsbull Print monographs archiving

Sourcing and Scalinghttporweblogoclcorgarchives002058html

bull Scalendash Institution-scalendash Group-scalendash Web-scale

bull Sourcingndash Institutionalndash Collaborativendash Third-party

A new kind of library

Thank you

How to find out more

bull About httpwwwhathitrustorgaboutbull Twitter httptwittercomhathitrustbull Facebook httpwwwfacebookcomhathitrustbull Monthly newsletter

ndash httpwwwhathitrustorgupdatesndash RSS httpwwwhathitrustorgupdates_rss

bull Contact us feedbackissueshathitrustorgbull Blogs httpwwwhathitrustorgblogs

ndash Large-scale Searchndash Perspectives from HathiTrust

  • Your Library Now Online Putting HathiTrust in the Context of
  • Outline
  • The Big Idea
  • Partnership
  • Digital Repository
  • The Name
  • Mission
  • HathiTrust
  • Collections and Collaboration
  • What we are doing to get there
  • Slide 11
  • Slide 12
  • Repository and Content
  • Content Sources
  • Language Distribution (1)
  • Language Distribution (2)
  • Dates
  • Copyright Distribution
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Making Content Available
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • APIs
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Datasets
  • Research Center
  • Slide 54
  • Slide 55
  • Access Determinations
  • Automatic Rights Determination
  • Manual Rights Determination
  • Rights Database
  • Lawful uses
  • Lawful uses (2)
  • Outline (2)
  • Slide 63
  • Slide 64
  • Constitutional Convention
  • Slide 66
  • Governance
  • Outline (3)
  • How HathiTrust Can Change the Way We Work
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Relationships
  • Relationships (2)
  • Relationships (3)
  • Relationships (4)
  • Relationships (5)
  • Slide 81
  • Slide 82
  • Slide 83
  • A global change in the library environment
  • Digitized Books in Shared Repositories
  • Collection Overlap
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Thank you
  • How to find out more
Page 68: HATHITRUST A Shared Digital Repository Your Library, Now Online! Putting HathiTrust in the Context of Traditional (and New) Library Services MCLS Webinar

Breakdown of HathiTrust book corpus by publication date

42

19

20

19

Copyright status of books published pre-1923 and US works published 1923-1963

42

19

20

Copyright status of books published pre-1923 and US works published 1923-1963

42

19

20

19

Copyright status of books published pre-1923 and US works published 1923-1963

In Print 42

19

20

19

bull Identificationbull Descriptionbull Rights

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic records

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic recordsndash Bib records and objects

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic recordsndash Bib records and objectsndash Digital objects

Relationships

bull Identificationbull Descriptionbull Rightsbull Relationships

ndash Bibliographic recordsndash Bib records and objectsndash Digital objectsndash Digital and print

Relationships

Understanding the relationship between the collective and local

1st model Price per GB

2008 2009 2010 2011 2012 (Oct)Total Volumes 2477871 5221092 7836698 9966572 10531566

Public Domain 372085 758947 1959223 2712626 3218132

2008 2009 2010 2011 2012 (Oct)0

2000000

4000000

6000000

8000000

10000000

12000000

Total VolumesPublic Domain

0 20 40 60 80 100 1200

10

20

30

40

50

60

Rank in 2008 ARL Investment Index

o

f Tit

les

in L

ocal

Col

lecti

on

A global change in the library environment

June 2010Median duplication 31

June 2009Median duplication 19

Academic print book collection already substantially duplicated in mass digitized book corpus

Courtesy of Constance Malpas OCLC Research

Digitized Books in Shared Repositories

Sep-09 Oct-09 Nov-09 Dec-09 Jan-10 Feb-10 Mar-10 Apr-10 May-10 Jun-100

500000

1000000

1500000

2000000

2500000

3000000

3500000

Mass digitized books in Hathi digital repository Mass digitized books in shared print repositories

Uni

que

Titl

es

~75 of mass digitized corpus is lsquobacked uprsquo in one or more shared print repositories

~35M titles

~25M

Courtesy of Constance Malpas OCLC Research

Collection Overlap

• More than 50% median overlap with ARL institutions; higher for small liberal arts colleges
• New pricing model based on print holdings
  – http://www.hathitrust.org/cost
  – Requires print holdings database
  – Also supports expansion of legal uses and de-duplication efforts
  – Facilitates individual and collaborative collection development and management operations
• Print monographs archiving
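Overlap here is reported as a median across institutions. A minimal sketch of how such a figure could be computed, assuming overlap means the share of each library's titles that also appear in the digitized corpus, and that titles are matched on a shared identifier such as an OCLC number. The identifiers, library names, and simple set matching are hypothetical toy data; they are not HathiTrust or OCLC Research figures or methodology.

```python
from statistics import median


def overlap_rate(local_titles: set[str], corpus_titles: set[str]) -> float:
    """Fraction of a library's titles that also appear in the digitized corpus."""
    if not local_titles:
        return 0.0
    return len(local_titles & corpus_titles) / len(local_titles)


# Toy, hypothetical identifiers standing in for matched title records.
corpus = {"ocm001", "ocm002", "ocm003", "ocm004", "ocm005", "ocm006"}
libraries = {
    "Library A": {"ocm001", "ocm002", "ocm003", "ocm900"},
    "Library B": {"ocm002", "ocm004", "ocm901", "ocm902"},
    "Library C": {"ocm001", "ocm005", "ocm006"},
}

# Per-library overlap, then the median across libraries.
rates = {name: overlap_rate(titles, corpus) for name, titles in libraries.items()}
median_overlap = median(rates.values())

print(rates)
print(f"median overlap: {median_overlap:.0%}")
```

The median is used rather than the mean so that a few libraries with unusually high or low overlap do not dominate the summary figure.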

Sourcing and Scaling (http://orweblog.oclc.org/archives/002058.html)

• Scale
  – Institution-scale
  – Group-scale
  – Web-scale
• Sourcing
  – Institutional
  – Collaborative
  – Third-party

A new kind of library

Thank you

How to find out more

• About: http://www.hathitrust.org/about
• Twitter: http://twitter.com/hathitrust
• Facebook: http://www.facebook.com/hathitrust
• Monthly newsletter
  – http://www.hathitrust.org/updates
  – RSS: http://www.hathitrust.org/updates_rss
• Contact us: feedback@issues.hathitrust.org
• Blogs: http://www.hathitrust.org/blogs
  – Large-scale Search
  – Perspectives from HathiTrust

Page 75: HATHITRUST A Shared Digital Repository Your Library, Now Online! Putting HathiTrust in the Context of Traditional (and New) Library Services MCLS Webinar

Relationships

• Identification
• Description
• Rights
• Relationships
  – Bibliographic records
  – Bib records and objects
  – Digital objects

Relationships

• Identification
• Description
• Rights
• Relationships
  – Bibliographic records
  – Bib records and objects
  – Digital objects
  – Digital and print

Understanding the relationship between the collective and local

1st model: Price per GB

[Chart: Total Volumes and Public Domain, 2008 to 2012 (Oct); y-axis 0 to 12,000,000]

Year         Total Volumes  Public Domain
2008             2,477,871        372,085
2009             5,221,092        758,947
2010             7,836,698      1,959,223
2011             9,966,572      2,712,626
2012 (Oct)      10,531,566      3,218,132
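As a worked example of reading the volume counts above, this short Python sketch recomputes the public-domain share of the collection for each year and the number of volumes added since the previous row. The figures are transcribed from the slide; note that the 2012 row is a partial year (through October), so its increase is not comparable to the full-year rows.

```python
from typing import Optional

# Volume counts transcribed from the slide: (total volumes, public domain volumes).
# The 2012 figures are as of October.
VOLUMES = {
    "2008": (2_477_871, 372_085),
    "2009": (5_221_092, 758_947),
    "2010": (7_836_698, 1_959_223),
    "2011": (9_966_572, 2_712_626),
    "2012 (Oct)": (10_531_566, 3_218_132),
}


def public_domain_share(total: int, public_domain: int) -> float:
    """Public-domain volumes as a percentage of all volumes."""
    return 100.0 * public_domain / total


def summarize(
    volumes: dict[str, tuple[int, int]],
) -> list[tuple[str, float, Optional[int]]]:
    """Return (label, public-domain %, volumes added since the previous row) per row.

    The first row has no predecessor, so its 'added' value is None.
    """
    rows = []
    previous_total: Optional[int] = None
    for label, (total, public_domain) in volumes.items():
        added = None if previous_total is None else total - previous_total
        rows.append((label, round(public_domain_share(total, public_domain), 1), added))
        previous_total = total
    return rows


if __name__ == "__main__":
    for label, share, added in summarize(VOLUMES):
        added_text = "n/a" if added is None else f"{added:,}"
        print(f"{label:<11} public domain {share:5.1f}%  added {added_text:>10}")
```

Run as a script, it shows the public-domain share rising from about 15% in 2008 to about 30.6% by October 2012.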

A global change in the library environment

[Chart: % of Titles in Local Collection (y-axis, 0 to 60) vs. Rank in 2008 ARL Investment Index (x-axis, 0 to 120). June 2010: median duplication 31%. June 2009: median duplication 19%.]

Academic print book collection already substantially duplicated in mass digitized book corpus

Courtesy of Constance Malpas, OCLC Research
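The slide reports a median duplication rate across institutions. A minimal sketch of that kind of calculation, assuming each library's print holdings and the digitized corpus can be reduced to sets of shared title identifiers (for example, OCLC numbers). The identifiers and library names are hypothetical, this is not the method OCLC Research actually used, and real title matching is considerably messier than a set intersection.

```python
from statistics import median


def duplication_rate(local_titles: set[str], corpus_titles: set[str]) -> float:
    """Percentage of a library's print titles that also appear in the digitized corpus."""
    if not local_titles:
        return 0.0
    return 100.0 * len(local_titles & corpus_titles) / len(local_titles)


def median_duplication(holdings: dict[str, set[str]], corpus_titles: set[str]) -> float:
    """Median duplication rate across all libraries in `holdings`."""
    return median(duplication_rate(titles, corpus_titles) for titles in holdings.values())


# Hypothetical title identifiers (stand-ins for, e.g., OCLC numbers) and library names.
CORPUS = {"t1", "t2", "t3", "t4", "t5"}
HOLDINGS = {
    "Library A": {"t1", "t2", "t9"},         # 2 of 3 titles digitized
    "Library B": {"t1", "t6", "t7", "t8"},   # 1 of 4
    "Library C": {"t2", "t3", "t4", "t10"},  # 3 of 4
}

if __name__ == "__main__":
    print(f"median duplication: {median_duplication(HOLDINGS, CORPUS):.1f}%")
```

With this sample data the per-library rates are about 66.7%, 25% and 75%, so the median is Library A's rate. Using the median rather than the mean keeps a few very large or very small collections from dominating the summary figure.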

Digitized Books in Shared Repositories

[Chart: Unique Titles (y-axis, 0 to 3,500,000) by month, Sep-09 to Jun-10. Series: "Mass digitized books in Hathi digital repository" and "Mass digitized books in shared print repositories". Annotations: ~3.5M titles; ~2.5M.]

~75% of mass digitized corpus is ‘backed up’ in one or more shared print repositories

Courtesy of Constance Malpas, OCLC Research

Collection Overlap

• More than 50% median overlap with ARL institutions; higher for small liberal arts colleges
• New pricing model based on print holdings
  – http://www.hathitrust.org/cost
  – Requires print holdings database
  – Also supports expansion of legal uses and de-duplication efforts
  – Facilitates individual and collaborative collection development and management operations
• Print monographs archiving
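To make the contrast between the first model (price per GB) and a pricing model based on print holdings concrete, here is a deliberately simplified Python sketch. It assumes, hypothetically, that under a holdings-based model each volume's annual cost is split equally among the members that report holding it in print. The member names, rates, and the equal-split rule are illustrative assumptions only, not HathiTrust's actual formula (see http://www.hathitrust.org/cost for that).

```python
from collections import defaultdict


def per_gb_charges(deposited_gb: dict[str, float], rate_per_gb: float) -> dict[str, float]:
    """First-model style: each member pays for the storage its own deposits use."""
    return {member: gb * rate_per_gb for member, gb in deposited_gb.items()}


def holdings_based_charges(
    holders_by_volume: dict[str, set[str]], cost_per_volume: float
) -> dict[str, float]:
    """Hypothetical holdings-based split: each volume's cost is divided equally
    among the members that report holding it in print (needs a print holdings database)."""
    charges: dict[str, float] = defaultdict(float)
    for holders in holders_by_volume.values():
        if not holders:
            continue  # no reported print holder; this sketch leaves the cost unallocated
        share = cost_per_volume / len(holders)
        for member in holders:
            charges[member] += share
    return dict(charges)


if __name__ == "__main__":
    # Hypothetical members A, B, C and made-up rates.
    print(per_gb_charges({"A": 100.0, "B": 40.0}, rate_per_gb=0.5))
    print(holdings_based_charges(
        {"vol1": {"A", "B"}, "vol2": {"A"}, "vol3": {"A", "B", "C"}},
        cost_per_volume=3.0,
    ))
```

In the sample data, member A holds all three volumes and pays 5.5, B pays 2.5, and C pays 1.0; the 9.0 total equals three volumes at 3.0 each. Under the per-GB function, by contrast, a member's charge depends only on how much it deposited, not on how much of the shared collection it holds in print.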

Sourcing and Scaling
http://orweblog.oclc.org/archives/002058.html

• Scale
  – Institution-scale
  – Group-scale
  – Web-scale
• Sourcing
  – Institutional
  – Collaborative
  – Third-party

A new kind of library

Thank you

How to find out more

• About: http://www.hathitrust.org/about
• Twitter: http://twitter.com/hathitrust
• Facebook: http://www.facebook.com/hathitrust
• Monthly newsletter
  – http://www.hathitrust.org/updates
  – RSS: http://www.hathitrust.org/updates_rss
• Contact us: feedback@issues.hathitrust.org
• Blogs: http://www.hathitrust.org/blogs
  – Large-scale Search
  – Perspectives from HathiTrust


ndash Large-scale Searchndash Perspectives from HathiTrust

  • Your Library Now Online Putting HathiTrust in the Context of
  • Outline
  • The Big Idea
  • Partnership
  • Digital Repository
  • The Name
  • Mission
  • HathiTrust
  • Collections and Collaboration
  • What we are doing to get there
  • Slide 11
  • Slide 12
  • Repository and Content
  • Content Sources
  • Language Distribution (1)
  • Language Distribution (2)
  • Dates
  • Copyright Distribution
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Making Content Available
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • APIs
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Datasets
  • Research Center
  • Slide 54
  • Slide 55
  • Access Determinations
  • Automatic Rights Determination
  • Manual Rights Determination
  • Rights Database
  • Lawful uses
  • Lawful uses (2)
  • Outline (2)
  • Slide 63
  • Slide 64
  • Constitutional Convention
  • Slide 66
  • Governance
  • Outline (3)
  • How HathiTrust Can Change the Way We Work
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Relationships
  • Relationships (2)
  • Relationships (3)
  • Relationships (4)
  • Relationships (5)
  • Slide 81
  • Slide 82
  • Slide 83
  • A global change in the library environment
  • Digitized Books in Shared Repositories
  • Collection Overlap
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Thank you
  • How to find out more
Page 85: HATHITRUST A Shared Digital Repository Your Library, Now Online! Putting HathiTrust in the Context of Traditional (and New) Library Services MCLS Webinar

bull Sourcingndash Institutionalndash Collaborativendash Third-party

A new kind of library

Thank you

How to find out more

bull About httpwwwhathitrustorgaboutbull Twitter httptwittercomhathitrustbull Facebook httpwwwfacebookcomhathitrustbull Monthly newsletter

ndash httpwwwhathitrustorgupdatesndash RSS httpwwwhathitrustorgupdates_rss

bull Contact us feedbackissueshathitrustorgbull Blogs httpwwwhathitrustorgblogs

ndash Large-scale Searchndash Perspectives from HathiTrust

  • Your Library Now Online Putting HathiTrust in the Context of
  • Outline
  • The Big Idea
  • Partnership
  • Digital Repository
  • The Name
  • Mission
  • HathiTrust
  • Collections and Collaboration
  • What we are doing to get there
  • Slide 11
  • Slide 12
  • Repository and Content
  • Content Sources
  • Language Distribution (1)
  • Language Distribution (2)
  • Dates
  • Copyright Distribution
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Making Content Available
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • APIs
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Datasets
  • Research Center
  • Slide 54
  • Slide 55
  • Access Determinations
  • Automatic Rights Determination
  • Manual Rights Determination
  • Rights Database
  • Lawful uses
  • Lawful uses (2)
  • Outline (2)
  • Slide 63
  • Slide 64
  • Constitutional Convention
  • Slide 66
  • Governance
  • Outline (3)
  • How HathiTrust Can Change the Way We Work
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Relationships
  • Relationships (2)
  • Relationships (3)
  • Relationships (4)
  • Relationships (5)
  • Slide 81
  • Slide 82
  • Slide 83
  • A global change in the library environment
  • Digitized Books in Shared Repositories
  • Collection Overlap
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Thank you
  • How to find out more
Page 86: HATHITRUST A Shared Digital Repository Your Library, Now Online! Putting HathiTrust in the Context of Traditional (and New) Library Services MCLS Webinar

A new kind of library

Thank you

How to find out more

bull About httpwwwhathitrustorgaboutbull Twitter httptwittercomhathitrustbull Facebook httpwwwfacebookcomhathitrustbull Monthly newsletter

ndash httpwwwhathitrustorgupdatesndash RSS httpwwwhathitrustorgupdates_rss

bull Contact us feedbackissueshathitrustorgbull Blogs httpwwwhathitrustorgblogs

ndash Large-scale Searchndash Perspectives from HathiTrust

  • Your Library Now Online Putting HathiTrust in the Context of
  • Outline
  • The Big Idea
  • Partnership
  • Digital Repository
  • The Name
  • Mission
  • HathiTrust
  • Collections and Collaboration
  • What we are doing to get there
  • Slide 11
  • Slide 12
  • Repository and Content
  • Content Sources
  • Language Distribution (1)
  • Language Distribution (2)
  • Dates
  • Copyright Distribution
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Making Content Available
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • APIs
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Datasets
  • Research Center
  • Slide 54
  • Slide 55
  • Access Determinations
  • Automatic Rights Determination
  • Manual Rights Determination
  • Rights Database
  • Lawful uses
  • Lawful uses (2)
  • Outline (2)
  • Slide 63
  • Slide 64
  • Constitutional Convention
  • Slide 66
  • Governance
  • Outline (3)
  • How HathiTrust Can Change the Way We Work
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Relationships
  • Relationships (2)
  • Relationships (3)
  • Relationships (4)
  • Relationships (5)
  • Slide 81
  • Slide 82
  • Slide 83
  • A global change in the library environment
  • Digitized Books in Shared Repositories
  • Collection Overlap
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Thank you
  • How to find out more
Page 87: HATHITRUST A Shared Digital Repository Your Library, Now Online! Putting HathiTrust in the Context of Traditional (and New) Library Services MCLS Webinar

Thank you

How to find out more

bull About httpwwwhathitrustorgaboutbull Twitter httptwittercomhathitrustbull Facebook httpwwwfacebookcomhathitrustbull Monthly newsletter

ndash httpwwwhathitrustorgupdatesndash RSS httpwwwhathitrustorgupdates_rss

bull Contact us feedbackissueshathitrustorgbull Blogs httpwwwhathitrustorgblogs

ndash Large-scale Searchndash Perspectives from HathiTrust

  • Your Library Now Online Putting HathiTrust in the Context of
  • Outline
  • The Big Idea
  • Partnership
  • Digital Repository
  • The Name
  • Mission
  • HathiTrust
  • Collections and Collaboration
  • What we are doing to get there
  • Slide 11
  • Slide 12
  • Repository and Content
  • Content Sources
  • Language Distribution (1)
  • Language Distribution (2)
  • Dates
  • Copyright Distribution
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Making Content Available
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • APIs
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Datasets
  • Research Center
  • Slide 54
  • Slide 55
  • Access Determinations
  • Automatic Rights Determination
  • Manual Rights Determination
  • Rights Database
  • Lawful uses
  • Lawful uses (2)
  • Outline (2)
  • Slide 63
  • Slide 64
  • Constitutional Convention
  • Slide 66
  • Governance
  • Outline (3)
  • How HathiTrust Can Change the Way We Work
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Relationships
  • Relationships (2)
  • Relationships (3)
  • Relationships (4)
  • Relationships (5)
  • Slide 81
  • Slide 82
  • Slide 83
  • A global change in the library environment
  • Digitized Books in Shared Repositories
  • Collection Overlap
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Thank you
  • How to find out more
Page 88: HATHITRUST A Shared Digital Repository Your Library, Now Online! Putting HathiTrust in the Context of Traditional (and New) Library Services MCLS Webinar

How to find out more

bull About httpwwwhathitrustorgaboutbull Twitter httptwittercomhathitrustbull Facebook httpwwwfacebookcomhathitrustbull Monthly newsletter

ndash httpwwwhathitrustorgupdatesndash RSS httpwwwhathitrustorgupdates_rss

bull Contact us feedbackissueshathitrustorgbull Blogs httpwwwhathitrustorgblogs

ndash Large-scale Searchndash Perspectives from HathiTrust

  • Your Library Now Online Putting HathiTrust in the Context of
  • Outline
  • The Big Idea
  • Partnership
  • Digital Repository
  • The Name
  • Mission
  • HathiTrust
  • Collections and Collaboration
  • What we are doing to get there
  • Slide 11
  • Slide 12
  • Repository and Content
  • Content Sources
  • Language Distribution (1)
  • Language Distribution (2)
  • Dates
  • Copyright Distribution
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Making Content Available
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • APIs
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Datasets
  • Research Center
  • Slide 54
  • Slide 55
  • Access Determinations
  • Automatic Rights Determination
  • Manual Rights Determination
  • Rights Database
  • Lawful uses
  • Lawful uses (2)
  • Outline (2)
  • Slide 63
  • Slide 64
  • Constitutional Convention
  • Slide 66
  • Governance
  • Outline (3)
  • How HathiTrust Can Change the Way We Work
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Relationships
  • Relationships (2)
  • Relationships (3)
  • Relationships (4)
  • Relationships (5)
  • Slide 81
  • Slide 82
  • Slide 83
  • A global change in the library environment
  • Digitized Books in Shared Repositories
  • Collection Overlap
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Thank you
  • How to find out more