HATHITRUST
A Shared Digital Repository

HathiTrust: Key Concepts and Issues in Managing the Digital Archive

ICPSR Summer Workshop “Curating and Managing Research Data for Re-use”
August 1, 2013
Jeremy York, Project Librarian, HathiTrust

Unless otherwise noted, these slides and their contents are licensed under a Creative Commons Attribution Unported License.

Outline

• What is HathiTrust? What are we trying to accomplish?
• Repository management
  – What keeps us running?
• Assessment

What is HathiTrust

Partnership

Arizona State University
Baylor University
Boston College
Boston University
Brandeis University
Brown University
California Digital Library
Carnegie Mellon University
Columbia University
Cornell University
Dartmouth College
Duke University
Emory University
Florida State University
Getty Research Institute
Harvard University Library
Indiana University
Iowa State University
Johns Hopkins University
Kansas State University
Lafayette College
Library of Congress
Massachusetts Institute of Technology
McGill University
Michigan State University
New York Public Library
New York University
North Carolina Central University
North Carolina State University
Northwestern University
The Ohio State University
The Pennsylvania State University
Princeton University
Purdue University
Stanford University
Syracuse University
Texas A&M University
Tufts University
Universidad Complutense de Madrid
University of Alberta
University of Arizona
University of Calgary
University of California
  Berkeley
  Davis
  Irvine
  Los Angeles
  Merced
  Riverside
  San Diego
  San Francisco
  Santa Barbara
  Santa Cruz
The University of Chicago
University of Connecticut
University of Delaware
University of Florida
University of Houston
University of Illinois
University of Illinois at Chicago
The University of Iowa
University of Kansas
University of Maryland
University of Miami
University of Michigan
University of Minnesota
University of Missouri
University of Nebraska-Lincoln
The University of North Carolina at Chapel Hill
University of Notre Dame
University of Oklahoma
University of Pennsylvania
University of Pittsburgh
University of Utah
University of Vermont
University of Virginia
University of Washington
University of Wisconsin-Madison
Utah State University
Vanderbilt University
Virginia Tech
Wake Forest University
Washington University
Yale University Library

Digital Repository

• Launched 2008
• Initial focus on digitized book and journal content
  – 10.7 million total volumes
  – 5.6 million book titles
  – 281,000 serial titles
  – 3.4 million public domain (~31%)

Mission

To contribute to the common good by collecting, organizing, preserving, communicating, and sharing the record of human knowledge

Universal Library

Common Goal

Single Entity Many Partners

HathiTrust

Collections and Collaboration

• Comprehensive collection
  – Preservation…with Access
• Shared strategies
  – Copyright
  – Collection management, development
  – Preservation
  – Discovery, Use
  – Bibliographic Indeterminacy
  – Efficient user services
• Public Good

Repository Management

Underlying ideas

• Community
• Scale
• Access and Preservation
• Openness

Community

• OAIS
• TRAC
• METS and PREMIS
• Repository Practices
  – Content package
  – Validation
  – Identification
  – Scale
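The list above names METS among the community standards and validation among the repository practices. As a rough illustration of how the two can meet, here is a minimal Python sketch that reads file names and recorded checksums from a METS file section. The sample document, its file names, and its checksum values are invented for the example and are not taken from HathiTrust; the fragment also omits parts (such as a structural map) that a complete METS document would contain.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"

# Invented sample: file names and checksum values are placeholders.
SAMPLE_METS = """<mets xmlns="http://www.loc.gov/METS/" xmlns:xlink="http://www.w3.org/1999/xlink">
  <fileSec>
    <fileGrp USE="image">
      <file ID="IMG00000001" MIMETYPE="image/jp2"
            CHECKSUM="0123456789abcdef0123456789abcdef" CHECKSUMTYPE="MD5">
        <FLocat LOCTYPE="OTHER" OTHERLOCTYPE="SYSTEM" xlink:href="00000001.jp2"/>
      </file>
    </fileGrp>
    <fileGrp USE="ocr">
      <file ID="OCR00000001" MIMETYPE="text/plain"
            CHECKSUM="fedcba9876543210fedcba9876543210" CHECKSUMTYPE="MD5">
        <FLocat LOCTYPE="OTHER" OTHERLOCTYPE="SYSTEM" xlink:href="00000001.txt"/>
      </file>
    </fileGrp>
  </fileSec>
</mets>
"""


def list_files(mets_xml):
    """Return (file name, checksum type, checksum) for each METS file entry."""
    root = ET.fromstring(mets_xml)
    entries = []
    for file_el in root.iter("{%s}file" % METS_NS):
        flocat = file_el.find("{%s}FLocat" % METS_NS)
        # Entries without a location element are reported with a name of None.
        href = flocat.get("{%s}href" % XLINK_NS) if flocat is not None else None
        entries.append((href, file_el.get("CHECKSUMTYPE"), file_el.get("CHECKSUM")))
    return entries


if __name__ == "__main__":
    for name, kind, value in list_files(SAMPLE_METS):
        print(name, kind, value)
```

A validation step could then compare each recorded checksum with one computed from the file actually held in the package.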

Scale

• Mission
  – To contribute to the common good by collecting, organizing, preserving, communicating, and sharing the record of human knowledge
• Strategy
  – “Co-owned and managed”

Preservation and Access

• “Light” archive benefits
  – Access to materials
  – Checks on integrity
  – Best chance for content to be used and valued, preserved

Openness

• Repository centralized/open
• Formats
• Software
• Organizational structure

Underlying ideas

Experience

Repository Philosophy/Design

• OAIS/TRAC
• Consistency
• Standardization
• Simplicity (in design, not function)
• Practicality
• Sustainability

[Diagram labels: Source; Bibliographic Data; Content Package; Michigan; Indiana; Bib Data; Data Management; Rights Data; Storage; Access; Ingest; Catalog; Full-text Search; PageTurner; APIs; Collections; Holdings Data; Datasets]

Content

• Types and number of formats
  – ITU G4 TIFF
  – JP2
  – Unicode (with and without coordinates)
• Open, meet community standards
• Widely supported on a number of platforms
• Confidence in preservation and migration
• Transform to access formats

Content Package

[Diagram labels: images; Source METS; text; HT METS; Zip]
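The content package slide names images, text, and a source METS file alongside a Zip. A minimal Python sketch of how such a zipped package might be inspected follows. It assumes, which the slides do not state, that the parts can be told apart by file extension; the extension map and every file name in the sample are hypothetical.

```python
import io
import zipfile
from collections import Counter
from pathlib import PurePosixPath

# Hypothetical mapping from extension to the kinds of content named on the slide.
# The slides do not specify file naming, so this mapping is an assumption.
KIND_BY_EXTENSION = {
    ".tif": "image",
    ".jp2": "image",
    ".txt": "text",
    ".xml": "metadata",
}


def summarize_package(zip_bytes):
    """Count the members of a zipped content package by kind."""
    counts = Counter()
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as package:
        for name in package.namelist():
            if name.endswith("/"):
                continue  # skip directory entries
            suffix = PurePosixPath(name).suffix.lower()
            counts[KIND_BY_EXTENSION.get(suffix, "other")] += 1
    return counts


def build_sample_package():
    """Build a tiny in-memory package with made-up file names for demonstration."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("00000001.tif", b"fake image bytes")
        package.writestr("00000002.jp2", b"fake image bytes")
        package.writestr("00000001.txt", "page one text")
        package.writestr("00000002.txt", "page two text")
        package.writestr("source.mets.xml", "<mets/>")
    return buffer.getvalue()


if __name__ == "__main__":
    print(dict(summarize_package(build_sample_package())))
```

A check of this kind could run at ingest to flag packages that contain unexpected file types before they reach storage.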

Storage

• Reliability – ensure integrity
• Redundancy – in single and multiple sites
• Scalability – including ease of management
• Accessibility – for repository processes and services
• Platform-independence – for data/object management
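The reliability and redundancy goals above are commonly supported by fixity checks: recomputing a checksum for each stored copy and comparing it with a value recorded earlier. The Python sketch below illustrates the idea only. The choice of SHA-256, the function names, and the two "site" files are assumptions made for the example, not a description of HathiTrust's storage system.

```python
import hashlib
import os
import tempfile


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, reading in chunks to bound memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copies(paths, expected, algorithm="sha256"):
    """Map each stored copy to True if its digest matches the recorded value."""
    return {path: file_digest(path, algorithm) == expected for path in paths}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        copies = []
        for site in ("site_a", "site_b"):  # hypothetical storage sites
            path = os.path.join(root, site + ".bin")
            with open(path, "wb") as handle:
                handle.write(b"archived object")
            copies.append(path)
        recorded = hashlib.sha256(b"archived object").hexdigest()
        print(verify_copies(copies, recorded))
```

Keeping copies at more than one site means a copy that fails the check can be restored from one that passes.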

Architecture & Management

[Diagram labels: images; Source METS; text; HT METS]

uc1/pairtree_root/b3/54/34/86/b34543486
b34543486.zip
b34543486.mets.xml
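The storage path on this slide follows the Pairtree convention, in which an identifier is split into two-character directory names. The Python sketch below shows that mapping in simplified form. It applies only the three single-character substitutions from the Pairtree specification and omits the specification's hex-escaping step. The namespace directory at the front and the final directory named after the identifier are assumptions based on the layout shown on the slide, and the identifier used in the example is illustrative.

```python
# Single-character substitutions from the Pairtree identifier-cleaning rules.
# The specification also hex-escapes some characters; this sketch omits that step.
CHAR_MAP = str.maketrans({"/": "=", ":": "+", ".": ","})


def pairtree_path(namespace, object_id):
    """Build a storage path: namespace, pairtree_root, 2-character pairs, object directory."""
    cleaned = object_id.translate(CHAR_MAP)
    # Split the cleaned identifier into 2-character segments (the last may be 1 character).
    pairs = [cleaned[i:i + 2] for i in range(0, len(cleaned), 2)]
    return "/".join([namespace, "pairtree_root"] + pairs + [cleaned])


if __name__ == "__main__":
    # Illustrative identifier; not asserted to be a real repository object.
    print(pairtree_path("uc1", "b3543486"))
```

Because the path is derived from the identifier alone, an object can be located on any file system without consulting a database, which fits the platform-independence goal listed under Storage.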

Assessment

CRL Audit

• Why
  – Value Community Standards
  – Accountability, Openness, Transparency
• Desire to know how we were doing and let the community know
• Audit
  – Guided by criteria included in TRAC as well as other metrics developed by CRL
  – HathiTrust’s practices are sound…appropriate to the content being archived and the general needs of the CRL community

What was involved

• Timeline
  – Data gathering: November 2009 - December 2010
  – Site visit: May 2010
  – Results in March 2011
• Logistics
  – Question by email, documentation
  – Phone conversations
  – Staff: Project Librarian, Digital Preservation Librarian, Executive Director

Results

• Organizational Infrastructure (2)
  – Mission statement, succession plan, staff assessment, accountability, business plan, agreements
• Digital Object Management (3)
  – Properties preserved, SIP, AIP, validation, naming conventions, identifiers, understandability, preservation strategies, logging, access policies
• Technologies, Technical Infrastructure, Security (4)
  – Hardware, software, error-handling, change management, security, staff roles, disaster preparedness

Key Issues

• Rights and ownership of HathiTrust enterprise assets
• Succession plan
• Clarify and strengthen quality assurance and print archiving components of the HathiTrust program

Future Work

• Disaster Recovery
• Change Management
  – Moving to new formats: image, audio, born-digital
• Certification updates
• Documentation
  – http://www.hathitrust.org/trac

Thank you

How to find out more

• About: http://www.hathitrust.org/about
• Twitter: http://twitter.com/hathitrust
• Facebook: http://www.facebook.com/hathitrust
• Monthly newsletter
  – http://www.hathitrust.org/updates
  – RSS: http://www.hathitrust.org/updates_rss
• Contact us: feedback@issues.hathitrust.org
• Blogs: http://www.hathitrust.org/blogs
  – Large-scale Search
  – Perspectives from HathiTrust

Page 2: HATHITRUST A Shared Digital Repository HathiTrust: Key Concepts and Issues in Managing the Digital Archive ICPSR Summer Workshop “Curating and Managing

Outline

bull What is HathiTrust What are we trying to accomplish

bull Repository managementndash What keeps us running

bull Assessment

What is HathiTrust

PartnershipArizona State UniversityBaylor UniversityBoston CollegeBoston UniversityBrandeis UniversityBrown UniversityCalifornia Digital LibraryCarnegie Mellon UniversityColumbia UniversityCornell UniversityDartmouth CollegeDuke UniversityEmory UniversityFlorida State UniversityGetty Research InstituteHarvard University LibraryIndiana UniversityIowa State UniversityJohns Hopkins UniversityKansas State UniversityLafayette CollegeLibrary of CongressMassachusetts Institute of

TechnologyMcGill University`Michigan State UniversityNew York Public LibraryNew York UniversityNorth Carolina Central

University

North Carolina StateUniversity

Northwestern UniversityThe Ohio State UniversityThe Pennsylvania State

UniversityPrinceton UniversityPurdue UniversityStanford UniversitySyracuse UniversityTexas AampM UniversityTufts UniversityUniversidad Complutense

de MadridUniversity of AlbertaUniversity of ArizonaUniversity of CalgaryUniversity of California

BerkeleyDavisIrvineLos AngelesMercedRiversideSan DiegoSan FranciscoSanta BarbaraSanta Cruz

The University of ChicagoUniversity of ConnecticutUniversity of Delaware

University of FloridaUniversity of HoustonUniversity of IllinoisUniversity of Illinois at ChicagoThe University of IowaUniversity of KansasUniversity of MarylandUniversity of MiamiUniversity of MichiganUniversity of MinnesotaUniversity of MissouriUniversity of Nebraska-LincolnThe University of North

Carolina at Chapel HillUniversity of Notre DameUniversity of OklahomaUniversity of PennsylvaniaUniversity of PittsburghUniversity of UtahUniversity of VermontUniversity of VirginiaUniversity of WashingtonUniversity of Wisconsin-

MadisonUtah State UniversityVanderbilt UniversityVirginia TechWake Forest UniversityWashington UniversityYale University Library

Digital Repository

bull Launched 2008bull Initial focus on digitized book and journal

contentndash 107 million total volumes ndash 56 million book titlesndash 281000 serial titlesndash 34 million public domain (~31)

Mission

To contribute to the common good by collecting organizing preserving communicating and sharing the record of human knowledge

Universal Library

Common Goal

Single Entity Many Partners

HathiTrust

Collections and Collaboration

bull Comprehensive collection- Preservationhellipwith Access

bull Shared strategiesndash Copyrightndash Collection management developmentndash Preservationndash Discovery Usendash Bibliographic Indeterminacyndash Efficient user services

bull Public Good

Repository Management

Underlying ideas

bull Communitybull Scalebull Access and Preservationbull Openness

Community

Community

Community

bull OAISbull TRACbull METS and PREMISbull Repository Practices

ndash Content packagendash Validationndash Identificationndash Scale

Scale

bull Missionndash To contribute to the common good by collecting

organizing preserving communicating and sharing the record of human knowledge

bull Strategyndash ldquoCo-owned and managedrdquo

Preservation and Access

bull ldquoLightrdquo archive benefitsndash Access to materialsndash Checks on integrityndash Best chance for content to be used and valued

preserved

Openness

bull Repository centralizedopenbull Formatsbull Softwarebull Organizational structure

Underlying ideas

Underlying ideas

Experience

Repository PhilosophyDesign

bull OAISTRAC

bull Consistency

bull Standardization

bull Simplicity (in design not function)

bull Practicality

bull Sustainability

Source

Bibliographic Data

Content Package

MichiganIndiana

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

Datasets

Source

Bibliographic Data

Content Package

MichiganIndiana

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

Datasets

Content

bull Types and number of formatsndash ITU G4 TIFFndash JP2ndash Unicode (with and without coordinates)

bull Open meet community standardsbull Widely supported on a number of platformsbull Confidence in preservation and migrationbull Transform to access formats

Content Package

imagesSource METStext

HTMETS

Zip

Source

Bibliographic Data

Content Package

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

DatasetsMichigan

Indiana

Source

Bibliographic Data

Content Package

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

DatasetsMichigan

Indiana

Storage

bull Reliability ndash ensure integritybull Redundancy ndash in single and multiple sitesbull Scalability ndash including ease of managementbull Accessibility ndash for repository processes and servicesbull Platform-independence ndash for dataobject management

Architecture amp Management

imagesSource METStext

HTMETS

uc1pairtree_rootb3543486b34543486

b34543486zip

b34543486metsxml

Source

Bibliographic Data

Content Package

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

DatasetsMichigan

Indiana

Assessment

CRL Audit

bull Whyndash Value Community Standardsndash Accountability Openness Transparency

bull Desire to know how we were doing and let the community know

bull Auditndash Guided by criteria included in TRAC as well as other

metrics developed by CRLndash HathiTrustrsquos practices are soundhellipappropriate to the

content being archived and the general needs of the CRL community

What was involved

bull Timelinendash Data gathering November 2009 - December 2010ndash Site visit May 2010ndash Results in March 2011

bull Logisticsndash Question by email documentationndash Phone conversationsndash Staff Project Librarian Digital Preservation

Librarian Executive Director

Results

bull Organizational Infrastructure (2)ndash Mission statement succession plan staff assessment

accountability business plan agreementsbull Digital Object Management (3)

ndash Properties preserved SIP AIP validation naming conventions identifiers understandability preservation strategies logging access policies

bull Technologies Technical Infrastructure Security (4)ndash Hardware software error-handling change

management security staff roles disaster preparedness

Key Issues

bull Rights and ownership of HathiTrust enterprise assets

bull Succession planbull Clarify and strengthen quality assurance and

print archiving components of the HathiTrust program

Future Work

bull Disaster Recoverybull Change Management

ndash Moving to new formats image audio born-digitalbull Certification updates

bull Documentationndash httpwwwhathitrustorgtrac

Thank you

How to find out more

bull About httpwwwhathitrustorgaboutbull Twitter httptwittercomhathitrustbull Facebook httpwwwfacebookcomhathitrustbull Monthly newsletter

ndash httpwwwhathitrustorgupdatesndash RSS httpwwwhathitrustorgupdates_rss

bull Contact us feedbackissueshathitrustorgbull Blogs httpwwwhathitrustorgblogs

ndash Large-scale Searchndash Perspectives from HathiTrust

  • HathiTrust Key Concepts and Issues in Managing the Digital Arc
  • Outline
  • What is HathiTrust
  • Partnership
  • Digital Repository
  • Mission
  • HathiTrust
  • Collections and Collaboration
  • Repository Management
  • Underlying ideas
  • Community
  • Community (2)
  • Community (3)
  • Scale
  • Preservation and Access
  • Openness
  • Underlying ideas (2)
  • Underlying ideas (3)
  • Repository PhilosophyDesign
  • Slide 20
  • Slide 21
  • Content
  • Content Package
  • Slide 24
  • Slide 25
  • Storage
  • Architecture amp Management
  • Slide 28
  • Assessment
  • CRL Audit
  • What was involved
  • Results
  • Key Issues
  • Future Work
  • Thank you
  • How to find out more
Page 3: HATHITRUST A Shared Digital Repository HathiTrust: Key Concepts and Issues in Managing the Digital Archive ICPSR Summer Workshop “Curating and Managing

What is HathiTrust

PartnershipArizona State UniversityBaylor UniversityBoston CollegeBoston UniversityBrandeis UniversityBrown UniversityCalifornia Digital LibraryCarnegie Mellon UniversityColumbia UniversityCornell UniversityDartmouth CollegeDuke UniversityEmory UniversityFlorida State UniversityGetty Research InstituteHarvard University LibraryIndiana UniversityIowa State UniversityJohns Hopkins UniversityKansas State UniversityLafayette CollegeLibrary of CongressMassachusetts Institute of

TechnologyMcGill University`Michigan State UniversityNew York Public LibraryNew York UniversityNorth Carolina Central

University

North Carolina StateUniversity

Northwestern UniversityThe Ohio State UniversityThe Pennsylvania State

UniversityPrinceton UniversityPurdue UniversityStanford UniversitySyracuse UniversityTexas AampM UniversityTufts UniversityUniversidad Complutense

de MadridUniversity of AlbertaUniversity of ArizonaUniversity of CalgaryUniversity of California

BerkeleyDavisIrvineLos AngelesMercedRiversideSan DiegoSan FranciscoSanta BarbaraSanta Cruz

The University of ChicagoUniversity of ConnecticutUniversity of Delaware

University of FloridaUniversity of HoustonUniversity of IllinoisUniversity of Illinois at ChicagoThe University of IowaUniversity of KansasUniversity of MarylandUniversity of MiamiUniversity of MichiganUniversity of MinnesotaUniversity of MissouriUniversity of Nebraska-LincolnThe University of North

Carolina at Chapel HillUniversity of Notre DameUniversity of OklahomaUniversity of PennsylvaniaUniversity of PittsburghUniversity of UtahUniversity of VermontUniversity of VirginiaUniversity of WashingtonUniversity of Wisconsin-

MadisonUtah State UniversityVanderbilt UniversityVirginia TechWake Forest UniversityWashington UniversityYale University Library

Digital Repository

bull Launched 2008bull Initial focus on digitized book and journal

contentndash 107 million total volumes ndash 56 million book titlesndash 281000 serial titlesndash 34 million public domain (~31)

Mission

To contribute to the common good by collecting organizing preserving communicating and sharing the record of human knowledge

Universal Library

Common Goal

Single Entity Many Partners

HathiTrust

Collections and Collaboration

bull Comprehensive collection- Preservationhellipwith Access

bull Shared strategiesndash Copyrightndash Collection management developmentndash Preservationndash Discovery Usendash Bibliographic Indeterminacyndash Efficient user services

bull Public Good

Repository Management

Underlying ideas

bull Communitybull Scalebull Access and Preservationbull Openness

Community

Community

Community

bull OAISbull TRACbull METS and PREMISbull Repository Practices

ndash Content packagendash Validationndash Identificationndash Scale

Scale

bull Missionndash To contribute to the common good by collecting

organizing preserving communicating and sharing the record of human knowledge

bull Strategyndash ldquoCo-owned and managedrdquo

Preservation and Access

bull ldquoLightrdquo archive benefitsndash Access to materialsndash Checks on integrityndash Best chance for content to be used and valued

preserved

Openness

bull Repository centralizedopenbull Formatsbull Softwarebull Organizational structure

Underlying ideas

Underlying ideas

Experience

Repository PhilosophyDesign

bull OAISTRAC

bull Consistency

bull Standardization

bull Simplicity (in design not function)

bull Practicality

bull Sustainability

Source

Bibliographic Data

Content Package

MichiganIndiana

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

Datasets

Source

Bibliographic Data

Content Package

MichiganIndiana

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

Datasets

Content

bull Types and number of formatsndash ITU G4 TIFFndash JP2ndash Unicode (with and without coordinates)

bull Open meet community standardsbull Widely supported on a number of platformsbull Confidence in preservation and migrationbull Transform to access formats

Content Package

imagesSource METStext

HTMETS

Zip

Source

Bibliographic Data

Content Package

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

DatasetsMichigan

Indiana

Source

Bibliographic Data

Content Package

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

DatasetsMichigan

Indiana

Storage

bull Reliability ndash ensure integritybull Redundancy ndash in single and multiple sitesbull Scalability ndash including ease of managementbull Accessibility ndash for repository processes and servicesbull Platform-independence ndash for dataobject management

Architecture amp Management

imagesSource METStext

HTMETS

uc1pairtree_rootb3543486b34543486

b34543486zip

b34543486metsxml

Source

Bibliographic Data

Content Package

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

DatasetsMichigan

Indiana

Assessment

CRL Audit

bull Whyndash Value Community Standardsndash Accountability Openness Transparency

bull Desire to know how we were doing and let the community know

bull Auditndash Guided by criteria included in TRAC as well as other

metrics developed by CRLndash HathiTrustrsquos practices are soundhellipappropriate to the

content being archived and the general needs of the CRL community

What was involved

bull Timelinendash Data gathering November 2009 - December 2010ndash Site visit May 2010ndash Results in March 2011

bull Logisticsndash Question by email documentationndash Phone conversationsndash Staff Project Librarian Digital Preservation

Librarian Executive Director

Results

bull Organizational Infrastructure (2)ndash Mission statement succession plan staff assessment

accountability business plan agreementsbull Digital Object Management (3)

ndash Properties preserved SIP AIP validation naming conventions identifiers understandability preservation strategies logging access policies

bull Technologies Technical Infrastructure Security (4)ndash Hardware software error-handling change

management security staff roles disaster preparedness

Key Issues

bull Rights and ownership of HathiTrust enterprise assets

bull Succession planbull Clarify and strengthen quality assurance and

print archiving components of the HathiTrust program

Future Work

bull Disaster Recoverybull Change Management

ndash Moving to new formats image audio born-digitalbull Certification updates

bull Documentationndash httpwwwhathitrustorgtrac

Thank you

How to find out more

bull About httpwwwhathitrustorgaboutbull Twitter httptwittercomhathitrustbull Facebook httpwwwfacebookcomhathitrustbull Monthly newsletter

ndash httpwwwhathitrustorgupdatesndash RSS httpwwwhathitrustorgupdates_rss

bull Contact us feedbackissueshathitrustorgbull Blogs httpwwwhathitrustorgblogs

ndash Large-scale Searchndash Perspectives from HathiTrust

  • HathiTrust Key Concepts and Issues in Managing the Digital Arc
  • Outline
  • What is HathiTrust
  • Partnership
  • Digital Repository
  • Mission
  • HathiTrust
  • Collections and Collaboration
  • Repository Management
  • Underlying ideas
  • Community
  • Community (2)
  • Community (3)
  • Scale
  • Preservation and Access
  • Openness
  • Underlying ideas (2)
  • Underlying ideas (3)
  • Repository PhilosophyDesign
  • Slide 20
  • Slide 21
  • Content
  • Content Package
  • Slide 24
  • Slide 25
  • Storage
  • Architecture amp Management
  • Slide 28
  • Assessment
  • CRL Audit
  • What was involved
  • Results
  • Key Issues
  • Future Work
  • Thank you
  • How to find out more
Page 4: HATHITRUST A Shared Digital Repository HathiTrust: Key Concepts and Issues in Managing the Digital Archive ICPSR Summer Workshop “Curating and Managing

PartnershipArizona State UniversityBaylor UniversityBoston CollegeBoston UniversityBrandeis UniversityBrown UniversityCalifornia Digital LibraryCarnegie Mellon UniversityColumbia UniversityCornell UniversityDartmouth CollegeDuke UniversityEmory UniversityFlorida State UniversityGetty Research InstituteHarvard University LibraryIndiana UniversityIowa State UniversityJohns Hopkins UniversityKansas State UniversityLafayette CollegeLibrary of CongressMassachusetts Institute of

TechnologyMcGill University`Michigan State UniversityNew York Public LibraryNew York UniversityNorth Carolina Central

University

North Carolina StateUniversity

Northwestern UniversityThe Ohio State UniversityThe Pennsylvania State

UniversityPrinceton UniversityPurdue UniversityStanford UniversitySyracuse UniversityTexas AampM UniversityTufts UniversityUniversidad Complutense

de MadridUniversity of AlbertaUniversity of ArizonaUniversity of CalgaryUniversity of California

BerkeleyDavisIrvineLos AngelesMercedRiversideSan DiegoSan FranciscoSanta BarbaraSanta Cruz

The University of ChicagoUniversity of ConnecticutUniversity of Delaware

University of FloridaUniversity of HoustonUniversity of IllinoisUniversity of Illinois at ChicagoThe University of IowaUniversity of KansasUniversity of MarylandUniversity of MiamiUniversity of MichiganUniversity of MinnesotaUniversity of MissouriUniversity of Nebraska-LincolnThe University of North

Carolina at Chapel HillUniversity of Notre DameUniversity of OklahomaUniversity of PennsylvaniaUniversity of PittsburghUniversity of UtahUniversity of VermontUniversity of VirginiaUniversity of WashingtonUniversity of Wisconsin-

MadisonUtah State UniversityVanderbilt UniversityVirginia TechWake Forest UniversityWashington UniversityYale University Library

Digital Repository

bull Launched 2008bull Initial focus on digitized book and journal

contentndash 107 million total volumes ndash 56 million book titlesndash 281000 serial titlesndash 34 million public domain (~31)

Mission

To contribute to the common good by collecting organizing preserving communicating and sharing the record of human knowledge

Universal Library

Common Goal

Single Entity Many Partners

HathiTrust

Collections and Collaboration

bull Comprehensive collection- Preservationhellipwith Access

bull Shared strategiesndash Copyrightndash Collection management developmentndash Preservationndash Discovery Usendash Bibliographic Indeterminacyndash Efficient user services

bull Public Good

Repository Management

Underlying ideas

bull Communitybull Scalebull Access and Preservationbull Openness

Community

Community

Community

bull OAISbull TRACbull METS and PREMISbull Repository Practices

ndash Content packagendash Validationndash Identificationndash Scale

Scale

bull Missionndash To contribute to the common good by collecting

organizing preserving communicating and sharing the record of human knowledge

bull Strategyndash ldquoCo-owned and managedrdquo

Preservation and Access

bull ldquoLightrdquo archive benefitsndash Access to materialsndash Checks on integrityndash Best chance for content to be used and valued

preserved

Openness

bull Repository centralizedopenbull Formatsbull Softwarebull Organizational structure

Underlying ideas

Underlying ideas

Experience

Repository PhilosophyDesign

bull OAISTRAC

bull Consistency

bull Standardization

bull Simplicity (in design not function)

bull Practicality

bull Sustainability

Source

Bibliographic Data

Content Package

MichiganIndiana

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

Datasets

Source

Bibliographic Data

Content Package

MichiganIndiana

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

Datasets

Content

bull Types and number of formatsndash ITU G4 TIFFndash JP2ndash Unicode (with and without coordinates)

bull Open meet community standardsbull Widely supported on a number of platformsbull Confidence in preservation and migrationbull Transform to access formats

Content Package

imagesSource METStext

HTMETS

Zip

Source

Bibliographic Data

Content Package

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

DatasetsMichigan

Indiana

Source

Bibliographic Data

Content Package

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

DatasetsMichigan

Indiana

Storage

bull Reliability ndash ensure integritybull Redundancy ndash in single and multiple sitesbull Scalability ndash including ease of managementbull Accessibility ndash for repository processes and servicesbull Platform-independence ndash for dataobject management

Architecture amp Management

imagesSource METStext

HTMETS

uc1pairtree_rootb3543486b34543486

b34543486zip

b34543486metsxml

Source

Bibliographic Data

Content Package

Bib Data

Data Management

Rights Data

Storage

Access

Ingest

Catalog

Full-text Search

PageTurner

APIs

Collections

Holdings Data

DatasetsMichigan

Indiana

Assessment

CRL Audit

bull Whyndash Value Community Standardsndash Accountability Openness Transparency

bull Desire to know how we were doing and let the community know

bull Auditndash Guided by criteria included in TRAC as well as other

metrics developed by CRLndash HathiTrustrsquos practices are soundhellipappropriate to the

content being archived and the general needs of the CRL community

What was involved

bull Timelinendash Data gathering November 2009 - December 2010ndash Site visit May 2010ndash Results in March 2011

bull Logisticsndash Question by email documentationndash Phone conversationsndash Staff Project Librarian Digital Preservation

Librarian Executive Director

Results

bull Organizational Infrastructure (2)ndash Mission statement succession plan staff assessment

accountability business plan agreementsbull Digital Object Management (3)

ndash Properties preserved SIP AIP validation naming conventions identifiers understandability preservation strategies logging access policies

bull Technologies Technical Infrastructure Security (4)ndash Hardware software error-handling change

management security staff roles disaster preparedness

Key Issues

• Rights and ownership of HathiTrust enterprise assets
• Succession plan
• Clarify and strengthen quality assurance and print archiving components of the HathiTrust program

Future Work

• Disaster Recovery
• Change Management
  – Moving to new formats: image, audio, born-digital
• Certification updates
• Documentation
  – http://www.hathitrust.org/trac

Thank you

How to find out more

• About: http://www.hathitrust.org/about
• Twitter: http://twitter.com/hathitrust
• Facebook: http://www.facebook.com/hathitrust
• Monthly newsletter
  – http://www.hathitrust.org/updates
  – RSS: http://www.hathitrust.org/updates_rss
• Contact us: feedback@issues.hathitrust.org
• Blogs: http://www.hathitrust.org/blogs
  – Large-scale Search
  – Perspectives from HathiTrust

  • HathiTrust Key Concepts and Issues in Managing the Digital Arc
  • Outline
  • What is HathiTrust
  • Partnership
  • Digital Repository
  • Mission
  • HathiTrust
  • Collections and Collaboration
  • Repository Management
  • Underlying ideas
  • Community
  • Community (2)
  • Community (3)
  • Scale
  • Preservation and Access
  • Openness
  • Underlying ideas (2)
  • Underlying ideas (3)
  • Repository Philosophy/Design
  • Slide 20
  • Slide 21
  • Content
  • Content Package
  • Slide 24
  • Slide 25
  • Storage
  • Architecture & Management
  • Slide 28
  • Assessment
  • CRL Audit
  • What was involved
  • Results
  • Key Issues
  • Future Work
  • Thank you
  • How to find out more
Digital Repository

• Launched 2008
• Initial focus on digitized book and journal content
  – 10.7 million total volumes
  – 5.6 million book titles
  – 281,000 serial titles
  – 3.4 million public domain (~31%)

Mission

To contribute to the common good by collecting, organizing, preserving, communicating, and sharing the record of human knowledge

Universal Library

Common Goal

Single Entity, Many Partners

HathiTrust

Collections and Collaboration

• Comprehensive collection
  – Preservation…with Access
• Shared strategies
  – Copyright
  – Collection management, development
  – Preservation
  – Discovery, Use
  – Bibliographic Indeterminacy
  – Efficient user services
• Public Good

Repository Management

Underlying ideas

• Community
• Scale
• Access and Preservation
• Openness

Community

Community

Community

• OAIS
• TRAC
• METS and PREMIS
• Repository Practices
  – Content package
  – Validation
  – Identification
  – Scale

Scale

• Mission
  – To contribute to the common good by collecting, organizing, preserving, communicating, and sharing the record of human knowledge
• Strategy
  – “Co-owned and managed”

Preservation and Access

• “Light” archive benefits
  – Access to materials
  – Checks on integrity
  – Best chance for content to be used and valued preserved

Openness

• Repository centralized/open
• Formats
• Software
• Organizational structure

Underlying ideas

Underlying ideas

Experience

Repository Philosophy/Design

• OAIS/TRAC
• Consistency
• Standardization
• Simplicity (in design, not function)
• Practicality
• Sustainability

[Slides 20-21: diagram with labels Source, Bibliographic Data, Content Package, Ingest, Data Management, Bib Data, Rights Data, Holdings Data, Storage, Michigan, Indiana, Access, Catalog, Full-text Search, PageTurner, APIs, Collections, Datasets]

Content

• Types and number of formats
  – ITU G4 TIFF
  – JP2
  – Unicode (with and without coordinates)
• Open, meet community standards
• Widely supported on a number of platforms
• Confidence in preservation and migration
• Transform to access formats

Content Package

[Diagram labels: images, Source METS, text, HT METS, Zip]
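A package of this kind (page images and text in a Zip, described by METS) lends itself to a validation step in which the checksums recorded in the METS file are compared with the files actually present in the Zip. The following is a sketch under stated assumptions, not HathiTrust's ingest code: it assumes each METS `file` element carries an MD5 `CHECKSUM` attribute and an `FLocat` child whose `xlink:href` is the bare file name. `mets_checksums` and `validate_package` are hypothetical helper names.

```python
import hashlib
import io
import xml.etree.ElementTree as ET
import zipfile

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"


def mets_checksums(mets_xml: str) -> dict[str, str]:
    """Map each file name referenced in the METS to its recorded checksum."""
    root = ET.fromstring(mets_xml)
    recorded = {}
    for file_el in root.iter(f"{{{METS_NS}}}file"):
        flocat = file_el.find(f"{{{METS_NS}}}FLocat")
        href = flocat.get(f"{{{XLINK_NS}}}href") if flocat is not None else None
        checksum = file_el.get("CHECKSUM")
        if href and checksum:
            recorded[href] = checksum.lower()
    return recorded


def validate_package(zip_bytes: bytes, mets_xml: str) -> list[str]:
    """Return file names that are missing from the Zip or whose MD5 differs (assumes MD5)."""
    problems = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        # Index members by base name so a leading directory inside the Zip is tolerated.
        members = {name.rsplit("/", 1)[-1]: name for name in archive.namelist()}
        for name, expected in mets_checksums(mets_xml).items():
            if name not in members:
                problems.append(name)
            elif hashlib.md5(archive.read(members[name])).hexdigest() != expected:
                problems.append(name)
    return problems
```

An empty list means every file named in the METS was found with a matching checksum; anything else would be grounds for rejecting the package before it reaches storage.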

[Slides 24-25: diagram with labels Source, Bibliographic Data, Content Package, Ingest, Data Management, Bib Data, Rights Data, Holdings Data, Storage, Michigan, Indiana, Access, Catalog, Full-text Search, PageTurner, APIs, Collections, Datasets]

Page 10: HATHITRUST A Shared Digital Repository HathiTrust: Key Concepts and Issues in Managing the Digital Archive ICPSR Summer Workshop “Curating and Managing

Underlying ideas

bull Communitybull Scalebull Access and Preservationbull Openness

Community

Community

Community

bull OAISbull TRACbull METS and PREMISbull Repository Practices

ndash Content packagendash Validationndash Identificationndash Scale

Scale

bull Missionndash To contribute to the common good by collecting

organizing preserving communicating and sharing the record of human knowledge

bull Strategyndash ldquoCo-owned and managedrdquo

Preservation and Access

bull ldquoLightrdquo archive benefitsndash Access to materialsndash Checks on integrityndash Best chance for content to be used and valued

preserved

Openness

bull Repository centralizedopenbull Formatsbull Softwarebull Organizational structure

Underlying ideas

Underlying ideas

Experience

Repository Philosophy/Design

• OAIS/TRAC
• Consistency
• Standardization
• Simplicity (in design, not function)
• Practicality
• Sustainability

Diagram labels (slides 20 and 21, same diagram): Source, Bibliographic Data, Content Package, Michigan, Indiana, Bib Data, Data Management, Rights Data, Storage, Access, Ingest, Catalog, Full-text Search, PageTurner, APIs, Collections, Holdings Data, Datasets

Content

• Types and number of formats
  – ITU G4 TIFF
  – JP2
  – Unicode (with and without coordinates)
• Open, meet community standards
• Widely supported on a number of platforms
• Confidence in preservation and migration
• Transform to access formats

Content Package

Diagram labels: images, Source METS, text, HTMETS, Zip
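
The packaging step on this slide (images, text and the source METS file gathered into one zip) can be sketched roughly as below. This is only an illustration, not HathiTrust's ingest code: the function name `build_content_zip` and the flat directory layout are assumptions made for the example.

```python
import zipfile
from pathlib import Path


def build_content_zip(package_dir: Path, zip_path: Path) -> list[str]:
    """Zip every regular file in package_dir and return the member names.

    Assumes (hypothetically) that page images, text files and the source
    METS file all sit directly inside package_dir.
    """
    # Collect the files first so the archive itself is never included.
    members = sorted(p for p in package_dir.iterdir() if p.is_file())
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for member in members:
            # Store each file under its bare name, without parent directories.
            archive.write(member, arcname=member.name)
    # Re-open the archive to report what was actually written.
    with zipfile.ZipFile(zip_path) as archive:
        return archive.namelist()
```

Writing the zip outside `package_dir` keeps a pre-existing archive from being packed into a new one.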

Diagram labels (slides 24 and 25, same diagram): Source, Bibliographic Data, Content Package, Bib Data, Data Management, Rights Data, Storage, Access, Ingest, Catalog, Full-text Search, PageTurner, APIs, Collections, Holdings Data, Datasets, Michigan, Indiana

Storage

• Reliability – ensure integrity
• Redundancy – in single and multiple sites
• Scalability – including ease of management
• Accessibility – for repository processes and services
• Platform-independence – for data/object management
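
One common way to "ensure integrity" is a periodic fixity check: recompute each stored file's checksum and compare it with the value recorded earlier. The slides do not say which algorithm or tooling HathiTrust uses, so the sketch below assumes SHA-256 and a hypothetical in-memory mapping of paths to recorded digests.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Return the hex digest of a file, reading it in chunks."""
    hasher = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def verify_fixity(recorded: dict[Path, str], algorithm: str = "sha256") -> list[Path]:
    """Return the paths that are missing or whose digest no longer matches.

    `recorded` maps each file path to the digest stored when the file was
    ingested (a hypothetical manifest; the real record format is not
    described in the slides).
    """
    failures = []
    for path, expected in recorded.items():
        if not path.is_file() or file_digest(path, algorithm) != expected:
            failures.append(path)
    return failures
```

An empty result means every recorded file is present and unchanged. Running the same check against each replica would cover the redundancy point as well.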

Architecture & Management

Diagram labels: images, Source METS, text, HTMETS

uc1/pairtree_root/b3/54/34/86/b34543486

b34543486.zip

b34543486.mets.xml
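
The storage path on this slide appears to follow the Pairtree convention, in which an identifier is split into two-character directory segments beneath `pairtree_root`. The sketch below shows how such a path could be derived. The function names and the example identifier `b3543486` are illustrative assumptions, and only the three single-character substitutions from the Pairtree specification are applied (its hex-escaping rules are omitted).

```python
from pathlib import PurePosixPath

# Single-character substitutions from the Pairtree specification.
# The spec's hex-escaping of other special characters is omitted here.
_PAIRTREE_SUBSTITUTIONS = str.maketrans({"/": "=", ":": "+", ".": ","})


def pairtree_object_dir(namespace: str, local_id: str) -> PurePosixPath:
    """Return the relative directory that would hold one object."""
    cleaned = local_id.translate(_PAIRTREE_SUBSTITUTIONS)
    # Split the cleaned identifier into two-character segments; the last
    # segment may be a single character.
    segments = [cleaned[i:i + 2] for i in range(0, len(cleaned), 2)]
    return PurePosixPath(namespace, "pairtree_root", *segments, cleaned)


def package_files(namespace: str, local_id: str) -> list[PurePosixPath]:
    """Paths of the zip and METS file, following the naming on the slide."""
    directory = pairtree_object_dir(namespace, local_id)
    cleaned = directory.name
    return [directory / f"{cleaned}.zip", directory / f"{cleaned}.mets.xml"]
```

For example, `pairtree_object_dir("uc1", "b3543486")` yields `uc1/pairtree_root/b3/54/34/86/b3543486`. Because the path is computed from the identifier alone, an object can be located without a database lookup, which fits the slide deck's emphasis on simplicity and platform independence.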

Diagram labels (slide 28): Source, Bibliographic Data, Content Package, Bib Data, Data Management, Rights Data, Storage, Access, Ingest, Catalog, Full-text Search, PageTurner, APIs, Collections, Holdings Data, Datasets, Michigan, Indiana

Assessment

CRL Audit

• Why
  – Value Community Standards
  – Accountability, Openness, Transparency
• Desire to know how we were doing and let the community know
• Audit
  – Guided by criteria included in TRAC as well as other metrics developed by CRL
  – HathiTrust’s practices are sound…appropriate to the content being archived and the general needs of the CRL community

What was involved

• Timeline
  – Data gathering: November 2009 - December 2010
  – Site visit: May 2010
  – Results in March 2011
• Logistics
  – Question by email, documentation
  – Phone conversations
  – Staff: Project Librarian, Digital Preservation Librarian, Executive Director

Results

• Organizational Infrastructure (2)
  – Mission statement, succession plan, staff, assessment, accountability, business plan, agreements
• Digital Object Management (3)
  – Properties preserved, SIP, AIP, validation, naming conventions, identifiers, understandability, preservation strategies, logging, access policies
• Technologies, Technical Infrastructure, Security (4)
  – Hardware, software, error-handling, change management, security, staff roles, disaster preparedness

Key Issues

• Rights and ownership of HathiTrust enterprise assets
• Succession plan
• Clarify and strengthen quality assurance and print archiving components of the HathiTrust program

Future Work

• Disaster Recovery
• Change Management
  – Moving to new formats: image, audio, born-digital
• Certification updates
• Documentation
  – http://www.hathitrust.org/trac

Thank you

How to find out more

• About: http://www.hathitrust.org/about
• Twitter: http://twitter.com/hathitrust
• Facebook: http://www.facebook.com/hathitrust
• Monthly newsletter
  – http://www.hathitrust.org/updates
  – RSS: http://www.hathitrust.org/updates_rss
• Contact us: feedback@issues.hathitrust.org
• Blogs: http://www.hathitrust.org/blogs
  – Large-scale Search
  – Perspectives from HathiTrust







Openness

bull Repository centralizedopenbull Formatsbull Softwarebull Organizational structure

Underlying ideas

Underlying ideas

Experience

Repository PhilosophyDesign

bull OAISTRAC

bull Consistency

bull Standardization

bull Simplicity (in design not function)

bull Practicality

bull Sustainability

[Diagram, shown on two consecutive slides. Labels: Source, Bibliographic Data, Content Package, Michigan, Indiana, Bib Data, Data Management, Rights Data, Storage, Access, Ingest, Catalog, Full-text Search, PageTurner, APIs, Collections, Holdings Data, Datasets]

Content

• Types and number of formats
  – ITU G4 TIFF
  – JP2
  – Unicode (with and without coordinates)
• Open, meet community standards
• Widely supported on a number of platforms
• Confidence in preservation and migration
• Transform to access formats

Content Package

[Diagram labels: images, Source METS, text; HT METS; Zip]
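As a rough illustration of the packaging idea on this slide, the sketch below bundles page images and text files into one Zip archive and records a SHA-256 checksum per member, the kind of information a METS file could carry. The `build_package` helper, the member file names, and the choice of SHA-256 are assumptions made for the example; the slides do not describe HathiTrust's actual ingest code.

```python
import hashlib
import io
import zipfile


def build_package(files):
    """Bundle files into an in-memory Zip and return (zip_bytes, checksums).

    `files` maps a member name (hypothetical, e.g. "00000001.tif") to its bytes.
    """
    checksums = {}
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name in sorted(files):
            data = files[name]
            archive.writestr(name, data)
            # Record a digest of the uncompressed member for later validation.
            checksums[name] = hashlib.sha256(data).hexdigest()
    return buffer.getvalue(), checksums


# Placeholder content standing in for a page image and its OCR text.
example_files = {
    "00000001.tif": b"fake image bytes",
    "00000001.txt": b"fake OCR text",
}
zip_bytes, manifest = build_package(example_files)
```

Keeping the checksums outside the Zip means the package can be validated without trusting the archive itself.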

[Diagram, shown on two consecutive slides. Labels: Source, Bibliographic Data, Content Package, Bib Data, Data Management, Rights Data, Storage, Access, Ingest, Catalog, Full-text Search, PageTurner, APIs, Collections, Holdings Data, Datasets, Michigan, Indiana]

Storage

• Reliability – ensure integrity
• Redundancy – in single and multiple sites
• Scalability – including ease of management
• Accessibility – for repository processes and services
• Platform-independence – for data/object management
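One common way to pursue the reliability and redundancy goals listed above is a fixity check: record a digest when an object is stored, then recompute it for every copy. The sketch below assumes SHA-256 digests and uses hypothetical site names (`site_a`, `site_b`); the slides do not say which mechanism HathiTrust uses.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def verify_copies(expected_digest, copies):
    """Return the names of copies whose digest differs from the recorded one."""
    return [site for site, data in copies.items()
            if sha256_of(data) != expected_digest]


original = b"archived object bytes"
recorded = sha256_of(original)  # digest recorded when the object was stored

# Two hypothetical replicas; the second has a one-byte corruption.
copies = {"site_a": original, "site_b": b"archived object bytez"}
damaged = verify_copies(recorded, copies)
```

A copy that fails the check can then be repaired from one that passes, which is where redundancy across sites pays off.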

Architecture & Management

[Diagram labels: images, Source METS, text; HT METS]

uc1/pairtree_root/b3/54/34/86/b34543486

b34543486.zip

b34543486.mets.xml
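The directory layout on this slide follows the pairtree convention, in which an identifier is split into two-character segments that become nested directories. The sketch below shows that mapping under a simplifying assumption: the identifier contains no characters that would need pairtree's character cleaning. The function name `pairtree_path` and the namespace argument are illustrative, not HathiTrust's own code.

```python
def pairtree_path(namespace: str, identifier: str) -> str:
    """Map an identifier to a pairtree-style directory path.

    Simplification: assumes the identifier needs no character cleaning.
    """
    # Split the identifier into consecutive two-character segments.
    pairs = [identifier[i:i + 2] for i in range(0, len(identifier), 2)]
    # The final directory is named after the full identifier and holds the object.
    return "/".join([namespace, "pairtree_root", *pairs, identifier])


example = pairtree_path("uc1", "b3543486")
```

Because the path is computed from the identifier alone, any tool can locate an object on disk without consulting a database, which fits the platform-independence goal for storage.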

[Diagram. Labels: Source, Bibliographic Data, Content Package, Bib Data, Data Management, Rights Data, Storage, Access, Ingest, Catalog, Full-text Search, PageTurner, APIs, Collections, Holdings Data, Datasets, Michigan, Indiana]

Assessment

CRL Audit

• Why
  – Value Community Standards
  – Accountability, Openness, Transparency
• Desire to know how we were doing and let the community know
• Audit
  – Guided by criteria included in TRAC as well as other metrics developed by CRL
  – HathiTrust’s practices are sound…appropriate to the content being archived and the general needs of the CRL community

What was involved

• Timeline
  – Data gathering: November 2009 - December 2010
  – Site visit: May 2010
  – Results in March 2011
• Logistics
  – Question by email, documentation
  – Phone conversations
  – Staff: Project Librarian, Digital Preservation Librarian, Executive Director

Results

• Organizational Infrastructure (2)
  – Mission statement, succession plan, staff assessment, accountability, business plan, agreements
• Digital Object Management (3)
  – Properties preserved, SIP, AIP, validation, naming conventions, identifiers, understandability, preservation strategies, logging, access policies
• Technologies, Technical Infrastructure, Security (4)
  – Hardware, software, error-handling, change management, security, staff roles, disaster preparedness

Key Issues

• Rights and ownership of HathiTrust enterprise assets
• Succession plan
• Clarify and strengthen quality assurance and print archiving components of the HathiTrust program

Results

bull Organizational Infrastructure (2)ndash Mission statement succession plan staff assessment

accountability business plan agreementsbull Digital Object Management (3)

ndash Properties preserved SIP AIP validation naming conventions identifiers understandability preservation strategies logging access policies

bull Technologies Technical Infrastructure Security (4)ndash Hardware software error-handling change

management security staff roles disaster preparedness

Key Issues

bull Rights and ownership of HathiTrust enterprise assets

bull Succession planbull Clarify and strengthen quality assurance and

print archiving components of the HathiTrust program

Future Work

bull Disaster Recoverybull Change Management

ndash Moving to new formats image audio born-digitalbull Certification updates

bull Documentationndash httpwwwhathitrustorgtrac

Thank you

How to find out more

bull About httpwwwhathitrustorgaboutbull Twitter httptwittercomhathitrustbull Facebook httpwwwfacebookcomhathitrustbull Monthly newsletter

ndash httpwwwhathitrustorgupdatesndash RSS httpwwwhathitrustorgupdates_rss

bull Contact us feedbackissueshathitrustorgbull Blogs httpwwwhathitrustorgblogs

ndash Large-scale Searchndash Perspectives from HathiTrust

  • HathiTrust Key Concepts and Issues in Managing the Digital Arc
  • Outline
  • What is HathiTrust
  • Partnership
  • Digital Repository
  • Mission
  • HathiTrust
  • Collections and Collaboration
  • Repository Management
  • Underlying ideas
  • Community
  • Community (2)
  • Community (3)
  • Scale
  • Preservation and Access
  • Openness
  • Underlying ideas (2)
  • Underlying ideas (3)
  • Repository PhilosophyDesign
  • Slide 20
  • Slide 21
  • Content
  • Content Package
  • Slide 24
  • Slide 25
  • Storage
  • Architecture amp Management
  • Slide 28
  • Assessment
  • CRL Audit
  • What was involved
  • Results
  • Key Issues
  • Future Work
  • Thank you
  • How to find out more
Page 24

[Diagram labels: Source; Bibliographic Data; Content Package; Bib Data; Data Management; Rights Data; Storage; Access; Ingest; Catalog; Full-text Search; PageTurner; APIs; Collections; Holdings Data; Datasets; Michigan; Indiana]

Page 25

[Diagram labels: Source; Bibliographic Data; Content Package; Bib Data; Data Management; Rights Data; Storage; Access; Ingest; Catalog; Full-text Search; PageTurner; APIs; Collections; Holdings Data; Datasets; Michigan; Indiana]

Page 26

Storage

• Reliability – ensure integrity
• Redundancy – in single and multiple sites
• Scalability – including ease of management
• Accessibility – for repository processes and services
• Platform-independence – for data/object management
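One common way to work toward the reliability goal is a periodic fixity check: recompute a file's checksum and compare it with the value recorded at ingest. The sketch below is only an illustration under stated assumptions. The slides do not say which algorithm or tooling HathiTrust uses, SHA-256 is chosen here arbitrarily, and `sha256_of` and `verify_fixity` are hypothetical names.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files are never loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path, expected_hex):
    """Return True if the file's current digest matches the recorded one."""
    # Hex digests are compared case-insensitively (assumption: the
    # recorded value may have been stored in upper case).
    return sha256_of(path) == expected_hex.lower()
```

Running the same check against each replica would also exercise the redundancy goal, since a mismatch at one site can be repaired from a copy that still verifies.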

Page 27

Architecture & Management

[Diagram labels: images; Source METS; text; HT METS]

uc1/pairtree_root/b3/54/34/86/b34543486/

b34543486.zip

b34543486.mets.xml
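The directory layout above appears to follow the Pairtree convention, in which an identifier is split into two-character directory segments and the object directory is named after the full identifier. The following is a minimal sketch under that assumption. The namespace `ns`, the identifier `ab12345`, and the function names are hypothetical, and only the three single-character substitutions from the Pairtree specification are handled (its hex-encoding step is omitted).

```python
from pathlib import PurePosixPath

# Simplified Pairtree character swaps; the full specification also
# hex-encodes certain characters, which this sketch does not handle.
_SWAPS = str.maketrans({"/": "=", ":": "+", ".": ","})


def pairtree_dir(namespace, object_id):
    """Build namespace/pairtree_root/<2-char segments>/<id> for an object."""
    cleaned = object_id.translate(_SWAPS)
    # Split the cleaned identifier into two-character segments; the last
    # segment holds a single character when the length is odd.
    segments = [cleaned[i:i + 2] for i in range(0, len(cleaned), 2)]
    return PurePosixPath(namespace, "pairtree_root", *segments, cleaned)


def package_files(namespace, object_id):
    """Return the (zip, METS) paths, mirroring the file names on the slide."""
    base = pairtree_dir(namespace, object_id)
    return base / f"{base.name}.zip", base / f"{base.name}.mets.xml"
```

For the hypothetical identifier `ab12345` in namespace `ns`, `package_files` returns paths under `ns/pairtree_root/ab/12/34/5/ab12345/`. Because the path is derived from the identifier alone, an object can be located without consulting a database, which fits the platform-independence goal listed for storage.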

Page 28

[Diagram labels: Source; Bibliographic Data; Content Package; Bib Data; Data Management; Rights Data; Storage; Access; Ingest; Catalog; Full-text Search; PageTurner; APIs; Collections; Holdings Data; Datasets; Michigan; Indiana]

Page 29

Assessment

Page 30

CRL Audit

• Why
  – Value Community Standards
  – Accountability, Openness, Transparency
• Desire to know how we were doing and let the community know
• Audit
  – Guided by criteria included in TRAC as well as other metrics developed by CRL
  – HathiTrust’s practices are sound…appropriate to the content being archived and the general needs of the CRL community

Page 31

What was involved

• Timeline
  – Data gathering: November 2009 - December 2010
  – Site visit: May 2010
  – Results in March 2011
• Logistics
  – Question by email, documentation
  – Phone conversations
  – Staff: Project Librarian, Digital Preservation Librarian, Executive Director

Page 32

Results

• Organizational Infrastructure (2)
  – Mission statement, succession plan, staff, assessment, accountability, business plan, agreements
• Digital Object Management (3)
  – Properties preserved, SIP, AIP, validation, naming conventions, identifiers, understandability, preservation strategies, logging, access policies
• Technologies, Technical Infrastructure, Security (4)
  – Hardware, software, error-handling, change management, security, staff roles, disaster preparedness

Page 33

Key Issues

• Rights and ownership of HathiTrust enterprise assets
• Succession plan
• Clarify and strengthen quality assurance and print archiving components of the HathiTrust program

Page 34: HATHITRUST A Shared Digital Repository HathiTrust: Key Concepts and Issues in Managing the Digital Archive ICPSR Summer Workshop “Curating and Managing
