
Building a Public Research Center for the HathiTrust Digital Library


Description: a presentation by Robert H. McDonald from the panel "Big Data! Big Deal?", moderated by Stephen Downie at JCDL 2011.


Page 1: Building a Public Research Center for the HathiTrust Digital Library

June 14, 2011
JCDL 2011: Big Data! Big Deal? Panel

Building a Public Research Center for the HathiTrust Digital Library

Robert H. McDonald
Associate Dean for Library Technologies and Digital Libraries

Associate Director, Data to Insight Center, Pervasive Technology Institute
Indiana University

@hathitresearch | @hathitrust

http://www.hathitrust-research.org

Page 2

HathiTrust Research Center (HTRC) Team

Indiana University
- Beth Plale – Director
- Robert McDonald – Executive Committee

University of Illinois
- Scott Poole – Co-Director
- John Unsworth – Executive Committee

Page 3

HathiTrust Digital Library History

Mission: to contribute to the common good by collecting, organizing, preserving, communicating, and sharing the record of human knowledge.

- Launched in October 2008: University of Michigan, Indiana University
- Used the Google Books repository at Michigan as its model
- Expanded to include content from CIC member libraries, the UC system libraries, and the University of Virginia
- Now includes more than 50 partner institutions and more than 8 million volumes

Page 4

Towards a HathiTrust Research Center

- Started in response to the proposed Google Settlement (June 2009)
  - Specific funding set aside by Google to build a public research center
- Worked to identify key stakeholders from HT institutions to collaborate and write an RFP
- The court's rejection of the Google Settlement in early 2011 did not stop the center
- Developed a specific RFP for HathiTrust to solicit proposals (HTRC RFP Working Group, Summer/Fall 2009)
- RFP released Winter 2010

Page 5

Our Collaboration

HTRC was founded as a joint venture between Indiana University and the University of Illinois at Urbana-Champaign to address the difficult challenge of increasing computational access to both the public-domain and the copyrighted material in HathiTrust.

Page 6

Our Mission

- Phase I: starting April 2011 and running for 18 months
- Phase II: starting Fall 2012 and running for …
- Goal: enable strong computational research and education on a collection that has never before been amenable to computational exploration

Page 7

Our Goals

- Maintain a repository of text mining algorithms and retrieval tools, available online for both human and programmatic discovery. Also register derived data sets, indexes, and their versions in a registry repository.
- Be a user-driven resource, with an active advisory board and a community model that allows users to share algorithms and tools.
- Support interoperability across collections and institutions through the use of InCommon SAML identity.
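The registry goal above can be pictured as a small catalog keyed by name and version. The sketch below is a hypothetical in-memory stand-in, not HTRC's registry: the class names, the `kind` labels, and the tag-based search are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class RegistryEntry:
    """One registered artifact (hypothetical schema)."""
    name: str
    kind: str  # illustrative labels: "algorithm", "tool", "dataset", "index"
    version: str
    tags: set[str] = field(default_factory=set)


class Registry:
    """Hypothetical in-memory registry keyed by (name, version)."""

    def __init__(self) -> None:
        self._entries: dict[tuple[str, str], RegistryEntry] = {}

    def register(self, entry: RegistryEntry) -> None:
        key = (entry.name, entry.version)
        if key in self._entries:
            # Registered versions are treated as immutable; publish a new version instead.
            raise ValueError(f"{entry.name} {entry.version} is already registered")
        self._entries[key] = entry

    def find(self, kind: str | None = None, tag: str | None = None) -> list[RegistryEntry]:
        """Programmatic discovery: filter by kind and/or tag, sorted by (name, version)."""
        hits = [
            e for e in self._entries.values()
            if (kind is None or e.kind == kind) and (tag is None or tag in e.tags)
        ]
        return sorted(hits, key=lambda e: (e.name, e.version))

    def versions(self, name: str) -> list[str]:
        """All registered versions of one artifact, in registration order."""
        return [v for (n, v) in self._entries if n == name]
```

For example, registering versions "1.0" and "1.1" of a tool named `tag-cloud` and then calling `versions("tag-cloud")` returns both, in the order they were registered. Keeping each (name, version) pair immutable is one simple way to make derived data sets citable.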

Page 8

Our Future

- Support innovation in cyberinfrastructure to deliver optimal access to, and use of, the HathiTrust corpus.
- Implement “non-consumptive” research: a technical and intellectual challenge.
- Identify and host existing data analysis, text mining, and retrieval tools that are of interest to the community.
- Stimulate development of new analytical methods and tools. We hope that the scale of the HTRC will promote new levels of collaboration in tool development.

Page 9

HathiTrust Research Center Today

HTRC is dedicated to providing access to a comprehensive body of published works for computational research in scholarship and education.

Lightweight organization:
- Executive Committee
  - Beth Plale, Indiana
  - Scott Poole, Illinois
  - Robert H. McDonald, Indiana
  - John Unsworth, Illinois
- Advisory Board: TBD
- HathiTrust Executive Committee liaison: Laine Farley, California Digital Library

Page 10

HathiTrust Research Center Today

- $250K in funding for the initial 18-month startup
- Creating themed collections for early use cases: Astronomy, Victorian Literature, Influenza
- Ingest and replication mechanisms between HT and HTRC
- Full-text Solr indexes
- Data Capsule integration
- Karma integration
- Integration with SEASR/Meandre SOA services at NCSA
- Alignment with the Bamboo Technology Project
- Alignment with international Google Books research centers
- Establishing long-term non-consumptive research methodologies
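As an illustration of how a full-text Solr index might be queried for one of the themed collections, the sketch below builds a request URL for Solr's standard `select` handler. The host, the core name `htrc`, and the field names `ocr`, `collection`, `id`, and `title` are hypothetical; the actual HTRC index schema is not described in this deck.

```python
from __future__ import annotations

from urllib.parse import urlencode


def solr_phrase(text: str) -> str:
    """Quote text as a Solr phrase, escaping backslashes and double quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def solr_select_url(base: str, core: str, text: str, collection: str | None = None,
                    rows: int = 10, start: int = 0) -> str:
    """Build a GET URL for Solr's select handler (hypothetical core and field names)."""
    params = [
        ("q", f"ocr:{solr_phrase(text)}"),  # full-text field name "ocr" is assumed
        ("fl", "id,title"),                 # stored fields to return (assumed)
        ("rows", str(rows)),
        ("start", str(start)),
        ("wt", "json"),
    ]
    if collection is not None:
        # A filter query restricts results to one themed collection without changing scores.
        params.append(("fq", f"collection:{solr_phrase(collection)}"))
    return f"{base.rstrip('/')}/{core}/select?{urlencode(params)}"
```

Putting the collection restriction in `fq` rather than `q` keeps relevance ranking driven by the text query alone, and Solr can cache the filter across queries against the same collection.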

Page 11

HTRC Proposed Technical Architecture

Courtesy IU Data to Insight Center – Beth Plale/Yiming Sun

Page 12

Current SEASR Integration Demo: Sample Public Domain Collection

Tag cloud viewer data flow:
1. User enters an author name or volume title in the book search interface (search by author or title, with a JS/PHP autocompleter).
2. Query the RIS bibliography database for the author name or volume title.
3. Volume ID is returned.
4. Invoke the tag cloud service with the volume URL.
5. Use the URL to retrieve the volume.
6. OCR text for the volume is returned.
7. Tag cloud is returned to the user.

Components:
- Public-domain OCR Web access servlet: a persistent RESTful web service, organized as a pairtree for the demo only
- Sample collection bibliography database: converted from MARC to RIS
- SEASR infrastructure: the administrator creates the tag cloud viewer in advance through the Meandre Workbench

Courtesy IU Data to Insight Center – Felix Terkhorn/Yiming Sun
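The seven-step flow above can be sketched end to end with in-memory stand-ins. This is a minimal sketch under stated assumptions, not the demo's code: the RIS record layout (volume ID stored under the `ID` tag), the base URL, the stop-word list, and all function names are hypothetical, and a dictionary replaces the HTTP call to the RESTful service. The pairtree mapping follows the published Pairtree convention (hex-encode unsafe characters as `^xx`, map `/`, `:`, `.` to `=`, `+`, `,`, then split into two-character directories).

```python
from __future__ import annotations

import re
from collections import Counter

# Hypothetical RIS export (converted from MARC); keeping the volume ID under "ID" is an assumption.
SAMPLE_RIS = """TY  - BOOK
AU  - Darwin, Charles
TI  - On the origin of species
ID  - demo.0001
ER  -
"""

BASE_URL = "http://localhost:8080/pairtree_root/"  # hypothetical RESTful service root
STOPWORDS = {"the", "of", "and", "a", "an", "in", "to", "is", "on", "by"}
RIS_LINE = re.compile(r"^([A-Z][A-Z0-9])  -\s?(.*)$")
PAIRTREE_HEX = set('"*+,<=>?\\^|')  # characters the Pairtree spec hex-encodes
PAIRTREE_SUBST = {"/": "=", ":": "+", ".": ","}


def parse_ris(text: str) -> list[dict[str, list[str]]]:
    """Parse RIS records into {tag: [values]} dicts; an ER line closes a record."""
    records: list[dict[str, list[str]]] = []
    current: dict[str, list[str]] = {}
    for line in text.splitlines():
        m = RIS_LINE.match(line)
        if not m:
            continue
        tag, value = m.group(1), m.group(2).strip()
        if tag == "ER":
            records.append(current)
            current = {}
        else:
            current.setdefault(tag, []).append(value)
    return records


def find_volume_ids(records: list[dict[str, list[str]]], query: str) -> list[str]:
    """Steps 2-3: match the query against author (AU) and title (TI), return volume IDs."""
    q = query.lower()
    return [
        r["ID"][0] for r in records
        if "ID" in r and any(q in v.lower() for v in r.get("AU", []) + r.get("TI", []))
    ]


def pairtree_path(identifier: str) -> str:
    """Map an identifier to a pairtree path: ^xx-encode, substitute, split into pairs."""
    cleaned = []
    for b in identifier.encode("utf-8"):
        ch = chr(b)
        if b < 0x21 or b > 0x7E or ch in PAIRTREE_HEX:
            cleaned.append(f"^{b:02x}")
        else:
            cleaned.append(PAIRTREE_SUBST.get(ch, ch))
    s = "".join(cleaned)
    return "/".join(s[i:i + 2] for i in range(0, len(s), 2))


def tag_cloud(text: str, top_n: int = 5) -> list[tuple[str, int]]:
    """Step 7: the most frequent non-stop-words with their counts."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS).most_common(top_n)


def run_demo(query: str, records: list[dict[str, list[str]]],
             store: dict[str, str]) -> list[tuple[str, int]]:
    ids = find_volume_ids(records, query)          # steps 1-3
    if not ids:
        return []
    url = BASE_URL + pairtree_path(ids[0]) + "/"   # step 4: URL for the volume
    return tag_cloud(store[url])                   # steps 5-7 (dict stands in for HTTP GET)
```

For instance, with a `store` whose key is `BASE_URL + pairtree_path("demo.0001") + "/"`, calling `run_demo("darwin", parse_ris(SAMPLE_RIS), store)` resolves the query to `demo.0001`, builds the pairtree URL `.../de/mo/,0/00/1/`, and returns word counts for that volume's text.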

Page 13

Non-Consumptive Research Track

No action or set of actions on the part of HathiTrust Research Center users, either acting alone or in cooperation with other users over the duration of one or multiple sessions, can result in sufficient information gathered from the HathiTrust collection to reassemble pages from the collection.

Beth Plale (Indiana University)
Atul Prakash (University of Michigan)
Geoffrey Fox (Indiana University)
Robert H. McDonald (Indiana University)
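One simplified way to picture a non-consumptive interface is a gate that answers only aggregate, order-free questions (term counts over a set of volumes) and never returns text. The thresholds and names below are hypothetical. On its own this does not establish the guarantee in the definition above: many overlapping aggregate queries could still leak information, which is exactly why the definition covers cooperating users and multiple sessions.

```python
from __future__ import annotations

import re
from collections import Counter

MIN_VOLUMES = 3  # hypothetical: refuse aggregates over fewer volumes than this
MIN_COUNT = 2    # hypothetical: suppress terms rarer than this in the result


class NonConsumptiveGate:
    """Answers only aggregate term-count queries; the stored text is never returned."""

    def __init__(self, corpus: dict[str, str]) -> None:
        self._corpus = corpus  # volume_id -> full text, kept private to the gate

    def term_counts(self, volume_ids: list[str]) -> dict[str, int]:
        ids = set(volume_ids)
        if len(ids) < MIN_VOLUMES:
            raise PermissionError(f"aggregates need at least {MIN_VOLUMES} distinct volumes")
        unknown = sorted(v for v in ids if v not in self._corpus)
        if unknown:
            raise KeyError(f"unknown volumes: {unknown}")
        counts: Counter[str] = Counter()
        for vid in ids:
            # Word order and positions are discarded: only per-term totals survive.
            counts.update(re.findall(r"[a-z]+", self._corpus[vid].lower()))
        return {term: n for term, n in counts.items() if n >= MIN_COUNT}
```

The design choice here is to make the only output a bag of words pooled across several volumes, so no single call exposes the sequence of words on any page.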

Page 14

Secure Data Capsule

Provision access to copyrighted content for research purposes, giving researchers flexible computing resources in a controlled environment.

Researcher access to HathiTrust Digital Library content:

• Access to HT open content indices
• Access to HT copyrighted indices
• Auditable secure mechanisms for legally mandated, MOU-based, and fair-use compliance

Researcher-driven applications for use as services within the Data Capsule:

• Can HTRC provide a services framework for researcher applications to run within the secure data capsule compute resources?

HTRC-managed data-intensive compute resources
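For the "auditable secure mechanisms" item, one common general technique is a hash-chained, append-only log in which each entry's hash covers the previous entry's hash, so editing any past entry breaks verification. This illustrates that technique only and is not a description of the Data Capsule's actual mechanism; the field names are hypothetical, and timestamps are omitted to keep the example deterministic.

```python
from __future__ import annotations

import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry
FIELDS = ("user", "action", "target", "prev")


def _digest(body: dict[str, str]) -> str:
    # Canonical JSON (sorted keys) so the same entry always hashes the same way.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


class AuditLog:
    """Append-only log where each entry's hash also covers the previous entry's hash."""

    def __init__(self) -> None:
        self.entries: list[dict[str, str]] = []

    def append(self, user: str, action: str, target: str) -> str:
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        body = {"user": user, "action": action, "target": target, "prev": prev}
        digest = _digest(body)
        self.entries.append({**body, "hash": digest})
        return digest

    def verify(self) -> bool:
        """Recompute the chain from the start. Detects edited entries and removed or
        reordered entries, except that dropping only the final entry goes unnoticed."""
        prev = GENESIS
        for entry in self.entries:
            body = {k: entry[k] for k in FIELDS}
            if entry["prev"] != prev or _digest(body) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True
```

Such a chain is tamper-evident rather than tamper-proof: anyone able to rewrite every entry can recompute the whole chain, so in practice the latest hash would also be recorded somewhere outside the researcher's control.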

Page 15

HathiTrust Research Center Events

- HTRC kickoff event at the Digital Humanities 2011 conference, Stanford University, June 20, 2011
- Working on models for collaborative research: AHRC/ESRC/IMLS/JISC/NEH/NSF/NWO/SSHRC Digging into Data Round 2, http://www.diggingintodata.org/
- Working on early advanced-user case studies for the HathiTrust corpus

Page 16

Support and Acknowledgements

- IU UITS Research Technologies
- National Center for Supercomputing Applications
- IU Data to Insight Center
- iCHASS
- Illinois Informatics Institute
- Lilly Endowment, Inc.
- The Alfred P. Sloan Foundation

Page 17

For more on the HathiTrust Research Center, see http://www.hathitrust-research.org

Follow us @hathitresearch on Twitter

Robert H. McDonald: @mcdonald on Twitter