UT Research Data Repository Chris Jordan UT Research Cyberinfrastructure Storage Committee Chair

UT Research Data Repository

Chris Jordan

UT Research Cyberinfrastructure

Storage Committee Chair

Outline

• UTRC Introduction/Current Status
• Research Data Requirements
• Current TACC storage infrastructure (Corral)
• New UTRC capabilities
• External services and partnerships
• Research and UTRC future

UT Research Cyberinfrastructure

• Collaborative effort initiated by Dr. Ken Shine, Vice Chancellor for Health

• Jay Boisseau (TACC), Brian Herman (UTHSCSA) co-chairs

• Assessment of research CI needs across system campuses

• Data Storage emerged as highest priority/biggest unmet need

UTRC Proposal

• Approved by UT Regents November 2010
• Expanded Lonestar 4 for HPC needs
• Establish dedicated 10 Gb research network to all campuses
• Develop replicated, 5PB Research Data Repository
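To give a feel for what a dedicated 10 Gb network means for a multi-petabyte repository, the arithmetic can be sketched as below. This is an illustration only: it assumes "10 Gb" means 10 gigabits per second, uses decimal units (1 TB = 10^12 bytes), and ignores protocol overhead unless an `efficiency` factor is supplied. The function name and parameters are hypothetical.

```python
# Rough transfer-time arithmetic for a research network link.
# Assumptions (not from the slides): decimal units, ideal line rate,
# and an optional efficiency factor to model protocol overhead.

TB = 10**12  # bytes in a decimal terabyte
PB = 10**15  # bytes in a decimal petabyte


def transfer_time_seconds(size_bytes: float, link_gbps: float,
                          efficiency: float = 1.0) -> float:
    """Return the ideal time in seconds to move size_bytes over the link."""
    if link_gbps <= 0:
        raise ValueError("link_gbps must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    bits = size_bytes * 8
    return bits / (link_gbps * 1e9 * efficiency)


if __name__ == "__main__":
    for label, size in (("1 TB", TB), ("100 TB", 100 * TB), ("5 PB", 5 * PB)):
        secs = transfer_time_seconds(size, link_gbps=10)
        print(f"{label}: {secs:,.0f} s ({secs / 86400:.2f} days) at line rate")
```

At line rate, 1 TB takes 800 seconds, while a full 5 PB would take roughly 46 days, which is one reason replication between sites has to be continuous rather than a bulk copy.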

Storage Committee Activities

• Proposed iterative approach with pilot deployment in late 2011

• 1st half of 2011 spent on requirements and architecture development

• Released RFP in June
• Vendor selected in August
• Installation in October
• Initial users ~December

Sidebar: Why “The Cloud” is not the answer

• Cloud storage costs = $1000s/TB/year
• Often not as reliable as advertised (Google, Amazon have both had major issues)
• Restrictive interfaces, lack of high-performance access
• Issues with institutional control, security integration, etc.

Pilot UTRDR Deployment

• 5PB Raw storage in each of two installations
• Main installation at TACC added to existing data infrastructure
• Mirror installation at Arlington for replication
• High level of redundancy within each installation
  – Power supplies to storage controllers and servers

Research Data Requirements

• Persistent Storage is just the beginning
• High reliability/availability is key
• Complex, evolving security needs
• Importance of Collaboration
• Data Applications and Services
• Data Management and Analysis

• Also, it has to be cheap (or free)

Research Data Security

• HIPAA Compliance is a major goal of the UTRDR effort

• But HIPAA is just the beginning
• Intellectual property and research confidentiality issues are more fine-grained
• Long-term issues of availability/usability
• Tiers of access, change over time

Example Application Areas

• Biology
  – Biodiversity (natural history collections)
  – Phylogenetics
• Health Sciences
  – Medical Imaging
  – High-throughput sequencing
• Social Sciences
  – Economic and social analysis

TACC Corral Architecture

• Emphasis on large-scale storage, highly flexible service infrastructure

• Fast networks and heterogeneous systems = malleable service and storage platform

• Allows integration of UTRC hardware into an existing infrastructure

• Near-transparent migration for existing users
• Expansion improves reliability and availability

Corral Hardware and Services

• 1.2 Petabytes DataDirect SATA Disk
• 16 Dell Servers
• ~300 TB of heterogeneous disks and servers
• High-Performance Parallel File System, multiple databases, iRODS data management, replication to tape archive
• Multiple levels of access control
• Supports almost any imaginable data need

iRODS at TACC

• Distributed/Replicated data management
• Corral, Ranch, and offsite storage systems
• Extensible metadata support
• Policy/Rule-based automation and enforcement
• Used for sophisticated data management needs
• Provides wide variety of interfaces
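The idea of policy/rule-based automation driven by metadata can be illustrated with a toy example. The sketch below is not iRODS code and does not use the iRODS rule language or any iRODS API; it is a plain-Python analogy in which every class, rule name, metadata key, and storage label is hypothetical. Each rule pairs a condition on a data object's metadata with an action applied at ingest time.

```python
# Toy rule engine illustrating policy-based data management.
# NOT iRODS: all names, metadata keys, and storage labels are hypothetical.
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class DataObject:
    path: str
    size_bytes: int
    metadata: Dict[str, str] = field(default_factory=dict)
    replicas: List[str] = field(default_factory=list)


@dataclass
class Rule:
    name: str
    condition: Callable[[DataObject], bool]  # when the policy applies
    action: Callable[[DataObject], None]     # what the policy enforces


def apply_rules(obj: DataObject, rules: List[Rule]) -> List[str]:
    """Run every matching rule in order and return the names that fired."""
    fired = []
    for rule in rules:
        if rule.condition(obj):
            rule.action(obj)
            fired.append(rule.name)
    return fired


# Example policies: replicate everything to a tape archive, and restrict
# access to anything whose metadata marks it as protected health data.
RULES = [
    Rule("replicate-to-tape",
         lambda o: True,
         lambda o: o.replicas.append("tape-archive")),
    Rule("restrict-protected-health-data",
         lambda o: o.metadata.get("sensitivity") == "phi",
         lambda o: o.metadata.update(access="restricted")),
]

if __name__ == "__main__":
    scan = DataObject("/proj/scan001.dcm", 52_428_800, {"sensitivity": "phi"})
    print(apply_rules(scan, RULES), scan.replicas, scan.metadata)
```

The point of the design is that policy lives in data (the rule list) rather than in each user's workflow, so enforcement stays uniform as collections grow.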

Current Corral Usage

• >30 Data Allocations & Collections
• 350 Users at TACC and UT
• >500 External users accessing collections
• >500TB Research and Reference Data
• Data of all types and disciplines:
  – Plant specimens and ‘omics, MRI, GIS, Simulations, Fish and Pottery, Economics and Medicine

Added Capabilities w/ UTRDR

• Synchronous replication
• Very high availability (weather, comet strikes)
• Tiers of storage and data management
• Huge performance boost (>80GB/sec)
• Accessibility from all UT System campuses
• HIPAA Compliance
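Synchronous replication means a write is acknowledged only after every replica holds the data. The following is a minimal single-machine sketch of that idea, using two local directories as stand-ins for the two installations. It is not how the UTRDR storage system is implemented (the slides do not say); the function names and the checksum verification step are assumptions made for illustration.

```python
# Toy synchronous replication: acknowledge a write only after both
# "sites" (plain directories here) hold a verified copy.
# Assumption: SHA-256 verification is an illustrative choice, not a
# documented feature of the actual system.
import hashlib
import os
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def _durable_write(path: Path, data: bytes) -> None:
    """Write data and force it to stable storage before returning."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as fh:
        fh.write(data)
        fh.flush()
        os.fsync(fh.fileno())


def replicated_write(name: str, data: bytes, primary: Path, mirror: Path) -> str:
    """Write to both sites and return the checksum only if both copies verify."""
    expected = hashlib.sha256(data).hexdigest()
    for site in (primary, mirror):
        target = site / name
        _durable_write(target, data)
        if sha256_of(target) != expected:
            raise IOError(f"checksum mismatch at {target}")
    return expected  # reaching this line is the "acknowledgement"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        checksum = replicated_write("proj/sample.bin", b"research data",
                                    Path(root) / "primary", Path(root) / "mirror")
        print("acknowledged, sha256 =", checksum)
```

The trade-off is latency: every write waits on the slower of the two sites, which is part of why a fast dedicated network between installations matters.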

UTRDR Pilot Access

• Accelerated access for early adopters
• Allows us to shake out bugs, assess readiness for production
• Helps to develop requirements present and future
• Research network performance assessment
• Expect to open to all UT System researchers early 2012

UTRDR Long-term sustainability

• After pilot phase, storage will be free to all PIs up to some small limit (5TB?)

• Additional storage will be available for cost-recovery fee per TB

• Currently only trying to recoup costs on an annual basis

• Long-term preservation costs are TBD but are of major interest
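The free-allocation-plus-cost-recovery model above reduces to simple arithmetic, sketched below. The 5 TB free limit is the tentative figure from the slide; the per-TB rate is not given in the slides, so it is a required argument here and any value passed in is purely a placeholder. The function name is hypothetical.

```python
# Annual cost under a "free up to a limit, then per-TB fee" model.
# Assumptions: the free limit defaults to the tentative 5 TB from the
# slide; the rate is unknown and must be supplied by the caller.

def annual_storage_fee(stored_tb: float, rate_per_tb_year: float,
                       free_tb: float = 5.0) -> float:
    """Return the yearly fee for stored_tb terabytes of storage."""
    if stored_tb < 0 or rate_per_tb_year < 0 or free_tb < 0:
        raise ValueError("arguments must be non-negative")
    billable_tb = max(0.0, stored_tb - free_tb)  # only storage above the free tier
    return billable_tb * rate_per_tb_year


if __name__ == "__main__":
    placeholder_rate = 200.0  # hypothetical $/TB/year, for illustration only
    for tb in (3, 5, 12, 100):
        fee = annual_storage_fee(tb, placeholder_rate)
        print(f"{tb:>4} TB stored -> {fee:,.2f} per year")
```

Because the fee is a fixed per-TB figure, a PI can compute a multi-year storage line item for a grant budget directly from the projected data volume.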

Fee-based Research Storage

• 2 Major types of service:
  – Simple storage (iSCSI, SCP/FTP) based on per-TB/year costs
  – Application services (databases, web applications, data management, etc.)

• Provides fixed, relatively low costs that can be written into grant proposals

• Can include both disk and tape + offsite storage

• Long-term model for UTRDR

Existing/Upcoming Partnerships

• University of Alaska
• UC Berkeley
• University of North Texas Libraries
• Texas Digital Library
• University of Florida
• Indiana University
• NSF XSEDE – 15 Institutions

UTRC Plan 2012-2013

• Initial production in early 2012
• Design assessment and adjustment based on initial experiences
• Expansion proposal mid-2012
• Significant expansion likely late 2012/early 2013
• Ongoing assessment and design adjustments integral to the process

TACC Storage Research

• Data upload and ingest processes
• Storage reliability and management
• Data Integrity/Long-term planning
• Automated data management applications
• Wide-area storage and replication efforts in the NSF XSEDE project

Acknowledgements

• Dr. Ken Shine – UT System
• Dr. Patricia Hurn – UT System
• Jay Boisseau and Brian Herman
• Jerry York – UTHSCSA
• UTRC Storage Committee
  – Brian Grimm, Kevin Granhold, Huapei Chen, Wayne Mueller, Bill Sanns

• And many, many others

Q&A