The Data Management Requirements at SNS Shelly Ren & Steve Miller Scientific Computing Group, SNS-ORNL December 11, 2006

The Data Management Requirements at SNS

Shelly Ren & Steve Miller

Scientific Computing Group, SNS-ORNL

December 11, 2006

OAK RIDGE NATIONAL LABORATORY
U. S. DEPARTMENT OF ENERGY

SNS Neutron Scattering User Facility

Neutron Scattering Science Areas

Chemistry – microstructures

Complex Fluids – fluid properties

Crystalline Materials – molecular structure

Disordered Materials – structure characterization

Engineering – study material stress/strain

Magnetism & Superconductivity – material properties

Polymers – studying “giant” molecules

Structural Biology - proteins

SNS Instrument Commissioning Schedule

SNS Potential Data Volume

[Figure: two charts covering 2006 to 2011. "Production Data Rate" plots GB/day by year, with series raw, old raw, raw+reduced and reduced. "Total Stored Data" plots TB by year, with series raw, old raw and raw+reduced. Note on slide: just instrument data here.]

Integrating Computation with Experimentation

[Diagram: workflow linking experiment and computation.
Stages: Proposal, Acquisition, Treatment, Analysis, Publications, connected by interactive feedback.
Data products: Raw, Intermediate, Scientific.
Supporting components: Instrument, Sample & Environment, Controls, Diagnostics, Automation, Electronic notebook, Decision Support & Intelligent Control, Instrument simulation, Materials simulation, Visualization (Vis), Database, Documentation, Metadata, Repository.
Access layer: Web Browser; Access and authorization control; Control portal, Data portal, Analysis portal.
Infrastructure: Portal/HPC Support, High Performance Computing.
Key: Data, Software, Hardware.]

Creating, Processing and Storing Data

• Event Histogramming

• Detector to Pixel mapping

• Instrument Geometry

• Metadata extraction

• Create NeXus file

Catalog and Store

Reduce Data

• All subsystems functional to some degree

Data Reduction
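The first two steps listed on this slide, event histogramming and detector-to-pixel mapping, can be sketched as follows. This is a minimal illustration and not the SNS implementation: the event layout (detector ID plus time of flight), the dictionary-based pixel map, the bin edges, and the function name `histogram_events` are all assumptions made for the example.

```python
from bisect import bisect_right


def histogram_events(events, pixel_map, n_pixels, tof_edges):
    """Bin (detector_id, time_of_flight) events into a pixel x TOF histogram.

    All names and the event layout are hypothetical, chosen for illustration.
    """
    n_bins = len(tof_edges) - 1
    counts = [[0] * n_bins for _ in range(n_pixels)]
    for detector_id, tof in events:
        # Detector-to-pixel mapping: translate the hardware ID to a pixel index.
        pixel = pixel_map.get(detector_id)
        if pixel is None:
            continue  # skip events from detectors that are not in the map
        # Find the half-open bin [edge_i, edge_i+1) that contains this event.
        bin_index = bisect_right(tof_edges, tof) - 1
        if 0 <= bin_index < n_bins:
            counts[pixel][bin_index] += 1
    return counts


# Hypothetical events: (detector_id, time of flight in microseconds).
events = [(10, 150.0), (10, 950.0), (11, 150.0), (99, 500.0), (11, 2500.0)]
pixel_map = {10: 0, 11: 1}
tof_edges = [0.0, 1000.0, 2000.0]

counts = histogram_events(events, pixel_map, n_pixels=2, tof_edges=tof_edges)
print(counts)
```

Events from unmapped detectors and events outside the bin range are dropped here; a production system would more likely count and report them.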

Current SNS Data Hierarchy

• SNS data are stored on an NFS-mounted file system
• Direct Attached Storage (DAS): storage resources grow incrementally based upon need
• A data server for DAS: terabytes of internal hard drive storage
• SNS metadata are stored in an Oracle database
• ICAT metadata: Oracle DB
• ICAT application server: JBoss

Data Hierarchy

/facility/instrument/proposalID/experimentId/runNumber
    /NeXus/NeXus files
    /preNeXus/metadata files
    /analysis

e.g.
/SNS/BSS/2006_1_2_SCI/1/100/NeXus/BSS_100.nxs
/SNS/BSS/2006_1_2_SCI/1/100/preNeXus/cvinfo.xml
/SNS/BSS/2006_1_2_SCI/1/100/preNeXus/cvbeam.xml

[Diagram labels: live-catalog, icat-search, user-workspace, data browser, sns-checkin]
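The directory layout above can be expressed as a small path helper. This is a sketch under stated assumptions: the function names are hypothetical, and the `<instrument>_<runNumber>.nxs` file naming is inferred from the single example path on the slide rather than from a documented rule.

```python
from pathlib import PurePosixPath


def run_directory(facility, instrument, proposal_id, experiment_id, run_number):
    """Build /facility/instrument/proposalID/experimentId/runNumber."""
    return PurePosixPath(
        "/", facility, instrument, proposal_id, str(experiment_id), str(run_number)
    )


def nexus_file(facility, instrument, proposal_id, experiment_id, run_number):
    """Path of the run's NeXus file (file naming is an assumption, see above)."""
    run_dir = run_directory(
        facility, instrument, proposal_id, experiment_id, run_number
    )
    return run_dir / "NeXus" / f"{instrument}_{run_number}.nxs"


def parse_run_path(path):
    """Split a path in the hierarchy back into its five leading components."""
    parts = PurePosixPath(path).parts  # ('/', 'SNS', 'BSS', ...)
    keys = ("facility", "instrument", "proposal_id", "experiment_id", "run_number")
    return dict(zip(keys, parts[1:6]))


path = nexus_file("SNS", "BSS", "2006_1_2_SCI", 1, 100)
print(path)
print(parse_run_path(path))
```

Keeping the hierarchy in one helper means that tools such as a catalog or a data browser can derive file locations from metadata instead of storing full paths.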

SNS Data Access – Through Unix Shell

• Symbolic links are created in the user's home directory, pointing to the proposal directories of which he/she is a member
• Symbolic links are also created in the user's home directory, pointing to the public directory where public data reside
• Disk quota may be allocated for users to perform analysis and simulation

User Workspace

/facility/users/neutron_boy
    /workspace (write)
    /proposalID (read only)
    /proposalID (read only)
    /public (read only)

/facility/users/public
    /proposalID
    /proposalID

Gray names are symbolic links to the data hierarchy
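The workspace layout described on this slide could be set up roughly as follows. This is a sketch only: `link_user_workspace` is a hypothetical helper, and the example runs inside a temporary directory rather than a real /facility tree. Note that a symbolic link carries no permissions of its own, so the read-only behavior would have to come from the permissions on the target directories.

```python
import os
import tempfile


def link_user_workspace(home, proposal_dirs, public_dir):
    """Create a writable workspace plus symlinks to proposal and public data."""
    os.makedirs(os.path.join(home, "workspace"), exist_ok=True)
    for target in proposal_dirs:
        # One link per proposal, named after the proposal directory.
        link = os.path.join(home, os.path.basename(target))
        if not os.path.islink(link):
            os.symlink(target, link)
    public_link = os.path.join(home, "public")
    if not os.path.islink(public_link):
        os.symlink(public_dir, public_link)


# Demonstration in a scratch directory (all paths below are illustrative).
root = tempfile.mkdtemp()
proposal = os.path.join(root, "SNS", "BSS", "2006_1_2_SCI")
public = os.path.join(root, "SNS", "users", "public")
home = os.path.join(root, "SNS", "users", "neutron_boy")
os.makedirs(proposal)
os.makedirs(public)

link_user_workspace(home, [proposal], public)
print(sorted(os.listdir(home)))
```

The helper is idempotent, so it can be rerun whenever a user is added to a new proposal.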

SNS Data Access – Through Portal

[Screenshot: portal view of the first SNS data, showing NeXus files, NeXus tags, metadata, and an ISAW plot]

Search Your Data via the Web

• Enter search text
• Select search fields
• Select files of interest to browse or to download

[Screenshot labels: Enter Text Search String; Select Optional Search Fields; Returns Files Found]
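The search flow on this slide (a text string, optional fields, a list of matching files) can be illustrated with a small in-memory filter. This is not the ICAT search interface: the record layout, the field names, the second file path, and the titles are all hypothetical, and a real deployment would query the metadata database instead.

```python
def search_catalog(records, text, fields=None):
    """Return files whose metadata contain `text` (case-insensitive).

    If `fields` is given, only those metadata fields are searched;
    otherwise every field of each record is searched.
    """
    needle = text.lower()
    hits = []
    for record in records:
        names = fields if fields else record.keys()
        if any(needle in str(record.get(name, "")).lower() for name in names):
            hits.append(record["file"])
    return hits


# Hypothetical catalog entries, used only to exercise the function.
catalog = [
    {
        "file": "/SNS/BSS/2006_1_2_SCI/1/100/NeXus/BSS_100.nxs",
        "instrument": "BSS",
        "title": "vanadium calibration",
    },
    {
        "file": "/SNS/BSS/2006_1_2_SCI/1/101/NeXus/BSS_101.nxs",
        "instrument": "BSS",
        "title": "empty can",
    },
]

print(search_catalog(catalog, "vanadium"))
print(search_catalog(catalog, "bss", fields=["instrument"]))
```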

Monte Carlo Simulation via the Portal

SNS Data Management Requirements

• Archive, catalog and maintain data produced by SNS instruments so users can access them from anywhere at any time without worrying about data storage issues
• Grant authorized access to SNS data and metadata for both shell and portal users (ensure data are private to the experiment team)
• Provide services for efficient search, browsing and download of SNS data and metadata
• Allow users to share datasets with their collaborators or access datasets that have been made public, in a scalable fashion
• Provide data management service to HFIR, LUJAN, IPNS and other interested neutron facilities
• Extend dataset storage to spinning disk, HPSS and other archival systems
• Manage distributed dataset storage and perform data transport for the end users
• Federate data storage with partner neutron facilities such as ISIS so that users can see all their experiment data by logging into one facility
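The access rule implied by these requirements (data private to the experiment team unless made public) reduces to a simple check. The sketch below is an assumption-laden illustration: the `Dataset` class, its fields, and the user names other than the slide's own `neutron_boy` are hypothetical, and the slides do not describe how SNS actually enforces authorization.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Hypothetical dataset record carrying its own access information."""
    proposal_id: str
    team: set = field(default_factory=set)  # user names on the experiment team
    public: bool = False                    # True once the data are released


def can_read(user, dataset):
    """A user may read public data or data from a proposal they belong to."""
    return dataset.public or user in dataset.team


private_run = Dataset("2006_1_2_SCI", team={"neutron_boy"})
released_run = Dataset("2006_1_2_SCI", team={"neutron_boy"}, public=True)

print(can_read("neutron_boy", private_run))
print(can_read("visitor", private_run))
print(can_read("visitor", released_run))
```

Applying the same check behind both the shell and the portal would keep the two access routes consistent.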

SNS Long Term Data Management Needs

• Create a single file hierarchy for accessing data distributed across multiple storage systems and multiple facilities, even extending beyond neutron scattering facilities
• Support the management, collaboration, controlled sharing, replication, transfer, and preservation of distributed data
• Capture metadata for user-produced data
• Automate data transfer
• Improve data processing: parallel and scalable
• Search large volumes of data for patterns so that users can find certain structures within their data (data mining)
• Establish a unified user authentication service across neutron facilities
• Provide users with an easy-to-use portal service to search, browse, download and upload data, and to search, annotate and update metadata
• Integrate experiment with simulation; launch simulation jobs that need programmatic access to the distributed data resources

Summary

• As more instruments go through the commissioning phase and enter a new era of scientific discovery, we face the emerging challenge of managing scientific data that can grow to petabyte scale in a few years
• As a user facility, SNS will have a steady stream of users running experiments and generating raw and analysis data files; we will need not only disk cache but also a long-term storage system such as HPSS
• Promise to search and retrieve SNS data and metadata for end users anywhere, at any time, in a timely fashion
• Grow our data management resources and collaborate with the community
• Looking for opportunities to work with and leverage resources beyond our facility
• Eager to reach out to, learn from and collaborate with data management experts working in all domain areas
• Wish to understand and utilize new software applications to manage distributed data storage and to transport, search and retrieve data more effectively and efficiently