The Data Management Requirements at SNS
Shelly Ren & Steve Miller
Scientific Computing Group, SNS-ORNL
December 11, 2006
OAK RIDGE NATIONAL LABORATORY, U. S. DEPARTMENT OF ENERGY
SDM 12/11/2006
SNS Neutron Scattering User Facility
Neutron Scattering Science Areas
Chemistry – microstructures
Complex Fluids – fluid properties
Crystalline Materials – molecular structure
Disordered Materials – structure characterization
Engineering – study material stress/strain
Magnetism & Superconductivity – material properties
Polymers – studying “giant” molecules
Structural Biology – proteins
SNS Instrument Commissioning Schedule
SNS Potential Data Volume
[Figure: Production Data Rate. Y axis: GB/day (0 to 1,400). X axis: year (2006 to 2011). Series: raw, old raw, raw+reduced, reduced.]
[Figure: Total Stored Data. Y axis: TB (0 to 1,200). X axis: year (2006 to 2011). Series: raw, old raw, raw+reduced.]
Just instrument data here
Integrating Computation with Experimentation
[Diagram: data flow from acquisition through analysis and simulation, with interactive feedback.
 Processing stages: Acquisition, Diagnostics, Treatment, Analysis, each with visualization (Vis).
 Data products: Raw, Intermediate, Scientific, Electronic notebook, Sample & environment, Instrument simulation, Materials simulation.
 Related elements: Instrument controls, Decision support & intelligent control, Proposal, Automation, Database, Documentation, Publications.
 Access layer: Web browser; Access and authorization control; Control portal, Data portal, Analysis portal.
 Infrastructure: Repository, Metadata, Portal/HPC support, High Performance Computing.
 Key: Data, Software, Hardware.]
Creating, Processing and Storing Data
• Event Histogramming
• Detector to Pixel mapping
• Instrument Geometry
• Metadata extraction
• Create NeXus file
Catalog and Store
Reduce Data
• All subsystems functional to some degree
Data Reduction
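The first steps listed above, event histogramming with a detector-to-pixel mapping, can be illustrated with a minimal Python sketch. The event format (pairs of detector ID and time of flight), the mapping table, and the bin width are all hypothetical; the slides do not describe the actual SNS data layout or software.

```python
from collections import Counter


def histogram_events(events, detector_to_pixel, tof_bin_width):
    """Count neutron events per (pixel, time-of-flight bin).

    events: iterable of (detector_id, tof) pairs (hypothetical format).
    detector_to_pixel: dict mapping raw detector IDs to logical pixel IDs.
    tof_bin_width: width of one time-of-flight bin, in the same units as tof.
    """
    counts = Counter()
    for detector_id, tof in events:
        pixel = detector_to_pixel.get(detector_id)
        if pixel is None:
            continue  # skip events from detector channels with no mapping
        tof_bin = int(tof // tof_bin_width)
        counts[(pixel, tof_bin)] += 1
    return counts


# Made-up example data, for illustration only.
events = [(7, 120.0), (7, 130.0), (9, 250.0), (42, 10.0)]
mapping = {7: 0, 9: 1}
hist = histogram_events(events, mapping, tof_bin_width=100.0)
```

In a real pipeline the resulting histogram, together with instrument geometry and extracted metadata, would be written into the NeXus file; that step is omitted here.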
Current SNS Data Hierarchy
• SNS data are stored on an NFS-mounted file system
• Direct Attached Storage (DAS): storage resources grow incrementally based on need
• A data server for DAS: terabytes of internal hard drive storage
• SNS metadata are stored in an Oracle database
• ICAT metadata: Oracle DB
• ICAT application server: JBoss
Data Hierarchy
/facility/instrument/proposalID/experimentId/runNumber
    /NeXus/NeXus files
    /preNeXus/metadata files
    /analysis
e.g.
/SNS/BSS/2006_1_2_SCI/1/100/NeXus/BSS_100.nxs
/SNS/BSS/2006_1_2_SCI/1/100/preNeXus/cvinfo.xml
/SNS/BSS/2006_1_2_SCI/1/100/preNeXus/cvbeam.xml
[Diagram labels: live-catalog, icat-search, user-workspace, data browser, sns-checkin]
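The directory layout above can be sketched as a small Python path builder. The function names are hypothetical, and the file-naming rule (instrument name, underscore, run number, `.nxs`) is inferred only from the single example `BSS_100.nxs`, so treat it as an assumption rather than a documented convention.

```python
from pathlib import PurePosixPath


def run_directory(facility, instrument, proposal_id, experiment_id, run_number):
    """Build /facility/instrument/proposalID/experimentId/runNumber."""
    return PurePosixPath(
        "/", facility, instrument, proposal_id, str(experiment_id), str(run_number)
    )


def nexus_file(facility, instrument, proposal_id, experiment_id, run_number):
    """Build the NeXus file path for a run (naming rule is an assumption)."""
    run_dir = run_directory(facility, instrument, proposal_id, experiment_id, run_number)
    return run_dir / "NeXus" / f"{instrument}_{run_number}.nxs"


path = nexus_file("SNS", "BSS", "2006_1_2_SCI", 1, 100)
print(path)
```

With the values from the slide's example, this reproduces the first example path shown above.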
SNS Data Access – Through Unix Shell
• Symbolic links in the user's home directory point to the directories of the proposals the user is a member of
• A symbolic link in the user's home directory points to the public directory where public data reside
• Disk quota may be allocated so that users can perform analysis and simulation
User Workspace
/facility/users/neutron_boy
    /workspace (write)
    /proposalID (read only)
    /proposalID (read only)
    /public (read only)
/facility/users/public
    /proposalID
    /proposalID
Gray names are symbolic links to the data hierarchy
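The workspace layout above could be set up along the following lines. This is a minimal sketch that runs in a temporary directory; the function name and arguments are hypothetical, and it does not represent the actual SNS provisioning scripts. Note that a symbolic link does not itself make data read-only: that depends on the permissions of the target directories, which this sketch does not set.

```python
import os
import tempfile


def link_proposals(home_dir, data_root, proposal_ids, public_dir):
    """Create a writable workspace plus symlinks to proposal and public data."""
    os.makedirs(os.path.join(home_dir, "workspace"), exist_ok=True)
    targets = {pid: os.path.join(data_root, pid) for pid in proposal_ids}
    targets["public"] = public_dir
    for name, target in targets.items():
        link = os.path.join(home_dir, name)
        if not os.path.lexists(link):  # leave existing links untouched
            os.symlink(target, link)
    return sorted(targets)


# Demonstration inside a throwaway directory tree.
root = tempfile.mkdtemp()
data_root = os.path.join(root, "SNS", "BSS")
os.makedirs(os.path.join(data_root, "2006_1_2_SCI"))
public_dir = os.path.join(root, "users", "public")
os.makedirs(public_dir)
home = os.path.join(root, "users", "neutron_boy")
os.makedirs(home)
created = link_proposals(home, data_root, ["2006_1_2_SCI"], public_dir)
```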
SNS Data Access – Through Portal
[Screenshot labels: ISAW plot, metadata, NeXus files, NeXus tags, first SNS data]
Search Your Data via the Web
• Enter search text
• Select search fields
• Select files of interest to browse or to download
[Screenshot labels: Select Optional Search Fields, Enter Text Search String, Returns Files Found]
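The search steps above (enter text, optionally restrict the fields, get back matching files) can be sketched with an in-memory list of metadata records. This is only an illustration: the record fields, the second file path, and the function name are made up, and the sketch does not use or represent the ICAT or Oracle interfaces behind the actual portal.

```python
def search_catalog(records, text, fields=None):
    """Return records where `text` occurs (case-insensitively) in the chosen fields.

    If `fields` is None or empty, every field of each record is searched.
    """
    needle = text.lower()
    matches = []
    for record in records:
        names = fields if fields else record.keys()
        if any(needle in str(record.get(name, "")).lower() for name in names):
            matches.append(record)
    return matches


# Hypothetical catalog entries, for illustration only.
catalog = [
    {"file": "/SNS/BSS/2006_1_2_SCI/1/100/NeXus/BSS_100.nxs",
     "instrument": "BSS", "title": "first data"},
    {"file": "/SNS/BSS/2006_1_2_SCI/1/101/NeXus/BSS_101.nxs",
     "instrument": "BSS", "title": "calibration"},
]
found = search_catalog(catalog, "calib", fields=["title"])
```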
Monte Carlo Simulation via the Portal
SNS Data Management Requirements
• Archive, catalog and maintain data produced by SNS instruments so users can access them from anywhere at any time and not worry about data storage issues
• Grant authorized access to SNS data and metadata for both shell and portal users (ensure data are private to the experiment team)
• Provide services for efficient search, browsing and download of SNS data and metadata
• Allow users to share datasets with their collaborators or access datasets that have been made public, in a scalable fashion
• Provide data management service to HFIR, LUJAN, IPNS and other interested neutron facilities
• Extend dataset storage to spinning disk, HPSS and other archival systems
• Manage distributed dataset storage and perform data transport for the end users
• Federate data storage with partner neutron facilities like ISIS so that users see all their experiment data by logging into one facility
SNS Long Term Data Management Needs
• Create a single file hierarchy for accessing data distributed across multiple storage systems and multiple facilities, even extending beyond neutron scattering facilities
• Support the management, collaboration, controlled sharing, replication, transfer, and preservation of distributed data
• Capture metadata for user-produced data
• Automate data transfer
• Improve data processing: parallel and scalable
• Search large volumes of data for patterns to find certain structures within the data (data mining)
• Establish a unified user authentication service across neutron facilities
• Provide users with an easy-to-use portal service to search, browse, download and upload data, and to search, annotate and update metadata
• Integrate experiment with simulation; launch simulation jobs that need programmatic access to the distributed data resources
Summary
• As more instruments go through the commissioning phase and enter a new era of scientific discovery, we face the emerging challenge of managing scientific data that can grow to petabyte scale in a few years
• As a user facility, SNS will have a steady stream of users running experiments and generating raw and analysis data files; we will need not only disk cache but also a long-term storage system like HPSS
• Promise to search and retrieve SNS data and metadata for end users anywhere, at any time, in a timely fashion
• Grow our data management resources and collaborate with the community
• Looking for opportunities to work with and leverage resources beyond our facility
• Eager to reach out to, learn from and collaborate with data management experts working in all domain areas
• Wish to understand and use new software applications to manage distributed data storage and to transport, search and retrieve data more effectively and efficiently