18
DBS/DLS Data Management and Discovery Lee Lueking 3 December, 2006 Asia and EU-Grid Workshop 1-4 December, 2006

DBS/DLS Data Management and Discovery Lee Lueking 3 December, 2006 Asia and EU-Grid Workshop 1-4 December, 2006

Embed Size (px)

Citation preview

Page 1: DBS/DLS Data Management and Discovery Lee Lueking 3 December, 2006 Asia and EU-Grid Workshop 1-4 December, 2006

DBS/DLS Data Management and Discovery

Lee Lueking

3 December, 2006

Asia and EU-Grid Workshop1-4 December, 2006

Page 2: DBS/DLS Data Management and Discovery Lee Lueking 3 December, 2006 Asia and EU-Grid Workshop 1-4 December, 2006

3 Dec, 2006 DBS/DLS 2

Contents

• Overview of DBS/DLS

• CMS Data Management Concepts

• DBS/DLS Data Discovery

Page 3: DBS/DLS Data Management and Discovery Lee Lueking 3 December, 2006 Asia and EU-Grid Workshop 1-4 December, 2006

3 Dec, 2006 DBS/DLS 3

Data processing workflow• Data Management System allow to discover, access and transfer

event data in a distributed computing environment

DBS

DLS

SE

Dataset Bookeeping System:What data exists?

Data Location Service:Where does data exists?

Local filecatalog

UI

WMS

Site Local catalog:physical location of files at the site

Storage System

CE

Submissiontool

WN

Info flow

Job flow

Data flow

PhEDEx PhEDEx: take care of data transfer to reach destination site

Page 4: DBS/DLS Data Management and Discovery Lee Lueking 3 December, 2006 Asia and EU-Grid Workshop 1-4 December, 2006

3 Dec, 2006 DBS/DLS 4

Use:• Distributed analysis tool (CRAB)• MC Production system • Data distribution tool (PhEDEx)• User data discovery

Dataset Bookkeeping System (DBS)

• DBS provides the means to define, discover and use CMS event data.

DBS

Data definition:• Dataset specification (content and associated metadata)• Track data provenance

Data discovery:• What data exists • Dataset organization in terms of files/fileblocks• Site independent information

Page 5: DBS/DLS Data Management and Discovery Lee Lueking 3 December, 2006 Asia and EU-Grid Workshop 1-4 December, 2006

3 Dec, 2006 DBS/DLS 5

Data Location Service role

• The Data Location Service (DLS) provides the means to locate replicas of data in the distributed computing system– Maps file-blocks to storage elements (SE’s)

– Provide only names of sites hosting the data and not the physical location of constituent files at the sites

• Interactions with DLS– Insert file-blocks produced at a site (Production)

– Insert file-blocks upon data replication (PhEDEx)

– Query to locate file-blocks (CRAB, Production)

DLS

SECERN

fileblock1.....

fileblockN

SEFNAL

fileblock1.....

fileblockN

Page 6: DBS/DLS Data Management and Discovery Lee Lueking 3 December, 2006 Asia and EU-Grid Workshop 1-4 December, 2006

3 Dec, 2006 DBS/DLS 6

Relationships to DBS

RunsLuminositySections

LuminosityTrigger(online)Runs

PhEDExDLSFiles

Blocks

CalibrationAlignmentConditions

DataQuality

DBS ProdAgent

RequestSystem

DiscoveryService

CRAB

Tier 0

Page 7: DBS/DLS Data Management and Discovery Lee Lueking 3 December, 2006 Asia and EU-Grid Workshop 1-4 December, 2006

3 Dec, 2006 DBS/DLS 7

Concepts: Dataset

• A set of data representing a coherent sample for analysis– Primary Dataset: determined by HLT event classification or

MC production parameters– Processed Dataset: a slice of a primary dataset with a

consistent processing history. Note: May include multiple copies of some events with slight differences in processing.

– Analysis Dataset: a snapshot of a subsets of processed dataset representing a coherent sample for physics analysis

• CMSSW can produce merged data that do not conform to these classifications. We call this “fruit salad” and data of this nature is also accommodated.

Page 8: DBS/DLS Data Management and Discovery Lee Lueking 3 December, 2006 Asia and EU-Grid Workshop 1-4 December, 2006

3 Dec, 2006 DBS/DLS 8

Concepts: Lumi Sections, Data Tier

• Luminosity Section– A period of approximately constant instantaneous

luminosity– Unit of accounting for integrated luminosity– Production data files will contain whole luminosity

sections• Data Tier

– A set of objects grouped together in output files– Defined by the release configuration files– Data Tier definitions may change slightly with major

releases

Page 9: DBS/DLS Data Management and Discovery Lee Lueking 3 December, 2006 Asia and EU-Grid Workshop 1-4 December, 2006

3 Dec, 2006 DBS/DLS 9

DBS/DLS Data Discovery

Valentin Kuznetsov

Cornell University

http://cmsdbs.cern.ch/discovery/

Page 10: DBS/DLS Data Management and Discovery Lee Lueking 3 December, 2006 Asia and EU-Grid Workshop 1-4 December, 2006

3 Dec, 2006 DBS/DLS 10

DBS/DLS Discovery Intro

•Set of tools which provide CLI and Web interfaces

•Combines information from DBS and DLS to find out your data

•Provides three avenues to data discovery:

• ‘navigator menu’ approach

• ‘keyword’ search

• ‘site’ search

http://cmsdbs.cern.ch/discovery/

Page 11: DBS/DLS Data Management and Discovery Lee Lueking 3 December, 2006 Asia and EU-Grid Workshop 1-4 December, 2006

3 Dec, 2006 DBS/DLS 11

Navigator Menu

http://cmsdbs.cern.ch/discovery/

Drop down menus adjust hierarchically and quickly help you to find your data

Page 12: DBS/DLS Data Management and Discovery Lee Lueking 3 December, 2006 Asia and EU-Grid Workshop 1-4 December, 2006

3 Dec, 2006 DBS/DLS 12

Keyword Search

Search keywords are evaluated as normal python expression

and they are case insensitive

http://cmsdbs.cern.ch/discovery/

Page 13: DBS/DLS Data Management and Discovery Lee Lueking 3 December, 2006 Asia and EU-Grid Workshop 1-4 December, 2006

3 Dec, 2006 DBS/DLS 13

Site Search

Mostly for site managers, who wants to know which data resides on their site

http://cmsdbs.cern.ch/discovery/

Page 14: DBS/DLS Data Management and Discovery Lee Lueking 3 December, 2006 Asia and EU-Grid Workshop 1-4 December, 2006

3 Dec, 2006 DBS/DLS 14

Results

Page 15: DBS/DLS Data Management and Discovery Lee Lueking 3 December, 2006 Asia and EU-Grid Workshop 1-4 December, 2006

3 Dec, 2006 DBS/DLS 15

CLI interfaceshell# cvs co COMP/{DBS,DLS}; cd COMP/DLS/Client; make; cd </path>/COMP/DBS/Web/DataDiscoveryshell# . scripts/setup.shshell# ./DBSHelper.py --helpusage: DBSHelper.py [options]

options: -h, --help show this help message and exit --quiet be quiet and don't print exceptions --dict=DICT use to generate JavaScript dictionary, pass Global/All --primaryDataset=PRIMD specify primary dataset, e.g. --primaryDataset=CSA06-081-os-minbias --dataTier=DT specify Data Tier within dataset, e.g. --dataTier=RECO --app=APP specify application keys (version,family,exe), e.g. --app=CMSSW_0_8_1,Merged,cmsRun --dbsInst=DBSINST specify DBS instance to use, e.g. --dbsInst=MCLocal_1/Writer --showProcDatasets be quiet and show only processed datasets --site=SITE specify DLS site you're interesting, e.g. --site=fnal.gov --search=SEARCH specify any keywords to search your data, e.g. --search='CMSSW_0_8_1 and Merged' -v, --verbose be verboseshell# ./DBSHelper.py --dbsInst=MCLocal_2/Writer --app=CMSSW_1_0_4,Merged,cmsRun --primaryDataset=CSA06-103-os-ZMuMu0-0

Page 16: DBS/DLS Data Management and Discovery Lee Lueking 3 December, 2006 Asia and EU-Grid Workshop 1-4 December, 2006

3 Dec, 2006 DBS/DLS 16

Lucene extentsion

•Lucene is a high-performance, full-featured text search engine

•Plan to use it for indexing DBS information, e.g. CMSSW configuration parameter sets for advanced searches• search with relational qualifiers for key-value pairs

• ‘google-like’ searches give rating for each found hit

• work in progress to provide an in-memory and file index

http://cmsdbs.cern.ch/discovery/

Page 17: DBS/DLS Data Management and Discovery Lee Lueking 3 December, 2006 Asia and EU-Grid Workshop 1-4 December, 2006

3 Dec, 2006 DBS/DLS 17

Summary

• DBS is the catalog of all existing CMS data and information about how it was produced.

• DLS tells us where the data is physically located.

• A DBS/DLS Discovery system provides a convenient way to find the data you are interested in, and know where it is located.

• Note: New features and improvements are constantly being added. Your feedback to us is very useful.

Page 18: DBS/DLS Data Management and Discovery Lee Lueking 3 December, 2006 Asia and EU-Grid Workshop 1-4 December, 2006

Finish