Data production models for the CDF experiment
S. Hou, for the CDF data production team



CDF collaboration

Collider Detector experiment at the Fermilab Tevatron collider

Study collisions of 1 TeV protons with 1 TeV anti-protons


Trigger, Data Acquisition

[Diagram: sub-detector signals, trigger]

CDF detector data taking rate

                          2005 achieved           2006 upgrade
Tevatron luminosity       1.8x10^32 cm^-2 s^-1    3x10^32 cm^-2 s^-1
Level-1 acceptance        27 kHz                  40 kHz
Level-2 acceptance        850 Hz                  1 kHz
Event Builder (EVB)       850 x 0.2 MB/s          500 MB/s
Level-3 acceptance        110 Hz                  150 Hz
To tape storage rate      20 MB/s                 40 MB/s

• 8 data logging streams
• Event size: ~140 kByte
• Average data rate: ~5 M events/day
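The quoted rates can be cross-checked with simple arithmetic. The sketch below is illustrative only: it assumes decimal units (1 kByte = 1000 bytes) and applies the ~140 kByte event size to every event; the function names are not from the slides.

```python
EVENT_SIZE_BYTES = 140e3  # ~140 kByte per event (decimal units assumed)


def logging_rate_mb_per_s(level3_rate_hz: float) -> float:
    """Logging bandwidth implied by a Level-3 accept rate, in MB/s."""
    return level3_rate_hz * EVENT_SIZE_BYTES / 1e6


def daily_volume_gb(events_per_day: float) -> float:
    """Raw data volume per day implied by an event count, in GB."""
    return events_per_day * EVENT_SIZE_BYTES / 1e9


if __name__ == "__main__":
    print(logging_rate_mb_per_s(110))  # 2005 Level-3 acceptance
    print(logging_rate_mb_per_s(150))  # 2006 Level-3 acceptance
    print(daily_volume_gb(5e6))        # ~5 M events/day
```

Under these assumptions the Level-3 rates correspond to about 15 MB/s and 21 MB/s, inside the quoted 20 MB/s and 40 MB/s tape rates, and 5 M events/day is about 700 GB/day.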


Data logging rate

[Plot: data logging rate up to Sep 2005; 1 fb-1 of data recorded]

• Data logging rate increases with the luminosity of the proton and anti-proton beams
• Total data volume increases with integrated luminosity

Good-run raw data
• Feb 2002 - Dec 2004: 1017 M events = 201 k files = 185 TByte
• Dec 2004 - Sep 2005: 756 M events = 102 k files = 95 TByte
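As a rough consistency check, the good-run totals give average event and file sizes. This is a sketch that assumes 1 TByte = 10^12 bytes; the helper name `averages` is illustrative.

```python
# (period, events, files, bytes) as quoted above; TByte taken as 1e12 bytes (assumption)
GOOD_RUN_DATA = [
    ("Feb 2002 - Dec 2004", 1017e6, 201e3, 185e12),
    ("Dec 2004 - Sep 2005", 756e6, 102e3, 95e12),
]


def averages(events: float, files: float, nbytes: float):
    """Return (kByte per event, GByte per file, events per file)."""
    return nbytes / events / 1e3, nbytes / files / 1e9, events / files


if __name__ == "__main__":
    for period, events, files, nbytes in GOOD_RUN_DATA:
        kb_per_event, gb_per_file, events_per_file = averages(events, files, nbytes)
        print(f"{period}: {kb_per_event:.0f} kB/event, "
              f"{gb_per_file:.2f} GB/file, {events_per_file:.0f} events/file")
```

Under this assumption the two periods average roughly 182 and 126 kByte per event, on either side of the ~140 kByte quoted elsewhere in the talk, and a little over 0.9 GByte per file.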


Data flow

[Diagram: CDF DAQ, Enstore, raw datasets, production farm, production datasets, dCache, CDF Analysis Farm, remote CAFs, user desktop]


Data flow, Enstore storage

[Diagram: sub-detector; Level-1,2 trigger, DAQ; Level-3 farm; calibration database; run splitter; file catalog; 8 raw datasets; 52 production datasets]

Data logging is divided by trigger table
• 6 physics streams, 2 monitoring streams

Events are split by trigger table
• 52 production datasets

Enstore tape library storage
• 18 STK 9940B drives
• 200 GB/tape
• 30 MByte/s read/write
• Steady R/W rate ~1 TByte/drive/day
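The tape figures can be related to each other with a short calculation. The sketch assumes decimal units and continuous streaming at the quoted 30 MByte/s as the upper limit; the constant and function names are illustrative.

```python
SECONDS_PER_DAY = 86_400
DRIVE_RATE_MB_S = 30.0          # quoted read/write rate per drive
N_DRIVES = 18                   # quoted number of drives
STEADY_TB_PER_DRIVE_DAY = 1.0   # quoted steady rate per drive


def max_tb_per_drive_day(rate_mb_s: float = DRIVE_RATE_MB_S) -> float:
    """Volume one drive would move in a day if it streamed without pause, in TB."""
    return rate_mb_s * 1e6 * SECONDS_PER_DAY / 1e12


def duty_fraction() -> float:
    """Steady daily volume as a fraction of the streaming limit."""
    return STEADY_TB_PER_DRIVE_DAY / max_tb_per_drive_day()


if __name__ == "__main__":
    print(max_tb_per_drive_day())               # streaming limit per drive per day
    print(duty_fraction())                      # fraction of the limit used in steady running
    print(N_DRIVES * STEADY_TB_PER_DRIVE_DAY)   # aggregate steady rate, TB/day
```

With these inputs a drive could stream about 2.6 TByte/day, so the steady ~1 TByte/drive/day is a bit under 40% of the streaming limit, and 18 drives give about 18 TByte/day in aggregate.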


Computing facility

[Network diagram: CDF online DAQ, production farm, analysis farm, dCache file-servers, Enstore tape library, file-servers, servers, Oracle DB, offline users, and remote sites via StarLight; link labels 10 Gbit and 2 Gbit]


Data processing tasks

Raw data event reconstruction
• apply detector calibration
• calculate detected physics contents
• output to assigned trigger datasets

One input file -> one binary job -> split output files

Concatenation of output files
• Raw data file is 1 GByte; output file size varies from 5 MByte to 1 GByte
• Concatenate small files of the same dataset, in data-taking sequence, into 1 GByte files
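The concatenation step can be sketched as a greedy grouping. This is a minimal illustration, not the farm's actual code: the `OutputFile` fields, the `(run, section)` ordering key, and the decimal 1 GByte target are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List

TARGET_BYTES = 10**9  # 1 GByte target (decimal units assumed)


@dataclass(frozen=True)
class OutputFile:
    dataset: str
    run: int
    section: int  # hypothetical ordering key within a run
    size: int     # bytes


def plan_concatenation(files: Iterable[OutputFile],
                       target: int = TARGET_BYTES) -> Dict[str, List[List[OutputFile]]]:
    """Per dataset, group files in (run, section) order into batches of at most `target` bytes.

    A single file larger than `target` ends up alone in its own batch.
    """
    by_dataset: Dict[str, List[OutputFile]] = defaultdict(list)
    for f in files:
        by_dataset[f.dataset].append(f)

    plan: Dict[str, List[List[OutputFile]]] = {}
    for dataset, members in by_dataset.items():
        batches: List[List[OutputFile]] = []
        current: List[OutputFile] = []
        current_size = 0
        for f in sorted(members, key=lambda f: (f.run, f.section)):
            # Close the current batch when adding this file would exceed the target.
            if current and current_size + f.size > target:
                batches.append(current)
                current, current_size = [], 0
            current.append(f)
            current_size += f.size
        if current:
            batches.append(current)
        plan[dataset] = batches
    return plan
```

Each batch would then be written out as one merged file. Sorting before grouping is what keeps the merged files in data-taking sequence.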


Production farm, 1st model

[Diagram: stager, worker, and concatenator nodes around the dfarm buffer, with network links to MySQL and the DB; labels: run-splitter, calibration, register input, register output, register concatenated; steps numbered 1 to 6]

Direct I/O to Enstore tape library
• Custom I/O node to Enstore

FBS batch system
• dfarm: collection of all worker IDE disks, used as a buffer for input and output files

Farm Processing System
• MySQL for bookkeeping
• Concatenation in rigid run sequence, output truncated to 1 GB files

Performance
• Peak rate of 1 TB input/day; used to process data up to Dec 2004


Upgrade to CAF & SAM Data Handling

Condor batch system, dressed for CDF
CAF (CDF Analysis Farm) package
• interface for job submission and monitoring
• uniform platform with other CDF computing facilities
• compatible with distributed computing development

Data handling system
SAM (Sequential Access via Metadata)
• database application for file metadata
• provides file locations
• loads files from tapes to caches

dCache (joint project of DESY and FNAL)
• virtualizes disk usage, loading files from tapes
• files appear to the user as always on disk


Production farm, upgrade

[Diagram: worker nodes, fileserver, and dCache, with network links to SAM and the DB; labels: input-URL, run-splitter, calibration, declare metadata, output, merged; steps numbered 1 to 5]

Upgrade to distributed computing infrastructure: SAM data handling & Condor CAF

A CAF submission runs two operations in parallel
SAM project
• activates data handling to deliver files of the assigned SAM dataset
• tracks file consumption status
Condor batch job
• consumes files of the associated SAM project
• declares SAM metadata for bookkeeping

Concatenation of output
• Merge output files sorted in run sequence

Store to Enstore via SAM
• Declare metadata and parentage for bookkeeping
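The division of labour between a SAM project and its batch job can be illustrated with a toy model. This sketch does not use or imitate the real SAM or Condor interfaces: `ToyProject`, `run_job`, and the status names are hypothetical, and the loop runs sequentially in one process, whereas the real project and job run in parallel.

```python
from collections import deque
from enum import Enum
from typing import Callable, Dict, Iterable, List, Optional


class Status(Enum):
    PENDING = "pending"
    DELIVERED = "delivered"
    CONSUMED = "consumed"


class ToyProject:
    """Hypothetical stand-in for a data-handling project: delivers files and tracks status."""

    def __init__(self, dataset_files: Iterable[str]):
        names = list(dataset_files)
        self.status: Dict[str, Status] = {name: Status.PENDING for name in names}
        self._queue = deque(names)

    def next_file(self) -> Optional[str]:
        """Deliver the next file of the dataset, or None when the dataset is exhausted."""
        if not self._queue:
            return None
        name = self._queue.popleft()
        self.status[name] = Status.DELIVERED
        return name

    def release(self, name: str) -> None:
        """Mark a delivered file as consumed by the job."""
        self.status[name] = Status.CONSUMED


def run_job(project: ToyProject, process: Callable[[str], List[str]]) -> Dict[str, str]:
    """Consume every file of the project; return declared parentage (output -> input)."""
    declared: Dict[str, str] = {}
    while (name := project.next_file()) is not None:
        for output in process(name):
            declared[output] = name  # bookkeeping: record the parent of each output file
        project.release(name)
    return declared
```

The returned mapping plays the role of the declared metadata: each output file carries a record of the input file it came from.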


Production challenge

Operation tasks: “cron jobs”
• Resource monitoring
• Submission and monitoring of SAM projects and of binary jobs on CAF farms
• Concatenation and store

Service interface and monitoring
• Enstore tape I/O
• SAM data handling, DB service
• CDF online, calibration DB, software

• Timely processing of every event collected
• Interface to data handling, database, and multiple CAFs
• Precision bookkeeping on millions of files: zero tolerance for error, every event is counted


Resource Monitoring

• CDF DB, SAM DB, Data-Handling
• CAF condor batch system
• Fileserver storage

Cron jobs are prohibited from running when required services are missing
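One way to gate a cron task on service availability is sketched below. The check functions are placeholders supplied by the caller; how the production scripts actually probe the CDF DB, SAM, or the fileservers is not described in the slides, so every name here is an assumption.

```python
from typing import Callable, Dict, List, Tuple


def missing_services(checks: Dict[str, Callable[[], bool]]) -> List[str]:
    """Return the names of services whose check fails or raises."""
    missing = []
    for name, check in checks.items():
        try:
            ok = bool(check())
        except Exception:
            ok = False  # a probe that crashes counts as an unavailable service
        if not ok:
            missing.append(name)
    return missing


def run_if_services_up(task: Callable[[], None],
                       checks: Dict[str, Callable[[], bool]]) -> Tuple[bool, List[str]]:
    """Run `task` only when every required service check passes."""
    missing = missing_services(checks)
    if missing:
        return False, missing  # skip this cron cycle and report what was down
    task()
    return True, []
```

A skipped cycle simply returns the list of missing services, so the next cron invocation can try again once they are back.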


CAF condor monitoring

• Tarball (archived execution binary file) is distributed to worker CPUs
• Input files are copied via SAM from dCache
• At the end of a job, output files are copied to the assigned fileserver

CPU engagement is monitored


Farm monitoring

[Plots: worker CPUs (Ganglia) and input (rcp) waiting; traffic to fileserver (xfs)]

Bandwidth limits
• Input: Enstore loading to dCache
• Output: multiple workers to fileservers
• 1 Gbit network port to IDE: 40 MB/s
• 1 output dataset to Enstore: 30 MB/s


SAM project monitor

• Input is delivered by the SAM data-handling system
• Input files are organized in datasets
• Each dataset is submitted to a SAM project
• Each project is associated with a CAF condor job

[Screenshot: SAM projects monitored]


Monitoring a SAM project

• Consumption of a dataset is monitored
• File delivery by SAM from registered locations (dCache, samcache, Enstore, etc.)
• Consumption by CAF workers is monitored


Bookkeeping via SAM metadata

• Each output file has bookkeeping metadata
• Parent-daughter relations are tagged after completion

Automatic recovery: on datasets having incomplete daughters
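Selecting files for recovery from parentage records can be sketched as follows. The data model is an assumption made for illustration: each daughter record is a `(parent name, stored flag)` pair, and a parent needs recovery if it has no daughters or any daughter that was not stored.

```python
from typing import Dict, Iterable, List, Tuple


def parents_to_recover(parents: Iterable[str],
                       daughters: Dict[str, Tuple[str, bool]]) -> List[str]:
    """Return, sorted, the input files whose set of daughters is missing or incomplete.

    `daughters` maps an output file name to (parent file name, stored flag);
    this layout is hypothetical and stands in for the real metadata records.
    """
    with_stored = set()
    with_failed = set()
    for parent, stored in daughters.values():
        (with_stored if stored else with_failed).add(parent)
    return sorted(p for p in parents
                  if p in with_failed or p not in with_stored)
```

For example, with parents `r1`, `r2`, `r3`, where `r1` has one stored daughter, `r2` has one stored and one unstored daughter, and `r3` has none, the function selects `r2` and `r3` for reprocessing.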


Production stability

CAF
• condor is very reliable
• worker hardware failures are occasional
• RAID degradation is occasional

Service
• 24x7 Oracle, Enstore service
• SAM, dCache shift support

CPU usage
• total 6, output to 6 fileservers
• rougher CPU usage at the end, as streams were finishing up

[Plots: CAF+Farm jobs (max = 540); Farm CPU; traffic to/from the production farm. Legend: green, in bits/sec; blue, out bits/sec; dark, peak in bits/sec; pink, peak out bits/sec]


Production rate

Peak performance
• Jobs distributed to two CAFs (Analysis & Production farm)
• 540 CPUs used to match 6 I/O streams
• 8 dCache input file servers, 6 output fileservers
• uniform processing speed of 25 M events/day
• 3 TB input, 4 TB output per day

[Plots: integrated output event logging; daily file consumption]
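The peak figures imply a per-event cost that is easy to work out. The sketch assumes that all 540 CPUs are busy around the clock and that 1 TB = 10^12 bytes, so the CPU time per event is an upper estimate; the names are illustrative.

```python
SECONDS_PER_DAY = 86_400
EVENTS_PER_DAY = 25e6   # quoted processing speed
CPUS = 540              # quoted number of CPUs
INPUT_BYTES = 3e12      # 3 TB input per day (decimal units assumed)
OUTPUT_BYTES = 4e12     # 4 TB output per day (decimal units assumed)


def cpu_seconds_per_event() -> float:
    """CPU time available per event if every CPU is busy all day."""
    return SECONDS_PER_DAY * CPUS / EVENTS_PER_DAY


def kbytes_per_event(nbytes_per_day: float) -> float:
    """Average data volume per event, in kByte."""
    return nbytes_per_day / EVENTS_PER_DAY / 1e3


if __name__ == "__main__":
    print(cpu_seconds_per_event())         # CPU-seconds per event
    print(kbytes_per_event(INPUT_BYTES))   # input kByte per event
    print(kbytes_per_event(OUTPUT_BYTES))  # output kByte per event
```

Under these assumptions the farm spends at most about 1.9 CPU-seconds per event, reading about 120 kByte and writing about 160 kByte per event.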


Scaling capacity

At peak performance of 3 TB input, 4 TB output per day
• the farm switch (2 Gbit capacity) sees the entire traffic
• average load is 800 Mbit/s
• saturation comes from simultaneous network transfers to one fileserver Gbit link (40 MB/s), which corresponds to 100 jobs per data stream for CDF

Scaling on CPU
• Add more CPUs to a CAF
• Distribute jobs to multiple CAFs

Scaling on network I/O
• Limited by the 6 data-stream algorithm; split further
• Scale by fileservers (more Gbit links)
• Scale by tape drives
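The daily volumes can be converted to average network rates for comparison with the link limits. This is a rough sketch assuming 1 TB = 10^12 bytes and an even spread of output over the 6 fileservers; both are simplifications, and the function names are illustrative.

```python
SECONDS_PER_DAY = 86_400


def average_mbit_per_s(tb_per_day: float) -> float:
    """Average bit rate for a daily volume, in Mbit/s (1 TB = 1e12 bytes assumed)."""
    return tb_per_day * 1e12 * 8 / SECONDS_PER_DAY / 1e6


def average_mb_per_s_per_server(tb_per_day: float, n_servers: int) -> float:
    """Average byte rate per fileserver if the volume is spread evenly, in MB/s."""
    return tb_per_day * 1e12 / n_servers / SECONDS_PER_DAY / 1e6


if __name__ == "__main__":
    print(average_mbit_per_s(3 + 4))          # combined input and output through the switch
    print(average_mb_per_s_per_server(4, 6))  # output per fileserver
```

With these assumptions 7 TB/day averages about 650 Mbit/s, the same order as the quoted 800 Mbit/s load, and each fileserver receives about 7.7 MB/s on average. That is well under the 40 MB/s link limit, which is consistent with saturation arising from simultaneous transfers rather than from the daily average.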


Summary

• CDF production farm upgrade has reached a reliable rate of 3 TByte/day
• Capacity is scalable by increasing CPU and I/O ports
• Easy and reliable operation: tolerant of errors, with recovery and zero data loss