
Modeling Regional Centers with MONARC Simulation Tools

Modeling LHC Regional Centers with the MONARC Simulation Tools

Irwin Gaines, FNAL

for the MONARC collaboration

Modeling Regional Centers with MONARC simulation tools

CHEP2000 07 Feb 2000

MONARC

A joint project (LHC experiments and CERN/IT) to understand issues associated with distributed data access and analysis for the LHC

Examine distributed data plans of current and near future experiments

Determine characteristics and requirements for LHC regional centers

Understand details of analysis process and data access needs for LHC data

Measure critical parameters characterizing distributed architectures, especially database and network issues

Create modeling and simulation tools

Simulate a variety of models to understand constraints on architectures


MONARC

Models Of Networked Analysis At Regional Centers

Caltech, CERN, FNAL, Heidelberg, INFN, Helsinki, KEK, Lyon, Marseilles, Munich, Orsay, Oxford, RAL, Tufts, ...

GOALS

Specify the main parameters characterizing the Model's performance: throughputs, latencies

Determine classes of Computing Models feasible for LHC (matched to network capacity and data handling resources)

Develop "Baseline Models" in the "feasible" category

Verify resource requirement baselines: (computing, data handling, networks)

COROLLARIES:

Define the Analysis Process

Define Regional Center Architectures

Provide Guidelines for the final Models

[Diagram: candidate distributed architecture. CERN (n x 10^7 MIPS, m-Pbyte robot) linked by 622 Mbit/s and N x 622 Mbit/s connections to regional centers such as FNAL (4 x 10^7 MIPS, 110 Tbyte robot) and universities (n x 10^6 MIPS, m-Tbyte robots), each serving desktops.]


MONARC: a systematic study of LHC regional center issues

This talk will discuss

- study of existing and near future experiment analysis architectures

(http://home.fnal.gov/~odell/future/future_frame.html)

- description of regional center services (http://home.fnal.gov/~butler/rcarchitecture.htm)

- understanding of LHC analysis process

- use of tools to draw conclusions about suitability of different analysis architectures

(testbed measurements, development and verification of modeling tools covered in other talks at this conference)


General need for distributed data access and analysis:

Potential problems of a single centralized computing center include:

- scale of LHC experiments: difficulty of accumulating and managing all resources at one location

- geographic spread of LHC experiments: providing equivalent, location-independent access to data for physicists

- help desk, support and consulting in the same time zone

- cost of LHC experiments: optimizing use of resources located world wide


Motivations for Regional Centers

A distributed computing architecture based on regional centers offers:

A way of utilizing the expertise and resources residing in computing centers all over the world

Provide local consulting and support

To maximize the intellectual contribution of physicists all over the world without requiring their physical presence at CERN

Acknowledgement of possible limitations of network bandwidth

Allows people to make choices on how they analyze data based on availability or proximity of various resources such as CPU, data, or network bandwidth.


Current and Future Experiment Surveys


Future Experiment Survey

Analysis/Results

From the previous survey, we saw many sites contributed to Monte Carlo generation; this is now the norm.

New experiments are trying to use the Regional Center concept: BaBar has Regional Centers at IN2P3 and RAL; STAR has a Regional Center at LBL/NERSC; CDF and D0 offsite institutions are paying more attention as the run gets closer.


Future Experiment Survey

Other observations/requirements

In the last survey, we pointed out the following requirements for RCs:

- 24x7 support
- a software development team
- a diverse body of users
- good, clear documentation of all s/w and s/w tools

The following are requirements for the central site (i.e. CERN):

- a central code repository, easy to use and easily accessible for remote sites
- be "sensitive" to remote sites in database handling, raw data handling and machine flavors
- provide good, clear documentation of all s/w and s/w tools

The experiments in this survey achieving the most in distributed computing are following these guidelines.


Regional Center Characteristics


Regional Centers

Regional Centers will:

- Provide all technical services and data services required to do the analysis
- Maintain all (or a large fraction of) the processed analysis data; possibly may hold only large subsets based on physics channels
- Maintain a fixed fraction of fully reconstructed and raw data
- Cache or mirror the calibration constants
- Maintain excellent network connectivity to CERN and excellent connectivity to users in the region; data transfer over the network is preferred for all transactions, but transfer of very large datasets on removable data volumes is not ruled out
- Share/develop common maintenance, validation, and production software with CERN and the collaboration
- Provide services to physicists in the region, contribute a fair share to post-reconstruction processing and data analysis, collaborate with other RCs and CERN on common projects, and provide services to members of other regions on a best-effort basis to further the science of the experiment
- Provide support services, training, documentation, and troubleshooting to RC and remote users in the region


[Diagram: Regional Center architecture. Data import/export; mass storage & disk servers; database servers; tapes; network links from CERN and from Tier 2 and simulation centers. Support blocks: physics software development, R&D systems and testbeds, info/code servers, web/telepresence servers, training/consulting/help desk, support services. Processing blocks: Production Reconstruction (Raw/Sim --> ESD; scheduled, predictable; experiment/physics groups), Production Analysis (ESD --> AOD, AOD --> DPD; scheduled; physics groups), Individual Analysis (AOD --> DPD and plots; chaotic; physicists). Outputs flow to desktops, Tier 2 centers, local institutes, and CERN.]


[Same Regional Center architecture diagram, annotated with data rates and capacities:]

Data input rate from CERN: Raw data (5%) 50 TB/yr; ESD data (50%) 50 TB/yr; AOD data (all) 10 TB/yr; revised ESD 20 TB/yr

Data input from Tier 2: revised ESD and AOD 10 TB/yr

Data input from simulation centers: raw data 100 TB/yr

Data output rate to CERN: AOD data 8 TB/yr; recalculated ESD 10 TB/yr; simulation ESD data 10 TB/yr

Data output to Tier 2: revised ESD and AOD 15 TB/yr

Data output to local institutes: ESD, AOD, DPD data 20 TB/yr

Total storage: robotic mass storage 300 TB. Raw data: 50 TB, 5 x 10^7 events (5% of 1 year). Raw (simulated) data: 100 TB, 10^8 events. ESD (reconstructed data): 100 TB, 10^9 events (50% of 2 years). AOD (physics object) data: 20 TB, 2 x 10^9 events (100% of 2 years). Tag data: 2 TB (all). Calibration/conditions database: 10 TB (only the latest version of most data types kept here). Central disk cache: 100 TB (per user demand).

CPU required for AMS database servers: ?? x 10^3 SI95 power
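As a rough cross-check, the per-event sizes implied by this storage budget can be recovered by dividing each total by its event count (a back-of-the-envelope sketch; the ~1 MB raw, ~100 kB ESD and ~10 kB AOD figures are inferred here, not stated on the slide):

```python
# Back-of-the-envelope: per-event sizes implied by the storage budget.
# Event counts and storage totals are from the slide; sizes are derived.
TB = 1e12  # bytes

datasets = {
    # name       (total bytes, events)
    "raw":      (50 * TB,  5e7),
    "sim raw":  (100 * TB, 1e8),
    "ESD":      (100 * TB, 1e9),
    "AOD":      (20 * TB,  2e9),
}

for name, (total, n_events) in datasets.items():
    size = total / n_events  # bytes per event
    print(f"{name:8s} ~ {size / 1e3:6.0f} kB/event")
```

These implied sizes (raw ~1 MB, ESD ~100 kB, AOD ~10 kB) are the values used in the bandwidth sketches later in the talk.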


[Same Regional Center diagram, annotated with CPU capacities:]

Farms of low-cost commodity computers, limited I/O rate, modest local disk cache.

Reconstruction jobs: reprocessing of raw data, 10^8 events/year (10%); initial processing of simulated data, 10^8/year. At 1000 SI95-sec/event ==> 10^4 SI95 capacity: 100 processing nodes of 100 SI95 power.

Event selection jobs: 10 physics groups x 10^8 events (10% samples) x 3 times/yr, based on ESD and the latest AOD data. 50 SI95/evt ==> 5000 SI95 power.

Physics object creation jobs: 10 physics groups x 10^7 events (1% samples) x 8 times/yr, based on the selected event sample's ESD data. 200 SI95/event ==> 5000 SI95 power.

Derived physics data creation jobs: 10 physics groups x 10^7 events x 20 times/yr, based on selected AOD samples; generates "canonical" derived physics data. 50 SI95/evt ==> 3000 SI95 power.

Total: 110 nodes of 100 SI95 power.

Derived physics data creation jobs (individual): 200 physicists x 10^7 events x 20 times/yr, based on selected AOD and DPD samples. 20 SI95/evt ==> 30,000 SI95 power.

Total: 300 nodes of 100 SI95 power.
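The quoted CPU capacities can be roughly reproduced by dividing each job class's annual work in SI95-seconds by the seconds in a year (a sketch assuming ~3.15 x 10^7 s/yr and full utilization; the slide's round numbers include some headroom):

```python
# Rough check of the CPU sizing: annual work (SI95-sec) / seconds per year.
# Assumes ~3.15e7 s/yr and 100% utilization; slide figures include headroom.
SEC_PER_YEAR = 3.15e7

def si95_needed(events_per_pass, passes_per_year, si95_sec_per_event):
    work = events_per_pass * passes_per_year * si95_sec_per_event
    return work / SEC_PER_YEAR

selection  = si95_needed(10 * 1e8, 3, 50)    # 10 groups, 10% samples
objects    = si95_needed(10 * 1e7, 8, 200)   # 10 groups, 1% samples
derived    = si95_needed(10 * 1e7, 20, 50)   # 10 groups, canonical DPD
individual = si95_needed(200 * 1e7, 20, 20)  # 200 physicists

# ~4762, ~5079, ~3175, ~25397 -- close to the quoted 5000/5000/3000/30000
print(round(selection), round(objects), round(derived), round(individual))
```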


Understanding the LHC Analysis Process


MONARC Analysis Process Example


Model and Simulation parameters

We now have a new set of parameters common to all simulating groups.

The values are more realistic, but still to be discussed and agreed on the basis of the experiments' information.

Proc_Time_RAW         1000 SI95-sec/event (350)
Proc_Time_ESD           25 SI95-sec/event (2.5)
Proc_Time_AOD            5 SI95-sec/event (0.5)
Analyze_Time_TAG         3 SI95-sec/event
Analyze_Time_AOD         3 SI95-sec/event
Analyze_Time_ESD        15 SI95-sec/event (3)
Analyze_Time_RAW       600 SI95-sec/event (350)
Memory of jobs         100 MB
Proc_Time_Create_RAW  5000 SI95-sec/event (35)
Proc_Time_Create_ESD  1000 SI95-sec/event (1)
Proc_Time_Create_AOD    25 SI95-sec/event (1)


Example: Physics Analysis at Regional Centres

Similar data processing jobs are performed in several RCs.

Each Centre has "TAG" and "AOD" databases replicated.

The Main Centre provides "ESD" and "RAW" data.

Each job processes AOD data, and also a fraction of ESD and RAW.


Example: Physics Analysis


Results of Models of Distributed Architectures


Analysis and Reconstruction Simulations

Preliminary Results for simple Models

Try to stress the system and look for a steady state (the same jobs repeated every day)

P. Capiluppi, L. Perini, S. Resconi, D. Ugolotti

Dept. of Physics & INFN - Bologna & Milano


Base Model used

Basic jobs:

Reconstruction of 10^7 events: RAW --> ESD --> AOD --> TAG at CERN. This is the production running while data are coming from the DAQ (100 days of running, collecting a billion events per year).

Analysis by 5 working groups, each of 25 analyzers, on TAG only (no requests to higher-level data samples). Every analyzer submits 4 sequential jobs on 10^6 events. Each analyzer's start time is a flat random choice within a 3000-second window. Each analyzer's data sample of 10^6 events is a random choice from the complete TAG database of 10^7 events.

Transfer (FTP) of 10^7 events of ESD, AOD and TAG from CERN to the RC.

CERN activities: reconstruction, 5 WG analysis, FTP transfer.
RC activities: 5 (uncorrelated) WG analysis, receiving the FTP transfer.

Job "paper estimates":
- Single analysis job: 1.67 CPU hours = 6000 sec at CERN (same at the RC)
- Reconstruction at CERN of 1/500 of the RAW to ESD: 3.89 CPU hours = 14000 sec
- Reconstruction at CERN of 1/500 of the ESD to AOD: 0.03 CPU hours = 100 sec
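These paper estimates are consistent with 500 SI95 processing nodes (the node size used in the later model slides) and the parenthesized per-event costs from the parameter slide; a sketch under those assumptions, since the slide does not state which parameter set was used:

```python
# Reproduce the "paper estimates" for single-node job times.
# Assumption: 500 SI95 nodes; per-event costs are the earlier
# (parenthesized) values: 3 SI95-sec TAG analysis, 350 RAW->ESD, 2.5 ESD->AOD.
NODE_SI95 = 500.0

def job_seconds(n_events, si95_sec_per_event):
    return n_events * si95_sec_per_event / NODE_SI95

analysis   = job_seconds(1e6, 3)          # TAG analysis of 10^6 events
raw_to_esd = job_seconds(1e7 / 500, 350)  # 1/500 of the RAW sample
esd_to_aod = job_seconds(1e7 / 500, 2.5)  # 1/500 of the ESD sample

print(analysis, raw_to_esd, esd_to_aod)  # 6000.0 14000.0 100.0
```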


Resources: LAN speeds?!

In our Models the DB Servers are uncorrelated, and thus one activity uses a single Server. The bottlenecks are the "read" and "write" speeds to and from the Server. In order to use the CPU power at a reasonable percentage we need a read speed of at least 300 MB/s and a write speed of 100 MB/s (a milestone already met today).
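A read speed of this order can be derived from the per-node event rate (a sketch under assumed values: 100 nodes sharing one server, 500 SI95 per node, ESD analysis at 15 SI95-sec/event, and the ~0.1 MB/event ESD size implied by the storage slide):

```python
# Sketch: DB-server read bandwidth needed to keep an analysis farm busy.
# Assumed: 100 nodes on one server, 500 SI95/node, ESD analysis at
# 15 SI95-sec/event, ~0.1 MB/event ESD (implied by 100 TB / 10^9 events).
nodes = 100
node_si95 = 500.0
si95_sec_per_event = 15.0
mb_per_event = 0.1

events_per_sec_per_node = node_si95 / si95_sec_per_event  # ~33 events/s
read_mb_per_sec = nodes * events_per_sec_per_node * mb_per_event

print(round(read_mb_per_sec))  # ~333 MB/s, same order as the quoted 300 MB/s
```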

We use 100 MB/s in current simulations (10 Gbits/sec switched LANs in 2005 may be possible).

Processing node link speed is negligible in our simulations.

Of course the “real” implementation of the Farms can be different, but the results of the simulation do not depend on “real” implementation: they are based on usable resources.

See following slides


Data access speeds

DB read speed    25 MB/s
DB write speed   15 MB/s
DB link speed   100 MB/s
Node link speed  10 MB/s

Reconstruction of ESD, AOD and TAG (10^7 events) at CERN, repeated for 10 days.

- Poor CPU use (less than 5%)
- Low job efficiency
- Jobs spill over into the following days
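The sub-5% figure is what one would expect if the 25 MB/s server read speed is the bottleneck (a sketch assuming ~1 MB raw events, 1000 SI95-sec/event reconstruction, and the 500 kSI95 CERN farm):

```python
# Sketch: CPU utilization when the DB server read speed is the bottleneck.
# Assumed: ~1 MB raw event, 1000 SI95-sec/event, 500 kSI95 total farm power.
read_mb_per_sec = 25.0
mb_per_event = 1.0
farm_si95 = 5e5
si95_sec_per_event = 1000.0

served_events_per_sec = read_mb_per_sec / mb_per_event  # 25 events/s delivered
cpu_events_per_sec = farm_si95 / si95_sec_per_event     # 500 events/s possible

utilization = served_events_per_sec / cpu_events_per_sec
print(f"{utilization:.0%}")  # 5% -- the farm is starved for data
```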


Data access speeds

Reconstruction of ESD, AOD and TAG (10^7 events) at CERN, repeated for 10 days.

DB read speed   100 MB/s
DB write speed  100 MB/s
DB link speed   100 MB/s
Node link speed 100 MB/s

- Better CPU use (about 15%)
- Still low job efficiency
- Jobs spill over into the following days


More realistic values for CERN and RC

Data link speeds at 100 MB/s (all values), except: Node_Link_Speed at 10 MB/s; WAN link speeds at 40 MB/s.

CERN: 1000 processing nodes, each of 500 SI95 (1000 nodes x 500 SI95 = 500 kSI95, about the CPU power of the CERN Tier 0); disk space as needed for the number of DBs.

RC: 200 processing nodes, each of 500 SI95 (100 kSI95 of processing power = 20% of CERN); disk space as needed for the number of DBs.


Analysis on 10^7 events

Reconstruction of ESD, AOD and TAG (10^7 events) at CERN

5 WG Analysis at CERN
5 WG Analysis at the RC

Transfer (FTP) of 10^7 events of ESD and AOD to the RC

2 days of simulated activities

Test7_Model1: 10^7 events per job!


Analysis on 10^7 events

Reconstruction of ESD, AOD and TAG (10^7 events) at CERN

5 WG Analysis at CERN
5 WG Analysis at the RC

Transfer (FTP) of 10^7 events of ESD and AOD to the RC

Test7_Model1: 10^7 events per job!

2 days of simulated activities


Analysis on 10^7 events

Reconstruction of ESD, AOD and TAG (10^7 events) at CERN
5 WG Analysis at CERN
5 WG Analysis at the RC
Transfer (FTP) of 10^7 events of ESD and AOD to the RC

Test7bis_Model1: 10^7 events per job!

2 days of simulated activities

RC with doubled CPU resources


Some Conclusions of Simulations

Larger CPU power (of the order of 1000 SI95-sec per event) for event reconstruction is possible at CERN (though it may eventually limit the number of re-processings per year).

A concern: an RC is 20% of CERN, but the "full analysis process" load of 5 physics groups, if fully performed at a single RC, requires more than 20% of CERN's resources! We need to better define the "full analysis process".

The role of Tier 2 RCs should be coordinated with the corresponding Tier 1 RC activities, and/or the distribution of WGs over all the centres should be revisited.

Using 10^7 events for all the analysis requires a re-thinking of the analysis model. RCs must have room for building revised data and Monte Carlo data.


SIMULATION OF DAILY ACTIVITIES AT REGIONAL CENTERS

MONARC Collaboration

Alexander Nazarenko and Krzysztof Sliwa


Physics Group Selection

Each group reads 100% of TAG events and follows:
~10% to AOD, ~1% to ESD, ~0.01% to RAW

Number of groups    Follow AOD              Jobs/group
L groups (L~10)     p% of total TAG (~1%)   1-2
M groups (M~5)      q% of total TAG (~5%)   1-2
N groups (N~5)      r% of total TAG (~10%)  1-2

~20 jobs/day in total, evenly spread among the participating RCs
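Combining these follow fractions with the analyze times from the parameter slide gives a feel for the cost of one selection pass (an illustrative sketch; the pairing of fractions with the TAG/AOD/ESD/RAW costs is an assumption):

```python
# Sketch: CPU cost of one physics-group selection pass over 10^7 TAG events,
# following ~10% to AOD, ~1% to ESD, ~0.01% to RAW.
# Analyze times (SI95-sec/event) are taken from the parameter slide.
n_tag = 1e7
cost = (n_tag * 3                # read 100% of TAG
        + 0.10 * n_tag * 3       # follow ~10% to AOD
        + 0.01 * n_tag * 15      # follow ~1% to ESD
        + 0.0001 * n_tag * 600)  # follow ~0.01% to RAW

node_si95 = 500.0
hours = cost / node_si95 / 3600
print(f"{cost:.2e} SI95-sec, ~{hours:.0f} h on one 500 SI95 node")
```

The TAG scan dominates; the deep follow-down to RAW adds only a few percent to the total.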

            


6 RCs: two types of Reconstruction + Analysis + Selection. Five Tier 1 and one Tier 2 Centers, optimized to perform the complete set with a 30 MB/s WAN and an optimized LAN.

Participating RCs, their data, and their jobs:
- CERN (RAW, ESD, AOD, TAG): 4 Physics Group Selections x10; 40 Physics Group Analyses; Full Reconstruction and FTP; Monthly Reconstruction and FTP (10 days)
- INFN, KEK, TUFTS, CALTECH (ESD, AOD, TAG): 4 Physics Group Selections x10 and 40 Physics Group Analyses each
- CALTECH-2 (AOD, TAG): 20 Physics Group Analyses

Model 1 (fixed values); Model 2 (randomized data processing times and sizes).

Conclusion: the current configuration makes it possible to run the complete set of jobs daily at 6 centers with a WAN bandwidth of 30 MB/s and network parameters not exceeding the estimates for 2005.
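The 30 MB/s WAN figure is plausible for the daily FTP load: shipping ESD and AOD for 10^7 events fits comfortably within a day (a sketch using the ~0.1 MB ESD and ~0.01 MB AOD event sizes implied by the storage slide):

```python
# Sketch: time to FTP 10^7 events of ESD + AOD over a 30 MB/s WAN link.
# Event sizes (~0.1 MB ESD, ~0.01 MB AOD) are implied by the storage slide.
n_events = 1e7
mb_to_ship = n_events * (0.1 + 0.01)  # ~1.1e6 MB = ~1.1 TB
wan_mb_per_sec = 30.0

hours = mb_to_ship / wan_mb_per_sec / 3600
print(f"~{hours:.1f} h")  # ~10.2 h -- fits within a daily cycle
```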

            


Overall Conclusions

MONARC simulation tools are:

- sophisticated enough to allow modeling of complex distributed analysis scenarios
- simple enough to be used by non-experts

Initial modeling runs are already showing interesting results

Future work will help identify bottlenecks and understand constraints on architectures