Distributed Data Access and Analysis for Next Generation HENP Experiments Harvey Newman, Caltech





  • Distributed Data Access and Analysis for Next Generation HENP Experiments

    Harvey Newman, Caltech. CHEP 2000, Padova, February 10, 2000

  • LHC Computing: Different from Previous Experiment Generations

    Geographical dispersion: of people and resources. Complexity: the detector and the LHC environment. Scale: Petabytes per year of data.

    ~5000 physicists, 250 institutes, ~50 countries.

    Major challenges associated with: coordinated use of distributed computing resources; remote software development and physics analysis; communication and collaboration at a distance.

    R&D: a new form of distributed system: Data-Grid.

  • Four Experiments: The Petabyte to Exabyte Challenge

    ATLAS, CMS, ALICE, LHCb. Higgs and new particles; quark-gluon plasma; CP violation.

    Data written to tape: ~5 Petabytes/year and up (1 PB = 10^15 Bytes).

    0.1 to 1 Exabyte (1 EB = 10^18 Bytes) (~2010) (~2020 ?) total for the LHC experiments.

  • To Solve: the LHC Data Problem

    While the proposed LHC computing and data handling facilities are large by present-day standards, they will not support FREE access, transport or processing for more than a minute part of the data.

    Balance between proximity to large computational and data handling facilities, and proximity to end users and more local resources for frequently-accessed datasets.

    Strategies must be studied and prototyped, to ensure both acceptable turnaround times and efficient resource utilisation.

    Problems to be explored: how to meet the demands of hundreds of users who need transparent access to local and remote data, in disk caches and tape stores; how to prioritise hundreds of requests from local and remote communities, consistent with local and regional policies; how to ensure that the system is dimensioned, used and managed optimally for the mixed workload.

  • MONARC General Conclusions on LHC Computing

    Following discussions of computing and network requirements, technology evolution and projected costs, support requirements etc.:

    The scale of LHC Computing requires a worldwide effort to accumulate the necessary technical and financial resources.

    A distributed hierarchy of computing centres will lead to better use of the financial and manpower resources of CERN, the Collaborations, and the nations involved, than a highly centralized model focused at CERN.

    The distributed model also provides better use of physics opportunities at the LHC by physicists and students.

    At the top of the hierarchy is the CERN Center, with the ability to perform all analysis-related functions, but not the ability to do them completely.

    At the next step in the hierarchy is a collection of large, multi-service Tier1 Regional Centres, each with 10-20% of the CERN capacity devoted to one experiment.

    There will be Tier2 or smaller special purpose centers in many regions.

  • Bandwidth Requirements Estimate (Mbps) [*] ICFA Network Tas

    [*] See http://l3www.cern.ch/~newman/icfareq98.html

    Circa 2000: predictions roughly on track. Universal bandwidth growth by ~2X per year; 622 Mbps on European and transatlantic links by ~2002-3; Terabit/sec US backbones (e.g. ESNet) by ~2003-5.

    Caveats: distinguish raw bandwidth and effective line capacity; maximum end-to-end rate for individual data flows; QoS/IP has a way to go.

    D388, D402, D274
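
The ~2X per year growth rule and the raw-versus-effective caveat above lend themselves to a quick back-of-envelope calculation. The following is a minimal Python sketch, not part of the original talk: the function names, the 155 Mbps starting point, the 1 TB transfer size and the `efficiency` factor are all illustrative assumptions.

```python
def projected_bandwidth_mbps(start_mbps: float, years: float,
                             growth_per_year: float = 2.0) -> float:
    """Project link bandwidth under a constant yearly growth factor.

    The default factor of 2.0 mirrors the "~2X per year" rule of thumb.
    """
    return start_mbps * growth_per_year ** years


def transfer_time_hours(size_bytes: float, raw_mbps: float,
                        efficiency: float = 1.0) -> float:
    """Hours needed to move size_bytes over a link rated at raw_mbps.

    efficiency (0 < efficiency <= 1) is a hypothetical factor standing in
    for the gap between raw bandwidth and effective line capacity.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    effective_bits_per_sec = raw_mbps * 1e6 * efficiency
    return size_bytes * 8 / effective_bits_per_sec / 3600


if __name__ == "__main__":
    # Assumed example: a 155 Mbps link doubling yearly for two years.
    print(projected_bandwidth_mbps(155, 2))
    # Assumed example: 1 TB over 622 Mbps at full and at half efficiency.
    print(transfer_time_hours(1e12, 622))
    print(transfer_time_hours(1e12, 622, efficiency=0.5))
```

Under these assumptions a terabyte takes a few hours on a fully used 622 Mbps link, and twice as long if only half the raw rate is achieved end to end, which is why the distinction in the caveats matters.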

  • CMS Analysis and Persistent Object Store

    Data organized in a(n object) hierarchy: Raw, Reconstructed (ESD), Analysis Objects (AOD), Tags.

    Data distribution: all raw, reconstructed and master parameter DBs at CERN; all event TAG and AODs, and selected reconstructed data sets at each regional center; HOT data (frequently accessed) moved to RCs.

    Goal of location and medium transparency.

    On Demand Object Creation

    C121
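
The distribution rules on this slide can be read as a small lookup procedure. The sketch below is a hypothetical Python illustration, not the CMS implementation: the `locate` function, the site labels "RC" and "CERN", and the way hot data is tracked are assumptions made for the example. It also assumes that CERN holds a copy of everything, which the slide states explicitly only for raw data, reconstructed data and the master parameter DBs.

```python
from typing import Set, Tuple

KINDS = {"RAW", "ESD", "AOD", "TAG"}   # the object hierarchy named on the slide
RC_ALWAYS = {"AOD", "TAG"}             # held in full at every regional center


def locate(kind: str, dataset: str,
           rc_selected_esd: Set[str],
           rc_hot: Set[Tuple[str, str]]) -> str:
    """Return "RC" if the regional center can serve the request, else "CERN".

    rc_selected_esd: names of reconstructed data sets replicated at the RC.
    rc_hot: (kind, dataset) pairs moved to the RC because they are hot.
    """
    if kind not in KINDS:
        raise ValueError(f"unknown data kind: {kind}")
    if kind in RC_ALWAYS:
        return "RC"
    if kind == "ESD" and dataset in rc_selected_esd:
        return "RC"
    if (kind, dataset) in rc_hot:
        return "RC"
    return "CERN"  # assumed fallback: CERN keeps the complete copy


if __name__ == "__main__":
    print(locate("TAG", "any", set(), set()))
    print(locate("RAW", "run1", set(), set()))
    print(locate("RAW", "run1", set(), {("RAW", "run1")}))
```

A caller asks for an object by kind and data set and never names a site, which is one way to picture the stated goal of location transparency.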

  • GIOD Summary

    [Figure labels: Hit, Track, Detector]

    GIOD has:

    Constructed a Terabyte-scale set of fully simulated CMS events and used these to create a large OO database.

    Learned how to create large database federations.

    Completed the 100 (to 170) Mbyte/sec CMS Milestone.

    Developed prototype reconstruction and analysis codes, and Java 3D OO visualization demonstrators, that work seamlessly with persistent objects over networks.

    Deployed facilities and database federations as useful testbeds for Computing Model studies.

    C226, C51

  • Data Grid Hierarchy (CMS Example)

    [Figure: tiered hierarchy diagram. Node labels: Online System; Offline Farm ~20 TIPS; CERN Computer Center; Fermilab ~4 TIPS; France Regional Center; Italy Regional Center; Germany Regional Center; Institute (x4) ~0.25 TIPS; Physics data cache; Workstations. Tier labels: Tier 0, Tier 1, Tier 2, Tier 3, Tier 4. Link labels: ~PBytes/sec; ~100 MBytes/sec (twice); ~2.4 Gbits/sec; ~622 Mbits/sec or Air Freight; ~622 Mbits/sec; 100 - 1000 Mbits/sec.]

    Bunch crossing every 25 nsecs. 100 triggers per second. Event is ~1 MByte in size.

    Physicists work on analysis channels. Each institute has ~10 physicists working on one or more channels. Data for these channels should be cached by the institute server.

    1 TIPS = 25,000 SpecInt95. PC (today) = 10-15 SpecInt95.

    E277
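
The numbers quoted on this slide (100 triggers per second, ~1 MByte per event, 1 TIPS = 25,000 SpecInt95, a PC of the day = 10-15 SpecInt95) can be cross-checked with simple arithmetic. The following is a minimal Python sketch; the function names are illustrative assumptions, and the PC counts are only an equivalence in SpecInt95, not a statement about how any centre was actually built.

```python
import math

SPECINT95_PER_TIPS = 25_000      # from the slide: 1 TIPS = 25,000 SpecInt95
PC_SPECINT95_RANGE = (10, 15)    # from the slide: PC (today) = 10-15 SpecInt95


def trigger_data_rate_mbytes_per_sec(triggers_per_sec: float = 100,
                                     event_size_mbytes: float = 1.0) -> float:
    """Data rate after the trigger: accepted events/sec times event size."""
    return triggers_per_sec * event_size_mbytes


def pcs_equivalent(tips: float) -> tuple:
    """(fewest, most) PCs matching a capacity in TIPS, rounded up.

    The fewest PCs are needed if each delivers the high end of the
    SpecInt95 range, the most if each delivers the low end.
    """
    specint95 = tips * SPECINT95_PER_TIPS
    slow_pc, fast_pc = PC_SPECINT95_RANGE
    return (math.ceil(specint95 / fast_pc), math.ceil(specint95 / slow_pc))


if __name__ == "__main__":
    print(trigger_data_rate_mbytes_per_sec())   # matches the ~100 MBytes/sec label
    for label, tips in [("Offline Farm", 20), ("Fermilab", 4), ("Institute", 0.25)]:
        print(label, pcs_equivalent(tips))
```

The trigger rate times the event size reproduces the ~100 MBytes/sec figure on the diagram, and the ~20 TIPS offline farm corresponds to tens of thousands of year-2000 PCs.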

  • LHC (and HEP) Challenges of Petabyte-Scale Data

    Technical requirements: optimize use of resources with next generation middleware; co-locate and co-schedule resources and requests; enhance database systems to work seamlessly across networks (caching, replication, mirroring); balance proximity to centralized facilities, and to end users for frequently accessed data.

    Requirements of the worldwide collaborative nature of experiments: make appropriate use of data analysis resources in each world region, conforming to local and regional policies; involve scientists and students in each world region in front-line physics research, through an integrated collaborative environment.

    E163, C74, C292

  • Time-Scale: CMS Recent Events

    A PHASE TRANSITION in our understanding of the role of CMS Software and Computing occurred in October - November 1999.

    Strong coupling of the S&C Task, Trigger/DAQ, Physics TDR, detector performance studies and other main milestones.

    Integrated CMS Software and Trigger/DAQ planning for the next round: May 2000 Milestone.

    Large simulated samples are required: ~1 million events fully simulated a few times during 2000, in ~1 month.

    A smoothly rising curve of computing and data handling needs from now on.

    Mock Data Challenges from 2000 (1% scale) to 2005.

    Users want substantial parts of the functionality formerly planned for 2005, starting now.
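
The simulation target above (~1 million fully simulated events in ~1 month) implies a sustained production rate that is easy to estimate. The following is a rough Python sketch, not taken from the talk: the 30-day month, the function names and the CPU time per event used in the example are assumptions chosen only to show the arithmetic.

```python
import math

SECONDS_PER_DAY = 86_400


def required_events_per_sec(n_events: float, days: float) -> float:
    """Sustained event rate needed to finish n_events in the given days."""
    return n_events / (days * SECONDS_PER_DAY)


def cpus_needed(n_events: float, days: float, cpu_sec_per_event: float) -> int:
    """CPUs needed if each event costs cpu_sec_per_event on one CPU.

    cpu_sec_per_event is a hypothetical input; the talk does not quote it.
    Assumes perfect parallelism and no downtime.
    """
    total_cpu_seconds = n_events * cpu_sec_per_event
    return math.ceil(total_cpu_seconds / (days * SECONDS_PER_DAY))


if __name__ == "__main__":
    print(required_events_per_sec(1e6, 30))
    # Assumed example: 100 CPU-seconds per fully simulated event.
    print(cpus_needed(1e6, 30, cpu_sec_per_event=100))
```

Repeating such a campaign "a few times" during the year multiplies the total CPU time accordingly, which is consistent with the smoothly rising curve of needs the slide describes.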


  • Roles of Projects for HENP Distributed Analysis

    RD45, GIOD: Networked Object Databases

    Clipper, GC: High speed access to Objects or File data

    FNAL/SAM for processing and analysis

    SLAC/OOFS: Distributed File System + Objectivity Interface

    NILE, Condor: Fault Tolerant Distributed Computing with Heterogeneous CPU Resources

    MONARC: LHC Computing Models: Architecture, Simulation, Strategy, Politics

    PPDG: First Distributed Data Services and Data Grid System Prototype

    ALDAP: Database Structures and Access Methods for Astrophysics and HENP Data

    GriPhyN: Production-Scale Data Grid

    Simulation/Modeling, Application + Network Instrumentation, System Optimization/Evaluation: APOGEE


  • MONARC: Common Project: Models Of Networked Analysis At Regional Centers

    Caltech, CERN, Columbia, FNAL, Heidelberg, Helsinki, INFN, IN2P3, KEK, Marseilles, MPI Munich, Orsay, Oxford, Tufts

    PROJECT GOALS: develop baseline Models; specify the main parameters characterizing the Models' performance (throughputs, latencies); verify resource requirement baselines (computing, data handling, networks).

    TECHNICAL GOALS: define the Analysis Process; define RC Architectures and Services; provide guidelines for the final Models; provide a Simulation Toolset for further Model studies.


  • MONARC Working Groups/Chairs

    Analysis Process Design: P. Capiluppi (Bologna, CMS)

    Architectures: Joel Butler (FNAL, CMS)

    Simulation: Krzysztof Sliwa (Tufts, ATLAS)

    Testbeds: Lamberto Luminari (Rome, ATLAS)

    Steering: Laura Perini (Milan, ATLAS), Harvey Newman (Caltech, CMS) & Regional Centres Committee

  • MONARC Architectures WG: Regional Centre Facilities & Services

    Regional Centres should provide:

    All technical and data services required to do physics analysis

    All Physics Objects, Tags and Calibration data

    Significant fraction of raw data

    Caching or mirroring calibration constants

    Excellent network connectivity to CERN and the region's users

    Manpower to share in the development of common validation and production software

    A fair share of post- and re-reconstruction processing

    Manpower to share in ongoing work on Common R&D Projects

    Excellent support services for training, documentation, troubleshooting at the Centre or remote sites served by it

    Service to members of other regions

    Long Term Commitment for staffing, hardware evolution and support for R&D, as part of the distributed data analysis architecture

  • MONARC and Regional Centres

    MONARC RC FORUM: representative meetings quarterly.

    Regional Centre planning well-advanced, with optimistic outlook, in US (FNAL for CMS; BNL for ATLAS), France (CCIN2P3), Italy, UK. Proposals submitted late 1999 or early 2000.

    Active R&D and prototyping underway, especially in US, Italy, Japan; and UK (LHCb), Russia (MSU, ITEP), Finland (HIP).

    Discussions in the national communities also underway in Japan, Finland, Russia, Germany.

    There is a near-term need to understand the level and sharing of suppo