HENP DATA GRIDS and STARTAP: Worldwide Analysis at Regional Centers. Harvey B. Newman (Caltech), HPIIS Review, San Diego, October.
<ul><li>Slide 1</li></ul><p>HENP DATA GRIDS and STARTAP: Worldwide Analysis at Regional Centers. Harvey B. Newman (Caltech), HPIIS Review, San Diego, October 25, 2000. http://l3www.cern.ch/~newman/hpiis2000.ppt
Slide 2 Next Generation Experiments: Physics and Technical Goals. The extraction of small or subtle new discovery signals from large and potentially overwhelming backgrounds; or precision analysis of large samples. Providing rapid access to event samples and subsets from massive data stores, from ~300 Terabytes in 2001 to Petabytes by ~2003, ~10 Petabytes by 2006, and ~100 Petabytes by ~2010. Providing analyzed results with rapid turnaround, by coordinating and managing the LIMITED computing, data handling and network resources effectively. Enabling rapid access to the data and the collaboration, across an ensemble of networks of varying capability, using heterogeneous resources.
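</p>
<p>The storage milestones on Slide 2 imply a steep compound growth rate. The following Python sketch is not part of the original talk. It estimates the constant yearly multiplier between the ~300 Terabytes quoted for 2001 and the ~100 Petabytes quoted for ~2010. It assumes decimal units (1 PB = 1000 TB) and treats the approximate slide figures as exact; all function and variable names are illustrative.</p>

```python
import math

# Data-store sizes quoted on Slide 2, in terabytes (assumption: 1 PB = 1000 TB).
# The ~2003 milestone is omitted because the slide gives no number for it.
MILESTONES_TB = {2001: 300, 2006: 10_000, 2010: 100_000}


def annual_growth_factor(year_a, size_a, year_b, size_b):
    """Constant yearly multiplier that takes size_a to size_b."""
    return (size_b / size_a) ** (1.0 / (year_b - year_a))


def doubling_time_years(factor):
    """Years needed to double at a constant yearly multiplier."""
    return math.log(2) / math.log(factor)


overall = annual_growth_factor(
    2001, MILESTONES_TB[2001], 2010, MILESTONES_TB[2010]
)
print(
    f"2001-2010: x{overall:.2f} per year, "
    f"doubling every {doubling_time_years(overall):.2f} years"
)
```

<p>Under these assumptions the data store roughly doubles every year over the decade.</p>
<p>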
Slide 3 The Large Hadron Collider (2005-). A next-generation particle collider; the largest superconductor installation in the world. A bunch-bunch collision will take place every 25 nanoseconds, each generating ~20 interactions; but only one in a trillion may lead to a major physics discovery. Real-time data filtering: Petabytes per second to Gigabytes per second. Accumulated data of many Petabytes/Year. Large data samples explored and analyzed by thousands of geographically dispersed scientists, in hundreds of teams.
Slide 4 Computing Challenges: LHC Example. Geographical dispersion: of people and resources. Complexity: the detector and the LHC environment. Scale: Tens of Petabytes per year of data. 1800 Physicists, 150 Institutes, 34 Countries. Major challenges associated with: communication and collaboration at a distance; network-distributed computing and data resources; remote software development and physics analysis. R&D: New Forms of Distributed Systems: Data Grids.
Slide 5 Four LHC Experiments: The Petabyte to Exabyte Challenge. ATLAS, CMS, ALICE, LHCB. Higgs + New particles; Quark-Gluon Plasma; CP Violation. Data written to tape ~25 Petabytes/Year and UP; 0.25 Petaflops and UP. 0.1 to 1 Exabyte (1 EB = 10^18 Bytes) (~2010) (~2015 ?)
Total for the LHC Experiments
Slide 6 From Physics to Raw Data (LEP). Basic physics (diagram labels: e+, e-, Z0, f, f-bar). Fragmentation, Decay. Interaction with detector material: multiple scattering, interactions. Detector response: noise, pile-up, cross-talk, inefficiency, ambiguity, resolution, response function, alignment, temperature. Raw data (Bytes): read-out addresses, ADC, TDC values, bit patterns (illustrated by a block of sample words: 2037 2446 1733 1699 ...).
Slide 7 The Compact Muon Solenoid (CMS). TRACKERs: Silicon Microstrips (230 sqm), Pixels (80M channels). CALORIMETERS: ECAL, scintillating PbWO4 crystals; HCAL, plastic scintillator copper sandwich. SUPERCONDUCTING COIL. IRON YOKE. MUON BARREL: Drift Tube Chambers (DT), Resistive Plate Chambers (RPC). MUON ENDCAPS: Cathode Strip Chambers (CSC), Resistive Plate Chambers (RPC). Total weight: 12,500 t. Overall diameter: 15 m. Overall length: 21.6 m. Magnetic field: 4 Tesla.
Slide 8 From Raw Data to Physics (LEP). Raw data (sample words as on Slide 6). Detector response: apply calibration, alignment. Interaction with detector material: pattern recognition, particle identification. Fragmentation, Decay: physics analysis. Basic physics: results (diagram labels: e+, e-, Z0, f, f-bar). Stages: Convert to physics quantities; Reconstruction; Analysis; Simulation (Monte-Carlo).
Slide 9 Real-time Filtering and Data Acquisition*. Data fragments from on-detector digitizers; Switch; Computer Farm; Tape & Disk Servers; High Speed Network. Input: 1-100 GB/s. Filtering: 35K SI95. Recording: 100-1000 MB/s. Raw data: over 1 PetaByte/year. Summary data: 1-200 TB/year. * Figures are for one experiment.
Slide 10 Higgs Search, LEPC September 2000.
Slide 11 LHC: Higgs Decay into 4 muons (tracker only); 1000X LEP Data Rate. 10^9 events/sec, selectivity: 1 in 10^13 (1 person in a thousand world populations).
Slide 12 On-line
Filter System. Large variety of triggers and thresholds: select physics à la carte. Multi-level trigger. Filter out less interesting events. Online reduction 10^7. Keep highly selected events. Result: Petabytes of Binary Compact Data Per Year. Trigger chain: 40 MHz (1000 TB/sec equivalent); Level 1 - Special Hardware; 75 KHz (75 GB/sec), fully digitised; Level 2 - Processors; 5 KHz (5 GB/sec); Level 3 - Farm of Commodity CPUs; 100 Hz (100 MB/sec); Data Recording & Offline Analysis.
Slide 13 LHC Vision: Data Grid Hierarchy. Experiment, ~PByte/sec into the Online System, then ~100 MBytes/sec into Tier 0 +1: Offline Farm, CERN Computer Ctr, > 20 TIPS. Tier 1: France Centre, FNAL Center, Italy Center, UK Center. Tier 2: Tier2 Centers. Tier 3: Institutes, ~0.25 TIPS each, with a physics data cache. Tier 4: Workstations. Links between the lower tiers are labelled ~0.6-2.5 Gbits/sec, ~2.5 Gbits/sec, ~622 Mbits/sec and 100 - 1000 Mbits/sec. Physicists work on analysis channels; each institute has ~10 physicists working on one or more channels.
Slide 14 Why Worldwide Computing? Regional Center Concept Advantages. Managed, fair-shared access for Physicists everywhere. Maximize total funding resources while meeting the total computing and data handling needs. Balance between proximity of datasets to appropriate resources, and to the users: Tier-N Model. Efficient use of network: higher throughput; per flow: Local > regional > national > international. Utilizing all intellectual resources, in several time zones: CERN, national labs, universities, remote sites; involving physicists and students at their home institutions. Greater flexibility
to pursue different physics interests, priorities, and resource allocation strategies by region, and/or by Common Interests (physics topics, subdetectors, ...). Manage the System's Complexity: partitioning facility tasks, to manage and focus resources.
Slide 15 Grid Services Architecture [*]. Applns: A Rich Set of HEP Data-Analysis Related Applications. Appln Toolkits: Remote viz toolkit, Remote comp. toolkit, Remote data toolkit, Remote sensors toolkit, Remote collab. toolkit. Grid Services: Protocols, authentication, policy, resource management, instrumentation, discovery, etc. Grid Fabric: Data stores, networks, computers, display devices, ...; associated local services. [*] Adapted from Ian Foster.
Slide 16 SDSS Data Grid (In GriPhyN): A Shared Vision. Three main functions. Raw data processing on a Grid (FNAL): rapid turnaround with TBs of data; accessible storage of all image data. Fast science analysis environment (JHU): combined data access + analysis of calibrated data; distributed I/O layer and processing layer, shared by whole collaboration. Public data access: SDSS data browsing for astronomers, and students; complex query engine for the public.
Slide 17 LIGO Data Grid Vision. Principal areas of GriPhyN applicability. Main data processing (Caltech/CACR): enable computationally limited searches (periodic sources); access to LIGO deep archive; access to Observatories. Science analysis environment for LSC (LIGO Scientific Collaboration): Tier2 centers as a shared LSC resource; exploratory algorithm, astrophysics research with LIGO reduced data sets; distributed I/O layer and processing layer builds on existing APIs; data mining of LIGO (event) metadatabases; LIGO data browsing for LSC members, outreach. Diagram labels: Hanford, Livingston, Caltech, MIT; INet2 Abilene; Tier1; LSC Tier2; links OC3, OC48, OC3, OC12, OC48.
Slide 18 Diagram link labels: 5 5 250 0.8 8 8 24 * 960 * 6 * 1.5 12; LAN-WAN Routers. Computer farm at CERN
(2005). Further diagram labels: 0.8; Storage Network; Farm Network. 0.5 M SPECint95, > 5K processors; 0.6 PByte disk, > 5K disks; + 2X More Outside. * Data Rate in Gbps. Thousands of CPU boxes; Thousands of disks; Hundreds of tape drives; Real-time detector data.
Slide 19 Tier1 Regional Center Architecture (I. Gaines, FNAL). Inputs: Tapes; Network from CERN; Network from Tier 2 centers. Tape Mass Storage & Disk Servers; Database Servers. Production Reconstruction: Raw/Sim to ESD; scheduled, predictable; experiment/physics groups. Production Analysis: ESD to AOD, AOD to DPD; scheduled; physics groups. Individual Analysis: AOD to DPD and plots; chaotic; physicists. Support Services: Physics Software Development; R&D Systems and Testbeds; Info servers; Code servers; Web Servers; Telepresence Servers; Training; Consulting; Help Desk. Outputs: Desktops; Tier 2; Local institutes; CERN; Tapes.
Slide 20 Roles of Projects for HENP Distributed Analysis. RD45, GIOD: Networked Object Databases. Clipper/GC: High speed access to Objects or File data. FNAL/SAM: for processing and analysis. SLAC/OOFS: Distributed File System + Objectivity Interface. NILE, Condor: Fault Tolerant Distributed Computing. MONARC: LHC Computing Models: Architecture, Simulation, Strategy, Politics. ALDAP: OO Database Structures & Access Methods for Astrophysics and HENP Data. PPDG: First Distributed Data Services and Data Grid System Prototype. GriPhyN: Production-Scale Data Grids. EU Data Grid.
Slide 21 CMS Analysis and Persistent Object Store. Diagram labels: Online; Common
Filters & Pre-Emptive Object Creation; On Demand Object Creation; CMS; Slow Control, Detector Monitoring; L4, L2/L3, L1; Persistent Object Store; Filtering; Simulation; Calibrations, Group Analyses; User Analysis. Data Organized In a(n Object) Hierarchy: Raw, Reconstructed (ESD), Analysis Objects (AOD), Tags. Data Distribution: all raw, reconstructed and master parameter DBs at CERN; all event tag and AODs at all regional centers; HOT data moved automatically to RCs.
Slide 22 GIOD: Globally Interconnected Object Databases. Multi-TB OO Database Federation; used across LANs and WANs. 170 MByte/sec CMS Milestone. Developed Java 3D OO Reconstruction, Analysis and Visualization Prototypes that Work Seamlessly Over Worldwide Networks. Deployed facilities and database federations as testbeds for Computing Model studies. Diagram labels: Hit, Track, Detector.
Slide 23 The Particle Physics Data Grid (PPDG). ANL, BNL, Caltech, FNAL, JLAB, LBNL, SDSC, SLAC, U.Wisc/CS. First Round Goal: Optimized cached read access to 10-100 Gbytes drawn from a total data set of 0.1 to ~1 Petabyte. Site to Site Data Replication Service, 100 Mbytes/sec: PRIMARY SITE (Data Acquisition, CPU, Disk, Tape Robot) and SECONDARY SITE (CPU, Disk, Tape Robot). Multi-Site Cached File Access Service: PRIMARY SITE (DAQ, Tape, CPU, Disk, Robot); Satellite Sites (Tape, CPU, Disk, Robot); Universities (CPU, Disk, Users). Matchmaking, Co-Scheduling: SRB, Condor, Globus services; HRM, NWS.
Slide 24 PPDG WG1: Request Manager. A CLIENT sends a Logical Request to the REQUEST MANAGER. Request Manager components: Request Interpreter, with an Event-file Index, producing a Logical Set of Files; Request Planner (Matchmaking), with the Replica catalog and the Network Weather Service; Request Executor, issuing Physical file transfer requests over the GRID. Storage endpoints: DRMs with Disk Caches; HRM in front of the tape system.
Slide 25 LLNL Earth Grid System Prototype Inter-communication Diagram. Disk; Client; Request Manager; ISI GSI-wuftpd Disk; SDSC GSI-pftpd HPSS; LBNL GSI-wuftpd
Disk; ANL GSI-wuftpd Disk; NCAR GSI-wuftpd Disk; LBNL Disk on Clipper, HPSS, HRM; ANL Replica Catalog; GIS with NWS. Interfaces: GSI-ncftp; LDAP Script; LDAP C API or Script; GSI-ncftp; CORBA.
Slide 26 Grid Data Management Prototype (GDMP). Distributed Job Execution and Data Handling: Transparency, Performance, Security, Fault Tolerance, Automation. Diagram: Submit job; Job writes data locally; Replicate data; Site A, Site B, Site C. Jobs are executed locally or remotely. Data is always written locally. Data is replicated to remote sites. GDMP V1.1: Caltech + EU DataGrid WP2. Tests by CALTECH, CERN, FNAL, Pisa for CMS HLT Production 10/2000; Integration with ENSTORE, HPSS, Castor.
Slide 27 GriPhyN: Grid Physics Network. Data Intensive Science. A New Form of Integrated Distributed System. Meeting the Scientific Goals of LIGO, SDSS and the LHC Experiments. Focus on Tier2 Centers at Universities, in a Unified Hierarchical Grid of Five Levels. 18 Centers, with Four Sub-Implementations: 5 each in US for LIGO, CMS, ATLAS; 3 for SDSS. Near Term Focus on LIGO, SDSS handling of real data; LHC Data Challenges with simulated data. Cooperation with PPDG, MONARC and EU DataGrid. http://www.phys.ufl.edu/~avery/GriPhyN/
Slide 28 GriPhyN: PetaScale Virtual Data Grids. Virtual Data Tools; Request Planning & Scheduling Tools; Request Execution & Management Tools; Transforms; Distributed resources (code, storage, computers, and network); Resource Management Services; Security and Policy Services ...</p>
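<p>The trigger rates quoted on Slide 12 can be cross-checked with a few lines of arithmetic. The Python sketch below is not part of the original talk. It assumes the flattened diagram reads as 40 MHz into Level 1, 75 KHz into Level 2, 5 KHz into Level 3 and 100 Hz to recording, and it assumes roughly 10^7 seconds of data taking per year, a figure the slides do not state. The stage labels and function names are illustrative.</p>

```python
# Trigger stages as read from Slide 12:
# (label, event rate in Hz, data rate in bytes per second).
# ASSUMPTION: the assignment of rates to levels follows the usual
# Level 1 -> Level 2 -> Level 3 ordering of the flattened diagram.
STAGES = [
    ("Detector (equivalent)", 40e6, 1000e12),
    ("After Level 1", 75e3, 75e9),
    ("After Level 2", 5e3, 5e9),
    ("After Level 3", 100.0, 100e6),
]

# ASSUMPTION (not from the slides): about 1e7 seconds of running per year.
LIVE_SECONDS_PER_YEAR = 1e7


def rejection_factors(stages):
    """Event-rate reduction between each pair of consecutive stages."""
    return [prev[1] / cur[1] for prev, cur in zip(stages, stages[1:])]


def event_size_bytes(stage):
    """Average event size implied by a stage's data rate and event rate."""
    _, rate_hz, bytes_per_s = stage
    return bytes_per_s / rate_hz


factors = rejection_factors(STAGES)
total_reduction = STAGES[0][2] / STAGES[-1][2]
yearly_bytes = STAGES[-1][2] * LIVE_SECONDS_PER_YEAR

for (label, _, _), factor in zip(STAGES[1:], factors):
    print(f"{label}: event rate reduced by a factor of {factor:.0f}")
print(f"Overall data-rate reduction: {total_reduction:.0e}")
print(f"Recorded per year: {yearly_bytes / 1e15:.1f} PB")
```

<p>With these inputs the overall data-rate reduction comes out at 10^7, matching the "Online reduction 10^7" figure on the slide, and the recorded volume is of order one Petabyte per year per experiment, in line with Slide 9.</p>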