High Energy Physics: Networks & Grids Systems for Global Science. Harvey B. Newman.
<ul><li><p>High Energy Physics: Networks & Grids Systems for Global Science. Harvey B. Newman, California Institute of Technology. AMPATH Workshop, FIU, January 31, 2003</p></li><li><p>Next Generation Networks for Experiments: Goals and Needs</p><p>Providing rapid access to event samples, subsets and analyzed physics results from massive data stores: from Petabytes in 2002 and ~100 Petabytes by 2007, to ~1 Exabyte by ~2012.</p><p>Providing analyzed results with rapid turnaround, by coordinating and managing the large but LIMITED computing, data handling and NETWORK resources effectively.</p><p>Enabling rapid access to the data and to the collaboration, across an ensemble of networks of varying capability.</p><p>Advanced integrated applications, such as Data Grids, rely on seamless operation of our LANs and WANs, with reliable, monitored, quantifiable high performance.</p><p>Large data samples explored and analyzed by thousands of globally dispersed scientists, in hundreds of teams.</p></li><li><p>LHC: Higgs decay into 4 muons (tracker only); 1000X the LEP data rate. 10<sup>9</sup> events/sec; selectivity: 1 in 10<sup>13</sup> (1 person in 1000 world populations).</p></li><li><p>Transatlantic Net WG (H. Newman, L. Price): Bandwidth Requirements [*], in Mbps. [*] BW requirements are increasing faster than Moore's Law. See http://gate.hep.anl.gov/lprice/TAN</p><table><tr><th>Experiment</th><th>2001</th><th>2002</th><th>2003</th><th>2004</th><th>2005</th><th>2006</th></tr><tr><td>CMS</td><td>100</td><td>200</td><td>300</td><td>600</td><td>800</td><td>2500</td></tr><tr><td>ATLAS</td><td>50</td><td>100</td><td>300</td><td>600</td><td>800</td><td>2500</td></tr><tr><td>BaBar</td><td>300</td><td>600</td><td>1100</td><td>1600</td><td>2300</td><td>3000</td></tr><tr><td>CDF</td><td>100</td><td>300</td><td>400</td><td>2000</td><td>3000</td><td>6000</td></tr><tr><td>D0</td><td>400</td><td>1600</td><td>2400</td><td>3200</td><td>6400</td><td>8000</td></tr><tr><td>BTeV</td><td>20</td><td>40</td><td>100</td><td>200</td><td>300</td><td>500</td></tr><tr><td>DESY</td><td>100</td><td>180</td><td>210</td><td>240</td><td>270</td><td>300</td></tr><tr><td>CERN BW</td><td>155-310</td><td>622</td><td>2500</td><td>5000</td><td>10000</td><td>20000</td></tr></table></li><li><p>HENP Major Links: Bandwidth Roadmap (Scenario) in Gbps. Continuing the trend: ~1000 times bandwidth growth per decade; we are rapidly learning to use and share multi-Gbps networks.</p><table><tr><th>Year</th><th>Production</th><th>Experimental</th><th>Remarks</th></tr><tr><td>2001</td><td>0.155</td><td>0.622-2.5</td><td>SONET/SDH</td></tr><tr><td>2002</td><td>0.622</td><td>2.5</td><td>SONET/SDH; DWDM; GigE integration</td></tr><tr><td>2003</td><td>2.5</td><td>10</td><td>DWDM; 1 + 10 GigE integration</td></tr><tr><td>2005</td><td>10</td><td>2-4 X 10</td><td>λ switch; λ provisioning</td></tr><tr><td>2007</td><td>2-4 X 10</td><td>~10 X 10; 40 Gbps</td><td>1st gen. λ grids</td></tr><tr><td>2009</td><td>~10 X 10 or 1-2 X 40</td><td>~5 X 40 or ~20-50 X 10</td><td>40 Gbps λ switching</td></tr><tr><td>2011</td><td>~5 X 40 or ~20 X 10</td><td>~25 X 40 or ~100 X 10</td><td>2nd gen. λ grids; Terabit networks</td></tr><tr><td>2013</td><td>~Terabit</td><td>~Multi-Tbps</td><td>~Fill one fiber</td></tr></table></li><li><p>DataTAG Project [network map: Geneva, Abilene, ESnet, CalREN-2, New York, STAR TAP, StarLight]. An EU-solicited project.
Partners: CERN, PPARC (UK), Amsterdam (NL), and INFN (IT); with US partners (DOE/NSF: UIC, NWU and Caltech). Main aims: ensure maximum interoperability between US and EU Grid projects; provide a transatlantic testbed for advanced network research. A 2.5 Gbps wavelength triangle from 7/02, growing to a 10 Gbps triangle by early 2003.</p></li><li><p>Progress: Maximum Sustained TCP Throughput on Transatlantic and US Links</p><p>8-9/01: 105 Mbps with 30 streams, SLAC-IN2P3; 102 Mbps in 1 stream, Caltech-CERN</p><p>11/5/01: 125 Mbps in one stream (modified kernel), Caltech-CERN</p><p>1/09/02: 190 Mbps for one stream shared on two 155 Mbps links</p><p>3/11/02: 120 Mbps disk-to-disk with one stream on a 155 Mbps link, Chicago-CERN</p><p>5/20/02: 450-600 Mbps, SLAC-Manchester on OC-12 with ~100 streams</p><p>6/1/02: 290 Mbps, Chicago-CERN, one stream on OC-12 (modified kernel)</p><p>9/02: 850, 1350 and 1900 Mbps, Chicago-CERN, with 1, 2 and 3 GbE streams on an OC-48 link</p><p>11-12/02: FAST: 940 Mbps in 1 stream, Sunnyvale-CERN; 9.4 Gbps in 10 flows, Sunnyvale-Chicago</p><p>Also see http://www-iepm.slac.stanford.edu/monitoring/bulk/ and the Internet2 E2E Initiative: http://www.internet2.edu/e2e</p></li><li><p>FAST (Caltech): A Scalable, Fair Protocol for Next-Generation Networks: from 0.1 to 100 Gbps. URL: netlab.caltech.edu/FAST. SC2002, 11/02. C. Jin, D. Wei, S. Low, FAST Team & Partners.</p><p>[Chart: Internet2 Land Speed Record milestones, 2000-2002: 29.3.00 (multiple flows), 22.8.02 (IPv6), 9.4.02 (1 flow), SC2002 (1, 2 and 10 flows), on Baltimore-Geneva, Baltimore-Sunnyvale and Sunnyvale-Geneva paths]</p><p>Highlights of FAST TCP (standard packet size): 940 Mbps in a single flow per GE card (9.4 petabit-m/sec; 1.9 times the I2 LSR); 9.4 Gbps with 10 flows (37.0 petabit-m/sec; 6.9 times the I2 LSR); 22 TB transferred in 6 hours, in 10 flows.</p><p>Implementation: sender-side (only) modifications; delay (RTT) based; stabilized Vegas.</p><p>Next: 10 GbE; 1 GB/sec disk to disk.</p></li><li><p>FAST TCP: Aggregate Throughput (standard MTU; utilization averaged over > 1 hr)</p><table><tr><th>Flows</th><th>1</th><th>2</th><th>7</th><th>9</th><th>10</th></tr><tr><td>Average utilization</td><td>95%</td><td>92%</td><td>90%</td><td>90%</td><td>88%</td></tr></table></li><li><p>HENP Lambda Grids: Fibers for Physics. Problem: extract small data subsets of 1 to 100 Terabytes from 1 to 1000 Petabyte data stores. Survivability of the HENP global Grid system, with hundreds of such transactions per day (circa 2007), requires that each transaction be completed in a relatively short time. Example: take 800 seconds to complete the transaction.
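The throughput figures that follow drop out of simple arithmetic: moving S terabytes in 800 seconds requires 8 x S x 10^12 / 800 bits per second. A minimal sketch of the calculation (the helper name is illustrative, not from the talk):

```python
# Required sustained throughput to move a data "transaction" of a given
# size within a fixed completion time (using the slide's 800-second
# example; the function name is a hypothetical helper, not from the talk).

def required_gbps(size_tb: float, seconds: float = 800.0) -> float:
    """Throughput in Gbps needed to move size_tb terabytes in `seconds`."""
    bits = size_tb * 1e12 * 8          # terabytes -> bits
    return bits / seconds / 1e9        # bits/sec -> Gbps

for tb in (1, 10, 100):
    print(f"{tb:>4} TB in 800 s -> {required_gbps(tb):,.0f} Gbps")
```

Even a 1 TB extraction completed in 800 seconds needs a fully dedicated 10 Gbps wavelength, which is why Terabyte transactions are tied to 10 Gbps lambda switching in the summary below.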
Then:</p><table><tr><th>Transaction Size (TB)</th><th>Net Throughput (Gbps)</th></tr><tr><td>1</td><td>10</td></tr><tr><td>10</td><td>100</td></tr><tr><td>100</td><td>1000 (capacity of one fiber today)</td></tr></table><p>Summary: providing switching of 10 Gbps wavelengths within ~3-5 years, and Terabit switching within 5-8 years, would enable Petascale Grids with Terabyte transactions, as required to fully realize the discovery potential of major HENP programs, as well as other data-intensive fields.</p></li><li><p>Data Intensive Grids Now: Large Scale Production</p><p>Efficient sharing of distributed, heterogeneous compute and storage resources. Virtual Organizations and institutional resource sharing. Dynamic reallocation of resources to target specific problems. Collaboration-wide data access and analysis environments.</p><p>Grid solutions NEED to be scalable and robust: they must handle many petabytes per year, tens of thousands of CPUs, and tens of thousands of jobs.</p><p>The Grid solutions presented here are supported in part by GriPhyN, iVDGL, PPDG, EDG, and DataTAG. We are learning a lot from these current efforts. For example: 1M events processed using VDT, Oct.-Dec. 2002.</p></li><li><p>Beyond Production: Web Services for Ubiquitous Data Access and Analysis by a Worldwide Scientific Community</p><p>Web Services: easy, flexible, platform-independent access to data (object collections in databases); well adapted to use by individual physicists, teachers and students.</p><p>SkyQuery example (JHU/FNAL/Caltech), with Web Service based access to astronomy surveys: the surveys can be individually or simultaneously queried via a Web interface. The simplicity of the interface hides considerable server power (from stored procedures etc.).
This is a traditional Web Service, with no user authentication required.</p></li><li><p>COJAC (CMS ORCA Java Analysis Component): Java3D, Objectivity, JNI and Web Services. Demonstrated Caltech-Rio de Janeiro and Caltech-Chile in 2002.</p></li><li><p>LHC Distributed CM: HENP Data Grids Versus Classical Grids</p><p>Grid projects have been a step forward for HEP and the LHC: a path to meet the LHC computing challenges. The Virtual Data concept is applied to large-scale automated data processing among worldwide-distributed regional centers.</p><p>The original Computational and Data Grid concepts are largely stateless, open systems, known to be scalable; analogous to the Web. The classical Grid architecture has a number of implicit assumptions: the ability to locate and schedule suitable resources within a tolerably short time (i.e. resource richness); short transactions; relatively simple failure modes.</p><p>HEP Grids, by contrast, are data-intensive and resource-constrained: long transactions and some long queues; schedule conflicts, policy decisions and task redirection; a lot of global system state to be monitored and tracked.</p></li><li><p>Current Grid Challenges: Secure Workflow Management and Optimization</p><p>Maintaining a global view of resources and system state; coherent end-to-end system monitoring. Adaptive learning: new algorithms and strategies for execution optimization (increasingly automated).</p><p>Workflow: a strategic balance of policy versus moment-to-moment capability to complete tasks. Balance high levels of usage of limited resources against better turnaround times for priority jobs. Goal-oriented algorithms; steering requests according to (yet to be developed) metrics. Handling user-Grid interactions: guidelines; agents. Building higher-level services, and an integrated, scalable user environment for the above.</p></li><li><p>Distributed System Services Architecture (DSSA): CIT/Romania/Pakistan</p><p>Agents: autonomous, auto-discovering, self-organizing, collaborative. Station Servers (static) host mobile Dynamic Services. Servers interconnect dynamically, forming a robust fabric in which mobile agents travel with a payload of (analysis) tasks. Adaptable to Web services (OGSA) and many platforms; adaptable to ubiquitous, mobile working environments.</p><p>Managing global systems of increasing scope and complexity, in the service of science and society, requires a new generation of scalable, autonomous, artificially intelligent software systems.</p></li><li><p>MonALISA: A Globally Scalable Grid Monitoring System. By I. Legrand (Caltech).</p><p>Deployed on the US CMS Grid. Agent-based dynamic information and resource discovery mechanism. Talks with other monitoring systems. Implemented in Java/Jini, with SNMP and WSDL/SOAP with UDDI. Part of a Global Grid Control Room Service.</p></li><li><p>MONARC SONN: 3 Regional Centres Learning to Export Jobs (Day 9). NUST: 20 CPUs; CERN: 30 CPUs; Caltech: 25 CPUs. Links: 1 MB/s at 150 ms RTT; 1.2 MB/s at 150 ms RTT; 0.8 MB/s at 200 ms RTT. Optimized by Day 9, with site values of 0.73, 0.66 and 0.83.</p></li><li><p>NSF ITR: Globally Enabled Analysis Communities</p><p>Develop and build Dynamic Workspaces; build Private Grids to support scientific analysis communities, using agent-based peer-to-peer Web Services. Construct autonomous communities operating within global collaborations. Empower small groups of scientists (and teachers and students) to profit from and contribute to international big science; drive the democratization of science via the deployment of new technologies.</p></li><li><p>Private Grids and P2P Sub-Communities in Global CMS</p></li><li><p>14,600 host devices; 7,800 registered users in 64 countries; 45 network servers; annual growth 2 to 3X.</p></li><li><p>An Inter-Regional Center for Research, Education and Outreach, and CMS CyberInfrastructure</p><p>Foster FIU and Brazil (UERJ) strategic expansion into CMS physics through Grid-based computing. Development and operation, for science, of international networks, Grids and collaborative systems, with a focus on research at the high energy frontier. Developing a scalable Grid-enabled analysis environment, broadly applicable to science and education, made accessible through the
use of agent-based (AI) autonomous systems and Web Services.</p><p>Serving under-represented communities, at FIU and in South America: training and participation in the development of state-of-the-art technologies; developing the teachers and trainers.</p><p>Relevance to science, education and society at large: developing the future science and information S&E workforce; closing the Digital Divide.</p><p>Presentations:</p><p>Berkeley, June 7, 2002 (Bannister, Walrand, Fisher)</p><p>IPAM, Arrowhead, June 10, 2002 (Doyle, Low, Willinger)</p><p>Postech-Caltech Symposium, Pohang, S. Korea, Sept 30, 2002 (Jin Lee)</p><p>Annual Allerton Conference on Communication, Control, and Computing, Monticello, IL, Oct 3, 2002 (Hajek, Srikant)</p><p>IEEE Computer Communications Workshop (CCW), Santa Fe, NM, Oct 14, 2002 (Kunniyur), presenter: Dawn</p><p>Lee Center Industry Workshop, Oct 18, 2002, Pasadena (include Cheng Jin's slides)</p></li></ul>
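As background to the single-stream TCP records quoted earlier: a sender can only sustain a given rate if it keeps a full bandwidth-delay product of data in flight, which is why transatlantic single-stream transfers of this era needed modified kernels or many parallel streams. A rough sketch of the calculation (the 120 ms transatlantic RTT is an assumed round number, not a figure from the talk):

```python
# Bandwidth-delay product: the TCP window needed to keep a long, fat
# pipe full. The rate and RTT below are illustrative, not measured values.

def bdp_bytes(rate_gbps: float, rtt_ms: float) -> float:
    """Bytes in flight needed to sustain rate_gbps over a path with rtt_ms RTT."""
    return rate_gbps * 1e9 / 8 * (rtt_ms / 1e3)

# 1 Gbps across the Atlantic at an assumed ~120 ms RTT needs a ~15 MB
# window, far beyond the ~64 KB defaults of early-2000s TCP stacks.
print(f"{bdp_bytes(1.0, 120) / 1e6:.0f} MB")
```

The same arithmetic explains the multi-stream entries in the progress list: N parallel streams effectively multiply the aggregate window by N without retuning each connection.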