Issues for Grids and WorldWide Computing Harvey B Newman California Institute of Technology

  • Issues for Grids and WorldWide Computing

    Harvey B Newman, California Institute of Technology. ACAT2000, Fermilab, October 19, 2000

  • LHC Vision: Data Grid Hierarchy

    [Diagram: the LHC Data Grid hierarchy, Tier 0 through Tier 4.]
    Detector/Online System: ~PBytes/sec off the detector; 1 bunch crossing every 25 nsec with ~17 interactions; ~100 triggers per second; each event is ~1 MByte in size
    Online System -> Offline Farm, CERN Computer Ctr (> 20 TIPS) [Tier 0+1]: ~100 MBytes/sec
    Tier 0+1 -> Tier 1 centres (France, FNAL, Italy, UK): ~2.5 Gbits/sec
    Tier 1 -> Tier 2 centres: ~0.6-2.5 Gbits/sec
    Tier 2 -> Tier 3 (institutes, ~0.25 TIPS each, with a physics data cache): ~622 Mbits/sec
    Tier 3 -> Tier 4 (physicists' workstations): 100-1000 Mbits/sec
    Physicists work on analysis channels; each institute has ~10 physicists working on one or more channels
    (A rough tally of these rates is sketched below.)
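
    To put these numbers together: a minimal sketch (Python) using only the trigger rate and event size quoted on this slide; the ~1e7 seconds of effective running per year is an added assumption, not stated here.

```python
# Rough tally of the Tier 0 input implied by the slide's numbers.
# Assumption (not on the slide): ~1e7 seconds of effective data taking per year.

TRIGGER_RATE_HZ  = 100    # ~100 triggers per second (from the slide)
EVENT_SIZE_MB    = 1.0    # each event is ~1 MByte (from the slide)
SECONDS_PER_YEAR = 1e7    # assumed effective running time

rate_mb_per_s = TRIGGER_RATE_HZ * EVENT_SIZE_MB        # ~100 MBytes/sec into the offline farm
yearly_pb     = rate_mb_per_s * SECONDS_PER_YEAR / 1e9 # MBytes -> PBytes

print(f"Tier 0 input rate : ~{rate_mb_per_s:.0f} MBytes/sec")
print(f"Raw data per year : ~{yearly_pb:.1f} PBytes (assuming 1e7 s of running)")
```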

  • US-CERN Link BW Requirements Projection (PRELIMINARY)

    Year                              2001    2002    2003    2004    2005    2006
    Installed Link BW in Mbps          310     622    1600    2400    4000    6500 [#]
    (Incl. New SLAC Throughput [*])   (120)   (250)   (400)   (600)  (1000)  (1600)

    [#] Includes ~1.5 Gbps each for ATLAS and CMS, plus BaBar, Run2 and other
    [*] D0 and CDF at Run2: needs presumed to be comparable to BaBar

    [Embedded chart: US-CERN bandwidth (Mbps) vs. fiscal year, FY1999-FY2006: 20, 45, 310, 622, 1600, 2400, 4000, 6500. A second plotted series shows 20, 45, 155, 155, 350, 700, 1200 Mbps.]
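
    For scale, a minimal sketch (Python) of the year-over-year growth implied by the installed-bandwidth row of the projection table above; the numbers are copied straight from that table.

```python
# Growth factors implied by the installed US-CERN bandwidth projection above.

years          = [2001, 2002, 2003, 2004, 2005, 2006]
installed_mbps = [310, 622, 1600, 2400, 4000, 6500]   # from the projection table

for i in range(1, len(years)):
    factor = installed_mbps[i] / installed_mbps[i - 1]
    print(f"{years[i-1]} -> {years[i]}: x{factor:.1f}")

avg = (installed_mbps[-1] / installed_mbps[0]) ** (1 / (len(years) - 1))
print(f"Average growth: ~x{avg:.1f} per year over 2001-2006")
```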

  • Grids: The Broader Issues and Requirements

    A New Level of Intersite Cooperation, and Resource Sharing
    Security and Authentication Across World-Region Boundaries
    Start with cooperation among Grid Projects (PPDG, GriPhyN, EU DataGrid, etc.)
    Develop Methods for Effective HEP/CS Collaboration in Grid and VDT Design
    Joint Design and Prototyping Effort, with (Iterative) Design Specifications
    Find an Appropriate Level of Abstraction, Adapted to > 1 Experiment; > 1 Working Environment
    Be Ready to Adapt to the Coming Revolutions in Network, Collaborative, and Internet Information Technologies

  • PPDG

    [Diagram: PPDG collaboration map. The data-management efforts of BaBar, D0, CDF, CMS, Atlas and Nuclear Physics, together with their user communities, are linked to the middleware teams: the Globus Team, Condor, the SRB Team, and HENP GC.]

  • GriPhyN: PetaScale Virtual Data Grids

    Build the Foundation for Petascale Virtual Data Grids
    [Diagram: Individual Investigators, Workgroups and Production Teams use Interactive User Tools, Virtual Data Tools, Request Planning & Scheduling Tools, and Request Execution & Management Tools. These are layered over Resource Management Services, Security and Policy Services, and Other Grid Services, which manage the distributed resources (code, storage, computers, and network); transforms and a raw data source complete the picture.]
    (A toy illustration of the virtual-data idea follows below.)
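
    As a purely illustrative aside (not from the talk), a minimal Python sketch of the "virtual data" idea GriPhyN is built around: derived data are catalogued together with the transformation that produces them, so a request can be served either from an existing replica or by re-running the transform. All names here are hypothetical.

```python
# Toy virtual-data catalog: each derived dataset is registered with the
# transformation and inputs that produce it, so it can be materialized on demand.

from typing import Callable, Dict

class VirtualDataCatalog:
    def __init__(self):
        self.recipes: Dict[str, tuple] = {}    # name -> (transform, input names)
        self.replicas: Dict[str, object] = {}  # name -> materialized data

    def register(self, name: str, transform: Callable, *inputs: str):
        self.recipes[name] = (transform, inputs)

    def materialize(self, name: str):
        # Serve from an existing replica if one exists ...
        if name in self.replicas:
            return self.replicas[name]
        # ... otherwise re-derive it (recursively) from its inputs.
        transform, inputs = self.recipes[name]
        data = transform(*(self.materialize(i) for i in inputs))
        self.replicas[name] = data
        return data

catalog = VirtualDataCatalog()
catalog.register("raw", lambda: list(range(10)))                  # raw data source
catalog.register("esd", lambda raw: [x * 2 for x in raw], "raw")  # reconstruction step
catalog.register("aod", lambda esd: esd[:3], "esd")               # analysis object data
print(catalog.materialize("aod"))   # derives raw -> esd -> aod on first request
```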

  • EU-Grid Project Work Packages

    Work Package   Work Package Title                        Lead Contractor
    WP1            Grid Workload Management                  INFN
    WP2            Grid Data Management                      CERN
    WP3            Grid Monitoring Services                  PPARC
    WP4            Fabric Management                         CERN
    WP5            Mass Storage Management                   PPARC
    WP6            Integration Testbed                       CNRS
    WP7            Network Services                          CNRS
    WP8            High Energy Physics Applications          CERN
    WP9            Earth Observation Science Applications    ESA
    WP10           Biology Science Applications              INFN
    WP11           Dissemination and Exploitation            INFN
    WP12           Project Management                        CERN

  • Grid Issues: A Short List of Coming Revolutions

    Network Technologies
      Wireless Broadband (from ca. 2003)
      10 Gigabit Ethernet (from 2002: see www.10gea.org)
      10GbE/DWDM-Wavelength (OC-192) integration: OXC
    Internet Information Software Technologies
      Global Information Broadcast Architecture, e.g. the Multipoint Information Distribution Protocol (MIDP; Tie.Liao@inria.fr)
      Programmable Coordinated Agent Architectures, e.g. Mobile Agent Reactive Spaces (MARS) by Cabri et al., Univ. of Modena
    The Data Grid - Human Interface
      Interactive monitoring and control of Grid resources, by authorized groups and individuals, and by Autonomous Agents

  • CA*net 3 National Optical Internet in Canada

    [Map: GigaPOPs at Vancouver, Calgary, Regina, Winnipeg, Ottawa, Montreal, Toronto, Halifax, St. Johns, Fredericton and Charlottetown; regional networks ORAN, BCnet, Netera, SRnet, MRnet, ONet, RISQ and ACORN; international connections at Seattle, Chicago (STAR TAP), New York and Los Angeles; CA*net 3 primary and diverse routes shown.]
    Deploying a 4 channel CWDM Gigabit Ethernet network over 400 km
    Deploying a 4 channel Gigabit Ethernet transparent optical DWDM network over 1500 km
    Multiple customer-owned dark fiber networks connecting universities and schools
    16 channel DWDM: 8 wavelengths @ OC-192 reserved for CANARIE, 8 wavelengths for carrier and other customers
    Consortium Partners: Bell Nexxia, Nortel, Cisco, JDS Uniphase, Newbridge
    Condo dark fiber networks connecting universities and schools; a condo fiber network linking all universities and hospitals

  • CA*net 4 Possible Architecture

    [Map: Vancouver, Calgary, Regina, Winnipeg, Ottawa, Montreal, Toronto, Halifax, St. Johns, Fredericton and Charlottetown, with links to Chicago, Seattle, New York, Los Angeles, Miami and Europe.]
    Dedicated Wavelength or SONET channel; OBGP switches; optional Layer 3 aggregation service; large channel WDM system

  • OBGP Traffic Engineering - Physical

    [Diagram: an Intermediate ISP between a Tier 1 ISP and a Tier 2 ISP, with AS 1 - AS 5, a dual-connected router to AS 5, and the default wavelength shown in red; for simplicity only data forwarding paths in one direction are shown.]
    The bulk of AS 1 traffic is to the Tier 1 ISP
    The optical switch looks like a BGP router, so AS 1 is directly connected to the Tier 1 ISP but still transits AS 5
    The router redirects networks with heavy traffic load to the optical switch, but routing policy is still maintained by the ISP
    (A toy sketch of this redirection idea follows below.)
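
    As an illustration only (not from the talk), a minimal Python sketch of the traffic-engineering idea behind OBGP: destinations whose measured load crosses a threshold are moved onto a direct optical path, while everything else keeps following the default route. The prefixes, traffic figures and threshold are made up.

```python
# Toy model of OBGP-style traffic engineering: heavy destinations are
# redirected to a direct lightpath; light ones stay on the default route.
# Traffic figures and the threshold are illustrative assumptions.

DEFAULT_ROUTE = "transit via AS 5 (default wavelength)"
OPTICAL_ROUTE = "direct lightpath to Tier 1 ISP (optical switch)"
REDIRECT_THRESHOLD_MBPS = 500   # assumed policy threshold

# Measured load per destination prefix (hypothetical numbers).
traffic_mbps = {
    "tier1-prefix-A": 1200,   # heavy: most AS 1 traffic goes to the Tier 1 ISP
    "tier1-prefix-B": 800,
    "tier2-prefix-C": 40,
    "tier2-prefix-D": 15,
}

def choose_path(load_mbps: float) -> str:
    """Routing policy stays with the ISP; only heavy flows get the lightpath."""
    if load_mbps >= REDIRECT_THRESHOLD_MBPS:
        return OPTICAL_ROUTE
    return DEFAULT_ROUTE

for prefix, load in traffic_mbps.items():
    print(f"{prefix:16s} {load:5.0f} Mbps -> {choose_path(load)}")
```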

  • VRVS Remote Collaboration System: Statistics

    30 Reflectors; 52 Countries; Mbone, H.323, MPEG2 Streaming, VNC

  • VRVS: Mbone/H.323/QT Snapshot

  • VRVS R&D: Sharing Desktop

    VNC technology integrated in the upcoming VRVS release

  • Worldwide Computing Issues

    Beyond Grid Prototype Components: Integration of Grid Prototypes for End-to-end Data Transport
      Particle Physics Data Grid (PPDG) ReqM; SAM in D0
      PPDG/EU DataGrid GDMP for CMS HLT Productions
    Start Building the Grid System(s): Integration with Experiment-specific software frameworks
    Derivation of Strategies (MONARC Simulation System)
      Data caching, query estimation, co-scheduling
      Load balancing and workload management amongst Tier0/Tier1/Tier2 sites (SONN by Legrand)
      Transaction robustness: simulate and verify
    Transparent Interfaces for Replica Management
      Deep versus shallow copies: thresholds; tracking, monitoring and control (see the sketch below)
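
    As a toy illustration (not from the talk) of the deep-versus-shallow replica question: a minimal Python sketch in which a replica manager makes a shallow copy (a reference back to the source site) for collections under a size threshold, and a deep copy (full transfer, tracked in a catalog) above it. The site names, collection names and threshold are assumptions.

```python
# Toy replica manager illustrating deep vs. shallow copies behind a single
# "replicate" interface. The threshold, sites and sizes are illustrative only.

from dataclasses import dataclass
from typing import List

DEEP_COPY_THRESHOLD_GB = 100.0   # assumed policy threshold

@dataclass
class Replica:
    collection: str
    site: str
    kind: str          # "deep" = full local copy, "shallow" = reference to the source
    source_site: str

class ReplicaManager:
    def __init__(self):
        self.catalog: List[Replica] = []   # hook for tracking, monitoring and control

    def replicate(self, collection: str, size_gb: float,
                  source_site: str, dest_site: str) -> Replica:
        # Large collections get a deep copy; small ones are referenced remotely.
        kind = "deep" if size_gb >= DEEP_COPY_THRESHOLD_GB else "shallow"
        replica = Replica(collection, dest_site, kind, source_site)
        self.catalog.append(replica)
        return replica

mgr = ReplicaManager()
print(mgr.replicate("ESD-sample", 500.0, "CERN-Tier0", "FNAL-Tier1"))     # -> deep copy
print(mgr.replicate("TAG-sample",   2.0, "CERN-Tier0", "Caltech-Tier2"))  # -> shallow copy
```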

  • Grid Data Management Prototype (GDMP)

    Distributed Job Execution and Data Handling - Goals: Transparency, Performance, Security, Fault Tolerance, Automation
    [Diagram: Site A, Site B and Site C; a job is submitted to one site, and its data is replicated to the others.]
    Jobs are executed locally or remotely
    Data is always written locally (the job writes its data locally)
    Data is replicated to remote sites
    (A toy sketch of this write-locally-then-replicate pattern follows below.)
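
    Purely as an illustration (not GDMP's actual API), a minimal Python sketch of the pattern on this slide: a job runs at whichever site it was scheduled to, always writes its output to that site's local store, and the output is then replicated to the other sites. The site names, file names and functions are hypothetical.

```python
# Toy version of the GDMP job/data pattern shown on the slide:
# execute anywhere, write locally, replicate afterwards.

SITES = {"SiteA": {}, "SiteB": {}, "SiteC": {}}   # site name -> local file store

def run_job(job_name: str, exec_site: str) -> str:
    """Execute a job locally or remotely; its output is always written locally."""
    output_file = f"{job_name}.out"
    SITES[exec_site][output_file] = f"data produced by {job_name} at {exec_site}"
    return output_file

def replicate(filename: str, source_site: str):
    """Replicate a locally written file to all remote sites."""
    for site, store in SITES.items():
        if site != source_site:
            store[filename] = SITES[source_site][filename]

out = run_job("cms_hlt_production_042", exec_site="SiteB")  # job submitted to Site B
replicate(out, source_site="SiteB")                          # pushed to Sites A and C

for site, store in SITES.items():
    print(site, "->", sorted(store))
```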

  • MONARC Simulation: Physics Analysis at Regional Centres

    Similar data processing jobs are performed in each of several Regional Centres (RCs)
    There is a profile of jobs, each submitted to a job scheduler
    Each Centre has the TAG and AOD databases replicated
    The Main Centre provides ESD and RAW data
    Each job processes AOD data, and also a fraction of ESD and RAW data

  • ORCA Production on CERN/IT-Loaned Event Filter Farm Test Facility

    [Diagram: a FARM of 140 processing nodes with 17 + 9 (SUN) servers, output servers, lock servers, and a total of 24 pile-up database servers, with pile-up databases and HPSS.]
    The strategy is to use many commodity PCs as Database Servers

  • Network Traffic & Job Efficiency

    [Plot: measured vs. simulated network traffic and job efficiency; mean measured value ~48 MB/s.]

  • From UserFederation To Private Copy

    [Diagram: copying database containers (labelled CD, CH, MD, MH, TH, MC, TD) and a UserCollection from the user federation (UF.boot) to a private federation (MyFED.boot) via AMS. From the ORCA 4 tutorial, part II, 14 October 2000.]

  • Beyond Traditional Architectures: Mobile Agents

    Mobile Agents: (Semi)-Autonomous, Goal Driven, Adaptive
      Execute Asynchronously
      Reduce Network Load: Local Conversations
      Overcome Network Latency; Some Outages
      Adaptive, Robust, Fault Tolerant
      Naturally Heterogeneous
    Extensible Concept: Coordinated Agent Architectures
    "Agents are objects with rules and legs" -- D. Taylor
    [Diagram: an agent moving between an Application and a Service.]

  • Coordination Architectures for Mobile Java Agents

    A Lot of Progress Since 1998
    Fourth Generation Architecture: Associative Blackboards
      After 1) Client/Server, 2) Meeting-Oriented, 3) Blackboards
      Analogous to CMS ORCA software: Observer-based action on demand
    MARS: Mobile Agent Reactive Spaces (Cabri et al.); see http://sirio.dsi.unimo.it/MOON
      Resilient and Scalable; Simple Implementation
      Works with standard Agent implementations (e.g. Aglets: http://www.trl.ibm.co.jp)
      Data-oriented, to provide temporal and spatial asynchronicity (see Java Spaces, Page Spaces)
      Programmable, authorized reactions, based on virtual Tuple spaces

  • Mobile Agent Reactive Spaces (MARS) Architecture

    MARS Programmed Reactions: based on metalevel 4-tuples (Reaction, Tuple, Operation-Type, Agent-ID)
      Allows Security, Policies
      Allows Production of Tuples on Demand
    [Diagram: network nodes on the Internet, each with an Agent Server, a Tuple Space and a MetaLevel Tuple Space.
     A: Agents arrive; B: they get a reference to the local Tuple Space; C: they access the Tuple Space; D: the Tuple Space reacts, with programmed behavior.]
    (A toy sketch of a reactive tuple space follows below.)
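
    To make the idea concrete, a minimal Python sketch (an illustration, not the MARS API) of a tuple space with programmed reactions keyed on a metalevel 4-tuple of (reaction, tuple pattern, operation type, agent id): writing a matching tuple triggers the installed reaction. All names are hypothetical.

```python
# Toy reactive tuple space in the spirit of MARS: reactions are installed at a
# meta level as (reaction, tuple-pattern, operation-type, agent-id) and fire
# when a matching operation is performed. Everything here is illustrative.

class ReactiveTupleSpace:
    def __init__(self):
        self.tuples = []
        self.meta = []   # list of (reaction, pattern, op_type, agent_id)

    def install_reaction(self, reaction, pattern, op_type, agent_id=None):
        """agent_id=None means the reaction applies to any agent (a policy choice)."""
        self.meta.append((reaction, pattern, op_type, agent_id))

    def _matches(self, pattern, tup):
        return len(pattern) == len(tup) and all(
            p is None or p == t for p, t in zip(pattern, tup))

    def write(self, tup, agent_id):
        self.tuples.append(tup)
        for reaction, pattern, op_type, who in self.meta:
            if op_type == "write" and self._matches(pattern, tup) \
                    and who in (None, agent_id):
                reaction(self, tup, agent_id)   # programmed behavior fires here

    def read(self, pattern):
        return [t for t in self.tuples if self._matches(pattern, t)]

# Example: when any agent writes a ("job-status", ...) tuple, log it and
# produce a derived summary tuple on demand.
def log_and_summarize(space, tup, agent_id):
    print(f"reaction: {agent_id} wrote {tup}")
    space.tuples.append(("summary", tup[1], "seen"))

space = ReactiveTupleSpace()
space.install_reaction(log_and_summarize, ("job-status", None, None), "write")
space.write(("job-status", "orca-prod-17", "done"), agent_id="agent-42")
print(space.read(("summary", None, None)))
```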

  • GRIDs In 2000: Summary

    Grids are (in) our Future. Let's Get to Work

  • Grid Data Management Issues

    Data movem
