Advances in Integrated Storage, Transfer and Network Management
<x>Email: <x>@<y>
On behalf of the UltraLight Collaboration
The network is a key element that HEP needs to work on, as much as its Tier1 and Tier2 facilities.
Network bandwidth is going to be a scarce, strategic resource; the ability to co-schedule network and other resources will be crucial. US CMS Tier1/Tier2 experience has shown this (the tip of the iceberg).
HEP needs to work with US LHCNet network providers/operators/managers, and with experts on the upcoming circuit-oriented management paradigm. Joint planning is needed for a consistent outcome. The provision of managed circuits is also a direction for ESnet and Internet2, but LHC networking will require it first.
We also need to exploit now-widely-available technologies: reliable data transfers, monitoring in depth, end-to-end workflow information (including every network flow, end-system state and load).
Risk: lack of efficient workflow and readiness means lack of competitiveness. Lack of engagement has made this a DOE-funding issue as well; even the planned bandwidth may not be there.
Networking and HEP
The planned bandwidth for 2008-2010 is already well below the capabilities of current end-systems, appropriately tuned. This is already evident in intra-US production flows (Nebraska, UCSD, Caltech). The “current capability” limit of nodes is still an order of magnitude greater than the best results seen in production so far.
This can be extended to transatlantic flows using a series of straightforward technical steps: configuration, diagnosis, monitoring; involving the end-hosts and storage systems, as well as the networks.
The current lack of transfer volume from non-US Tier1 sites is not, and cannot be, an indicator of future performance.
Upcoming paradigm: managed bandwidth channels. Scheduled: multiple classes of “work”, with a settable number of simultaneous flows per class, like computing facility queues. Monitored and managed: BW will be matched to throughput capability [schedule coherently; free up excess BW resources as needed]. Respond to: errors; schedule changes; progress-rate deviations.
Transatlantic Networking: Bandwidth Management with Channels
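The channel paradigm above (classes of “work” with a settable number of simultaneous flows per class, like computing-facility queues) can be sketched as per-class slot limits. The class names and API here are illustrative, not taken from any deployed scheduler:

```python
from collections import deque

class ChannelScheduler:
    """Toy model: each class of 'work' gets a settable number of
    simultaneous flows, like computing-facility queue limits."""
    def __init__(self, flows_per_class):
        self.limits = dict(flows_per_class)       # class -> max concurrent flows
        self.active = {c: 0 for c in self.limits}
        self.waiting = {c: deque() for c in self.limits}

    def submit(self, cls, transfer_id):
        # Start immediately if a slot is free, otherwise queue.
        if self.active[cls] < self.limits[cls]:
            self.active[cls] += 1
            return "started"
        self.waiting[cls].append(transfer_id)
        return "queued"

    def complete(self, cls):
        # Free a slot; promote the next queued transfer in the same class.
        self.active[cls] -= 1
        if self.waiting[cls]:
            self.waiting[cls].popleft()
            self.active[cls] += 1

sched = ChannelScheduler({"production": 2, "analysis": 4})
print(sched.submit("production", "t1"))  # started
print(sched.submit("production", "t2"))  # started
print(sched.submit("production", "t3"))  # queued
```

A real channel manager would additionally resize the bandwidth of each channel to match measured throughput, as described above.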
Computing, Offline and CSA07 (CMS)
US Tier1 – Tier2 Flows Reach 10 Gbps 1/2007 Snapshot
Computing, Offline and CSA07
Nebraska
One well configured site. What happens if there are
10 such sites?
Internet2’s New Backbone
Initial deployment – 10 x 10 Gbps wavelengths over the footprint
First round maximum capacity – 80 x 10 Gbps wavelengths; expandable
Scalability – potential migration to 40 Gbps or 100 Gbps capability
Transition to NewNet underway now (since 10/2006), until end of 2007
Level(3) footprint; Infinera 10 x 10G core; CIENA optical muxes
Today’s situation
End-hosts / applications are unaware of network topology, available paths, and the bandwidth available (with other flows) on each. The network is unaware of transfer schedules and end-host capabilities.
High performance networks are not a ubiquitous resource. Applications might get full BW only on the access part of the network: usually only to the next router, and mostly not beyond the network under the site’s control.
Better utilization needs joint management: working together to schedule and allocate network resources along with storage and CPU, using queues, and quotas where needed.
Data transfer applications can no longer treat the network as a passive resource. They can only be efficient if they are “network-aware”: communicating with, and delegating many of their functions to, network-resident services.
Network Services for Robust Data Transfers: Motivation
Data Transfer Experience
Transfers are generally much slower than expected, or slow down, or even stop. Many potential causes remain undiagnosed: a configuration problem? Loading? Queuing? Errors? An end-host problem, a network problem, or an application failure?
The application (blindly) retries, and perhaps sends emails through HyperNews to request “expert” troubleshooting. There is insufficient information, and it arrives too slowly to diagnose and correlate at the time the error occurs. Result: lower transfer rates, and people spending more and more time troubleshooting.
Can a manual, non-integrated approach scale to a collaboration the size of CMS/ATLAS, when data transfer needs increase and we compete for (network) resources with other experiments (and sciences)?
PhEDEx (CMS) and FTS are not the only end-user applications that face transfer challenges (ask ATLAS, for example).
Managed Network Services Integration
The network alone is not enough: proper integration and interaction are needed with the end-hosts and storage systems. Multiple applications will want to use the network resource:
ATLAS, ALICE and LHCb; private transfer tools; analysis applications streaming data; real-time streams (e.g. videoconferencing).
There is no need for different domains to develop their own monitoring and troubleshooting tools for a resource that they share: it is manpower-intensive, duplicates effort (“reinvention”), and suffers from a lack of operational expertise, a lack of access to sufficient information, and a lack of control. So those systems will (and do) lack the needed functionality, performance, and reliability.
Managed Network Services Strategy (1)
Network Services Integrated with End Systems
Need a real-time, end-to-end view of the network and end-systems, based on in-depth end-to-end monitoring. Need to extract and correlate information (e.g. network state, end-host state, transfer-queue state).
Solution: the network and the end-hosts exchange information via real-time services and “end-host agents” (EHAs), providing sufficient information for decision support. EHAs and network services can cooperate to automate some operational decisions, based on accumulated experience, especially where they become reliable enough; automation will become essential as the usage and number of users scale up, and competition increases.
Analogous to real-time operation, with internal monitoring and diagnosis.
Managed Network Services Strategy (2)
[Semi-]Deterministic Scheduling
Receive request: “Transfer N bytes from Site A to B with throughput R1”; authenticate/authorize/prioritize.
Verify end-host (HW, config.) capability (R2); schedule bandwidth B > R2; estimate time to complete T(0).
Schedule path, with priorities P(i) on segment S(i).
Check periodically: compare rate R(t) to R2; compare and update time to complete T(i) against T(i-1).
Triggers: error (e.g. segment failure); variance beyond thresholds (e.g. progress too slow, channel underutilized, wait in queue too long); state change (e.g. new high-priority transfer submitted).
Dynamic actions: build an alternative path; change channel size; create a new channel and squeeze others in its class; etc.
Real-time control and adaptation: an optimizable system
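The periodic-check step of this recipe can be sketched as a pure function: compare the observed rate R(t) to the scheduled rate R2, re-estimate the time to complete, and raise a trigger when the variance exceeds a threshold. The threshold values are illustrative assumptions, not from the real system:

```python
def check_transfer(bytes_left, rate_now, rate_scheduled,
                   slow_factor=0.5, idle_factor=0.1):
    """Periodic check: compare observed rate R(t) to scheduled rate R2,
    update the time-to-complete estimate, and return a trigger when
    variance goes beyond (illustrative) thresholds."""
    eta = bytes_left / rate_now if rate_now > 0 else float("inf")
    if rate_now < idle_factor * rate_scheduled:
        trigger = "channel-underutilized"   # free excess allocated bandwidth
    elif rate_now < slow_factor * rate_scheduled:
        trigger = "progress-too-slow"       # consider re-routing / resizing
    else:
        trigger = None
    return eta, trigger
```

A real DTSS would also compare successive estimates T(i) and T(i-1), and react to state changes such as a new high-priority transfer being submitted.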
(3) DTSS re-routes the traffic if bandwidth is available on a different route
Transparent to end-systems; rerouting takes transfer priority into account
(4) Otherwise, if there is no alternative, DTSS notifies the EHAs and puts the transfer on hold
NMA: Network Monitoring AgentEHA: End-Host AgentDTSS: Dataset Transfer & Scheduling Service
End hosts and applications must interact with network
services to achieve this
Network Aware Scenario: Link- Segment Outage (3) Reroute/
Circuit Setup
EHA
End-system (Source)
End-system (Target)
EHA
Data Transfer
NMA
(1) Monitoring
(1) NMA detects a problem (e.g. link-segment down); (2) notifies DTSS
(2) Notify
(4) Notify
DTSS (fully distributed)
DTSS (fully distributed)
Network Aware Scenario: End-Host Problem
(1) EHA detects a problem with the end-host (e.g. throughput on the NIC lower than expected); notifies DTSS
(2) DTSS squeezes the corresponding circuit; gives more bandwidth to the next highest priority transfer sharing (part of) the path
(3) NMA senses the increased throughput; (4) notifies DTSS
NIC Problem detected Problem resolved, sensed
by network monitoring
Back to nominal BW
BW used by end-system (source)
(2) Adjust
EHA
End-system (Source)
End-system (Target)
EHA
Data transfer
NMA
(1) Monitoring
(4) Notify
(1) Notify
(3) Monitoring
Network AND End-host monitoring allow Root Cause Analysis!
NMA: Network Monitoring AgentEHA: End-Host AgentDTSS: Dataset Transfer & Scheduling Service
DTSS (fully distributed)
DTSS (fully distributed)
End Host Agent (EHA)
The EHA implements secure APIs that applications can use for requests and interactions with network services.
The EHA continuously receives estimates of the time-to-complete for its requests. Where possible, the EHA will help provide correct system & network configuration to the end-hosts.
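A hypothetical sketch of the shape such an agent interface could take: applications submit transfer requests and poll continuously updated time-to-complete estimates. The method and field names are invented for illustration, not the real EHA API:

```python
class EndHostAgent:
    """Sketch of an EHA request interface: applications submit transfer
    requests and poll time-to-complete estimates as the network reports
    progress. Names and fields are illustrative only."""
    def __init__(self):
        self.requests = {}

    def request_transfer(self, src, dst, size_bytes, priority):
        # Register a request; a real EHA would authenticate/authorize here.
        req_id = len(self.requests) + 1
        self.requests[req_id] = {"src": src, "dst": dst, "size": size_bytes,
                                 "done": 0, "priority": priority, "rate": None}
        return req_id

    def update_progress(self, req_id, bytes_done, rate_bps):
        # Called as monitoring data arrives from the network services.
        r = self.requests[req_id]
        r["done"], r["rate"] = bytes_done, rate_bps

    def estimated_time_to_complete(self, req_id):
        # Continuously re-estimated from the latest observed rate.
        r = self.requests[req_id]
        if not r["rate"]:
            return None
        return (r["size"] - r["done"]) / r["rate"]
```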
Minimize!
CPU
Net
Disk
System
Mem
Authorization; service discovery; system configuration. Complete local profiling of host hardware and software. Complete monitoring of host processes, activity, load. End-to-end performance measurements. Acts as an active listener to “events” generated by local apps.
Network; Disk I/O
The End-Host/User Agent is a reality in EVO (also FDT, VINCI, etc.). It enables (suggested) automatic actions based on anomalies, to ensure proper quality (e.g. automatically limiting video if bandwidth is limited).
The EHA monitors various system & network parameters.
Key characteristic: a coherent real-time architecture of inter-communicating agents and services, with correlated actions.
Specialized Agents Supporting Robust Data Transfer Operations
Example Agent Services Supporting Transfer Operations & Management
Build a real-time, dynamic map of the inter-connection topology among the services, and of the channels on each link-segment, including: how the channels are allocated (bandwidth, priority) and used; information on the total traffic and major flows on each segment.
Locate network resources & services which have agreements with a given VO.
List the routers and switches along each path that are “manageable”. Determine which path options exist between source and destination. Among N replicas of a data source and M possible network paths, “discover” (with monitoring) the least estimated time to completion of a transfer to a given destination, taking the reliability record into account.
Detect problems in real time, such as: loss or impairment of connectivity; CRC errors; asymmetric routing; a large rate of lost packets or retransmissions (above a threshold).
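The real-time checks listed above reduce to simple predicates over monitoring data. A sketch, where the field names and threshold values are illustrative assumptions:

```python
def detect_anomalies(stats, loss_threshold=0.01, retrans_threshold=0.02):
    """Flag the problem classes listed above: connectivity loss, CRC
    errors, asymmetric routing, and packet-loss / retransmission rates
    above a threshold. Field names and thresholds are illustrative."""
    alerts = []
    if not stats.get("link_up", True):
        alerts.append("loss-of-connectivity")
    if stats.get("crc_errors", 0) > 0:
        alerts.append("crc-errors")
    if stats.get("forward_path") != stats.get("reverse_path"):
        alerts.append("asymmetric-routing")
    if stats.get("packet_loss", 0.0) > loss_threshold:
        alerts.append("high-packet-loss")
    if stats.get("retransmissions", 0.0) > retrans_threshold:
        alerts.append("high-retransmissions")
    return alerts
```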
The Scheduler
Independent agents control different segments or channels. They negotiate for an end-to-end connection, using cost functions to find the “lowest-cost” path (on the topological graph), based on VO, priorities, pre-reservations, etc.
An agent offers a “lease” for use of a segment or channel to its peers. Periodic lease renewals are used by all agents; this allows a flexible response of the agents to task completion, application failure or network errors.
If a segment failure or impairment is detected, DTSS automatically provides path restoration based on the discovered network topology. Resource reallocation may be applied to preserve high-priority tasks. The supervising agents then release all other channels or segments along a path. An alternative path may be set up rapidly enough to avoid a TCP timeout, and thus to allow the transfer to continue uninterrupted.
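The lease mechanism can be sketched as follows: a segment is held only while its lease keeps being renewed, so an application failure (no renewal) frees the resource automatically. This is a minimal model with logical time ticks, not the real agent protocol:

```python
class SegmentAgent:
    """Sketch of the lease scheme: a segment/channel is held only while
    its lease keeps being renewed; a holder that stops renewing (task
    done, application crash) frees the resource automatically."""
    def __init__(self, lease_length=3):
        self.lease_length = lease_length
        self.leases = {}                  # transfer_id -> expiry tick

    def grant(self, transfer_id, now):
        self.leases[transfer_id] = now + self.lease_length

    def renew(self, transfer_id, now):
        if transfer_id in self.leases:
            self.leases[transfer_id] = now + self.lease_length

    def expire(self, now):
        # Called periodically: drop leases whose holder stopped renewing,
        # and report them so the scheduler can reallocate the segment.
        dead = [t for t, exp in self.leases.items() if exp <= now]
        for t in dead:
            del self.leases[t]
        return dead
```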
Implemented as a set of collaborating agents: no single point of failure.
Channel allocation based on VO/priority [+ wait time, etc.]. Create on demand an end-to-end path or channel & configure the end-hosts. Automatic recovery (rerouting) in case of errors. Dynamic reallocation of throughput per channel, to manage priorities and control time to completion where needed. Reallocate resources requested but not used.
Dynamic Path Provisioning & Queueing
User Scheduling
ControlMonitoring
End Host Agents
RealtimeFeedback
Request
Dynamic end-2-end optical path creation & usage
Minimize emails like this!
FDT Transfer
4 fiber cut emulations
200+ MBytes/sec from a 1U node
FDT Disk-to-Disk WAN Transfers Between Two End-hosts
NEW YORK GENEVA
Reads and writes on 4 SATA disks in parallel on each server
Mean traffic ~210 MB/s (~0.75 TB per hour)
CERN CALTECH
Reads and writes on two 12-port RAID Controllers in parallel on each server
Mean traffic ~545 MB/s (~2 TB per hour)
1U Nodes with 4 Disks 4U Disk Servers with 24 Disks
Working on integrating FDT with dCache
RSS feeds and email alerts when anomalies occur whose correction is not automated.
Incrementally try to automate these anomaly corrections (where possible), based on accumulated experience.
Expose the system information that cannot be handled automatically.
Drill-down, with simple auto-diagnostics, using ML/ApMon in AliEn
System Architecture: the Main Services
Application
End UserAgent
Topology Discovery
GMPLS MPLS OS SNMP
Dynamic Path & Channel Building, Allocation
Control Path Provisioning
Failure Detection
Application
End UserAgent
Authentication, Authorization, Accounting
Learning
Prediction
System Evaluation & Optimization
MONITORING
Scheduling & Queues
Several services are already operational and field-tested
System Architecture: Scenario View
Interoperation with Other Network Domains
Benefits to LHC
Higher performance transfers; better utilization of our network resources.
Applications interface with network services & monitoring instead of creating their own, so less manpower is required from CMS.
Decreased email traffic; less time spent on support.
End-to-end monitoring and alerts: more transparency for troubleshooting (where it is not automated); incremental work on automating anomaly handling where possible, based on accumulated experience.
Synergy with US-funded projects: manpower is available to support the transition.
A distributed system with pervasive monitoring and rapid reaction time: fewer single points of failure; provides information and diagnosis not otherwise possible. Several services are already operational and field-tested.
It is important to establish a smooth transition from the current situation to the one envisioned, without interruption of currently available services.
We intend progressive development and deployment (not “all at once”). But there are strong benefits in early deployment: cost, efficiency, readiness.
Resources: For More Information
End Host Agent (LISA): http://monalisa.caltech.edu/monalisa__Interactive_Clients__LISA.html
ALICE monitoring and alerts: http://pcalimonitor.cern.ch/
Fast Data Transfer: http://monalisa.cern.ch/FDT/
Network services: http://monalisa.caltech.edu/monalisa__Service_Applications__Vinci.html
EVO with End Host Agent: http://evo.caltech.edu/
UltraLight: http://ultralight.org
See the various posters on this subject at CHEP
Extra Slides Follow
US LHCNet, ESnet Plan 2007-2010: 30-80 Gbps US-CERN, ESnet MANs
DEN, ELP, ALB, ATL
Metropolitan Area Rings
Aus.
Europe
SDG
AsiaPac, SEA
Major DOE Office of Science SitesHigh-speed cross connects with Internet2/Abilene
New ESnet hubs; ESnet hubs
SNV
Europe
Japan
Science Data Network core: 40-60 Gbps circuit transport; lab-supplied; major international links
Production IP ESnet core: 10 Gbps enterprise IP traffic
Japan
Aus.
Metro Rings
ESnet4SDN Core:
30-50G
ESnet IP Core≥10 Gbps
10 Gb/s
30 Gb/s; 2 x 10 Gb/s
NYC, CHI
US-LHCNetNetwork Plan
(3 to 8 x 10 Gbps US-CERN)
LHCNet Data Network
DC; GEANT2; SURFnet; IN2P3
NSF/IRNC circuit; GVA-AMS connection via Surfnet or Geant2
CERN
FNAL
BNL
US-LHCNet: Wavelength Quadrangle
NY-CHI-GVA-AMS 2007-10: 30, 40, 60, 80
Gbps
ESNet MANs to FNAL & BNL; Dark Fiber
to FNAL; Peering With GEANT
FDT: Fast Data Transfer. SC06 Results, 11/14 – 11/15/06
Stable disk-to-disk flows Tampa-Caltech: stepping up to 10-to-10 and 8-to-8 1U server-pairs; 9 + 7 = 16 Gbps; then solid overnight, using one 10G link.
Efficient data transfers: reading and writing at disk speed over WANs (with TCP) for the first time
SC06 Results: 17.7 Gbps on one link; 8.6 Gbps to/from Korea
In Java: highly portable; runs on all major platforms.
Based on an asynchronous, multithreaded system.
Streams a dataset (a list of files) continuously, from a managed pool of buffers in kernel space, through an open TCP socket: smooth data flow from each disk to/from the network, with no protocol start-phase between files.
New capability level: ~70 Gbps per rack of low-cost 1U servers. (I. Legrand)
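The core idea, streaming a list of files back-to-back through one open stream with no per-file protocol start-phase, can be sketched as below. This is a simplified illustration: the framing (name length, name, size) is invented for the sketch, and the real FDT manages kernel-space buffer pools and asynchronous I/O in Java:

```python
import os
import struct

def stream_files(paths, out_stream, bufsize=65536):
    """Stream a list of files continuously through one open stream,
    reusing a fixed-size read buffer, with no per-file handshake.
    The framing here (name length, name, file size) is illustrative."""
    for path in paths:
        name = path.encode()
        size = os.path.getsize(path)
        # Minimal per-file header, written inline with the data stream.
        out_stream.write(struct.pack("!I", len(name)) + name)
        out_stream.write(struct.pack("!Q", size))
        with open(path, "rb") as f:
            while True:
                chunk = f.read(bufsize)   # one reusable buffer per read
                if not chunk:
                    break
                out_stream.write(chunk)
```

In practice `out_stream` would be the file object of a single long-lived TCP socket, so consecutive files flow with no gap between them.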
Dynamic Network Path Allocation & Automated Dataset Transfer
Internet
A
>mlcopy A/fileX B/path/
OS path available; configuring interfaces; starting data transfer
Monitor
Control
TL1
Optical Switch
MonALISA Service
MonALISA Distributed Service System
BOSAgent
Active light path
Regular IP path
Real time monitoring
APPLICATION
LISA AGENT sets up:
- Net interfaces - TCP stack - Kernel - Routes
LISA APPLICATION: “use eth1.2, …”
LISA Agent
DATA
Detects errors and automatically recreates the path in less than the TCP timeout (< 1 second)
The Functionality of the VINCI System
Layer 3
Layer 2
Layer 1
Site A Site B Site C
MonALISA
ML Agent
MonALISA
ML Agent
MonALISA
ML Agent
ML proxy services
Agent
Agent
Agent
Agent
ROUTERS
ETHERNET LAN-PHY or WAN-PHY
DWDM FIBER
Agent
Monitoring Network Topology, Latency, Routers
NETWORKS
AS
ROUTERS
Real Time Topology Discovery & Display
Four Continent Testbed
Building a global, network-aware end-to-end managed real-time Grid
Scenario: pre-emption of a lower priority transfer
(1) At time t0, a high priority request (e.g. T0 – T1 transfer) is received by DTSS
(2) DTSS squeezes the normal priority channel; allocates the necessary bandwidth to the high priority transfer
(3) DTSS notifies the EHAs
At the end of the high priority transfer (t1), DTSS restores the originally allocated bandwidth and notifies the EHAs
DTSS: Data Transfer & Scheduling Service
NMA: Network Monitoring Agent
EHA: End-Host Agent
(2) Circuit Setup
EHA
End-system 1 (Source 1)
End-system 3 (Target)
EHA
Data Transfer
(3) Notify
(3) Notify
New Allocated bandwidth,High priority
Allocated bandwidth,Normal priority
Allocated bandwidth,High priority
t0 t1
EHA
DTSS
End-system 2 (Source 2)
(1) Request
Advanced examples
Reservation cancellation by the user: a transfer-request cancellation has to be propagated to the DTSS; it is not enough to cancel the job on the end-host. The DTSS will then schedule the next transfer in the queue ahead of time. That application might not have its data ready (e.g. staging from tape), so the DTSS has to communicate with the EHAs which could profit from an early schedule, and the end host has to advertise when it can accept the circuit.
Reservation extension (transfer not finished on schedule): if possible, the DTSS keeps the circuit, with the same bandwidth if available, taking transfer priority into account, possibly with a quota limitation.
Progress tracking: the EHA has to announce the remaining transfer size, and the Estimated Time to Completion is recalculated taking the new bandwidth into account.
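The progress-tracking step above is a one-line recalculation: given the remaining size announced by the EHA and the newly allocated bandwidth, re-derive the Estimated Time to Completion. A minimal sketch (the function name is illustrative):

```python
def recalc_etc(remaining_bytes, new_bandwidth_bps):
    """Recalculate the Estimated Time to Completion from the remaining
    transfer size (announced by the EHA) and the newly allocated
    channel bandwidth. Returns seconds; inf means 'on hold'."""
    if new_bandwidth_bps <= 0:
        return float("inf")                    # transfer put on hold
    return remaining_bytes * 8 / new_bandwidth_bps   # bits / (bits per s)

# e.g. 0.6 TB remaining on a 3 Gbps channel:
print(round(recalc_etc(0.6e12, 3e9)))          # 1600 seconds
```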
Application in 2008 (Pseudo-log)
Node1> ts –v –in mercury.ultralight.org:/data01/big/zmumu05687.root –out venus.ultralight.org:/mstore/events/data –prio 3 –deadline +2:50 –xsum
-TS: Initiating file transfer setup…
-TS: Contact path discovery service (PDS) through EHA for transfer request
-PDS: Path discovery in progress…
-PDS: Path RTT 128.4 ms, best effort path bottleneck is 10 GE
-PDS: Path options found:
-PDS: Lightpath option exists end-to-end
-PDS: Virtual pipe option exists (partial)
-PDS: High-performance protocol capable end-systems exist
-TS: Request 1.2 TB file transfer within 2 hours 50 minutes, priority 3
-TS: Target host confirms available space for [email protected]
-TS: EHA contacted…parameters transferred
-EHA: Priority 3 request allowed for [email protected]
-EHA: request scheduling details
-EHA: Lightpath prior scheduling (higher/same priority) precludes use
-EHA: Virtual pipe sizeable to 3 Gbps available for 1 hour starting in 52.4 minutes
-EHA: request monitoring prediction along path
-EHA: FAST-UL transfer expected to deliver 1.2 Gbps (+0.8/-0.4) averaged over
next 2 hours 50 minutes
Hybrid Nets with Circuit-Oriented Services: Lambda Station Example
A network path forwarding service to interface production facilities with advanced research networks.
Goal is selective forwarding on a per-flow basis, and alternate network paths for high-impact data movement. Dynamic path modification, with graceful cutover & fallback.
Lambda Station interacts with: host applications & systems; LAN infrastructure; site border infrastructure; advanced-technology WANs; remote Lambda Stations.
D. Petravick, P. DeMar
Also OSCARS, TeraPaths, UltraLight
Investigate: integration & use of LAN QoS and MPLS-based differentiated network services in the ATLAS distributed computing environment, as a way to manage the network as a critical resource.
TeraPaths: BNL and Michigan; partnering with OSCARS (ESnet), Lambda Station (FNAL) and DWMI (SLAC)
MPLS and Remote LAN QoS requests
GridFtp & SRM
LAN/MPLSESnet
TeraPaths
Resource
Manager
Traffic identification: addresses, port #, DSCP bits
Grid AA
Network Usage Policy
Data ManagementData Transfer
Bandwidth
Requests &
Releases
OSCARS
IN(E)GRESS
Monitoring
SE
LAN QoS
LAN QoS
M10
LAN/MPLS
Remote TeraPaths
Dantong Yu
Network Control & Configuration
Topological Graph and traffic information
Applications (Data Transfers, Storage) Monitoring
Scheduling End Hosts Control
& Configuration
Reservations & Active Transfers lists
User
Progress Monitor
ApMon, dedicated modules; SNMP, sFlow, TL1, trace, ping; AvBw, /proc, TCP conf.
DedicatedAgents
Request
Options
Decision
The main components of the Scheduler
Priority Queues for each segment
Possible path / channels options for the requested priority
Network & end hostsmonitoring
Failures Detection
Learning and Prediction
We must also consider the applications that do not use our APIs, and provide effective network services for all applications.
Learning algorithms (such as Self-Organizing Neural Networks) will be used to evaluate the traffic created by other applications, to identify major patterns, and to dynamically set up effective connectivity maps based on the monitoring data.
It is very difficult, if not impossible, to predict all possible events in a complex environment like the Grid, and to compile knowledge about those events in advance. Learning is the only practical approach by which agents can acquire the necessary information to describe their environments.
We approach this multi-agent learning task at two levels: the local level of individual learning agents, and the global level of inter-agent communication. We need to ensure that each agent can be optimized from a local perspective, while the global monitoring mechanism acts as a ‘driving force’ to evolve the agents collectively, based on previous experience.
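The pattern-identification idea can be illustrated with a minimal one-dimensional self-organizing map over a single flow feature (e.g. observed flow rates). This is purely a sketch under strong simplifying assumptions: a real system would use multi-dimensional traffic features and a proper SOM library:

```python
import random

def train_som(samples, n_units=4, epochs=50, lr=0.3):
    """Minimal 1-D self-organizing map over scalar flow features,
    illustrating how traffic from non-instrumented applications could
    be clustered into major patterns. Illustrative sketch only."""
    units = [random.uniform(min(samples), max(samples)) for _ in range(n_units)]
    for epoch in range(epochs):
        rate = lr * (1 - epoch / epochs)        # decaying learning rate
        for x in samples:
            # Best-matching unit: the closest unit to this sample.
            bmu = min(range(n_units), key=lambda i: abs(units[i] - x))
            for i in range(n_units):
                # Pull the BMU fully, its index-neighbors half as much.
                influence = 1.0 if i == bmu else (0.5 if abs(i - bmu) == 1 else 0.0)
                units[i] += rate * influence * (x - units[i])
    return sorted(units)
```

After training, each unit sits near a recurring traffic pattern, and new flows can be classified by their nearest unit.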