Advances in Integrated Storage, Transfer and Network Management
<x>Email: <x>@<y>
On behalf of the UltraLight Collaboration
The network is a key element that HEP needs to work on, as much as its Tier1 and Tier2 facilities.
Network bandwidth is going to be a scarce, strategic resource; the ability to co-schedule network and other resources will be crucial. US CMS Tier1/Tier2 experience has shown this (the tip of the iceberg).
HEP needs to work with US LHCNet network providers/operators/managers, and with experts on the upcoming circuit-oriented management paradigm. Joint planning is needed for a consistent outcome. The provision of managed circuits is also a direction for ESnet and Internet2, but LHC networking will require it first.
We also need to exploit now-widely-available technologies: reliable data transfers, monitoring in depth, end-to-end workflow information (including every network flow, end-system state and load).
Risk: lack of efficient workflow and readiness means lack of competitiveness. Lack of engagement has made this a DOE-funding issue as well; even the planned bandwidth may not be there.
Networking and HEP
The planned bandwidth for 2008-2010 is already well below the capabilities of current end-systems, appropriately tuned. This is already evident in intra-US production flows (Nebraska, UCSD, Caltech). The “current capability” limit of nodes is still an order of magnitude greater than the best results seen in production so far.
This can be extended to transatlantic flows using a series of straightforward technical steps: configuration, diagnosis, monitoring; involving the end-hosts and storage systems, as well as the networks.
The current lack of transfer volume from non-US Tier1 sites is not, and cannot be, an indicator of future performance.
Upcoming paradigm: managed bandwidth channels. Scheduled: multiple classes of “work”, with a settable number of simultaneous flows per class, like computing facility queues. Monitored and managed: BW will be matched to throughput capability [schedule coherently; free up excess BW resources as needed]. Respond to: errors; schedule changes; progress-rate deviations.
Transatlantic Networking: Bandwidth Management with Channels
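The channel paradigm above (classes of “work” with a settable number of simultaneous flows per class, like computing-facility queues) can be sketched as per-class slot limits. The class names and API here are illustrative, not taken from any deployed scheduler:

```python
from collections import deque

class ChannelScheduler:
    """Toy model: each class of 'work' gets a settable number of
    simultaneous flows, like computing-facility queue limits."""
    def __init__(self, flows_per_class):
        self.limits = dict(flows_per_class)       # class -> max concurrent flows
        self.active = {c: 0 for c in self.limits}
        self.waiting = {c: deque() for c in self.limits}

    def submit(self, cls, transfer_id):
        # Start immediately if a slot is free, otherwise queue.
        if self.active[cls] < self.limits[cls]:
            self.active[cls] += 1
            return "started"
        self.waiting[cls].append(transfer_id)
        return "queued"

    def complete(self, cls):
        # Free a slot; promote the next queued transfer in the same class.
        self.active[cls] -= 1
        if self.waiting[cls]:
            self.waiting[cls].popleft()
            self.active[cls] += 1

sched = ChannelScheduler({"production": 2, "analysis": 4})
print(sched.submit("production", "t1"))  # started
print(sched.submit("production", "t2"))  # started
print(sched.submit("production", "t3"))  # queued
```

A real channel manager would additionally resize the bandwidth of each channel to match measured throughput, as described above.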
Computing, Offline and CSA07 (CMS)
US Tier1 – Tier2 Flows Reach 10 Gbps 1/2007 Snapshot
Computing, Offline and CSA07
Nebraska
One well configured site. What happens if there are
10 such sites?
Internet2’s New Backbone
Initial deployment – 10 x 10 Gbps wavelengths over the footprint
First round maximum capacity – 80 x 10 Gbps wavelengths; expandable
Scalability – potential migration to 40 Gbps or 100 Gbps capability
Transition to NewNet underway now (since 10/2006), until end of 2007
Level(3) footprint; Infinera 10 x 10G core; CIENA optical muxes
Today’s situation
End-hosts / applications are unaware of network topology, available paths, and the bandwidth available (with other flows) on each. The network is unaware of transfer schedules and end-host capabilities.
High performance networks are not a ubiquitous resource. Applications might get full BW only on the access part of the network: usually only to the next router, and mostly not beyond the network under the site’s control.
Better utilization needs joint management: working together to schedule and allocate network resources along with storage and CPU, using queues, and quotas where needed.
Data transfer applications can no longer treat the network as a passive resource. They can only be efficient if they are “network-aware”: communicating with, and delegating many of their functions to, network-resident services.
Network Services for Robust Data Transfers: Motivation
Data Transfer Experience
Transfers are generally much slower than expected, or slow down, or even stop. Many potential causes remain undiagnosed: a configuration problem? Loading? Queuing? Errors? An end-host problem, a network problem, or an application failure?
The application (blindly) retries, and perhaps sends emails through HyperNews to request “expert” troubleshooting. There is insufficient information, and it arrives too slowly to diagnose and correlate at the time the error occurs. Result: lower transfer rates, and people spending more and more time troubleshooting.
Can a manual, non-integrated approach scale to a collaboration the size of CMS/ATLAS, when data transfer needs increase and we compete for (network) resources with other experiments (and sciences)?
PhEDEx (CMS) and FTS are not the only end-user applications that face transfer challenges (ask ATLAS, for example).
Managed Network Services Integration
The network alone is not enough: proper integration and interaction are needed with the end-hosts and storage systems. Multiple applications will want to use the network resource:
ATLAS, ALICE and LHCb; private transfer tools; analysis applications streaming data; real-time streams (e.g. videoconferencing).
There is no need for different domains to develop their own monitoring and troubleshooting tools for a resource that they share: it is manpower-intensive, duplicates effort (“reinvention”), and suffers from a lack of operational expertise, a lack of access to sufficient information, and a lack of control. So those systems will (and do) lack the needed functionality, performance, and reliability.
Managed Network Services Strategy (1)
Network Services Integrated with End Systems
Need a real-time, end-to-end view of the network and end-systems, based on in-depth end-to-end monitoring. Need to extract and correlate information (e.g. network state, end-host state, transfer-queue state).
Solution: the network and the end-hosts exchange information via real-time services and “end-host agents” (EHAs), providing sufficient information for decision support. EHAs and network services can cooperate to automate some operational decisions, based on accumulated experience, especially where they become reliable enough; automation will become essential as the usage and number of users scale up, and competition increases.
Analogous to real-time operation, with internal monitoring and diagnosis.
Managed Network Services Strategy (2)
[Semi-]Deterministic Scheduling
Receive request: “Transfer N bytes from Site A to B with throughput R1”; authenticate/authorize/prioritize.
Verify end-host (HW, config.) capability (R2); schedule bandwidth B > R2; estimate time to complete T(0).
Schedule path, with priorities P(i) on segment S(i).
Check periodically: compare rate R(t) to R2; compare and update time to complete T(i) against T(i-1).
Triggers: error (e.g. segment failure); variance beyond thresholds (e.g. progress too slow, channel underutilized, wait in queue too long); state change (e.g. new high-priority transfer submitted).
Dynamic actions: build an alternative path; change channel size; create a new channel and squeeze others in its class; etc.
Real-time control and adaptation: an optimizable system
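The periodic-check step of this recipe can be sketched as a pure function: compare the observed rate R(t) to the scheduled rate R2, re-estimate the time to complete, and raise a trigger when the variance exceeds a threshold. The threshold values are illustrative assumptions, not from the real system:

```python
def check_transfer(bytes_left, rate_now, rate_scheduled,
                   slow_factor=0.5, idle_factor=0.1):
    """Periodic check: compare observed rate R(t) to scheduled rate R2,
    update the time-to-complete estimate, and return a trigger when
    variance goes beyond (illustrative) thresholds."""
    eta = bytes_left / rate_now if rate_now > 0 else float("inf")
    if rate_now < idle_factor * rate_scheduled:
        trigger = "channel-underutilized"   # free excess allocated bandwidth
    elif rate_now < slow_factor * rate_scheduled:
        trigger = "progress-too-slow"       # consider re-routing / resizing
    else:
        trigger = None
    return eta, trigger
```

A real DTSS would also compare successive estimates T(i) and T(i-1), and react to state changes such as a new high-priority transfer being submitted.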
(3) DTSS re-routes the traffic if bandwidth is available on a different route
Transparent to end-systems; rerouting takes transfer priority into account
(4) Otherwise, if there is no alternative, DTSS notifies the EHAs and puts the transfer on hold
NMA: Network Monitoring AgentEHA: End-Host AgentDTSS: Dataset Transfer & Scheduling Service
End hosts and applications must interact with network
services to achieve this
Network Aware Scenario: Link- Segment Outage (3) Reroute/
Circuit Setup
EHA
End-system (Source)
End-system (Target)
EHA
Data Transfer
NMA
(1) Monitoring
(1) NMA detects a problem (e.g. link-segment down); (2) notifies DTSS
(2) Notify
(4) Notify
DTSS (fully distributed)
DTSS (fully distributed)
Network Aware Scenario: End-Host Problem
(1) EHA detects a problem with the end-host (e.g. throughput on the NIC lower than expected); notifies DTSS
(2) DTSS squeezes the corresponding circuit; gives more bandwidth to the next highest priority transfer sharing (part of) the path
(3) NMA senses the increased throughput; (4) notifies DTSS
NIC Problem detected Problem resolved, sensed
by network monitoring
Back to nominal BW
BW used by end-system (source)
(2) Adjust
EHA
End-system (Source)
End-system (Target)
EHA
Data transfer
NMA
(1) Monitoring
(4) Notify
(1) Notify
(3) Monitoring
Network AND End-host monitoring allow Root Cause Analysis!
NMA: Network Monitoring AgentEHA: End-Host AgentDTSS: Dataset Transfer & Scheduling Service
DTSS (fully distributed)
DTSS (fully distributed)
End Host Agent (EHA)
The EHA implements secure APIs that applications can use for requests and interactions with network services.
The EHA continuously receives estimates of the time-to-complete for its requests. Where possible, the EHA will help provide correct system & network configuration to the end-hosts.
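A hypothetical sketch of the shape such an agent interface could take: applications submit transfer requests and poll continuously updated time-to-complete estimates. The method and field names are invented for illustration, not the real EHA API:

```python
class EndHostAgent:
    """Sketch of an EHA request interface: applications submit transfer
    requests and poll time-to-complete estimates as the network reports
    progress. Names and fields are illustrative only."""
    def __init__(self):
        self.requests = {}

    def request_transfer(self, src, dst, size_bytes, priority):
        # Register a request; a real EHA would authenticate/authorize here.
        req_id = len(self.requests) + 1
        self.requests[req_id] = {"src": src, "dst": dst, "size": size_bytes,
                                 "done": 0, "priority": priority, "rate": None}
        return req_id

    def update_progress(self, req_id, bytes_done, rate_bps):
        # Called as monitoring data arrives from the network services.
        r = self.requests[req_id]
        r["done"], r["rate"] = bytes_done, rate_bps

    def estimated_time_to_complete(self, req_id):
        # Continuously re-estimated from the latest observed rate.
        r = self.requests[req_id]
        if not r["rate"]:
            return None
        return (r["size"] - r["done"]) / r["rate"]
```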
Minimize!
CPU
Net
Disk
System
Mem
Authorization; service discovery; system configuration. Complete local profiling of host hardware and software. Complete monitoring of host processes, activity, load. End-to-end performance measurements. Acts as an active listener to “events” generated by local apps.
Network; Disk I/O
The End-Host/User Agent is a reality in EVO (also FDT, VINCI, etc.). It enables (suggested) automatic actions based on anomalies, to ensure proper quality (e.g. automatically limiting video if bandwidth is limited).
The EHA monitors various system & network parameters.
Key characteristic: a coherent real-time architecture of inter-communicating agents and services, with correlated actions.
Specialized Agents Supporting Robust Data Transfer Operations
Example Agent Services Supporting Transfer Operations & Management
Build a real-time, dynamic map of the inter-connection topology among the services, and of the channels on each link-segment, including: how the channels are allocated (bandwidth, priority) and used; information on the total traffic and major flows on each segment.
Locate network resources & services which have agreements with a given VO.
List the routers and switches along each path that are “manageable”. Determine which path options exist between source and destination. Among N replicas of a data source and M possible network paths, “discover” (with monitoring) the least estimated time to completion of a transfer to a given destination, taking the reliability record into account.
Detect problems in real time, such as: loss or impairment of connectivity; CRC errors; asymmetric routing; a large rate of lost packets or retransmissions (above a threshold).
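The real-time checks listed above reduce to simple predicates over monitoring data. A sketch, where the field names and threshold values are illustrative assumptions:

```python
def detect_anomalies(stats, loss_threshold=0.01, retrans_threshold=0.02):
    """Flag the problem classes listed above: connectivity loss, CRC
    errors, asymmetric routing, and packet-loss / retransmission rates
    above a threshold. Field names and thresholds are illustrative."""
    alerts = []
    if not stats.get("link_up", True):
        alerts.append("loss-of-connectivity")
    if stats.get("crc_errors", 0) > 0:
        alerts.append("crc-errors")
    if stats.get("forward_path") != stats.get("reverse_path"):
        alerts.append("asymmetric-routing")
    if stats.get("packet_loss", 0.0) > loss_threshold:
        alerts.append("high-packet-loss")
    if stats.get("retransmissions", 0.0) > retrans_threshold:
        alerts.append("high-retransmissions")
    return alerts
```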
The Scheduler
Independent agents control different segments or channels. They negotiate for an end-to-end connection, using cost functions to find the “lowest-cost” path (on the topological graph), based on VO, priorities, pre-reservations, etc.
An agent offers a “lease” for use of a segment or channel to its peers. Periodic lease renewals are used by all agents; this allows a flexible response of the agents to task completion, application failure or network errors.
If a segment failure or impairment is detected, DTSS automatically provides path restoration based on the discovered network topology. Resource reallocation may be applied to preserve high-priority tasks. The supervising agents then release all other channels or segments along a path. An alternative path may be set up rapidly enough to avoid a TCP timeout, and thus to allow the transfer to continue uninterrupted.
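The lease mechanism can be sketched as follows: a segment is held only while its lease keeps being renewed, so an application failure (no renewal) frees the resource automatically. This is a minimal model with logical time ticks, not the real agent protocol:

```python
class SegmentAgent:
    """Sketch of the lease scheme: a segment/channel is held only while
    its lease keeps being renewed; a holder that stops renewing (task
    done, application crash) frees the resource automatically."""
    def __init__(self, lease_length=3):
        self.lease_length = lease_length
        self.leases = {}                  # transfer_id -> expiry tick

    def grant(self, transfer_id, now):
        self.leases[transfer_id] = now + self.lease_length

    def renew(self, transfer_id, now):
        if transfer_id in self.leases:
            self.leases[transfer_id] = now + self.lease_length

    def expire(self, now):
        # Called periodically: drop leases whose holder stopped renewing,
        # and report them so the scheduler can reallocate the segment.
        dead = [t for t, exp in self.leases.items() if exp <= now]
        for t in dead:
            del self.leases[t]
        return dead
```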
Implemented as a set of collaborating agents: no single point of failure.
Channel allocation based on VO/priority [+ wait time, etc.]. Create on demand an end-to-end path or channel & configure the end-hosts. Automatic recovery (rerouting) in case of errors. Dynamic reallocation of throughput per channel, to manage priorities and control time to completion where needed. Reallocate resources requested but not used.
Dynamic Path Provisioning & Queueing
User Scheduling
ControlMonitoring
End Host Agents
RealtimeFeedback
Request
Dynamic end-2-end optical path creation & usage
Minimize emails like this!
FDT Transfer
4 fiber cut emulations
200+ MBytes/sec from a 1U node
FDT Disk-to-Disk WAN Transfers Between Two End-hosts
NEW YORK GENEVA
Reads and writes on 4 SATA disks in parallel on each server
Mean traffic ~210 MB/s (~0.75 TB per hour)
CERN CALTECH
Reads and writes on two 12-port RAID Controllers in parallel on each server
Mean traffic ~545 MB/s (~2 TB per hour)
1U Nodes with 4 Disks 4U Disk Servers with 24 Disks
Working on integrating FDT with dCache
RSS feeds and email alerts when anomalies occur whose correction is not automated.
Incrementally try to automate these anomaly corrections (where possible), based on accumulated experience.
Expose the system information that cannot be handled automatically.
Drill-down, with simple auto-diagnostics, using ML/ApMon in AliEn
System Architecture: the Main Services
Application
End UserAgent
Topology Discovery
GMPLS MPLS OS SNMP
Dynamic Path & Channel Building, Allocation
Control Path Provisioning
Failure Detection
Application
End UserAgent
Authentication, Authorization, Accounting
Learning
Prediction
System Evaluation & Optimization
MONITORING
Scheduling & Queues
Several services are already operational and field-tested
System Architecture: Scenario View
Interoperation with Other Network Domains
Benefits to LHC
Higher performance transfers; better utilization of our network resources.
Applications interface with network services & monitoring instead of creating their own, so less manpower is required from CMS.
Decreased email traffic; less time spent on support.
End-to-end monitoring and alerts: more transparency for troubleshooting (where it is not automated); incremental work on automating anomaly handling where possible, based on accumulated experience.
Synergy with US-funded projects: manpower is available to support the transition.
A distributed system with pervasive monitoring and rapid reaction time: fewer single points of failure; provides information and diagnosis not otherwise possible. Several services are already operational and field-tested.
It is important to establish a smooth transition from the current situation to the one envisioned, without interruption of currently available services.
We intend progressive development and deployment (not “all at once”). But there are strong benefits in early deployment: cost, efficiency, readiness.
Resources: For More Information
End Host Agent (LISA): http://monalisa.caltech.edu/monalisa__Interactive_Clients__LISA.html
ALICE monitoring and alerts: http://pcalimonitor.cern.ch/
Fast Data Transfer: http://monalisa.cern.ch/FDT/
Network services: http://monalisa.caltech.edu/monalisa__Service_Applications__Vinci.html
EVO with End Host Agent: http://evo.caltech.edu/
UltraLight: http://ultralight.org
See the various posters on this subject at CHEP
Extra Slides Follow
US LHCNet, ESnet Plan 2007-2010: 30-80 Gbps US-CERN, ESnet MANs
DEN, ELP, ALB, ATL
Metropolitan Area Rings
Aus.
Europe
SDG
AsiaPac, SEA
Major DOE Office of Science SitesHigh-speed cross connects with Internet2/Abilene
New ESnet hubs; ESnet hubs
SNV
Europe
Japan
Science Data Network core: 40-60 Gbps circuit transport; lab-supplied; major international links
Production IP ESnet core: 10 Gbps enterprise IP traffic
Japan
Aus.
Metro Rings
ESnet4SDN Core:
30-50G
ESnet IP Core≥10 Gbps
10 Gb/s
30 Gb/s; 2 x 10 Gb/s
NYC, CHI
US-LHCNetNetwork Plan
(3 to 8 x 10 Gbps US-CERN)
LHCNet Data Network
DC; GEANT2; SURFnet; IN2P3
NSF/IRNC circuit; GVA-AMS connection via Surfnet or Geant2
CERN
FNAL
BNL
US-LHCNet: Wavelength Quadrangle
NY-CHI-GVA-AMS 2007-10: 30, 40, 60, 80
Gbps
ESNet MANs to FNAL & BNL; Dark Fiber
to FNAL; Peering With GEANT
FDT: Fast Data Transfer. SC06 Results, 11/14 – 11/15/06
Stable disk-to-disk flows Tampa-Caltech: stepping up to 10-to-10 and 8-to-8 1U server-pairs; 9 + 7 = 16 Gbps; then solid overnight, using one 10G link.
Efficient data transfers: reading and writing at disk speed over WANs (with TCP) for the first time
SC06 Results: 17.7 Gbps on one link; 8.6 Gbps to/from Korea
In Java: highly portable; runs on all major platforms.
Based on an asynchronous, multithreaded system.
Streams a dataset (a list of files) continuously, from a managed pool of buffers in kernel space, through an open TCP socket: smooth data flow from each disk to/from the network, with no protocol start-phase between files.
New capability level: ~70 Gbps per rack of low-cost 1U servers. (I. Legrand)
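The core idea, streaming a list of files back-to-back through one open stream with no per-file protocol start-phase, can be sketched as below. This is a simplified illustration: the framing (name length, name, size) is invented for the sketch, and the real FDT manages kernel-space buffer pools and asynchronous I/O in Java:

```python
import os
import struct

def stream_files(paths, out_stream, bufsize=65536):
    """Stream a list of files continuously through one open stream,
    reusing a fixed-size read buffer, with no per-file handshake.
    The framing here (name length, name, file size) is illustrative."""
    for path in paths:
        name = path.encode()
        size = os.path.getsize(path)
        # Minimal per-file header, written inline with the data stream.
        out_stream.write(struct.pack("!I", len(name)) + name)
        out_stream.write(struct.pack("!Q", size))
        with open(path, "rb") as f:
            while True:
                chunk = f.read(bufsize)   # one reusable buffer per read
                if not chunk:
                    break
                out_stream.write(chunk)
```

In practice `out_stream` would be the file object of a single long-lived TCP socket, so consecutive files flow with no gap between them.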
Dynamic Network Path Allocation & Automated Dataset Transfer
Internet
A
>mlcopy A/fileX B/path/
OS path available; configuring interfaces; starting data transfer
Monitor
Control
TL1
Optical Switch
MonALISA Service
MonALISA Distributed Service System
BOSAgent
Active light path
Regular IP path
Real time monitoring
APPLICATION
LISA AGENT sets up:
- Net interfaces - TCP stack - Kernel - Routes
LISA APPLICATION: “use eth1.2, …”
LISA Agent
DATA
Detects errors and automatically recreates the path in less than the TCP timeout (< 1 second)
The Functionality of the VINCI System
Layer 3
Layer 2
Layer 1
Site A Site B Site C
MonALISA
ML Agent
MonALISA
ML Agent
MonALISA
ML Agent
ML proxy services
Agent
Agent
Agent
Agent
ROUTERS
ETHERNET LAN-PHY or WAN-PHY
DWDM FIBER
Agent
Monitoring Network Topology, Latency, Routers
NETWORKS
AS
ROUTERS
Real Time Topology Discovery & Display
Four Continent Testbed
Building a global, network-aware end-to-end managed real-time Grid
Scenario: pre-emption of a lower priority transfer
(1) At time t0, a high priority request (e.g. T0 – T1 transfer) is received by DTSS
(2) DTSS squeezes the normal priority channel; allocates the necessary bandwidth to the high priority transfer
(3) DTSS notifies the EHAs
At the end of the high priority transfer (t1), DTSS restores the originally allocated bandwidth and notifies the EHAs
DTSS: Data Transfer & Scheduling Service
NMA: Network Monitoring Agent
EHA: End-Host Agent
(2) Circuit Setup
EHA
End-system 1 (Source 1)
End-system 3 (Target)
EHA
Data Transfer
(3) Notify
(3) Notify
New Allocated bandwidth,High priority
Allocated bandwidth,Normal priority
Allocated bandwidth,High priority
t0 t1
EHA
DTSS
End-system 2 (Source 2)
(1) Request
Advanced examples
Reservation cancellation by the user: a transfer-request cancellation has to be propagated to the DTSS; it is not enough to cancel the job on the end-host. The DTSS will then schedule the next transfer in the queue ahead of time. That application might not have its data ready (e.g. staging from tape), so the DTSS has to communicate with the EHAs which could profit from an early schedule, and the end host has to advertise when it can accept the circuit.
Reservation extension (transfer not finished on schedule): if possible, the DTSS keeps the circuit, with the same bandwidth if available, taking transfer priority into account, possibly with a quota limitation.
Progress tracking: the EHA has to announce the remaining transfer size, and the Estimated Time to Completion is recalculated taking the new bandwidth into account.
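The progress-tracking step above is a one-line recalculation: given the remaining size announced by the EHA and the newly allocated bandwidth, re-derive the Estimated Time to Completion. A minimal sketch (the function name is illustrative):

```python
def recalc_etc(remaining_bytes, new_bandwidth_bps):
    """Recalculate the Estimated Time to Completion from the remaining
    transfer size (announced by the EHA) and the newly allocated
    channel bandwidth. Returns seconds; inf means 'on hold'."""
    if new_bandwidth_bps <= 0:
        return float("inf")                    # transfer put on hold
    return remaining_bytes * 8 / new_bandwidth_bps   # bits / (bits per s)

# e.g. 0.6 TB remaining on a 3 Gbps channel:
print(round(recalc_etc(0.6e12, 3e9)))          # 1600 seconds
```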
Application in 2008 (Pseudo-log)
Node1> ts –v –in mercury.ultralight.org:/data01/big/zmumu05687.root –out venus.ultralight.org:/mstore/events/data –prio 3 –deadline +2:50 –xsum
-TS: Initiating file transfer setup…
-TS: Contact path discovery service (PDS) through EHA for transfer request
-PDS: Path discovery in progress…
-PDS: Path RTT 128.4 ms, best effort path bottleneck is 10 GE
-PDS: Path options found:
-PDS: Lightpath option exists end-to-end
-PDS: Virtual pipe option exists (partial)
-PDS: High-performance protocol capable end-systems exist
-TS: Request 1.2 TB file transfer within 2 hours 50 minutes, priority 3
-TS: Target host confirms available space for [email protected]
-TS: EHA contacted…parameters transferred
-EHA: Priority 3 request allowed for [email protected]
-EHA: request scheduling details
-EHA: Lightpath prior scheduling (higher/same priority) precludes use
-EHA: Virtual pipe sizeable to 3 Gbps available for 1 hour starting in 52.4 minutes
-EHA: request monitoring prediction along path
-EHA: FAST-UL transfer expected to deliver 1.2 Gbps (+0.8/-0.4) averaged over
next 2 hours 50 minutes
Hybrid Nets with Circuit-Oriented Services: Lambda Station Example
A network path forwarding service to interface production facilities with advanced research networks.
Goal is selective forwarding on a per-flow basis, and alternate network paths for high-impact data movement. Dynamic path modification, with graceful cutover & fallback.
Lambda Station interacts with: host applications & systems; LAN infrastructure; site border infrastructure; advanced-technology WANs; remote Lambda Stations.
D. Petravick, P. DeMar
Also OSCARS, TeraPaths, UltraLight
Investigate: integration & use of LAN QoS and MPLS-based differentiated network services in the ATLAS distributed computing environment, as a way to manage the network as a critical resource.
TeraPaths: BNL and Michigan; partnering with OSCARS (ESnet), Lambda Station (FNAL) and DWMI (SLAC)
MPLS and Remote LAN QoS requests
GridFtp & SRM
LAN/MPLSESnet
TeraPaths
Resource
Manager
Traffic identification: addresses, port #, DSCP bits
Grid AA
Network Usage Policy
Data ManagementData Transfer
Bandwidth
Requests &
Releases
OSCARS
IN(E)GRESS
Monitoring
SE
LAN QoS
LAN QoS
M10
LAN/MPLS
Remote TeraPaths
Dantong Yu
Network Control & Configuration
Topological Graph and traffic information
Applications (Data Transfers, Storage) Monitoring
Scheduling End Hosts Control
& Configuration
Reservations & Active Transfers lists
User
Progress Monitor
ApMon, dedicated modules; SNMP, sFlow, TL1, trace, ping; AvBw, /proc, TCP conf.
DedicatedAgents
Request
Options
Decision
The main components of the Scheduler
Priority Queues for each segment
Possible path / channels options for the requested priority
Network & end hostsmonitoring
Failures Detection
Learning and Prediction
We must also consider the applications that do not use our APIs, and provide effective network services for all applications.
Learning algorithms (such as Self-Organizing Neural Networks) will be used to evaluate the traffic created by other applications, to identify major patterns, and to dynamically set up effective connectivity maps based on the monitoring data.
It is very difficult, if not impossible, to predict all possible events in a complex environment like the Grid, and to compile knowledge about those events in advance. Learning is the only practical approach by which agents can acquire the necessary information to describe their environments.
We approach this multi-agent learning task at two levels: the local level of individual learning agents, and the global level of inter-agent communication. We need to ensure that each agent can be optimized from a local perspective, while the global monitoring mechanism acts as a ‘driving force’ to evolve the agents collectively, based on previous experience.
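The pattern-identification idea can be illustrated with a minimal one-dimensional self-organizing map over a single flow feature (e.g. observed flow rates). This is purely a sketch under strong simplifying assumptions: a real system would use multi-dimensional traffic features and a proper SOM library:

```python
import random

def train_som(samples, n_units=4, epochs=50, lr=0.3):
    """Minimal 1-D self-organizing map over scalar flow features,
    illustrating how traffic from non-instrumented applications could
    be clustered into major patterns. Illustrative sketch only."""
    units = [random.uniform(min(samples), max(samples)) for _ in range(n_units)]
    for epoch in range(epochs):
        rate = lr * (1 - epoch / epochs)        # decaying learning rate
        for x in samples:
            # Best-matching unit: the closest unit to this sample.
            bmu = min(range(n_units), key=lambda i: abs(units[i] - x))
            for i in range(n_units):
                # Pull the BMU fully, its index-neighbors half as much.
                influence = 1.0 if i == bmu else (0.5 if abs(i - bmu) == 1 else 0.0)
                units[i] += rate * influence * (x - units[i])
    return sorted(units)
```

After training, each unit sits near a recurring traffic pattern, and new flows can be classified by their nearest unit.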