Tal Lavian, [email protected], www.nortel.com/drac
Advanced Technology Research, Nortel Networks
Pile of Selected Slides
August 18th, 2005
2
Optical Networks Change the Current Pyramid
George Stix, Scientific American, January 2001
DWDM: a fundamental imbalance between computation and communication
3
New Networking Paradigm
> Great vision
• LambdaGrid is one step towards this concept
> LambdaGrid
• A novel service architecture
• Lambda as a scheduled service
• Lambda as a prime resource, like storage and computation
• Changes our current system assumptions
• Potentially opens new horizons
“A global economy designed to waste transistors, power, and silicon area, and conserve bandwidth above all, is breaking apart and reorganizing itself to waste bandwidth and conserve power, silicon area, and transistors.”
George Gilder, Telecosm (2000)
☺
4
The “Network” is a Prime Resource for Large-Scale Distributed Systems
Integrated SW systems provide the “glue”: the dynamic optical network as a fundamental Grid service in data-intensive Grid applications, to be scheduled, managed, and coordinated to support collaborative operations.
Diagram: Instrumentation, People, Storage, Visualization, and Computation interconnected by the Network.
5
From Super-computer to Super-network
> In the past, computer processors were the fastest part
• peripheral bottlenecks
> In the future, optical networks will be the fastest part
• Computers, processors, storage, visualization, and instrumentation become slower “peripherals”
> eScience cyber-infrastructure focuses on computation, storage, data, analysis, and workflow
• The network is vital for better eScience
> How can we improve the way of doing eScience?
6
Layered Architecture
Diagram: Lambda Data Grid - Globus Services mapped onto the Grid layered architecture. Application layer: Particle Physics, Multidisciplinary Simulation. Collaborative layer: GARA. Resource layer: GridFTP, GRAM, GSI, DTS, NRS, Storage Service. Connectivity layer: SOAP, TCP/HTTP, UDP, SABUL, IP, ODIN. Fabric layer (CONNECTION): OMNInet, storage bricks.
7
Problem Solving Environment
Diagram (layers of increasing abstraction): Applications and Supporting Tools; Application Development Support; Collective Grid Services (Brokering, Global Queuing, Global Event Services, Co-Scheduling, Data Cataloguing, Data Replication); Common Grid Services (Grid Information Service, Uniform Resource Access, Uniform Data Access, Communication Services, Authorization, Auditing, Fault Management, Monitoring) over the Grid Security Infrastructure (authentication, proxy, secure transport); Grid access (proxy authentication, authorization, initiation) and Grid task initiation; Local Resources, each behind a Resource Manager: CPUs, Tertiary Storage, On-Line Storage, Scientific Instruments, Monitors, High-speed Data Transport, network QoS. High-performance computing and processor/memory co-allocation, security and generic AAA, and optical networking are researched in other program lines; the rest is imported from the Globus toolkit.
Source: Cees de Laat
9
Data Schlepping Scenario: Mouse Operation
> The “BIRN Workflow” requires moving massive amounts of data:
• The simplest service: just copy from a remote DB to local storage at a mega-compute site
• Copy multi-terabytes (10-100 TB) of data
• Store first, compute later; not real time; a batch model
> Mouse network limitations:
• Need to copy ahead of time
• L3 networks can't handle these amounts effectively, predictably, in a short time window
• L3 networks provide full connectivity, a major bottleneck
• Apps are optimized to conserve bandwidth and waste storage
• The network does not fit the “BIRN Workflow” architecture
10
Limitations of Solutions with Current Network Technology
> BIRN networking is unpredictable and a major bottleneck, specifically over the WAN; it limits the type, manner, and data sizes of biomedical research, and prevents true Grid Virtual Organization (VO) research collaborations
> The network model doesn't fit the “BIRN Workflow” model; the network is not an integral resource of the BIRN cyber-infrastructure
11
Philosophy of Web Service Grids
> Much of distributed computing was built by natural extension of computing models developed for sequential machines
> This led to the distributed object (DO) model represented by Java and CORBA
• RPC (Remote Procedure Call), or RMI (Remote Method Invocation) for Java
> Key people think this is not a good idea, as it scales badly and ties distributed entities together too tightly
• Distributed objects replaced by services
> Note that CORBA was considered too complicated in both organization and proposed infrastructure
• and Java was considered “tightly coupled to Sun”
• So there were other reasons to discard them
> Thus, replace distributed objects by services connected by “one-way” messages, not by request-response messages
12
Web Services
> Web Services build loosely-coupled, distributed applications based on SOA principles
> Web Services interact by exchanging messages in SOAP format
> The contracts for the message exchanges that implement those interactions are described via WSDL interfaces
Diagram: resources (databases, humans, programs, computational resources, devices) are exposed through service logic (BPEL, Java, .NET) and message processing (SOAP and WSDL); services exchange SOAP messages of the form:
<env:Envelope>
  <env:Header> ... </env:Header>
  <env:Body> ... </env:Body>
</env:Envelope>
13
What is a Grid?
> You won't find a clear description of what a Grid is and how it differs from a collection of Web Services
• I see no essential reason that Grid Services have different requirements than Web Services
• Geoffrey Fox, David Walker, e-Science Gap Analysis, June 30, 2003. Report UKeS-2003-01, http://www.nesc.ac.uk/technical_papers/UKeS-2003-01/index.html
• Notice that a “service-building model” is like a programming language: very personal!
> Grids were once defined as “Internet-scale distributed computing”, but this isn't good, as Grids depend as much if not more on data as on simulations
> So Grids can be termed “Internet-scale distributed services”, and represent a way of collecting services together to solve problems where special features and quality of service are needed
14
e-Infrastructure
> e-Infrastructure builds on the inevitably increasing performance of networks and computers, linking them together to support new flexible linkages between computers, data systems, and people
• Grids and peer-to-peer networks are the technologies that build e-Infrastructure
• e-Infrastructure is called CyberInfrastructure in the USA
> We imagine a sea of conventional local or global connections supported by the “ordinary Internet”
• Phones, web page accesses, plane trips, hallway conversations
• Conventional Internet technology manages billions of low-multiplicity (one client to one server) or broadcast links
> On this we superimpose high-value multi-way organizations (linkages) supported by Grids with optimized resources and system support
• Low-multiplicity, fully interactive, real-time sessions
• Resources such as databases supporting (larger) communities
15
Architecture of (Web Service) Grids
> Grids are built from Web Services communicating through an overlay network built in SOFTWARE on the “ordinary Internet” at the application level
> Grids provide the special quality of service (security, performance, fault-tolerance) and customized services needed for “distributed complex enterprises”
> We need to work with the Web Service community as they debate the 60 or so proposed Web Service specifications
• Use Web Service Interoperability (WS-I) as “best practice”
• Must add further specifications to support high performance
• Database “Grid Services” for the N plus N case
• Streaming support for the M2 case
16
Importance of SOAP
> SOAP defines a very obvious message structure with a header and a body
> The header contains information used by the “Internet operating system”
• Destination, source, routing, context, sequence number, …
> The message body is only used by the application, and will never be looked at by the “operating system” except to encrypt or compress it
> Much discussion in the field revolves around what is in the header (see the sketch below)
• e.g., WSRF adds a lot to the header
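For illustration, a minimal sketch of this header/body split, extending the envelope skeleton shown earlier; the WS-Addressing header blocks (wsa:To, wsa:ReplyTo, wsa:MessageID) are one common way to carry such routing metadata, and the transferRequest payload element is hypothetical:

<env:Envelope xmlns:env="http://www.w3.org/2003/05/soap-envelope"
              xmlns:wsa="http://www.w3.org/2005/08/addressing">
  <env:Header>
    <!-- read by the "Internet operating system": destination, source, sequencing -->
    <wsa:To>http://example.org/GridDataService</wsa:To>
    <wsa:ReplyTo><wsa:Address>http://example.org/client</wsa:Address></wsa:ReplyTo>
    <wsa:MessageID>urn:uuid:6b29fc40-ca47-1067-b31d-00dd010662da</wsa:MessageID>
  </env:Header>
  <env:Body>
    <!-- seen only by the application (or encrypted/compressed in transit) -->
    <transferRequest>...</transferRequest>
  </env:Body>
</env:Envelope>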
17
Problem Statement
> Problems
• BIRN Mouse often:
• requires interaction and cooperation of resources that are distributed over many heterogeneous systems at many locations
• requires analysis of large amounts of data (on the order of terabytes)
• requires the transport of large-scale data
• requires sharing of data
• requires support for a workflow cooperation model
Q: Do we need a new network abstraction?
18
BIRN Network Limitations
> Optimized to conserve bandwidth and waste storage
• Geographically dispersed data
• Data can easily scale up 10-100 times
> L3 networks can't handle multi-terabytes efficiently and cost-effectively
> The network does not fit the “BIRN Workflow” architecture
• Collaboration and information sharing is hard
> Mega-computation: it is not possible to move the computation to the data (instead, data moves to the computation site)
> Not interactive research; must first copy, then analyze
• Analysis is local, but with strong geographic limitations
• Don't know ahead of time where the data is
• Can't navigate the data interactively or in real time
• Can't “webify” the information at large volumes
> No cooperation/interaction between the storage and network middleware(s)
19
Our proposed Solution
> Switching technology:
• Lambda switching for data-intensive transfer
> New abstraction:
• Network resource encapsulated as a Grid service
> New middleware service architecture:
• The LambdaGrid service architecture
20
Our proposed Solution
> We are proposing a LambdaGrid service architecture that interacts with the BIRN cyber-infrastructure and overcomes BIRN data limitations efficiently and effectively by:
• treating the “network” as a primary resource, just like “storage” and “computation”
• treating the “network” as a “scheduled resource”
• relying upon a massive, dynamic transport infrastructure: the dynamic optical network
21
Goals of Our Investigation
> Explore a new type of infrastructure which manages codependent storage and network resources
> Explore dynamic wavelength switching, based on new optical technologies
> Explore protocols for managing dynamically provisioned wavelengths
> Encapsulate “optical network resources” into the Grid services framework to support dynamically provisioned, data-intensive transport services
> Explicitly represent future scheduling in the data and network resource management model
> Support a new set of application services that can intelligently schedule/re-schedule/co-schedule resources
> Provide large-scale data transfer among multiple geographically distributed data locations, interconnected by paths with different attributes
> Provide inexpensive access to advanced computation capabilities and extremely large data sets
22
Example: Lightpath Scheduling
> A request for 1/2 hour between 4:00 and 5:30 on Segment D is granted to User W at 4:00
> A new request arrives from User X for the same segment, for 1 hour between 3:30 and 5:00
> Reschedule User W to 4:30 and User X to 3:30. Everyone is happy.
A route is allocated for a time slot; a new request comes in; the first route can be rescheduled to a later slot within its window to accommodate the new request (see the sketch below).
Timeline diagram (3:30-5:30): W initially holds 4:00-4:30; X requests an hour within 3:30-5:00; after rescheduling, X runs 3:30-4:30 and W runs 4:30-5:00.
☺
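A minimal sketch of this window-based rescheduling, in illustrative Java (the class and method names are invented for this example; times are minutes after noon, on a 30-minute grid):

import java.util.*;

// Sketch: schedule reservations on one segment by brute force over a
// 30-minute grid, sliding each job anywhere inside its own window.
public class LightpathScheduler {
    static final class Res {
        final String user; final int len, lo, hi; // duration and window, in minutes
        int start = -1;
        Res(String u, int len, int lo, int hi) { user = u; this.len = len; this.lo = lo; this.hi = hi; }
    }

    // Depth-first search: place each reservation at some grid point in its
    // window so that no two overlap; returns true if a layout exists.
    static boolean place(List<Res> all, int i) {
        if (i == all.size()) return true;
        Res r = all.get(i);
        for (int s = r.lo; s + r.len <= r.hi; s += 30) {
            r.start = s;
            boolean ok = true;
            for (int j = 0; j < i; j++) {
                Res q = all.get(j);
                if (r.start < q.start + q.len && q.start < r.start + r.len) { ok = false; break; }
            }
            if (ok && place(all, i + 1)) return true;
        }
        r.start = -1;
        return false;
    }

    public static void main(String[] args) {
        // The slide's example: W holds 30 min within [4:00, 5:30); X asks for 60 min within [3:30, 5:00).
        List<Res> all = new ArrayList<>(List.of(
            new Res("W", 30, 240, 330),    // minutes after noon
            new Res("X", 60, 210, 300)));
        if (place(all, 0))
            all.forEach(r -> System.out.println(r.user + " starts at minute " + r.start));
        // Expected: X at 210 (3:30) and W at 270 (4:30), matching the slide.
    }
}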
23
Scheduling Example - Reroute
> A request for 1 hour between nodes A and B, between 7:00 and 8:30, is granted for 7:00 using Segment X (and other segments)
> A new request arrives for 2 hours between nodes C and D, between 7:00 and 9:30; this route needs to use Segment E to be satisfied
> Reroute the first request onto another path through the topology to free up Segment E for the 2nd request. Everyone is happy.
Topology diagram: nodes A, B, C, D; the A-B request (X, 7:00-8:00) is moved to an alternate path (Y) so the C-D request can use the freed segment.
A route is allocated; a new request comes in for a segment in use; the 1st route can be altered to use a different path so the 2nd can also be serviced in its time window.
☺
24
Generalization and Future Directions for Research
> Need to develop and build services on top of the base encapsulation
> The LambdaGrid concept can be generalized to other eScience apps, enabling a new way of doing scientific research where bandwidth is “infinite”
> The new concept of the network as a scheduled Grid service presents new and exciting problems for investigation:
• New software systems that are optimized to waste bandwidth
• Networks, protocols, algorithms, software, architectures, systems
• A Lambda distributed file system
• The network as large-scale distributed computing
• Resource co-allocation and optimization with storage and computation
• Grid system architecture
• Enables a new horizon for network optimization and lambda scheduling
• The network as a white box: optimal scheduling and algorithms
25
Goals
> An application drives its share of network resources
• Resources = {bandwidth, security, acceleration, sensors, …}
• Within policy-allowed envelopes, end-to-end
• No more point-and-click interfaces, no operator involvement, etc.
> Dynamic provisioning of such resources
• JIT, TOD, schedulable
• Create alternatives to bandwidth peak-provisioning across the MAN/WAN
• With a continuum of satisfaction-vs-utilization tradeoff points
> Tame and exploit network diversity
• Heterogeneous and independently managed network clouds, end-to-end
• Integrated packet-optical, to best accommodate known traffic patterns
> The network as a 1st-class resource in Grid-like constructs
• It joins the CPU and DATA resources
Improve Network Experience and Optimize CapEx/OpEx
26
What used to be a bit-blasting race has now become a complex mix of:
Bit-blasting
+ Finesse (granularity of control)
+ Virtualization (access to diverse knobs)
+ Resource bundling (network AND …)
+ Security (AAA to start)
+ Standards for all of the above
= [it's a lot!]
27
Enabling new degrees of App/Net coupling
> Optical packet hybrid
• Steer the herd of elephants to ephemeral optical circuits (few-to-few)
• Mice or individual elephants go through packet technologies (many-to-many)
• Either application-driven or network-sensed; hands-free in either case
• Other impedance mismatches are being explored (e.g., wireless)
> Application-engaged networks
• The application makes itself known to the network
• The network recognizes its footprints (via tokens, deep packet inspection)
• E.g., storage management applications
> Workflow-engaged networks
• Through workflow languages, the network is privy to the overall “flight plan”
• Failure handling is cognizant of the same
• Network services can anticipate the next step, or what-ifs
• E.g., healthcare workflows over a distributed hospital enterprise
DRAC - Dynamic Resource Allocation Controller
28
Teamwork
Diagram: DRAC, portable SW, sits between application(s) and agile network(s): applications signal demand; DRAC negotiates, detects events, and alerts, adapts, routes, and accelerates across the connectivity, virtualization, and dynamic provisioning planes; admin and AAA govern it; supply and events flow between the planes; it talks to NEs and to peering DRACs.
29
Bird's-eye View of the Service Stack
Diagram: a workflow language and 3rd-party services ride on DRAC built-in services (a sampler: smart bandwidth management, Layer x <-> L1 interworking, alternate site failover, SLA monitoring and verification, service discovery, workflow language interpreter) and a Grid community scheduler; beneath them sit session convergence and nexus establishment, end-to-end policy, AAA, access, value-add services, sources/sinks, and topology (metro, core) fronted by P-CSCF proxies; legacy sessions (management and control planes A and B, OAM/OAMP) lie underneath, all bracketed between <DRAC> and </DRAC>.
30
SC2004 CONTROL CHALLENGE
• finesse the control of bandwidth across multiple domains
• while exploiting scalability and intra-domain and inter-domain fault recovery
• through layering of a novel SOA upon legacy control planes and NEs
Diagram: applications and services in Chicago (OMNInet/ODIN, Starlight) and Amsterdam (Netherlight, UvA), each site running AAA and a DRAC* instance over separate data and control paths; legacy control via ASTN and SNMP.
* Dynamic Resource Allocation Controller
31
Grid Network Services (www.nortel.com/drac)
Diagram: GT4 endpoints at sites A and B on the GW05 show floor, connected both over the (slow) Internet and over (FA$T) fiber, through PP8600 switches and OM3500 optical nodes.
32
> Several application realms call for high-touch user/network experiences
• Away from point-and-click or 1-800 cumbersome network provisioning
• Away from ad-hoc application/network interfaces, and the contaminations thereof
• Away from gratuitous discontinuities as we cross LAN/MAN/WAN
• E.g., point-in-time storage (bandwidth-intensive)
• E.g., PACS retrievals (latency-critical and bandwidth-intensive)
• E.g., industrial design, bio-tech as in Grids (bandwidth- and computation-intensive)
> DRAC is network middleware of broad applicability
• It can run on 3rd-party systems or L2-L7 switches
• Its core is upward-decoupled from the driving application, and downward-decoupled from the underlying connectivity (e.g., ASTN vs RPR vs IP/Eth)
• It can be extended by 3rd parties
• It can be applied to workflows and other processes on fixed or wireless networks
> DRAC fosters managed service bundles and fully aligns with Grid challenges
• It equips Globus/OGSA with a powerful network prong
Conclusion
Back-up
34
e-Science example “Mouse BIRN”
Source: Tal Lavian (UCB, Nortel Labs)

Application Scenario | Current MOP | Network Issues
Pt-Pt data transfer of multi-TB data sets | Copy from remote DB takes ~10 days (unpredictable); store, then copy/analyze | Want << 1 day, even << 1 hour, for innovation in new bio-science; architecture forced to optimize BW utilization at the cost of storage
Access to multiple remote DBs | N x the previous scenario | Simultaneous connectivity to multiple sites; multi-domain; dynamic connectivity hard to manage; next connection needs unknown
Remote instrument access (radio telescope) | Can't be done from the home research institute | Need fat unidirectional pipes; tight QoS requirements (jitter, delay, data loss)

Other observations:
• Not feasible to port computation to the data
• Delays preclude interactive research: copy, then analyze
• Uncertain transport times force a sequential process: schedule processing after the data has arrived
• No cooperation/interaction among the storage, computation, and network middlewares
• Dynamic network allocation as part of the Grid workflow allows new scientific experiments that are not possible with today's static allocation
35
Bandwidth is nothing without control
The Wright Brothers had plenty of thrust. Their problem was how to control it.
Likewise, we have plenty of bandwidth. We must now devise ways to harness it, and expose the proper value propositions, where and when they are needed.
36
Why are we here today?
New protocols for today’s realities
37
Big Disk Availability
> What are the implications of having very large disks?
> What will we use abundant disk for?
> Currently, disk costs about $700/TB
> What types of applications can use it?
> Video files: a 1 GB movie costs 70 cents to store today, and in 5 years perhaps 0.3 cents
• How does this change the use of personal storage?
• What new things will we store if disk is so inexpensive?
38
Bandwidth is Becoming a Commodity
> The price per bit on the optical side fell by 99% in the last 5 years
• This is one of the problems of the current telecom market
> Optical metro: cheap high-bandwidth access
• $1000 a month for 100FX (in major cities)
• This is less than the cost of a T1 several years ago
> Optical long-haul and metro access: a change of price point
• Reasonable prices draw more (non-residential) users
39
Technology Composition
• L3 routing: drops packets as a mechanism (10^-3 loss looks good)
• Circuit switching: sets up the link ahead of time
• Optical networking: bit-transmission reliability (error rate 10^-9 to 10^-12)
• L3 delay: almost no delay in the optical layers
• Routing protocols are slow; optics restores in 50 ms
• Failure mechanism: redundancy
• DWDM lambda tradeoff: higher bandwidth per lambda vs. more lambdas
• For agile L1-L2 routing, one may need to compromise on bandwidth
• RPR: breaks L3 geographical subnetting
• Dumb network, smart edge? Or the opposite?
Diagram: a fiber bundle vs. a single fiber; overlay networks layered as data networking over optical switching over optical transport over fiber; SONET OC-3/12/48/192, STS-N TDM, STS-Nc Ethernet, VTs; link rates from 1M to 1000M.
40
• An economical, scalable, high-quality network has to be a dumb network.
• Deployment of a dumb network is overdue.
41
Underlay Optical Networks
> Problem
• Radical mismatch between the optical transmission world and the electrical forwarding/routing world. Currently, a single strand of optical fiber can transmit more bandwidth than the entire Internet core.
• Mismatch between L3 core capabilities and disk cost: with $2M of disks (2 PB) one can fill the entire Internet core for a year
> Approach
• A service architecture interacts with the optical control and provides applications a dedicated, on-demand, point-to-point optical link that is not on the public Internet
> Current Focus
• Grid computing, OGSA, MEMS, 10GE, optical technologies
• The OMNInet testbed in Chicago, which will be connected to major national and international optical networks
> Potential Impact
• Enabling technology for data-intensive applications (multi-terabytes)
42
Service Composition
Current peering is mainly at L3. What can be done at L1-L2?
• The appearance of optical access, metro, and regional networks
• L1-L2: connectivity service composition
• Across administrative domains
• Across functionality domains (access, metro, regional, long-haul, under-sea)
• Across boundaries (management, trust, security, control, technologies)
• Peering, brokering, measurement, scalability
• The appearance of UNI and NNI standards
Diagram: a client reaches a server across access (Provider A, Provider B, Trust C), metro (Technology G, Provider F, Control E), regional (Admin L, Trust T, Security S), and long-haul (latency P, bandwidth Q, resiliency R) segments.
43
ASF with Leased Pipes
• Very expensive and lengthy provisioning of leased pipes
• Maintenance of many PHYs is cumbersome and doesn't scale
• A pipe is a single point of failure; there is no work-around other than purchasing a companion leased pipe (and wasting good money on it)

Do outages cost money? Hourly business downtime costs:
Brokerage operations: $6,450,000
Credit card sales authorizations: $2,600,000
Pay-per-view TV: $150,000
Home shopping TV: $113,000
Catalog sales: $90,000
Airline reservations: $90,000
Tele-ticket sales: $69,000
Package shipping: $28,000
ATM fees: $14,500

Diagram: an enterprise connected to Data Center A and Data Center B over leased pipes.
44
Summary
> The Grid problem: resource sharing and coordinated problem solving in dynamic, multi-institutional virtual organizations
> Grid architecture: emphasizes protocol and service definition to enable interoperability and resource sharing
> The Globus Toolkit: a source of protocol and API definitions, and reference implementations
> Communication today is static; the next wave is dynamic optical VPNs
> Some relation to Sahara
• Service composition: computation, servers, storage, disk, network, …
• Sharing, cooperating, peering, brokering, …
45
Improvements in Large-Area Networks
> Network vs. computer performance
• Computer speed doubles every 18 months
• Network speed doubles every 9 months
• Difference: an order of magnitude per 5 years
> 1986 to 2000
• Computers: x 500
• Networks: x 340,000
> 2001 to 2010
• Computers: x 60
• Networks: x 4000
Moore's Law vs. storage improvements vs. optical improvements. Graph from Scientific American (Jan 2001) by Cleo Vilett; source: Vinod Khosla, Kleiner Perkins Caufield & Byers.
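The “order of magnitude per 5 years” claim is just the ratio of the two doubling rates; as a quick check over 60 months:
\[ \text{computers: } 2^{60/18} \approx 10\times, \qquad \text{networks: } 2^{60/9} \approx 100\times, \qquad \text{gap: } \frac{2^{60/9}}{2^{60/18}} = 2^{60/18} \approx 10\times \]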
46
Gilder vs. Moore: Impact on the Future of Computing
Chart (log growth, 1995-2007, scale 100 to 10,000 to 1M): processor performance doubles every 18 months; WAN/MAN bandwidth doubles every 9 months; the gap widens by 10x every 5 years.
47
Example: lambdaCAD (CO2 meets Grids)
Diagram: an interactive 3D visualization corridor at headquarters (site B), fed by terabyte-scale datasets (0.8 TB, 1.1 TB, 1.7 TB, 4.2 TB) at site A, site C, and contractor X, each fronted by a CO2 instance, over dynamically provisioned tributaries to site B.
“LambdaCAD”: when the user starts a visualization session, the 5 CO2 instances dynamically pool to negotiate BoD and diversity such that the large (TB+) databases can pump queried data (GB+) into the visualization “crunch” corridor at times t0, t1, ..., tn based on data inter-dependencies. The corridor only needs a modest circular buffer to stage incoming data awaiting processing (as opposed to requiring petabytes of local, possibly outdated, storage).
48
End-to-end Nexus via DRAC
Diagram: customers A and B run applications and services over multiple networks (RPR, ASTN); at each network boundary sits a DRAC instance with AAA and a topology DB; the topology DBs are synchronized and populated by DRAC; a boundary address list maps the physical topology to services. (AAA: authentication, authorization, and accounting.)
49
What happens if:
• 5M businesses connect at 100 Mbps = 500 Terabit/s
• 100M consumers connect at 5 Mbps = 500 Terabit/s
• Together > 1 Petabit/s; constant streaming ≈ 260 exabytes a month
• The current (4/2002) monthly data transfer on the core Internet is 40-70 TB
Can the current Internet architecture handle growth of 5-6 orders of magnitude? What are the technologies and the applications that might drive this growth?
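A quick sanity check on these figures (my arithmetic, not from the slide):
\[ 5\times10^{6} \times 10^{8}\ \text{b/s} = 5\times10^{14}\ \text{b/s} = 500\ \text{Tb/s}, \qquad 10^{15}\ \text{b/s} \times 2.6\times10^{6}\ \text{s/month} \approx 2.6\times10^{21}\ \text{b} \approx 325\ \text{EB} \]
so constant streaming at > 1 Pb/s indeed lands in the hundreds-of-exabytes-per-month range the slide cites.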
51
Some key folks checking us out at our booth, GlobusWORLD ‘04, Jan ‘04
53
Failover from Route-D to Route-A (SURFnet Amsterdam, Internet2 NY, CANARIE Toronto, StarLight Chicago)
54
Diagram (overall system): applications (SDSS, Mouse applications) sit on apps middleware (SRB, meta-scheduler, NMI, IVDSC); the Lambda-Grid (GT3, with DTS and NRS as our contribution) spans the data grid, compute grid, and net grid, is OGSI-fied, and drives resource managers and the control plane down to the network(s).
55
DTS - NRS
Diagram: DTS comprises a data service, replica service, scheduling logic, proposal evaluation, and data calculations, with interfaces to the apps middleware, NMI, GT3, and NRS. NRS comprises a scheduling service, topology map, scheduling algorithm, proposal constructor and evaluator, network allocation, and network calculations, with interfaces to NMI, DTS, GT3, and the optical control.
56
Layered Architecture
Diagram: the Grid layered architecture (Application, Collaborative, Resource, Connectivity, Fabric) instantiated for BIRN: the BIRN Mouse application and BIRN workflow over apps middleware (BIRN toolkit, NMI, WSRF, OGSA); the Lambda Data Grid with NRS at the resource layer over resource managers (DB, storage, computation, lambda); TCP/HTTP and UDP over IP at the connectivity layer, over optical control (ODIN, optical protocols, optical hw, OMNInet); GridFTP over the fabric (CONNECTION).
Generalized Architecture
Diagram: external Grid services (information service, replica service, workflow service) surround an application middleware layer (data transfer service, data handler services), a resource middleware layer (network resource Grid service: network resource service plus optical path control; storage resource service; processing resource service), and the connectivity and fabric layers (low-level switching services).
Diagram (control interactions): the data grid service plane (DTS, apps middleware, scientific workflow, NMI) talks to the network service plane (NRS) over the optical control network; resource managers coordinate compute, storage, and DB resources (1..n of each) on the data transmission plane.
59
60
From 100 Days to 100 Seconds
61
62
OMNI-View Lightpath map
63
NRS Interface and Functionality
// Bind to an NRS service:
NRS = lookupNRS(address);
// Request cost function evaluation:
request = {pathEndpointOneAddress, pathEndpointTwoAddress,
           duration, startAfterDate, endBeforeDate};
ticket = NRS.requestReservation(request);
// Inspect the ticket to determine success, and to find the currently scheduled time:
ticket.display();
// The ticket may now be persisted and used from another location:
NRS.updateTicket(ticket);
// Inspect the ticket to see if the reservation's scheduled time has changed,
// or verify that the job completed, with any relevant status information:
ticket.display();
64
Overheads - Amortization
Setup time = 48 sec, bandwidth = 920 Mbps
Plot: setup time as a fraction of total transfer time (0-100%) vs. file size (100 MB to 10 TB); the 500 GB point is marked.
When dealing with data-intensive applications, the overhead is insignificant!
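To make the amortization concrete, a worked instance of the plotted ratio (my arithmetic from the stated parameters):
\[ \frac{T_{\text{setup}}}{T_{\text{setup}} + S/B} = \frac{48\ \text{s}}{48\ \text{s} + \dfrac{500 \times 8000\ \text{Mb}}{920\ \text{Mb/s}}} = \frac{48}{48 + 4348} \approx 1.1\% \]
so a 48 s path setup costs about 1% of a 500 GB transfer at 920 Mbps.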
65
Network Scheduling: A Simulation Study
Plot: blocking probability (0-0.8) vs. experiment number (1-6), simulation vs. the Erlang B model.
Plot: blocking probability for under-constrained requests (lower bound 0-100%) vs. experiment number (1-6).
66
OMNInet
Diagram: the OMNInet testbed: four photonic nodes (Lake Shore, S. Federal, W Taylor, Sheridan) linked by 10G DWDM trunks over fiber spans of 5.3, 6.7, 7.2, 10.3, and 2 x 24 km, with Optera Metro 5200 OFAs on all trunks; each node has an 8x8x8 scalable photonic switch and Optera 5200 10Gb/s TSPRs, with 10/100/GE and 10 GE drops through PP8600 switches (2 x GigE) to Grid clusters and Grid storage; 1310 nm 10 GbE WAN PHY interfaces; StarLight interconnect with other research networks (EVL/UIC, LAC/UIC, TECH/NU OM5200s); 10GE LAN PHY (Aug 04); an ASTN control plane.
67
Initial Performance Measure: End-to-End Transfer Time
Timeline: a file transfer request arrives; ODIN server processing; path allocation request; network reconfiguration (25 s); path ID returned; FTP setup (0.14 s); data transfer of 20 GB; path deallocation request; ODIN server processing; file transfer done and path released. Measured segment times: 0.5 s, 3.6 s, 0.5 s, 174 s (the 20 GB transfer), 0.3 s, and 11 s.
68
Transaction Demonstration Timeline (6-minute cycle): over -30 to 660 s, paths are allocated and de-allocated around alternating Customer #1 and Customer #2 transfers, each preceded by its transaction accumulation.
DWDM-RAM Service Control Architecture
Diagram: a GRID service request enters the data grid service plane; a network service request goes to the network service plane (service control); ODIN and UNI-N on the OMNInet control plane drive connection control over the optical control network; the data transmission plane connects data centers (1..n) through L3 routers, L2 switches, and data storage switches along the data path.
70
Radical mismatch: L1 - L3
> Radical mismatch between the optical transmission world and the electrical forwarding/routing world
> Currently, a single strand of optical fiber can transmit more bandwidth than the entire Internet core
> The current L3 architecture can't effectively transmit petabytes or 100s of terabytes
> Current L1-L0 limitations: manual allocation, taking 6-12 months; static
• Static means: not dynamic, no end-point connection, no service architecture, no glue layers, no application-underlay routing
72
73
Path Allocation Overhead as a % of the Total Transfer Time
> The knee point shows the file size above which the overhead is insignificant
Plot: setup time = 2 s, bandwidth = 100 Mbps; overhead fraction (0-100%) vs. file size (0.1 MB to 10 GB); 1 GB marked.
Plot: setup time = 2 s, bandwidth = 300 Mbps; overhead fraction (0-100%) vs. file size (0.1 MB to 10 GB); 5 GB marked.
Plot: setup time = 48 s, bandwidth = 920 Mbps; overhead fraction (0-100%) vs. file size (100 MB to 10 TB); 500 GB marked.
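A small sketch that reproduces these curves (illustrative Java; the class and method names are invented, and the parameters mirror the third plot):

// Overhead fraction of a lambda transfer: setup / (setup + size/bandwidth).
public class PathOverhead {
    // setupSec: path setup time; fileMB: file size in MB; mbps: bandwidth in Mbit/s
    static double overhead(double setupSec, double fileMB, double mbps) {
        double transferSec = fileMB * 8.0 / mbps;   // MB -> Mbit, then divide by rate
        return setupSec / (setupSec + transferSec);
    }

    public static void main(String[] args) {
        // Sweep file sizes for the 48 s / 920 Mbps case of the third plot.
        for (double mb = 100; mb <= 1e7; mb *= 10)
            System.out.printf("%10.0f MB -> %5.1f%% overhead%n",
                              mb, 100 * overhead(48, mb, 920));
        // Around 500 GB (500,000 MB) the overhead drops to about 1%.
    }
}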
74
Packet Switched vs. Lambda Network: Setup Time Tradeoffs
Plot (optical path setup time = 2 s): data transferred (0-250 MB) vs. time (0-7 s) for packet switched at 300 Mbps and lambda switched at 500 Mbps, 750 Mbps, 1 Gbps, and 10 Gbps.
Plot (optical path setup time = 48 s): data transferred (0-5000 MB) vs. time (0-120 s) for the same rates.
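The crossover where the lambda path overtakes the packet path follows from equating the two curves (my derivation, consistent with the plots): by time t the packet network has moved R_p t while the lambda has moved R_lambda (t - T_s), so
\[ t^{*} = \frac{R_{\lambda} T_{s}}{R_{\lambda} - R_{p}}, \qquad \text{e.g. } t^{*} = \frac{1\,\text{Gbps} \times 2\ \text{s}}{1\,\text{Gbps} - 0.3\,\text{Gbps}} \approx 2.9\ \text{s} \]
for a 2 s setup; with a 48 s setup the same rates give \( t^{*} \approx 69 \) s.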
75
Network Resource Manager
Diagram: a using application (DMS) and a scheduling/optimizing application drive the Network Resource Manager through an end-to-end-oriented allocation interface; a segment-oriented topology and allocation interface connects the NRM downward to the OMNInet network manager (ODIN) and OMNInet data interpreter, and to network-specific network managers and data interpreters. Items in blue are planned.
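A sketch of how these two interface levels might look in code (illustrative Java; every name here is invented for the example, not the project's actual API):

import java.util.Date;
import java.util.List;

// End-to-end-oriented allocation interface: what a using application (DMS) sees.
interface EndToEndAllocation {
    // Reserve a path between two endpoints for a duration inside a window.
    Ticket requestReservation(String endpointA, String endpointB,
                              long durationSec, Date startAfter, Date endBefore);
    void release(Ticket ticket);
}

// Segment-oriented interface: what the NRM uses to talk to a specific network
// manager (e.g., ODIN for OMNInet) in terms of topology segments.
interface SegmentAllocation {
    List<String> segmentsBetween(String endpointA, String endpointB);
    boolean allocateSegment(String segmentId, Date from, Date to);
    void releaseSegment(String segmentId);
}

// Opaque reservation handle returned to applications.
record Ticket(String id, Date scheduledStart) {}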
76
Data Management Service
> Uses standard FTP (the Jakarta Commons FTP client)
> Implemented in Java
> Uses OGSI calls to request network resources
> Currently uses Java RMI for other remote interfaces
> Uses the NRM to allocate lambdas
> Designed for future scheduling
Diagram: a client app drives the DMS, which asks the NRM for a lambda; the data receiver's FTP client then pulls from the data source's FTP server over the allocated lambda.
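The resulting control flow, sketched in illustrative Java on top of the hypothetical interfaces above (the real DMS made these calls via OGSI and Java RMI):

import java.util.Date;

// Sketch of the DMS transfer flow: reserve a lambda, transfer via FTP, release.
public class DataManagementService {
    private final EndToEndAllocation nrm;   // hypothetical NRM client from the sketch above

    DataManagementService(EndToEndAllocation nrm) { this.nrm = nrm; }

    void transfer(String sourceHost, String destHost, String path,
                  long expectedSec, Date deadline) throws Exception {
        // 1. Ask the NRM for an end-to-end lambda inside the allowed window.
        Ticket t = nrm.requestReservation(sourceHost, destHost,
                                          expectedSec, new Date(), deadline);
        try {
            // 2. Once the path is up, move the file with a plain FTP client
            //    (the real DMS used the Jakarta Commons FTP client here).
            ftpCopy(sourceHost, destHost, path);
        } finally {
            // 3. Always hand the lambda back.
            nrm.release(t);
        }
    }

    private void ftpCopy(String src, String dst, String path) {
        /* FTP details elided in this sketch */
    }
}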
77
File Transfer Times
Plot: file size (1, 2, 5, 10 Gb) vs. transfer time (0-1000 s) for DWDM-RAM (over OMNInet) and FTP (over the Internet).
78
Fixed Bandwidth List Scheduling
79
This scenario shows three jobs being scheduled sequentially: A, B, and C. Job A is initially scheduled to start at the beginning of its under-constrained window. Job B can start after A and still satisfy its limits. Job C is more constrained in its runtime window, but is a smaller job. The scheduler adapts to this conflict by intelligently rescheduling each job so all constraints are met (see the sketch below).

Job | Run time | Window
A | 1.5 hours | 8am - 1pm
B | 2 hours | 8:30am - 12pm
C | 1 hour | 10am - 12pm
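A greedy list-scheduling sketch of this example (illustrative Java; the slide does not specify the scheduler's policy, so earliest-deadline-first with earliest-fit placement is an assumption that happens to satisfy all three windows):

import java.util.*;

// Greedy list scheduling on a single fixed-bandwidth lambda: sort jobs by
// deadline (EDF), then give each the earliest feasible slot in its window.
public class ListScheduler {
    record Job(String name, double hours, double release, double deadline) {}

    public static void main(String[] args) {
        List<Job> jobs = new ArrayList<>(List.of(
            new Job("A", 1.5, 8.0, 13.0),     // 8am - 1pm
            new Job("B", 2.0, 8.5, 12.0),     // 8:30am - 12pm
            new Job("C", 1.0, 10.0, 12.0)));  // 10am - 12pm
        jobs.sort(Comparator.comparingDouble(Job::deadline));

        List<double[]> busy = new ArrayList<>();   // placed [start, end) intervals
        for (Job j : jobs) {
            double start = j.release();
            boolean moved = true;
            while (moved) {                        // slide past any conflicting interval
                moved = false;
                for (double[] iv : busy)
                    if (start < iv[1] && iv[0] < start + j.hours()) {
                        start = iv[1];
                        moved = true;
                    }
            }
            if (start + j.hours() <= j.deadline()) {
                busy.add(new double[]{start, start + j.hours()});
                System.out.printf("%s runs %.1f - %.1f%n", j.name(), start, start + j.hours());
            } else {
                System.out.println(j.name() + " cannot be scheduled");
            }
        }
        // Output: B runs 8.5 - 10.5, C runs 10.5 - 11.5, A runs 11.5 - 13.0
    }
}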
80
Sample Measurements
Timeline marks at 0, 2, 14, 1000, 1200, and 13,000 ms (not to scale). Stages: disaster trigger and processing (< 1 ms), signal and response (1.4 ms), photonic MEMS control (12 ms), inter-process communication (3 ms), Ethernet switch QoS control (1150 ms), VLAN/Spanning Tree convergence (12 s).
> Measurements were taken with clocks synchronized using NTP
> Layer 1 link setup and IP QoS reconfiguration took around 1.2 seconds
> The VLANs/Spanning Tree took an additional 12 seconds to converge
> Further work with larger networks is needed
81
DARPA DANCE Demo (May 31st, 2002)
> A notional view of an EvaQ8 end-to-end network
> Automatic optical path setup on a disaster trigger
> Sample measurements
Diagram: a disaster event/environment sensor in the disaster area issues control messages; Ethernet switches and L2-L7 switches running EvaQ8 SW connect through a MEMS switch (OG-1, OG-2, OG-3) over 10 GE and 100 Mbps links, under an ASTN control plane, to a safe end and a crisis center.
82
20GB File Transfer
83
Multi-domain provisioning
84
Integration of technologies
> Evaluating the implementation implications and testing a number of scenarios