
The Network Management Problem. What Network operators must be able to do


What network operators must be able to do

The requirement for network management

• Provisioning
• Detecting faults
• Checking (and verifying) performance
• Billing/accounting
• Initiating repairs or network upgrades
• Maintaining the network inventory

The issues are:

• Bringing the managed data to the code
• Scalability
• The shortage of development skills for creating management systems
• The shortage of operational skills for running networks

Bringing the Managed Data to the Code

• Managed objects reside on many SNMP agent hosts.
• Copies of managed objects reside on SNMP management systems.
• Changes in agent data may have to be regularly reconciled with the management system copy.
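The reconciliation step can be sketched as a comparison between what the agent currently holds and what the management system has stored. This is a minimal illustration, not a real SNMP client: both sides are modeled as plain dictionaries, the row keys such as "vc.1" are hypothetical, and the agent is assumed to be the source of truth.

```python
from typing import Any, Dict, List, NamedTuple


class Diff(NamedTuple):
    added: List[str]    # rows on the agent that the NMS copy lacks
    removed: List[str]  # rows in the NMS copy that the agent no longer has
    changed: List[str]  # rows on both sides whose values differ


def reconcile(agent_rows: Dict[str, Any], nms_rows: Dict[str, Any]) -> Diff:
    """Compare the agent's data with the NMS copy and report the differences."""
    added = sorted(k for k in agent_rows if k not in nms_rows)
    removed = sorted(k for k in nms_rows if k not in agent_rows)
    changed = sorted(
        k for k in agent_rows if k in nms_rows and agent_rows[k] != nms_rows[k]
    )
    return Diff(added, removed, changed)


def apply_diff(agent_rows: Dict[str, Any], nms_rows: Dict[str, Any], diff: Diff) -> None:
    """Update the NMS copy in place so that it matches the agent (assumed authoritative)."""
    for key in diff.added + diff.changed:
        nms_rows[key] = agent_rows[key]
    for key in diff.removed:
        del nms_rows[key]
```

In practice the cost of this comparison grows with the number of agents and rows, which is one reason reconciliation becomes a scalability concern.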

Scalability: Today's Network Is Tomorrow's NE

Layer 2 VPN Scalability

Virtual Circuit Status Monitoring

• A new type of MIB object.
• Compression software facilities in the agents and managers. To a degree, this could be considered to run counter to the philosophy of simplicity associated with SNMP.

MIB Note: Scalability

• Status (e.g., becoming congested or going out of service)
• Faults such as an intermediate node/link failure or receipt of an invalid MPLS label
• Deletion by a user via a CLI (i.e., outside the management system)
• Modification by a user (changing the administrative status from up to down)

Other Enterprise Network Scalability Issues

• Storage solutions, such as adding, deleting, modifying, and monitoring SANs
• Administration of firewalls, such as rules for permitting or blocking packet transit
• Routers, such as access control lists and static routes
• Security management, such as encryption keys, biometrics facilities, and password control
• Application management

Light Reading Trials

• MPLS throughput
• Latency
• IP throughput at OC-48
• IP throughput at OC-192

Large NEs

• They reduce the number of devices required, saving central office (CO) space and reducing cooling and power requirements.
• They may help to reduce cabling by aggregating links.
• They offer a richer feature set.

Disadvantages

• They are harder to manage.
• They potentially generate vast amounts of management data.
• They are a possible single point of failure if not backed up.

Controlling the network may not be possible because of:

• Process priority clashes
• SNMP message queue sizes that are too small
• Excessive I/O interrupts

Expensive (and Scarce) Development Skill Sets

• Object-oriented development and modeling using Unified Modeling Language (UML) for capturing requirements, defining actors (system users) and use cases (the principal transactions and features), and mapping them into software classes
• Java/C++
• GUI, often packaged as part of a browser and providing access to network diagrams, provisioning facilities, faults, accounting, and so on

• Server software for long-running, multiclient FCAPS processes
• Specific support for mature/developing features, such as ATM/MPLS
• CORBA for multiple programming languages and remote object support across heterogeneous environments
• Database design/upgrade: matching MIB to database schema across numerous NMS/NE software releases

• Deployment and installation issues: performance is always an important deployment issue, as is ease of installation
• IP routing
• MPLS
• Layer 2 technologies such as ATM, FR, and Gigabit Ethernet
• Legacy technologies such as voice-over-TDM and X.25

• Ability to develop generic software components and models: the management system can hide much of the complex underlying detail of running the network
• Client/server design
• Managed object design, part of the modeling phase for the management system
• MIB design: often there is a need for new objects in the managed devices to support the management system

• A solution mindset
• Distributed, creative problem solving
• Taking ownership
• Acquiring domain expertise
• Embracing short development cycles
• Minimizing code changes
• Strong testing capability

Developer Note: A Solution Mindset

• Clear economic value
• Fulfillment of important requirements
• Resolution of one or more end-user problems

Examples of management system solutions include the following:

• Providing minimal management support for third-party devices
• Creating generic management system components that can be used across numerous different products and technologies
• Aiming for technology-independent software infrastructure using standard middleware

Developer Note: Distributed, Creative Problem Solving

• Software bugs
• NE bugs (can be very hard to identify)
• Performance bottlenecks in any of the FCAPS applications due to congestion in the network, DBMS, agent, manager, and so on
• Database problems such as deadlocks, client disconnections, log files filling up, and so on

Developer Note: Distributed, Creative Problem Solving

• Client applications crashing intermittently
• MIB table corruption, such as a number of set operations that only partially succeed: for example, three setRequests (against a MIB table) are sent but one message results in an agent timeout and the other two are successful, which could leave the table in an inconsistent state
• SNMP agent exceptions
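The partial-success case can be reproduced with a small simulation. The sketch below is one possible mitigation, not something the slides prescribe: FakeAgent is a hypothetical stand-in for an SNMP agent, the OID strings are invented, and the manager restores previous values when any set in a batch times out.

```python
class FakeAgent:
    """Hypothetical stand-in for an SNMP agent; sets time out for OIDs in `flaky`."""

    def __init__(self, table, flaky=()):
        self.table = dict(table)
        self.flaky = set(flaky)

    def get(self, oid):
        return self.table[oid]

    def set(self, oid, value):
        if oid in self.flaky:
            raise TimeoutError(f"no response for {oid}")
        self.table[oid] = value


def set_all_or_rollback(agent, requests):
    """Send each set in turn; if one times out, restore the values already changed.

    Returns True when every set succeeded, False when a rollback was performed.
    """
    undo = []  # (oid, previous value) for each set that has succeeded so far
    for oid, value in requests:
        previous = agent.get(oid)
        try:
            agent.set(oid, value)
        except TimeoutError:
            # Undo the earlier sets in reverse order so the table is consistent again.
            for done_oid, old_value in reversed(undo):
                agent.set(done_oid, old_value)
            return False
        undo.append((oid, previous))
    return True
```

A timeout does not prove that the agent ignored the request, so a real manager would probably re-read the affected row before deciding how to repair it.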

The excellent tools available:

• UML support packages
• Java/C++/SDL products
• Version control
• Debuggers

Developer Note: Taking Ownership

Developer Note: Acquiring Domain Expertise and Linked Overviews

• Layer 2 and layer 3 traffic engineering
• Layer 2 and layer 3 QoS
• Network management
• Convergence of legacy technologies into IP. Many service providers have built large IP networks in anticipation of forecasted massive demand. These IP networks are, in many cases, not profitable, so service providers are keen to push existing revenue-generating services (such as layer 2) over them.

Developer Note: Acquiring Domain Expertise and Linked Overviews

• Backward and forward compatibility of new technologies, such as MPLS. An example is that of a service provider with existing, revenue-generating services such as ATM, FR, TDM, and Ethernet. The service provider wants to retain customers but migrate the numerous incoming services into a common MPLS core.

Linked Overviews

Developer Note: An ATM Linked Overview

• ATM is a layer 2 protocol suitable for deployment in a range of operational environments (in VLANs and ELANs, in the WAN, and also in SP networks).
• ATM offers a number of different categories and classes of service. The required service level is enforced by switches using policing (traffic cop function), shaping (modifying the traffic interarrival time), marking (for subsequent processing), and dropping.
• Traffic is presented to an ATM switch and converted into a stream of 53-byte ATM cells.
• The stream of cells is transmitted through an ATM cloud.

Developer Note: An ATM Linked Overview

• A preconfigured virtual circuit dictates the route taken by the cell stream. Virtual circuits can be created either manually or using a signaling protocol. If no virtual circuit is present then PNNI can signal switched virtual circuits (SVCs).
• The virtual circuit route passes through intermediate node interfaces and uses a label-based addressing scheme.
• Bandwidth can be reserved along the path of this virtual circuit in what is called a contract.
• Various traffic engineering capabilities are available, such as dictating the route for a virtual circuit.

The essential ATM managed objects can be derived:

• ATM nodes
• A virtual circuit (switched, permanent, or soft) spanning one or more nodes
• A set of interfaces and links
• A set of locally significant labels used for addressing
• An optional route or designated transit list
• A bandwidth contract
• Traffic engineering settings
• QoS details
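The managed objects listed above could be captured in a small data model along the following lines. This is only a sketch: the class and field names are invented for illustration, the labels are assumed to be ATM VPI/VCI pairs, and the service category default ("UBR") is an assumption rather than anything stated in the slides.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class CircuitKind(Enum):
    SWITCHED = "SVC"
    PERMANENT = "PVC"
    SOFT = "SPVC"


@dataclass
class Interface:
    node: str  # name of the ATM node that owns this interface
    name: str  # interface identifier on that node


@dataclass
class Hop:
    interface: Interface
    vpi: int  # locally significant label (assumed to be a VPI/VCI pair)
    vci: int


@dataclass
class VirtualCircuit:
    name: str
    kind: CircuitKind
    hops: List[Hop]
    bandwidth_kbps: int  # the bandwidth contract
    designated_transit_list: Optional[List[str]] = None  # optional explicit route
    service_category: str = "UBR"  # QoS detail; default chosen for illustration

    def nodes(self) -> List[str]:
        """Return the nodes the circuit spans, in path order, without repeats."""
        seen: List[str] = []
        for hop in self.hops:
            if hop.interface.node not in seen:
                seen.append(hop.interface.node)
        return seen
```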

Developer Note: An IP Linked Overview

• IP is packet-based: IP nodes make forwarding decisions with every packet.
• IP is not connection-oriented.
• IP provides a single class of service: best effort.
• IP does not provide traffic engineering capabilities.
• IP packets have two main sections: header and data.
• IP header lookups are required at each hop (with the present line-rate technology, lookups are no longer such a big issue. Routing protocol convergence may cause more problems).

Developer Note: An IP Linked Overview

• IP devices are either hosts or routers (often called gateways).
• Hosts do not forward IP packets; routers do.
• IP devices have routing tables.
• IP operates in conjunction with other protocols, such as OSPF, IS-IS, Border Gateway Protocol 4 (BGP4), and Internet Control Message Protocol (ICMP).
• Large IP networks can be structured as autonomous systems made up of smaller interior areas or levels.
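The per-packet forwarding decision made against a routing table can be illustrated with a longest-prefix-match lookup. The slides do not spell out the matching rule; longest prefix match is the standard one and is assumed here. The table contents and next-hop names are hypothetical, and a linear scan is used only for clarity.

```python
import ipaddress


def lookup(routing_table, destination):
    """Return the next hop for `destination` using longest-prefix match.

    `routing_table` is a list of (prefix, next_hop) pairs, for example
    ("10.0.0.0/8", "r1"). Returns None when no route matches.
    """
    dest = ipaddress.ip_address(destination)
    best = None  # (network, next_hop) for the most specific match so far
    for prefix, next_hop in routing_table:
        network = ipaddress.ip_network(prefix)
        if dest in network and (best is None or network.prefixlen > best[0].prefixlen):
            best = (network, next_hop)
    return best[1] if best else None
```

Because IP is not connection-oriented, a router repeats this kind of lookup for every packet it forwards.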

The essential managed objects of IP are:

• IP nodes (routers, hosts, clients, servers)
• IP interfaces
• IP subnets
• IP protocols (routed protocols such as TCP/IP and routing protocols such as OSPF and IS-IS)
• Interior Gateway Protocol (IGP) areas (OSPF) or levels (IS-IS)
• Exterior Gateway Protocol (EGP) autonomous systems

Embracing Short Development Cycles

• Reduced feature sets in more frequent releases
• Foundation releases
• Good upgrade paths
• Getting good operational feedback from end users

Minimizing Code Changes

Elements of NMS Development

NMS Development

• Using a browser-based GUI, the developer has provisioned onto the network a managed object such as an ATM virtual circuit or an MPLS LSP.
• The developer wants to check that the software executed the correct actions.
• During provisioning, the developer verifies that the correct Java code executed using a Java console and trace files (similar actions can be done for C/C++ systems).
• The database is updated by the management system code, and this can be checked by running an appropriate SQL script.
• The next step is verifying that the correct set of managed objects was written to the NE. To do this, the developer uses a MIB browser to check that the row object has been written to the associated agent MIB.
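The database check and the MIB check in the steps above can be combined into one automated comparison. The following is a sketch under stated assumptions: the table name virtual_circuit and its columns are invented, SQLite stands in for the NMS database, and a dictionary stands in for what a MIB browser would show for the agent.

```python
import sqlite3


def verify_provisioned(conn, agent_mib, name):
    """Check that a provisioned object exists in the NMS database and on the agent.

    `conn` is a sqlite3 connection with a hypothetical table
    virtual_circuit(name, admin_status); `agent_mib` maps object names to the
    admin status read from the agent.
    """
    row = conn.execute(
        "SELECT admin_status FROM virtual_circuit WHERE name = ?", (name,)
    ).fetchone()
    in_database = row is not None
    in_agent = name in agent_mib
    # Consistent only when both copies exist and agree on the status.
    consistent = in_database and in_agent and row[0] == agent_mib[name]
    return {"in_database": in_database, "in_agent": in_agent, "consistent": consistent}
```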

Other skills are:

• Data analysis: matching NE data to the NMS database schema
• Data analysis: defining NMS-resident objects that exist in complex component form in the network (an example is a VPN, as discussed earlier in this chapter)
• Upgrade considerations for when MIBs change (as they regularly do)
• UML, Java, and object-oriented development
• Class design for major NMS features, like MPLS provisioning
• GUI development
• Middleware using CORBA-based products
• Insulating applications from low-level code

When MIBs Change: Upgrade Considerations

• Deprecate old objects no longer in use; don't delete them from the MIB if at all possible.
• Keep the MIB object identifiers sequential; add new OIDs as necessary.
• Don't change any existing OIDs in MIBs that are currently in use by the NMS. RFC 2578 provides guidelines for this.
• Ensure that MIB files do not have to be changed in order to work with management systems.
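Some of these rules can be checked mechanically when a new MIB release arrives. The sketch below assumes each MIB has already been reduced to a mapping from object name to OID string (the parsing step is not shown), and it covers only the "don't delete" and "don't change existing OIDs" rules. The function name and message wording are invented for illustration.

```python
def check_mib_upgrade(old_mib, new_mib):
    """Compare two MIB releases, each given as {object name: OID string}.

    Returns a list of human-readable problems; an empty list means the new
    release keeps every existing object at its existing OID.
    """
    problems = []
    for name, oid in old_mib.items():
        if name not in new_mib:
            problems.append(f"{name} was deleted; mark it deprecated instead")
        elif new_mib[name] != oid:
            problems.append(f"{name} changed OID from {oid} to {new_mib[name]}")
    old_oids = set(old_mib.values())
    for name, oid in new_mib.items():
        # A new object must not take over an OID that an older object used.
        if name not in old_mib and oid in old_oids:
            problems.append(f"{name} reuses existing OID {oid}")
    return problems
```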

UML, Java, and Object-Oriented Development

• Structured classification (use cases, classes, components, and nodes)
• Dynamic behavior (describes system changes over time)
• Model management (organization of the model itself)

Class Design for Major NMS Features

• GUI Development
• Middleware Using CORBA-Based Products
• Insulating Applications from Low-Level Code

MPLS: Second Chunk

• Explicit Route Objects (ERO), strict and loose
• Resource blocks
• Tunnels and LSPs
• In-segments
• Out-segments
• Cross-connects
• Routing protocols
• Signaling protocols
• Label operations: lookup, push, swap, and pop
• Traffic engineering
• QoS

Label Operations

• Lookup: The node examines the value of the topmost label. This operation occurs at every node in an MPLS cloud. In our example, lookup would occur using Label2. Typically, a label lookup results in the packet being relabeled and forwarded through a node interface indicated by the incoming label.
• Swap: This occurs when an MPLS node replaces the label with a new one.
• Pop: This occurs when the topmost label is removed from the stack. If the label stack has a depth of one, then the packet is no longer MPLS-encapsulated. In this case, an IP lookup can be performed using the IP header.
• Push: This occurs when a label is either pushed onto the label stack or attached to an unlabeled packet.
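The four label operations can be modeled on a plain list, with the topmost label at index 0. This is a teaching sketch rather than router code: the label values are arbitrary, and an empty list is used to represent a packet that is no longer MPLS-encapsulated.

```python
def lookup(stack):
    """Return the topmost label, or None if the packet carries no labels."""
    return stack[0] if stack else None


def push(stack, label):
    """Place a new label on top of the stack (or label an unlabeled packet)."""
    return [label] + stack


def swap(stack, new_label):
    """Replace the topmost label with a new one."""
    if not stack:
        raise ValueError("cannot swap on an unlabeled packet")
    return [new_label] + stack[1:]


def pop(stack):
    """Remove the topmost label; an empty result means plain IP forwarding applies."""
    if not stack:
        raise ValueError("cannot pop an unlabeled packet")
    return stack[1:]
```

Each function returns a new list rather than modifying its argument, which keeps the before and after states easy to compare.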

MPLS Encapsulation

• 0 – IPv4 explicit null that signals the receiving node to pop the label and execute an IP lookup
• 1 – Router alert that indicates to the receiving node to examine the packet more closely rather than simply forwarding it
• 2 – IPv6 explicit null
• 3 – Implicit null, which a node advertises to signal its upstream neighbor to pop the label rather than swap it (the value itself never appears in a packet)
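The reserved label values listed above can be held in a small lookup table. The sketch assumes the 20-bit label field and the reserved range 0 to 15 defined in RFC 3032; values in that range beyond the four listed here are reported only as "reserved". The function name is invented for illustration.

```python
RESERVED_LABELS = {
    0: "IPv4 Explicit NULL",
    1: "Router Alert",
    2: "IPv6 Explicit NULL",
    3: "Implicit NULL",
}

MAX_LABEL = 0xFFFFF  # the label field is 20 bits wide


def describe_label(value):
    """Classify an MPLS label value as a named reserved label, reserved, or ordinary."""
    if not 0 <= value <= MAX_LABEL:
        raise ValueError(f"label {value} does not fit in 20 bits")
    if value in RESERVED_LABELS:
        return RESERVED_LABELS[value]
    if value <= 15:
        return "reserved"
    return "ordinary label"
```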

Summary

There are some serious problems affecting network management. Bringing managed data and code together is one of the central foundations of computing and network management. Achieving this union of data and code in a scalable fashion is a problem that gets more difficult as networks grow