Best practices for building network operations center

Satish Chavan

Network Operations Center

A network operations center (NOC, pronounced like the word "knock"), also known as a "network management center", is one or more locations from which network monitoring and control, or network management, is exercised over a computer, telecommunication or satellite network.

History

Early versions of NOCs have been around since the 1960s. AT&T opened a Network Control Center in New York in 1962 that used status boards to display switch and route information, in real time, from AT&T's most important toll switches. AT&T replaced the Network Control Center with a NOC in Bedminster, New Jersey, in 1977. AT&T revamped and modernized the NOC in 1987, adding a 75-screen video wall on which computer-driven support systems provided information on multiple layers and categories of network activity. Managers used computer systems and terminals to find detailed information on any switch or route in the network, and then used those same systems to issue instructions to any place in the network.

Global Network Operations Center

AT&T's system had become a Worldwide Intelligent Network. Two regional control centers, in Denver and Conyers, Ga., opened in 1991 and assumed the task of monitoring and managing the flow of traffic onto and off the network. In 1999, AT&T replaced the NOC with a new Global Network Operations Center to better meet the needs of the 21st century.

Network Operations Center - Purpose

In telecommunication environments, NOCs are responsible for monitoring power failures, the access network, connectivity, communication equipment alarms and other performance issues that may affect the telecom network and services.

A NOC is usually staffed 24×7 with personnel who continuously monitor for outages, faults, critical events, and abnormalities in the network. These events are reported by network monitoring software installed on the network or on the individual devices being monitored. At fixed time intervals, each device on the network checks in with a central manager to provide vital statistics on its health. The work requires a high level of expertise and an understanding of various technology platforms. This proactive monitoring helps ensure that problems with the network are detected and fixed before they can cause significant impact on the business.
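The periodic check-in described above can be illustrated with a minimal Python sketch. It is not taken from any particular monitoring product: the class name, the device names, the 60-second interval and the tolerance of three missed check-ins are all assumptions chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatMonitor:
    """Hypothetical central manager that records device check-ins."""

    interval_s: float = 60.0   # assumed check-in interval in seconds
    missed_allowed: int = 3    # assumed number of missed check-ins tolerated
    last_seen: dict = field(default_factory=dict)

    def check_in(self, device: str, now: float) -> None:
        # Record the time (in seconds) at which the device last reported.
        self.last_seen[device] = now

    def overdue(self, now: float) -> list:
        # A device is overdue once it has been silent for longer than
        # interval_s * missed_allowed seconds.
        limit = self.interval_s * self.missed_allowed
        return sorted(d for d, t in self.last_seen.items() if now - t > limit)


monitor = HeartbeatMonitor()
monitor.check_in("core-router-1", now=0.0)
monitor.check_in("edge-switch-7", now=0.0)
monitor.check_in("core-router-1", now=170.0)

# edge-switch-7 has been silent for 200 s, which exceeds the 180 s limit.
print(monitor.overdue(now=200.0))
```

In a real NOC the overdue list would typically feed the alarm and ticketing workflow rather than being printed.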

Network Operations Center - CSP Network

Network Operations Center - Operations 1

NOC Operate – Level 1 support
• Proactive alarm monitoring 24x7
• Issue ticket management per service level agreements (SLA)
• Fault management

NOC Operate – Level 2 support
• Higher-level support for fault management
• Change execution
• Root cause analysis
• Co-ordination with TAC

NOC Operate – Level 3 support
• Change validation
• Problem management
• Co-ordination with TAC

NOC Operate – Performance Management
• Performance monitoring and reporting
• Analysis and improvement suggestions

Network Operations Center - Operations 2

NOC Operate – Configuration
• Configuration of new network elements (NEs)
• Integration of new NEs with the NOC
• Addition of a new route, patch or area to the network

Categories based on time
• Full-time surveillance
• Only after-hours backup/disaster recovery service

NOC Consulting
• Build, operate, transfer service

NOC - Key characteristics & Business benefits

Key characteristics

1. Skilled staff
2. Focus on performance
3. Efficient processes
4. Integrated set of tools
5. Automation and intelligent tools
6. Managing service performance
7. Focus on security
8. Being proactive
9. Quality consistency

Business benefits

1. Quality consistency
2. Better traffic/resource management
3. Lower cost
4. Higher security
5. Reduced business impact through a proactive approach
6. Customer satisfaction index

NOC - Standards

FCAPS is the ISO Telecommunications Management Network model and framework for network management. It defines five areas, which give the acronym FCAPS:
• Fault Management
• Configuration Management
• Accounting (Administration)
• Performance Management
• Security Management

The FCAPS model can be seen as bottom-up, or network-centric. The FAB model, defined in the Business Process Framework (eTOM), looks at the processes top-down and is customer- and business-centric. FAB is short for fulfillment, assurance, billing. The two management protocol standards that have emerged are the Simple Network Management Protocol (SNMP) from the IETF and the Common Management Information Protocol (CMIP) from the ITU-T.

NOC - FCAPS

1. Fault management deals with the process of recognizing, isolating, and resolving a fault that occurs in the network. Identification of potential network issues also falls under fault management.

2. Configuration management involves collection and storage of configuration from various network devices, and includes tracking changes to a device configuration. Because many network issues are due to configuration changes gone wrong, this can be considered an important contribution to proactive network management and monitoring.

3. Accounting applies to service-provider networks where network resource utilization is tracked and then the information is used for billing or charge-back. In networks where billing does not apply, accounting is replaced with administration, which refers to administering end-users in the network with passwords, permissions, etc.

4. Performance management involves managing overall network performance. Data for parameters associated with performance, such as throughput, packet loss, response times, utilization, etc., are collected mostly using SNMP.

5. Security is another important area of network management. Security management in FCAPS covers the process of controlling access to resources in the network, which include data as well as configurations, and protecting user information from unauthorized users.
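As an illustration of the performance management item above, the following Python sketch derives link utilization from two readings of an interface octet counter, such as the values an SNMP poller might return for `ifInOctets`. The function name and the sample numbers are assumptions made for the example, and the readings are taken as already collected: the sketch shows only the arithmetic, not an SNMP query.

```python
def utilization_percent(octets_start, octets_end, seconds, link_bps,
                        counter_bits=32):
    """Average link utilization between two octet-counter readings.

    Hypothetical helper: the modulo step tolerates one counter wrap
    between the two readings, which can happen with 32-bit counters.
    """
    if seconds <= 0 or link_bps <= 0:
        raise ValueError("seconds and link_bps must be positive")
    modulus = 2 ** counter_bits
    delta_octets = (octets_end - octets_start) % modulus
    # Octets are converted to bits, then divided by the link capacity
    # available during the polling interval.
    return delta_octets * 8 / (seconds * link_bps) * 100.0


# Assumed readings taken 300 s apart on a 100 Mbit/s link.
usage = utilization_percent(1_000_000, 376_000_000, seconds=300,
                            link_bps=100_000_000)
print(round(usage, 2))  # about 10 percent
```

On fast links a 32-bit counter can wrap more than once between polls, so shorter polling intervals or 64-bit counters are usually preferred there.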

NOC - ITIL

FCAPS from an ITIL Perspective

FCAPS: Fault Management
• Includes detecting, isolating and resolving network problems
ITIL:
• Service Operation: Event Management, Incident Management

FCAPS: Configuration Management
• Gathers and stores network and system configuration information
• Tracks changes
• Simplifies the change process
ITIL:
• Service Transition: Change and Configuration Management

FCAPS: Accounting Management
• Facilitates better distribution of resources
• Measures resource usage
• Helps reduce operational cost and establishes better control
ITIL:
• Service Strategy: Financial Management
• Service Design: Service Level Management
• Service Operation: Technical and Application Management


FCAPS: Performance Management
• Helps understand the current network health and efficiency
• Includes measuring various performance metrics
• Ensures service availability and performance at an optimal level
• Unnoticed problems might lead to Event Management and Incident Management
ITIL:
• Service Design: Capacity & Availability Management
• Service Operation: Technical and Application Management
• Continual Service Improvement: improves quality of service; includes standardizing and baselining the quality achieved

FCAPS: Security Management
• Maintains the confidentiality of user and business information
• Includes protecting the network from unauthorized users
• Controls overall activities and ensures data security through authentication and encryption
ITIL:
• Service Design: Information Security Management
• Service Operation: Access Management (process), Technical and Application Management (function)

NOC - Network Monitoring

Common practices define the basic components that are essential for network monitoring and are applicable to every network. Best practices for monitoring are guidelines for implementing a good network monitoring strategy. Adopting them can help network admins streamline their monitoring so that issues are identified and resolved faster, with a lower MTTR (Mean Time To Resolve).

Best Practices

• Baseline network behavior: Baselining network behavior over a couple of weeks or even months helps the network admin understand what normal behavior in the network is. Knowledge of baseline behavior aids proactive troubleshooting and can even prevent network downtime.

• Escalation matrix: Network issues often become problems because alerts triggered by a threshold are ignored or the right person is not alerted. In a large network, there can be multiple administrators or people who take care of different aspects of the network, so an escalation policy should define who is alerted when a malfunction occurs or a potential problem is detected. An escalation matrix and plan ensure that issues are looked at and resolved on time.
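One possible way to express an escalation matrix in code is shown in the Python sketch below. The severities, time limits and contact names are invented for illustration and would come from the organization's own escalation policy.

```python
# Hypothetical escalation matrix: for each severity, a list of
# (minutes an alert has gone unacknowledged, who is notified from then on).
ESCALATION_MATRIX = {
    "critical": [(0, "NOC Level 1"), (15, "NOC Level 2"),
                 (30, "NOC Level 3"), (60, "NOC manager")],
    "major": [(0, "NOC Level 1"), (60, "NOC Level 2"), (240, "NOC Level 3")],
    "minor": [(0, "NOC Level 1"), (480, "NOC Level 2")],
}


def who_to_notify(severity, minutes_unacknowledged):
    """Return every contact whose escalation step has been reached."""
    steps = ESCALATION_MATRIX[severity]
    return [contact for after_min, contact in steps
            if minutes_unacknowledged >= after_min]


# A critical alert ignored for 20 minutes has reached Level 1 and Level 2.
print(who_to_notify("critical", 20))
```

Keeping the matrix as plain data makes it easy to review and change without touching the notification logic.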

NOC - Network Monitoring

• Reports at every layer: Networks function based on the OSI model. Using a monitoring system that supports multiple technologies to monitor at all layers, as well as different types of devices in the network, makes problem detection and troubleshooting easier. Thus, when an application delivery fails, the monitoring system can indicate whether it is a server issue, a routing problem, a bandwidth problem, or a hardware malfunction.

• Implement High Availability with failover options: Most monitoring systems are set up in the network they monitor. But if a problem occurs and the network goes down, the monitoring system can go down too. It is recommended to implement a monitoring strategy with High Availability (HA) through failover. HA ensures that the monitoring system does not have a single point of failure and can still provide the data needed for troubleshooting. To avoid a single point of failure, it is recommended to set up the failover system at a remote DR site.

• Configuration management: Most network issues originate from incorrect configurations. There are several instances where even minor configuration mistakes have led to network downtime or loss of data. Unauthorized configuration changes to devices can lead to serious security lapses that include hacking and data theft.

• Capacity planning and growth: As an organization grows, the infrastructure associated with it should grow too. When setting up a monitoring system, account for future growth.
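The configuration management practice above depends on noticing when a device configuration changes. A minimal sketch of that idea, using Python's standard `difflib` module, is shown below. The function name is hypothetical and the configuration text is a made-up snippet rather than output from a real device.

```python
import difflib


def config_changes(approved_config: str, running_config: str) -> list:
    """List the lines removed from ("-") or added to ("+") the approved config."""
    diff = list(difflib.unified_diff(
        approved_config.splitlines(),
        running_config.splitlines(),
        fromfile="approved", tofile="running",
        lineterm="", n=0,
    ))
    # The first two diff lines are the file headers; hunk markers start
    # with "@@". Everything else beginning with "+" or "-" is a change.
    return [line for line in diff[2:] if line.startswith(("+", "-"))]


# Made-up configuration text, used only for illustration.
approved = "hostname edge-1\ninterface Gi0/1\n description uplink\n no shutdown"
running = "hostname edge-1\ninterface Gi0/1\n description uplink\n shutdown"

for change in config_changes(approved, running):
    print(change)
```

A non-empty result could be used to raise an alert or open a ticket so that unauthorized or mistaken changes are reviewed quickly.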

Essential element in NOC management

Network Operation Center Best Practices in terms of process and tools

1. Ticketing system
A ticketing system enables you to keep track of all open issues, according to severity, urgency and the person assigned to handle them.

2. Knowledge base
A centralized source for all knowledge and documentation that is accessible to your entire team. This knowledge base should be a fluid information source, continuously updated with experiences and lessons learned for future reference and improvements.

3. Reporting
Reports on a daily, weekly and monthly basis should include all major incidents and a root cause for every resolved incident.

4. Monitoring
There are two major types of monitoring processes relevant to a NOC:
• Monitoring infrastructure
• Customer help desk/experience

5. Process automation
Implementing process automation significantly reduces mean time to recovery (MTTR) and helps NOCs meet SLAs by having a procedure in place to handle incident resolution and to consistently provide a high-quality response regardless of the complexity of the process. Examples such as disk space clean-up and process resets help reduce manual, routine tasks.
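To make the disk space example concrete, here is a small Python sketch of an automated check. The 80 and 90 percent thresholds and the action names are assumptions for illustration, and the actual clean-up step is deliberately left out because it depends on the system being managed.

```python
import shutil


def disk_used_percent(path="."):
    """Percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total * 100.0


def plan_action(used_percent, warn_at=80.0, clean_at=90.0):
    """Map a usage figure to a hypothetical automated response."""
    if used_percent >= clean_at:
        return "run clean-up"
    if used_percent >= warn_at:
        return "open ticket"
    return "no action"


used = disk_used_percent(".")
print(f"{used:.1f}% used: {plan_action(used)}")
```

Separating the measurement from the decision keeps the thresholds easy to test and to adjust per SLA.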

Key Factors NOC Performance Management Solution

•Real time complete system-wide visibility.

•Alerting and Reporting

•Monitoring Abilities

•Multi-vendor Support

•Scalability

•Simple Interface

•Easy to Deploy

•Notifications

NOC Service Assurance and Service Management Activities

KPIs & SLAs

1. Number of tickets received and resolved.
2. Number of tickets proactively raised and resolved, based on severity.
3. Number of tickets escalated to technical operations.
4. Number of tickets solved within SLA without escalation to technical operations.
5. Tickets raised within 15 minutes of the occurrence of an alarm.
6. 3rd-party escalation and follow-ups as per SLA.
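Several of the KPIs listed above are simple counts over ticket records. The Python sketch below shows one way they might be computed. The `Ticket` fields and the sample data are hypothetical, and measuring the SLA from the time the ticket was raised is an assumption rather than something the list specifies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Ticket:
    """Hypothetical ticket record; all times are minutes on a shared clock."""
    alarm_min: float               # when the alarm occurred
    raised_min: float              # when the ticket was raised
    resolved_min: Optional[float]  # None while the ticket is still open
    sla_min: float                 # allowed minutes from raise to resolution
    escalated: bool                # escalated to technical operations


def noc_kpis(tickets):
    resolved = [t for t in tickets if t.resolved_min is not None]
    return {
        "received": len(tickets),
        "resolved": len(resolved),
        "escalated": sum(1 for t in tickets if t.escalated),
        "resolved_in_sla_without_escalation": sum(
            1 for t in resolved
            if not t.escalated and t.resolved_min - t.raised_min <= t.sla_min),
        "raised_within_15_min": sum(
            1 for t in tickets if t.raised_min - t.alarm_min < 15),
    }


# Made-up sample data.
tickets = [
    Ticket(alarm_min=0, raised_min=5, resolved_min=60, sla_min=120, escalated=False),
    Ticket(alarm_min=10, raised_min=40, resolved_min=100, sla_min=240, escalated=True),
    Ticket(alarm_min=20, raised_min=30, resolved_min=None, sla_min=240, escalated=False),
]
print(noc_kpis(tickets))
```

In practice these figures would be drawn from the ticketing system and broken down by severity and reporting period.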

Service Management Activities

• Surveillance/Fault
• Incident Management
• Problem Management
• SLA Management

NOC- Service Priority Matrix

NOC- Service Level Agreement
