Best Practices for Building Network Operations Center
Satish Chavan
Network Operations Center
A network operations center (NOC, pronounced like the word "knock"), also known as a "network management center", is one or more locations from which network monitoring and control, or network management, is exercised over a computer, telecommunication or satellite network.
History
Early versions of NOCs have been around since the 1960s. In 1962, AT&T opened a Network Control Center in New York that used status boards to display switch and route information, in real time, from AT&T's most important toll switches. In 1977, AT&T replaced the Network Control Center with a NOC in Bedminster, New Jersey. AT&T revamped and modernized the NOC in 1987, adding a 75-screen video wall on which computer-driven support systems showed information on multiple layers and categories of network activity. Managers used computer systems and terminals to find detailed information on any switch or route in the network, and then used those same systems to issue instructions to any place in the network.
Global Network Operations Center
AT&T's system had become a Worldwide Intelligent Network. Two regional control centers, in Denver and Conyers, Ga., opened in 1991 and assumed the task of monitoring and managing the flow of traffic onto and off the network. In 1999, AT&T replaced the NOC with a new Global Network Operations Center to better meet the needs of the 21st century.
Network Operations Center -Purpose
In telecommunication environments, NOCs are responsible for monitoring power
failures, the access network, connectivity, communication equipment alarms and other
performance issues that may affect the telecom network and services.
A NOC is usually staffed 24×7 with personnel who continuously monitor for outages,
faults, critical events, and abnormalities in the network. These events are reported by
network monitoring software installed on the network or on the individual devices
being monitored. At fixed time intervals, each device on the network checks in
with a central manager to provide vital statistics on its health. This proactive
monitoring helps ensure that problems with the network are detected and fixed before
they can cause significant impact on the business. The work requires a high level of
expertise and understanding of various technology platforms.
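The periodic check-in described above can be sketched as a small heartbeat tracker. This is only an illustration under stated assumptions: the class name `CentralManager`, the 60-second interval, the tolerance of three missed intervals and the device names are all hypothetical and do not come from any particular monitoring product.

```python
from dataclasses import dataclass, field


@dataclass
class CentralManager:
    """Hypothetical central manager that records device check-ins."""
    check_in_interval: float              # seconds between expected check-ins (assumed)
    missed_allowed: int = 3               # missed intervals tolerated before flagging (assumed)
    last_seen: dict = field(default_factory=dict)
    stats: dict = field(default_factory=dict)

    def check_in(self, device, now, health):
        # Store the time of the check-in and the reported health statistics.
        self.last_seen[device] = now
        self.stats[device] = health

    def overdue(self, now):
        # A device is overdue when it has been silent for longer than
        # missed_allowed consecutive check-in intervals.
        limit = self.check_in_interval * self.missed_allowed
        return sorted(d for d, t in self.last_seen.items() if now - t > limit)


manager = CentralManager(check_in_interval=60)
manager.check_in("router-1", now=0, health={"cpu": 0.20})
manager.check_in("switch-7", now=0, health={"cpu": 0.35})
manager.check_in("router-1", now=240, health={"cpu": 0.22})
print(manager.overdue(now=250))  # ['switch-7']
```

Timestamps are passed in explicitly so the logic can be exercised without waiting on a real clock; a live system would typically use the current time instead.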
Network Operations Center -CSP Network
Network Operations Center - Operations 1
NOC Operate – Level 1 support
• Proactive alarm monitoring 24x7
• Issue ticket management per service level agreements (SLA)
• Fault management
NOC Operate – Level 2 support
• Higher level support for fault management
• Change execution
• Root cause analysis
• Co-ordination with TAC
NOC Operate – Level 3 support
• Change validation
• Problem management
• Co-ordination with TAC
NOC Operate – Performance Management
• Performance monitoring and reporting
• Analysis and improvement suggestions
Network Operations Center - Operations 2
NOC Operate – Configuration
• Configuration activities of new network elements (NEs)
• Integration of new NEs with the NOC
• Addition of a new route, patch or area into the network
Categories based on time
• Full-time surveillance
• Only after-hours backup/disaster recovery service
NOC Consulting
• Build, operate, transfer service
NOC- Key characteristics & Business benefits
Key characteristics
1. Skilled Staff
2. Focus on Performance
3. Efficient Processes
4. Integrated Set of Tools
5. Automation and Intelligent Tools
6. Managing service performance
7. Focus on Security
8. Being proactive
9. Quality Consistency
Business benefits
1. Quality Consistency
2. Better Traffic/Resource Management
3. Lower Cost
4. Higher Security
5. Reduced business impact through a proactive approach
6. Customer satisfaction index
NOC - Standards
FCAPS is the ISO Telecommunications Management Network model and framework for network management. It defines five management areas, whose initials form the acronym FCAPS:
• Fault Management
• Configuration Management
• Accounting (Administration)
• Performance Management
• Security Management
The FCAPS model can be seen as bottom-up or network-centric. The FAB model, defined in the Business Process Framework (eTOM), looks at the processes top-down and is customer/business-centric. FAB is short for fulfillment, assurance, billing. The two management protocol standards that have emerged are Simple Network Management Protocol (SNMP) by the IETF and Common Management Information Protocol (CMIP) by the ITU-T.
NOC - FCAPS
1. Fault management deals with the process of recognizing, isolating, and resolving a fault that occurs in the network. Identification of potential network issues also falls under fault management.
2. Configuration management involves collection and storage of configuration from various network devices, and includes tracking changes to a device configuration. Because many network issues are due to configuration changes gone wrong, this can be considered an important contribution to proactive network management and monitoring.
3. Accounting applies to service-provider networks where network resource utilization is tracked and then the information is used for billing or charge-back. In networks where billing does not apply, accounting is replaced with administration, which refers to administering end-users in the network with passwords, permissions, etc.
4. Performance management involves managing overall network performance. Data for parameters associated with performance, such as throughput, packet loss, response times, utilization, etc., are collected mostly using SNMP.
5. Security is another important area of network management. Security management in FCAPS covers the process of controlling access to resources in the network, which include data as well as configurations, and protecting user information from unauthorized users.
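As a worked example of the performance management item above, link utilization is commonly derived from two successive readings of an interface octet counter. The sketch below assumes the readings have already been collected (for instance from the IF-MIB `ifInOctets` counter, which is 32 bits wide) and makes no SNMP calls itself; the function name and the sample values are illustrative.

```python
COUNTER32_MODULUS = 2 ** 32  # a 32-bit counter wraps back to 0 after this many octets


def utilization_percent(octets_prev, octets_now, seconds, link_bps):
    """Percent utilization of a link between two octet-counter readings."""
    if seconds <= 0 or link_bps <= 0:
        raise ValueError("seconds and link_bps must be positive")
    # Modular subtraction handles a single counter wrap between the readings.
    delta_octets = (octets_now - octets_prev) % COUNTER32_MODULUS
    # 8 bits per octet; the result is a percentage of the link speed.
    return delta_octets * 8 * 100 / (seconds * link_bps)


# 375,000,000 octets in 300 s on a 100 Mbit/s link
print(utilization_percent(1_000_000, 376_000_000, 300, 100_000_000))  # 10.0
```

The modular subtraction only corrects for one wrap, so the polling interval has to be short enough that the counter cannot wrap twice between readings.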
NOC - ITIL
FCAPS from an ITIL Perspective
FCAPS: Fault Management. Includes detecting, isolating and resolving network problems.
ITIL: Service Operation (Event Management, Incident Management).
FCAPS: Configuration Management. Gathers and stores the network and system configuration information, tracks changes and simplifies the change process.
ITIL: Service Transition (Change and Configuration Management).
FCAPS: Accounting Management. Facilitates better distribution of resources, measures resource usage, helps reduce operational cost and establishes better control.
ITIL: Service Strategy (Financial Management); Service Design (Service Level Management); Service Operation (Technical and Application Management).
FCAPS: Performance Management. Helps understand the current network health and efficiency, includes measuring various performance metrics, and ensures service availability and performance at an optimal level. Unnoticed problems might lead to Event Management and Incident Management.
ITIL: Service Design (Capacity & Availability Management); Service Operation (Technical and Application Management); Continual Service Improvement (improving quality of service, including standardizing and baselining the quality achieved).
FCAPS: Security Management. Maintains the confidentiality of user and business information, includes protecting the network from unauthorized users, controls overall activities and ensures data security through authentication and encryption.
ITIL: Service Design (Information Security Management); Service Operation (Access Management as a process, Technical and Application Management as a function).
NOC - Network Monitoring
Common practices define the basic components that are essential for network monitoring and are applicable to every network. Best practices for monitoring are guidelines for implementing a good network monitoring strategy. Adopting them can help network admins streamline their monitoring so that issues are identified and resolved faster, with a lower MTTR (Mean Time To Resolve).
Best Practices
• Baseline network behavior: Baselining network behavior over a couple of weeks or even months helps the network admin understand what normal behavior in the network is. Knowledge of baseline behavior aids proactive troubleshooting and can even prevent network downtime.
• Escalation matrix: Network issues often become problems because alerts triggered on a threshold are ignored or the right person is not alerted. In a large network, there can be multiple administrators or people who take care of different aspects of the network. An escalation policy defines who is notified when a malfunction occurs or a potential problem is detected. An escalation matrix and plan ensures that issues are looked at and resolved on time.
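One simple way to turn a baseline into alert thresholds is to treat values far from the historical mean as abnormal. The sketch below is an assumption-laden illustration, not a method prescribed by the slides: the three-standard-deviation band, the function names and the sample utilization figures are all hypothetical.

```python
from statistics import mean, stdev


def build_baseline(samples, k=3.0):
    """Return a (low, high) band of mean +/- k sample standard deviations."""
    mu = mean(samples)
    sigma = stdev(samples)
    return (mu - k * sigma, mu + k * sigma)


def is_abnormal(value, baseline):
    """True when a new measurement falls outside the baseline band."""
    low, high = baseline
    return value < low or value > high


# Hypothetical link utilization (%) observed during the baselining period
history = [40, 42, 38, 41, 39, 43, 40, 37]
baseline = build_baseline(history)
print(baseline)                                             # (34.0, 46.0)
print(is_abnormal(41, baseline), is_abnormal(90, baseline))  # False True
```

Real traffic usually has daily and weekly cycles, so a production baseline would more likely be kept per hour of day or per weekday than as a single band.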
• Reports at every layer: Networks function based on the OSI model. Using a monitoring system that supports multiple technologies to monitor at all layers, as well as different types of devices in the network, makes problem detection and troubleshooting easier. Thus, when an application delivery fails, the monitoring system can indicate whether it is a server issue, a routing problem, a bandwidth problem, or a hardware malfunction.
• Implement High Availability with failover options: Most monitoring systems are set up in the network they monitor. If a problem occurs and the network goes down, the monitoring system can go down too. It is therefore recommended to implement a monitoring strategy with High Availability (HA) through failover. HA ensures that the monitoring system does not have a single point of failure and can still provide the data needed for troubleshooting. To avoid a single point of failure, it is recommended to set up the failover system at a remote DR site.
• Configuration management: Most network issues originate from incorrect configurations. There are several instances where even minor configuration mistakes have led to network downtime or loss of data. Unauthorized configuration changes to devices can lead to serious security lapses that include hacking and data theft.
• Capacity planning and growth: As an organization grows, the infrastructure associated with it should also grow. When setting up a monitoring system, account for future growth.
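The configuration management practice above depends on noticing when a device's running configuration departs from the approved one. A minimal sketch using Python's standard `difflib` module follows; the function name `config_drift` and the configuration lines are purely illustrative and are not taken from any real device.

```python
import difflib


def config_drift(approved, running):
    """Return (removed, added) lines between an approved and a running config."""
    removed, added = [], []
    for line in difflib.ndiff(approved.splitlines(), running.splitlines()):
        if line.startswith("- "):      # present in approved, missing in running
            removed.append(line[2:])
        elif line.startswith("+ "):    # present in running, not in approved
            added.append(line[2:])
    return removed, added


# Hypothetical configurations for illustration only
approved = "hostname edge-1\nsnmp-server community s3cret ro\nntp server 10.0.0.1"
running = "hostname edge-1\nsnmp-server community public rw\nntp server 10.0.0.1"

removed, added = config_drift(approved, running)
print(removed)  # ['snmp-server community s3cret ro']
print(added)    # ['snmp-server community public rw']
```

A non-empty result could be raised as an alert or a ticket so that an unauthorized change is reviewed before it causes downtime or a security lapse.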
Essential element in NOC management
Network Operations Center best practices in terms of process and tools
1. Ticketing system
A ticketing system enables you to keep track of all open issues according to severity, urgency and the person assigned to handle them.
2. Knowledge base
A centralized source for all knowledge and documentation that is accessible to your entire team. This knowledge base should be a fluid information source, continuously updated with experiences and lessons learned for future reference and improvements.
3. Reporting
Reports on a daily, weekly and monthly basis should include all major incidents and a root cause for every resolved incident.
4. Monitoring
There are two major types of monitoring processes relevant to a NOC:
• Monitoring infrastructure
• Customer help desk/experience
5. Process automation
Implementing process automation significantly reduces mean time to recovery (MTTR) and helps NOCs meet SLAs by having a procedure in place to handle incident resolution and to consistently provide a high-quality response regardless of the complexity of the process. Examples such as disk space clean-up and process resets help reduce manual, routine tasks.
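The disk space clean-up mentioned under process automation could be sketched as a routine that deletes files older than a given age from one directory. The function name, the one-hour age limit and the file names are assumptions made for this illustration, and the demonstration runs in a throwaway temporary directory rather than on a real log path.

```python
import os
import tempfile
import time
from pathlib import Path


def clean_old_files(directory, max_age_seconds, now=None):
    """Delete regular files older than max_age_seconds; return their names."""
    now = time.time() if now is None else now
    removed = []
    for path in Path(directory).iterdir():
        # Only plain files are touched; subdirectories are left alone.
        if path.is_file() and now - path.stat().st_mtime > max_age_seconds:
            path.unlink()
            removed.append(path.name)
    return sorted(removed)


# Demonstration in a temporary directory (file names are hypothetical)
with tempfile.TemporaryDirectory() as tmp:
    old_file = Path(tmp) / "old.log"
    new_file = Path(tmp) / "new.log"
    old_file.write_text("stale")
    new_file.write_text("fresh")
    os.utime(old_file, (1000, 1000))   # pretend last modified at t=1000
    os.utime(new_file, (9000, 9000))   # pretend last modified at t=9000
    print(clean_old_files(tmp, max_age_seconds=3600, now=10000))  # ['old.log']
```

In practice such a routine would be triggered by a disk-usage alarm and would record what it removed on the related ticket.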
Key Factors NOC Performance Management Solution
•Real time complete system-wide visibility.
•Alerting and Reporting
•Monitoring Abilities
•Multi-vendor Support
•Scalability
•Simple Interface
•Easy to Deploy
•Notifications
NOC Service Assurance and Service Management Activities
KPIs & SLAs
1. Number of tickets received and resolved.
2. Number of tickets proactively raised and resolved, based on severity.
3. Number of tickets escalated to technical operations.
4. Number of tickets solved within SLA without escalation to technical operations.
5. Tickets raised within 15 minutes of the occurrence of an alarm.
6. 3rd-party escalations and follow-ups as per SLA.
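Several of the KPIs above can be computed directly from ticket records. The sketch below assumes a hypothetical `Ticket` record with times expressed in minutes, and assumes the SLA clock starts when the ticket is raised; neither the field names nor that convention come from the slides.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Ticket:
    alarm_minute: int                 # when the alarm occurred
    raised_minute: int                # when the ticket was raised
    resolved_minute: Optional[int]    # None while the ticket is still open
    sla_minutes: int                  # allowed resolution time (assumed to start at raise)
    escalated: bool                   # escalated to technical operations


def kpis(tickets):
    """Count tickets for a subset of the KPIs listed above."""
    resolved = [t for t in tickets if t.resolved_minute is not None]
    return {
        "received": len(tickets),
        "resolved": len(resolved),
        "escalated": sum(t.escalated for t in tickets),
        "raised_within_15_min": sum(
            t.raised_minute - t.alarm_minute < 15 for t in tickets
        ),
        "resolved_in_sla_without_escalation": sum(
            not t.escalated and t.resolved_minute - t.raised_minute <= t.sla_minutes
            for t in resolved
        ),
    }


# Hypothetical sample data
tickets = [
    Ticket(alarm_minute=0, raised_minute=5, resolved_minute=60, sla_minutes=120, escalated=False),
    Ticket(alarm_minute=10, raised_minute=40, resolved_minute=300, sla_minutes=120, escalated=True),
    Ticket(alarm_minute=20, raised_minute=30, resolved_minute=None, sla_minutes=240, escalated=False),
]
print(kpis(tickets))
```

For the sample data this reports three tickets received, two resolved, one escalated, two raised within 15 minutes of the alarm, and one resolved within SLA without escalation.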
Service Management Activities
• Surveillance / Fault
• Incident Management
• Problem Management
• SLA Management
NOC- Service Priority Matrix
NOC- Service Level Agreement