Server Monitoring Document

  • 7/29/2019 Server Monitoring Document

    Server Monitoring

    Contents

    What is a server? What does it do?

    What is Server Monitoring?

    Goals of Monitoring

    Benefits of Monitoring

    Components of Monitoring

    Monitoring Parameters or Counters

    Monitoring Tools

    Choosing a Tool

    How does a tool work?

    Conclusion

    What is a server? What does it do?

    A server is a physical computer dedicated to running one or more services that serve the needs of users on a network or on the same computer. Different types of servers are available, and the type is chosen according to the service required. Some of them are listed below.

    Application server

    Web server

    Database server

    Proxy server

    File Server

    Mail Server, etc

    Web Server: Serves web pages to the computers (clients) that connect to it.

    Application Server: Handles all application operations between users and an organization's back-end applications or databases.

    Mail Server: Stores users' email accounts and sends and receives emails.

    File Server: Stores files that can be accessed by other computers (clients).

    Proxy Server: Sits between a client program and a server. It provides filtering, translation, connection sharing, etc.

    Database Server: The back-end system of a database application that uses a client/server architecture. The back end performs tasks such as data analysis, storage, data manipulation, archiving, and other non-user-specific tasks.

    In today's internet economy, a well-designed and smoothly operating application provides a distinct competitive advantage: the ability to reach customers around the world 24x7. Ensuring that all elements of the application residing on the server function properly is critical to maximizing the company's investment. Monitoring the server is therefore a critical element of any web presence.

    Server monitoring is:

    The process of automatically scanning servers on a network for irregularities or failures.

    A way to monitor a server's system resources, such as CPU usage, memory consumption, I/O, network, disk usage, and processes.

    A way to understand server resource usage, which helps with capacity planning and provides a better end-user experience.

    A way to ensure that the server is capable of hosting the application.

    A way to make sure that the server is active, healthy, and responding to requests appropriately.

    A way to identify issues and fix unexpected problems before they impact end-user productivity.

    A way to get real-time internal statistics from the server, such as CPU usage, the number of open connections, the amount of free memory, and the number of cache hits.
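    As a rough illustration of collecting such internal statistics, the sketch below gathers a few of them using only the Python standard library. The function name `snapshot` and the field names in the returned dictionary are assumptions made for this example; a real monitoring agent would collect far more.

```python
import os
import shutil
import time


def snapshot(path="/"):
    """Collect a few basic resource statistics for the local machine."""
    disk = shutil.disk_usage(path)  # total, used and free bytes of the filesystem
    stats = {
        "timestamp": time.time(),
        "cpu_count": os.cpu_count(),
        "disk_used_percent": 100.0 * disk.used / disk.total,
    }
    # os.getloadavg() exists only on Unix-like systems and may fail there too.
    if hasattr(os, "getloadavg"):
        try:
            stats["load_average_1min"] = os.getloadavg()[0]
        except OSError:
            pass
    return stats


if __name__ == "__main__":
    for name, value in snapshot().items():
        print(f"{name}: {value}")
```

    Taking such a snapshot at regular intervals and storing the results is the simplest form of the data collection that the rest of this document builds on.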

    Goals of Monitoring:

    Determining whether server performance can be improved. For example, by monitoring the server response time for frequently used requests, you can determine whether changes are needed.

    Troubleshooting problems. For example, if downloading a page in the application does not work, the goal is to track down the problem using the available resources.

    Optimizing performance: achieving minimal response time and maximum throughput by minimizing network traffic, disk I/O, and CPU time.

    Establishing a server performance baseline: this is done by taking performance measurements over time. Each measurement should be compared against the same measurement taken earlier. For example, if the time needed to perform a set of actions increases, examine those actions and take steps to improve server performance.

    Once a server performance baseline is established, compare it with the current server performance. The comparison may indicate areas where the server needs to be reconfigured for better performance.

    At a minimum, the following measurements should be taken to determine the baseline:

    Peak and off-peak hours of operation

    Query or response times

    Server backup and restore completion times

    Server reaction during downtime

    Server response under overload
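    One possible way to compare a current measurement with a baseline is sketched below. It flags a value that lies more than a chosen number of standard deviations above the baseline mean. The function name, the `tolerance` parameter, and the sample response times are assumptions for illustration, not part of any particular tool.

```python
from statistics import mean, stdev


def exceeds_baseline(baseline_samples, current, tolerance=3.0):
    """Return True if `current` is more than `tolerance` standard
    deviations above the mean of the baseline samples."""
    avg = mean(baseline_samples)
    spread = stdev(baseline_samples)  # requires at least two samples
    return current > avg + tolerance * spread


# Hypothetical response times in milliseconds, measured earlier.
baseline_ms = [120, 118, 125, 130, 122]

print(exceeds_baseline(baseline_ms, 124))  # close to the baseline
print(exceeds_baseline(baseline_ms, 300))  # far above the baseline
```

    The same comparison can be applied to any of the measurements listed above, as long as each one is compared only with earlier values of the same measurement.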

    Benefits of Monitoring:

    Intrusion detection

    Ensuring continuity and performance

    Security considerations

    Automatic overload prevention

    Server downtime reaction

    Scalability

    Detection of internal link errors, etc.

    Intrusion Detection:

    Intrusion detection is a major reason for monitoring a server. More often than not, servers are compromised without anyone knowing. By monitoring the server for intrusion, an administrator becomes aware that someone is trying to compromise server security. He or she can then take steps to prevent it in the future and may even find out who is intruding.

    For example, if an intruder logs into the application and posts a huge number of queries in a short span of time, the server can be pushed into a denial of service.
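    The query-flood example above can be detected with a sliding-window counter per user. The following is a minimal sketch under assumed names (`QueryRateMonitor`, `max_queries`, `window_seconds`); timestamps are passed in explicitly so the logic can be tested without waiting.

```python
from collections import defaultdict, deque


class QueryRateMonitor:
    """Flag users who send more than `max_queries` within `window_seconds`."""

    def __init__(self, max_queries, window_seconds):
        self.max_queries = max_queries
        self.window_seconds = window_seconds
        self._history = defaultdict(deque)  # user -> timestamps of recent queries

    def record(self, user, timestamp):
        """Record one query and return True if the user is over the limit."""
        recent = self._history[user]
        recent.append(timestamp)
        # Drop timestamps that have fallen out of the sliding window.
        while recent and recent[0] <= timestamp - self.window_seconds:
            recent.popleft()
        return len(recent) > self.max_queries


monitor = QueryRateMonitor(max_queries=3, window_seconds=10)
for second in [0, 1, 2, 3]:
    print(second, monitor.record("suspicious_user", second))
```

    A flagged user could then be reported to the administrator or temporarily blocked; which reaction is appropriate depends on the application.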

    The APSRTC website was hacked on Jan 13th, 2013:

    Aryan Hackers, a Bangladeshi hacker group, broke into the APSRTC server and controlled it for about one hour on Jan 13th, 2013. The RTC's IT personnel swung into action and restored the server.

    Ensuring Continuity and Performance:

    It is important that an application is available to customers 24x7. An application that is frequently inaccessible is likely to lose business and destroy customer loyalty. The server might go down for reasons such as hardware failure, application failure, or network traffic.

    By monitoring the server, we can find problems before they impact the business and provide continuous service.

    Application unavailability effectively closes the business for that period. It leads not only to business losses but also to damage to the company's reputation.

    Troubleshooting server performance is another reason. For example, if users are not able to connect to the server, you may want to monitor the server to troubleshoot the problem.

    If a component such as a driver, motherboard, or controller fails, the server stays down. With monitoring, the administrator learns as soon as possible that hardware has failed, so that the component can be replaced.
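    A very basic availability check for the "user cannot connect" case is to attempt a TCP connection to the service port. The sketch below uses only the standard `socket` module; the function name `is_reachable` and the host and port in the usage line are placeholders.

```python
import socket


def is_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


if __name__ == "__main__":
    # Placeholder target; replace with the server and port to be checked.
    print(is_reachable("127.0.0.1", 80))
```

    A successful connection only shows that the port accepts connections. It does not prove that the application behind it responds correctly, so real tools usually add protocol-level checks.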

    Security Considerations:

    There are many other security features that an administrator may need to monitor. Some possible examples are:

    1. Denial-of-service filtering: a DoS filter rejects a connection if its request is not authenticated within the monitored time.

    2. Unused services monitoring: decide which services are needed and disable the rest.

    3. Careful client management: remove users who no longer need access to the servers.

    Overload prevention:

    Every server has defined load limits, because it can handle only a limited number of concurrent client connections (users). When a server is near or over its limits, the situation is called overload.

    Causes of overload:

    1. Too much legitimate web traffic: a huge number of clients connecting to the server within a short interval.

    2. Computer worms, which sometimes cause abnormal traffic.

    3. XSS viruses (cross-site scripting viruses).

    4. Slow internet connections: requests are served very slowly and the number of open connections increases until the server limits are reached.

    5. Servers that are partially unavailable.

    For the reasons above, the server might become overloaded, which causes business loss. To prevent these problems, monitoring lets the administrator know about the overload so that he or she can take the right action at the right time.

    For instance, if the administrator identifies that a server is overloaded, moving key factors to another server can prevent the problem.

    Avoid server downtime:

    Server downtime means that the server is unavailable to provide service. Downtime can result from overloaded processors, rapidly expanding memory usage, disk errors, and other problems.

    For example, if you are in a video conference with an important client and the server suddenly becomes busy and slows down, it either creates a gap in the communication or can lead to losing the valuable client. Moreover, it affects the reputation of the company.

    Close monitoring and management of key server metrics helps prevent downtime. Administrators can reduce server downtime with monitoring utilities that alert when critical thresholds are passed, so that the administrator can take corrective action.
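    Threshold-based alerting of this kind can be sketched in a few lines. The metric names and limits below are hypothetical values chosen for illustration; real tools let the administrator configure them per server.

```python
def check_thresholds(metrics, thresholds):
    """Return an alert message for every metric that exceeds its threshold."""
    alerts = []
    for name, limit in thresholds.items():
        value = metrics.get(name)
        if value is not None and value > limit:
            alerts.append(f"{name} is {value}, above the threshold of {limit}")
    return alerts


# Hypothetical thresholds and current readings, all in percent.
thresholds = {"cpu_percent": 90, "memory_percent": 85, "disk_percent": 95}
metrics = {"cpu_percent": 97, "memory_percent": 60, "disk_percent": 95}

for alert in check_thresholds(metrics, thresholds):
    print(alert)
```

    In this sketch a value equal to its limit does not raise an alert; only values strictly above the limit do.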

    Detecting internal link errors:

    It is not possible to check the application manually all the time. Good server monitoring notifies you of any problems or errors within the application's internal links, such as dead links or database errors, so that you can resolve them before customers find them.
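    A simplified dead-link check can be sketched with the standard `html.parser` module. The example assumes that internal links start with "/" and that the set of existing paths is already known; both are assumptions for illustration, since a real checker would request each URL from the server.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in an HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def find_dead_links(html, existing_paths):
    """Return the internal links in `html` that are not in `existing_paths`."""
    collector = LinkCollector()
    collector.feed(html)
    return [link for link in collector.links
            if link.startswith("/") and link not in existing_paths]


page = '<a href="/home">Home</a> <a href="/missing">Gone</a>'
print(find_dead_links(page, {"/home", "/about"}))
```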

    Components of Monitoring:

    Monitoring a server involves the following components:

    Identifying the events (parameters or counters) that must be monitored

    Determining the event data to capture

    Applying filters to limit the captured data

    Monitoring (capturing) events

    Saving captured event data

    Analyzing the captured data

    Replaying the captured event data

    Generating reports

    Server performance is estimated from the reports, and further actions are taken accordingly.

    Identify the parameters to be monitored:

    The parameters determine the activities that are monitored and captured. They depend on what is being monitored and why. For example, when monitoring disk activity, it is not necessary to monitor database server locks.

    Determining the counter data to capture:

    The event data describes each instance of a counter as it occurred. For example, when capturing database lock events, it is useful to capture data that describes the tables, users, and connections affected by the lock event.

    1. Applying filters to limit the counter data collected: limiting the counter data allows the system to focus on the specific types of events relevant to the monitoring scenario.

    2. Capturing (monitoring) events: this is the process of actively monitoring the application to see what is occurring.

    3. Saving captured data: this allows the data to be analyzed at a later time.

    4. Analyzing captured data: analyzing event data involves determining what is happening and why. This analysis makes it possible to make changes that improve performance.

    5. Replaying captured data: this involves establishing a test copy of the server environment in which the captured events are replayed as they originally occurred on the real system. To determine the effect of the parameters, replay makes it possible to analyze, in a test environment, the exact events that occur on a production system.

    6. Generating reports: based on the analysis, reports should be generated for future reference.
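    The steps described above (filter, capture, save, analyze, replay, report) can be sketched as a small pipeline. The event format, a dictionary with a "type" key, and all function names are assumptions made for this example.

```python
import json
from collections import Counter


def capture_filter(events, wanted_types):
    """Keep only events whose type is relevant to the monitoring scenario."""
    return [event for event in events if event["type"] in wanted_types]


def save(events):
    """Serialize captured events so they can be analyzed or replayed later."""
    return json.dumps(events)


def analyze(saved):
    """Count how often each event type occurred."""
    return Counter(event["type"] for event in json.loads(saved))


def replay(saved, handler):
    """Feed saved events to `handler` in their original order."""
    for event in json.loads(saved):
        handler(event)


def report(counts):
    """Render the analysis as plain text, most frequent event type first."""
    return "\n".join(f"{kind}: {n}" for kind, n in counts.most_common())


events = [
    {"type": "lock", "table": "orders"},
    {"type": "login", "user": "alice"},
    {"type": "lock", "table": "users"},
    {"type": "timeout", "table": "orders"},
]
saved = save(capture_filter(events, {"lock", "timeout"}))
print(report(analyze(saved)))
```

    In a real setup, `replay` would send the saved events to a test copy of the server rather than to a local handler function.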

    Estimate the server performance:

    Server performance should be estimated from the reports. If the estimate is positive, the server is healthy. If it reveals poor performance, the server configuration should be altered to improve performance.

    Monitoring Parameters or Counters:

    Server performance monitoring is a complex subject, and it can be daunting to be faced with a large set of performance counters to choose from. Which ones are important to monitor? The choice of counters depends on the role of the system being monitored. Counters determine what to monitor and why.

    Counters to check server availability:

    System Up Time: tells how many seconds it has been since the server was last rebooted.

    Process (instance) Elapsed Time: tells how long that particular process has been running on your machine.
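    The counters above are Windows performance counters. On Linux, a comparable uptime figure is available in the file /proc/uptime, whose first field is the number of seconds since boot. The sketch below parses that format; the function names are assumptions for this example, and the parsing function takes the file contents as a string so it can be tried on any platform.

```python
def parse_uptime(text):
    """Return the system uptime in seconds from the contents of /proc/uptime."""
    return float(text.split()[0])


def format_uptime(seconds):
    """Format a number of seconds as days, hours and minutes."""
    minutes, _ = divmod(int(seconds), 60)
    hours, minutes = divmod(minutes, 60)
    days, hours = divmod(hours, 24)
    return f"{days}d {hours}h {minutes}m"


if __name__ == "__main__":
    # Only works on systems that provide /proc/uptime (for example Linux).
    with open("/proc/uptime") as handle:
        print(format_uptime(parse_uptime(handle.read())))
```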

    Counters to determine how busy the server is:

    Processor\% Processor Time: measures the total utilization of your processor by all running processes.

    Processor\% Privileged Time: tells the processor utilization by the kernel.

    Processor\% User Time: tells the processor utilization in user mode.

    Processor Queue Length: gives an indication of how many threads are waiting for execution.

    Requests Queued: the number of requests waiting to be processed by the server.
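    Utilization percentages of this kind are usually derived from cumulative time counters sampled at two points in time. The sketch below shows the arithmetic; the dictionary keys "user", "kernel" and "idle" and the sample values are assumptions for illustration and do not correspond to a specific operating system API.

```python
def cpu_percentages(before, after):
    """Return user, privileged (kernel) and total CPU utilization in percent.

    `before` and `after` are dicts of cumulative tick counts with the keys
    "user", "kernel" and "idle", sampled at two points in time.
    """
    delta = {key: after[key] - before[key] for key in ("user", "kernel", "idle")}
    total = sum(delta.values())
    if total == 0:
        # No time has passed between the two samples.
        return {"user": 0.0, "kernel": 0.0, "total": 0.0}
    user = 100.0 * delta["user"] / total
    kernel = 100.0 * delta["kernel"] / total
    return {"user": user, "kernel": kernel, "total": user + kernel}


# Hypothetical samples taken one measuring interval apart.
before = {"user": 1000, "kernel": 500, "idle": 8500}
after = {"user": 1300, "kernel": 600, "idle": 9100}
print(cpu_percentages(before, after))
```

    Here the total utilization is simply the sum of the user and privileged shares, which matches the relationship between the three percentage counters listed above.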

    Counters to determine availability of memory (RAM):

    Memory\Pages/sec: indicates the number of paging operations to disk during the measuring interval. This is the primary counter to watch for an indication of insufficient RAM to meet your server's needs.

    Memory\Available Bytes: if this counter is greater than 10% of the actual RAM in your machine, you probably have more than enough RAM and don't need to worry.

    Process(instance)\Working Set: helps determine which process is consuming larger and larger amounts of RAM.

    Memory\Transition Faults/sec: measures how often recently trimmed pages on the standby list are re-referenced. If this counter slowly starts to rise over time, it could also indicate that the server is reaching a point where it no longer has enough RAM to function well.
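    The 10% rule of thumb for available memory given earlier can be checked automatically. The memory counters in this section are Windows counters; as a Linux analogue, the sketch below parses text in the format of /proc/meminfo and compares its MemAvailable and MemTotal fields. The function names and the sample values are assumptions for this example.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[name.strip()] = int(fields[0])
    return values


def has_enough_memory(meminfo, minimum_fraction=0.10):
    """Return True if available memory exceeds the given fraction of total RAM."""
    return meminfo["MemAvailable"] > minimum_fraction * meminfo["MemTotal"]


# Hypothetical sample in the /proc/meminfo format (values in kB).
sample = "MemTotal: 8000000 kB\nMemFree: 500000 kB\nMemAvailable: 1200000 kB\n"
print(has_enough_memory(parse_meminfo(sample)))
```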

    Counters to check hardware:

    System\Context Switches/sec: measures how frequently the processor has to switch from user mode to kernel mode to handle a request from a thread running in user mode. The value of this counter is generally high, but over the long term it should remain fairly constant. If it suddenly starts increasing, however, it may be an indication of a malfunctioning device.

    Counters to find out how fast the disks are:

    PhysicalDisk\Disk Transfers/sec: indicates the response time of the disk. If it goes above 25 disk I/Os per second, you have poor response time for your disk.

    PhysicalDisk(instance)\% Idle Time: measures the percentage of time that your hard disk is idle during the measurement interval.

    Server Monitoring Tools

    Tools provide:

    Management of application services without impacting the infrastructure.

    Automatic resolution of problems, such as re-establishing a network connection or restarting an application.

    Automatically generated daily scheduled reports.

    The ability to quickly identify hardware and application issues that may harm the operating system.

    Aid in capacity planning for reconfigurations.

    Instant multimedia alerts by SMS, email, phone, instant messenger, and other channels.

    24x7 monitoring.

    Minimized IT cost.

    Simplified detection and resolution of server and network problems.

    Reduced downtime and business loss.

    Commercial Monitoring Tools :

    1. HP Network Node Manager

    2. Observer by Network Instruments

    3. Nimsoft Monitoring Solution

    4. PacketTrap (part of Dell)

    5. PRTG Network Monitor (free and commercial)

    6. ServersCheck

    7. SolarWinds

    8. SevOne

    9. WhatUpGold

    10. Zyrion Traverse

    Open Source Monitoring Tools :

    1. Cacti

    2. Nagios

    3. Argus

    4. PandoraFMS

    5. Zenoss

    6. Zabbix

    7. AggreGate Network Manager (limited free)

    8. IsyVmon

    9. NetXMS

    10. InterMapper (limited free)

    Choosing a tool:

    A comprehensive set of monitoring tools is available. The choice of tool depends on the events to be monitored, the cost of the tool, the type of monitoring, and so on.

    How does a tool work?

    Monitoring a server with a tool involves:

    Monitoring: IT staff configure the tool to monitor critical IT infrastructure, including system metrics, network protocols, applications, services, and servers.

    Alerting: If any infrastructure component fails, the tool notifies the administrator about the failure.

    Response: IT staff can acknowledge alerts and begin resolving them; otherwise, the alerts are sent repeatedly.

    Reports: Reports provide historical data on failures, events, notifications, and alerts for later review.

    Maintenance: Scheduled downtime prevents

    Planning: Trending and capacity planning graphs and reports allow you to identify necessary infrastructure upgrades before failures occur.
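    The alerting and response behavior described above, where an alert keeps being sent until someone acknowledges it, can be sketched as a small class. The class name `Alert`, the `repeat_seconds` parameter, and the `notify` callback are assumptions for this example and do not describe the interface of any particular tool.

```python
class Alert:
    """An alert that is re-sent at a fixed interval until it is acknowledged."""

    def __init__(self, message, repeat_seconds):
        self.message = message
        self.repeat_seconds = repeat_seconds
        self.acknowledged = False
        self._last_sent = None

    def acknowledge(self):
        """Mark the alert as seen by IT staff; it will not be sent again."""
        self.acknowledged = True

    def due(self, now):
        """Return True if the alert should be sent or re-sent at time `now`."""
        if self.acknowledged:
            return False
        return self._last_sent is None or now - self._last_sent >= self.repeat_seconds

    def send(self, now, notify):
        """Send the alert through `notify` if it is due; return whether it was sent."""
        if not self.due(now):
            return False
        notify(self.message)
        self._last_sent = now
        return True


alert = Alert("disk controller failed", repeat_seconds=300)
for now in (0, 100, 300):
    alert.send(now, print)  # sent at 0 and 300, skipped at 100
alert.acknowledge()
alert.send(600, print)      # nothing is sent after acknowledgement
```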

    Conclusion:

    [Diagram: server monitoring process, analysis, and report generation, followed by action taken against server failure based on the reports]

    Monitoring a server is important: it leads to better performance and higher customer satisfaction. If problems are identified as soon as possible, we can act against a failure before it impacts the business.

    Thank you

    Presented By

    N.V.Narasimha Rao