Service Desk Incident Triage Matrix


  • 8/14/2019 Service Desk Incident Triage Matrix

    1/23

    Incident Management Process:

    24x7 Response and Control

    April 6, 2005

    V1.12

    Revision History

    Version  Date           Author          Notes

    1.08     23 Feb 2005    Nan McKenna     (Initial tracked version)

    1.09     15 Mar 2005    Erik Cummings   Extract return to work as Appendix C
                                            Add proposed 15/30 minute response times
                                            Add Revision History page

    1.10     22 Mar 2005    Erik Cummings   Differentiate between Initial PCG Incident Classification and Final Incident Classification
                                            Added PCG Process Flowchart

    1.11     23 Mar 2005    Bruce Campbell  Updated Revision History Table
                                            From header, removed Draft
                                            In header, body of document, moved Operations Excellence to top left margin, placed Incident Management Process top, right margin
                                            Re-applied styles, numbering, and organization
                                            Added On-Call to Appendix E
                                            Added Management On-Call to Appendix F
                                            Turned off numbering in Appendixes E & F
                                            Re-organized appendixes so that process flow diagrams were one after the other
                                            Updated references to the various appendixes throughout document
                                            Reworded Section 2.3 a Note

    1.12     04 Apr 2005    Erik Cummings   Removed Appendix C (PCG Process)
                                            Renumbered Appendix D to C and any references to it
                                            Changed Appendix D (now C, Communications Matrix): removed Contact Action, 1st and 2nd Level Notification columns; added Client Comm Interval and SME Work Started columns
                                            Added new Appendix D, Priority and Internal Response Time Commitments
                                            Added new definitions: Priority, Impact, Urgency

    Table 1 Revision History

    8/8/2008 v1.12 Page ii

    List of Tables

    Table 1 Revision History

    Table 2 Detailed Incident Control Process

    Table 3 Explanation of High-Level Incident Management Process Flow

    Table 4 Incident Level Classification Matrix

    Table 5 Return-To-Work Guidelines


    Operations Excellence Incident Management Process

    1.0 Executive Summary

    1.1 Document Contents

    1.a. This document contains the processes that the new Production Control Group will use to respond to, manage, and resolve incidents quickly and efficiently. It includes on-call definitions and guidelines, escalation processes, process flow diagrams, and data tables. It also sets general expectations, defines roles and responsibilities, and provides general guidelines.

    1.2 Intended Audience

    1.a. This document is intended for executive-level and management personnel and for ITSS personnel, including all of those involved in this process, such as Subject Matter Experts (SMEs), Technical Leads, Line Managers, Systems Administrators, DBAs, project leaders, and facilities personnel.


    2.0 Background

    It is expected that most services supported by ITSS are available 24x7. As a result of this expectation, it is in the best interest of ITSS Shared Services workgroups, and of ITSS as a whole, to develop and establish a combined staff, the Production Control Group (PCG), dedicated to proactively managing and responding to events as they occur.

    Eventually, the role of the PCG will include evaluating incidents and, depending on the severity of the event, escalating to upper management. In some situations, the more experienced technical personnel will take action to effect repairs and/or restore services.

    As the PCG acquires experience, and as ITSS adds monitoring and troubleshooting capability, the PCG will assume additional incident response responsibilities.

    2.1 Primary Responsibilities of the Production Control Group

    1.a. Managing and controlling a widespread service outage, including incident reporting and escalation.

    2.2 Incident reporting and escalation techniques will:

    1.a. Specify a point of contact (owner) for all issues and ensure that services are restored through the prudent use of departmental resources, including documentation of the incident from its beginning to its resolution.

    1.b. Effectively manage the communication of information within ITSS when there are issues that actually or potentially impact ITSS-supported services or facilities.

    1.c. Proactively respond to issues that impact ITSS-supported services and facilities; evaluate, classify, escalate, and manage service restoration efforts efficiently and as expeditiously as possible, up through incident resolution.

    2.3 Additional Responsibilities of the Production Control Group

    1.a. Note: It is anticipated that any single shift of the PCG will NOT be consumed by continuously resolving issues. Because of this, the supplemental duties and tasks detailed below will be assigned.

    1 Assist offsite Subject Matter Experts by performing requested tasks, such as visual inspections of hardware and cycling the power on equipment as instructed.

    2 Manage and prepare magnetic media for rotation, offsite shipment, and storage, including organizing and filing transmittal logs.

    3 Control building and facility access, and escort vendors to restricted areas for the purposes of inspection, maintenance, and repair of equipment.

    4 Monitor building, facility, and data center environmentals, such as air conditioning, fire suppression system, and lighting, and log the times and results of the monitoring activity.

    5 After normal working hours, perform 1st tier triage of reported issues; classify and escalate as necessary.

    6 Receive and log calls from end users, generate Remedy tickets, and escalate as necessary.

    7 Set up video/telephone conferences.


    8 Accept and sign for emergency delivery of replacement parts from vendors.

    9 Perform other tasks deemed necessary by department supervision.

    3.0 Roles and Definitions

    Account Manager: A member of the ITSS Account Management team in Client Support who is responsible for the relationship with one or several key clients (e.g. GSB, H&S, Libraries).

    Client: A primary paying customer of ITSS services and support.

    End User: A person who directly uses a service. An end user can be internal or external to ITSS. End users are directly impacted during an outage, and generally have an established relationship with the Client or Service Owner.

    Impact: Level of effect on the Stanford Campus. This is relative to the Campus as a whole, not specifically to the client. (Values = Campus-Wide, Major School or Dept wide, Minor Group or Single User, and Non-Service Affecting)

    Incident Manager: The Shared Services Line Manager who is designated as responsible for a specific incident.

    Incident/Event/Problem/Issue: For the purposes of this document, these terms are intended to mean a failure of any component of any system or service, and are used interchangeably throughout this document.

    ITSS Client Support: Group which does client relations, account management, functional analysis, sales & marketing, documentation, software licensing, end user training, and Help Desk and CRC support.

    ITSS Engineering and Projects: Group which does technology R&D, service enhancements, and new product and service projects.

    ITSS Shared Services: Group which does operations.

    ITSS Strategic Planning: Includes technology strategy & architecture and finance groups.

    Line Manager: Workgroup managers in ITSS Shared Services.

    On-Call Subject Matter Expert (SME): SME (see below) who is designated to be available to respond to reported outages, triage the incident, perform the needed tasks to restore services, assist other workgroups in the restoration process, or determine which other members within their own workgroup are needed to assist in service restoration.

    Operations Owner: The ITSS staff person who has the ultimate authority for a service, including its functionality and approval for any changes to the service.

    Priority: Level of response and effort directed towards resolving an incident. It is determined by the inherent service level commitment of the service, as well as a combination of Urgency and Impact. Priority is sometimes referred to as severity. (Values = Urgent, High, Medium, Low)

    Product Manager: Owns product quality and client satisfaction for a service.

    Production Control Group (PCG): Group which will perform monitoring and basic problem determination and evaluation, escalation, communication, and in some cases, incident resolution.

    Subject Matter Expert (SME): Any technical ITSS staff person whose job requires extensive technical knowledge of network and service components and their related requirements. SMEs are considered experts and possess a detailed knowledge of service functionality, restoration, and component/service repair.

    Satellite Operations Center (SOC): The SOC is a partner with the University Emergency Operations Center (EOC) during Level 2 (major building fire, extended power outage) or Level 3 (major earthquake or extensive flooding) emergencies. The ITSS SOC team provides real-time field information to the EOC, and coordinates and directs emergency responses.

    Urgency: The end user's or client's assessment of the importance and/or urgency of the issue as it affects their ability to perform their work. This value is provided by the customer. (Values = Urgent, High, Medium, Low)


    4.0 Process Review

    4.1 Process Outline

    1.a. Note: There are six major steps in this process, from the time of incident detection through root cause analysis and implementing preventative measures.

    4.2 Incident Detection and Reporting

    1.a. An incident can be detected by:

    1 An end user

    2 A client

    3 An SME

    4 Automated monitoring

    1.b. It is important that information be shared between and among groups.

    1.c. The process for reporting problems differs between normal working hours (8:00 A.M. to 5:00 P.M., M-F) and after hours.

    4.3 Incident Level Classification: See Appendices C and D

    1.a. This includes assigning a severity level to the incident, and its subsequent entry into the Remedy incident tracking system.

    4.4 Incident Notification

    1.a. This includes notification to an ITSS Incident Manager and clients, and includes outage information posted on the SU Web site and Cable TV, informational messages left on the designated voice mail box, email sent to designated personnel, and other client notification as deemed appropriate.

    4.5 Incident Escalation

    1.a. This includes escalation to the ITSS Incident Manager, and any subsequent escalation calls deemed necessary. Note that the severity level will dictate whom in the management chain of command to contact, and when to provide them status reports. Additionally, the PCG will determine whether or not the incident needs to be escalated to the SOC.

    4.6 Incident Resolution

    1.a. This covers work performed during the incident itself, with responsibilities as follows:

    1.b. The Incident Manager is responsible and accountable for the overall recovery effort, performing the following functions:

    1 Establishing recovery priorities

    2 Coordinating and delegating responsibilities as they relate to the recovery effort

    3 Issuing requests for additional resources


    4 Ensuring the participation of critical internal and external support groups and vendors, such as the recall of media from the off-site storage vendor, or the purchase of replacement parts and equipment

    5 Reviewing and approving tactical plans

    6 Communicating incident status to ITSS management/executives as needed

    7 Working with Client Support to approve and authorize the release of information to other schools and departments

    1.c. SMEs and Line Managers are responsible for analyzing technical problems and making technical decisions, implementing tactical plans, and communicating to other SMEs as well as the Incident Manager.

    1.d. The PCG is responsible for coordination of the incident resolution effort and for communication as deemed necessary.

    4.7 Post-Incident Activities

    1.a. This covers the activities after the incident is resolved.

    1 Ensure that any post-incident cleanup is completed.

    2 Perform root cause analysis of the incident.

    3 To avoid similar future incidents, determine what process improvements and preventative measures can be put into place.

    4 Implement changes in process or technical support as appropriate.

    5 Ensure that the PCG receives feedback and input from the user community.

    6 Perform client follow-up and ensure that an incident response quality survey form is available for end-user and client feedback.


    5.0 Detailed Incident Control Process

    5.1 Detailed Process Flow Explanation Table. Reference Appendix A

    Columns: Process #, Process Name, Detailed Description, Action By

    Incident Detection and Reporting

    1  Problem Reporting: End Users
       Description: End users will call 5-HELP or use the web at http://helpsu.stanford.edu/ . Telephone calls are directed to the ITSS Help Desk, where the problem is evaluated. If the Help Desk (any tier) determines that this is an urgent incident, the call/ticket should be directly escalated to the PCG.
       Action By: End User

    2  Problem Reporting: Clients
       Description: In most cases, clients should call 5-HELP or use the web at http://helpsu.stanford.edu/ . In some special cases, clients may have direct access to the PCG for reporting problems and receiving updates. In this case, skip to step 12.
       Action By: Client

    3  Problem Reporting: End Users After Hours
       Description: If an end user calls 5-HELP after hours, the user will get the recorded phone tree. Users can choose to get through to the PCG directly, or leave a recorded message. For after-hours calls, the PCG will determine whether the call is urgent. If the issue is not urgent, the PCG will enter a ticket in Remedy for review the following business day.
       Action By: End User, PCG

    4  Problem Reporting: Monitoring to SMEs
       Description: In some cases, monitoring may notify an SME of a problem before a user, client, or the PCG. If the issue is urgent, escalate directly to the PCG for coordination and entry into Remedy.
       Action By: SME

    5  Problem Reporting: Monitoring to PCG
       Description: Monitoring reports information directly to the PCG.
       Action By: PCG

    6  Resolve?
       Description: Help Desk assesses whether the ticket can be resolved at this point. If so, the Help Desk will resolve and close.
       Action By: Help Desk

    7  Urgent?
       Description: If the ticket cannot be resolved, Help Desk determines whether the ticket should be forwarded to SME/Help Desk Tier 2 or to the PCG.
       Action By: Help Desk

    8  Forward To SME
       Description: If the case does not appear to be severity Urgent/High, forward to SME.
       Action By: Help Desk

    9  Resolve Quickly?
       Description: Can the case be resolved by the SME, and is it Severity Level Medium/Low?
       Action By: SME

    10 Enter Solution In Remedy
       Description: If the SME can quickly resolve the case, enter the solution in Remedy and close the ticket.
       Action By: SME

    11 Forward To PCG
       Description: If the SME determines that there is impact beyond a simple fix and the Severity Level is Urgent/High, notify the PCG.
       Action By: SME/PCG

    Classification

    12 Assign Severity Level
       Description: Assign a severity level to the incident, using the standard ITSS categories (see Appendices C and D). The severity levels govern:
         Level of action to be taken by the Production Control Group
         Notification and escalation guidelines
         Time intervals in which to provide status reports
         Time intervals in which to initiate escalation and management decision processes
       Action By: PCG

    13 Enter In Remedy
       Description: Enter a ticket for the incident into the Remedy Help Desk application.
       Action By: PCG

    Table 2 Detailed Incident Control Process
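
    The routing decisions in steps 6 through 11 of Table 2 can be sketched in code. The following is a minimal illustration of one reading of those steps, not part of the ITSS process itself: the function names, the returned strings, and the handling of a Medium/Low case that the SME cannot resolve quickly are assumptions made for the example.

```python
from enum import Enum


class Severity(Enum):
    URGENT = "Urgent"
    HIGH = "High"
    MEDIUM = "Medium"
    LOW = "Low"


# Severities that Table 2 routes to the PCG rather than to an SME.
PCG_SEVERITIES = {Severity.URGENT, Severity.HIGH}


def help_desk_route(can_resolve: bool, severity: Severity) -> str:
    """Steps 6-8: the Help Desk resolves, escalates to the PCG, or forwards to an SME."""
    if can_resolve:
        return "resolve and close"      # step 6
    if severity in PCG_SEVERITIES:
        return "forward to PCG"         # step 7
    return "forward to SME"             # step 8


def sme_route(can_resolve_quickly: bool, severity: Severity) -> str:
    """Steps 9-11: the SME closes a simple Medium/Low case or notifies the PCG."""
    if severity in PCG_SEVERITIES:
        return "notify PCG"                            # step 11
    if can_resolve_quickly:
        return "enter solution in Remedy and close"    # step 10
    # Not covered by Table 2; assumed here that the SME keeps working the ticket.
    return "continue working ticket"
```

    For example, help_desk_route(False, Severity.MEDIUM) returns "forward to SME", matching step 8.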


    6.0 High-Level Incident Process Explanation

    6.1 Detailed Process Explanation: See Appendix B

    Notification

    SME
       Description: Notify appropriate SME(s) if necessary, using the AMCOM on-call system.
       Action By: PCG

    Update itss-service-alerts@lists
       Description: Send a message to [email protected]
       Action By: PCG

    Post Messages To Web, Phone, TV
       Description: Message information will include: the date and time, a brief description of the problem, and, if available, the estimated time of resolution/restoration.
         Web: Update status on down.stanford.edu
         Telephone: In the event of a major network failure, update the designated voicemail box: 7-DOWN
         SU Cable TV: ITSS can have pre-worded messages set for broadcast, where the group can just fill in the blanks.
       Action By: PCG

    Escalation

    Notify Line Manager
       Description: Contact the Shared Services Line Manager of the affected system. If a Line Manager is unavailable, use the AMCOM system to determine the backup.
       Action By: PCG

    Determine Incident Manager
       Description: If the incident falls into the area of a single Line Manager, that Line Manager will contact the Incident Manager. If multiple Line Managers are involved, they must determine a single Incident Manager.
       Action By: Shared Services Line Managers

    Send Email
       Description: Send first email to appropriate lists/clients, based on Service Level Agreements. Use the [email protected] list for campus-wide outages; the Incident Manager should approve any messages which go to this list.
       Action By: PCG, Incident Manager

    Escalate To Senior Management
       Description: The Severity Level (see Appendices C and D) will determine the escalation to management.
       Action By: PCG

    Resolution

    Incident Management
       Description: The Incident Manager will take ownership of the problem and manage the incident. Responsibilities:
         Establish priorities
         Coordinate and delegate responsibilities in regard to the recovery effort
         Request additional internal or external resources
         Ensure and manage the participation of critical internal and external support groups and vendors
         Review and approve tactical plans
         Communicate incident status to ITSS management/executives as needed
         Work with Client Support to release information as needed to clients/users across campus

    Resolve Incident
       Description: SMEs are responsible for analyzing technical problems, implementing tactical plans, and communicating to other SMEs and with the PCG.
       Action By: SMEs

    Post Resolution Information To Web, Phone, TV
       Description: Message information will include: the date and time, a brief description of the problem, and, if available, the estimated time of resolution/restoration.
         Web: Update status on down.stanford.edu
         Telephone: In the event of a major network failure, update the designated voicemail box: 7-DOWN
         SU Cable TV: ITSS can have pre-worded messages set for broadcast, where the group can just fill in the blanks.
       Action By: PCG

    Post Incident Analysis

    Complete Cleanup Tasks
       Description: Determine whether cleanup is required, and identify who will own and perform the additional clean-up tasks.
       Action By: SME, PCG

    Root Cause Analysis
       Description: It is the responsibility of the manager of the PCG to initiate root cause analysis, collecting as much information as possible, and to ensure that any information which will help in resolving future incidents is entered into the related Remedy ticket for future use.
       Action By: PCG Manager

    Incident Prevention
       Description: Determine processes which can be implemented to prevent a repeat of the incident.
       Action By: Shared Services Managers, SMEs

    Client/User Follow-up
       Description: Ensure selected members of the recovery team make follow-up calls to the affected users, to solicit their constructive comments. Share results of the analysis with workgroups and clients where appropriate.
       Action By: PCG

    Quality Survey
       Description: ITSS will make an on-line survey available for user/client feedback, and for ITSS staff. The PCG is responsible for tallying survey results and making them available to the appropriate ITSS staff and managers.
       Action By: PCG

    Table 3 Explanation of High-Level Incident Management Process Flow


    7.0 Outstanding Issues

    7.1 A common paging system is required

    1.a. AMCOM for manual paging

    1.b. What to use for automated paging from monitoring systems?

    7.2 Definition of Service Hours

    7.3 Definition of availability, outage, and service degradation

    7.4 Service-level procedures for client notification


    Appendix A Incident Management Process Flowchart

    Reference Table 2 Detailed Incident Control Process

    1.a. Note that the circled numbers in the flowchart correspond to the process numbers in Table 2.

    [Flowchart: Incident Detection & Reporting. Swimlanes: End User, SME, Automated Monitoring, Help Desk, Client, PCG. Boxes and decisions, numbered 1 through 13: Report Problem: HelpSU/5-HELP; Report Problem: Directly To PCG; Calls 5-HELP After Hours; Report Problem; Resolve?; Urgent?; Forward To SME (Help Desk Tier 2) For Additional Analysis; Resolve Quickly?; Enter Solution In Remedy; Forward Directly To PCG; Determine Severity Level; Enter Incident Ticket In Remedy.]

    Figure 1 Incident Detection and Reporting


    Appendix B High-Level Incident Management Process Flow

    [Figure: high-level process flow through the phases Detection and Reporting, Classification, Notification, Escalation, Resolution, and Post Incident Activities. Participants and systems shown: End User, Client, HelpSU/5-HELP, Help Desk Tier 1, Self-Service, Monitoring, Production Control Group, Subject Matter Expert, Line Manager, Duty Manager, PCG Manager, Account Manager, SOC/EOC Liaison, Remedy Database, System Status, 7-DOWN, and [email protected]. Arrow labels: Notify, Update, Communicate, Emergency, Classify Incident Level & Enter in Remedy, Update with Solution Information.]

    Figure 2 High-Level Incident Management Process Flow


    Appendix C Incident Level Communications Matrix

    Columns: Level, Description, Incident Examples, Client Update Interval, SME Work Started w/in:*

    Urgent
       Description: A major service outage with significant and immediate business impact and no workaround.
         Large number of users
         Outage of significant length
         No available workaround
         Mission/business critical
       Incident Examples:
         Fire suppression system activation in data center
         Loss of electrical power
         Entire network switch, closet, and/or building outages
         Failure of 1 or more high priority services, e.g. Exchange, Oracle Financials, HRMS, PeopleSoft
         Large denial of service attacks; successful hacking; loss or altering of data; theft of data; simultaneous virus infections
         SU telephony systems
       Client Update Interval: Initial: immediate. Notification on-going: hour
       SME Work Started w/in: 30 minutes

    High
       Description: A major service outage or degradation with significant business impact and an unsustainable workaround.
         Multiple users
         Work performance reduced
         Mission/business critical
       Incident Examples:
         Failure of storage system (storage area network, SAN)
         Failure of a server of a sensitive client or user
         Severely degraded performance
         Smaller denial of service attacks
       Client Update Interval: Initial: immediate. Notification on-going: 1 hour
       SME Work Started w/in: 1 hour

    Medium
       Description: A service outage or degradation with an acceptable workaround.
         Service-affecting
         Minimal performance degradation
         Affects non-critical business function
       Incident Examples:
         Cannot connect to the internet, send or receive email
         Hardware failure, cannot access data, cannot print
         Degraded performance
       Client Update Interval: As applicable. By SME working issue.
       SME Work Started w/in: 4 business hours

    Low
       Description: Non service-affecting.
         Cosmetic problem
         System enhancement
       Incident Examples:
         Previously requested enhancements to a system
       Client Update Interval: Upon issue resolution or as applicable. By SME working issue.
       SME Work Started w/in: 1 business day

    Table 4 Incident Level Classification Matrix

    * Note: This column indicates the maximum amount of time that will pass before a technician begins working on an incident. Times will generally be much shorter for all severities.


    Appendix D Priorities and Internal Response Times

    Note: The following table refers to Priority, not to Urgency or Impact. Priority is a combination of the Urgency, the Impact, and the existing Service Level Commitments for the service in question. This is an important distinction to adhere to: Urgency is offered by the customer; Priority is assigned by the Helpdesk, PCG, and/or SME involved, from a system-wide perspective.

    Usage: These Priority levels (and the associated Urgency and Impact values) are used to track incidents as they are reported and worked on. Priority, Urgency, and Impact each relate directly to Remedy ticket fields.

    Columns: Priority, Description, Committed Service Hours, PCG Call Initiate, SME Call Response, Escalation Interval, SME Work Started

    Urgent
       Description: A major service outage with significant and immediate business impact and no workaround.
         Large number of users
         Outage of significant length
         No available workaround
         Mission/business critical
       Committed Service Hours: 24x7
       PCG Call Initiate: Immediate
       SME Call Response: 15 minutes
       Escalation Interval: 10 minutes
       SME Work Started: 30 minutes

    High
       Description: A major service outage or degradation with significant business impact and an unsustainable workaround.
         Multiple users
         Work performance reduced
         Mission/business critical
       Committed Service Hours: 24x7
       PCG Call Initiate: Immediate
       SME Call Response: 15 minutes
       Escalation Interval: 10 minutes
       SME Work Started: 1 hour

    Medium
       Description: A service outage or degradation with an acceptable workaround.
         Service-affecting
         Minimal performance degradation
         Affects non-critical business function
       Committed Service Hours: 8-5, M-F
       PCG Call Initiate: Ticket Assignment/eMail
       SME Call Response: As appropriate (work begins, work update, work completed)
       Escalation Interval: Standard SME Group Remedy settings
       SME Work Started: 4 business hours

    Low
       Description: Non service-affecting.
         Cosmetic problem
         System enhancement
       Committed Service Hours: 8-5, M-F
       PCG Call Initiate: Ticket Assignment/eMail
       SME Call Response: As appropriate (work begins, information required, work completed)
       Escalation Interval: Standard SME Group Remedy settings
       SME Work Started: 1 business day
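
    As an illustration of how the 24x7 commitments in this appendix could be turned into concrete deadlines, the sketch below encodes the Urgent and High rows and adds them to a start time. It is a hedged example only: the names COMMITMENTS and deadlines are hypothetical, and measuring every interval from the moment the PCG initiates its call is an assumption, since the table does not state when each clock starts. Medium and Low are omitted because their values depend on business hours and Remedy settings.

```python
from datetime import datetime, timedelta

# Urgent and High rows of the Appendix D table (24x7 service hours).
COMMITMENTS = {
    "Urgent": {
        "sme_call_response": timedelta(minutes=15),
        "escalation_interval": timedelta(minutes=10),
        "sme_work_started": timedelta(minutes=30),
    },
    "High": {
        "sme_call_response": timedelta(minutes=15),
        "escalation_interval": timedelta(minutes=10),
        "sme_work_started": timedelta(hours=1),
    },
}


def deadlines(priority: str, pcg_call_time: datetime) -> dict:
    """Return the latest time for each commitment.

    Assumption: all intervals are counted from pcg_call_time.
    """
    if priority not in COMMITMENTS:
        raise ValueError(f"no 24x7 commitments recorded for priority {priority!r}")
    return {name: pcg_call_time + delta
            for name, delta in COMMITMENTS[priority].items()}
```

    For example, an Urgent incident called out at 02:00 gives an SME call-response deadline of 02:15 and a work-started deadline of 02:30 under these assumptions.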


    Appendix E On-Call Guidelines

    Guideline Purpose

    To generally define and standardize:

    On-call duties and responsibilities

    A methodology for communications and engagement of problem determination and resolution

    On-call scheduling

    Response expectations/guidelines and general escalation processes in the event the 24x7 on-site group is engaged in an on-going event or incident.

    System-generated notifications will continue to be handled within the required time frames by the individual SME groups.

    Duties

    Requirements for on-call responsibility must be identified in the appropriate job descriptions, including: carrying a pager and cell phone, and availability of the employee's home phone number and email.

    Responsibilities

    Share on-call responsibilities with other members of the work group

    Begin working on the event as soon as notified

    This may require working from home or traveling to work. The decision to make a physical appearance at work depends on the circumstances of the event, such as swapping hardware components or an on-site appearance by a vendor.

    Communications

    Teleconference Phone Bridge: Telecom will have a teleconference number available to technical personnel and the PCG. This will be used when the expertise of multiple SMEs is required to resolve an incident. It will also allow the technical staff to communicate as a group. Additionally, the PCG will be able to determine the status of the incident first-hand and keep management informed without management actually being involved in the conference call.

    The AMCOM system will be the primary contact information/procedures lookup and paging tool for the 24x7 on-site groups.

    Staff will provide and track individual work group on-call schedules.

    The work group establishes the rotation.

    Members of the work groups are responsible for maintaining, and keeping current, the contact and coverage information in the on-call database.

    Communications Elements

    Required communications devices: pager or cell phone, personal phone.

    Additional communications devices as recommended by the SME groups: DSL, Treo, wireless laptop, email.

    Notification Protocol

    Initial outgoing page

    Re-page in 10 minutes

    If a call-back is NOT received from the designated on-call SME within 15 minutes, begin escalation to the next on-call person, including re-contacting the primary on-call person and the on-call Shared Services manager on all subsequent pages.

    Recipient to confirm garbled pages, follow call-back protocol.
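
    The timing in the Notification Protocol above can be expressed as a small decision function. This is a sketch under stated assumptions, not an AMCOM feature: the function name paging_action and the returned action strings are hypothetical, and the 10-minute and 15-minute thresholds are taken directly from the protocol.

```python
REPAGE_AFTER_MINUTES = 10      # "Re-page in 10 minutes"
ESCALATE_AFTER_MINUTES = 15    # no call-back within 15 minutes


def paging_action(minutes_since_first_page: float, callback_received: bool) -> str:
    """Return the next paging step for an on-call SME who was paged."""
    if callback_received:
        return "stop paging"
    if minutes_since_first_page >= ESCALATE_AFTER_MINUTES:
        # Subsequent pages also go to the primary on-call person
        # and the on-call Shared Services manager.
        return "escalate to next on-call person"
    if minutes_since_first_page >= REPAGE_AFTER_MINUTES:
        return "re-page primary on-call SME"
    return "wait for call-back"
```

    For example, paging_action(12, False) returns "re-page primary on-call SME", and paging_action(15, False) returns "escalate to next on-call person".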

    Initial Communications Tracking

    Use AMCOM system for initial communications tracking

    Response Protocol

    15 minute call-back

    Within 30 minutes, be actively engaged in problem determination and resolution

    Actively engaged via:

    Home system

    Wireless laptop

    On-site

    SME groups may establish accelerated response profiles based upon their response criticality

    Scheduling

    By SME group design

    SME schedule to be established and published in AMCOM system

    SME contact instructions to be included


    Appendix F Management On-Call Guidelines

    Return-To-Work Guidelines

    These guidelines are for Management to consider if an on-call representative has worked extended hours due to an outage or issue.

    These guidelines should be used to ensure there is always an effective on-call representative, while protecting the on-call SME from overly extensive work-time.

    If the primary on-call SME has already worked consecutive extended hours, or multiple shifts, and a new event has occurred:

    Either the manager will provide a backup and notify the backup of their modified on-call status, or the entire group of SMEs will decide on an alternate SME to be used in this situation.

    To allow staff members who are involved with an after-hours call-out on Sunday through Thursday to obtain adequate rest, the following is provided as a sample set of guidelines for a return-to-work policy:

    On-Call SME works until    Report to work no later than
    0200                       1100
    0300                       1200
    0400                       1300
    0500                       Take rest of day off

    Table 5 Return-To-Work Guidelines
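
    The sample guidelines in Table 5 amount to a lookup from the time a call-out ends to the latest report-to-work time. The sketch below shows one way to encode it. The function name is hypothetical, and two behaviors are assumptions not stated in the table: end times that fall between listed rows are rounded up to the next row, and times outside 0200 to 0500 are rejected because the table does not cover them.

```python
from datetime import time

# (worked until, report to work no later than), from Table 5.
RETURN_TO_WORK_ROWS = [
    (time(2, 0), "1100"),
    (time(3, 0), "1200"),
    (time(4, 0), "1300"),
]
REST_OF_DAY_OFF = "Take rest of day off"  # the 0500 row


def report_no_later_than(worked_until: time) -> str:
    """Look up the Table 5 guideline for a call-out ending at worked_until."""
    if not (time(2, 0) <= worked_until <= time(5, 0)):
        raise ValueError("Table 5 only covers call-outs ending between 0200 and 0500")
    for until, report_by in RETURN_TO_WORK_ROWS:
        # Assumption: an end time between two rows uses the later row.
        if worked_until <= until:
            return report_by
    return REST_OF_DAY_OFF
```

    For example, report_no_later_than(time(3, 0)) returns "1200".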