Problem Management Version. 3.2 Problem Management Service Desk Incident Management


Problem Management

Version 3.2

Problem Management

Service Desk

Incident Management


Problems Versus Incidents

Incident

• Symptom [of a problem]

• Each occurrence of the embodiment of a problem

• “Complete” when normal operation (from a customer perspective) is restored

• “Put the fire out now…” (Firefighting)

Problem

• Root/underlying cause(s) of an Incident or Incidents

• “Complete” when the underlying cause(s) are permanently removed

• Long term resolution

• “How did this happen?…” (Arson Investigation)

• Incident: Any event not part of standard operation of a service that causes an interruption to or a reduction in the quality of that service.

• Problem: The unknown underlying cause of one or more Incidents. A condition identified from multiple Incidents exhibiting common symptoms, or from a single major Incident, indicative of a single error, for which the cause is unknown.

• Problem Management: Process to minimize the adverse impact of incidents, and to proactively prevent the reoccurrence of incidents, problems, and errors.


While the incident record is open, or after it has been resolved, it is reviewed for problem management inclusion. If the incident (or group of related incidents) meets the criteria, a problem record is created to identify the cause, the solution, and the controls that prevent recurrence.

Incident Analysis Approach

[Figure: Incidents are assessed against Criteria to determine Problem Management inclusion and the resulting Problems]

• Critical / High Priority incidents with extensive impact and potential to re-occur: Critical Problem

• Independent incidents with significant impact and possibly the same root cause: High Priority Problem

• Multiple moderate incidents, possibly with the same cause or multiple causes: Medium Priority Separated Problems

• Data analysis of incident records to determine process, system, or network issues
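The inclusion criteria in this figure can be read as a small decision rule. The sketch below is one possible reading, not the ITSM tool's actual logic: the `IncidentGroup` fields, the priority labels, and the order in which the rules are checked are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IncidentGroup:
    """One incident, or a set of related incidents (hypothetical structure)."""
    highest_priority: str        # "Critical", "High" or "Moderate" (assumed labels)
    count: int                   # number of incidents in the group
    extensive_impact: bool
    significant_impact: bool
    may_recur: bool
    possible_common_cause: bool


def problem_inclusion(group: IncidentGroup) -> Optional[str]:
    """Return the problem class suggested by the figure, or None if no rule applies."""
    if (group.highest_priority in ("Critical", "High")
            and group.extensive_impact and group.may_recur):
        return "Critical Problem"
    if group.significant_impact and group.possible_common_cause:
        return "High Priority Problem"
    if group.count > 1 and group.highest_priority == "Moderate":
        return "Medium Priority Separated Problems"
    return None  # left to data analysis of incident records


outage = IncidentGroup("Critical", 1, True, True, True, False)
print(problem_inclusion(outage))  # Critical Problem
```

Groups that match none of the rules return `None`, which corresponds to the last item of the figure: they are only picked up later through data analysis of incident records.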


Incident, Problem and Known Error definitions in ITIL

Common definitions and language are key to communication. That is why ITIL has its own vocabulary, used by everyone working in an ITIL environment. Below are definitions of incident, problem, and known error. These three terms are very common in the Incident Management process, and it is crucial to understand the differences between them.

Incident - an incident is any event that is not part of the standard operation of a service and that causes, or may cause, an interruption to, or a reduction in, the quality of that service. A good example of an incident is a lack of free space on somebody's hard disk.

Problem - a problem is the unknown underlying cause of one or more incidents. When does an incident become a problem? Never: an incident does not turn into a problem. The problem ticket is created after the incident has been resolved and when the root cause is not understood.

Known Error - a known error is created from an incident or problem for which the root cause is known and for which a temporary workaround or permanent alternative has been identified. A known error remains in place until it is permanently fixed by a change. An appropriate Request For Change (RFC) is raised to fix the known error. Known Errors are within the scope of responsibility of Problem Management.


Problem Management Process

[Figure: Problem Management process flowchart]

Inputs: Service Desk, Incident Management, Proactive Problem Management, Event Management, Supplier or Contractor

Steps: Problem Detection, Problem Logging, Categorization, Prioritization, Investigation and Diagnosis, Workaround? (Yes/No), Create Known Error Record, Change Needed? (Yes/No, Change Management), Resolution, Closure

Other elements: ITSM Problem Management, Operations Team or Service Provider, Problem Module, Known Error Database, Solution Database, Future State

Communications

• Outage Notifications

• Incident Reports

• Problem Reports

• Change Notifications



ITSM Problem Management Process Overview

[Figure: swimlane diagram. Lanes: Problem Manager, Problem Analyst, Other Service Support / Delivery Processes, Requester]

Process steps, beginning at START:

1 Problem Identification, Recording and Classification

2 Review

3 Problem Investigation and Diagnosis

4 Problem Resolution and Closure

5 Known Error Identification and Recording

6 Known Error Classification and Assessment

7 Known Error Resolution and Closure

8 Knowledge Identification and Recording

9 Knowledge Validation and Publication

Connected processes and events: Incident Management, Change Management, Configuration Management, Mgmt Reports, Investigation Initiated, Change Implemented

Targets:

• Reviewed within 1 Business Day

• Root Cause Analysis within 3 Business Days (85%)

• Problem Report within 5 Business Days

• Permanent Solution within 30 Days
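The targets on this slide (review within 1 business day, root cause analysis within 3, problem report within 5, permanent solution within 30 days) can be turned into due dates for a given problem record. The sketch below makes several assumptions that the slide does not state: the clock starts on the day the problem is opened, business days are Monday to Friday with no holiday calendar, and the 30-day target is counted in calendar days. The function names are hypothetical, and the 85% figure is a compliance goal that is not modeled here.

```python
from datetime import date, timedelta


def add_business_days(start: date, days: int) -> date:
    """Return the date `days` business days after `start` (Mon-Fri, no holidays)."""
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday to Friday
            remaining -= 1
    return current


def problem_targets(opened: date) -> dict:
    """Due dates for the four targets, counted from the day the problem is opened."""
    return {
        "review": add_business_days(opened, 1),
        "root_cause_analysis": add_business_days(opened, 3),
        "problem_report": add_business_days(opened, 5),
        "permanent_solution": opened + timedelta(days=30),  # assumed calendar days
    }


# A problem opened on Friday 1 March 2024
for name, due in problem_targets(date(2024, 3, 1)).items():
    print(name, due.isoformat())
```

For a problem opened on a Friday, the review falls on the following Monday, which shows why the business-day and calendar-day targets need to be computed differently.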


Problem Management Workflow

Problem Management minimizes the adverse impact of incidents on the business and enables root cause analysis to identify a permanent solution.

Problem Detection

Trend analysis is the key to spotting Problems. It gives Problem Management a proactive approach: a problem can be prevented from occurring rather than resolved at a later stage.

Real-time trending as incidents come in can detect a problem early and initiate a resolution before additional customers are impacted.

Problem Investigations should be initiated for all Critical (Priority 1) and High (Priority 2) impact incidents where the cause is unknown.
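As a rough illustration of real-time trending, the sketch below counts recent incidents per category and flags categories that cross a threshold. The 24-hour window, the threshold of three, the category names, and the function names are illustrative assumptions, not values from this process. The second function encodes the rule stated above for Priority 1 and Priority 2 incidents.

```python
from collections import Counter
from datetime import datetime, timedelta


def detect_trends(incidents, now, window=timedelta(hours=24), threshold=3):
    """Return categories with at least `threshold` incidents inside the window.

    `incidents` is an iterable of (opened_at, category) tuples.
    """
    cutoff = now - window
    counts = Counter(
        category for opened_at, category in incidents
        if cutoff <= opened_at <= now
    )
    return sorted(c for c, n in counts.items() if n >= threshold)


def needs_investigation(priority: int, cause_known: bool) -> bool:
    """Priority 1 and 2 incidents with an unknown cause trigger an investigation."""
    return priority in (1, 2) and not cause_known


now = datetime(2024, 3, 4, 12, 0)
incidents = [
    (now - timedelta(hours=1), "email"),
    (now - timedelta(hours=2), "email"),
    (now - timedelta(hours=3), "email"),
    (now - timedelta(hours=5), "vpn"),
    (now - timedelta(days=3), "vpn"),
]
print(detect_trends(incidents, now))  # ['email']
```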


Problem Management Workflow - Identification and Classification

Problem Logging

Problem logging is critical because all the necessary information and conclusions from the incident must be captured when creating the Problem. Avoid duplicates by searching for similar existing Problems before creating a new one.

Categorization

Problem categorization is essential to avoid ambiguity. ITSM automatically applies the incident categorization to a Problem when it is created, which keeps the problem information at the same level of understanding for the analyst.

Prioritization

Focus on business-critical problems first. Problem prioritization helps technicians identify the critical problems that need to be addressed. The Impact and Urgency associated with a problem decide which problems are addressed first.
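Deriving priority from Impact and Urgency is commonly done with a lookup matrix. The sketch below shows one such matrix. The three-level scales and the specific cell values are assumptions for illustration and are not taken from the ITSM configuration. Only the P1 to P4 range follows the priorities used elsewhere in this deck.

```python
# (impact, urgency) -> priority, where 1 is most urgent.
# Hypothetical matrix: the real mapping is defined in the ITSM tool.
PRIORITY_MATRIX = {
    ("High", "High"): 1,
    ("High", "Medium"): 2,
    ("Medium", "High"): 2,
    ("High", "Low"): 3,
    ("Medium", "Medium"): 3,
    ("Low", "High"): 3,
    ("Medium", "Low"): 4,
    ("Low", "Medium"): 4,
    ("Low", "Low"): 4,
}


def prioritize(impact: str, urgency: str) -> int:
    """Look up the priority (1-4) for an impact/urgency pair."""
    return PRIORITY_MATRIX[(impact, urgency)]


problems = [
    ("Printer queue stalls", "Low", "Medium"),
    ("Payroll batch fails", "High", "High"),
    ("Slow intranet search", "Medium", "Medium"),
]
# Work the queue in priority order, most urgent first.
for title, impact, urgency in sorted(problems, key=lambda p: prioritize(p[1], p[2])):
    print(f"P{prioritize(impact, urgency)} {title}")
```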


Problem Management Workflow – Review / Investigation and Diagnosis

Problem Review

Reviewing the problem, both in a peer review and with the service area problem manager, helps verify the validity of the information and initiate root cause analysis.

Investigation and Diagnosis

• Problem investigation identifies the root cause of the problem and initiates actions to resume the failed service. Analyze the impact, root cause, and symptoms of the problem to provide a problem resolution.

• When the cause is understood to the point where solution development can begin, the problem ticket is updated, the known error or solution record is initiated, and the ticket is moved to “Completed” to reflect that the problem investigation portion is done.


When is Root Cause Analysis Required?

• The “Why” is established first in order to determine the “How”.

• Who does RCA: the analyst who resolved the incident initiates the problem investigation, including the initial root cause analysis, and performs a warm handoff (with supporting data) to the next area analyst when required.


Problem Management Workflow - Resolution and Recovery

Workaround and Solution

The successful diagnosis of a root cause results in changing the problem to a Known Error and suggesting a workaround.

ITSM categorizes solutions into three types: Known Error Record, Workaround, and Resolution.

A problem is changed into a Known Error record automatically when a workaround is added. Browsing these Known Error records lets people resolve incidents by themselves and reduces the inflow of incidents.

They also lower the incident resolution time for incident technicians. Workarounds and problem resolutions are added to the solution list automatically.
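The behaviour described above, where adding a workaround turns a problem into a Known Error and both workarounds and resolutions land in the solution list, can be sketched as a small record type. This is an illustrative model only: `ProblemRecord` and its fields are hypothetical names and do not reflect the ITSM tool's data model or API.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ProblemRecord:
    """Hypothetical problem record with a per-record solution list."""
    summary: str
    root_cause: Optional[str] = None
    is_known_error: bool = False
    solutions: List[Tuple[str, str]] = field(default_factory=list)

    def add_workaround(self, text: str) -> None:
        # Adding a workaround converts the problem into a Known Error.
        self.is_known_error = True
        self.solutions.append(("workaround", text))

    def add_resolution(self, text: str) -> None:
        # Resolutions are added to the solution list as well.
        self.solutions.append(("resolution", text))


problem = ProblemRecord("Mail client hangs when opening shared calendars")
problem.add_workaround("Open the shared calendar in the web client")
print(problem.is_known_error)  # True
print(problem.solutions)
```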

Closure

Problem closure is critical because it confirms that all aspects of the users' problems have been addressed to their satisfaction.


Process – ITSM 5 Stages of an Investigation

1 Identification and Classification: This stage initiates the problem management process. The purpose of this stage is to accurately identify and classify the problem.

Actions: • Next stage

2 Review: In this stage, the problem manager validates the impact of the problem and provides guidance to the analyst for the investigation of the problem while it continues, or assesses re-assignment.

Actions: • Next stage • Cost analysis • Relate incidents • Assign investigation • Generate tasks • Relate to CI • Enter pending (or resume)

3 Investigation and Diagnosis: In this stage the analyst attempts to identify the root cause of the problem, and also attempts to find either a permanent solution or a temporary work-around.

Actions: • Next stage • Generate known error • Generate work-around / root cause • Generate tasks • Enter pending (or resume)

4 Resolution and Recovery: In this stage, the problem is resolved. The end result of the investigation might be a Known Error record or Solution Database entry.

Actions: • Next stage • Generate known error • Generate work-around • Generate solution • Close

5 Closed: In this stage, the investigation is closed. No more activities are performed on the problem investigation.

Actions: • None
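The five stages and their per-stage actions behave like a simple state machine. The sketch below models them that way, assuming the stage order used in the process overview (Review before Investigation and Diagnosis) and assuming each numbered action list belongs to the stage with the same number. The `Stage`, `ACTIONS`, and `advance` names are hypothetical.

```python
from enum import Enum


class Stage(Enum):
    IDENTIFICATION_AND_CLASSIFICATION = 1
    REVIEW = 2
    INVESTIGATION_AND_DIAGNOSIS = 3
    RESOLUTION_AND_RECOVERY = 4
    CLOSED = 5


# Actions listed on the slide for each stage (assumed pairing by stage number).
ACTIONS = {
    Stage.IDENTIFICATION_AND_CLASSIFICATION: {"next stage"},
    Stage.REVIEW: {"next stage", "cost analysis", "relate incidents",
                   "assign investigation", "generate tasks", "relate to CI",
                   "enter pending"},
    Stage.INVESTIGATION_AND_DIAGNOSIS: {"next stage", "generate known error",
                                        "generate work-around / root cause",
                                        "generate tasks", "enter pending"},
    Stage.RESOLUTION_AND_RECOVERY: {"next stage", "generate known error",
                                    "generate work-around", "generate solution",
                                    "close"},
    Stage.CLOSED: set(),  # no more activities once closed
}


def advance(stage: Stage) -> Stage:
    """Move to the following stage, if the current stage allows it."""
    if "next stage" not in ACTIONS[stage]:
        raise ValueError(f"no further stage after {stage.name}")
    return Stage(stage.value + 1)


stage = Stage.IDENTIFICATION_AND_CLASSIFICATION
while stage is not Stage.CLOSED:
    stage = advance(stage)
print(stage.name)  # CLOSED
```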


Problem Reports

• Produced for Critical/High (P1/P2) incidents

• Produced when requested by a ministry (P1/P2/P3/P4)

• Distributed internally through the Problem Management Process Manager, and to affected Ministries through the SDMs


Known Errors

Incidents, problems and known errors

Incidents may match existing 'Known Problems' (without a known root cause) or 'Known Errors' (with a root cause) that are under the control of Problem Management and registered in the Known Error Database (KEDB) or knowledge base.

Where workarounds have already been developed, accessing them allows the Service Desk to provide a quick first-line fix.
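Matching an incoming incident against the Known Error Database can be sketched as a keyword lookup. The example below is a deliberately naive illustration. `KnownErrorEntry`, the keyword sets, and the overlap threshold are all hypothetical, and a real KEDB search would normally rely on the ITSM tool's own search and categorization.

```python
import re
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class KnownErrorEntry:
    """Hypothetical KEDB entry; root_cause_known separates Known Errors from Known Problems."""
    title: str
    keywords: FrozenSet[str]
    workaround: Optional[str]
    root_cause_known: bool


def match_entries(description: str, kedb: List[KnownErrorEntry],
                  min_overlap: int = 2) -> List[KnownErrorEntry]:
    """Return entries sharing at least `min_overlap` keywords with the description."""
    words = set(re.findall(r"[a-z0-9]+", description.lower()))
    return [entry for entry in kedb if len(entry.keywords & words) >= min_overlap]


kedb = [
    KnownErrorEntry("Mail client crash on startup",
                    frozenset({"outlook", "crash", "startup"}),
                    "Start the client in safe mode", True),
    KnownErrorEntry("VPN sessions time out",
                    frozenset({"vpn", "timeout"}),
                    None, False),
]
for entry in match_entries("Outlook crashes on startup", kedb):
    print(entry.title, "->", entry.workaround)
```

Entries that match and carry a workaround are the ones the Service Desk can apply as a first-line fix. Entries without one still link the incident to an open problem.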


Problem Management Structure

[Figure: organization chart of Service Provider Contact Engagement Points]

• Bundle 1, Service Desk: Problem Management ????????

• Bundle 2, Mainframe: IBM Incident & Problem Management, Krishna [email protected]

• Bundle 3, Desktop Management and Worksite Support: Compugen Incident & Problem Management, Paula [email protected]; Acrodex Incident & Problem Management, Vincent y [email protected]

• Non Bundled Operations: Service Provisioning, Tuan Nguyen, Sandy Stout

• Service Delivery Manager / Ministry Contact: Ashish [email protected]

• Service Alberta Process Management: CISO, ITSM; Christine Gunarson, Mary Boyle; Alan Wokoeck, Raymond Viens

• GOA Corporate Incident Management: Leonard Blakey / Zofia Saeedi / Joe Tkalcic / Rebecca Kuehn

Engagement activities:

*Weekly Problem Management Review

*Daily Incident Management Touch point

*Weekly Incident Management Review

*Critical Incident investigation

*Critical Incident post mortems

*Incident and Problem Reporting (WOR, MOR)

*Problem Assignment


Some background on the Problem Management Process

A problem is the cause of one or more incidents. The cause is not always known at the time a problem record is created, and the problem management process is responsible for further investigation.

The key objectives of Problem Management are to prevent problems and resulting incidents from happening, to eliminate recurring incidents, and to minimize the impact of incidents that cannot be prevented.

Problem Management includes diagnosing causes of incidents, determining the resolution, and ensuring that the resolution is implemented. Problem Management also maintains information about problems and the appropriate workarounds and resolutions.

Problems differ from incidents in that, for an incident, the goal is to resolve the specific issue as quickly as possible for the customer. Incident management is engaged for escalations and for the overall effectiveness of incident resolution. When multiple incidents appear to stem from a common cause, incident management may engage problem management to help document the issue and to solicit appropriate owners and resources to investigate and resolve it.

Potential Problems are identified and tracked within the ITSM system problem module and reviewed. The problem investigation can be initiated by or assigned to the service providers; for critical and high priority issues, they are aware of and engaged in cause investigation and resolution plans. The problem ticket is documented throughout the investigation.

Problems are categorized in a similar way to incidents, but the goal is to understand causes, document workarounds and request changes to permanently resolve the problems. Workarounds are documented in a Known Error Database, which improves the efficiency and effectiveness of Incident Management.

In summary, when significant or repeated incidents occur, the function is to work with the service providers' incident management and problem management. Together they perform reviews and root cause investigations, and track the solutions and their control (does the problem stay resolved?). Information on known errors is fed back into the system for inclusion in knowledgebase articles.

Review