Problem Management
Version 3.2
[Diagram: Service Desk, Incident Management, Problem Management]
Problems Versus Incidents
Incident:
• Symptom [of a problem]
• Each occurrence of the embodiment of a problem
• “Complete” when normal operation (from a customer perspective) is restored
• “Put the fire out now…” (Firefighting)
Problem:
• Root/underlying cause(s) of an Incident or Incidents
• “Complete” when the underlying cause(s) are permanently removed
• Long-term resolution
• “How did this happen?…” (Arson Investigation)
• Incident: Any event not part of standard operation of a service that causes an interruption to or a reduction in the quality of that service.
• Problem: The unknown underlying cause of one or more Incidents. A condition identified from multiple Incidents exhibiting common symptoms, or from a single major Incident, indicative of a single error, for which the cause is unknown.
• Problem Management: The process that minimizes the adverse impact of incidents and proactively prevents the recurrence of incidents, problems, and errors.
While the incident record is open, or after it has been resolved, it is reviewed for problem management inclusion. If the incident (or group of related incidents) meets the criteria, a problem record is created to identify the cause, the solution, and the control that prevents recurrence.
Incident Analysis Approach
Incident criteria → resulting problem:
• Critical / High priority incidents with extensive impact and potential to recur → Critical Problem
• Independent incidents with significant impact and possibly the same root cause → High Priority Problem
• Multiple moderate incidents with possibly the same cause or multiple causes → Medium Priority Separated Problems
• Data analysis of incident records to determine process, system, or network issues → Determine Problem Management inclusion
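The criteria above can be made repeatable by checking them in order, most severe first. The sketch below is hypothetical: the `IncidentGroup` fields and the four-level impact scale ("extensive", "significant", "moderate", "minor") are assumptions chosen to mirror the slide's wording, not fields of the actual ITSM tool.

```python
from dataclasses import dataclass


@dataclass
class IncidentGroup:
    """Hypothetical summary of one or more incidents an analyst suspects are related."""
    count: int                   # number of incidents in the group
    highest_priority: int        # 1 = Critical, 2 = High, 3 = Medium, 4 = Low
    impact: str                  # assumed scale: "extensive", "significant", "moderate", "minor"
    may_recur: bool              # could the incident happen again?
    possible_common_cause: bool  # do the incidents look like they share a root cause?


def triage(group: IncidentGroup) -> str:
    """Apply the incident analysis criteria in order, most severe first."""
    if group.highest_priority <= 2 and group.impact == "extensive" and group.may_recur:
        return "Critical problem"
    if group.count > 1 and group.impact == "significant" and group.possible_common_cause:
        return "High priority problem"
    if group.count > 1 and group.impact == "moderate":
        # Same cause or several causes: one medium-priority problem per suspected cause.
        return "Medium priority problem(s), separated by cause"
    # Nothing matched outright: analyze the incident data before deciding.
    return "Data analysis to determine Problem Management inclusion"
```

For example, a single P1 outage with extensive impact that may recur triages to "Critical problem", while one low-impact P4 incident falls through to data analysis.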
Incident, Problem and Known Error definitions in ITIL
Common definitions and language are key to communication. That is why ITIL has its own vocabulary of specific terms, widely used by everyone working in an ITIL environment. Below are the definitions of incident, problem, and known error. These three terms are very common in the Incident Management process, and it is crucial to understand the differences between them.
Incident - an incident is any event that is not part of the standard operation of a service and that causes, or may cause, an interruption to, or a reduction in, the quality of that service. A simple example of an incident is a user's hard disk running out of free space.
Problem - a problem is the unknown underlying cause of one or more incidents. When does an incident become a problem? Never: the incident does not turn into a problem. A separate problem ticket is created after the incident has been resolved, when the root cause is not understood.
Known Error - a known error is created from an incident or problem for which the root cause is known and for which a temporary workaround or permanent alternative has been identified. A known error remains in place until it is permanently fixed by a change; an appropriate Request for Change (RFC) is raised to fix it. Known errors are the responsibility of Problem Management.
[Figure: Problem Management Process (Future State). Inputs: Service Desk, Incident Management, Event Management, Proactive Problem Management, Supplier or Contractor, Operations Team or Service Provider. Steps: Problem Detection, Problem Logging, Categorization, Prioritization, Investigation and Diagnosis, Workaround? (Yes/No; Create Known Error Record), Change Needed? (Yes/No; Change Management), Resolution, Closure. Records: ITSM Problem Module, Known Error Database, Solution Database. Communications: outage notifications, incident reports, problem reports, change notifications.]
[Figure: ITSM Problem Management Process Overview. Swimlanes: Problem Manager, Problem Analyst, Other Service Support / Delivery Processes, Requester. Steps: 1 Problem Identification, Recording and Classification; 2 Review; 3 Problem Investigation and Diagnosis; 4 Problem Resolution and Closure; 5 Known Error Identification and Recording; 6 Known Error Classification and Assessment; 7 Known Error Resolution and Closure; 8 Knowledge Identification and Recording; 9 Knowledge Validation and Publication. Related processes: Incident Management (Investigation Initiated), Change Management (Change Implemented), Configuration Management, management reports.]
1 Problem Identification, Recording, and Classification
• Reviewed within 1 business day
• Root cause analysis within 3 business days (85%)
• Problem report within 5 business days
• Permanent solution within 30 days
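The timing targets above can be turned into due dates when a problem is logged. This is a minimal sketch under stated assumptions: business days skip only Saturdays and Sundays (statutory holidays are ignored), the 30-day permanent-solution target is read as calendar days, and "85%" is read as the share of root cause analyses that must finish on time. The function names are hypothetical.

```python
from datetime import date, timedelta


def add_business_days(start: date, days: int) -> date:
    """Step forward `days` business days, skipping Saturdays and Sundays (holidays ignored)."""
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday = 0 ... Friday = 4
            remaining -= 1
    return current


def problem_due_dates(logged: date) -> dict:
    """Target dates for a problem logged on `logged`."""
    return {
        "review": add_business_days(logged, 1),
        "root_cause_analysis": add_business_days(logged, 3),
        "problem_report": add_business_days(logged, 5),
        # Assumption: the 30-day permanent-solution target is in calendar days.
        "permanent_solution": logged + timedelta(days=30),
    }


def rca_target_met(rca_business_days: list, target_days: int = 3,
                   target_share: float = 0.85) -> bool:
    """True if at least 85% of root cause analyses finished within 3 business days."""
    if not rca_business_days:
        raise ValueError("no completed root cause analyses to measure")
    on_time = sum(1 for d in rca_business_days if d <= target_days)
    return on_time / len(rca_business_days) >= target_share
```

For a problem logged on Friday, 1 March 2024, this gives a review date of Monday 4 March, root cause analysis by Wednesday 6 March, the problem report by Friday 8 March, and a permanent solution by 31 March.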
Problem Management Workflow
Problem Management minimizes the adverse impact of incidents on the business and enables root cause analysis to identify a permanent solution.
Problem Detection
Trend analysis is the key to spotting problems. It gives Problem Management a proactive approach: a problem can be prevented early rather than resolved at a later stage.
Real-time trending of incoming incidents can detect a problem early and initiate a resolution before additional customers are impacted.
Problem investigations should be initiated for all Critical (Priority 1) and High (Priority 2) impact incidents where the cause is unknown.
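Both detection rules above can be sketched in a few lines: a sliding-window count of incidents per category for real-time trending, and the P1/P2 unknown-cause rule for starting an investigation. The `Incident` fields, the 24-hour window, and the threshold of 3 are hypothetical defaults, not values taken from the ITSM tool.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    id: str
    category: str
    priority: int              # 1 = Critical, 2 = High, 3 = Medium, 4 = Low
    opened: datetime
    cause_known: bool = False


def trending_categories(incidents, now, window=timedelta(hours=24), threshold=3):
    """Categories with at least `threshold` incidents opened in the `window` ending at `now`."""
    recent = (i for i in incidents if now - window <= i.opened <= now)
    counts = Counter(i.category for i in recent)
    return sorted(cat for cat, n in counts.items() if n >= threshold)


def needs_investigation(incident: Incident) -> bool:
    """Critical (P1) and High (P2) incidents with an unknown cause always get a problem investigation."""
    return incident.priority in (1, 2) and not incident.cause_known
```

Tuning `window` and `threshold` per category is a design choice: a short window catches bursts quickly, while a longer one surfaces slow, recurring faults.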
Problem Management Workflow - Identification and Classification
Problem Logging
Problem logging is critical because all the necessary information and conclusions from the incident must be captured when the problem is created. Avoid duplicates by searching for similar existing problems before creating a new one.
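A simple way to support the duplicate check above is to compare the new summary against existing problems in the same category. This sketch uses Python's standard `difflib.SequenceMatcher` as a stand-in similarity measure; the `(id, category, summary)` tuple layout and the 0.6 cut-off are assumptions, and a real ITSM search would likely use its own full-text index.

```python
from difflib import SequenceMatcher


def similar_problems(summary, category, existing, min_ratio=0.6):
    """Return (problem_id, score) for existing problems in the same category
    whose summary resembles `summary`, best match first."""
    matches = []
    for problem_id, other_category, other_summary in existing:
        if other_category != category:
            continue  # only compare problems within the same category
        score = SequenceMatcher(None, summary.lower(), other_summary.lower()).ratio()
        if score >= min_ratio:
            matches.append((problem_id, score))
    return sorted(matches, key=lambda m: m[1], reverse=True)
```

If the returned list is not empty, the analyst would review those problems (and relate the incident to one of them) before logging a new problem.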
Categorization
Problem categorization is essential to avoid ambiguity. The ITSM tool automatically applies the incident's categorization to a problem when it is created, which keeps the problem information at the same level of understanding for the analyst.
Prioritization
Focus on business-critical problems based on problem prioritization, which helps technicians identify the critical problems that need to be addressed. The impact and urgency associated with a problem decide which problems are addressed first.
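Impact and urgency are commonly combined through a lookup matrix. The slides do not give the actual table, so the 3x3 matrix below is a hypothetical example; only the idea of deriving priority from the two inputs and working the queue in priority order comes from the text.

```python
# Hypothetical impact x urgency matrix: 1 = Critical, 2 = High, 3 = Medium, 4 = Low.
PRIORITY_MATRIX = {
    ("high", "high"): 1,
    ("high", "medium"): 2,
    ("high", "low"): 3,
    ("medium", "high"): 2,
    ("medium", "medium"): 3,
    ("medium", "low"): 4,
    ("low", "high"): 3,
    ("low", "medium"): 4,
    ("low", "low"): 4,
}


def prioritize(impact: str, urgency: str) -> int:
    """Look up the priority for a problem's impact and urgency (case-insensitive)."""
    return PRIORITY_MATRIX[(impact.lower(), urgency.lower())]


def work_queue(problems):
    """Order (problem_id, impact, urgency) tuples so the most critical problems
    come first; ties are broken by problem id."""
    return sorted(problems, key=lambda p: (prioritize(p[1], p[2]), p[0]))
```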
Problem Management Workflow – Review / Investigation and Diagnosis
Problem Review
Reviewing the problem, both in a peer review and with the service area problem manager, can further verify the validity of the information and initiate root cause analysis.
Investigation and Diagnosis
• Problem investigation gets at the root cause of the problem and initiates actions to restore the failed service. Analyze the impact, root cause, and symptoms of the problem to provide a resolution.
• When the cause is understood well enough for solution development to begin, update the problem ticket, initiate the known error or solution record, and move the ticket to “Completed” to reflect that the problem investigation portion is done.
When is Root Cause Analysis Required?
• The “Why” is investigated first, in order to then determine the “How”.
• Who does RCA: the analyst who resolved the incident initiates the problem investigation, including the initial root cause analysis, and performs a warm handoff (with supporting data) to the next area's analyst when required.
Problem Management Workflow - Resolution and Recovery
Workaround and Solution
Successful diagnosis of a root cause results in changing the problem to a Known Error and suggesting a workaround.
The ITSM tool categorizes solutions into three types: Known Error Record, Workaround, and Resolution.
Adding a workaround automatically changes the problem into a Known Error record. Browsing these known error records helps users resolve incidents themselves, which reduces the inflow of incidents.
Known error records also lower the incident resolution time for incident technicians. Workarounds and problem resolutions are automatically added to the solution list.
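The record behavior described above (a workaround turns the problem into a Known Error record, and both workarounds and resolutions land in the solution list) can be modeled as a small data structure. The class and function names are hypothetical; this is not the ITSM tool's API.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ProblemRecord:
    id: str
    summary: str
    workaround: Optional[str] = None
    resolution: Optional[str] = None
    record_type: str = "Problem"


@dataclass
class SolutionList:
    # Each entry is (problem id, kind, text), where kind is "Workaround" or "Resolution".
    entries: List[Tuple[str, str, str]] = field(default_factory=list)


def add_workaround(problem: ProblemRecord, text: str, solutions: SolutionList) -> None:
    problem.workaround = text
    problem.record_type = "Known Error"  # adding a workaround converts the problem
    solutions.entries.append((problem.id, "Workaround", text))


def add_resolution(problem: ProblemRecord, text: str, solutions: SolutionList) -> None:
    problem.resolution = text
    solutions.entries.append((problem.id, "Resolution", text))
```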
Closure
Problem closure is critical because it confirms that all aspects of the user's problem have been addressed to the user's satisfaction.
Process – ITSM 5 Stages of an Investigation
1 Identification and Classification: This stage initiates the problem management process. Its purpose is to accurately identify and classify the problem.
Actions: • Next stage
2 Review: In this stage, the problem manager validates the impact of the problem and either provides guidance to the analyst as the investigation continues or assesses re-assignment.
Actions: • Next stage • Cost analysis • Relate incidents • Assign investigation • Generate tasks • Relate to CI • Enter pending (or resume)
3 Investigation and Diagnosis: In this stage, the analyst attempts to identify the root cause of the problem and to find either a permanent solution or a temporary workaround.
Actions: • Next stage • Generate known error • Generate workaround / root cause • Generate tasks • Enter pending (or resume)
4 Resolution and Recovery: In this stage, the problem is resolved. The end result of the investigation might be a Known Error record or a Solution Database entry.
Actions: • Next stage • Generate known error • Generate workaround • Generate solution • Close
5 Closed: In this stage, the investigation is closed. No further activities are performed on the problem investigation.
Actions: • None
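The stages and their permitted actions behave like a small state machine. The sketch below assumes the stage order from the process overview (Identification and Classification, Review, Investigation and Diagnosis, Resolution and Recovery, Closed) and splits “Enter pending (or resume)” into two hypothetical actions, "Enter pending" and "Resume". Actions such as "Generate tasks" only pass validation here; a real tool would create linked records.

```python
STAGES = [
    "Identification and Classification",
    "Review",
    "Investigation and Diagnosis",
    "Resolution and Recovery",
    "Closed",
]

ALLOWED_ACTIONS = {
    "Identification and Classification": {"Next stage"},
    "Review": {"Next stage", "Cost analysis", "Relate incidents", "Assign investigation",
               "Generate tasks", "Relate to CI", "Enter pending", "Resume"},
    "Investigation and Diagnosis": {"Next stage", "Generate known error",
                                    "Generate work-around / root cause", "Generate tasks",
                                    "Enter pending", "Resume"},
    "Resolution and Recovery": {"Next stage", "Generate known error", "Generate work-around",
                                "Generate solution", "Close"},
    "Closed": set(),  # no further activity
}


class Investigation:
    def __init__(self) -> None:
        self.stage = STAGES[0]
        self.pending = False

    def perform(self, action: str) -> str:
        """Validate and apply an action, returning the (possibly new) stage."""
        if action not in ALLOWED_ACTIONS[self.stage]:
            raise ValueError(f"{action!r} is not allowed in stage {self.stage!r}")
        if action == "Resume":
            if not self.pending:
                raise ValueError("investigation is not pending")
            self.pending = False
        elif self.pending:
            raise ValueError("investigation is pending; resume it first")
        elif action == "Enter pending":
            self.pending = True
        elif action == "Close":
            self.stage = "Closed"
        elif action == "Next stage":
            # "Closed" allows no actions, so this index never runs past the end.
            self.stage = STAGES[STAGES.index(self.stage) + 1]
        return self.stage
```

Encoding the allowed actions as data rather than branching logic keeps the table easy to compare against the stage list and easy to change.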
Problem Reports
• For Critical/High (P1/P2) incidents
• When requested by a ministry (P1/P2/P3/P4)
• Distributed internally by the Problem Management Process Manager, and to affected ministries by SDMs (Service Delivery Managers)
Known Errors
Incidents, problems and known errors
Incidents may match existing 'Known Problems' (without a known root cause) or 'Known Errors' (with a known root cause), which are under the control of Problem Management and registered in the Known Error Database (KEDB) or knowledge base.
Where workarounds have already been developed, accessing them should allow the Service Desk to provide a quick first-line fix.
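A Service Desk lookup against the KEDB can be sketched as keyword matching between the incident description and each entry's recorded symptoms, returning only entries that have a workaround. The `KedbEntry` layout and the keyword-overlap scoring are assumptions for illustration; a production knowledge base would use its own search.

```python
import re
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple


@dataclass
class KedbEntry:
    id: str
    symptoms: FrozenSet[str]   # lower-case keywords seen in matching incidents
    root_cause: Optional[str]  # None while the entry is still a "Known Problem"
    workaround: Optional[str]

    @property
    def kind(self) -> str:
        return "Known Error" if self.root_cause else "Known Problem"


def first_line_fixes(description: str, kedb: List[KedbEntry]) -> List[Tuple[str, str, str]]:
    """Return (entry id, kind, workaround) for entries sharing symptom keywords
    with the incident description, most overlapping keywords first."""
    words = set(re.findall(r"[a-z0-9']+", description.lower()))
    hits = []
    for entry in kedb:
        overlap = len(entry.symptoms & words)
        if overlap and entry.workaround:  # no workaround means nothing to offer first-line
            hits.append((overlap, entry))
    hits.sort(key=lambda h: h[0], reverse=True)
    return [(e.id, e.kind, e.workaround) for _, e in hits]
```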
Problem Management Structure
Service Provider Contact Engagement Points
• Bundle 1, Service Desk: Problem Management
• Bundle 2, Mainframe: IBM Incident & Problem Management (Krishna [email protected])
• Bundle 3, Desktop Management and Worksite Support: Compugen Incident & Problem Management (Paula [email protected]); Acrodex Incident & Problem Management (Vincent y [email protected])
• Non-Bundled Operations: Service Provisioning (Tuan Nguyen, Sandy Stout)
• Service Delivery Manager / Ministry Contact: Ashish [email protected]
• Service Alberta: Process Management, CISO, ITSM (Christine Gunarson, Mary Boyle, Alan Wokoeck, Raymond Viens)
*Weekly Problem Management Review
*Daily Incident Management Touch point
*Weekly Incident Management Review
*Critical Incident investigation
*Critical Incident post mortems
*Incident and Problem Reporting (WOR, MOR)
*Problem Assignment
GOA Corporate Incident Management: Leonard Blakey / Zofia Saeedi / Joe Tkalcic / Rebecca Kuehn
Some background on the Problem Management process
A problem is the cause of one or more incidents. The cause is not always known at the time a problem record is created, and the problem management process is responsible for further investigation.
The key objectives of Problem Management are to prevent problems and resulting incidents from happening, to eliminate recurring incidents, and to minimize the impact of incidents that cannot be prevented.
Problem Management includes diagnosing causes of incidents, determining the resolution, and ensuring that the resolution is implemented. Problem Management also maintains information about problems and the appropriate workarounds and resolutions.
Problems differ from incidents: for an incident, the goal is to resolve the specific issue for the customer as quickly as possible. Incident management is engaged for escalations and the overall effectiveness of incident resolution. When multiple incidents seem to stem from a common cause, incident management may engage problem management to help document the issue and solicit the appropriate owners and resources to investigate and resolve it.
Potential problems are identified, tracked, and reviewed within the ITSM system's problem module. The problem investigation can be initiated by, or assigned to, the service providers (for critical and high priority issues, the service providers are already aware of the issue and engaged in cause investigation and resolution plans). The problem ticket is documented throughout the investigation.
Problems are categorized in a similar way to incidents, but the goal is to understand causes, document workarounds and request changes to permanently resolve the problems. Workarounds are documented in a Known Error Database, which improves the efficiency and effectiveness of Incident Management.
In summary, when significant or repeated incidents occur, the function works with the service providers' incident management and problem management teams. Together they perform reviews and root cause investigations, and track the solutions and control (whether the problem stays resolved). Information on known errors is fed back into the system for inclusion in knowledge base articles.
Review