When Devo unleashed their infamous “When a problem comes along, you must whip it,” they did more than create an icon of ’80s pop music. Let’s move ahead with Basic ITIL Problem Management.
The Problem with ‘Problem’
In ITIL, a Problem is the underlying cause of one or more Incidents.
Where Incident Management is focused on rapid recovery of service (even if the underlying cause is not identified), Problem Management is about identifying and resolving these underlying causes to eliminate future Incidents.
Different but related.
They’re like Cersei and Jaime Lannister (Game of Thrones). They look a lot alike… but, ewww, for gosh sakes, let’s keep them separate!
The Goal of Problem Management
Problem Management is the process to identify, prioritize, and systematically resolve these underlying issues. It provides the end-to-end management of problems from identification to elimination.
A simple example – a flat tire. Everyone wants their tire fixed quickly so they can get back on the road. The second time you have a flat, same thing, only slightly more annoying. This is Incident Management, and it can faithfully fix flat after flat, each time getting you back on the road with fast, friendly service.
You may ask what caused it. “Probably a nail.”
Bummer. Wonder what caused the other one?
Many organizations stop there. The question goes unanswered. And the problem happens again.
Problem Management is the process used to answer the question, identify the underlying cause, and take corrective action.
How Problem Management Works
Problem Management has one goal: Identify and remove underlying causes of recurring Incidents. Where it isn’t possible to prevent Incidents, Problem Management seeks to minimize the business impact of those that do occur.
If pressure is being applied to quickly “find the problem and get the service back up”, you’re not doing Problem Management. That’s Incident Management, and it has a different, equally clear goal – to restore service fast!
Problem Management is a completely different process.
There are two main types of Problem Management: reactive and proactive.
Not to state the obvious, but reactive Problem Management is triggered in response to an Incident. Many organizations hold Post Incident Reviews for Major Incidents, and when it’s believed there’s an underlying problem, a reactive Problem Management effort is started.
Proactive Problem Management uses trending and historical information to identify potential Problem cases. This can be anything from formal Continual Service Improvement, to moderate data analysis (trending), or good old gut feeling.
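For the “moderate data analysis” end of that range, here is a minimal sketch of trending in Python. The incident log, the `find_problem_candidates` name, and the threshold of two occurrences are all assumptions made up for illustration; a real ticketing tool would supply its own data and a threshold that suits the organization.

```python
from collections import Counter

# Hypothetical incident log: (service, symptom) pairs exported from a ticketing tool.
incidents = [
    ("email", "mailbox full"),
    ("vpn", "connection dropped"),
    ("email", "mailbox full"),
    ("vpn", "connection dropped"),
    ("vpn", "connection dropped"),
    ("printing", "paper jam"),
]


def find_problem_candidates(incidents, threshold=2):
    """Return ((service, symptom), count) pairs seen at least `threshold` times,
    most frequent first. The threshold is an assumed cut-off, not an ITIL rule."""
    counts = Counter(incidents)
    return [(key, n) for key, n in counts.most_common() if n >= threshold]


candidates = find_problem_candidates(incidents)
for (service, symptom), n in candidates:
    print(f"{service}: '{symptom}' occurred {n} times")
```

Anything that shows up here is only a candidate. Whether it becomes a Problem Management case still depends on the business value of eliminating it.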
Regardless of the source, Problem Management cases should be prioritized based on value to the business. Pain Value or Business Impact analysis identifies the Problem(s) whose elimination would have the highest business value.
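One rough way to picture a Pain Value ranking is the Python sketch below. The scoring formula (lost user-minutes per month), the `ProblemCase` class, and the sample figures are all assumptions for illustration. ITIL does not prescribe a formula, so treat this as one possible starting point rather than the method.

```python
from dataclasses import dataclass


@dataclass
class ProblemCase:
    name: str
    incidents_per_month: int
    avg_outage_minutes: float
    users_affected: int

    def pain_value(self) -> float:
        # Assumed scoring: user-minutes lost per month. Swap in whatever
        # measure of business impact your organization actually trusts.
        return self.incidents_per_month * self.avg_outage_minutes * self.users_affected


# Hypothetical cases with made-up numbers.
cases = [
    ProblemCase("VPN drops", 12, 5, 40),
    ProblemCase("Payroll batch failure", 1, 240, 300),
    ProblemCase("Printer queue stalls", 8, 15, 10),
]

# Highest pain first: these are the cases to work on first.
ranked = sorted(cases, key=lambda c: c.pain_value(), reverse=True)
for case in ranked:
    print(f"{case.name}: {case.pain_value():.0f} user-minutes/month")
```

Notice that the noisiest case (the VPN drops, with 12 incidents a month) is not the most painful one under this scoring. That is the point of ranking by business impact instead of by ticket count.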
The Basic ITIL Problem Management Process
- Identify a potential Problem
- Raise a Problem Management case
- Log the problem
- Categorize and prioritize
- Systematic investigation (Root Cause Analysis)
- Identify change(s) needed to resolve and work through Change Management
- Verify problem has been resolved
- Close out problem
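The steps above can be sketched as a simple one-way lifecycle in Python. The `Stage` names and the `ProblemRecord` class are hypothetical, and a real tool would also allow rework, rejection, and known-error states. The sketch only shows that a case moves through the stages in order and cannot be advanced once closed.

```python
from enum import Enum


class Stage(Enum):
    # One stage per step in the list above (names are illustrative).
    IDENTIFIED = 1
    RAISED = 2
    LOGGED = 3
    PRIORITIZED = 4
    INVESTIGATED = 5
    CHANGE_IMPLEMENTED = 6
    VERIFIED = 7
    CLOSED = 8


class ProblemRecord:
    def __init__(self, summary):
        self.summary = summary
        self.stage = Stage.IDENTIFIED
        self.history = [Stage.IDENTIFIED]

    def advance(self):
        """Move to the next stage; a closed problem cannot advance further."""
        if self.stage is Stage.CLOSED:
            raise ValueError("problem is already closed")
        self.stage = Stage(self.stage.value + 1)
        self.history.append(self.stage)
        return self.stage


record = ProblemRecord("Recurring Monday flat tires")
while record.stage is not Stage.CLOSED:
    record.advance()
print([stage.name for stage in record.history])
```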
The specific problem analysis method used is less important than that it is thorough and systematic. Some of the more common techniques include:
- Kepner-Tregoe
- Fault Isolation
- Ishikawa diagrams
Many of these techniques can be used together as the situation dictates. Which you use depends on the specifics of the problem, the environment, complexity, organizational culture, skills, and knowledge. Use what works, but keep in mind that the goal is to identify the underlying root cause. Use data-driven analysis to avoid ‘oh, this is it… no… this is it… no…’
Back to flat tires for a second.
A good place to start would be to lay out the facts we know in a chronological order.
We discover that the first flat was on Monday, June 17th. The second flat was Monday, June 24th. Interesting. Same day of the week.
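Spotting that kind of pattern is easy to automate once the facts are laid out. Here is a minimal Python sketch that groups the two flats by day of the week. The year 2013 is an assumption (the story gives no year; 2013 is one in which both dates fall on a Monday), and the variable names are made up for illustration.

```python
from collections import defaultdict
from datetime import date

DAY_NAMES = ("Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday")

# The two flats from the story. The year is assumed, not given in the text.
flat_tire_dates = [date(2013, 6, 17), date(2013, 6, 24)]

# Group each event under its weekday name (date.weekday() returns 0 for Monday).
by_weekday = defaultdict(list)
for event in flat_tire_dates:
    by_weekday[DAY_NAMES[event.weekday()]].append(event)

for day, events in by_weekday.items():
    print(f"{day}: {len(events)} flat(s)")
```

Two data points do not prove anything, but a cluster like this tells you where to look next.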
Taking a look at what else we know, the second flat was “probably a nail”. Don’t remember what caused the first one. Other than it’s interesting that they both happened on a Monday, there doesn’t seem to be anything unusual. Until the following Monday, when another flat happens. This time, we ask the tire store to see what caused the flat, and sure enough, it’s a bright, new nail.
So, armed with this, we do some brainstorming. We ask if there’s anything special about Monday. Only thing that’s related to driving is that Monday is your day to drive your daughter to school.
On the way home that evening, you retrace your Monday route, and along the way, there’s a construction site. You stop and look around, and notice several nails on the pavement which are identical to the one that caused the flat.
Bam! Root cause.
You Change your route to work, and don’t have any more Monday flats.
Clearly a simplistic example, but you get the point.
So, Did Devo Have it Right?
Problem Management is one of the key ITIL processes that improve customer satisfaction with IT Services. Customers can understand things happening once, but when the same outages happen again and again, they start thinking IT isn’t doing a good job.
It certainly doesn’t add value (see The Real Value of ITIL Incident Management).
Did Devo have it right? When Problems come along, should we whip them good?
Naaa – it’s Incidents which must be whipped when they come along. Problems should be systematically managed through the Problem Management process.
Keep the two separate and distinct, as they each have very different goals.
Who’d have known that Devo had it wrong all these years?
Be sure to check out Six Problem Management Myths Busted (and One Confirmed!).