Just because you occasionally call an IT issue a “problem” and resolve it does not mean you’re doing Problem Management!
Problem Management Myths
Joe the IT Guy recently wrote about Problem Management: No Problemo on his blog (I highly recommend stopping by. Great stuff).
While writing What is Basic ITIL Problem Management, I started collecting a list of myths about Problem Management. So I figured, why not kick off the new year with a bang by busting a few?
So, here they are, in no particular order: six Problem Management Myths Busted (and one confirmed!).
1. Incidents become Problems when they’re not fixed quickly.
This myth is fed by a common pattern: Incidents that are difficult to fix sit unresolved for days, weeks, or months. Customers complain, and eventually management makes it a top priority to “find the root cause and fix the problem”. A Root Cause Analysis effort is kicked off, which only confuses the goal. Are we trying to restore the service, or find the underlying problem? If the answer is ‘Yes’, you have officially confused Incident Management with Problem Management, and unwittingly added confusion and time to recovery.
Incidents never become Problems. An important part of Incident Management is ensuring Incidents are resolved within their target SLA timeframes. When thresholds are exceeded, Incidents can escalate – resources added, management engaged, customer communications sent. A Major Incident may be declared and an Incident Manager assigned. But the Incident never becomes a Problem. The goal of Incident Management is always the same – quickly restore service to the customer.
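For the tool-minded, here is a minimal sketch of that idea in Python. Everything in it is an assumption made for illustration: the `Incident` class, its field names, and the "double the SLA target" rule for declaring a Major Incident are hypothetical, not taken from ITIL or any particular ticketing product. The point is only that escalation changes the attention an Incident gets, never what kind of record it is.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Incident:
    """Hypothetical incident record; field names are illustrative only."""
    summary: str
    opened_at: datetime
    sla_target: timedelta
    escalation_level: int = 0   # 0 = first line, higher = more resources engaged
    major: bool = False         # True once a Major Incident is declared
    record_type: str = field(default="incident", init=False)

    def escalate_if_breaching(self, now: datetime) -> None:
        """Escalate when the SLA threshold is exceeded.

        Escalation adds resources and attention. It never touches
        record_type, because the goal is still to restore service.
        """
        elapsed = now - self.opened_at
        if elapsed > self.sla_target:
            self.escalation_level += 1
        # Assumed rule for this sketch: declare a Major Incident once
        # elapsed time is more than double the SLA target.
        if elapsed > 2 * self.sla_target:
            self.major = True


# Usage with made-up values: a 4-hour SLA that is checked after 9 hours.
opened = datetime(2024, 1, 1, 9, 0)
incident = Incident("Email down", opened, timedelta(hours=4))
incident.escalate_if_breaching(opened + timedelta(hours=9))
print(incident.escalation_level, incident.major, incident.record_type)
```

However far the escalation goes, the record is still an incident. A Problem, if one is warranted, would be a separate record with a separate goal.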
2. Problems are worked by senior staff because junior staff couldn’t fix an Incident
This one may have its basis in the thought that senior staff are primarily engaged in more challenging issues (‘problems’) that require more technical knowledge or experience to resolve.
The distinction is arbitrary and beside the point. Senior staff are not above Incident Management, and are often engaged when SLA deadlines are approaching, or business impact is high. Regardless, they are performing Incident Management.
Who is doing the work does not change the nature of it. Incident Management is Incident Management. Problem Management is Problem Management.
Solid Busted on this one!
3. Problem Management is just another way of saying “troubleshooting”
This one has its roots in the concept that first line support follows scripted diagnostic processes that require limited in-depth knowledge of the technologies involved. This scripted diagnostic work is differentiated from “real” troubleshooting done by staff with deeper knowledge.
The thought goes something like – if first line can’t resolve it, it’s a “problem” that must be troubleshot by more experienced technical staff. Which means that real troubleshooting is the same as Problem Management, right?
Well, I hate to nitpick over words, but one of the advantages of Service Management is a shared vocabulary that increases clarity.
Call it what you like: when a service is down and staff are working (troubleshooting) to restore it, that is Incident Management.
4. Incidents that are escalated are Problems.
Escalation is a normal part of the Incident Management process, and should be used as appropriate to ensure Incidents are resolved within target time frames. As with myth 1 above, Incidents that are escalated, whether to deeper support or up the management chain, have not become Problems.
The very reason for escalating is that Incidents must be resolved within a target time frame. Escalation is itself a recognition of the importance of that single goal – ensure service is restored. Use whatever resources are necessary, but achieve job 1 – restore the service.
5. Any outage that has high business impact is a Problem
This is a tricky one, because, yeah, high impact outages are most definitely a ‘problem’ (lower case). Customers appreciate when IT is sensitive to their pain, and acknowledging the significance of an outage by using different words can help in relationship management.
But the reality is, Incident Management as an IT capability is built to be the fastest, most efficient path to service recovery. Renaming the Incident, or using the wrong process, because the stakes and visibility are high does not help speed resolution.
This is why Marines train, train and train. It’s why they tear down and reassemble their weapons in the dark, again and again – so that when it really counts, and the stakes are high, they will perform flawlessly and achieve the objective.
When it absolutely, positively, has to be fixed, let Incident Management do what it does best.
6. Problem Management should be done by the same people who do Incident Management.
This one is hotly debated by industry experts and practitioners alike, so some of you will disagree with me on this.
Hear me out.
The Service Desk is primarily responsible for Incident Management. Good Service Desks know their Prime Directive: rapid restoration of down services. They are compulsively fixated on one thing – restoration of service. The culture and personality that go with this focus must be fostered, developed and supported in the Service Desk.
Problem Management is a very different culture; a different pace. Systematic and scientific. Dealing in minutiae and digging deep into event logs and packet captures.
It’s like street cops vs Crime Scene Investigators (think CSI: Las Vegas). Cops are street savvy and bold, taking charge of volatile situations. CSIs are focused on data and details. (“Let the evidence speak for itself.”)
The personalities are very different. Each its own profession. Using the same staff for both lessens the effectiveness of both.
This Myth is Busted. (Feel free to argue your case in comments below!)
7. Problem Management is required for good IT service delivery.
I love this one, because everyone knows that if you’re good, if you’re really good, you don’t have problems.
In my book, no one does it better than NASA. But a New York Times article from 2003 revealed telling findings from the Columbia Accident Investigation Board:
…NASA knew that the shuttle was vulnerable to debris strikes and knew that it was being hit by foam debris on nearly every flight, but left the issue unresolved. That factor, board members have noted, resembles the O-ring failure that destroyed the Challenger in 1986, when NASA knew it had a component prone to problems but did not recognize the potential for catastrophe.
Even the best engineered systems require ongoing Problem Management to identify and resolve underlying issues that cause Incidents.
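One modest way to start that identification work is to look for recurrence in the Incident history. The sketch below is only an assumption-laden illustration: the `problem_candidates` function, the `(component, summary)` tuple shape, the threshold of three, and the sample data are all hypothetical, and a real Problem Management practice would weigh business impact and trends as well as raw counts.

```python
from collections import Counter


def problem_candidates(incidents, min_recurrences=3):
    """Return components whose Incident count hints at an underlying Problem.

    `incidents` is an iterable of (component, summary) tuples. Both that
    shape and the default threshold are assumptions made for this sketch.
    """
    counts = Counter(component for component, _ in incidents)
    return sorted(c for c, n in counts.items() if n >= min_recurrences)


# Made-up Incident history for illustration only.
history = [
    ("mail-gateway", "Outbound mail queue stalled"),
    ("vpn", "Users cannot connect"),
    ("mail-gateway", "Outbound mail queue stalled"),
    ("mail-gateway", "Delivery delays reported"),
    ("print-server", "Spooler crashed"),
]
print(problem_candidates(history))
```

Each of those Incidents was presumably closed once service was restored. The recurrence is what signals there may be something underneath that is worth investigating.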
Need help with How to Implement ITIL Problem Management?
What Problem Management myths have you heard? Share in comments below.
5 comments on “Six Problem Management Myths Busted (and One Confirmed!)”
I like that. Thank you. And, for what it’s worth, I agree with you on point 6 (in an ideal world, at least). Reality might dictate otherwise though.
I’ve seen Problem Management in a Fortune 10 company, and in shops as small as one person. (The reality element!)
The whole thing between Incident and Problem Management gets blurry when the frequency of recurring high priority incidents is high, e.g. it happens every hour after service was restored or a workaround applied.
So true. I think it’s especially true for organizations that are very reactive (fire fighting) oriented. Where there’s a mature, systematic Problem Management (and CSI) capability in place, the biggies are the first to be resolved, and Problem Management can start looking like it should. Unfortunately, the fire fighting leaves people with the wrong impression of the real role of Problem Management.