Just because you occasionally call an IT issue a “problem” and resolve it does not mean you’re doing Problem Management!
Problem Management Myths
While writing What is Basic IT Problem Management, I started collecting a list of myths about Problem Management. So I figured, why not kick off the new year with a bang by busting a few?
So, here they are, in no particular order: six Problem Management Myths Busted (and one confirmed!).
1. Incidents become Problems when they’re not fixed quickly.
This myth is fed by a common pattern: Incidents that are difficult to fix sit unresolved for days, weeks, or months. Customers complain, and eventually management makes it a top priority to “find the root cause and fix the problem”. A Root Cause Analysis effort is kicked off, which only confuses the goal. Are we trying to restore the service, or find the underlying problem? If the answer is ‘Yes’, you have officially confused Incident Management with Problem Management, and unwittingly added confusion and time to recovery.
Incidents never become Problems. An important part of Incident Management is ensuring that Incidents are resolved within their target SLA timeframes. When thresholds are exceeded, Incidents can escalate – resources added, management engaged, customer communications sent. A Major Incident may be declared and an Incident Manager assigned. But it never becomes a Problem. The goal of Incident Management is always the same – quickly restore service to the customer.
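For readers who think in code, here is a minimal sketch of that escalation rule. Everything in it is an assumption made for illustration: the `Incident` class, the `escalate_if_needed` function, and the idea that a Major Incident is declared at twice the SLA target are hypothetical, not taken from any particular ITSM tool. The point is only that escalation adds actions to the record while its type stays the same.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Incident:
    summary: str
    opened_at: datetime
    sla_target: timedelta          # hypothetical resolution target
    record_type: str = "Incident"  # never changes, however long the record stays open
    major: bool = False
    actions: list = field(default_factory=list)


def _add_once(incident, action):
    # Record an escalation action only the first time it is triggered.
    if action not in incident.actions:
        incident.actions.append(action)


def escalate_if_needed(incident, now):
    """Escalate an overdue Incident without ever reclassifying it."""
    elapsed = now - incident.opened_at
    if elapsed > incident.sla_target:
        for action in ("add resources", "engage management", "notify customer"):
            _add_once(incident, action)
    if elapsed > 2 * incident.sla_target:  # assumed threshold for a Major Incident
        incident.major = True
        _add_once(incident, "assign Incident Manager")
    return incident


opened = datetime(2024, 1, 1, 9, 0)
outage = Incident("Email outage", opened, sla_target=timedelta(hours=4))
escalate_if_needed(outage, opened + timedelta(hours=9))
print(outage.record_type, outage.major)  # prints "Incident True"
```

Notice that nothing in the function ever sets `record_type` to anything else. A Problem, if one is raised, would be a separate record with its own goal.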
2. Problems are worked by senior staff because junior staff couldn’t fix an Incident.
This one may have its basis in the thought that senior staff are primarily engaged in more challenging issues (‘problems’) that require more technical knowledge or experience to resolve.
The distinction is arbitrary and beside the point. Senior staff are not above Incident Management, and are often engaged when SLA deadlines are approaching, or business impact is high. Regardless, they are performing Incident Management.
Solid Busted on this one!
3. Problem Management is just another way of saying “troubleshooting”.
This one has its roots in the concept that first line support follows scripted diagnostic processes that require limited in-depth knowledge of the technologies involved. This scripted diagnostic is differentiated from “real” troubleshooting done by staff with deeper knowledge.
The thought goes something like – if first line can’t resolve it, it’s a “problem” that must be troubleshot by more experienced technical staff. Which means that real troubleshooting is the same as Problem Management, right?
Well, I hate to nitpick over words, but one of the advantages of Service Management is a shared vocabulary that increases clarity.
4. Incidents that are escalated are Problems.
Escalation is a normal part of the Incident Management process, and should be used as appropriate to ensure Incidents are resolved within target time frames. As with Myth 1 above, Incidents that are escalated, whether to deeper support or up the management chain, have not become Problems.
The very reason for escalating is that Incidents must be resolved within a target time frame. Escalation is itself a recognition of the importance of that single goal – ensure service is restored. Use whatever resources are necessary, but achieve job 1 – restore the service.
5. Any outage that has high business impact is a Problem.
This is a tricky one, because, yeah, high-impact outages are most definitely a ‘problem’ (lower case). Customers appreciate when IT is sensitive to their pain, and acknowledging the significance of an outage by using different words can help in relationship management.
But the reality is that Incident Management as an IT capability is built to be the fastest, most efficient path to service recovery. Renaming the outage, or using the wrong process because the stakes and visibility are high, does not speed resolution.
This is why Marines train, train and train. It’s why they tear down and reassemble their weapons in the dark, again and again – so that when it really counts, and the stakes are high, they will perform flawlessly and achieve the objective.
6. Problem Management should be done by the same people who do Incident Management.
This one is hotly debated by industry experts and practitioners alike, so some of you will disagree with me on this.
Hear me out.
The Service Desk is primarily responsible for Incident Management. Good Service Desks know their Prime Directive – rapid restoration of down services. They are compulsively fixated on one thing – restoration of service. The culture and personality that go with this focus must be fostered, developed and supported in the Service Desk.
Problem Management is a very different culture, with a different pace. It is systematic and scientific, dealing in minutiae and digging deep into event logs and packet captures.
It’s like street cops vs Crime Scene Investigators (think CSI: Las Vegas). Cops are street savvy and bold, taking charge of volatile situations. CSIs are focused on data and details. (“Let the evidence speak for itself.”)
This Myth is Busted. (Feel free to argue your case in comments below!)
7. Problem Management is required for good IT service delivery.
I love this one, because everyone knows that if you’re good, if you’re really good, you don’t have problems.
In my book, no one does it better than NASA. But a New York Times article from 2003 revealed telling findings from the Columbia Accident Investigation Board:
…NASA knew that the shuttle was vulnerable to debris strikes and knew that it was being hit by foam debris on nearly every flight, but left the issue unresolved. That factor, board members have noted, resembles the O-ring failure that destroyed the Challenger in 1986, when NASA knew it had a component prone to problems but did not recognize the potential for catastrophe.
Need help with How to Implement IT Problem Management?
What Problem Management myths have you heard? Share in comments below.