Thursday, April 21, 2011

Extreme Firefighting

Someone I know once interviewed for a job at the Olympics.  We needed people for specific roles.  He was a candidate.  The interviewer asked a scenario question.  She spoke really fast.

Within the past 3 minutes, you hear:
- a broadcaster is screaming that the video feeds aren't working
- 2 of the help desk computers just died for some reason
- a volunteer at the competition field of play is complaining that the radios aren't able to talk to each other
- someone from the volunteer coordination office is reporting a jammed printer
- a technology sponsor company's marketing executive just arrived on venue but forgot his accreditation pass
- the power for the catering kitchens just went out
You know that competition is about to start in 30 minutes.  How do you fix these issues?

Well, the question was something like that; I can't recall the exact scenarios.  My friend was stunned.  His jaw dropped.  He stammered out something or other.  He didn't get the job.  Today, we discuss what to do in critical situations when you feel overwhelmed.  There's a term for when issues keep getting thrown at you: firefighting.  Every manager complains about how much time they have to spend each day "fighting fires" before they can even get to the work they really need to do.  But overwhelming critical situations are at a whole other level.  They come in two types: earthquakes and forest fires.  Your normal insurance will not cover the damage from these babies.

Earthquakes
Earthquakes are really intense and can cause extensive damage.  You can't prevent them, nor can you easily escape unscathed from them, but you can predict their probability and make proper preparations.  If you're good at what you do, the damage from earthquakes is minimal because you were ready on two fronts: first, how to survive the earthquake, and second, how to manage the fallout.  Fortunately, earthquakes are over fairly quickly.  The above interview question is an example of an earthquake scenario.  Earthquakes usually happen in the operation stage of a new initiative, though they can also happen in the planning or implementation stages.  The unfortunate thing about earthquakes is that even once the problems are solved, there can still be aftershocks.  After all, in the heat of the moment, it's not always possible to implement a solid permanent solution, so the quick fixes can come back to bite you.

Forest Fires
Forest fires may arise from earthquakes (think an earthquake situation where the problems don't get solved and just keep growing), but forest fires are much more likely to happen during the planning or implementation of any work.  Forest fires more often than not start out small, but hinge on key issues that become gargantuan and threaten to destroy everything you're trying to do.  Although forest fires can also cause extensive damage, they're nice in one respect: you can influence or control them.  You can steer the path they take and douse them with water along the way.  You have time to manage a forest fire, but if you don't get it under control in a timely manner, everything around you will be consumed.  Don't mistake the long-term nature of forest fires for safety.  Here's how one such fire grew.  It started with a poor network design.  It grew with realizing the servers required fibre, not copper cabling.  It exploded when you saw that only half of the infrastructure that would really be required had been allocated.  It became a full-on forest fire when all your best people were taken away to deal with an emergency black swan earthquake that's now considered #1, while your new forest fire is still right up there at #2.  Forest fires unfortunately take a long time to settle down, maybe months, depending on the situation.

The medical profession has done an amazing service to the world by devising a system for how to deal with a large sudden inflow of work with limited resources.  It's called triage, a practice usually traced to French military surgeons during the Napoleonic Wars and widely used by French doctors at the front during World War I.  Triage is implemented in hospitals, IT operations, customer service operations, project management operations, etc.  Triage is particularly well-suited for IT operations.  By using ticketing systems, we can assign a ticket to each individual issue to track both its status and its severity.

Standard fare in IT may go like this, but the concepts are easily applicable to other industries and professions.  A severity 1 ticket is considered mission-critical and requires everyone's immediate attention.  Severity 1 tickets are showstoppers.  In medical terms, hearts are not beating.  Severity 2 tickets aren't showstoppers, but they can easily become showstoppers.  Hearts are still beating, but the situation is critical; life is hanging on by a thread.  Severity 3 tickets are not showstoppers, but there is no resolution or workaround yet.  The problems just aren't big enough to receive the highest priority.  Severity 4 tickets are lower priority, especially because there are temporary workaround solutions available.  Severity 5 tickets are simply requests to get new work done, and nobody will complain if the request doesn't get done immediately.  Of course, a Sev 5 can become a Sev 1 if the deadline isn't met and the request was really important.  Everyone's standards for what is considered a Sev 1 vs a Sev 2 vs a Sev 3 will vary, and the key is to have those standards and criteria clearly mapped out before you need to use them.  It's important for everyone to be on the same page so that there's no argument, only correct action, when the crisis hits.
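
To make the levels concrete, here is a minimal sketch in Python of how a ticket queue could be ordered by severity.  The names (Severity, Ticket, triage) are made up for illustration, and the sample tickets loosely borrow from the interview scenario above; the severities assigned to them are illustrative guesses, not a standard.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    """Lower number = more urgent, following the Sev 1 to Sev 5 scale above."""
    SEV1 = 1  # showstopper, mission-critical
    SEV2 = 2  # not a showstopper yet, but could easily become one
    SEV3 = 3  # not a showstopper, but not resolved either
    SEV4 = 4  # lower priority, temporary workaround available
    SEV5 = 5  # request for new work, not urgent


@dataclass
class Ticket:
    title: str
    severity: Severity


def triage(tickets):
    """Return tickets ordered from most to least urgent (ties keep input order)."""
    return sorted(tickets, key=lambda t: t.severity)


# Hypothetical tickets; the severities are assumptions for the example.
tickets = [
    Ticket("Jammed printer in volunteer office", Severity.SEV4),
    Ticket("Broadcast video feeds down", Severity.SEV1),
    Ticket("Two help desk computers dead", Severity.SEV3),
    Ticket("Field-of-play radios can't talk to each other", Severity.SEV2),
]

for t in triage(tickets):
    print(f"Sev {t.severity.value}: {t.title}")
```

The point of the sketch is only that once the criteria are agreed in advance, ordering the work is mechanical; the hard part is agreeing on what counts as a Sev 1.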

Through triage, you have a clear view of what your issues are and what priority each issue should take.  Then you figure out who needs to be involved to resolve each issue and what the deadline should be for resolution.  You end up with a list that guides you on what your actions should be.  If it's not on the list, it doesn't deserve your attention.  If it deserves your attention and is new, add it to the list.  But be sure to ask yourself and the requesting parties seriously whether or not it deserves to be on the list.  You're in crisis mode and your brain can't handle everything.  You're being overwhelmed, remember.  So you want to keep the list as small as possible, though that may end up meaning 50 issues or even 100 issues, depending on the complexity, size, and visibility of your work.

The important thing to note is that this issue log allows you to focus, analyze, delegate, and resolve things in a rational and steady manner.  You'll see progress, though the amount of progress will depend on multiple factors, including your own management skills, the capabilities of your team, the amount of bureaucracy you have to face, the politics that will be inevitable, and the quality of your relationships with third parties.  But without having this list, you'll be flying blind and you won't know if you're focusing on the right things.  Maybe you'll have saved a tree from the forest fire, but in the process, you may have sacrificed an entire acre.  Don't do that.  Create a list, review and update it regularly, and execute, execute, execute.  In earthquakes, you'll have to create the list rapidly in your head, with advice from only your key experts.  Don't make the mistake of having too many chefs in the kitchen.  Having fewer people reduces uncertainty and increases efficiency.  Don't panic, and take a full half hour to build the list if you have the time.  In forest fires, you can sit down for maybe an entire week to calmly map out all the issues.

Here is the type of data I'm tracking right now for a major forest fire that's raging at work.  This is the type of data you need to make good decisions.  Then delegate to your team, lead by example, and get it done.  In due time, you'll have your crisis under control, although you may not be able to escape unscathed.

Issue #
Date Raised
Issue Name
Issue Description
Special Approval from Specific Authorities Required to Take Action?
Relevant Document
Committed Resolution Date
Resolution Date
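
If you keep the log in code rather than a spreadsheet, the fields above could be sketched roughly like this in Python.  The class name, the field names, the overdue check, and all of the sample rows and dates are assumptions made for illustration; only the eight column headings come from the actual log.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Issue:
    number: int                        # Issue #
    date_raised: date                  # Date Raised
    name: str                          # Issue Name
    description: str                   # Issue Description
    special_approval_required: bool    # Special Approval ... Required to Take Action?
    relevant_document: Optional[str]   # Relevant Document (None if there isn't one)
    committed_resolution_date: date    # Committed Resolution Date
    resolution_date: Optional[date] = None  # Resolution Date (None while still open)

    def is_open(self) -> bool:
        return self.resolution_date is None

    def is_overdue(self, today: date) -> bool:
        # Open and past the date someone committed to.
        return self.is_open() and today > self.committed_resolution_date


def overdue(issues, today):
    """Open issues past their committed date, earliest commitment first."""
    late = [i for i in issues if i.is_overdue(today)]
    return sorted(late, key=lambda i: i.committed_resolution_date)


# Hypothetical rows; names, documents, and dates are invented.
log = [
    Issue(1, date(2011, 4, 4), "Cabling type",
          "Servers require fibre, not copper", True, None, date(2011, 4, 18)),
    Issue(2, date(2011, 4, 11), "Infrastructure shortfall",
          "Half the required infrastructure allocated", True,
          "capacity-plan.doc", date(2011, 5, 2)),
    Issue(3, date(2011, 4, 12), "Network design review",
          "Original design needs rework", False, None,
          date(2011, 4, 15), resolution_date=date(2011, 4, 14)),
]

for issue in overdue(log, today=date(2011, 4, 21)):
    print(f"#{issue.number} {issue.name} was due {issue.committed_resolution_date}")
```

Comparing the committed date against the actual resolution date is what tells you, during each review, which issues need chasing first.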

Have fun with all your disasters.  :)
