By Hank Seader
Principal, Blackdog CFS
Raised in the trades, steeped in the Air Force, and tested in industry, I came to believe that preparing for the unlikely beats suffering from the foreseeable. This view is especially apt in critical environments like data center infrastructure – lots of moving parts, often limited control, and typically unlimited expectations.
Occasionally, shortcuts and assumptions created opportunities to learn from mistakes. Fortunately, experience transfers from one situation to another, and you don’t have to make every mistake yourself to learn from it. In the trades, in the Air Force, and in industry alike, investing in good tools, keeping a stock of spare parts and materials, and having a plan for when things don’t go your way can overcome breakage and unexpected situations. In critical environments, investments in training, redundant equipment, and procedures are what overcome infrastructure outages and loss of continuous computing.
Most influential to my perspective were the years I spent as an Air Force pilot. No “Top Gun” glamour for me: my days were made of big airplanes, big crews and long hours. Working “half days” meant 7 to 7 – you got to choose either the day half or the night half. That was a lot like my data center operations days to come. The parallels between aircraft and data centers were legion, and while I began my career thinking I’d fly forever, I was actually preparing for my last Air Force assignment: operating a very large data center. Both aircraft and data centers have engines, electrical systems, fuel systems, air conditioning, hydraulics, fire detection/suppression, communications, controls and checklists. And behind every checklist are systems training, operating procedures, crew discipline and practice and practice and practice.
Unfortunately, in many critical environments, procedures and checklists get developed only after the upsets or failures your boss doesn’t want you to repeat. But you don’t have to wait for a situation to go wrong before you develop an effective procedure. I have often heard that most flight manuals are written in the blood of other crews’ mistakes. Take time to know your critical environment’s infrastructure, then go find somewhere else that has similar equipment. Data centers, hospitals, 911 centers, airports and even railroads may all share some kinds of systems and gear. They can also share procedures and checklists.
Creating and practicing a checklist before you have to use it allows you to respond calmly, confidently and in a measured way. Knowing the difference between an inconvenient system upset and an actual emergency can make your response both effective and efficient. When successful, it may seem to many as if your actions were heroic. Effective, efficient, heroic – all good words to have on your fitness report (or your corporate annual evaluation).
As a pilot and crew commander, I practiced procedures and checklists that included engine-fire-on-takeoff, loss-of-flight-instruments and bail-out. Even though I only had occasion to “experience” the first two, I don’t regret having prepared for all three. And as it turned out, my years operating aircraft became preparation for an even longer career operating data centers.