Upgrade disasters made easy

30.01.2007
I'm the technical administrator at a large medical group in Canada. Among other things, I'm responsible for the LAN, the WAN, all the desktops, laptops, peripherals, and a medical-records application that's at the core of our group's operations. Over the last couple of years, we've been struggling to make that app perform more reliably. At the same time, our infrastructure has been growing fast, and sluggish performance from our overloaded servers had become a problem.

We developed quite a wish list:

-- A two-generation upgrade of the medical records application -- an app that's critical to the health and well-being of our patients;

-- A two-generation upgrade of our revenue-critical Practice Management application;

-- Moving our datacenter from an overcrowded, in-house facility to a third-party hosted center;

-- Installing bigger, faster servers for in-house use;

-- Upgrading our connectivity software from RDC/Terminal Server to Citrix ICA;

-- Deploying server virtualization and SAN (storage area network) technology;

-- Upgrading Windows Server, SQL Server, and triCerat (software that supports printing across the WAN);

-- Buying new scanning software to interface with the upgraded medical records app.

OK, there were issues. Our CIO had no experience with Citrix ICA, or server virtualization, or SANs. But we figured if we moved through this process one step at a time, everything would be fine.

Then, in the infinite wisdom of the powers on high, our CIO got approval to make all these changes simultaneously, during a one-week roll-out! I warned the CIO that the proverbial snowball in hell would have a better chance of success than this ill-advised rush to disaster. But he assured me that his staff would be able to handle any problems.

The roll-out started on Friday evening. By Saturday morning we were crippled. The only functional applications were an old Exchange server and the staff time-clock software, both still running on the old server. The CIO and his staff of help-desk technicians had reserved Sunday for testing. But nothing was working, and the level of chaos was so intense nobody knew what to test first.

Any first-term computer science student would have known that a more incremental, staged transition would have had a better chance of success. For much of the week the transition team threw money at Microsoft, Citrix, our medical-records application vendor, an IT consulting group, and the new datacenter hosting company, trying to isolate, identify, and repair the vast number of issues that had arisen. The CIO spent the week hiding from a small army of inside staffers and outside consultants, all of whom were waiting in line to call him bad names.

By 5 p.m. Thursday things were beginning to work, and by midday Friday I was reasonably confident that we could care for our patients again. I had been worried that a patient might die because of the deranged state of our systems. But we got lucky, and no one suffered physical harm. On the other hand, my best guess is that during the course of this week we spent roughly twice the half-million dollars that had been allocated for the upgrade.

To my surprise, the CIO hasn't been fired. Maybe that's because HR can't find his records.