IT upgrades slow trains in San Francisco

31.03.2006
Unsuccessful software upgrades made to San Francisco's Bay Area Rapid Transit (BART) train system last Sunday stranded thousands of commuters for up to two hours Monday and Tuesday when the trains had to be stopped for safety reasons while the IT system was repaired.

Linton Johnson, a BART spokesman, said the delays and shutdowns occurred after IT staffers performed maintenance upgrades on the software that coordinates and runs the trains, tracks, operating signals and track switches. The attempted upgrades caused the system to crash twice during the following 48 hours, he said.

On Monday and Tuesday, the crashes caused transit delays when the trains were halted so they didn't run into each other, he said. After those software problems, BART IT workers on Wednesday decided to install backup software for redundancy in case of continuing problems, Johnson said. 'We were rushing to do the right thing,' he said. 'However, in the process of installing that backup system, we interfered with a [network switch] that crashed our system.'

That switch allows computers in BART's central operations center to communicate with other parts of the mass transit system to keep it running properly.

When the backup system was brought online, the network switch was overloaded, 'which wasn't anticipated by our computer technicians,' he said.

Normally, such maintenance isn't done during the week to avoid disrupting commuter service, Johnson said. But the backup system installation was done on Wednesday to try to stop the problems experienced earlier in the week. 'We, for some reason, decided to try and get this backup system installed' during normal commuting hours, he said. 'Technically, it shouldn't have affected anything, but in reality, it did.'

Wednesday's system failure stopped the trains at 5:27 p.m. PST for about 70 minutes. The problems caused delays of up to two hours before all the trains were operating on schedule, he said.

'We're taking measures to make sure that this never happens again,' Johnson said. 'This is essentially a mistake on BART's part.'

About 15 of BART's approximately 100 IT workers usually perform software upgrades, he said. Software is tested in advance in a separate 'virtual environment' so that the testing won't interfere with everyday operations, he said. The software upgrades done last Sunday had passed extensive testing and are designed to 'self-correct. [It] has proven to self-correct quickly when there is a problem.'

The work is part of a larger multiyear software upgrade being done in phases to avert commuter train slowdowns, he said. 'In the midst of this installation procedure, which has been going on for 14 months...things were going great until Monday,' Johnson said. The software upgrades are expected to take another five months to complete. BART officials are still trying to figure out what caused the problems this week.

The Integrated Control System is maintained by BART IT workers, but originally came from Logica, an England-based vendor, Johnson said.

BART began passenger service in the Bay Area in 1972 and now has 670 rail cars in its fleet, which cover 104 route miles through 43 stations. The system carries about 300,000 riders daily.