a simple essay

Oct04

Problems with the Therac-25

Finger-pointing has been a time-honored tradition since the time of the caveman. No one is happy in a bad situation until blame can be placed on someone or something. In the case of medical mistakes, blame is often the only way to find closure when grave errors occur. In the case of the Therac-25, in which 3 deaths and 2 severe injuries occurred, blame could easily be placed on the machine. But which part of the machine was to blame? Was the hardware the culprit, or was the software at fault? In the article “An Investigation of the Therac-25 Accidents,” Leveson and Turner place most of the blame on the software. Given merely the choice between software and hardware, I would agree.

When the problem first became evident, Atomic Energy of Canada Limited (AECL) thought that it was a hardware problem: a microswitch that reported the position of a turntable in the system. After the first incident in Hamilton, AECL determined that the way the turntable position was encoded could lead to ambiguous readings. Even though a later accident showed that this incident could have been a software error as well, one has to wonder whether there were many other software and hardware problems. Because the Therac-25 relied heavily on software for its safety measures, it had few hardware fail-safes. Since machinery tends to break without warning, fail-safes should have been put in place to stop the machine on any hardware malfunction. Also, the system was tested only as a whole, not in smaller modules. AECL should have taken the time to test the individual components to see how they were holding up, rather than just making sure the machine as a whole worked. A chain is only as strong as its weakest link.

However, as further investigation showed, there were many more software problems in the Therac-25 than hardware issues. Although the software was built around a preemptive task scheduler, the implications of this were not fully thought out, and the system turned out to be open to race conditions on shared data. The problem came up in the data entry subroutine: if the operator entered or edited data quickly enough, the machine could end up set with values that did not match what was on the screen. Although the subroutine looks like it should work every time, the engineer should have anticipated that data could be entered quickly enough for the intended values to be set incorrectly. This is a big oversight in the design, considering that the wrong combination of settings and dosages could produce an extremely high dose. The subroutine should have been written to make absolutely sure that the data in use was the data the operator had entered. This flaw could have been found if the software had been tested more stringently. But because the software team was overconfident in code recycled from the previous machines, a thorough test was deemed unnecessary. Sadly, this is not the only place where the design showed poor foresight. In my opinion, the whole system should have been built with a set of system check routines whereby, on certain errors and on all unknown errors, the machine would shut down instead of just pausing and allowing the attendant to continue. A call to a system diagnostic could then be made and the problem fixed before anyone had the opportunity to get hurt.
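To make the race condition concrete, here is a minimal sketch in Python of one way to guard against it. This is not the Therac-25's actual code or design; the names `Prescription`, `SharedEntry`, and `set_up_and_check`, and the dose values, are all made up for illustration. The idea is that the setup task takes a consistent snapshot of what the operator entered, and after the slow hardware setup it checks that nothing was edited in the meantime before allowing the beam to fire.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class Prescription:
    """Hypothetical operator-entered settings (illustration only)."""
    mode: str       # e.g. "xray" or "electron"
    dose_cgy: int


class SharedEntry:
    """Data shared between the keyboard task and the setup task."""

    def __init__(self, prescription):
        self._lock = threading.Lock()
        self._prescription = prescription
        self._version = 0  # bumped on every edit

    def edit(self, prescription):
        # Called by the keyboard task whenever the operator changes a field.
        with self._lock:
            self._prescription = prescription
            self._version += 1

    def snapshot(self):
        # Version and data are read together, so they always match.
        with self._lock:
            return self._version, self._prescription

    def is_current(self, version):
        with self._lock:
            return version == self._version


def set_up_and_check(entry, configure_hardware):
    """Return True only if the hardware was set up from data that is still current."""
    version, prescription = entry.snapshot()
    configure_hardware(prescription)  # slow step; the operator may edit meanwhile
    return entry.is_current(version)
```

If the operator edits the entry while `configure_hardware` is running, `set_up_and_check` returns False, and the caller would be expected to redo the setup rather than fire the beam with stale settings.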

Besides the machine itself (both its hardware and its software), I feel that fingers should be pointed in other directions. Specifically, they should be pointed at the people behind the machine. The most obvious people to blame are the programmers and the people who built the machines. After all, they are directly responsible for the malfunctions. But if you think about why these people did not work harder on the Therac-25, the answer becomes obvious: AECL was trying to cut corners and save money. It could have paid more programmers to double- or even triple-check the code. It could have spent more time testing each machine before sending it out. The simple fact is that it did not, and it all comes down to saving money.

On the other hand, AECL did not have the final say on whether the Therac-25 was a dependable machine. The FDA had to approve it before it could be used on people. Perhaps if the FDA had had stricter guidelines about testing, none of this would have happened. Thankfully, these accidents did force the FDA to adopt stricter guidelines. After all, nothing ever gets fixed until someone realizes that it is broken. I also feel that a little bit of blame should be placed on the operators of these machines. They are the ones who should have noticed the many errors occurring and insisted that someone check the machines.

When it comes down to it, all the people behind the Therac-25 should share the blame for its many malfunctions. In addition, a machine that dangerous should always shut down automatically if any error occurs. There should be a fail-safe in place to make sure that there simply cannot be an “accidental” overdose.
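The kind of fail-safe I have in mind can be sketched as follows. This is only an illustration under my own assumptions, not how any real machine works: the `Controller` class, the `MachineHalted` exception, and the `MAX_DOSE_CGY` limit are hypothetical. Any error, expected or not, latches the machine into a halted state that the operator cannot simply dismiss, and only a passed diagnostic clears it.

```python
class MachineHalted(Exception):
    """Raised when the machine refuses to deliver a treatment."""


MAX_DOSE_CGY = 500  # hypothetical upper limit, for illustration only


class Controller:
    def __init__(self):
        self.halted = False
        self.halt_reason = None

    def halt(self, reason):
        # Latch the halt: it stays set until service_reset() clears it.
        self.halted = True
        self.halt_reason = reason

    def deliver(self, dose_cgy, fire_beam):
        if self.halted:
            # No "press P to proceed": a halted machine stays halted.
            raise MachineHalted(self.halt_reason)
        if not 0 < dose_cgy <= MAX_DOSE_CGY:
            self.halt(f"dose {dose_cgy} cGy is outside the allowed range")
            raise MachineHalted(self.halt_reason)
        try:
            fire_beam(dose_cgy)
        except Exception as exc:
            # Unknown errors are treated as the most serious kind.
            self.halt(f"unexpected error: {exc!r}")
            raise MachineHalted(self.halt_reason) from exc

    def service_reset(self, diagnostics_passed):
        # Only a passed diagnostic run clears the halt.
        if diagnostics_passed:
            self.halted = False
            self.halt_reason = None
```

The design choice here is that the safe state is the default: the code has to prove the treatment is acceptable before firing, rather than the operator having to notice that something is wrong.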

