Floating Point Disasters

In class, we have been discussing floating point numbers, so I was curious about their real-world application, and specifically about what happens when they are applied incorrectly (disasters are more interesting to read about). After some searching, I found several incidents in which careless handling of floating point numbers had grave consequences: the loss of large sums of money or, more importantly, the loss of life. Below is a short summary of the incidents I found most interesting, along with their causes. Follow the links I provide if you want more information.

Patriot Missile
During the Persian Gulf War in 1991, Iraqi forces launched a Scud missile at an American Army barracks in Dhahran, Saudi Arabia. An American Patriot missile battery was supposed to intercept it. Unfortunately, the battery failed to track and intercept the Scud, which hit its target and killed 28 soldiers. A later investigation determined that the cause was a rounding error in the software used to predict the incoming missile’s trajectory. Specifically, the software kept track of the time since boot up and used this value in the trajectory calculation. The time was counted in tenths of a second, and the count was multiplied by 1/10 to convert it to seconds.

You may wonder why this matters. We know 1/10 = 0.1, so multiplying by it just moves the decimal point one place to the left; none of the digits change, so it does not seem to introduce any error at all. However, the software worked in binary, and while 1/10 is simply 0.1 in base 10, its expansion in base 2 never terminates. Look about halfway down this page if you want to know why: http://en.wikipedia.org/wiki/Binary_numeral_system. The binary representation of 1/10 was truncated to fit a 24 bit register, so every tick of the clock contributed a tiny error. After the battery had been running for about 100 hours, the accumulated error in the computed time was about 0.34 seconds. While a third of a second is insignificant in daily life, it matters a great deal for an object traveling at about 1600 meters per second: in that time the Scud covers more than 500 meters. This computational error resulted in the failed interception and the loss of 28 lives. You can read exactly how the truncation produced the 0.34 second error, along with further information about the incident, here: http://www.ima.umn.edu/~arnold/disasters/patriot.html
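The arithmetic behind the 0.34 second figure can be sketched in a few lines of Python. This is only an illustration, not the Patriot software: it assumes that 23 bits were kept after the binary point and that the battery had been running for about 100 hours (both taken from the write-up linked above, not from anything I measured), and the names `chop` and `binary_fraction_digits` are my own.

```python
import math
from fractions import Fraction

# Assumptions: 23 bits after the binary point and 100 hours of uptime,
# as described in the linked write-up. Everything here is illustrative.
FRACTION_BITS = 23
HOURS_UP = 100
TICKS_PER_SECOND = 10        # the clock counted tenths of a second
SCUD_SPEED_M_PER_S = 1600    # approximate speed quoted in the post


def binary_fraction_digits(value, count):
    """Return the first `count` binary digits of a fraction in [0, 1)."""
    digits = []
    for _ in range(count):
        value *= 2
        bit = math.floor(value)
        digits.append(str(bit))
        value -= bit
    return "".join(digits)


def chop(value, bits):
    """Truncate (not round) a non-negative value to `bits` fractional bits."""
    scale = 2 ** bits
    return Fraction(math.floor(value * scale), scale)


tenth = Fraction(1, 10)
stored_tenth = chop(tenth, FRACTION_BITS)
error_per_tick = tenth - stored_tenth          # exact, thanks to Fraction

ticks = HOURS_UP * 3600 * TICKS_PER_SECOND     # number of 0.1 s ticks
clock_drift_s = error_per_tick * ticks
miss_distance_m = clock_drift_s * SCUD_SPEED_M_PER_S

print("1/10 in binary: 0." + binary_fraction_digits(tenth, 32) + "...")
print(f"error per tick: {float(error_per_tick):.3e} s")
print(f"clock drift after {HOURS_UP} h: {float(clock_drift_s):.2f} s")
print(f"distance covered in that time: {float(miss_distance_m):.0f} m")
```

The binary digits show the repeating 0011 pattern that never terminates. I used `Fraction` rather than ordinary floats so that the sketch does not introduce rounding errors of its own while measuring the truncation error.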

Ariane 5 Flight 501
Ever get an arithmetic overflow exception while running code? Next time, it could cost you $370 million. A conversion error in the Ariane 5 flight software caused Flight 501 to veer off course and triggered the rocket’s self-destruct. The Ariane 5 reused software from its predecessor, Ariane 4, even though the launch conditions were different. A short time after liftoff, the software converted a 64 bit floating point number, a value related to the rocket’s horizontal velocity, to a 16 bit signed integer. The floating point value was too large to be represented in 16 bits, so an arithmetic overflow exception was raised. Any good programmer knows that a conversion like this should be protected, so you may wonder why such a basic principle was overlooked in such an important application. The protection for this particular conversion had been left out, for what the programmers later described as efficiency reasons. The exception cascaded through the system and caused a false adjustment to be made, which led to a loss of control and ended in the destruction of Flight 501 about 40 seconds after launch. On the bright side, this disaster brought to light the significance of risk in computing systems and resulted in increased funding for research on the reliability of life critical systems. You can read a detailed report about Flight 501 here: http://sunnyday.mit.edu/accidents/Ariane5accidentreport.html and you can see a video of the actual launch and explosion here: http://www.youtube.com/watch?v=IONcgYzVFlg.
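To make the failure mode concrete, here is a minimal Python sketch of narrowing a floating point value to a 16 bit signed integer. It is not the flight code (which was not written in Python): the function names and the two sample readings are hypothetical, chosen only so that one value fits and the other does not.

```python
INT16_MIN = -2 ** 15      # -32768
INT16_MAX = 2 ** 15 - 1   #  32767


def to_int16_checked(x: float) -> int:
    """Convert to a 16 bit signed integer, raising if the value does not fit."""
    n = int(x)  # truncates toward zero
    if not INT16_MIN <= n <= INT16_MAX:
        raise OverflowError(f"{x} does not fit in a 16 bit signed integer")
    return n


def to_int16_saturating(x: float) -> int:
    """Convert to a 16 bit signed integer, clamping out-of-range values."""
    return max(INT16_MIN, min(INT16_MAX, int(x)))


# Hypothetical readings, chosen only to land on either side of the limit.
small_reading = 20_000.0
large_reading = 40_000.0

print(to_int16_checked(small_reading))
try:
    to_int16_checked(large_reading)
except OverflowError as exc:
    # With no handler at this point, the exception would keep propagating.
    print("overflow:", exc)
print(to_int16_saturating(large_reading))
```

The checked version behaves like the conversion in the story: it works for as long as the inputs stay small, then raises once they do not. The saturating version is one possible alternative, not a claim about what the Ariane engineers should have done; whether clamping is safe depends entirely on how the result is used.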

For more floating point mishaps:

I’m sure many of you, like me, check hand calculations with a calculator or computer because we don’t trust ourselves and assume a computer always gives the right answer. Well, not always, as one mathematics professor found out (coincidentally, he was working with reciprocals of prime numbers, which is what the post below mine is about): http://www.willamette.edu/~mjaneba/pentprob.html


