On the Unreliability of Programs
A taxonomy of ways in which even the simplest equation in a computer program can fail in the real world, demonstrating how difficult it is to approach perfection or perfect epistemic reliability.
I have never written an equation or line of code that I was 100% confident of, or which I thought had less than a 1-in-trillions chance of being wrong in some important way. Software & real-world systems are too complex & fragile.
Every part of my understanding, the hardware, or the real-world context is less reliable than 1-in-trillions.
Let’s consider potential problems with our understanding of even the most trivial-seeming arithmetic comparison checking that ‘x + x = 2x’…
Numbers that fool the Fermat primality test are called Carmichael numbers, and little is known about them other than that they are extremely rare. There are 255 Carmichael numbers below 100,000,000…In testing primality of very large numbers chosen at random, the chance of stumbling upon a value that fools the Fermat test is less than the chance that cosmic radiation will cause the computer to make an error in carrying out a ‘correct’ algorithm.
Considering an algorithm to be inadequate for the first reason but not for the second illustrates the difference between mathematics and engineering.
—Hal Abelson & Gerald Sussman (Structure And Interpretation of Computer Programs)
Consider a simple-seeming line of conditional code for the arithmetical tautology: `x + x == 2*x`. How could this possibly ever go wrong? Well…
Where did you initialize x? Was it ever initialized to a non-null value?
(Or has it been working accidentally because it uses uninitialized memory which just happened to have a workable value?)
Is this comparison by reference, equality, hash, or some other way entirely?
Which integer type is this? Does that integer type overflow?
In some languages, x might be a string being parsed as a number.
JavaScript is infamous for this due to its type coercion and redefining operators; this will evaluate to true: `x = "1"; 2*x == 2 && x + x == "11"`.
In highly dynamic or object-oriented languages, `+`, `==`, and `*` could all have been redefined per `x` and mean… just about anything, and do anything as side-effects of methods like getters.
Homoglyph attack: ‘x’ might not be ‘х’… because the second one is actually a different letter (CYRILLIC SMALL LETTER HA).
“Trojan Source” attack: OK, you are sure ‘x’ is in fact ‘x’; but are you reading the right ‘x’? What you think is a line of source code might actually be a commented out line, and the comment is the source code, due to Unicode right-to-left/bidirectional characters
Does multiplying integers like this potentially trigger undefined behavior and arbitrary compiler ‘optimizations’?
If it can never overflow because it’s a “big int” with arbitrary-precision arithmetic, how much RAM does this allocate? What happens if the result is larger than fits in RAM? (How would evaluation order like laziness affect this?)
How much do you know about your big-integer library to begin with? (They do have bugs, like all software.)
If this is floating point (do you know for sure?), equality is treacherous: this particular identity happens to hold exactly under IEEE 754 (both sides are an exact doubling), but near-identical comparisons like `0.1 + 0.2 == 0.3` are routinely false.
What about floating point rounding or other exotic modes?
Or multiplying special values like NaN or +Inf vs −Inf?
If you know about all this and really did want that… how sure are you that the compiler isn’t incorrectly equationally-reasoning that they are equal and rewriting them behind your back?
Did the compiler optimize away this check entirely because “obviously” it is true?
What is the operator precedence of this code?
By the way, are you sure it’s a conditional at all? Perhaps it was parsed as `(x + x == 2) * x`?
Preprocessor/macro-expansion problems: in C, `x` might be a macro that expands to an expression with side effects, evaluated a different number of times on each side. Or `2*x` might lack parenthesizing: `#define x 1+1` makes `2*x` evaluate to 3.
What is the evaluation order of this code?
This is serial-threaded code, right? No parallelism anywhere? If there is…
Signal handlers can interrupt your code and change stuff. So your code is de facto concurrent anyway. Oh, but you solved that? So there’s no more parallelism.
Trick question: you thought there wasn’t, but there was anyway because all systems are inherently parallel now. So there are dangers around cache coherency & races, leading to many classes of attacks/errors like Spectre.
And x here can change: the direct way of computing it would involve at least 5 values being stored & referenced somewhere. (The 3 written instances of x in the equation, then the sum of two of them, and then the doubled version.)
Are you running the right code at all? Maybe you’re running a cached binary, the build system didn’t rebuild the code, you are in the wrong working directory or the wrong VCS branch…
How likely is your computation to be corrupted or subverted by an attacker doing something like a buffer overflow attack or a row hammer attack, which winds up clobbering your x?
What happens if the computer halts or freezes or is DoSed or the power goes out halfway through the computation?
Why do you believe the hardware will always store & execute everything correctly?
“Neo, you think that’s real hardware you’re running on?” And VMs can be buggy and implement stuff like floating point incorrectly, of course.
What are the odds that the hardware will be hit by a cosmic ray during any of these operations? Even ECC RAM is increasingly unreliable.
Or that your RAM has a permanent fault in it?
(For several years, compiling the Gwern.net website would occasionally result in strange segfaults in apparently correct old stable regexp code; this turned out to be a bad RAM chip… where ordinary computer use simply didn’t stress RAM enough to crash noticeably often.)
What are the odds that the CPU core in question is sometimes unable to add or multiply correctly? (If you’re a hyperscaler, they exist in your fleet of servers somewhere!)
What are the odds you will discover another Pentium FDIV bug?
How do you know all instances of x were never corrupted anywhere during storage or transmission?
I can safely say that in my programming life, I have written many fewer than trillions of lines of code, and I have made many more of these errors than 0.
So I infer that for even the simplest-seeming code, I am unable to write code merely as reliable as a 1-in-trillions error rate.
(See also: “Hacking Smartphone ESP Apps”, “Pinball High Scores”, “On Seeing Through and Unseeing: The Hacker Mindset”.)