Two kinds of north
Some bugs are invisible from inside the code. Every line is correct, every test passes, the program does exactly what it says. It is just wrong about the world, and no one on the inside can tell.
I shipped one of these for weeks without knowing. On the scope, a holding pattern is a racetrack, and it should sit square on the leg it is built around. Mine, at Mexico City (MMMX), were drawn about five degrees crooked. The code that placed them was right. I read it line by line and found nothing to fix, because there was nothing wrong with it.
What it was wrong about was north. The holds I drew at New York were crooked too, but by more, closer to thirteen degrees than five. A bug in the geometry would be off by the same amount everywhere. This one changed size with the airport, and only one thing does that: the magnetic variation, the local gap between true north, which the map is drawn around, and magnetic north, which the compass and every controller actually use. Five degrees of it at Mexico City, thirteen at New York. Somewhere a heading had crossed from one frame into the other without being converted.
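The conversion that went missing is small enough to sketch. What follows is a minimal illustration in Python, not the simulator's code. The function names are made up, the east-positive sign convention is an assumption, and the variation values are only the rough magnitudes mentioned above, not surveyed figures.

```python
# Sketch only: names, sign convention, and values are illustrative assumptions.

# Rough magnitudes from the text. East-positive is an assumed convention:
# easterly variation is positive, westerly variation is negative.
VARIATION_EAST_DEG = {
    "Mexico City": 5.0,   # roughly five degrees east
    "New York": -13.0,    # roughly thirteen degrees west
}


def true_to_magnetic(true_deg: float, variation_east_deg: float) -> float:
    """Convert a true heading to magnetic by subtracting easterly variation."""
    return (true_deg - variation_east_deg) % 360.0


def magnetic_to_true(magnetic_deg: float, variation_east_deg: float) -> float:
    """Convert a magnetic heading to true by adding easterly variation."""
    return (magnetic_deg + variation_east_deg) % 360.0


def signed_error(drawn_deg: float, intended_deg: float) -> float:
    """Smallest signed angle from intended to drawn, in [-180, 180)."""
    return (drawn_deg - intended_deg + 180.0) % 360.0 - 180.0


def unconverted_error(airport: str, magnetic_course_deg: float) -> float:
    """Error produced when a magnetic course is drawn as if it were true."""
    correct = magnetic_to_true(magnetic_course_deg, VARIATION_EAST_DEG[airport])
    drawn = magnetic_course_deg  # the bug: the frame change is skipped
    return signed_error(drawn, correct)
```

With these assumed values, unconverted_error("Mexico City", 45.0) is -5.0 and unconverted_error("New York", 45.0) is 13.0. The error is not a constant. It is the local variation, which is the signature that separates a frame mix-up from a geometry bug.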
There was no test I thought to write for it. A test checks that the code does what the code says, and the code said exactly what it meant to: take this angle, draw a racetrack here. Every line was internally consistent. The mistake was not in the logic, it was in a number that meant something different from what the screen assumed, and a test only catches the errors you already know to look for. From inside the program, five degrees of wrong is identical to five degrees of right.
My own eyes were no better than a test. I knew the runways were magnetic, I knew an airplane pointed up the screen read close to zero, and that matched, so I moved on. I was checking that the feature worked, that the lines drew and the numbers updated, and it all did. None of those checks could have shown me the angle was off.
It was found by the one person positioned to see it. My dad worked Mexico City approach for 41 years. Early on, the heading lines for a direct-to clearance had the same flaw, drawn in true instead of magnetic, and they stayed wrong for two weeks while I did not notice. He noticed in a minute. He gave an airplane a turn onto runway 05R at Mexico City, read the heading the system handed him, and knew it was wrong before he checked it against anything, the same way he caught the Mach numbers. “Tus rumbos están mal. ¿Qué norte estás usando?” Your headings are wrong. Which north are you using?
He was not checking. He was reading. To him the number on the screen was not a value to confirm, it was a heading an airplane would fly, and the wrong one looked as wrong as a misspelled word. That is a validation I cannot run on my own code, and no test framework can run it either, because it lives in knowing the airspace, not in knowing the program.
This is the thing I did not expect about building a simulator. Whether it works is a question about the code, and I can answer it myself. Whether it is right is a question about the world, and I can sometimes reason my way there too, slowly and by luck, the way I backed into the holds by noticing New York. The hard part was never answering the question. It was knowing to ask it.
None of this is an argument against tests. The fix came with one, and it fails the day I make the mistake again. But a test only guards against a mistake you already know is possible, and I knew this one was possible only because a retired controller testing on an iPad saw it on the screen. He is not a substitute for the test. He is where the test comes from.
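A regression test of that kind might look something like the sketch below. It is a guess at the shape, not the test that shipped. The helper screen_angle_for_course, the variation table, and the east-positive sign convention are all hypothetical, and the values are only the rough magnitudes from the story.

```python
# Hypothetical regression test. The helper, table, and values are illustrative.

# Assumed convention: easterly variation positive, westerly negative.
VARIATION_EAST_DEG = {"Mexico City": 5.0, "New York": -13.0}


def screen_angle_for_course(magnetic_course_deg: float, airport: str) -> float:
    """Hypothetical drawing helper: magnetic course in, true-north screen angle out."""
    return (magnetic_course_deg + VARIATION_EAST_DEG[airport]) % 360.0


def test_hold_course_is_converted_to_true() -> None:
    """Fails if a magnetic course ever reaches the screen unconverted."""
    course = 45.0
    for airport, variation in VARIATION_EAST_DEG.items():
        angle = screen_angle_for_course(course, airport)
        # Signed offset between what is drawn and what the controller said.
        offset = (angle - course + 180.0) % 360.0 - 180.0
        # Two airports with different variation: neither a zero offset
        # nor any single constant offset can satisfy both.
        assert offset == variation, f"{airport}: offset {offset}, expected {variation}"
```

Checking two airports is the point of the design. A test at Mexico City alone would pass for any code that happened to add five degrees everywhere, which is exactly the kind of geometry bug this was not.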