Flipping Bits and Twisting Knobs

For four days, the dialog box that told me I had a bug in my program said the following:

Error -1

It took me four days to figure out what the problem was: days of dinking around with registry files and GUIDs, days of groping through poor, buggy documentation and horrible sample code, days of scouring the nets for even an inkling of what was wrong. I was calling into someone else’s API, and some tiny thing somewhere wasn’t lined up quite right, and it sucked. All I could do was change stuff, flipping bits and twisting random knobs until something changed, and use the scientific method to narrow down the cause.

This is the worst kind of bug. I can deal with the race conditions, the obscure memory stompers, the uninitialized variables that cause random behaviour. But dealing with a flaky, fragile API that gives you no clue (other than a dumb dialog box that says “-1”) makes me mad and makes me want to break things.

The best bug I ever found was in the Newton, a race condition that showed up only every few days of heavy operation. It turned out to be a one-instruction window in the kernel; if you hit it just right, it would freeze a critical part of the whole operating system. It took two instructions to fix it. It took two weeks to find it, and when I closed that bug I was walking on air.

Finding the “Error -1” bug didn’t make me feel very good. While it was good to move on from it, the whole experience would have been far less painful if someone had thought to return a meaningful failure value, something other than -1. [You also do not throw up an error dialog from a purely function-call API, but this is perilously close to ranting now.]

Even if you’re on an embedded system, where bytes are precious and you can’t store error messages, or can’t be bothered to invent them, you can return unique values. One version of Tiny Basic I saw (back in the late 70s; it was published in an early edition of Dr. Dobb’s) returned code offsets for its error values. At least you knew that error 1434 meant “Missing semicolon” while error 1542 meant “Missing gosub target,” and even if the numbers weren’t in a nice, pretty sequence, you could suss out what was going wrong pretty quickly.
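
Here is a rough sketch of that idea in Python. Everything in it is invented for illustration: the function name check_line, the two rules it checks, and the way it parses a line are assumptions, not how that Tiny Basic worked. The only point is that each failure site hands back its own number, so the caller can tell them apart without a message table.

```python
# Hypothetical codes: any distinct numbers will do, pretty sequence or not.
OK = 0
ERR_MISSING_SEMICOLON = 1434
ERR_MISSING_GOSUB_TARGET = 1542


def check_line(line, known_targets):
    """Return OK, or a code that identifies exactly which check failed."""
    stmt = line.strip()
    if not stmt.endswith(";"):
        return ERR_MISSING_SEMICOLON  # a distinct value, not -1
    words = stmt[:-1].split()
    if words and words[0].lower() == "gosub":
        # A gosub needs a target, and the target has to exist.
        if len(words) < 2 or words[1] not in known_targets:
            return ERR_MISSING_GOSUB_TARGET  # also not -1
    return OK
```

A caller who gets 1542 back can search the source for that one number and land on the exact check that failed, which is about all the Tiny Basic trick was offering.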

Really, don’t write stuff that returns -1, and don’t write functions that fail with a boolean false or that throw a mere Exception.
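
For the exception case, a minimal sketch of what I mean, again in Python. The device API here is made up: open_device, DeviceOpenError, and the REGISTERED table are hypothetical names standing in for whatever your library actually wraps. The failure carries which thing failed and why, instead of a bare Exception or a False.

```python
class DeviceOpenError(Exception):
    """Hypothetical error for a made-up device API; says what went wrong."""

    def __init__(self, device_id, reason):
        super().__init__(f"cannot open device {device_id!r}: {reason}")
        self.device_id = device_id
        self.reason = reason


# Stand-in for real device state (an assumption for this example).
REGISTERED = {"cam0": "ready", "cam1": "busy"}


def open_device(device_id):
    """Return a handle string, or raise DeviceOpenError saying why not."""
    state = REGISTERED.get(device_id)
    if state is None:
        raise DeviceOpenError(device_id, "not registered")
    if state != "ready":
        raise DeviceOpenError(device_id, f"device is {state}")
    return f"handle:{device_id}"
```

Calling open_device("cam9") fails with a message that names the device and the reason, and a caller can catch DeviceOpenError specifically and inspect its reason attribute rather than guessing.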


Author: landon

My mom thinks I'm in high tech.