One of the things that seems to limit deployment of new technology is our ability to debug it during bringup. I’ve seen a number of projects fail because they didn’t plan for the inevitable “we’re fixing bugs” stage of development. The lack of planning goes like this: you’re going to write a new operating system on top of a new hardware platform, but management is too cheap to invest in logic analyzers for the developers; six months later things are slipping week by week as new bugs crop up, and the fix rate sucks (whereupon you ship something flaky, and the market buries you). Another classic losing strategy is to make a system that is impervious to debugging. I once heard “What’s wrong with printf? Your download time is only 45 minutes” from a chip designer; to be fair, his turnaround time was weeks or months, but he had simulation resources that the software guys didn’t.
Why is this so bloody hard? We keep making mistakes, and this isn’t going to stop happening. Program proving and ridiculously strong typing and so forth aside, we’ll continue to blow it in one way or another, and the more tools we bring to bear, the more clever our bugs are going to be. Sometimes the mistakes are great mistakes; ask any old fart about the time it took two weeks to find a problem that was a one-line fix. These stories are true.
Things I have used in debugging: Debuggers, when available. No debugger, when the available debuggers have been worse than the disease (MPW, augh). The venerable printf, or its equivalent. Logs of all kinds. Whole machine state that you get from a machine through a serial port (tap foot for several hours). Nano logs written to memory (to catch race conditions). Talking the problem out with a cow-orker. Flashing LEDs. Speaker beeps. Floppy disk track seeks. An AM radio held next to the machine. A photocopy machine (to get a permanent record of a register dump from the face of a hand-held machine). Armies of borrowed machines (hoping to hit a rare timing condition by enlisting the masses). Staring at code for hours. Cans of coolant spray. Heat guns. Logic analyzers. Shipping anyway (sigh).
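The nano-log trick deserves a sketch, since it’s the one item in that list people most often reinvent badly. The idea: write tiny fixed-size event records into a memory ring, cheaply enough that the logging itself doesn’t perturb the race you’re chasing, then dump the buffer from a debugger (or a register dump, or a photocopier) after the crash. A minimal version in C; all the names here (`nanolog_write`, the slot count) are mine, not from any particular system:

```c
#include <stdint.h>

/* A "nano log": a fixed ring of tiny event records written straight
   to memory. Cheap enough to call from interrupt handlers or inside
   suspected race windows. Illustrative sketch only; a real one on a
   multiprocessor would claim slots with an atomic increment. */

#define NANOLOG_SLOTS 256   /* power of two, so the index wraps with a mask */

struct nanolog_entry {
    uint32_t tag;           /* what happened (an event code) */
    uint32_t value;         /* a detail: pointer, state, line number */
};

static struct nanolog_entry nanolog[NANOLOG_SLOTS];
static volatile uint32_t nanolog_head;   /* total events ever written */

static void nanolog_write(uint32_t tag, uint32_t value)
{
    uint32_t i = nanolog_head++ & (NANOLOG_SLOTS - 1);
    nanolog[i].tag = tag;
    nanolog[i].value = value;
}
```

Sprinkle `nanolog_write()` calls around the suspect code paths; after the machine wedges, the last few entries (working backward from `nanolog_head`) tell you the order things actually happened in, which is exactly what printf destroys by being slow.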
One of the best ways to learn debugging is to sit right next to someone who’s doing it and get a running commentary on what’s going on. I learned a lot from the Mac guys; they’d grab an armful of ROM listings and start merrily tracing away into the operating system. I never truly appreciated Macsbug until I saw someone use “magic return”; then the lights went on. The Windows kernel debuggers are similar in nature: user interfaces for what’s going on at a very low level. Source-level debuggers usually lie to you (well, that’s why they exist), and when you need to understand exactly what’s going on, you need something geeky. I don’t know what operating systems we’ll be running fifty years from now, but there will always be a kernel debugger with a bunch of two- or three-character commands, and a geek culture around them.
I’ve been doing some reading on the Itanium architecture. It’s a superscalar VLIW-in-your-face machine, with lots of the memory system and caching details exposed to the application-level programmer (e.g., you can issue speculative loads, and then use them or ignore them later). The runtime system looks insanely complex; this is not your daddy’s vanilla flat-address-space-with-lots-of-registers RISC processor. My guess is that if Itanium falls flat on its face in the market, one of the reasons is going to be that only a few thousand people truly understand it and can diagnose and fix low-level problems. This doesn’t exactly help new hardware development.
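Control speculation is a hardware feature, but the pattern it hands to the compiler can be sketched in plain C. On Itanium, a speculative load (`ld.s`) can be hoisted above the branch that guards it; if the load would have faulted, the fault is deferred into a poison bit on the destination register, and a later check (`chk.s`) branches to recovery code only if the value is actually needed and was poisoned. C can’t defer faults, so this sketch (names and guard logic are mine, purely illustrative) substitutes an explicit null check for the hardware’s deferral:

```c
#include <stddef.h>

/* Rough C analogue of IA-64 control speculation. The compiler hoists
   a load above its guarding branch to hide memory latency, then
   verifies before the value is used. Real hardware does the load
   unconditionally (ld.s) and poisons the register on a fault; chk.s
   detects the poison and jumps to recovery. Plain C must guard the
   hoisted load by hand, which is exactly the cost speculation removes. */
int guarded_read(const int *p, int have_data)
{
    /* "Speculative" load, issued before we know whether it's wanted. */
    int speculated = (p != NULL) ? *p : 0;

    if (have_data && p != NULL) {
        /* This is where chk.s would validate the speculation. */
        return speculated;
    }
    return -1;   /* guard failed: the speculative result is simply ignored */
}
```

The point of the example is the complexity argument in the paragraph above: every such decision (speculate or not, recover how) lands on the compiler and on whoever has to debug the generated code, which is why so few people can diagnose low-level problems on the machine.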
Invest in debuggers. You’re going to need them anyway.