Naming

Digital Research Inc (yes, the CP/M folks) had a debugger once. The author of the debugger was a linguist, and thus all of the identifier names were in Russian. There were new control structures invented out of macros (things like unless and reprise and whenever, all poorly armored against side effects). There were clever constructs, such as memory moves that deliberately overwrote each other in order to accomplish fills (the author was proud of that bit, I believe, and crowed about it in the comments). There was a bunch of fluff code (some C programmers feel compelled to reinvent the standard C library with every new project), and a fair amount of stuff that was just random and bad. But worst of all were the names. Russian wasn’t the author’s native language; hallway rumors had it that he’d written other programs in assorted Scandinavian and European languages. Anyone wading into the code was going to have a tough time. And sure enough, that debugger was buggy as hell.
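
For the record, macro “control structures” like those fail in predictable ways. Here’s a hypothetical sketch in C; the names unless and reprise come from the story above, and the bodies are my guesses at the usual sins: no parentheses around the argument, and arguments that get evaluated more than once.

    /* Hypothetical reconstruction -- the real DRI macros are lost to history.
       These show the two classic failure modes. */
    #include <stdio.h>

    /* No parentheses around the argument, so precedence bites you. */
    #define unless(cond) if (!cond)

    /* The count is re-evaluated on every pass, so side effects multiply. */
    #define reprise(n) for (int rep_i = 0; rep_i < n; rep_i++)

    int main(void)
    {
        int a = 1, b = 2;

        unless (a & b)          /* expands to if (!a & b): not what you meant */
            printf("a and b share no bits\n");

        int passes = 3;
        reprise (passes--)      /* passes shrinks while the loop tests it */
            printf("pass (passes is now %d)\n", passes);

        return 0;
    }

Write these things carefully (parenthesize everything, evaluate arguments once) and they’re merely confusing; write them the way they usually get written and you’ve invented new keywords with bugs in them.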

I guess it was cute. I’m sure it was a disaster. DRI isn’t around any more, and part of the reason (I believe) was the culture that allowed this kind of bullshit to happen.

Names are important. I and J and K are probably okay for loop control variables (though I doubt it), but as globals they are right out. The declarations of things like Klepesta and Barbados had better have damned good comments. I once called something DefineGuidRightFuckingNowDammit and it got the message across. It also had a comment next to it explaining the name (and the time of day, which was like 2 AM just before a release).

Tradition has it that if you know something’s true name, you have power over it. I’m guessing that quite a few programmers don’t know the true names of the things they are manipulating, and are correspondingly powerless when it comes to figuring out what the code does.

Names get even more critical in object-oriented design. Experienced designers know that grand names like Entity or Process are so overloaded as to be meaningless. One of the best tools in a designer’s bag of tricks is a good thesaurus (I’m partial to Roget’s). Often you need a small pantheon of somewhat-related names, and this is why I believe that good designers are also compulsive readers. Command of language is hard to overestimate.

And if I see another set of variables named ii and kk I am going to scream.

Land Grabs

Keywords are bad, yes? Good languages are defined with a minimum of keywords. Bad languages (e.g., early relics like COBOL) grew up before the age of such minimalism, and are shot through with keywords.

Food for thought: Every time you use #define you’re defining a new keyword. Not so cute any more.

For years (and this may still persist) the Macintosh had a "#define nil 0" in the core headers, and all of the sample code used it; in other words, Apple had added the keyword nil to all Macintosh programs written in C. The values of true, TRUE, false and FALSE are so varied and re-re-re-defined that dependencies on them are one of the first things to clean up when merging one body of code into another. The number of programs that mis-define NULL is astounding. The variations on “a typedef for a 32-bit integer” seem nearly infinite, and are sometimes frighteningly wrong.
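
Here’s a contrived sketch of the merge problem; the names and values are invented, but the pattern is real. Two bodies of code, each with its own idea of truth and of “a 32-bit integer,” suddenly living in one program:

    /* Invented names, typical collision. */
    #include <stdio.h>
    #include <stdint.h>    /* the modern fix: int32_t really is 32 bits */

    /* what vendor A's header thinks */
    #define A_TRUE 1
    typedef long a_int32;  /* 32 bits on the old compiler, 64 on an LP64 system */

    /* what vendor B's header thinks */
    #define B_TRUE (-1)    /* some BASIC refugee's idea of truth */

    int main(void)
    {
        printf("sizeof(a_int32) = %zu (they hoped for 4)\n", sizeof(a_int32));
        printf("sizeof(int32_t) = %zu\n", sizeof(int32_t));
        printf("A_TRUE == B_TRUE? %s\n", (A_TRUE == B_TRUE) ? "yes" : "no");
        return 0;
    }

On a 64-bit Unix box the first line prints 8, and the two TRUEs have never been equal; now imagine the code that compares a flag against the wrong header’s TRUE.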

Rules of Thumb

If you think it’s cute, it probably won’t be in the morning.

There are three kinds of cleverness: Smart, Coyote and Stupid. Smart-clever is great fun and will earn you points in the afterlife. Coyote-clever will grate on people’s nerves, but they’ll respect you for it. Stupid-clever will get you dumped off of a tall building, and people will pee on your grave.

Rewrite dodgy code as early as possible. Small bad decisions grow into large bad decisions, and they are more easily corrected before they metastasize or solidify.

The right forum for obfuscation is the International Obfuscated C Code Contest.

Style

You’re buying a house, and there are two prospects that different builders have asked you to look at. You want to make sure that both houses are up to code, are livable, and that you’ll be able to make repairs when the inevitable maintenance issues come up.

House A is pretty typical; the walls are in great shape, the rooms flow well into each other, the paint is evenly applied, and the light switches are in places you expect them to be. The appliances all seem to work. The builder has all the construction records, and when you ask about why something was done a particular way he either refers you to a section of the building code or gives you a well-considered reason.

The builder of house B had different ideas. The wallboard is not well taped, there are holes in walls, there are many patches of missed paint, and there is no moulding anywhere. The rooms are oddly placed (two bedrooms open off the kitchen, and the kitchen only has a door to the outside). All the light switches are located in a closet near the front door (“for convenience,” says the developer). The washer-dryer is in the kitchen, the refrigerator is in the master bedroom, and you think that the dishwasher might be accessible through the crawlspace. The plans for the house exist as crabbed scribbles on a Big Chief pad (“somewhere, where did I put that?”), and when you ask why something was done a certain way you get an answer along the lines of a shrug and an “I don’t know,” or worse, “It doesn’t matter, it works, doesn’t it?”

Right.

“But brace style doesn’t matter,” protested the engineer. “You guys all said that at the beginning of the project.”

And so we did. But “Dennis Ritchie can get away with it, but you can’t” is what I should have said in the last really bad code review I did. The train wreck had started months earlier, when the engineer in question had refused to write his components in the same language that the rest of the team was using. This is a Bad Thing — really a management issue — and it got much, much worse. (One of the worst situations you can be in is to discover a very buggy piece of code just before you ship. Your choice is to spend N hours fixing it or the same N hours rewriting it, and management will never risk that much new code that late in the game, so you’re stuck with it: you get to spend probably N hours fixing it, several more multiples of N maintaining it over the next several releases, and then you might arrange a rewrite. In your spare time.)

Of course, you had an inkling that there would be trouble, because the code looked “kind of messy.” Listen to the alarm bells; this is your early warning system kicking in.

Honestly, there are levels at which it doesn’t matter how code is formatted. The compiler certainly doesn’t care. Seasoned professionals can go into anything: random brace style, goofball indentation, a wacky circus of macros, you name it and we can deal with it. That doesn’t mean that we like it, or that it should be permitted to exist. Function follows formatting; as a rule of thumb, messy-looking code has the most problems, is the most likely to be coded to brittle assumptions, and is the most difficult to modify when (not if) problems show up.

It’s embarrassing, but a frequent answer to the bitter question “Who wrote this crap?” is “You did.” Meaning that even code you designed and wrote carefully yourself will be opaque and pretty junky six months later. At a style level, you are not coding for you. You are coding for the next poor schmuck to come along and fix your bugs. Have pity; that poor schmuck is likely to be you, and if it’s not . . . well, leaving a decent legacy behind you is always good practice.

Engineers get bug-eyed and start breathing rapidly when I start ticking off items like putting spaces after commas and around operators, having a consistent naming style for variables, giving each variable its own separate declaration, and so on. The minutiae of coding style at this level seem to be beneath people, especially (it seems) the junior engineers. They bristle at this kind of treatment. But if you look at the really good coders, the ones who pound out thousands of lines of really good stuff a week, you’ll see that they do pay attention to this kind of thing (and many, many other details, like decent unit tests and documentation, and things that are way above style).
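
To make the minutiae concrete, here is the same trivial function twice. This is a sketch of the kind of thing I mean, not anybody’s official style guide.

    /* The "it works, doesn't it?" version: */
    int sum(int*a,int n){int i,s;for(i=0,s=0;i<n;i++)s+=a[i];return s;}

    /* The boring version: spaces after commas and around operators,
       one declaration per line, one statement per line. */
    int sum_readable(const int *a, int n)
    {
        int total = 0;

        for (int i = 0; i < n; i++) {
            total += a[i];
        }

        return total;
    }

The compiler emits the same code for both. Only one of them is going to be pleasant to step through at 2 AM.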

Style is where things start; would you subscribe to a magazine with misspelled words and badly formatted paragraphs? What do errors or carelessness at the lowest levels imply about the higher-level content?

Like I said, if you’re K or R, you can write any way you want. But if you’re a couple of years out of school and God’s gift to the profession of programming, style is not beneath you, and it’s worth taking a look at what the experts are doing.

Next: A diatribe on the naming of names.

Unused Hammers

I have a handful of favorite programming languages that I’ll never ship a product in. For one reason or another, perfectly good programming languages — or ones that are nearly perfect, except for a single tragically fatal flaw — will have to remain on the bleachers while the old standbys like C and C++ trudge up and down the field. It may not be a glorious way to win games, but the old guard works pretty well, most of the time. Still, it’s nice to dream about not having to worry about the usual set of bugs.

Scheme is probably #1 on my list of languages that I’ll never do anything real in. It’s a great little language, well described, easy to understand, relatively easy to implement, and great for writing abstractions. Modern implementations of it have pretty reasonable performance. But not many people like it, which makes using it on teams difficult, and the I/O system is non-standard enough that portability is a lost cause. Mistrust of garbage collection is fairly uncommon these days, thanks to the popularity of Java. [Though full-tilt blind faith in GC is disturbing for other reasons]. The most common reason why people dislike it? “All those parentheses.” I think they’re glorious and (with a decent editor) quite liberating, but your average unenlightened programmer doesn’t. A lexical quibble like that is a crazy reason not to use a really neat, simple set of tools. Once you’ve really worked with programs-as-data, you don’t want to go back, especially after you’ve contemplated getting that 500+ term C++ grammar working again (kind of) on your source tree, after you’ve noodled it to work with your compiler’s non-standard extensions….

[It’s certainly possible for lexical messiness to cripple a language. Compare C++ and Perl, complete train wrecks of punctuation, with Pascal or Lisp. Unbelievable.]

Smalltalk is another great system that is going to have to remain on the shelf. First and foremost, no one uses it for serious work. Then, what’s a Smalltalk application? Is it the whole image? Is it just the deltas that you typed? Figuring out the boundary of the app vs. the rest of the system can be tricky. (It’s possible that it doesn’t matter, that most Smalltalk apps are just an image, and that it’s okay. It’s also possible that the extractors do work well these days — that wasn’t my understanding a decade ago.)

People that I respect keep telling me to look at the latest version of Python. I’ll do that someday soon (though the idea of indentation-as-scope still gives me the heebie-jeebies; chalk it up to emotional scarring caused by Occam).

I don’t know what else; maybe O’Caml would be good to play with. And I’ve got this BCPL compiler hanging around somewhere; it’s nice to know that everything really is an integer underneath it all… 🙂

Planning for Debugging

One of the things that seems to limit the deployment of new technology is our ability to debug it during bring-up. I’ve seen a number of projects fail because they didn’t plan for the inevitable “we’re fixing bugs” stage of development. Not planning, as in: you’re going to write a new operating system on top of a new hardware platform, but management is too cheap to invest in logic analyzers for the developers; six months later things are slipping week by week as new bugs crop up, and the fix rate sucks (whereupon you ship something flaky, and the market buries you). Another classic losing strategy is to make a system that is impervious to debugging (I once heard “What’s wrong with printf? Your download time is only 45 minutes” from a chip designer — to be fair, his turnaround time was weeks or months, but he had simulation resources that the software guys didn’t).

Why is this so bloody hard? We keep making mistakes, and this isn’t going to stop happening. Program proving and ridiculously strong typing and so forth aside, we’ll continue to blow it in one way or another, and the more tools we bring to bear, the more clever our bugs are going to be. Sometimes the mistakes are great mistakes — ask any old fart about the time it took two weeks to find a problem that was a one-line fix. These stories are true.

Things I have used in debugging: Debuggers, when available. No debugger, when the available debuggers have been worse than the disease (MPW, augh). The venerable printf, or its equivalent. Logs of all kinds. Whole machine state that you get from a machine through a serial port (tap foot for several hours). Nano logs written to memory (to catch race conditions). Talking the problem out with a cow-orker. Flashing LEDs. Speaker beeps. Floppy disk track seeks. An AM radio held next to the machine. A photocopy machine (to get a permanent record of a register dump from the face of a hand-held machine). Armies of borrowed machines (hoping to hit a rare timing condition by enlisting the masses). Staring at code for hours. Cans of coolant spray. Heat guns. Logic analyzers. Shipping anyway (sigh).
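
The “nano logs written to memory” trick deserves a sketch. This is a minimal version, assuming a compiler with C11 atomics; the names (nanolog_put and friends) are made up, and on a real embedded target you’d tune it for whatever atomicity the hardware actually gives you.

    #include <stdatomic.h>
    #include <stdint.h>

    #define NANOLOG_SIZE 1024           /* power of two, so masking is cheap */

    struct nanolog_entry {
        uint32_t tag;                   /* where: a hand-picked event code */
        uint32_t value;                 /* what: whatever you need to see  */
    };

    static struct nanolog_entry nanolog[NANOLOG_SIZE];
    static atomic_uint nanolog_index;

    /* Cheap enough to drop into interrupt handlers and hot paths. */
    static void nanolog_put(uint32_t tag, uint32_t value)
    {
        unsigned i = atomic_fetch_add_explicit(&nanolog_index, 1,
                                               memory_order_relaxed);
        nanolog[i & (NANOLOG_SIZE - 1)].tag = tag;
        nanolog[i & (NANOLOG_SIZE - 1)].value = value;
    }

    int main(void)
    {
        nanolog_put(1, 42);             /* e.g. "entered the ISR", some state */
        return 0;                       /* read the buffer back post-mortem   */
    }

You inspect or dump the buffer after the crash; the point is that stamping two words into RAM doesn’t perturb the timing the way printf does.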

One of the best ways to learn debugging is to sit right next to someone who’s doing it and get a running commentary on what’s going on. I learned a lot from the Mac guys; they’d grab an armful of ROM listings and start merrily tracing away into the operating system. I never truly appreciated MacsBug until I saw someone use “magic return”; then the lights went on. The Windows kernel debuggers are similar in nature — these are user interfaces for what’s going on at a very low level. Source-level debuggers usually lie to you (well, that’s why they exist), and when you need to understand exactly what’s going on, you need something geeky. I don’t know what operating systems we’ll be running fifty years from now, but there will always be a kernel debugger with a bunch of two or three character commands, and a geek culture around them.

I’ve been doing some reading on the Itanium architecture. It’s a superscalar VLIW-in-your-face machine, with lots of the memory system and caching details exposed to the application-level programmer (e.g., you can issue speculative loads, and then use them or ignore them later). The runtime system looks insanely complex; this is not your daddy’s vanilla flat-address-space-with-lots-of-registers RISC processor. My guess is that if Itanium falls flat on its face in the market, one of the reasons is going to be that only a few thousand people truly understand it and can diagnose and fix low-level problems. This doesn’t exactly help new hardware development.

Invest in debuggers. You’re going to need them anyway.

Spam 20 years from now

Future #1

“Oh my God, I just got some spam!”

“Jeez, this has been a bad month. That’s like, four so far?”

“Three.”

“Still pretty bad.”

“Let’s see if . . . yes, the Visa cops have already arrested him, see the video?”

“Wow, is that guy in trouble. Look, they’re de-chipping him. I know he’s a bastard, but that is pretty severe.”

“My grandpa says that he was getting thousands of spams a day, once.”

“Riiiight.”

Future #2

SCENE: A cold, rainy night in some gritty big-city downtown. The SWAT team is preparing in their van.

CAPTAIN: “Remember, MasterCard is interested in results here. The people with keyboards and voice input are to be taken out first. Higgs, you neutralize the servers. Boson, I want you to nail that router, understood?”

ALL: “Yessir!”

Future #3

“Remember e-mail?”

“What?”

Books again

More Charles Stross: His most recent, The Atrocity Archives, is a cross between H.P. Lovecraft and Neal Stephenson, reminiscent of the “magic is really technology” (or at least, rationally explainable) theme of Heinlein’s Magic, Inc. and Poul Anderson’s Operation Chaos. Offbeat and inventive, Stross’s references to semi-obscure computing history (e.g., Symbolics Lisp Machines) and the theory of computability make great computer-geek reading. He’s clearly been in the trenches shipping software, he’s cynical as hell, and he can write.

In his postscript, Stross admits to reading Tim Powers’s Declare after writing _TAA_, which “was a good thing.” Both books deal with the Cold War and magic, though Powers leans (as usual) towards the hand-wavy and unexplained mystical side of things. I guess I just like gearhead space opera.

Stross’ next book, Iron Sunrise, will be published in July. It’s a sequel to Singularity Sky.

Where’s a tree-loving Ent when you really need one? Stephen R. Donaldson is penning another (“last”) chronicle of Thomas Covenant the Unbelieving Whiner. Years ago and sick in bed, I made the mistake of reading the first book in the _TC_ series. I might have read the first three books, but have since blissfully forgotten all but the low points. Namely: A character who never grows up or gets a good attitude, a land of peasants where no one seems to grow food, and a creeping, miasmic evil that honestly doesn’t seem as bad as the heroes. (“Hey, can we vote on this whole good-evil thing? Who decided who was who?”) I guess we can thank our lucky stars that there will only be four books, and they’re going to be the last ones Donaldson ever writes on the subject because the title says so, and writers never lie about that kind of thing.

King’s The Dark Tower VI is also felling trees next month. This is another case of, “I liked the first book a lot, too bad he couldn’t wrap it up in the second.” Why does everyone have the seven book series disease? Did word processors do this to us, make it too easy to vomit words? I have this idea for a printing operation; no software, just hot metal and lots of swearing, hand-written manuscripts that if the dog eats they’re gone, and customers who won’t bitch at the odd mis-spelled word.