FIFO Madness

I’ve never seen a FIFO that worked. Period. Every piece of hardware I’ve had to write a driver for has had a buggy FIFO.

A FIFO, for those of you fortunate enough not to know, is a hardware gizmo that buffers up bytes between a source and a destination. FIFOs are used a lot in situations where you temporarily need to store a few extra bytes because the source and destination data rates don’t exactly match. For instance, disks and network controllers like to “dribble” data back and forth, while memory systems work most efficiently in bursts; this is a clear mismatch, and often you stick a FIFO in the middle to deal with it.

Imagine you’re a software guy and it’s your job to make a disk driver for a new piece of hardware. The first thing to try is to just read a sector from the disk. So you go flipping through the hardware documentation and find that you need to set up a transfer address, a transfer count, a transfer direction, and then an offset adjustment fumbleguzzle, followed by a “Go!” bit, then stand back and wait for the completion gortwibble.

Digging further, you find that the offset adjustment fumbleguzzle is computed by taking the 1’s complement of the modulo-eight-byte transfer size added to the ending transfer address. The completion gortwibble has you confused, until you realize the interrupt arrives through a spare line on the sound chip. Fine. You check your prototype hardware board and verify that you have the right collection of blue, green, yellow and purple-with-black-stripes wires. You’re sittin’ pretty, and it’s just a matter of slamming out the right code.  How hard could it be?

Now, you’re a veteran of several chip wars, grizzled and tough and you eat hardware guys for breakfast, and you don’t believe anything you read in documentation. So you write something simple just to tickle what the hardware docs claim. It’s easy enough: Stuff some hardware registers and slurp sector zero off the drive. Go!

The system hard-locks, and the only way to get it back is to remove the power supply. Huh. So you single step through the code to find out where you blundered, and it works great. Sector zero lands in memory, right where it should, all happy and sparkly and wondering what the fuss is about. Fuck.

Through an afternoon of trial and error, you find out that there has to be a delay of a few instructions between setting the DMA address and setting the transfer count, or else the transfer count is set to “lots” and the DMA engine happily wipes out memory at DMA speeds; when the wavefront of DMA-driven destruction reaches your debugger’s stack, it’s Lights Out. Furthermore, there were lies (lies! imagine that?) told about the completion gortwibble: the interrupt needs to be edge-triggered, not level-sensitive, and level-sensitive is all the cheap-ass sound chip is capable of. This will never work. You’re going to need more fancy-colored wires on that board.

So you follow official channels and send out a memo asking for an ECO (“engineering change order”), there are meetings and public floggings, and you finally get hardware that works, handed to you by a humbled and now very quiet hardware engineer who might, might get his soldering iron back if he’s on good behavior for the next six weeks. Ha ha. No, not really. What actually happens is that you knock on the side of the hardware guy’s office door (these guys get offices, apparently) and explain the situation re gortwibbles and interrupts. You get a blank stare. Okay, maybe you made a stupid mistake. You explain more slowly and enunciate very clearly, waving your hands in wide, slow gestures, pantomiming DMA and register values, just as Americans do in foreign countries when they are asking well-armed locals where the Consulate used to be or maybe where they keep the blue-and-white wires. You feel like an idiot. You feel even more like an idiot when Blank Stare forwards you the email (CC’d to the whole hardware team, sales, marketing and Usenet, but nobody on the software side at all) explaining how the fumbleguzzle and gortwibble registers were designed out months ago and replaced with an integrated, 67-bit-wide cobwolly register, and the interrupt in question doesn’t exist anymore (now, there are six of them to deal with. Erm).

“You’ll get that hardware next week.”

“What do we have now?”

“Those are the Rev B chips. They had lots of problems. Why are you even using those?”

Please note that I haven’t even gotten to the subject of FIFOs yet, and because I’m beating my head against the usual concrete post in the software area I’m not sure I’m going to stay conscious that long.

– – – –

So imagine (just for kicks) that you’ve got the disk system happily transferring bytes back and forth, but then you get reports that occasionally people are seeing some corrupted bytes. “That’s my data,” says one person, “but it’s shifted by one byte here.”

FIFO madness.

A FIFO is like an accordion; it fills up with bytes from the disk, but the memory system isn’t ready yet, so the FIFO has to hang onto them, getting more and more full, until finally the memory asks for “Bytes, and lots of ’em!” and (phew!) the FIFO deflates and starts filling again.

But the memory system is picky, and won’t accept bytes unless they are on a 4-byte boundary, so for unaligned accesses there are wacky start and end conditions. Standard textbook stuff; they cover it in every school’s design course. Every school but the one your hardware guy went to, that is. In Outer Gonzonistan they use a method handed down from The Ancients by generations of Village Elders, involving six bits specifying an arcane rotation-and-mask after mixing in the blood of a software —

“Oh God,” you cry, “save me from this living hell.” Because as you fix one bug involving edge conditions, something else undocumented sticks its prairie-dog ass into the air and hoses you down with poo. Adjust the count for alignment, except you have to make sure things don’t cross a 4K boundary, and let’s not even get started on the DMA engine hitting dirty cache lines or the wicked timing problems involving DRAM refresh.
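For the curious, here is roughly what those “wacky start and end conditions” amount to. This is a minimal Python sketch, not the driver from the story: it assumes the memory side takes only whole 4-byte words at 4-byte-aligned addresses and that no single burst may cross a 4K page, and the function name `split_transfer` and its `(addr, length, aligned)` chunk tuples are hypothetical. It splits one transfer into an unaligned head, word-aligned middle pieces broken at each 4K boundary, and an unaligned tail. Dirty cache lines and DRAM refresh are left as an exercise in suffering.

```python
# Assumed constraints (not from any real datasheet): memory accepts only whole
# 4-byte words at 4-byte-aligned addresses, and a burst may not cross 4K.
WORD = 4
PAGE = 4096


def split_transfer(addr, length):
    """Split [addr, addr + length) into (addr, length, aligned) chunks.

    Unaligned head and tail bytes become separate chunks with aligned=False;
    the middle goes out as whole words, broken at every 4K page boundary.
    """
    chunks = []
    while length > 0:
        if addr % WORD != 0 or length < WORD:
            # Head (or short tail): go no further than the next word boundary.
            piece = min(WORD - addr % WORD, length)
            chunks.append((addr, piece, False))
        else:
            to_page = PAGE - addr % PAGE          # bytes left in this 4K page
            whole_words = length - length % WORD  # leave any ragged tail for later
            piece = min(whole_words, to_page)
            chunks.append((addr, piece, True))
        addr += piece
        length -= piece
    return chunks


# Example: 4105 bytes starting 2 bytes short of a 4K boundary.
for a, n, aligned in split_transfer(0x0FFE, 4105):
    print(hex(a), n, aligned)
# 0xffe 2 False
# 0x1000 4096 True
# 0x2000 4 True
# 0x2004 3 False
```

Every one of those chunk boundaries is a spot where a buggy FIFO could plausibly drop or repeat a byte.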

This is about the time that the hardware manager approaches your own manager and accuses you of not being a team player. “My guys have been designing these FIFOs and catching nothing but flak from your software guys,” he says over his hardware-guy-class matching belly and beard. “And why isn’t that disk driver done yet?”

Your manager explains that the hardware is buggy. This is about the time that the Director of Hardware approaches the Director of Software. “I understand that some of the people on the software team are not being Team Players, and that the software is behind schedule while the hardware people are thumb-twiddling.”

Your own Director says she’ll look into it. Three minutes later you’re pinned against a wall in the parking garage, staring into the barrel of a sawed-off HR violation and stammering reasons why you shouldn’t just be launched into oblivion and write, say, video games for a living.

Ever dealt with a GPU hang?

Some assembly required

A little (very little) more history.

I spent more time at Atari waiting for assemblies to finish than you’d probably believe. I mean, assembly language; how hard can it be? Yet the Assembler/Editor cartridge was famous for its lack of speed, the cross-assemblers on the MV/8000 could take 45 minutes to crunch through 16K of output during loaded hours, and even the CoinOp assemblers on the VAXes were not remarkably fast.

My roommate and I found the Synapse Assembler on the 800 to be incredibly fast. It would process 8K of output in just a few seconds. Combined with a 128K RAMDisk, a parallel cable to a slave Atari 800, and a debugger, you could turn around a piece of code in a couple of quick keystrokes. I wrote a (very) tiny Emacs patch for the SynAssembler, and for a few months we were in fast-turnaround heaven. It almost doesn’t matter what language you’re working in if the turnaround time is quick enough.

I started writing assemblers as a hobby. I hated the slow tools we had and really wanted something better. Pre-tokenization sped things up a lot. I got some other people to actually use my second or third efforts at “really fast” assemblers, and got some good feedback (e.g., when I added a listings output feature, people started taking the assemblers seriously — there’s something about hexadecimal numbers on 132-column fan-fold paper that gives assembly programmers a warm fuzzy feeling).

Things (like the company nearly going belly-up, and the cessation of 6502-based development pretty much everywhere at Atari) intervened, and I didn’t return to that hobby for a couple of years.

The 68000 assembler we used for the Atari ST was really intended to be used as a back-end to a C compiler. It had very few creature comforts; no macros, no includes, no real listings mode or cross-reference generation. Writing assembly in it was moderately painful; doable, but not fun. It was also not very fast.

So I got pissed off at it and wrote MadMac. Mission #1: be a decent tool for writing assembly (macros, etc.), because we were still writing at that level a lot in those days. Mission #2: be fast. So MadMac uses some smart buffering (it tries hard not to copy a string out of the disk buffer unless it has to), uses DFAs to recognize keywords, boils input text down to easily processed tokens as early as possible, and so on. I’m sure it could be faster (just as I’m sure there’s plenty of too-complex premature optimization), but it was pretty good for its time (I remember measuring it at 50,000 lines/minute on an 8 MHz 68000, but it’s possible my memory is exaggerating things).
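MadMac’s actual tables aren’t reproduced here, but the general idea of recognizing keywords with a DFA is easy to sketch. In this minimal Python version (the keyword list and the names `build_dfa` and `match_keyword` are hypothetical), the DFA is a trie built from the keyword list: each input character costs one table lookup, and the scan stops as soon as it falls off the machine, so an identifier never gets compared against every keyword in turn.

```python
def build_dfa(keywords):
    """Build a trie-shaped DFA: transitions[state][char] -> next state.

    Keywords are assumed to be lowercase; matching lowercases the input.
    """
    transitions = [{}]   # state 0 is the start state
    accept = {}          # accepting state -> keyword it recognizes
    for kw in keywords:
        state = 0
        for ch in kw:
            nxt = transitions[state].get(ch)
            if nxt is None:
                nxt = len(transitions)
                transitions.append({})
                transitions[state][ch] = nxt
            state = nxt
        accept[state] = kw
    return transitions, accept


def match_keyword(dfa, word):
    """Run the DFA over word (case-insensitively); return the keyword or None."""
    transitions, accept = dfa
    state = 0
    for ch in word.lower():
        state = transitions[state].get(ch)
        if state is None:
            return None  # fell off the machine: not a keyword (a label, maybe)
    return accept.get(state)  # None if we stopped on a prefix like "mov"


KEYWORDS = ["move", "movem", "moveq", ".rept", ".macro", ".endm"]
dfa = build_dfa(KEYWORDS)
print(match_keyword(dfa, "MOVEM"))  # movem
print(match_keyword(dfa, "mov"))    # None
print(match_keyword(dfa, "loop1"))  # None
```

A real assembler would more likely run the machine directly over the disk buffer and hand back a small integer token instead of a string, which fits the “don’t copy it out of the buffer unless you have to” approach described above.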

But MadMac has a 6502 mode. WTF? Who ever heard of an assembler doing both 68000 and 6502 code generation?

Around the time I was finishing up MadMac, unbeknownst to the ST software group, another group had hired a contractor to do some work on a new development system; I think it was for the 7800 console, but it might have been another project. Some 6502-based thing, anyway. I noticed this guy’s printouts in the machine room and couldn’t resist leafing through them; he had finished the design of a pretty vanilla 6502 assembler and was starting to write code. His partially completed work included pretty much all the stuff that I’d already done in MadMac, but his stuff wasn’t as good (his assembler was going to be slow, and he’d made some bad compromises in functionality: no macros or listings, for instance).

I got mad that we were paying someone for months of work that I could do a better job of in like a week. So MadMac got a 6502 mode, I cost a contractor his job, and I guess it saved Jack Tramiel some thousands of dollars. Later I heard that the people using MadMac were mostly using it for 6502 development, and that they loved it.

—-

Today, for the most part you can just hack away in Java or C# or C/C++ and not worry about the underpinnings of things, but when it comes to the performance-sensitive bottlenecks of modern systems, out come the assemblers.  For a “real” OS there’s always more of it than you think, and for modern systems things can get pretty complex.  We had a decent macro-assembler for the Apple Newton that made the kernel development tons easier, and I’ve seen other systems since then that have more assembly language than you’d expect.  Assembly is still relevant and it makes sense to have decent tools at the bottom.  [I get a chuckle out of people questioning whether C is still relevant . . . little do they know…]

Desperado

Punchcards suck.

I used to use Emacs at 300 baud. That’s how desperate I was to use a decent editor. The courses I was taking at school were on punchcards, and to avoid those horrible things, I:

– Wrote a terminal emulator for the SUPDUP protocol supported by ITS;

– Logged in over the Arpanet at 300 baud to MIT-AI;

– Used Emacs to edit my project’s source code (in Pascal);

– FTP’d that source to the machine at work (NBS-10) where I compiled and ran it;

– Picked up printouts and handed them in.

For the final project I drove up to MIT to visit my friend Jack (who was going there), got a listing off of the XGP laser printer, and handed *that* in.

My home-built Z-80 system didn’t even have a UART; the serial bits were programmed in and out with a timing loop (interleaved with screen updates).
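Without a UART, software does the serializing itself. Here is a minimal Python sketch of the transmit side, assuming the common 8N1 framing (one start bit at 0, eight data bits least significant first, one stop bit at 1); the original wiring and framing aren’t described, and `set_line` is a hypothetical callback standing in for the output pin. At 300 baud each bit lasts about 3.3 ms, so a 10-bit character takes about 33 ms, which is plenty of time to squeeze other work, like screen updates, into the gaps.

```python
import time


def frame_bits(byte):
    """Return the line levels for one 8N1 character: start, 8 data bits, stop."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte must fit in 8 bits")
    data = [(byte >> i) & 1 for i in range(8)]  # least significant bit first
    return [0] + data + [1]                     # start bit 0, stop bit 1


def send_byte(byte, set_line, baud=300, sleep=time.sleep):
    """Bit-bang one byte, holding each level for one bit time (1/baud s).

    set_line is a hypothetical callback that drives the TX pin. On a real
    Z-80 the delay would be a cycle-counted timing loop, not time.sleep.
    """
    bit_time = 1.0 / baud
    for level in frame_bits(byte):
        set_line(level)
        sleep(bit_time)
    # The line is left at 1 (the stop bit), which is also the idle level.


# Demo: capture the levels instead of driving a pin, and skip the delays.
sent = []
send_byte(ord("A"), sent.append, sleep=lambda seconds: None)
print(sent)  # [0, 1, 0, 0, 0, 0, 0, 1, 0, 1]
```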

That’s how much punchcards suck.

More archeology: Atari MadMac assembler sources

Digging through some old disks, I found the source to an old version of MadMac, a fast assembler I wrote just before I left Atari. It needed some TLC (some of the tables were missing, and I had to recreate them). It needs more.

It compiles under Ubuntu, though not without errors. There are makefiles for MS-DOS and the ST, but they almost certainly do not work. I have not done much testing other than to make sure that the included, simple files assemble without errors (I didn’t check the output). This version is almost certainly missing some fixes that Alan Pratt made after I left Atari. If any work was done on MadMac to support (say) Jaguar or the 68020, then that work will not be reflected here.

I suspect that maybe two or three people on the planet will find this interesting. Frankly, I’m embarrassed by the poor quality of the code. If you find this useful, that’s great. I might be able to answer questions, but mostly I’ll probably say things like:

“Yup, that looks broken.”

(pregnant pause)

“Nope, I’m not going to fix it.”

Link.

[Edit: Looks like “.rept” is broken, and I’ll bet that macros are, too.  Assembling clear.s, you can see that the “.rept 8” block of movem instructions isn’t expanded.  I’m unlikely to do anything with this, so that’ll be one of the first things you’ll need to fix… Have fun! 🙂 ]

Atari Basic internals

I’ve linked to the middle, the meat, of How Atari BASIC Works.  I really like the way that they crammed a pretty decent BASIC interpreter into 8K or 10K (depending on how you count) of ROM.

In comparison, the first BASIC interpreter that I saw that was written in C was a bloated, over-implemented disaster. (What’s “over-implemented”? It’s when someone decides that “bullet-point” features are more important than memory usage, or interpreter efficiency, or usability. For instance, you wind up with thirteen ways of opening files, or Format statements that output Roman numerals, but the built-in editor stinks and the floating-point arithmetic is glacial.)

I’m not saying, “Go back to assembly.” I’m saying, “Pay attention.” To pick some technologies at random: there’s something evil about burning cycles to make SOAP or X-Windows faster, while the same cycles spent making Smalltalk (and maybe Ruby) better should be spent with glee.