Flash memories

The Mars rover Spirit has had a bit of trouble with its flash memory file system. This is amusing (to me) because I have a fair amount of experience in just this area, which has triggered a rather geeky reminiscence about technology, solving the real problems of customers, economics and success in the marketplace. All this from some stupid bug in a file system 30 million miles away. 🙂

It was 1992 or thereabouts, in Cupertino. I had just joined the Newton team, was working on its data storage code, and there were some worries.

Anyone who’s stuck a disk into a computer and then forcibly removed the disk before the computer was ready to give it back has probably had Bad Things happen to their data. Anyone who’s shut down a computer by, say, pulling the plug has probably had Bad Things happen, too. Basically you’re mucking with the integrity of the data; it might wind up half-written, be subtly or catastrophically altered, or it could just get wiped out. Entire industries have been built around guaranteeing integrity of information, including notable database vendors and data recovery firms. Most modern file systems incorporate some kind of data safety. It wasn’t always the case.

The Newton supported removable PCMCIA cards, and furthermore it was a battery-operated device. The Newton’s object store was internal, but you could also store user data (notes, contacts, drawings, etc.) on the removable cards. The problem we had was how to guarantee that data stored on the cards was totally safe.

Imagine you’re Joe Newton, Consumer (not too bright, since you’ve just plonked down $900 for a MessagePad). You’ve scribbled a note onto your Newton and you turn the machine off. You pretty much expect your data to stay around. No excuses about “Well, the Newton wasn’t ready for that” or “You just have to wait a few seconds before you hit the power switch.” This is a consumer product and people expect it to just work. [No snide remarks, please. I know the Newton crashed in the marketplace; I’ll talk about some of the reasons for that in a minute.] [[And I am finessing some more technical points here, such as what “off” really means.]]

There are other nasty things that Joe can do, including ripping the PCMCIA card out of the machine without notice, taking out the batteries, or even just dropping the unit a short distance (which may make the battery contacts bounce, causing the system to reset, possibly in the middle of an update). Since any data corruption can potentially take down the entire internal file system, this is bad news. If you’ve invested hundreds of hours into the data on your unit, having that data go away is Very Bad.

So the Newton has a transaction system built into it. If you rip out a PCMCIA card or reset the unit before data are completely updated, the system will roll back to some good, consistent state with all but the latest changes intact. It’s pretty nifty, and it’s a feature of the product that no one really sees. (If no one sees it, job accomplished!)
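I don't describe how the Newton's transactions were actually built, so treat this as a minimal sketch of one common technique (copy-on-write plus a commit record) that produces the same "roll back to the last good state" behavior. Everything here (`FlashStore`, `CommitRecord`, `PowerLoss`, the two-slot layout, the CRC32 check) is a hypothetical illustration, not Newton code, and it assumes that appending a commit record is itself atomic. New data always goes into the slot that isn't live, and it only becomes visible once its commit record is appended, so a reset at any earlier point leaves the previous state readable.

```python
import zlib
from dataclasses import dataclass
from typing import Optional


class PowerLoss(Exception):
    """Stands in for a reset, a yanked card, or dead batteries."""


@dataclass
class CommitRecord:
    seq: int        # monotonically increasing commit number
    slot: int       # which data slot this commit made live
    checksum: int   # CRC32 of that slot's contents at commit time


class FlashStore:
    """Hypothetical two-slot, copy-on-write store (not the Newton's design)."""

    def __init__(self) -> None:
        self.slots = [b"", b""]
        # An initial commit makes the empty store a valid, consistent state.
        self.log = [CommitRecord(0, 0, zlib.crc32(b""))]

    def _current(self) -> CommitRecord:
        # Recovery rule: trust the newest commit whose slot still checksums.
        for rec in reversed(self.log):
            if zlib.crc32(self.slots[rec.slot]) == rec.checksum:
                return rec
        raise RuntimeError("no consistent state found")

    def read(self) -> bytes:
        return self.slots[self._current().slot]

    def write(self, data: bytes, crash_after: Optional[int] = None) -> None:
        cur = self._current()
        target = 1 - cur.slot  # never overwrite the live copy in place
        self.slots[target] = b""
        for i in range(len(data)):
            if crash_after is not None and i == crash_after:
                raise PowerLoss(f"reset after {i} bytes")
            self.slots[target] += data[i:i + 1]
        # Commit point: the new data becomes visible only once this is appended.
        self.log.append(CommitRecord(cur.seq + 1, target, zlib.crc32(data)))


if __name__ == "__main__":
    store = FlashStore()
    store.write(b"note 1")
    try:
        store.write(b"note 2, longer", crash_after=4)
    except PowerLoss:
        pass  # the unit reset mid-update
    print(store.read())  # b'note 1': the half-written note 2 is ignored
```

The cost of this design is that every update writes the whole object a second time and keeps a growing log; a real store would compact the log and track many objects, but the recovery rule (newest commit that still verifies) is the part that makes yanking the card survivable.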

I’m going to take this in two directions now.

Rover Madness

Now, keeping track of several megabytes of storage on a PCMCIA card itself consumes memory. The Newton didn’t have a whole lot of RAM to work with: 512K in the base unit, but really only about 30K or so for the storage system to play with. The way the Newton storage system was designed, the larger the external memory, the more internal RAM was necessary to keep track of what was stored in it. If you plugged a truly huge Flash card into the unit (“huge” meant 20MB or so, in 1994), a much greater burden was placed on the system’s RAM. That could cause lower levels of the operating system to run out of memory, and in truly extreme cases it caused a reboot.

So: Flash card fills up. You plug it into a Newton, and the unit starts rebooting cyclically (reboot, look at the card, fill up RAM with management structures, run out of memory, panic, and reboot again).

Which is more or less what was happening on Spirit (without seeing the source for the Mars rovers, I’m making an educated guess).
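To make that failure mode concrete, here is a toy model of mount-time bookkeeping and the boot loop it causes. The block size, the per-block RAM cost and the helper names (`mount`, `boot`, `OutOfMemory`) are all made up for illustration; only the roughly 30K budget comes from the numbers above. What it shows is that the failure is deterministic: the card's contents don't change between reboots, so every boot builds the same oversized table and panics in the same place.

```python
# Toy model: bookkeeping RAM grows with the amount of data on the card.
# All constants are hypothetical except the ~30K budget mentioned in the text.
BLOCK_SIZE = 4 * 1024    # hypothetical allocation unit on the card
ENTRY_BYTES = 8          # hypothetical RAM cost of tracking one block
RAM_BUDGET = 30 * 1024   # roughly the 30K the storage system had to work with


class OutOfMemory(Exception):
    pass


def mount(bytes_used: int) -> int:
    """Build the in-RAM block map for a card; return the RAM it needs."""
    blocks = -(-bytes_used // BLOCK_SIZE)  # ceiling division
    needed = blocks * ENTRY_BYTES
    if needed > RAM_BUDGET:
        raise OutOfMemory(f"need {needed} bytes, have {RAM_BUDGET}")
    return needed


def boot(bytes_used: int, max_attempts: int = 5) -> int:
    """Simulate power-on: mount the card, panic and reboot on OutOfMemory.

    Returns the attempt number that succeeded. The card's contents never
    change between attempts, so a card that fails once fails every time.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            mount(bytes_used)
            return attempt
        except OutOfMemory:
            continue  # panic, reboot, look at the same card again
    raise RuntimeError(f"still rebooting after {max_attempts} attempts")


if __name__ == "__main__":
    MiB = 1024 * 1024
    print(boot(4 * MiB))   # 1: 1,024 blocks need 8,192 bytes
    try:
        boot(20 * MiB)     # 5,120 blocks need 40,960 bytes
    except RuntimeError as err:
        print(err)
```

With these invented constants, 15 MiB of stored data (3,840 blocks at 8 bytes each) exactly fills the 30K budget, and 20 MiB needs 40,960 bytes, so that card never gets past mounting no matter how many times the unit restarts.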

Market Forces

Of course, Palm came along and kicked Newton’s butt. And as far as I know, Palm didn’t have a whizzy transactional object store keeping user data safe from catastrophe. They did something much better, and most of it wasn’t in the software at all.

They made it very easy for users to back up their data. With a Palm, you plonked the unit into a (supplied) cradle, pressed a button and it got backed up. How hard is that?

A Newton was a huge pain in the rear to back up. No cradle. You had to buy a copy of Newton Connection (it wasn’t included), you had to plug the unit into a serial cable (after locating the connector, which was hidden behind a flap), launch an application, click about a dozen buttons (some on the Newton, some on the desktop machine), and then wait. How hard is that? How often do you bother to back up data? How often do you even bother to use your “MessagePad” to do actual messaging, given the barrier to communication?

If you’ve got a safe backup (and hopefully, many of them), then you don’t need bulletproof guarantees about data integrity. Sure, it sucks to lose the information in the field, but when you get back home you just plop the unit in the cradle and (poof) you’re mostly restored.

This level of ease-of-use takes a whole-product view that some groups are just not very good at. It’s not enough to have smart people doing whizzy technology. And sometimes thinking about the product at a higher level than just “some cool software” makes the job a lot easier.

Palms were a third the price of Newtons, and a third the size and weight. The Palm development environment was a lot cheaper, and you didn’t have to write code in the whizzy but wacky NewtonScript language. NewtonScript was pretty neat, but it was strange, and it was a major impediment to folks who just wanted to port their C code and sell software. The Newton team viewed native code as being dangerous, so there was basically no story for porting any existing code to the platform, which resulted in a paucity of applications. Apple even wanted 1% of developers’ profit on titles. With that stake in their success, you’d think Apple would have been very supportive of Newton developers, giving away dev kits and making information public; instead, things were almost hostile.

I think that Apple thought the Newton was too precious to actually sell. And after months of being coy with the technology, folks just wandered away in search of something else they could use as a platform. Without dealing with those crazy people in Cupertino.

There are other things that Palm got right that the Newton didn’t (like the handwriting recognition), but for something early and expensive, it doesn’t take much to kill a product line. Combine this with Apple’s classic lack of follow-through on things that weren’t a resounding success, and you have a recipe for failure.

It’s a shame; handled right, the Newton could have become a major player. Instead, Apple squandered its investment and handed the market to companies that didn’t have to sink nearly as much into development costs.

Maybe I’ll talk later about other stuff that went wrong, but it’s hardly interesting. The important lesson for me was that solving a hard problem is not the same thing as solving the right problem, and I’ve tried to remain pragmatic about things like that ever since.

Author: landon

My mom thinks I'm in high tech.