Memoirs: Sequoia
October 07, 2020
When I was preparing to leave Encore, I interviewed at two places: Kendall Square Research and Sequoia Systems. These two form an interesting pair for an unusual reason: they both ended up going out of business because of accounting irregularities (booking demo/eval units as revenue), so my avoidance of one and short tenure at the other was pretty lucky. There's another interesting story behind one of them, and it frustrates me that I can't actually remember which but I think it was Sequoia. Toward the end of my time at Encore (after Sequoia so I'm getting a bit ahead of myself) they were bidding for a large contract at the US Army's Fort Huachuca. The first round of bidding went to IBM, but Sequoia protested. This triggered a second round of bidding, but also apparently annoyed someone in the government enough that they started taking an extra hard look at Sequoia's books. That's when they found the irregularities, so Sequoia had really done themselves no favors with the protest. The real kicker is that Encore actually won the second round of bidding, but then couldn't deliver and got sued for breach of contract. The contract that killed two companies - one by losing, one by winning.
The KSR interview was a bit weird. For one thing, they were very secretive about their technology so it really wasn't clear what their system would be like to work on. For another, they made a really big deal about having catered breakfast, lunch, and dinner brought in every day. This might not seem odd today, with companies like Facebook and Google famously providing more and better dining options, but back then it wasn't common at all. KSR was clearly copying competitor Thinking Machines, which had become notorious for the same thing. When they mentioned this perk for about the third time, I jokingly asked if they had cots in the back too so that employees could stay overnight. They evaded the question, but I later found out the answer was yes. These lavish perks were not provided out of the goodness of anyone's heart. They were provided to make a brutal pace of work tolerable, and that's still true today. Even back then I could sense this, and passed on the opportunity. Fun fact: it also meant I didn't end up working for a company controlled by Bill Koch. Yes, that Bill Koch. In the end I went to Sequoia instead.
While Encore had been oriented first toward scientists and later toward hard real-time customers - the F-117 Nighthawk ran code I'd written, as did many nuclear power plants - Sequoia's schtick was high availability. They used paired processors with comparator logic on a board, but did not use the "pair and spare" approach of competitors like Tandem and Stratus. Any board could take over where any other had left off, not just a designated partner. As they put it, this "N+1" approach was more efficient than others' "2N" approach, though a failure would result in some performance degradation instead of leaving performance utterly unchanged. To support this, they used "shadow memory". On a checkpoint, dirty memory would be flushed first to one set of memory and then to the other, along with state to determine which of the two was valid and current. This allowed another processor pair to pick up at a consistent point after a fault. These checkpoints were very frequent; the kernel code was absolutely littered with them. And yes, this made Sequoia systems damn slow. They also had a lab that would have given any safety inspector fits, with cables all over the floor and open fans everywhere.
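The shadow-memory checkpoint described above can be sketched in a few lines of Python. This is only an illustration of the two-copy idea, not Sequoia's actual design: the `ShadowMemory` class, the per-bank sequence number and in-progress flag, and the `fail_after` fault-injection parameter are all assumptions made up for the example.

```python
class Fault(Exception):
    """Simulated hardware fault in the middle of a checkpoint."""


class ShadowMemory:
    """Two copies ("banks") of checkpointed memory plus validity state.

    Hypothetical model: the real hardware's bookkeeping is not described
    in the text beyond "state to determine which of the two was valid".
    """

    def __init__(self):
        self.banks = [{}, {}]              # address -> value, one dict per bank
        self.seq = [0, 0]                  # last checkpoint fully written to each bank
        self.in_progress = [False, False]  # True while a bank is being overwritten

    def checkpoint(self, dirty, fail_after=None):
        """Flush dirty pages to bank 0, then to bank 1.

        fail_after (assumption, for testing only) injects a fault once
        that many page writes have been done.
        """
        new_seq = max(self.seq) + 1
        writes = 0
        for bank in (0, 1):
            self.in_progress[bank] = True
            for addr, value in dirty.items():
                if fail_after is not None and writes >= fail_after:
                    raise Fault(f"fault after {writes} page writes")
                self.banks[bank][addr] = value
                writes += 1
            # Only now is this bank a complete, consistent checkpoint.
            self.seq[bank] = new_seq
            self.in_progress[bank] = False

    def recover(self):
        """Return the newest consistent state and resync the other bank."""
        good = max((b for b in (0, 1) if not self.in_progress[b]),
                   key=lambda b: self.seq[b])
        other = 1 - good
        self.banks[other] = dict(self.banks[good])
        self.seq[other] = self.seq[good]
        self.in_progress[other] = False
        return dict(self.banks[good])


mem = ShadowMemory()
mem.checkpoint({"a": 1, "b": 2})
try:
    # Fault strikes while the first bank is only half written.
    mem.checkpoint({"a": 10, "b": 20}, fail_after=1)
except Fault:
    pass
state = mem.recover()  # the torn bank is ignored; the older checkpoint survives
```

Because the banks are written one after the other, at most one of them is ever mid-update, so a surviving processor pair always has one complete copy to resume from. Doing all of that writing twice on every checkpoint also suggests where the slowness came from.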
But that's not the reason I left after only a few months. As it turned out, my first project was to implement a sort of process isolation very similar to what I'd just done at Encore for the "guest OS" feature. So I knew exactly what was possible and how to do it, but this put me at odds with the chief software architect who thought he knew more than he did about such things. Even though the requirements he kept giving me were impossible and sometimes even self-contradictory, I just couldn't get him to collaborate on designing something that would actually work. Not even when I got my boss, his peer, to intervene. I later found out that he had been similarly disruptive and broadly disliked at other jobs too (including KSR). That interaction plus the general crappiness of the system and its development environment had already convinced me to go back to Encore after only a few months.
Side note: one of the most interesting things I learned at Sequoia was that different "steppings" of the supposedly-same processor can have very different behavior. This showed up as a crash when the 53 and later steppings of the 68040 that we used would push a different number of bytes onto the stack for a certain exception than 52 and earlier (making up the numbers because I don't really remember). Wasn't hard to fix, but this taught me to read errata very carefully.
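To make the stepping problem concrete, here is a toy sketch of a handler that has to discard the exception frame before returning. Every number in it is a placeholder in the same spirit as the made-up stepping numbers above: the cutover at stepping 53 comes from that example, and the frame sizes, names, and functions are hypothetical rather than taken from any 68040 documentation.

```python
# All values are hypothetical placeholders, not real 68040 figures.
STEPPING_CHANGE = 53   # first stepping with the new frame layout (made up)
OLD_FRAME_BYTES = 8    # frame size pushed by older steppings (made up)
NEW_FRAME_BYTES = 12   # frame size pushed by newer steppings (made up)


def exception_frame_bytes(stepping):
    """Size of the frame the CPU pushed for this exception on this stepping."""
    return NEW_FRAME_BYTES if stepping >= STEPPING_CHANGE else OLD_FRAME_BYTES


def unwind_hardcoded(sp):
    """Buggy handler: assumes every chip pushes the old frame size."""
    return sp + OLD_FRAME_BYTES


def unwind(sp, stepping):
    """Fixed handler: pops however many bytes this stepping pushed."""
    return sp + exception_frame_bytes(stepping)


sp_before = 0x1000
# On a newer stepping the hardcoded handler leaves the stack pointer short,
# so whatever gets popped next is garbage and the system crashes.
leftover = unwind(sp_before, 53) - unwind_hardcoded(sp_before)
```

The fix itself is small, which matches the experience described above; the hard part is knowing that the frame size can differ at all.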
As it turns out, I got back to Encore just in time to be laid off ... or maybe not. That will be the third and final part of my Encore story.