This is both the easiest and the hardest part of my memoirs to write. It's easy because everything's still fresh in my memory. It's hard because it's the nature of memoirs to be very subjective, and there are still plenty of people likely to read this who will have interpreted the very same events quite differently than I did. There's a high risk of offending someone, which I really don't mean to do, so I have to select my stories and choose my wording even more carefully than I have so far. I'm also going to break this up into two parts, to cover the two projects I worked on.
Important Note: I am not going to get into the whole "Facebook as a socio-political phenomenon" thing right now. Some day I will. I definitely have many thoughts about that, but for right now I want to stay focused on my own personal experiences as a guy down in the trenches doing technical work.
As I mentioned previously, I had already met some of the folks working on Gluster at Facebook and become familiar with their work while I was still at Red Hat. I don't know for sure, but when I reached out about the possibility of "jumping the fence" to work on Gluster for Facebook instead of Red Hat, I got the feeling that people on that side had already had the same thought but refrained out of a desire not to "poach" employees from a partner. Pretty honorable, IMO. In any case, things started to move very quickly. I remember one call while I was at FAST nearby, and another while I was at the ski club. Yes, I did have to go through the standard FAANG interview gauntlet. As far as I know everyone does - maybe not in the case of acquihires, but if there are any other exceptions they're probably way above my pay grade. At any rate, I got what seemed like a very generous offer by all of my previous standards, and showed up for five weeks of "boot camp" in April of 2017.
A few things immediately became apparent to me at boot camp. First, I was far older than most. Out of 200+ people in my class, maybe one or two were in my own age group. Individual contributors at Facebook skew very young - less so among managers, but of course there are an order of magnitude fewer of those. Second, I was "hard preallocated" to a particular team. Most people are hired without knowing what team, or even what computing specialty, they'll work in, and part of the standard boot-camp experience is to try out different teams. In my case I used the time to work on minor tasks for the teams I knew would be my "customers" after I graduated. Third, I hadn't realized just how rare it was for people at Facebook to work remotely. I know it seems funny now, when everyone's doing it, but back then I was part of a very small group - dozens, out of thousands in engineering. I can't take credit for any particular negotiating skill on that, though. I just made it clear right at the start that I would not entertain any offers that involved moving, and nobody ever questioned it. The understanding was that I'd visit about one week a month (which was fine) but not be pressured to move.
The way Facebook ran Gluster was very different from what I had seen at Red Hat. For one thing, there were many clusters, and we relied heavily on automation to make the whole thing manageable. Each of those clusters was probably bigger than any Gluster installation anywhere else, and in aggregate ... well, it was just a whole different world. We had multiple orders of magnitude more Gluster capacity than anyone else. This was completely consistent with why I had joined Facebook in the first place. I wanted not only higher scale, but closer operational proximity to that scale. I'd become a bit of an "architecture astronaut" at Red Hat, uninvolved with the day-to-day effort of running Gluster. At Facebook I took my turn on call - turning up and decommissioning clusters, responding to incidents, answering user questions. It sucked, but I'd been around long enough to consider it a necessary grounding exercise.
Another difference in how Facebook ran Gluster was that they used NFS, via their own proxy layer, as the primary interface to the system. I wasn't sure about that at first, since I'd always been a strong advocate for Gluster's own native protocol. This is where being closer to things operationally changed my perspective. The main reason for using a proxy layer was that we could control when those proxies were updated. We did have native-protocol users as well, but they were a massive pain in the ass because we had no control over when they'd stop running older, buggier versions. It might seem like an organizational rather than a technical issue, but it's important to realize that sometimes organizational concerns should affect technical decisions.
The last difference between "Facebook Gluster" and "vanilla Gluster" was largely my own doing. There was a new hardware platform coming, which would have more disks per host and no RAID controller. There was a totally legitimate desire to get RAID out of the system, both because nobody liked having us run on special hardware and for efficiency, since we already had 3x replication on top of it. The problem was that the new machines would expose many more volumes to the rest of the system, and Gluster couldn't handle that many volumes. Amusingly, the "brick multiplexing" that I had worked on as my last project at Red Hat - for a very different purpose - would have helped with some of that, but it wasn't available because it wasn't in the version of Gluster we were running, and upgrading (or backporting) was a major project in itself. So a colleague of mine (hi Shreyas!) came up with the idea of a "JBOD" ("Just a Bunch Of Disks") layer to make all of the disks on a host appear as one to the rest of the system and distribute files among them internally. This would bring the number of visible volumes per machine back down, even further than before. He started the project, but I ended up writing most of it, using the same consistent-hashing approach that Gluster already used to distribute files across hosts, plus a few extra tricks. This ended up being a key part of how Gluster works at Facebook scale even today.
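To give a flavor of the hash-based placement idea (and only a flavor - the function name, the use of MD5, and the equal-range layout here are my own illustrative choices, not the actual Gluster or Facebook code, which uses Gluster's own DHT hash and per-brick range assignments), the core trick is that every node can independently map a filename to a disk with no central lookup table:

```python
import hashlib

def pick_disk(filename: str, num_disks: int) -> int:
    """Map a filename to one of the host's disks by hashing its name
    into a fixed 32-bit space divided into equal ranges, one per disk.
    Purely illustrative; real Gluster DHT works differently in detail."""
    h = int(hashlib.md5(filename.encode()).hexdigest(), 16) & 0xFFFFFFFF
    range_size = (1 << 32) // num_disks
    # Clamp so hashes in the final partial range land on the last disk.
    return min(h // range_size, num_disks - 1)

# Any node computes the same answer for the same name, so the
# placement is deterministic without any shared state.
disk = pick_disk("photos/img_001.jpg", 12)
```

The same scheme that spreads files across hosts can thus spread them across disks within a host, which is what let the JBOD layer hide all those extra volumes from the rest of the system.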
I really enjoyed my time working on Gluster at Facebook. I was both applying my previous expertise and learning a lot, which is a great combination. I loved the people I worked with. My monthly visits were not a burden at all; they were something I actually looked forward to. But all good things must come to an end. Gluster had always struggled to win acceptance within Facebook, and there had been pressure to deprecate it since before I joined. There were some good reasons for this - efficiency (both machine and operational costs were high), poor fit with other Facebook infrastructure (natural, since it wasn't written at Facebook), and general complexity/fragmentation among our storage offerings. There were also bad reasons, such as the general anti-POSIX bigotry that I've written about before. What always galled me was that the people expressing such bigotry, or other infrastructure groups complaining about having to make accommodations for Gluster, were often doing so to decision-makers in Seattle, while the members of the POSIX team itself - all of them, except me, in Menlo Park - weren't included in the discussion. It's hard to present your side of a debate when you're not even told there is one. Getting rid of Gluster might have been the right thing to do, but the way the anti-Gluster campaign was conducted always struck me as pretty rotten.
Whatever the reasons or process, the decision was eventually made to deprecate Gluster as much as possible, restricting it to the very narrowest of use cases where no alternative could be found, or even built in reasonable time. In particular, no new development efforts - e.g. to add features or improve performance - were going to be made, and that meant no developers. The production engineers, who were actually very competent developers but officially a different role and reporting structure, would stay with the team. The "SWEs" (SoftWare Engineers - and yes, it rhymes with "squeeze"), including myself, had to find new jobs within Facebook or leave. After almost a decade with Gluster, I was done.