Smashed Stacks
October 29, 2018In the context of the news that the Linux kernel finally getting rid of variable-length arrays, I figure I should finally write about one of the more difficult bugs I've worked on over the last three decades.
This one was at Revivio, where I worked from 2002 to 2006. We started having these problems with machines sporadically hanging. Unlike a normal "soft" hang, where some things such as low-level networking or gdb debugging via a serial line might still work, these were "hard" hangs. Nothing at all seemed to be responding. Machines would suddenly turn into bricks, requiring a power cycle to get going again. I remember wishing for something like the in-circuit emulators from even earlier in my career: something that would at least let us look at memory no matter how wedged the CPU was. Maybe some sort of PCI debugging card existed to do such a thing ... but wait, we already did have PCI cards in these systems that could get at memory without the CPU, because they all had Infiniband cards. Hmmm.
The way Infiniband works is that you have to register sections of memory before they can be accessed remotely. So I wrote a little utility to do that for the big chunk of memory used by the kernel itself plus the vmalloc'ed areas used by kernel modules. This didn't get absolutely everything (e.g. it didn't include memory used by the page cache) but it was good enough. The companion to this was a utility running on another Infiniband-connected system, which would slurp all of that registered memory and convert it into something that looked like a gdb core dump.
That was a useful tool, but the actual debugging was still to come. Over the next month or so, I collected a few dozen of these dumps. The first thing I'd do was look at all of the task stacks, and I kept seeing some that just didn't make sense. There'd be a reasonable series of calls, then garbage. Or a series of calls that was just impossible, because those functions didn't call each other that way (or at all). For a while, I thought it was some sort of bug in the tools I'd developed, so I'd just move on and look at the next thing. After about the twentieth of these, I started to realize that the same few functions always seemed to appear in these "broken" stack traces. Hmmm. That seemed like a pattern worth looking into.
What was actually happening was that we had a few functions which allocated far too much memory on the stack. People accustomed to programming in user space often don't realize that stack size is tightly constrained in most kernels. Linux makes things even worse by putting the stack for each task right above its task structure, so a stack overflow would corrupt the structures used by the scheduler etc. But in our case something slightly different was happening. These allocations were so big that they were causing the stack pointer to skip right over the task structure and land somewhere in the middle of the next task's stack. Then the culprit would unwind, perhaps even exit completely, and be long gong before the "victim" task would hit the corruption and cause a hang.
The fix was obvious: stop making huge stack allocations. Enforcing that required one more trick. I wrote one more tool to disassemble all of our binaries looking for the particular instruction in each function's prologue that would subtract from the stack pointer to create space for its local variables. This gave me a list of functions with how much each one was allocating, which identified all of the potential stack-smashers - including a few that hadn't even bitten us yet. We kept running that tool periodically, and every once in a while it did save us from a recurrence of the original scenario.
That's a cool story, but here's the thing: that little scanning tool wouldn't have worked with variable-length arrays. The fact that VLAs add complexity is a minor issue compared to the fact that they offer no way to check the size of an allocation before it's done. So even though VLAs were not actually a factor in this debugging story, I'm pretty glad to see them gone.