
“The Stack!” cried Mary

4 min read

The stack!

… cried Mary, as she stared at yet another stack overflow vomited out of cat-image-api. You nod knowingly, reflecting on your mental model of the stack:

  • Using too much stack => stack overflow. Not good.
  • Sometimes, variables go there!
  • And also, something to do with function calls?

But … what really is it? This post is a brief introduction to the native stack - the one you get in compiled languages like C or Rust, without a runtime in between - as well as backstory for some coming posts. It's aimed at folks who work in the software industry but haven't had much exposure to lower-level, 'systems things'. I'll explain what the stack is, the problem it solves, and how it's physically laid out in memory.

Let’s dive in!

But first: process memory!

To make sense of the stack, we need to establish a foundation: when we launch an app, what does it actually look like in memory? This is a real rabbit hole of terror and nuance, so we'll keep it high level - just enough detail to see the lay of the land before we move on to the stack itself.

So - let’s imagine we’re about to launch a new process on Linux:

Process Memory Layout at Startup (interactive visualization; step 1 of 5, "Program starting", shown here)

The process is about to launch. It receives its own virtual memory layout - process memory addresses don't point straight into RAM! - and this memory space starts out empty.

So we can see that the stack is a preallocated chunk at the top of the memory range, growing downwards on most typical OS/architecture combinations. It's where we store stack-allocated variables, plus the extra bits - like return addresses - we need to implement function calls in machine-code land.
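If you want to eyeball this layout on your own machine, here's a rough little sketch of mine (not tied to anything else in this post) that prints the address of a stack local, a heap allocation, and a function. The exact values change on every run thanks to ASLR, and the picture varies by OS and architecture, but on a typical Linux build the stack local should land well above the other two:

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int local = 42;                        // lives in main's stack frame
    int *heaped = malloc(sizeof *heaped);  // lives on the heap

    printf("stack local : %p\n", (void *)&local);
    printf("heap alloc  : %p\n", (void *)heaped);
    printf("code (main) : %p\n", (void *)main);

    free(heaped);
    return 0;
}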

A (brief) aside - faulting

One other interesting bit - there are two different memory-related faults that (I think) are worth knowing about:

  • Accessing memory that isn’t mapped at all, or in a way the mapping doesn’t allow → SEGFAULT. This happens when a program touches an address it has no valid mapping for - for example, if you recurse so deeply that the stack pointer wanders off the end of the stack region - or violates a mapping’s permissions, say by trying to execute code in a region marked non-executable. The CPU raises a page fault exception, the kernel sees that it’s invalid, and sends your process a SIGSEGV. This typically results in the process crashing dramatically.

  • Accessing memory that is mapped, but not yet backed by physical RAM → handled PAGE FAULT. Here, the virtual address is valid - say, part of your heap or stack - but the physical page hasn’t been materialized yet. The CPU still raises a page fault exception, but the kernel notices this case is legitimate and “faults in” a page of memory before resuming your process. This is how Linux and other modern OSes implement lazy allocation. (The sketch just below this list pokes at both cases.)
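To poke at both cases yourself, here's a hedged, Linux-only sketch of mine: it mmaps a chunk of anonymous memory and touches every page so the kernel lazily faults them in, and leaves the crashing case as a commented-out line for the brave. (The 4 KiB page size and the 64 MiB region are just assumptions for the demo.)

#define _DEFAULT_SOURCE
#include <stdio.h>
#include <sys/mman.h>

int main(void) {
    // The benign case: mmap hands us 64 MiB of anonymous virtual memory,
    // but no physical pages are attached yet.
    size_t len = 64u << 20;
    char *region = mmap(NULL, len, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (region == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    // The first write to each page raises a page fault that the kernel
    // quietly satisfies with a fresh physical page before resuming us.
    for (size_t i = 0; i < len; i += 4096)
        region[i] = 1;
    printf("touched %zu pages without incident\n", len / 4096);

    // The fatal case: no mapping at all. Uncomment to earn a SIGSEGV.
    // *(volatile char *)0 = 1;

    munmap(region, len);
    return 0;
}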

And now the stack

Now that we know where the stack sits and how it physically relates to the heap, let's see how we use it to store things. The stack is logically divided into frames, where each frame corresponds to an active invocation of a function.

There are two CPU registers in play. The stack pointer (SP) points at the top of the stack; to allocate on the stack, we simply move it by the amount of memory we need, then write into the space we've just claimed. The frame pointer (FP) points to the base of the current stack frame. This isn't strictly necessary anymore, but it makes it easier for tools like debuggers to unwind the stack - we can start with the FP and walk back down to the root of the stack.

The concrete registers used vary between CPU architectures; for the rest of this post, wherever I need to, I'll reference ARM64 - the CPU in my Mac.
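As a small illustration of frames stacking downwards, here's a sketch of mine using a GCC/Clang builtin (separate from the sample code below): recurse a few times and print each invocation's frame address, and each one should come out lower than the last. Build with frame pointers enabled - e.g. -O0 or -fno-omit-frame-pointer - so there's actually an FP chain to look at.

#include <stdio.h>

static void descend(int depth) {
    void *frame = __builtin_frame_address(0);  // this invocation's frame
    printf("depth %d: frame at %p\n", depth, frame);
    if (depth < 3)
        descend(depth + 1);
}

int main(void) {
    descend(0);
    return 0;
}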

With all that in mind, let's have a look at how the stack works: how it lets us capture context for function calls, and how it lets us cheaply allocate variables. Consider this sample code:

int func_one() {
    // The 'volatile' here is to stop the compiler
    // optimising away too much
    volatile int a = 1;
    volatile int b = 2;
    return a + b;
}

int main(int argc, char** argv) {
    int a = func_one();
    return a;
}

And here's a visualisation of its stack management, disassembled from an ARM64 binary. Note that the named registers - e.g. X29 and X30 - are architecture-specific:

Stack Execution (interactive visualization; step 1 of 11, "Program starts", shown here)

main() begins execution. Our program hasn't pushed anything to the stack yet. FP points to the caller's frame (libc startup code, not shown).

And - that’s it! We can see how, simply by bumping the stack pointer up and down in a consistent way, we get an easy mechanism both for allocating space for locals and for implementing function calls on CPUs that have no concept of functions.

I hope this post has given you a good intuition for the problem the stack solves and how it's physically implemented within a running process; I've got a couple more posts in the works that build on this foundation!