In my last post, we explored the native stack and how it manages function calls and local variables in compiled languages. I elided a bunch of other detail with some hand-wavey “here be dragons!” talk. In this post we’re going to dive into one of those details - now that we’ve got the stack and a way to jump into and return from functions, how do we pass their arguments and return values?
Functions can take all sorts of things as arguments, big and small, and essentially any number of them. So - where do we put them?
Let’s dive in!
Calling Conventions?
The very short answer is we follow a calling convention - a description of how we can use CPU registers and the stack to encode function arguments. When we share compiled code between libraries and languages we need them all to be callable in the same way so that the code plays together nicely.
We’re going to be looking at AAPCS64 (the ARM 64-bit calling convention), if only because I have a Mac handy. If we ignore Windows[1], the other popular ABI these days is System V AMD64 (used on x86-64 Linux and macOS). In practice the two are very similar, and for this post I want to give you a feel for how they work in general, so the differences aren’t so important!
Calling Conventions!
No arguments
Let’s start at the absolute simplest case: a function that takes no arguments at all.
Here’s the disassembly alongside the C so we can dive into the calling convention; don’t be horrified, it’s straightforward once you know what you’re looking for. If you’re playing along at home, you can use objdump -d to produce this from a binary yourself. Note: this isn’t the whole disassembly, just the bits that are involved in the call.
    // ...
    int z = no_args();

    ; Call no_args().
    ; bl ("branch with link") jumps to the target function
    ; and saves the address of the next instruction (the return point)
    ; into the link register (x30 / lr).
    bl 0x1000004b0 <_no_args>

    ; no_args runs

    ; Store the return value from w0 into the stack slot reserved
    ; for the local variable 'z' in main's frame.
    str w0, [sp, #0x94]

Now, let’s have a look at no_args itself - how’s it returning that value?

    int no_args() {
        return 0;
    }

    ; Place the literal constant 0 into w0.
    ; w0 is the 32-bit view of x0, the designated return register.
    ; The constant 0 here is an *immediate*, meaning it's encoded directly
    ; in the instruction rather than loaded from memory.
    mov w0, #0x0

    ; Return from the function.
    ; ret jumps to the address stored in x30,
    ; which was set by the caller's `bl` instruction.
    ret

There’s a bit to take in here, but it establishes the baseline for our calling convention:
- A register is used to store where to return control to when a function completes (x30)
- We physically call the function by jumping execution to the function’s address in the application’s memory
- A register is used to store the return value from a function (x0)
- On AArch64, x0-x30 are the 64-bit general purpose registers, and w0-w30 are just their lower 32-bit halves; we often see these mixed into the same assembly
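For anyone playing along at home, here is a minimal C sketch that should produce a call like the one above. The names `caller_no_args` and `check_no_args` are hypothetical stand-ins of mine (the post only shows a fragment of `main`), and the build commands in the comment assume a `cc` and `objdump` on an AArch64 machine; the addresses and stack offsets you see will almost certainly differ.

```c
#include <assert.h>

/* Assumed workflow (file name is illustrative):
 *   cc -O0 -c no_args.c -o no_args.o
 *   objdump -d no_args.o
 */

/* Returns its result in w0, the 32-bit view of x0. */
int no_args(void) {
    return 0;
}

/* Hypothetical stand-in for the caller: the call compiles to a `bl`,
 * and the value is then copied from w0 into z's stack slot. */
int caller_no_args(void) {
    int z = no_args();
    return z;
}

/* Quick self-check; call it from main() to try things out. */
void check_no_args(void) {
    assert(caller_no_args() == 0);
}
```

Compiling with optimisations enabled will likely inline or fold the call away entirely, which is why `-O0` is the assumption here.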
That’s it - we can see how we move control between functions and how we return a value. So - what about passing arguments?
Simple arguments
Let’s start with simple arguments:
Here are the relevant bits of the disassembly; note that as we go along, I’m omitting bits that we’ve covered above and focussing on what is new:
    // ...
    int mixed = simple_func(1, 2, 3.0, 4.0);

    ; Store arguments into registers per the AAPCS64 calling convention.
    ; Integer arguments go into x0, x1, x2, up to x7.
    ; Floating-point arguments go into d0, d1, d2, up to d7.

    ; Store first integer (1) argument into x0 register.
    mov x0, #0x1

    ; Store second integer (2) argument into x1 register.
    mov x1, #0x2

    ; Store first floating-point (3.0) argument into d0 register.
    fmov d0, #3.00000000

    ; Store second floating-point (4.0) argument into d1 register.
    fmov d1, #4.00000000

    ; Call simple_func(a, b, x, y)
    bl 0x1000004b0 <_simple_func>

    ; simple_func runs

    ; Store the 32-bit return value from w0 into the stack slot
    ; reserved for the local variable 'mixed'.
    str w0, [sp, #0x94]

Now, let’s see what simple_func does:

    int simple_func(long a, long b, double x, double y) {
        return (int)(a + b + (long)(x + y));
    }

    ; Copy arguments from registers to the stack.
    ; We've compiled with no optimisations (-O0), so the compiler chooses to
    ; "materialise" every C variable on the stack: it gives each variable
    ; an addressable home in memory rather than trying to keep it
    ; only in registers.
    ;
    ; This is helpful for debuggers! We can see the value of any argument
    ; when we break in the function body, even if the register transporting
    ; it into the call has been reused.

    ; Make 32 bytes (0x20) of space on the stack to hold the arguments
    sub sp, sp, #0x20

    ; Then store them to the stack
    str x0, [sp, #0x18]
    str x1, [sp, #0x10]
    str d0, [sp, #0x8]
    str d1, [sp]

    ; Read arguments back from the stack into registers,
    ; then add them together.
    ldr x8, [sp, #0x18]
    ldr x9, [sp, #0x10]
    add x8, x8, x9
    ldr d0, [sp, #0x8]
    ldr d1, [sp]
    fadd d0, d0, d1
    fcvtzs x9, d0
    add x8, x8, x9

    ; Move the result into x0, ready to return
    mov x0, x8

    ; Restore stack pointer (epilogue)
    add sp, sp, #0x20
    ret

A bit more to take in here, but straightforward enough:
- Small function arguments go into registers
- Different registers are used for integer and floating point values
- We have 8 registers of each type
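One consequence of having two separate register files is that each class is numbered independently. The sketch below pairs `simple_func` from above with `interleaved_func`, a hypothetical variant of mine that shuffles the parameter order; under AAPCS64 I would expect the same four registers to be used in both cases, which you can confirm by disassembling it yourself.

```c
#include <assert.h>

/* a -> x0, b -> x1 (integer registers); x -> d0, y -> d1 (floating-point
 * registers). x lands in d0 even though it is the third parameter.
 * The int result comes back in w0. */
int simple_func(long a, long b, double x, double y) {
    return (int)(a + b + (long)(x + y));
}

/* Hypothetical variant (not from the post) with the parameters interleaved.
 * Expected assignment: a -> x0, x -> d0, b -> x1, y -> d1. */
int interleaved_func(long a, double x, long b, double y) {
    return (int)(a + b + (long)(x + y));
}

/* Quick self-check; call it from main() to try things out. */
void check_simple_args(void) {
    assert(simple_func(1, 2, 3.0, 4.0) == 10);
    assert(interleaved_func(1, 3.0, 2, 4.0) == 10);
}
```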
An obvious question is - what happens when we run out of registers?
Lots of arguments
Let’s work it out by looking at how we can call func_ints, a function that takes 9 integer arguments:
    // Conspicuously has 9 arguments!
    int result = func_ints(1, 2, 3, 4, 5, 6, 7, 8, 9);

    ; Store the first eight arguments into the registers x0-x7
    mov w0, #0x1
    mov w1, #0x2
    mov w2, #0x3
    mov w3, #0x4
    mov w4, #0x5
    mov w5, #0x6
    mov w6, #0x7
    mov w7, #0x8

    ; Load the ninth argument (9) into a temporary register.
    ; We've run out of argument registers (w0-w7 are all used),
    ; so we need to pass this argument on the stack instead.
    ; w8 is just a scratch register - not part of the argument
    ; passing convention - we're using it temporarily to hold
    ; the value ...
    mov w8, #0x9

    ; ... before ultimately storing it onto the stack!

    ; The AAPCS64 calling convention requires
    ; any additional integer arguments to be passed on
    ; the stack, starting at the caller's SP at the time of the call.
    ;
    ; This conceptually belongs to the frame of the caller,
    ; with the callee reaching beyond its own frame
    ; (into higher addresses) to read it!
    str w8, [sp]

    ; Call func_ints
    bl 0x1000004b0 <_func_ints>

    ; func_ints runs

Now, let’s see what func_ints does:

    int func_ints(int a, int b, int c, int d, int e, int f, int g, int h, int i) {
        return a + b + c + d + e + f + g + h + i;
    }

    ; Allocate stack space
    sub sp, sp, #0x30

    ; Read the ninth argument from the stack (just beyond our own frame,
    ; in the caller's).
    ; Note: it's at [sp, #0x30] because the caller put it at its SP
    ; before calling us, and we've since moved SP down by 0x30
    ldr w8, [sp, #0x30]

    ; Store all arguments to the stack (materialisation for -O0)
    str w0, [sp, #0x2c]
    str w1, [sp, #0x28]
    str w2, [sp, #0x24]
    str w3, [sp, #0x20]
    str w4, [sp, #0x1c]
    str w5, [sp, #0x18]
    str w6, [sp, #0x14]
    str w7, [sp, #0x10]
    ; Store w8 (which we loaded from the stack-passed 9th arg above)
    ; Note: It looks like w8 was passed as a register, but it wasn't!
    ; We read it from the caller's stack frame into w8 for convenience.
    str w8, [sp, #0xc]

    ; Load arguments back from stack and add them together
    ldr w8, [sp, #0x2c]
    ldr w9, [sp, #0x28]
    add w8, w8, w9
    ldr w9, [sp, #0x24]
    add w8, w8, w9
    ldr w9, [sp, #0x20]
    add w8, w8, w9
    ldr w9, [sp, #0x1c]
    add w8, w8, w9
    ldr w9, [sp, #0x18]
    add w8, w8, w9
    ldr w9, [sp, #0x14]
    add w8, w8, w9
    ldr w9, [sp, #0x10]
    add w8, w8, w9
    ldr w9, [sp, #0xc]
    add w0, w8, w9

    ; Restore stack pointer (epilogue)
    add sp, sp, #0x30
    ret

So we can see that once we exhaust the registers allocated for passing arguments (w0-w7), we store additional arguments on the stack, in the frame of the caller. This is called spilling.
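The placement rule for integer arguments can be sketched as a small C model. `ArgLocation` and `place_int_arg` are hypothetical names of mine, and the 8-byte stack slot is an assumption taken from standard AAPCS64; as far as I know, Apple's platforms pack stack arguments more tightly, so a tenth `int` may sit at a different offset on a Mac. The ninth argument lands at offset 0 either way, matching the `str w8, [sp]` above.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_INT_ARG_REGS 8  /* x0-x7 */
#define STACK_SLOT_BYTES 8  /* assumption: standard AAPCS64 8-byte slots */

/* Where one integer argument ends up (hypothetical model type). */
typedef struct ArgLocation {
    int in_register;      /* 1 if passed in a register, 0 if on the stack */
    int reg_index;        /* register number 0-7 when in_register is 1, else -1 */
    size_t stack_offset;  /* offset from the caller's SP at the call, otherwise */
} ArgLocation;

/* Place the arg_index-th (0-based) argument of a call that passes
 * only small integer arguments. */
ArgLocation place_int_arg(int arg_index) {
    ArgLocation loc = {0, -1, 0};
    assert(arg_index >= 0);
    if (arg_index < NUM_INT_ARG_REGS) {
        loc.in_register = 1;
        loc.reg_index = arg_index;
    } else {
        loc.stack_offset =
            (size_t)(arg_index - NUM_INT_ARG_REGS) * STACK_SLOT_BYTES;
    }
    return loc;
}

/* The function from the post: a..h arrive in w0-w7, i arrives on the stack. */
int func_ints(int a, int b, int c, int d, int e, int f, int g, int h, int i) {
    return a + b + c + d + e + f + g + h + i;
}
```

Calling `place_int_arg(8)` models the ninth argument: it reports a stack location at offset 0 rather than a register.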
It should be clear now why we need a calling convention: there’s no “universally obvious” way to call a function. We’re balancing performance (registers are much faster than memory) against the use of limited resources.
A calling convention also defines who must preserve register values across a call. For instance, on AArch64 per the AAPCS64 convention:
- Caller-saved registers (such as x0-x17) may be freely overwritten by the callee; the caller must save them if it cares.
- Callee-saved registers (such as x19-x28) must be restored by the callee before it returns.
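As a rough map of who owns what, here is a C sketch that classifies the general purpose registers by role. The enum and function names are hypothetical, and the grouping is my reading of AAPCS64 (in particular, I've lumped x16 and x17 in with the ordinary temporaries); treat it as a mnemonic rather than a substitute for the specification.

```c
#include <assert.h>

typedef enum RegRole {
    REG_ARGUMENT_OR_RESULT,  /* x0-x7: arguments and return values */
    REG_INDIRECT_RESULT,     /* x8: address for large struct returns */
    REG_TEMPORARY,           /* x9-x17: scratch, caller-saved */
    REG_PLATFORM,            /* x18: reserved by some platforms */
    REG_CALLEE_SAVED,        /* x19-x28: callee restores before returning */
    REG_FRAME_POINTER,       /* x29 */
    REG_LINK                 /* x30: return address set by bl */
} RegRole;

/* Classify general purpose register xN, for N in 0..30. */
RegRole role_of_x_register(int n) {
    assert(n >= 0 && n <= 30);
    if (n <= 7)  return REG_ARGUMENT_OR_RESULT;
    if (n == 8)  return REG_INDIRECT_RESULT;
    if (n <= 17) return REG_TEMPORARY;
    if (n == 18) return REG_PLATFORM;
    if (n <= 28) return REG_CALLEE_SAVED;
    if (n == 29) return REG_FRAME_POINTER;
    return REG_LINK;
}

/* 1 if a function that uses xN must put the old value back before it
 * returns (x19-x28 and the frame pointer), 0 otherwise. */
int callee_must_preserve(int n) {
    RegRole role = role_of_x_register(n);
    return role == REG_CALLEE_SAVED || role == REG_FRAME_POINTER;
}
```

This is why the listings in this post use w8 and w9 as scratch space without saving them first: they sit in the caller-saved range.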
Small structs
But what about when we pass a struct? Let’s look at func_small, which takes and returns a small structure by value.
    typedef struct Small {
        int a;
        double b;
    } Small;

    // ...
    Small s2 = func_small(s, 5);

    ; Load the first argument (the struct s) into x0/x1.
    ; Structs up to 16 bytes in size are passed entirely in registers.
    ; The fields are simply laid out across consecutive registers.
    ; Our Small struct has an int (4 bytes) and a double (8 bytes) = 12 bytes
    ; total, or 16 bytes with alignment padding, so it fits in two registers.
    ldr x0, [sp, #0x80]
    ldr x1, [sp, #0x88]

    ; Move the integer argument (5) into w2, the next available argument
    ; register. Note that 'w2' is the lower 32 bits of the 'x2' register -
    ; we're only storing a 32-bit int, so we don't need the full 64 bits!
    mov w2, #0x5

    ; Call func_small
    bl 0x1000004b0 <_func_small>

    ; func_small runs

    ; The return value (another Small) is 16 bytes, so it also comes back
    ; in registers. x0 and x1 now hold the new struct fields, which the
    ; compiler stores to the stack.
    str x0, [sp, #0x70]
    str x1, [sp, #0x78]

Now, let’s see what func_small does:

    Small func_small(Small s, int x) {
        s.a += x;
        s.b *= 2.0;
        return s;
    }

    ; Allocate stack space
    sub sp, sp, #0x30

    ; Store the incoming struct fields from registers to stack
    str x0, [sp, #0x10]
    str x1, [sp, #0x18]
    str w2, [sp, #0xc]

    ; Load s.a (int) and add x to it
    ldr w9, [sp, #0xc]
    ldr w8, [sp, #0x10]
    add w8, w8, w9
    str w8, [sp, #0x10]

    ; Load s.b (double) and multiply by 2.0
    ldr d0, [sp, #0x18]
    fmov d1, #2.0
    fmul d0, d0, d1
    str d0, [sp, #0x18]

    ; Copy the modified struct from the stack - [sp, #0x10]
    ; to another location in the stack - [sp, #0x20], via
    ; the q0 register.
    ;
    ; This copy is redundant - we could load directly into x0/x1
    ; for return - but we've forced optimisation off, so we see
    ; things like this!
    ldr q0, [sp, #0x10]
    str q0, [sp, #0x20]

    ; Load the struct from the copy location into x0/x1 to return
    ldr x0, [sp, #0x20]
    ldr x1, [sp, #0x28]

    ; Restore stack pointer (epilogue)
    add sp, sp, #0x30
    ret

We see:
- Small structs are passed by, and split across, registers
- Small structs are returned in registers in the same fashion
- “small” means 16 bytes or less
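The 16-byte figure for `Small` can be checked at compile time rather than taken on trust. The sketch below assumes a 4-byte `int` and an 8-byte, 8-byte-aligned `double`, as on AArch64 and most 64-bit targets; `check_small` is a hypothetical helper of mine, and the starting values in it are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Small {
    int a;     /* bytes 0-3, followed by 4 bytes of padding */
    double b;  /* bytes 8-15 */
} Small;

/* Layout assumptions: these fail to compile on a target where they don't hold. */
_Static_assert(sizeof(Small) == 16, "Small should fill two 8-byte registers");
_Static_assert(offsetof(Small, b) == 8, "b should start the second 8 bytes");

/* The function from the post: s arrives in x0/x1 and x in w2;
 * the result goes back in x0/x1. */
Small func_small(Small s, int x) {
    s.a += x;
    s.b *= 2.0;
    return s;
}

/* Quick self-check; call it from main() to try things out. */
void check_small(void) {
    Small s = {1, 2.5};
    Small s2 = func_small(s, 5);
    assert(s2.a == 6 && s2.b == 5.0);
    /* Pass by value: the caller's copy is untouched. */
    assert(s.a == 1 && s.b == 2.5);
}
```

The offsets line up with the listing above: `s.a` is read from the first 8-byte chunk (`[sp, #0x10]`) and `s.b` from the second (`[sp, #0x18]`).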
Sooo … what happens if we try to pass a big struct?
Big structs
Let’s try:
    typedef struct Large {
        char data[64];
        int len;
    } Large;

    // ...
    Large l2 = func_large(l, 42);

    ; Load the address of input struct 'l' into x0.
    ; Large structs (>16 bytes) are passed by reference: the caller
    ; provides a pointer rather than copying into registers.
    add x0, sp, #0x9c    ; x0 = address of 'l' on the stack
    str x0, [sp, #0x50]  ; spill to a temp slot (-O0 artifact)

    ; Reload the pointer (redundant, but we've disabled optimisations)
    ldr x0, [sp, #0x50]

    ; Load second argument (42) into w1
    mov w1, #0x2a

    ; For large struct returns (>16 bytes), the caller allocates
    ; space on the stack and passes its address in x8.
    sub x8, x29, #0xa0   ; x8 = address of 'l2' (in caller's stack frame)

    ; Call func_large(l, 42)
    bl 0x100000620 <_func_large>

    ; func_large writes the result into the address in x8

There’s a great deal of shuffling things back and forth between registers and stack in the unoptimised assembler of func_large, so I’ve omitted it in the interests of brevity.
The rules for big structs are already clear:
- The caller allocates space for the result and passes its address in x8
- The callee writes the result there directly, then returns normally
- If a struct argument exceeds 16 bytes, we instead pass a pointer to the data on the stack, in the caller’s frame
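These rules can be written out by hand in C. The sketch below is a hand-lowered approximation, not compiler output: the post never shows the body of `func_large`, so the one here (adding `x` to `len`) is purely hypothetical, as are the names `func_large_lowered`, `call_func_large_lowered` and `check_large`. It also assumes, per my reading of AAPCS64, that the pointer in x0 refers to a copy made by the caller, which is what keeps by-value semantics intact.

```c
#include <assert.h>
#include <string.h>

typedef struct Large {
    char data[64];
    int len;
} Large;

/* Hypothetical body: the original func_large is not shown in the post. */
Large func_large(Large l, int x) {
    l.len += x;
    return l;
}

/* Roughly what the callee looks like after lowering:
 *   result   ~ x8 (caller-allocated space for the return value)
 *   arg_copy ~ x0 (pointer to the caller's copy of the argument)
 *   x        ~ w1 */
void func_large_lowered(Large *result, Large *arg_copy, int x) {
    arg_copy->len += x;
    *result = *arg_copy;  /* write the result where the caller asked */
}

/* Roughly what the caller does around the call. */
Large call_func_large_lowered(const Large *l, int x) {
    Large tmp = *l;  /* copy, so the callee cannot modify the original */
    Large l2;        /* space for the result, in the caller's frame */
    func_large_lowered(&l2, &tmp, x);
    return l2;
}

/* Quick self-check; call it from main() to try things out. */
void check_large(void) {
    Large l;
    memset(&l, 0, sizeof l);
    memcpy(l.data, "hello", 6);
    l.len = 5;

    Large direct = func_large(l, 42);
    Large lowered = call_func_large_lowered(&l, 42);

    assert(direct.len == 47 && lowered.len == 47);
    assert(memcmp(direct.data, lowered.data, sizeof direct.data) == 0);
    assert(l.len == 5);  /* the original is untouched */
}
```

Both versions behave the same from the caller's point of view; the lowered form just makes the two hidden pointers explicit.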
Footnotes
1. I find ignoring Windows to be generally a good practice. Windows 11: what even is that?
