In computer science, a calling convention is a standardized method for a program to pass parameters to a function and receive a result value back from it. Calling conventions can differ in:
- where parameters and return values are placed (in registers, on the call stack, or a mix of both)
- the order in which parameters are passed
- name mangling
- how responsibility for setting up and cleaning up a function call is divided between the calling and the called code.
Different programming languages use different calling conventions, and so can different platforms (a platform being a combination of CPU architecture and operating system). Multiple calling conventions can even be used within the same program written in a single programming language; for example, the __stdcall, __cdecl and __fastcall calling conventions on Microsoft Windows. This can cause problems when writing software that combines modules written in multiple languages, or when calling operating system or library APIs from a language other than the one in which they are written. In these cases, special care must be taken to ensure that caller and callee agree on the calling convention.
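As one concrete illustration of this coordination, Python's standard ctypes module makes the caller choose the convention when a library is loaded: ctypes.CDLL calls functions using the C (cdecl) convention, while the Windows-only ctypes.WinDLL uses stdcall. The sketch below assumes a POSIX system on which the C library can be found; the wrapper name c_abs is invented for this example.

```python
import ctypes
import ctypes.util

# Assumption: a POSIX system. If find_library returns None, CDLL(None)
# exposes the symbols already loaded into the process, which on Linux
# includes the C library.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C prototype `int abs(int)` so that ctypes marshals the
# argument and the return value the way the C convention expects.
libc.abs.argtypes = [ctypes.c_int]
libc.abs.restype = ctypes.c_int


def c_abs(value):
    """Call the C library's abs() from Python (hypothetical wrapper)."""
    return libc.abs(value)
```

Loading a stdcall library with CDLL (or the reverse) on 32-bit Windows is the kind of mismatch the paragraph above warns about: both sides would disagree on who cleans up the stack.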
Calling conventions on different platforms
The x86 architecture features many different calling conventions. Due to the small number of architectural registers, the x86 calling conventions mostly pass arguments on the stack, while the return value (or a pointer to it) is passed in a register. Some conventions use registers for the first few parameters, which may improve performance for very short and simple procedures.
Since the PowerPC architecture has a large number of registers, most functions can pass all arguments in registers; any arguments that do not fit are passed on the stack. Space for register-based arguments is always allocated on the stack as a convenience to the called function, in case it needs to free up more registers. A single calling convention is used for all procedural languages.
On the MIPS architecture, the first four arguments to a function are passed in the registers $a0-$a3; subsequent arguments are passed on the stack. The return value (or a pointer to it) is stored in register $v0.
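The register/stack split used with the MIPS registers $a0-$a3 can be sketched as a small model. It is a simplification that assumes every argument is a single word and ignores floating-point and multi-word values; the function name place_arguments is invented for the example.

```python
ARG_REGISTERS = ["$a0", "$a1", "$a2", "$a3"]


def place_arguments(args):
    """Return (registers, stack) for a list of word-sized arguments.

    Assumption: each argument occupies exactly one register or one
    stack slot; real conventions have extra rules for wider types.
    """
    # zip stops at the shorter sequence, so at most four arguments
    # are assigned to registers.
    registers = dict(zip(ARG_REGISTERS, args))
    # Everything after the fourth argument goes on the stack.
    stack = list(args[len(ARG_REGISTERS):])
    return registers, stack
```

For example, place_arguments([10, 20, 30, 40, 50, 60]) assigns 10 through 40 to $a0 through $a3 and leaves 50 and 60 for the stack.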
The SPARC architecture, unlike most RISC architectures, is built on register windows. There are 24 accessible registers in each register window: 8 "in" registers, 8 registers for local variables, and 8 "out" registers. The in registers are used to pass arguments to the function being called; any additional arguments must be pushed onto the stack. However, space is always allocated by the called function to handle a potential register window overflow, local variables, and returning a struct by value. To call a function, the caller places the arguments in its out registers; when the function is called, the caller's out registers become the callee's in registers, and the called function reads its arguments from them. Before returning, the called function places the return value in its first in register, which becomes the caller's first out register once the function has returned.
The System V ABI, which most modern Unix-like systems follow, passes the first six arguments in "in" registers %i0 through %i5, reserving %i6 for the frame pointer and %i7 for the return address.
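The way "out" registers turn into "in" registers across a call can be modelled in a few lines. This is a toy sketch rather than a description of real hardware: it ignores window overflow, the global registers, and the frame pointer and return address slots, and the names RegisterWindows and add_via_windows are invented for illustration.

```python
class RegisterWindows:
    """Toy model: each window has 8 in, 8 local and 8 out registers."""

    def __init__(self):
        # The outermost window; its "in" registers have no caller.
        self.windows = [self._new_window(ins=[0] * 8)]

    @staticmethod
    def _new_window(ins):
        return {"in": ins, "local": [0] * 8, "out": [0] * 8}

    @property
    def current(self):
        return self.windows[-1]

    def call(self):
        # The caller's out registers ARE the callee's in registers:
        # the same list object is shared, nothing is copied.
        self.windows.append(self._new_window(ins=self.current["out"]))

    def ret(self):
        self.windows.pop()


def add_via_windows(a, b):
    cpu = RegisterWindows()
    cpu.current["out"][0] = a      # caller: first out register = a
    cpu.current["out"][1] = b      # caller: second out register = b
    cpu.call()                     # the window slides; outs are now ins
    callee = cpu.current
    callee["in"][0] = callee["in"][0] + callee["in"][1]  # result in first in register
    cpu.ret()                      # the window slides back
    return cpu.current["out"][0]   # caller reads its first out register
```

Sharing one list between the two windows is the design point: no argument or result is moved on call or return, which is what register windows are meant to achieve.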
Threaded code places all the responsibility for setting up and cleaning up a function call on the called code. The calling code does nothing but list the subroutines to be called. This puts all the function setup and cleanup code in one place -- the prolog and epilog of the function -- rather than in the many places that function is called. This makes threaded code the most compact calling convention.
Threaded code passes all arguments on the stack. All return values are returned on the stack. This makes naive implementations slower than calling conventions that keep more values in registers.
However, threaded code implementations that cache several of the top stack values in registers -- in particular, the return address -- are usually faster than subroutine calling conventions that always push and pop the return address to the stack.
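The structure of threaded code can be sketched as follows: the calling side is nothing but a list of subroutines, and every subroutine takes its arguments from, and leaves its results on, a shared stack. This is a structural illustration only (Python's own function calls add overhead that a real threaded-code system would not have), and all names in it are invented for the example.

```python
def run(thread, stack):
    """Execute a thread: a plain list of subroutines, called in order."""
    for word in thread:
        word(stack)  # the caller passes nothing but the shared stack
    return stack


def lit(value):
    """Build a subroutine that pushes a constant."""
    def push(stack):
        stack.append(value)
    return push


def add(stack):
    b = stack.pop()
    a = stack.pop()
    stack.append(a + b)


def dup(stack):
    stack.append(stack[-1])


def mul(stack):
    b = stack.pop()
    a = stack.pop()
    stack.append(a * b)


def define(*thread):
    """A compound subroutine is itself just a list of subroutines."""
    def word(stack):
        run(thread, stack)
    return word


square = define(dup, mul)
program = [lit(3), lit(4), add, square]  # (3 + 4) squared
```

Running run(program, []) leaves the single value 49 on the stack. Note that the program list contains no argument-passing or cleanup code at any call site.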
The standard ARM calling convention allocates the 16 ARM registers as follows:
- r15, as always, is the program counter.
- r14 is the link register. (The BL instruction, used in a subroutine call, stores the return address in this register.)
- r13 is used, by convention, as the stack pointer.
- r12 ... ?
- r4 to r11: used to hold local variables.
- r0 to r3: used to hold argument values passed to a subroutine, and also to hold results returned from a subroutine.
If the returned value is too large to fit in r0 to r3, or its size cannot be determined statically at compile time, then the caller must allocate space for that value at run time and pass a pointer to that space in r0.
Subroutines must preserve the contents of r4 to r11 and the stack pointer. (One way to do so is to save them to the stack in the function prolog, use them as scratch space, and then restore them from the stack in the function epilog.)
In particular, subroutines that call other subroutines *must* save the return address in the link register r14 to the stack before calling those other subroutines.
However, such subroutines do not need to restore that value to r14 -- they merely need to load that value into r15, the program counter, to return.
The ARM stack is full-descending.
This calling convention causes a "typical" ARM subroutine to:
- In the prolog, push r4 to r11 to the stack, and push the return address in r14 to the stack. (This can be done with a single STM instruction.)
- Copy any passed arguments (in r0 to r3) to the local scratch registers (r4 to r11).
- Allocate other local variables to the remaining local scratch registers (r4 to r11).
- Do calculations and call other subroutines as necessary using BL, assuming r0 to r3 and r14 will not be preserved.
- Put the result in r0.
- In the epilog, pull r4 to r11 from the stack, and pull the return address to the program counter r15. (This can be done with a single LDM instruction.)
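The steps above can be traced with a small model of the register file and a full-descending stack. It is only a sketch: memory is a dictionary of 32-bit words, the multi-register push and pop stand in for the single STM and LDM instructions, and the names ArmModel, subroutine_double_sum and demo are invented for the example.

```python
CALLEE_SAVED = list(range(4, 12))  # r4..r11
SP, LR, PC = 13, 14, 15


class ArmModel:
    """Toy register file plus a full-descending stack of 32-bit words."""

    def __init__(self, stack_top=0x1000):
        self.r = [0] * 16
        self.r[SP] = stack_top
        self.memory = {}  # address -> word

    def push(self, regs):
        # Full-descending: decrement first, then store, so r13 always
        # points at the last word pushed. The lowest-numbered register
        # ends up at the lowest address.
        for n in sorted(regs, reverse=True):
            self.r[SP] -= 4
            self.memory[self.r[SP]] = self.r[n]

    def pop(self, regs):
        # Load in ascending register order, moving r13 back up.
        for n in sorted(regs):
            self.r[n] = self.memory.pop(self.r[SP])
            self.r[SP] += 4


def subroutine_double_sum(cpu):
    """Compute 2 * (r0 + r1), following the listed steps."""
    cpu.push(CALLEE_SAVED + [LR])            # prolog: save r4-r11 and r14
    cpu.r[4], cpu.r[5] = cpu.r[0], cpu.r[1]  # copy arguments to locals
    cpu.r[6] = cpu.r[4] + cpu.r[5]           # scratch work clobbers r6
    cpu.r[0] = 2 * cpu.r[6]                  # result goes in r0
    # Epilog: restore r4-r11 and load the saved return address into r15.
    cpu.pop(CALLEE_SAVED + [PC])


def demo():
    cpu = ArmModel()
    cpu.r[0], cpu.r[1] = 3, 4        # arguments
    cpu.r[4], cpu.r[6] = 111, 222    # caller's values that must survive
    cpu.r[LR] = 0x8000               # return address, as BL would set it
    subroutine_double_sum(cpu)
    return cpu
```

After demo() the model holds the result 14 in r0, the caller's r4 and r6 are unchanged, the stack pointer is back at its starting value, and r15 contains the return address that was in r14 on entry.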