For normal, non-recursive function calls, this is usually a micro-optimization that saves little time and space, since the depth of a non-recursive call chain is bounded by the number of distinct functions in the program. When dealing with recursive or mutually recursive functions, however, the stack space and the number of returns saved can grow very large, since a function can call itself, directly or indirectly, an unbounded number of times. In fact, the optimization often reduces stack space requirements asymptotically from linear, or O(n), to constant, or O(1).
If several functions are mutually recursive, meaning that they call one another, and every such call in an execution sequence is a tail call, then tail call optimization yields a properly tail-recursive implementation that does not consume stack space. Proper tail recursion is required by the standard definitions of some programming languages, such as Scheme.
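As a minimal sketch of mutual recursion in which every call is a tail call, consider the following pair of C functions. The names `is_even` and `is_odd` are illustrative and do not come from the text above. Note that the C standard does not require tail call elimination, so whether the stack stays flat depends on the compiler and its optimization settings.

```c
#include <stdbool.h>

/* Forward declaration so the two functions can refer to each other. */
bool is_odd(unsigned long n);

/* The call to is_odd is the last action performed: a tail call. */
bool is_even(unsigned long n)
{
    if (n == 0)
        return true;
    return is_odd(n - 1);
}

/* Likewise, the call back into is_even is in tail position. */
bool is_odd(unsigned long n)
{
    if (n == 0)
        return false;
    return is_even(n - 1);
}
```

Without tail call elimination, the stack depth of `is_even(n)` grows linearly with `n`. With it, each call can reuse the frame of its caller.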
Tail position in Scheme can be illustrated with the following program:
    (define (factorial n)
      (define (fac-times n acc)
        (if (= n 0)
            acc
            (fac-times (- n 1) (* acc n))))
      (if (< n 0)
          (display "Wrong argument!")
          (fac-times n 1)))
The inner procedure fac-times calls itself last in the control flow. This allows an interpreter or compiler to reorganize the execution, which would ordinarily look like this:
    call factorial (3)
     call fac-times (3 1)
      call fac-times (2 3)
       call fac-times (1 6)
        call fac-times (0 6)
        return 6
       return 6
      return 6
     return 6
    return 6
into the more space- (and time-) efficient variant:
    call factorial (3)
     replace arguments with (3 1), jump to "fac-times"
     replace arguments with (2 3), jump to "fac-times"
     replace arguments with (1 6), jump to "fac-times"
     replace arguments with (0 6), jump to "fac-times"
     return 6
This reorganization saves space because no state except for the calling function's address needs to be saved, either on the stack or on the heap. This also means that the programmer need not worry about running out of stack or heap space for extremely deep recursions.
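The "replace arguments, then jump" reorganization can be imitated by hand. The following C sketch mirrors the Scheme program, with the self-call of fac-times written as an explicit parameter update followed by a `goto`. The error value `-1` for negative input is an assumption made for illustration, since the Scheme version prints a message instead.

```c
/* Accumulator-style factorial written the way a compiler might emit
   fac-times after eliminating the tail call: overwrite the parameters,
   then jump back to the top of the function. */
long fac_times(long n, long acc)
{
entry:
    if (n == 0)
        return acc;
    acc = acc * n;   /* compute the new acc first, since it uses the old n */
    n = n - 1;
    goto entry;      /* replaces the recursive call (fac-times (- n 1) (* acc n)) */
}

long factorial(long n)
{
    if (n < 0)
        return -1;   /* assumed error value for a wrong argument */
    return fac_times(n, 1);
}
```

No matter how large `n` is, `fac_times` uses a single stack frame, which matches the shorter trace shown above.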
Some programmers working in functional languages will rewrite recursive code to be tail-recursive so they can take advantage of this feature. This often requires addition of an "accumulator" argument (acc in the above example) to the function. In some cases (such as filtering lists) and in some languages, full tail recursion may require a function that was previously purely functional to be written such that it mutates references stored in other variables.
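The accumulator rewrite can be sketched in C by comparing two ways of summing an array. Both function names are hypothetical. In the first version the addition happens after the recursive call returns, so the call is not in tail position. In the second, the running total travels in the extra `acc` parameter and the recursive call is the last action.

```c
#include <stddef.h>

/* Not tail-recursive: the caller still has to add xs[0] to the result. */
long sum_naive(const int *xs, size_t len)
{
    if (len == 0)
        return 0;
    return xs[0] + sum_naive(xs + 1, len - 1);
}

/* Tail-recursive: the pending addition is folded into the accumulator
   before the call, so nothing remains to be done afterwards. */
long sum_acc(const int *xs, size_t len, long acc)
{
    if (len == 0)
        return acc;
    return sum_acc(xs + 1, len - 1, acc + xs[0]);
}
```

The accumulator version is started with `acc` set to 0, for example `sum_acc(xs, len, 0)`.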
Besides space and execution efficiency, tail recursion optimization is important in the functional programming idiom known as continuation passing style (CPS), which would otherwise quickly run out of stack space.
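To see why CPS depends on tail calls, consider this rough C sketch that computes a*a + b*b in continuation passing style. C has no closures, so a hypothetical `struct sos_env` stands in for the environment a functional language would capture. All names here are illustrative assumptions. Every function ends by calling its continuation and never returns a useful value, so without tail call elimination each step would add a stack frame.

```c
/* A continuation receives the result of a step plus a caller-supplied environment. */
typedef void (*cont_fn)(long result, void *env);

/* CPS primitives: instead of returning, they hand their result to k in a tail call. */
void square_cps(long x, cont_fn k, void *env) { k(x * x, env); }
void add_cps(long a, long b, cont_fn k, void *env) { k(a + b, env); }

/* Assumed environment layout, replacing the closures of a functional language. */
struct sos_env {
    long b;     /* second input, still to be squared */
    long a2;    /* a*a, filled in once it is known */
    long *out;  /* where the final answer is stored */
};

/* Final continuation: store the sum. */
void sos_store(long sum, void *env)
{
    struct sos_env *e = env;
    *e->out = sum;
}

/* Continuation run once b*b is known: add the two squares. */
void sos_after_b2(long b2, void *env)
{
    struct sos_env *e = env;
    add_cps(e->a2, b2, sos_store, env);
}

/* Continuation run once a*a is known: remember it, then square b. */
void sos_after_a2(long a2, void *env)
{
    struct sos_env *e = env;
    e->a2 = a2;
    square_cps(e->b, sos_after_b2, env);
}

long sum_of_squares(long a, long b)
{
    long out = 0;
    struct sos_env e = { b, 0, &out };
    square_cps(a, sos_after_a2, &e);
    return out;
}
```

In this small example the chain is only a few calls deep, but a CPS program of realistic size never returns until the very end, which is why proper tail calls matter for the style.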
For example, consider a function that duplicates a linked list, described here in C:
    list *duplicate(const list *input)
    {
        if (input == NULL) {
            return NULL;
        } else {
            list *head = malloc(sizeof *head);
            head->value = input->value;
            head->next = duplicate(input->next);  /* not a tail call: head is returned afterwards */
            return head;
        }
    }
In this form the function is not tail-recursive, because control returns to the caller after the recursive call to set the value of head->next. But on resumption, the caller merely prepends a value to the result from the callee. So the function is tail-recursive, save for a "cons" action, that is, tail recursive modulo cons. Warren's method gives the following purely tail-recursive implementation:
    void duplicate_prime(const list *input, list **p);  /* defined below */
    list *duplicate(const list *input)
    {
        list *head;
        duplicate_prime(input, &head);
        return head;
    }
    void duplicate_prime(const list *input, list **p)
    {
        if (input == NULL) {
            *p = NULL;
        } else {
            *p = malloc(sizeof **p);
            (*p)->value = input->value;
            duplicate_prime(input->next, &(*p)->next);  /* tail call */
        }
    }
Note how the callee now appends to the end of the list, rather than have the caller prepend to the beginning.
The properly tail-recursive implementation can be converted to iterative form:
    list *duplicate(const list *input)
    {
        list *head;
        list **p = &head;
        while (input != NULL) {
            *p = malloc(sizeof **p);
            (*p)->value = input->value;
            input = input->next;
            p = &(*p)->next;
        }
        *p = NULL;
        return head;
    }
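The listings above leave the `list` type undefined and do not check the result of `malloc`. The following self-contained variant of the iterative form is a sketch under two assumptions: that `list` is a singly linked node holding an `int` value, and that a failed allocation should release the partial copy and return NULL. The names `duplicate_checked` and `list_free` are hypothetical.

```c
#include <stdlib.h>

/* Assumed node layout; the text above does not define it. */
typedef struct list {
    int value;
    struct list *next;
} list;

/* Release every node of a list. */
void list_free(list *node)
{
    while (node != NULL) {
        list *next = node->next;
        free(node);
        node = next;
    }
}

/* Iterative duplicate that appends through a pointer to the last link. */
list *duplicate_checked(const list *input)
{
    list *head = NULL;
    list **p = &head;
    while (input != NULL) {
        list *node = malloc(sizeof *node);
        if (node == NULL) {
            list_free(head);   /* partial copy is always NULL-terminated */
            return NULL;
        }
        node->value = input->value;
        node->next = NULL;
        *p = node;             /* link the new node at the end */
        p = &node->next;       /* the next append goes here */
        input = input->next;
    }
    return head;
}
```

Terminating the list after every append, rather than once at the end, keeps the partial copy well formed so that it can be freed safely on failure.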
Since many Scheme compilers use C as an intermediate target code, the problem comes down to coding tail recursion in C without growing the stack. Many implementations achieve this by using a device known as a trampoline, a piece of code that repeatedly calls functions. All functions are entered via the trampoline. When a function has to call another, instead of calling it directly, it returns the address of the function to be called, the arguments to be used, and so on, to the trampoline. This ensures that the C stack does not grow and iteration can continue indefinitely.
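A trampoline of this kind can be sketched in a few lines of C. The `struct bounce` layout and all names below are assumptions chosen for illustration and are not taken from any particular Scheme compiler. Each step function returns a description of the next call instead of making it, and the `trampoline` loop performs the calls one at a time.

```c
#include <stdbool.h>
#include <stddef.h>

struct bounce;
typedef struct bounce (*step_fn)(unsigned long n);

/* What a step hands back to the trampoline: either the next function
   and its argument, or (when next is NULL) the final result. */
struct bounce {
    step_fn next;
    unsigned long arg;
    bool result;
};

struct bounce even_step(unsigned long n);
struct bounce odd_step(unsigned long n);

/* Instead of calling odd_step directly, ask the trampoline to do it. */
struct bounce even_step(unsigned long n)
{
    if (n == 0)
        return (struct bounce){ NULL, 0, true };
    return (struct bounce){ odd_step, n - 1, false };
}

struct bounce odd_step(unsigned long n)
{
    if (n == 0)
        return (struct bounce){ NULL, 0, false };
    return (struct bounce){ even_step, n - 1, false };
}

/* The trampoline: repeatedly call whatever function was returned.
   Each step returns before the next begins, so the C stack stays flat. */
bool trampoline(step_fn start, unsigned long arg)
{
    struct bounce b = { start, arg, false };
    while (b.next != NULL)
        b = b.next(b.arg);
    return b.result;
}
```

A call such as `trampoline(even_step, 1000000)` runs a million mutually recursive steps with constant stack depth, at the cost of one indirect call and one struct copy per step.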
As an article by Samuel Jack suggests, it is possible to implement trampolining using higher-order functions in languages that support them, such as C#.
Using a trampoline for all function calls is rather more expensive than the normal C function call, so at least one Scheme compiler, Chicken, uses a technique first described by Henry Baker from an unpublished suggestion by Andrew Appel, in which normal C calls are used but the stack size is checked before every call. When the stack reaches its maximum permitted size, objects on the stack are garbage-collected using the Cheney algorithm by moving all live data into a separate heap. Following this, the stack is unwound ("popped") and the program resumes from the state saved just before the garbage collection. Baker says "Appel's method avoids making a large number of small trampoline bounces by occasionally jumping off the Empire State Building." The garbage collection ensures that mutual tail recursion can continue indefinitely.