feat: optimize frame layout for tail-call-only functions #11608
Reduce frame size from 16 to 8 bytes for functions that only make tail calls (FunctionCalls::TailOnly). This optimization:
- Uses single register operations (str/ldr fp) instead of pair operations (stp/ldp fp,lr)
- Applies when no other frame requirements exist (no frame pointers, stack args, etc.)
- Is instruction-based: functions containing only return_call instructions get optimized
- Maintains ABI compatibility and includes comprehensive test coverage
@cfallin What do you think of something like this? I have only looked into aarch64 for the moment, since other ISAs such as x64 and s390x look quite different and are more complex to implement. |
Unfortunately I don't think this is going to work: the stack pointer has to be 16-aligned, and aarch64 will actually trap if memory accesses occur with a misaligned SP. Furthermore, the savings I would expect are not "only push FP, not LR", but "don't push anything at all if the frame is zero-size". This should be the case for tail-calling functions with no stack storage (spillslots, stackslots or clobbers) and no outgoing argument space. |
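To make the alignment point concrete, here is a minimal sketch of the arithmetic, assuming the 16-byte SP alignment stated in the comment above. The names `SP_ALIGN`, `align_frame_size`, and `is_valid_sp_adjustment` are hypothetical helpers for illustration, not Cranelift APIs.

```rust
/// Required stack pointer alignment on aarch64, in bytes (per the comment above).
pub const SP_ALIGN: u32 = 16;

/// Round a frame size up to the next multiple of `SP_ALIGN`.
/// Hypothetical helper, not part of Cranelift.
pub fn align_frame_size(bytes: u32) -> u32 {
    (bytes + (SP_ALIGN - 1)) & !(SP_ALIGN - 1)
}

/// Whether adjusting SP by `bytes` keeps it aligned to `SP_ALIGN`.
/// Hypothetical helper, not part of Cranelift.
pub fn is_valid_sp_adjustment(bytes: u32) -> bool {
    bytes % SP_ALIGN == 0
}
```

Under this assumption, an 8-byte setup area is not a valid SP adjustment and would round back up to 16 bytes, so the only size that actually saves anything is 0.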
Don't debuggers rely on frame pointers for stack traces? Could setting the frame size to 0 hurt debugging/unwinding? |
Debuggers and profilers should already handle missing stack frames for leaf functions. Besides, debuggers generally use .eh_frame for stack unwinding, only falling back to frame pointers when .eh_frame is not available. |
Right -- we already omit frame pointers for functions that are truly leaf functions (no calls at all, with no frame storage); this is a common optimization. In Wasmtime, where we use our own stack-walking logic and unwinder and want simplicity/robustness, we configure Cranelift never to omit frame pointers; so this optimization largely applies to other uses of Cranelift, like bjorn3's |
Then could it be safe to have something like this?
// Compute linkage frame size.
let setup_area_size = if flags.preserve_frame_pointers()
// The function arguments that are passed on the stack are addressed
// relative to the Frame Pointer.
|| flags.unwind_info()
|| incoming_args_size > 0
|| clobber_size > 0
|| fixed_frame_storage_size > 0
{
16 // FP, LR
} else {
match function_calls {
FunctionCalls::Regular => 16,
FunctionCalls::None => 0,
- FunctionCalls::TailOnly => 8,
+ FunctionCalls::TailOnly => 0,
}
}; |
I think you'll want to check the tail args and outgoing args size as well (the other parameters to |
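A self-contained sketch of what the combined check might look like, folding in the suggestion above. This is an illustration under assumptions, not the actual Cranelift code: `FrameInputs` is a hypothetical struct standing in for the flags and size parameters, the field names `tail_args_size` and `outgoing_args_size` are guesses at the parameters the comment refers to, and `FunctionCalls` is redeclared locally so the block compiles on its own.

```rust
/// Local stand-in for the call classification used in the snippet above.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum FunctionCalls {
    None,
    TailOnly,
    Regular,
}

/// Hypothetical bundle of the inputs that decide whether FP/LR must be saved.
#[derive(Clone, Copy, Debug, Default)]
pub struct FrameInputs {
    pub preserve_frame_pointers: bool,
    pub unwind_info: bool,
    pub incoming_args_size: u32,
    pub tail_args_size: u32,     // assumed name
    pub outgoing_args_size: u32, // assumed name
    pub clobber_size: u32,
    pub fixed_frame_storage_size: u32,
}

/// Size in bytes of the FP/LR setup area: either 16 (both saved) or 0 (none).
/// It is never 8, since that would leave SP misaligned.
pub fn setup_area_size(inputs: &FrameInputs, calls: FunctionCalls) -> u32 {
    let needs_frame = inputs.preserve_frame_pointers
        || inputs.unwind_info
        || inputs.incoming_args_size > 0
        || inputs.tail_args_size > 0
        || inputs.outgoing_args_size > 0
        || inputs.clobber_size > 0
        || inputs.fixed_frame_storage_size > 0;
    if needs_frame {
        return 16; // FP, LR
    }
    match calls {
        FunctionCalls::Regular => 16,
        // No frame storage and no regular calls: nothing needs to be pushed.
        FunctionCalls::None | FunctionCalls::TailOnly => 0,
    }
}
```

With `FrameInputs::default()` (nothing requiring a frame), a `TailOnly` function would get a zero-size setup area, while any non-zero outgoing or tail argument space would force the full 16 bytes.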