pnodet
Contributor

@pnodet pnodet commented Sep 4, 2025

Reduce frame size from 16 to 8 bytes for functions that only make tail calls (FunctionCalls::TailOnly). This optimization:

- Uses single-register operations (str/ldr fp) instead of pair operations (stp/ldp fp,lr)
- Applies when no other frame requirements exist (no frame pointers, stack args, etc.)
- Is instruction-based: functions containing only return_call instructions get optimized
- Maintains ABI compatibility and includes comprehensive test coverage
@pnodet
Contributor Author

pnodet commented Sep 4, 2025

@cfallin What do you think of something like this? I have only looked at aarch64 for now, since other ISAs such as x64 and s390x look quite different and more complex to implement.

@cfallin
Member

cfallin commented Sep 4, 2025

Unfortunately I don't think this is going to work: the stack pointer has to be 16-aligned, and aarch64 will actually trap if memory accesses occur with a misaligned SP.
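To make the alignment constraint concrete, here is a minimal sketch (not Cranelift code; `SP_ALIGNMENT`, `keeps_sp_aligned`, and `align_frame_size` are hypothetical names chosen for illustration). It assumes SP is 16-aligned on entry and only checks whether a given frame size preserves that:

```rust
/// Required stack pointer alignment on AArch64, in bytes.
/// (Hypothetical constant name, used only in this sketch.)
const SP_ALIGNMENT: u32 = 16;

/// Returns true if subtracting `frame_size` from a 16-aligned SP
/// leaves SP 16-aligned.
fn keeps_sp_aligned(frame_size: u32) -> bool {
    frame_size % SP_ALIGNMENT == 0
}

/// Rounds `frame_size` up to the next multiple of the SP alignment.
fn align_frame_size(frame_size: u32) -> u32 {
    (frame_size + SP_ALIGNMENT - 1) & !(SP_ALIGNMENT - 1)
}
```

Under that assumption, the proposed 8-byte frame fails the check, while frames of 0 or 16 bytes pass, which is why the realistic choices are "no frame" or "full FP/LR pair".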

Furthermore, the savings I would expect are not "only push FP, not LR" but "don't push anything at all if the frame is zero-size". This should be the case for tail-calling functions with no stack storage (spillslots, stackslots, or clobbers) and no outgoing argument space.

@pnodet
Contributor Author

pnodet commented Sep 4, 2025

Don't debuggers rely on frame pointers for stack traces? Could setting the frame size to 0 hurt debugging/unwinding?

@bjorn3
Contributor

bjorn3 commented Sep 4, 2025

Debuggers and profilers should already handle missing stack frames for leaf functions. Besides, debuggers generally use .eh_frame for stack unwinding and only fall back to frame pointers when .eh_frame is not available.

@cfallin
Member

cfallin commented Sep 4, 2025

Right -- we already omit frame pointers for functions that are truly leaf functions (no calls at all, with no frame storage); this is a common optimization.

In Wasmtime, where we use our own stack-walking logic and unwinder and want simplicity/robustness, we configure Cranelift never to omit frame pointers; so this optimization largely applies to other uses of Cranelift, like bjorn3's cg_clif.

@pnodet
Contributor Author

pnodet commented Sep 4, 2025

Then could it be safe to have something like this?

        // Compute linkage frame size.
        let setup_area_size = if flags.preserve_frame_pointers()
            // The function arguments that are passed on the stack are addressed
            // relative to the Frame Pointer.
            || flags.unwind_info()
            || incoming_args_size > 0
            || clobber_size > 0
            || fixed_frame_storage_size > 0
        {
            16 // FP, LR
        } else {
            match function_calls {
                FunctionCalls::Regular => 16,
                FunctionCalls::None => 0,
-               FunctionCalls::TailOnly => 8,
+               FunctionCalls::TailOnly => 0,
            }
        };

@cfallin
Member

cfallin commented Sep 4, 2025

I think you'll want to check the tail args and outgoing args size as well (the other parameters to compute_frame_layout) -- basically, if any part of the frame needs to exist, then we need to do the FP setup even if we only have tail calls.
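As a rough, standalone sketch of that suggestion (not the actual Cranelift implementation): the `FrameInputs` struct, its `bare` constructor, and the free function `setup_area_size` are hypothetical, and the field names `tail_args_size` and `outgoing_args_size` are assumed from the parameters mentioned above. The idea is that any non-empty part of the frame forces the 16-byte FP/LR setup area, and only a completely empty frame with no regular calls drops it:

```rust
/// Kinds of calls a function makes (mirrors the enum used in the snippet above).
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum FunctionCalls {
    None,
    TailOnly,
    Regular,
}

/// Hypothetical bundle of the inputs relevant to the setup-area decision.
struct FrameInputs {
    preserve_frame_pointers: bool,
    unwind_info: bool,
    function_calls: FunctionCalls,
    incoming_args_size: u32,
    tail_args_size: u32,
    clobber_size: u32,
    fixed_frame_storage_size: u32,
    outgoing_args_size: u32,
}

impl FrameInputs {
    /// A function with no frame requirements other than its call kind.
    fn bare(function_calls: FunctionCalls) -> Self {
        FrameInputs {
            preserve_frame_pointers: false,
            unwind_info: false,
            function_calls,
            incoming_args_size: 0,
            tail_args_size: 0,
            clobber_size: 0,
            fixed_frame_storage_size: 0,
            outgoing_args_size: 0,
        }
    }
}

/// Size in bytes of the FP/LR setup area: either 16 (both saved) or 0.
fn setup_area_size(f: &FrameInputs) -> u32 {
    // If any part of the frame exists, FP setup is needed regardless of call kind.
    let frame_needed = f.preserve_frame_pointers
        || f.unwind_info
        || f.incoming_args_size > 0
        || f.tail_args_size > 0
        || f.clobber_size > 0
        || f.fixed_frame_storage_size > 0
        || f.outgoing_args_size > 0;
    if frame_needed {
        return 16; // FP, LR
    }
    match f.function_calls {
        // A regular call overwrites LR, so it must be saved.
        FunctionCalls::Regular => 16,
        // No calls, or only tail calls: nothing to save.
        FunctionCalls::None | FunctionCalls::TailOnly => 0,
    }
}
```

With these assumptions, a tail-only function with an empty frame gets a setup area of 0, but the same function with, say, 16 bytes of outgoing argument space gets the full 16 bytes. The result is always 0 or 16, so SP stays 16-aligned either way.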

@github-actions github-actions bot added the cranelift (Issues related to the Cranelift code generator) and cranelift:area:aarch64 (Issues related to AArch64 backend) labels Sep 4, 2025