@fitzgen fitzgen commented Sep 5, 2025


This treats compiled functions homogeneously, removing the need to add new
metadata tables to places like `CompiledModuleInfo` whenever we add a new kind
of function, and simplifying the process of constructing the metadata for a
final, linked compilation artifact. This also paves the way to doing gc-sections
in our linking (getting smaller code sizes by removing functions that have been
inlined into every caller, etc.) as we no longer assume that certain kinds of
function index spaces are dense.
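
To make the density point concrete, here is a hypothetical sketch (not Wasmtime's actual data structures): with per-kind dense tables, dropping a function shifts every later index, but with one homogeneous keyed table, entries can simply be removed, which is what makes gc-sections-style pruning possible.

```rust
use std::collections::BTreeMap;

/// Hypothetical unified function table keyed by an opaque id (a stand-in for
/// something like `FuncKey`). Because lookups go through the key rather than
/// a dense position, entries for functions that have been inlined into every
/// caller can be dropped without invalidating the remaining keys.
pub fn prune_unreferenced(
    table: &mut BTreeMap<u64, u32>,
    is_referenced: impl Fn(u64) -> bool,
) {
    table.retain(|&key, _| is_referenced(key));
}
```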

This does, however, replace a couple of operations that were previously O(1) table
lookups with O(log n) binary searches. And, notably, some of these are on the
`VMFuncRef`-creation path, and therefore on the
force-initialization-of-a-lazy-funcref-table-slot path, when we look up a Wasm
function and its trampolines. Our call-indirect micro-benchmarks show that
indirectly calling every funcref once in a table of 64Ki slots went from taking
~2.6ms to ~3.8ms (a +46% slowdown). Note that this edge case is both synthetic
and the worst-case scenario for this commit's change: we are measuring, as much
as we can, *only* the force-initialization-of-a-lazy-funcref-table-slot
path. All other call-indirect benchmarks are within the noise, which is what we
would expect.
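
The O(1)-versus-O(log n) difference can be sketched roughly as follows (the key layout here is a made-up simplification for illustration, not the PR's actual `FuncKey` type): the old scheme indexes a dense per-kind table directly, while the new scheme binary-searches one sorted array shared by all function kinds.

```rust
/// A made-up, simplified function key; the real `FuncKey` has more structure.
#[derive(Copy, Clone, PartialEq, Eq, PartialOrd, Ord, Debug)]
pub struct Key {
    pub namespace: u32,
    pub index: u32,
}

/// Old scheme (sketch): one dense table per function kind, O(1) lookup.
pub fn lookup_dense(table: &[u32], index: u32) -> Option<u32> {
    table.get(index as usize).copied()
}

/// New scheme (sketch): a single sorted array of (key, code offset) pairs,
/// O(log n) lookup. Keys may be sparse, which is what permits dropping
/// entries (e.g. for gc-sections) without renumbering everything else.
pub fn lookup_sorted(entries: &[(Key, u32)], key: Key) -> Option<u32> {
    entries
        .binary_search_by_key(&key, |&(k, _)| k)
        .ok()
        .map(|i| entries[i].1)
}
```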

Also, the size of `.cwasm`s is slightly larger: `spidermonkey.wasm`'s `.cwasm`
size went from `19_750_632` bytes to `19_785_872` bytes, which is a 0.18%
increase.

Ultimately, I believe that the simplification, and the possibility of doing
gc-sections in the future, is worth the downsides. That said, if others feel
differently, there are some things we could try to improve the situation,
although most ideas that come to mind (e.g. LEB128s and delta encoding, or
making certain `FuncKey` kinds' index spaces dense) will improve either code
size or lookup time while pessimizing the other. I'm sure we could come up
with something given enough effort, though.
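
For concreteness, the LEB128-plus-delta-encoding idea could look roughly like this sketch (nothing in this PR implements it): storing sorted offsets as variable-width deltas shrinks the table, but decoding becomes a linear scan, so the random access that binary search relies on is lost, hence the trade-off between code size and lookup time.

```rust
/// Encode a sorted slice of u32 offsets as unsigned-LEB128 deltas (sketch).
pub fn encode_deltas(sorted: &[u32]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut prev = 0u32;
    for &v in sorted {
        let mut delta = v - prev;
        prev = v;
        // Unsigned LEB128: 7 payload bits per byte, high bit = continuation.
        loop {
            let byte = (delta & 0x7f) as u8;
            delta >>= 7;
            if delta == 0 {
                out.push(byte);
                break;
            }
            out.push(byte | 0x80);
        }
    }
    out
}

/// Decode all values back. Note that this is O(n): delta coding gives up the
/// random access that binary search needs, which is exactly the size-versus-
/// lookup-time trade-off described above.
pub fn decode_deltas(bytes: &[u8]) -> Vec<u32> {
    let mut out = Vec::new();
    let mut acc = 0u32;
    let mut delta = 0u32;
    let mut shift = 0;
    for &b in bytes {
        delta |= ((b & 0x7f) as u32) << shift;
        if b & 0x80 == 0 {
            acc += delta;
            out.push(acc);
            delta = 0;
            shift = 0;
        } else {
            shift += 7;
        }
    }
    out
}
```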

<details>

<summary>call-indirect micro-benchmarks results</summary>

```
call-indirect/same-callee/table-init-lazy/65536-calls
                        time:   [144.14 µs 145.26 µs 146.56 µs]
                        thrpt:  [447.15 Melem/s 451.15 Melem/s 454.68 Melem/s]
                 change:
                        time:   [−5.5066% −3.6611% −1.9130%] (p = 0.00 < 0.05)
                        thrpt:  [+1.9503% +3.8002% +5.8275%]
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  7 (7.00%) high mild
  2 (2.00%) high severe
call-indirect/different-callees/table-init-lazy/65536-calls
                        time:   [3.8128 ms 3.8433 ms 3.8763 ms]
                        thrpt:  [16.907 Melem/s 17.052 Melem/s 17.188 Melem/s]
                 change:
                        time:   [+43.064% +46.066% +49.080%] (p = 0.00 < 0.05)
                        thrpt:  [−32.922% −31.538% −30.101%]
                        Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
  7 (7.00%) high mild
call-indirect/same-callee/table-init-strict/65536-calls
                        time:   [130.27 µs 131.66 µs 133.40 µs]
                        thrpt:  [491.26 Melem/s 497.75 Melem/s 503.09 Melem/s]
                 change:
                        time:   [−6.4798% −4.1871% −1.8965%] (p = 0.00 < 0.05)
                        thrpt:  [+1.9332% +4.3701% +6.9288%]
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  4 (4.00%) high mild
  8 (8.00%) high severe
call-indirect/different-callees/table-init-strict/65536-calls
                        time:   [176.22 µs 178.49 µs 180.99 µs]
                        thrpt:  [362.10 Melem/s 367.18 Melem/s 371.90 Melem/s]
                 change:
                        time:   [−18.431% −15.397% −12.330%] (p = 0.00 < 0.05)
                        thrpt:  [+14.064% +18.200% +22.595%]
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  4 (4.00%) high mild
  3 (3.00%) high severe
```

</details>
@fitzgen fitzgen requested review from a team as code owners September 5, 2025 20:50
@fitzgen fitzgen requested review from cfallin and alexcrichton and removed request for a team September 5, 2025 20:50
It is required at runtime, not just at compile time, now.
@github-actions github-actions bot added wasmtime:api Related to the API of the `wasmtime` crate itself winch Winch issues or pull requests labels Sep 5, 2025

@alexcrichton alexcrichton left a comment


Really nice how this turned out, thanks for pushing on this!

One more benchmark to run before merging, I now realize: component instantiation. IIRC we confirmed that core wasm instantiation was largely unaffected by this, but reading it over I'm remembering that component instantiation, when it hits the various initializers for trampolines/builtins/etc., will now do the `FuncKey` lookup. Can you make a simple-ish spidermonkey.wasm component and compare before/after instantiation numbers?

Comment on lines +294 to +295
/// This module's index.
pub module_index: Option<StaticModuleIndex>,


Could this expand on what a `None` case means?

(also, if it's purely to be able to use `Module::default`, I think it'd be OK to remove that construction method)

Member Author


Addressed in #11694

fitzgen added a commit to fitzgen/wasmtime that referenced this pull request Sep 12, 2025
And add `FuncKeyKind` and `FuncKeyNamespace` types.

Split out from bytecodealliance#11630
@fitzgen fitzgen mentioned this pull request Sep 12, 2025