@fitzgen fitzgen commented Sep 5, 2025


This treats compiled functions homogeneously, removing the need to add new
metadata tables to places like `CompiledModuleInfo` whenever we add a new kind
of function, and simplifying the process of constructing the metadata for a
final, linked compilation artifact. This also paves the way to doing gc-sections
in our linking (getting smaller code sizes by removing functions that have been
inlined into every caller, etc.) as we no longer assume that certain kinds of
function index spaces are dense.
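
To make the density point concrete, here is a hypothetical sketch (not Wasmtime's actual data structures): with per-kind dense tables, dropping a function shifts every later index, but with one homogeneous keyed table, entries can simply be removed, which is what makes gc-sections-style pruning possible.

```rust
use std::collections::BTreeMap;

/// Hypothetical unified function table keyed by an opaque id (a stand-in for
/// something like `FuncKey`). Because lookups go through the key rather than
/// a dense position, entries for functions that have been inlined into every
/// caller can be dropped without invalidating the remaining keys.
pub fn prune_unreferenced(
    table: &mut BTreeMap<u64, u32>,
    is_referenced: impl Fn(u64) -> bool,
) {
    table.retain(|&key, _| is_referenced(key));
}
```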

This does, however, replace a couple of operations that were previously O(1) table
lookups with O(log n) binary searches. And, notably, some of these are on the
`VMFuncRef`-creation path, and therefore on the
force-initialization-of-a-lazy-funcref-table-slot path, when we look up a Wasm
function and its trampolines. Our call-indirect micro-benchmarks show that
indirectly calling every funcref once in a table of 64Ki slots went from taking
~2.6ms to ~3.8ms (a +46% slowdown). Note that this edge case is both synthetic
and the worst-case scenario for this commit's change: we are measuring, as much
as we can, *only* the force-initialization-of-a-lazy-funcref-table-slot
path. All other call-indirect benchmarks are within the noise, which is what we
would expect.
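
The O(1)-versus-O(log n) difference can be sketched roughly as follows (the key layout here is a made-up simplification for illustration, not the PR's actual `FuncKey` type): the old scheme indexes a dense per-kind table directly, while the new scheme binary-searches one sorted array shared by all function kinds.

```rust
/// A made-up, simplified function key; the real `FuncKey` has more structure.
#[derive(Copy, Clone, PartialEq, Eq, PartialOrd, Ord, Debug)]
pub struct Key {
    pub namespace: u32,
    pub index: u32,
}

/// Old scheme (sketch): one dense table per function kind, O(1) lookup.
pub fn lookup_dense(table: &[u32], index: u32) -> Option<u32> {
    table.get(index as usize).copied()
}

/// New scheme (sketch): a single sorted array of (key, code offset) pairs,
/// O(log n) lookup. Keys may be sparse, which is what permits dropping
/// entries (e.g. for gc-sections) without renumbering everything else.
pub fn lookup_sorted(entries: &[(Key, u32)], key: Key) -> Option<u32> {
    entries
        .binary_search_by_key(&key, |&(k, _)| k)
        .ok()
        .map(|i| entries[i].1)
}
```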

Also, the size of `.cwasm`s is slightly larger: `spidermonkey.wasm`'s `.cwasm`
size went from `19_750_632` bytes to `19_785_872` bytes, which is a 0.18%
increase.

Ultimately, I believe that the simplification, and the possibility of doing
gc-sections in the future, is worth the downsides. That said, if others feel
differently, there are some things we could try to improve the situation,
although most ideas that come to mind (e.g. LEB128s and delta encoding, or
making certain `FuncKey` kinds' index spaces dense) will improve either code
size or lookup time while pessimizing the other. I'm sure we could come up
with something given enough effort, though.
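
For concreteness, the LEB128-plus-delta-encoding idea could look roughly like this sketch (nothing in this PR implements it): storing sorted offsets as variable-width deltas shrinks the table, but decoding becomes a linear scan, so the random access that binary search relies on is lost, hence the trade-off between code size and lookup time.

```rust
/// Encode a sorted slice of u32 offsets as unsigned-LEB128 deltas (sketch).
pub fn encode_deltas(sorted: &[u32]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut prev = 0u32;
    for &v in sorted {
        let mut delta = v - prev;
        prev = v;
        // Unsigned LEB128: 7 payload bits per byte, high bit = continuation.
        loop {
            let byte = (delta & 0x7f) as u8;
            delta >>= 7;
            if delta == 0 {
                out.push(byte);
                break;
            }
            out.push(byte | 0x80);
        }
    }
    out
}

/// Decode all values back. Note that this is O(n): delta coding gives up the
/// random access that binary search needs, which is exactly the size-versus-
/// lookup-time trade-off described above.
pub fn decode_deltas(bytes: &[u8]) -> Vec<u32> {
    let mut out = Vec::new();
    let mut acc = 0u32;
    let mut delta = 0u32;
    let mut shift = 0;
    for &b in bytes {
        delta |= ((b & 0x7f) as u32) << shift;
        if b & 0x80 == 0 {
            acc += delta;
            out.push(acc);
            delta = 0;
            shift = 0;
        } else {
            shift += 7;
        }
    }
    out
}
```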

<details>

<summary>call-indirect micro-benchmarks results</summary>

```
call-indirect/same-callee/table-init-lazy/65536-calls
                        time:   [144.14 µs 145.26 µs 146.56 µs]
                        thrpt:  [447.15 Melem/s 451.15 Melem/s 454.68 Melem/s]
                 change:
                        time:   [−5.5066% −3.6611% −1.9130%] (p = 0.00 < 0.05)
                        thrpt:  [+1.9503% +3.8002% +5.8275%]
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  7 (7.00%) high mild
  2 (2.00%) high severe
call-indirect/different-callees/table-init-lazy/65536-calls
                        time:   [3.8128 ms 3.8433 ms 3.8763 ms]
                        thrpt:  [16.907 Melem/s 17.052 Melem/s 17.188 Melem/s]
                 change:
                        time:   [+43.064% +46.066% +49.080%] (p = 0.00 < 0.05)
                        thrpt:  [−32.922% −31.538% −30.101%]
                        Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
  7 (7.00%) high mild
call-indirect/same-callee/table-init-strict/65536-calls
                        time:   [130.27 µs 131.66 µs 133.40 µs]
                        thrpt:  [491.26 Melem/s 497.75 Melem/s 503.09 Melem/s]
                 change:
                        time:   [−6.4798% −4.1871% −1.8965%] (p = 0.00 < 0.05)
                        thrpt:  [+1.9332% +4.3701% +6.9288%]
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  4 (4.00%) high mild
  8 (8.00%) high severe
call-indirect/different-callees/table-init-strict/65536-calls
                        time:   [176.22 µs 178.49 µs 180.99 µs]
                        thrpt:  [362.10 Melem/s 367.18 Melem/s 371.90 Melem/s]
                 change:
                        time:   [−18.431% −15.397% −12.330%] (p = 0.00 < 0.05)
                        thrpt:  [+14.064% +18.200% +22.595%]
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  4 (4.00%) high mild
  3 (3.00%) high severe
```

</details>
@fitzgen fitzgen requested review from a team as code owners September 5, 2025 20:50
@fitzgen fitzgen requested review from cfallin and alexcrichton and removed request for a team September 5, 2025 20:50
It is required at runtime, not just at compile time, now.
@github-actions github-actions bot added wasmtime:api Related to the API of the `wasmtime` crate itself winch Winch issues or pull requests labels Sep 5, 2025

@alexcrichton alexcrichton left a comment


Really nice how this turned out, thanks for pushing on this!

One more benchmark to run before merging, I now realize: component instantiation. IIRC we confirmed that core wasm instantiation was largely unaffected by this, but reading it over I'm remembering that component instantiation, when it hits the various initializers for trampolines/builtins/etc., will now do the `FuncKey` lookup. Can you make a simple-ish spidermonkey.wasm component and compare before/after instantiation numbers?

Comment on lines +294 to +295
/// This module's index.
pub module_index: Option<StaticModuleIndex>,


Could this expand on what a `None` case means?

(also, if it's purely to be able to use `Module::default`, I think it'd be OK to remove that construction method)

Member Author


Addressed in #11694

fitzgen added a commit to fitzgen/wasmtime that referenced this pull request Sep 12, 2025
And add `FuncKeyKind` and `FuncKeyNamespace` types.

Split out from bytecodealliance#11630
@fitzgen fitzgen mentioned this pull request Sep 12, 2025