Conversation

@cfallin (Member) commented Jul 26, 2024

In the course of the 124..127 upgrade, some IC bodies became invalid because of altered signatures of CacheIR opcodes. Nominally this should be fine in a system that never tries to execute such an invalid blob of bytecode. However, when we pre-weval all bodies in a weval build, we run into invalid memory reads in the weval phase just as we would if we tried to execute the IC at runtime in the IC interpreter: fundamentally, the corpus cannot have invalid bytecode in it. This PR thus removes the entire corpus and regenerates it:

  • Remove js/src/ics/IC-*
  • Build an engine with --enable-aot-ics --enable-aot-ics-force --enable-aot-ics-enforce
  • Run jit-tests (./mach jit-test) and jstests (./mach jstests) with AOT_ICS_KEEP_GOING=1
• Use js/src/ics/remove-duplicates.py to remove duplicates among all the IC-* files in the gecko-dev root (jit-tests) and js/src/tests (jstests); a sketch of such a pass appears below
  • Put all of these files into js/src/ics

Note for rebasing in the future: this should be squashed into the main AOT ICs commit.
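
The remove-duplicates.py script itself isn't shown in this PR; as a rough illustration, a dedup pass over `IC-*` files might look like the following minimal Python sketch (hypothetical behavior; the real script may differ):

```python
#!/usr/bin/env python3
# Hypothetical sketch of a dedup pass in the spirit of
# js/src/ics/remove-duplicates.py (the real script is not shown here).
# Assumes each IC-* file is a standalone blob whose content hash
# fully identifies it.
import hashlib
import sys
from pathlib import Path

def dedup(roots):
    seen = {}  # content hash -> first path that had this content
    for root in roots:
        for path in sorted(Path(root).glob("IC-*")):
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            if digest in seen:
                print(f"duplicate of {seen[digest]}: {path}")
                path.unlink()  # drop the duplicate copy
            else:
                seen[digest] = path

if __name__ == "__main__":
    # e.g. the gecko-dev root (jit-tests) and js/src/tests (jstests)
    dedup(sys.argv[1:] or ["."])
```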

@cfallin cfallin requested a review from JakeChampion July 26, 2024 23:08
@JakeChampion (Collaborator) commented:

Will we need to remove and rebuild the corpus each time we update sm?

@JakeChampion (Collaborator) left a review comment:

I wonder if we can name the IC files based on the hash of their contents, to avoid large diffs and the need for a deduplication Python script.
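
A minimal sketch of that idea, assuming IC bodies are self-contained blobs (this is not the engine's current naming scheme): derive the filename from a content hash, so identical bodies land on the same name and duplicates never accumulate in the first place.

```python
# Sketch of content-addressed IC file naming (hypothetical; not the
# current scheme): identical bodies map to the same filename, so
# re-dumping an already-known IC is an idempotent overwrite and no
# separate dedup pass is needed.
import hashlib
from pathlib import Path

def write_ic(out_dir: Path, body: bytes) -> Path:
    digest = hashlib.sha256(body).hexdigest()[:16]
    path = out_dir / f"IC-{digest}"
    path.write_bytes(body)  # same content -> same path
    return path
```

Diffs would then shrink to genuinely added or removed bodies.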

@cfallin force-pushed the cfallin/regenerate-aot-ics branch from c1cb9d6 to 4cd0641 on July 26, 2024 23:18
@cfallin (Member, Author) commented Jul 26, 2024

> Will we need to remove and rebuild the corpus each time we update sm?

Up until the above realization about invalid bytecode, my answer was "I don't think so"; now I'm having some Thoughts about the upgrade process. Unfortunately, it's inherent to the checked-in string-of-macros format that it changes whenever opcode arguments change; the only real way to guarantee up-to-dateness would be some sort of typechecker/validator run over every body in the corpus. This is the second manifestation of the add-or-remove-args-to-CacheIR-ops danger I saw earlier in the rebase (the first being that the interpreter itself gets out of sync in its bytecode parsing). So the answer is: I need to come up with more type safety here; the hopeful answer is "eventually, no", but for now we need to be pretty careful.
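
For concreteness, such a validator could be a small script that checks each body's per-opcode argument counts against a signature table generated from the engine's CacheIR op definitions. A hypothetical sketch, assuming bodies are lines of `CACHEOP(Name, args...)` macro invocations (the real checked-in format may differ):

```python
# Hypothetical corpus validator: flag IC bodies whose opcode argument
# counts no longer match the current CacheIR signatures. ARITY would
# be generated from the engine's op definitions; the table below is a
# placeholder, as is the assumed macro syntax.
import re
from pathlib import Path

ARITY = {"GuardToInt32": 1, "LoadFixedSlotResult": 2}  # placeholder
OP_RE = re.compile(r"^\s*CACHEOP\((\w+)\s*(?:,(.*))?\)\s*$")

def validate(path: Path) -> list[str]:
    errors = []
    for lineno, line in enumerate(path.read_text().splitlines(), 1):
        m = OP_RE.match(line)
        if not m:
            continue
        op, args = m.group(1), m.group(2)
        nargs = len(args.split(",")) if args and args.strip() else 0
        if op not in ARITY:
            errors.append(f"{path}:{lineno}: unknown op {op}")
        elif ARITY[op] != nargs:
            errors.append(f"{path}:{lineno}: {op} expects {ARITY[op]} "
                          f"args, got {nargs}")
    return errors

if __name__ == "__main__":
    for ic in sorted(Path("js/src/ics").glob("IC-*")):
        for err in validate(ic):
            print(err)
```

Running something like this as part of the upgrade would turn silent corpus staleness into a loud failure.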

@cfallin (Member, Author) commented Jul 26, 2024

Separately, I'm just now publishing a version of weval that is resilient to failures during partial evaluation: if we try to partially evaluate bad bytecode, we simply skip that function specialization rather than erroring out entirely. That will at least make us resilient to a "useless corpus".

@cfallin merged commit b7935be into ff-127-0-2 on Jul 26, 2024
@cfallin deleted the cfallin/regenerate-aot-ics branch on July 26, 2024 23:41
cfallin added a commit to cfallin/spidermonkey-wasi-embedding that referenced this pull request Jul 26, 2024
This pulls in bytecodealliance/gecko-dev#52 and
bytecodealliance/gecko-dev#53, fixing some issues with AOT ICs
discovered after the recent rebase.
cfallin added a commit to cfallin/js-compute-runtime that referenced this pull request Aug 1, 2024
This PR pulls in my work to use "weval", the WebAssembly partial
evaluator, to perform ahead-of-time compilation of JavaScript using the
PBL interpreter we previously contributed to SpiderMonkey. This work has
been merged into the BA fork of SpiderMonkey in
bytecodealliance/gecko-dev#45, bytecodealliance/gecko-dev#46,
bytecodealliance/gecko-dev#47, bytecodealliance/gecko-dev#48,
bytecodealliance/gecko-dev#51, bytecodealliance/gecko-dev#52,
bytecodealliance/gecko-dev#53, bytecodealliance/gecko-dev#54,
bytecodealliance/gecko-dev#55, and then integrated into StarlingMonkey
in bytecodealliance/StarlingMonkey#91.

The feature is off by default; it requires a `--enable-experimental-aot`
flag to be passed to `js-compute-runtime-cli.js`. This requires a
separate build of the engine Wasm module to be used when the flag is
passed.

This should still be considered experimental until it is tested more
widely. The PBL+weval combination passes all jit-tests and jstests in
SpiderMonkey, and all integration tests in StarlingMonkey; however, it
has not yet been widely tested in real-world scenarios.

Initial speedups we are seeing on Octane (CPU-intensive JS benchmarks)
are in the 3x-5x range. This is roughly equivalent to the speedup that a
native JS engine's "baseline JIT" compiler tier gets over its
interpreter, and it uses the same basic techniques -- compiling all
polymorphic operations (all basic JS operators) to inline-cache sites
that dispatch to stubs depending on types. Further speedups can be
obtained eventually by inlining stubs from warmed-up IC chains, but that
requires warmup.

Important to note is that this compilation approach is *fully
ahead-of-time*: it requires no profiling or observation or warmup of
user code, and compiles the JS directly to Wasm that does not do any
further codegen/JIT at runtime. Thus, it is suitable for the per-request
isolation model (new Wasm instance for each request, with no shared
state).
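
For intuition about the IC-site technique described above, here is a conceptual Python sketch (illustrative only; not SpiderMonkey's actual data structures, and in the AOT setup the stubs come from the pre-compiled corpus rather than from runtime codegen):

```python
# Conceptual sketch of an inline-cache site (illustrative only).
# Each site keeps a chain of (guard, stub) pairs; the fallback runs
# the generic semantics and attaches a specialized stub, so later
# executions with the same operand types take the fast path.
class ICSite:
    def __init__(self, fallback):
        self.chain = []           # list of (guard, stub) pairs
        self.fallback = fallback  # generic slow path

    def dispatch(self, *operands):
        for guard, stub in self.chain:
            if guard(*operands):        # e.g. "both operands are ints"
                return stub(*operands)  # specialized fast path
        return self.fallback(self, *operands)

def add_fallback(site, lhs, rhs):
    # Attach a specialized stub for the operand types just seen.
    if isinstance(lhs, int) and isinstance(rhs, int):
        site.chain.append((
            lambda a, b: isinstance(a, int) and isinstance(b, int),
            lambda a, b: a + b,
        ))
    return lhs + rhs  # full semantics this one time

add_site = ICSite(add_fallback)
print(add_site.dispatch(1, 2))  # miss: fallback runs, attaches a stub
print(add_site.dispatch(3, 4))  # hit: guarded stub handles it
```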