AOT ICs: rebuild entire corpus. #53
Conversation
Will we need to remove and rebuild the corpus each time we update SpiderMonkey?
I wonder if we can name the IC files based on the hash of their contents, to avoid large diffs and the need for a deduplication Python script.
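A minimal sketch of that content-hash naming idea, in Python in the spirit of the existing corpus scripts; the `store_ic` helper, the paths, and the hash truncation are all invented for illustration:

```python
import hashlib
from pathlib import Path

def content_addressed_name(body: bytes) -> str:
    # Name derived only from the contents: regeneration produces stable
    # names, and identical bodies collide onto one name.
    return "IC-" + hashlib.sha256(body).hexdigest()[:16]

def store_ic(ics_dir: str, body: bytes) -> Path:
    path = Path(ics_dir) / content_addressed_name(body)
    if not path.exists():  # duplicate bodies are written only once
        path.write_bytes(body)
    return path
```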
In the course of the 124..127 upgrade, some IC bodies became invalid because of altered signatures of CacheIR opcodes. Nominally this should be fine in a system that never tries to execute such an invalid blob of bytecode. However, when we pre-weval all bodies in a weval build, we run into invalid memory reads in the weval phase just as we would if we tried to execute the IC at runtime in the IC interpreter: fundamentally, the corpus cannot have invalid bytecode in it. This PR thus removes the entire corpus and regenerates it:

- Remove `js/src/ics/IC-*`
- Build an engine with `--enable-aot-ics --enable-aot-ics-force --enable-aot-ics-enforce`
- Run jit-tests (`./mach jit-test`) and jstests (`./mach jstests`) with `AOT_ICS_KEEP_GOING=1`
- Use `js/src/ics/remove-duplicates.py` to remove duplicates among all the `IC-*` files in the gecko-dev root (jit-tests) and `js/src/tests` (jstests); a sketch of this step follows the list
- Put all of these files into `js/src/ics`

Note for rebasing in the future: this should be squashed into the main AOT ICs commit.
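For illustration, a sketch of what the deduplication step conceptually does, under the assumption that `remove-duplicates.py` keys on exact file contents; the actual in-tree script may differ:

```python
import hashlib
import shutil
from pathlib import Path

def collect_unique_ics(src_dirs, dest_dir):
    """Copy IC-* files from src_dirs into dest_dir, dropping byte-identical duplicates."""
    seen = set()
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    for src in src_dirs:
        for path in sorted(Path(src).glob("IC-*")):
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            if digest not in seen:
                seen.add(digest)
                shutil.copy2(path, dest / path.name)

# e.g., collecting from the gecko-dev root (jit-tests) and js/src/tests (jstests):
# collect_unique_ics([".", "js/src/tests"], "js/src/ics")
```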
Force-pushed from c1cb9d6 to 4cd0641.
Up until having the above realization about invalid bytecode, my answer was "I don't think so"; but now I'm having some Thoughts about the upgrade process. Unfortunately, it's a fact of the checked-in string-of-macros format that it changes when arguments change; the only real way to ensure up-to-dateness would be some sort of typechecker/validator over all bodies in the corpus (sketched below). This is the second manifestation of the add-or-remove-args-to-CacheIR-ops danger that I saw earlier in the rebase (the first being that the interpreter itself gets out of sync in its bytecode parsing). So I think the answer is: I need to come up with more type safety here. The hopeful answer is eventually "no", but for now we need to be pretty careful.
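A hypothetical sketch of such a validator, with an invented arity table and a deliberately naive regex parse of the string-of-macros format; the real CacheIR op definitions and corpus format are not reproduced here:

```python
import re
from pathlib import Path

# Invented table: op name -> expected argument count, to be regenerated
# from the CacheIR op definitions whenever opcode signatures change.
EXPECTED_ARITY = {
    "GuardToInt32": 1,
    "LoadFixedSlotResult": 2,
    # ... one entry per CacheIR op
}

CALL_RE = re.compile(r"(\w+)\(([^)]*)\)")

def validate_body(text):
    """Yield an error string for each op that is unknown or has the wrong arity."""
    for name, args in CALL_RE.findall(text):
        argc = len([a for a in args.split(",") if a.strip()])
        if name not in EXPECTED_ARITY:
            yield f"unknown op {name}"
        elif argc != EXPECTED_ARITY[name]:
            yield f"{name}: expected {EXPECTED_ARITY[name]} args, got {argc}"

def validate_corpus(ics_dir):
    for path in sorted(Path(ics_dir).glob("IC-*")):
        for err in validate_body(path.read_text()):
            print(f"{path.name}: {err}")
```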
Separately, I'm just now publishing a version of weval that is resilient to failures during partial eval: if we try to partially evaluate bad bytecode, we should just skip that function specialization, not error out entirely. That will make us resilient to a "useless corpus" at least.
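Purely as an illustration of that skip-on-failure policy (weval itself is a Rust tool; these names are invented), the loop amounts to:

```python
def specialize_all(functions, specialize):
    # Try each function; a failure (e.g., on invalid bytecode) skips that
    # one specialization instead of aborting the whole run.
    specialized, skipped = [], []
    for func in functions:
        try:
            specialized.append(specialize(func))
        except ValueError:        # stand-in for a partial-eval failure
            skipped.append(func)  # keep going; fall back to the interpreter
    return specialized, skipped
```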
This pulls in bytecodealliance/gecko-dev#52 and bytecodealliance/gecko-dev#53, fixing some issues with AOT ICs discovered after the recent rebase.
This PR pulls in my work to use "weval", the WebAssembly partial evaluator, to perform ahead-of-time compilation of JavaScript using the PBL interpreter we previously contributed to SpiderMonkey. This work has been merged into the BA fork of SpiderMonkey in bytecodealliance/gecko-dev#45, bytecodealliance/gecko-dev#46, bytecodealliance/gecko-dev#47, bytecodealliance/gecko-dev#48, bytecodealliance/gecko-dev#51, bytecodealliance/gecko-dev#52, bytecodealliance/gecko-dev#53, bytecodealliance/gecko-dev#54, bytecodealliance/gecko-dev#55, and then integrated into StarlingMonkey in bytecodealliance/StarlingMonkey#91.

The feature is off by default; it requires a `--enable-experimental-aot` flag to be passed to `js-compute-runtime-cli.js`. This requires a separate build of the engine Wasm module to be used when the flag is passed. This should still be considered experimental until it is tested more widely. The PBL+weval combination passes all jit-tests and jstests in SpiderMonkey, and all integration tests in StarlingMonkey; however, it has not yet been widely tested in real-world scenarios.

Initial speedups we are seeing on Octane (CPU-intensive JS benchmarks) are in the 3x-5x range. This is roughly equivalent to the speedup that a native JS engine's "baseline JIT" compiler tier gets over its interpreter, and it uses the same basic techniques -- compiling all polymorphic operations (all basic JS operators) to inline-cache sites that dispatch to stubs depending on types. Further speedups can be obtained eventually by inlining stubs from warmed-up IC chains, but that requires warmup.

Important to note is that this compilation approach is *fully ahead-of-time*: it requires no profiling or observation or warmup of user code, and compiles the JS directly to Wasm that does not do any further codegen/JIT at runtime. Thus, it is suitable for the per-request isolation model (new Wasm instance for each request, with no shared state).
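As an aside, a language-agnostic sketch of the inline-cache dispatch pattern described above, with invented names and deliberately simplified guards (not the engine's actual data structures):

```python
def make_ic_site(fallback):
    chain = []  # list of (guard, stub) pairs attached to this site

    def call(value):
        for guard, stub in chain:
            if guard(value):        # type guard for this stub
                return stub(value)  # specialized fast path
        return fallback(value)      # generic slow path on a miss

    call.chain = chain
    return call

# e.g., a site for a length access, with a stub attached for strings:
site = make_ic_site(lambda v: len(v))
site.chain.append((lambda v: isinstance(v, str), lambda v: len(v)))
assert site("abc") == 3
```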