-
Notifications
You must be signed in to change notification settings - Fork 280
Refactor BlockSize to not have an invalid variant
#2866
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
d6b2ca5 to
0da85a7
Compare
|
Is it possible to have a benchmark to see what impact (if any) this has on overall performance? |
|
@shssoichiro Yes it's possible, but how should I do it for something like this? Would hawktracer be suitable, or AWCY? |
|
If I'm going to get a picture of whole binary performance, I'll usually use hyperfine. AWCY isn't the best at giving a consistent encoding speed across runs, and hawktracer is great for profiling, or if you're targeting a specific part of the codebase. But given that this touches a few different areas, I'd make a release binary from master, and a release binary with these changes, and benchmark them against each other in hyperfine, probably with a couple of different videos. Personally I think this change does make the code more idiomatic for Rust, so I'm leaning toward merging even if the performance improvement is negligible. But I'd like to see if there is a measurable improvement as well. |
https://github.com/lu-zero/speed-levels-rs gives you a mean to automate some of the |
tdaede
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent work! BLOCK_INVALID is an old leftover from aomenc / the AV1 spec that made it easier to implement some of the context stuff but was pretty inelegant (and still is, in aomenc).
Is there any place we would actually want to handle the error for subsampled_size rather than just .unwrap()ing? Possibly we could just assert and not return a Result.
70e969a to
b0c566f
Compare
|
@shssoichiro There seems to be a small but measurable performance improvement with this PR. Tested on 3 different videos, this PR was always slightly faster.
out7.y4m: 5 seconds, 256x256 resolution, yuv420p out3.y4m: 5 seconds, 240x240 resolution, yuv420p outx.y4m: 5 seconds, 1920x1080 resolution, yuv420p |
shssoichiro
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent! 🚀
|
@tdaede Did you have any other concerns before merging? |
I guess we sort of already "handle" the error with the |
|
I have no preference, the current way is fine. Just thought possibly not using Result might provide slightly better codegen if we didn't see a perf improvement as is. |
This should improve the codegen of
width_log2,height_log2,tx_size, andis_rect_tx_allowedby removing checks of the invalid variant, and it also forces callers ofsubsampled_sizeandsubsizeto check that the return value was not an invalid block size. If there is any performance-critical code where it is not desirable to check for the invalid block size, then it is still possible to useunsafeto remove the check.