Refactor `BlockSize` to not have an invalid variant #2866

redzic · 2021-12-21T05:09:50Z

This should improve the codegen of width_log2, height_log2, tx_size, and is_rect_tx_allowed by removing checks of the invalid variant, and it also forces callers of subsampled_size and subsize to check that the return value was not an invalid block size. If there is any performance-critical code where it is not desirable to check for the invalid block size, then it is still possible to use unsafe to remove the check.

shssoichiro · 2021-12-21T15:26:58Z

Is it possible to have a benchmark to see what impact (if any) this has on overall performance?

redzic · 2021-12-22T02:21:42Z

@shssoichiro Yes it's possible, but how should I do it for something like this? Would hawktracer be suitable, or AWCY?

shssoichiro · 2021-12-22T05:12:45Z

If I'm going to get a picture of whole binary performance, I'll usually use hyperfine. AWCY isn't the best at giving a consistent encoding speed across runs, and hawktracer is great for profiling, or if you're targeting a specific part of the codebase. But given that this touches a few different areas, I'd make a release binary from master, and a release binary with these changes, and benchmark them against each other in hyperfine, probably with a couple of different videos.

Personally I think this change does make the code more idiomatic for Rust, so I'm leaning toward merging even if the performance improvement is negligible. But I'd like to see if there is a measurable improvement as well.

lu-zero · 2021-12-22T11:59:33Z

@shssoichiro Yes it's possible, but how should I do it for something like this? Would hawktracer be suitable, or AWCY?

https://github.com/lu-zero/speed-levels-rs gives you a mean to automate some of the hyperfine workflow.

tdaede

Excellent work! BLOCK_INVALID is an old leftover from aomenc / the AV1 spec that made it easier to implement some of the context stuff but was pretty inelegant (and still is, in aomenc).

Is there any place we would actually want to handle the error for subsampled_size rather than just .unwrap()ing? Possibly we could just assert and not return a Result.

redzic · 2021-12-22T21:39:34Z

@shssoichiro There seems to be a small but measurable performance improvement with this PR.

Tested on 3 different videos, this PR was always slightly faster.

rav1e_master is the current master, and rav1e_bsize is this (now rebased onto current master) PR.

out7.y4m: 5 seconds, 256x256 resolution, yuv420p

out3.y4m: 5 seconds, 240x240 resolution, yuv420p

outx.y4m: 5 seconds, 1920x1080 resolution, yuv420p

redzic@threadripper ~/P/rav1e (master)> hyperfine "./rav1e_master ~/out7.y4m -o out.ivf -y"
Benchmark 1: ./rav1e_master ~/out7.y4m -o out.ivf -y
  Time (mean ± σ):      8.168 s ±  0.057 s    [User: 30.287 s, System: 5.900 s]
  Range (min … max):    8.084 s …  8.298 s    10 runs

redzic@threadripper ~/P/rav1e (master)> hyperfine "./rav1e_bsize ~/out7.y4m -o out.ivf -y"
Benchmark 1: ./rav1e_bsize ~/out7.y4m -o out.ivf -y
  Time (mean ± σ):      8.134 s ±  0.043 s    [User: 30.155 s, System: 6.092 s]
  Range (min … max):    8.079 s …  8.194 s    10 runs

redzic@threadripper ~/P/rav1e (master)> hyperfine "./rav1e_bsize ~/out3.y4m -o out.ivf -y"
Benchmark 1: ./rav1e_bsize ~/out3.y4m -o out.ivf -y
  Time (mean ± σ):     21.774 s ±  0.049 s    [User: 51.109 s, System: 7.758 s]
  Range (min … max):   21.692 s … 21.850 s    10 runs

redzic@threadripper ~/P/rav1e (master)> hyperfine "./rav1e_master ~/out3.y4m -o out.ivf -y"
Benchmark 1: ./rav1e_master ~/out3.y4m -o out.ivf -y
  Time (mean ± σ):     21.973 s ±  0.061 s    [User: 51.372 s, System: 7.676 s]
  Range (min … max):   21.850 s … 22.064 s    10 runs

redzic@threadripper ~/P/rav1e (master)> hyperfine "./rav1e_master ~/outx.y4m -s 10 -o out.ivf -y"
Benchmark 1: ./rav1e_master ~/outx.y4m -s 10 -o out.ivf -y
  Time (mean ± σ):     79.930 s ±  0.410 s    [User: 116.721 s, System: 7.363 s]
  Range (min … max):   79.299 s … 80.539 s    10 runs

redzic@threadripper ~/P/rav1e (master)> hyperfine "./rav1e_bsize ~/outx.y4m -s 10 -o out.ivf -y"
Benchmark 1: ./rav1e_bsize ~/outx.y4m -s 10 -o out.ivf -y
  Time (mean ± σ):     79.733 s ±  0.361 s    [User: 116.573 s, System: 7.520 s]
  Range (min … max):   78.820 s … 80.015 s    10 runs

shssoichiro

Excellent! 🚀

shssoichiro · 2021-12-23T19:08:49Z

@tdaede Did you have any other concerns before merging?

redzic · 2021-12-23T19:30:42Z

Is there any place we would actually want to handle the error for subsampled_size rather than just .unwrap()ing? Possibly we could just assert and not return a Result.

I guess we sort of already "handle" the error with the .expect("invalid block size for this subsampling mode"), but I can shift the panic to the functions themselves rather than the caller if we prefer that

tdaede · 2021-12-24T04:17:26Z

I have no preference, the current way is fine. Just thought possibly not using Result might provide slightly better codegen if we didn't see a perf improvement as is.

redzic force-pushed the bsize-refactor branch from d6b2ca5 to 0da85a7 Compare December 21, 2021 05:16

tdaede reviewed Dec 22, 2021

View reviewed changes

redzic added 2 commits December 22, 2021 14:52

Refactor BlockSize to not have an invalid variant

76cfbea

Fix clippy warning

b0c566f

redzic force-pushed the bsize-refactor branch from 70e969a to b0c566f Compare December 22, 2021 20:53

shssoichiro approved these changes Dec 23, 2021

View reviewed changes

tdaede approved these changes Dec 24, 2021

View reviewed changes

shssoichiro merged commit 5a5711e into xiph:master Dec 24, 2021

redzic deleted the bsize-refactor branch December 25, 2021 02:09

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Refactor `BlockSize` to not have an invalid variant #2866

Refactor `BlockSize` to not have an invalid variant #2866

Uh oh!

redzic commented Dec 21, 2021

Uh oh!

shssoichiro commented Dec 21, 2021

Uh oh!

redzic commented Dec 22, 2021

Uh oh!

shssoichiro commented Dec 22, 2021 •

edited

Loading

Uh oh!

lu-zero commented Dec 22, 2021

Uh oh!

tdaede left a comment

Uh oh!

redzic commented Dec 22, 2021

Uh oh!

shssoichiro left a comment

Uh oh!

shssoichiro commented Dec 23, 2021

Uh oh!

redzic commented Dec 23, 2021

Uh oh!

tdaede commented Dec 24, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Refactor BlockSize to not have an invalid variant #2866

Refactor BlockSize to not have an invalid variant #2866

Uh oh!

Conversation

redzic commented Dec 21, 2021

Uh oh!

shssoichiro commented Dec 21, 2021

Uh oh!

redzic commented Dec 22, 2021

Uh oh!

shssoichiro commented Dec 22, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

lu-zero commented Dec 22, 2021

Uh oh!

tdaede left a comment

Choose a reason for hiding this comment

Uh oh!

redzic commented Dec 22, 2021

Uh oh!

shssoichiro left a comment

Choose a reason for hiding this comment

Uh oh!

shssoichiro commented Dec 23, 2021

Uh oh!

redzic commented Dec 23, 2021

Uh oh!

tdaede commented Dec 24, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Refactor `BlockSize` to not have an invalid variant #2866

Refactor `BlockSize` to not have an invalid variant #2866

shssoichiro commented Dec 22, 2021 •

edited

Loading