IBA::Unsharp_mask() speed and memory optimization #4513
Merged
Use a single unsharp_mask_impl() function based on parallel_image() instead of 3x IBA calls plus helper functions. Signed-off-by: Vlad (Kuzmin) Erium <[email protected]>
For kernels 5x5 or larger, can give a speedup of 40% or more, with larger gains for bigger kernels.
Description
Replaces the 3x IBA calls plus helper function, which generated 4 full-size image buffers, with a single unsharp_mask_impl() that uses parallel_image() to compute the unsharp mask:
src + contr * (((src - blur) < threshold) ? 0.0 : (src - blur))
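As a rough illustration of the expression above, here is a minimal standalone sketch that applies it element by element to flat float buffers. The name `unsharp_combine` and the flat-vector layout are assumptions made for this example; this is not the `unsharp_mask_impl()` code from the PR. It follows the expression exactly as written, with a signed comparison against `threshold`; an implementation could compare the absolute difference instead, which the description does not state.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not the OIIO implementation): combines a source
// buffer and its blurred copy using the expression from the description.
// Both buffers are assumed to have the same length.
std::vector<float> unsharp_combine(const std::vector<float>& src,
                                   const std::vector<float>& blur,
                                   float contr, float threshold)
{
    std::vector<float> dst(src.size());
    for (std::size_t i = 0; i < src.size(); ++i) {
        // Detail signal: what the blur removed from the source.
        const float diff = src[i] - blur[i];
        // Differences below the threshold contribute nothing (signed
        // comparison, exactly as the expression is written).
        dst[i] = src[i] + contr * (diff < threshold ? 0.0f : diff);
    }
    return dst;
}
```

Because the result is computed in one pass from `src` and `blur`, no intermediate difference or thresholded-difference buffer has to be kept, which is in line with the memory saving described here.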
Added two-pass 1D convolution for kernels larger than 3x3.
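For readers unfamiliar with the two-pass approach, the sketch below shows the general idea on a single-channel image stored as a flat row-major vector: one horizontal 1D pass followed by one vertical 1D pass. The function names, the odd kernel length, and the clamp-to-edge border handling are all assumptions for illustration; the PR's actual code may differ in each of these.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: one 1D pass along x (horizontal == true) or along y.
// Assumes an odd-length kernel and a row-major, single-channel image.
std::vector<float> convolve_1d(const std::vector<float>& img, int width,
                               int height, const std::vector<float>& kernel,
                               bool horizontal)
{
    const int radius = static_cast<int>(kernel.size()) / 2;
    std::vector<float> out(img.size(), 0.0f);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            float sum = 0.0f;
            for (int k = -radius; k <= radius; ++k) {
                // Clamp sample coordinates to the image edge
                // (an assumed border policy for this sketch).
                const int sx = horizontal
                                   ? std::max(0, std::min(x + k, width - 1))
                                   : x;
                const int sy = horizontal
                                   ? y
                                   : std::max(0, std::min(y + k, height - 1));
                sum += kernel[k + radius]
                       * img[static_cast<std::size_t>(sy) * width + sx];
            }
            out[static_cast<std::size_t>(y) * width + x] = sum;
        }
    }
    return out;
}

// Two-pass separable convolution: horizontal pass, then vertical pass.
std::vector<float> convolve_two_pass(const std::vector<float>& img, int width,
                                     int height,
                                     const std::vector<float>& kernel)
{
    return convolve_1d(convolve_1d(img, width, height, kernel, true),
                       width, height, kernel, false);
}
```

This only works when the 2D kernel is separable, as a Gaussian is. A K x K kernel costs K*K multiply-adds per pixel in a single 2D pass but 2*K in two 1D passes (25 versus 10 for 5x5, and 9 versus 6 for 3x3).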
Tests
Both single-threaded (one IB at a time) and multithreaded (multiple IBs at a time) runs show a good speedup:
~30-40%, with less memory use.
For 5x5 Gaussian kernels, two-pass mode should add at least a 20% speedup.
(If someone can run an independent benchmark, that would be great, since my results differed a lot depending on real versus synthetic use.)