ssh4net (Contributor) commented Oct 29, 2024

Description

Replaces the three IBA calls plus a helper function, which together generate four full-size image buffers, with a single unsharp_mask_impl() that uses parallel_image() to compute the unsharp mask:
src + contr * (((src - blur) < threshold) ? 0.0 : (src - blur))
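
To make the formula concrete, here is a minimal standalone sketch of the per-sample computation. It is not the code from this PR: `unsharp_sample` and `unsharp_buffer` are hypothetical names, plain `std::vector<float>` buffers stand in for ImageBuf, and the blurred input is assumed to be computed elsewhere. It follows the formula literally, including the signed comparison of `src - blur` against the threshold.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: one sample of the unsharp mask, following the formula
// in the description literally:
//   out = src + contrast * ((src - blur) < threshold ? 0 : (src - blur))
inline float unsharp_sample(float src, float blur, float contrast,
                            float threshold)
{
    const float diff = src - blur;
    return src + contrast * (diff < threshold ? 0.0f : diff);
}

// Hypothetical helper: apply the formula to whole buffers in a single pass,
// without materializing separate "difference" or "thresholded" buffers.
std::vector<float> unsharp_buffer(const std::vector<float>& src,
                                  const std::vector<float>& blur,
                                  float contrast, float threshold)
{
    if (src.size() != blur.size())
        throw std::invalid_argument("src and blur must have the same size");
    std::vector<float> out(src.size());
    for (std::size_t i = 0; i < src.size(); ++i)
        out[i] = unsharp_sample(src[i], blur[i], contrast, threshold);
    return out;
}
```

Each output sample depends only on the matching source and blurred samples, which is why the whole operation can be fused into one loop and split across threads by image region.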

Added a two-pass 1D convolution for kernels larger than 3x3.
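
For readers unfamiliar with the two-pass idea: a 2D Gaussian kernel is the outer product of two 1D Gaussians, so a KxK convolution can be replaced by a horizontal 1D pass followed by a vertical 1D pass, costing about 2K multiplies per pixel instead of K*K (10 versus 25 for 5x5). The sketch below illustrates this on a single-channel float image. The function names, the row-major layout, and the clamp-to-edge border handling are assumptions made for illustration, not the behavior of this PR.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: normalized 1D Gaussian kernel with 2*radius+1 taps.
std::vector<float> gaussian_kernel_1d(int radius, float sigma)
{
    std::vector<float> k(static_cast<std::size_t>(2 * radius + 1));
    float sum = 0.0f;
    for (int i = -radius; i <= radius; ++i) {
        const float w = std::exp(-static_cast<float>(i * i)
                                 / (2.0f * sigma * sigma));
        k[static_cast<std::size_t>(i + radius)] = w;
        sum += w;
    }
    for (float& w : k)
        w /= sum;  // normalize so the taps sum to 1
    return k;
}

// Hypothetical helper: separable convolution of a row-major, single-channel
// image. Pass 1 filters along x into a temporary buffer, pass 2 filters the
// temporary along y. Borders are clamped to the nearest edge pixel.
std::vector<float> convolve_separable(const std::vector<float>& img, int width,
                                      int height, const std::vector<float>& k)
{
    const int radius = static_cast<int>(k.size()) / 2;
    std::vector<float> tmp(img.size()), out(img.size());
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            float acc = 0.0f;
            for (int i = -radius; i <= radius; ++i) {
                const int xx = std::min(std::max(x + i, 0), width - 1);
                acc += k[static_cast<std::size_t>(i + radius)]
                       * img[static_cast<std::size_t>(y * width + xx)];
            }
            tmp[static_cast<std::size_t>(y * width + x)] = acc;
        }
    }
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            float acc = 0.0f;
            for (int i = -radius; i <= radius; ++i) {
                const int yy = std::min(std::max(y + i, 0), height - 1);
                acc += k[static_cast<std::size_t>(i + radius)]
                       * tmp[static_cast<std::size_t>(yy * width + x)];
            }
            out[static_cast<std::size_t>(y * width + x)] = acc;
        }
    }
    return out;
}
```

The trade-off is one extra temporary buffer for the intermediate pass, which pays off as the kernel grows because the per-pixel work scales linearly with kernel width rather than quadratically.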

Tests

	// Assumes `input` (an ImageBuf) and `ok` (a bool) are declared earlier.
	ImageBuf sharped(input.spec());
	const int repeats = 50;

	std::cout << "Start sharpening\n";
	auto start = std::chrono::high_resolution_clock::now();

	for (int i = 0; i < repeats; i++)
	{
		//ok = ImageBufAlgo::unsharp_mask(sharped, input, "gaussian", 15.0f, 10.0f, 0.01f);
		ok = ImageBufAlgo::unsharp_mask(sharped, input, "gaussian", 5.0f, 2.0f, 0.05f);
		std::cout << ".";
	}

	std::cout << "\n";

	auto part1 = std::chrono::high_resolution_clock::now();
	std::chrono::duration<double> elapsed_part1 = part1 - start;
	std::cout << "Elapsed time: " << elapsed_part1.count() << " s\n";

Both single-threaded (one IB at a time) and multithreaded (multiple IBs at a time) runs show a good speedup:
~30-40%, with less memory use.

For 5x5 Gaussian kernels, the two-pass mode should add at least a 20% speedup.

(If someone can run an independent benchmark, that would be great, since my results differed a lot depending on real or synthetic use.)
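
For anyone attempting an independent benchmark, a small timing helper along these lines may make runs easier to compare. It is only a sketch: `mean_seconds_per_call` is a hypothetical name, and the use of `std::chrono::steady_clock` and an untimed warm-up call are choices made here, not part of this PR.

```cpp
#include <chrono>

// Hypothetical helper: call fn() `warmup` times untimed, then `repeats` times
// timed, and return the mean wall-clock seconds per timed call.
template <typename Fn>
double mean_seconds_per_call(Fn&& fn, int repeats, int warmup = 1)
{
    for (int i = 0; i < warmup; ++i)
        fn();  // untimed, lets caches and thread pools settle
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < repeats; ++i)
        fn();
    const std::chrono::duration<double> elapsed
        = std::chrono::steady_clock::now() - start;
    return repeats > 0 ? elapsed.count() / repeats : 0.0;
}
```

The callable would wrap the call under test, for example a lambda around the `ImageBufAlgo::unsharp_mask(...)` call from the test loop, and the same helper can be run against builds with and without this change.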

Checklist:

  • I have read the contribution guidelines.
  • I have updated the documentation, if applicable. (Check if there is no
    need to update the documentation, for example if this is a bug fix that
    doesn't change the API.)
  • I have ensured that the change is tested somewhere in the testsuite
    (adding new test cases if necessary).
  • If I added or modified a C++ API call, I have also amended the
    corresponding Python bindings (and if altering ImageBufAlgo functions, also
    exposed the new functionality as oiiotool options).
  • My code follows the prevailing code style of this project. If I haven't
    already run clang-format before submitting, I definitely will look at the CI
    test that runs clang-format and fix anything that it highlights as being
    nonconforming.

Use a single unsharp_mask_impl() function that uses parallel_image() instead of 3x IBA + helper functions

Signed-off-by: Vlad (Kuzmin) Erium <[email protected]>
For kernels of 5x5 or larger, this can give a speedup of 40% or more, increasing with kernel size

ssh4net changed the title from "IBA::Unsharp_mask() speed and memory optimization (after convolution code)" to "IBA::Unsharp_mask() speed and memory optimization" on Oct 29, 2024
ssh4net (Contributor, Author) commented Oct 31, 2024