enable channels last for reflection padding on CPU #102518
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/102518
✅ No failures as of commit 8227be8. This comment was automatically generated by Dr. CI and updates every 15 minutes.
Add channels last support for reflection padding on CPU. The following test cases will pass with this patch:

```
python test_modules.py TestModuleCPU.test_memory_format_nn_ReflectionPad2d_cpu_float32
python test_modules.py TestModuleCPU.test_memory_format_nn_ReflectionPad3d_cpu_float32
```

The following benchmark results were gathered on an Intel(R) Xeon(R) Gold 6248 CPU @ 2.50GHz, with 20 cores per socket.

### single core inference

```
(before)
ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([1, 3, 224, 224]) , NHWC: 0.356 ms
ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([128, 64, 56, 56]) , NHWC: 86.821 ms

(after)
ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([1, 3, 224, 224]) , NHWC: 0.328 ms
ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([128, 64, 56, 56]) , NHWC: 16.806 ms
```

### single socket inference

```
(before)
ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([1, 3, 224, 224]) , NHWC: 0.142 ms
ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([128, 64, 56, 56]) , NHWC: 7.367 ms

(after)
ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([1, 3, 224, 224]) , NHWC: 0.027 ms
ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([128, 64, 56, 56]) , NHWC: 3.181 ms
```

cc @jgong5 @XiaobingSuper @sanchitintel @ashokei @jingxu10
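The benchmark script itself is not included in the PR; below is a minimal sketch of how comparable numbers could be collected. The padding and tensor sizes follow the table above, while the warm-up and timing loop are assumptions, not the author's script.

```python
import time
import torch
import torch.nn as nn

def bench(pad, size, iters=100):
    # Channels last (NHWC) input, matching the benchmark table above.
    x = torch.randn(size).to(memory_format=torch.channels_last)
    m = nn.ReflectionPad2d(pad)
    with torch.no_grad():
        for _ in range(10):  # warm-up
            m(x)
        start = time.perf_counter()
        for _ in range(iters):
            m(x)
        ms = (time.perf_counter() - start) / iters * 1000
    print(f"ReflectionPad2d({pad}) size: {tuple(size)}, NHWC: {ms:.3f} ms")

# torch.set_num_threads(1) would approximate the single-core numbers.
bench((2, 2, 2, 2), (1, 3, 224, 224))
bench((2, 2, 2, 2), (128, 64, 56, 56))
```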
@@ -3028,14 +3028,20 @@ def module_inputs_torch_nn_ConstantPad3d(module_info, device, dtype, requires_gr
    module_inputs_func=module_inputs_torch_nn_ReflectionPad2d,
Can you double check that these test inputs cover good channels-last-specific test cases?
I'm worried that they might work well for generic strided inputs but not also exercise the channels last format.
Right now, all the memory format related tests have been restructured into `test_memory_format` in `test_modules.py`. It checks:

- output memory format (a given NHWC input will get an NHWC output)
- mixed memory format (some ops can have different memory formats for input and weight)
- correctness, comparing the outputs of contiguous and channels last inputs

So if I intentionally let the channels last kernel generate wrong output, it will fail test cases such as the following:
(pytorch-mingfei) [mingfeim@mlt-skx091 test]$ python test_modules.py TestModuleCPU.test_memory_format_nn_ReflectionPad2d_cpu_float32
F
======================================================================
FAIL: test_memory_format_nn_ReflectionPad2d_cpu_float32 (__main__.TestModuleCPU)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/mingfeim/pytorch/pytorch/torch/testing/_internal/common_device_type.py", line 414, in instantiated_test
result = test(self, **param_kwargs)
File "/home/mingfeim/pytorch/pytorch/torch/testing/_internal/common_modules.py", line 118, in test_wrapper
return test(*args, **kwargs)
File "/home/mingfeim/pytorch/pytorch/torch/testing/_internal/common_cuda.py", line 162, in wrapped
return f(*args, **kwargs)
File "test_modules.py", line 695, in test_memory_format
self.assertEqual(outputs, desired_outputs, rtol=rtol, atol=atol)
File "/home/mingfeim/pytorch/pytorch/torch/testing/_internal/common_utils.py", line 3100, in assertEqual
raise error_metas[0].to_error(
AssertionError: Tensor-likes are not close!
Mismatched elements: 1008 / 1296 (77.8%)
Greatest absolute difference: 16.82485580444336 at index (0, 0, 1, 1) (up to 1e-05 allowed)
Greatest relative difference: 143.67701721191406 at index (0, 1, 0, 3) (up to 1e-05 allowed)
----------------------------------------------------------------------
Ran 1 test in 0.013s
FAILED (failures=1)
(pytorch-mingfei) [mingfeim@mlt-skx091 test]$ python test_modules.py TestModuleCPU.test_memory_format_nn_ReflectionPad3d_cpu_float32
F
======================================================================
FAIL: test_memory_format_nn_ReflectionPad3d_cpu_float32 (__main__.TestModuleCPU)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/mingfeim/pytorch/pytorch/torch/testing/_internal/common_device_type.py", line 414, in instantiated_test
result = test(self, **param_kwargs)
File "/home/mingfeim/pytorch/pytorch/torch/testing/_internal/common_modules.py", line 118, in test_wrapper
return test(*args, **kwargs)
File "/home/mingfeim/pytorch/pytorch/torch/testing/_internal/common_cuda.py", line 162, in wrapped
return f(*args, **kwargs)
File "test_modules.py", line 695, in test_memory_format
self.assertEqual(outputs, desired_outputs, rtol=rtol, atol=atol)
File "/home/mingfeim/pytorch/pytorch/torch/testing/_internal/common_utils.py", line 3100, in assertEqual
raise error_metas[0].to_error(
AssertionError: Tensor-likes are not close!
Mismatched elements: 1650 / 1944 (84.9%)
Greatest absolute difference: 17.179780960083008 at index (0, 0, 5, 1, 3) (up to 1e-05 allowed)
Greatest relative difference: 3617.848876953125 at index (1, 1, 3, 5, 0) (up to 1e-05 allowed)
----------------------------------------------------------------------
Ran 1 test in 0.012s
FAILED (failures=1)
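For reference, the property these tests enforce can also be checked with a small standalone snippet. This is only a sketch of the idea, not the actual `test_memory_format` code, and it assumes this patch is applied:

```python
import torch
import torch.nn as nn

m = nn.ReflectionPad2d((2, 2, 2, 2))
x = torch.randn(2, 3, 8, 8)

out_ref = m(x)                                        # contiguous (NCHW) path
out_cl = m(x.to(memory_format=torch.channels_last))   # channels last (NHWC) path

# With this patch, the output memory format follows the input memory format...
assert out_cl.is_contiguous(memory_format=torch.channels_last)
# ...and the values must match the contiguous reference.
torch.testing.assert_close(out_cl, out_ref)
```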
@cpuhrsch could you please help review this one? Thanks!
@cpuhrsch could you please help review this one again? Thanks! The current test cases are sufficient to cover the correctness of the channels last memory format.
@@ -258,7 +258,12 @@ void reflection_pad2d_out_template(
  if (ndim == 3) {
    output.resize_({nplane, output_h, output_w});
  } else {
Why is it enough to only update this branch?
`resize_()` has a default argument of `memory_format=at::MemoryFormat::Contiguous`.

- For 3-dim inputs (NCW), there is no channels last format yet (the current rule is that channels last only applies to 4-dim and 5-dim tensors), so we don't have to explicitly pass a memory format argument for them. Even if the input is given as NWC, it is treated as a non-contiguous NCW tensor and produces an NCW output.
- For 4-dim inputs (NCHW) and 5-dim inputs (NCDHW), the `memory_format` argument should be given the same value as the input's memory format. That is the memory format propagation rule: an NCHW input gets an NCHW output; an NHWC input gets an NHWC output. (See the sketch below this list.)
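A small sketch from the Python side illustrating this propagation rule (assuming this patch is applied; the printed values are what the rule above implies):

```python
import torch
import torch.nn as nn

# 4-dim (NCHW/NHWC) input: the memory_format passed to resize_() follows the
# input, so a channels last (NHWC) input yields a channels last output.
x4 = torch.randn(2, 3, 8, 8).to(memory_format=torch.channels_last)
y4 = nn.ReflectionPad2d(2)(x4)
print(y4.is_contiguous(memory_format=torch.channels_last))  # True

# 3-dim (NCW) input: channels last is only defined for 4-dim and 5-dim
# tensors, so resize_() keeps its default contiguous memory format.
x3 = torch.randn(2, 3, 16)
y3 = nn.ReflectionPad1d(2)(x3)
print(y3.is_contiguous())  # True
```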
@CaoE - Do you plan to review this as well?
LGTM.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.