@gongzg gongzg commented Mar 13, 2017

This PR is mainly for bug fixes and also includes some enhancements, as listed below:

  1. The kernel/program binary cache mechanism now works with the latest ViennaCL. For details, please refer to clcaffe's wiki page. With this feature enabled, the initialization time of an OpenCL Caffe application is reduced dramatically.
  2. Relaxed the image restriction for the spatial convolution kernels, so far fewer convolution kernels are needed when the application has to process different image sizes with the same net model.
  3. Fixed a race-condition bug; all test cases now pass consistently, and the random-failure symptom is gone.
  4. Added dilation support for the spatial convolution kernel.
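A binary cache of the kind mentioned in item 1 generally works by keying compiled binaries on a hash of the kernel source (a real cache would also hash the build options and the device/driver version). Here is a minimal sketch of that general idea in Python, with hypothetical names; this is not clcaffe's actual implementation:

```python
import hashlib
import pathlib
import tempfile

def build_program(source: str) -> bytes:
    # Stand-in for the expensive compile step (clBuildProgram & friends);
    # the returned bytes play the role of the compiled device binary.
    return source.encode()[::-1]

def build_with_cache(source: str, cache_dir: pathlib.Path) -> bytes:
    # Key the cache on a hash of the kernel source.
    key = hashlib.sha256(source.encode()).hexdigest()
    path = cache_dir / (key + ".bin")
    if path.exists():
        return path.read_bytes()      # cache hit: skip compilation entirely
    binary = build_program(source)    # cache miss: compile once
    path.write_bytes(binary)          # persist for later runs
    return binary

cache_dir = pathlib.Path(tempfile.mkdtemp())
first = build_with_cache("__kernel void k() {}", cache_dir)
second = build_with_cache("__kernel void k() {}", cache_dir)  # served from cache
```

On the second and subsequent runs every kernel comes straight from disk, which is where the dramatic reduction in initialization time comes from.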

Zhigang Gong and others added 19 commits March 13, 2017 16:23
In the to_gpu function, when the memory is uninitialized we do not
need to finish the queue; likewise, if the HEAD is on the CPU and
we support zero copy, we also don't need to finish the queue.

Signed-off-by: Zhigang Gong <[email protected]>
Caffe's timer has some overhead, and when the kernel being tuned is
very tiny, that overhead may cause very unstable timing results, so
I increased the iteration count to amortize it.

Signed-off-by: Zhigang Gong <[email protected]>
If the spatial dimension is relatively large, we should use the default code
path to achieve better parallelism.

Signed-off-by: Zhigang Gong <[email protected]>
Sub-buffer creation may sometimes fail; we need to handle that case.

Signed-off-by: Zhigang Gong <[email protected]>
Some features, e.g. opencl_unroll_hint, are not supported by the
Beignet compiler; use the __BEIGNET__ macro to choose whether to build
with these features.

Also add a helper function to facilitate detecting the Beignet driver.

Signed-off-by: Zhiwen Wu <[email protected]>
If the input image size changes at runtime and the kernel type
changes to 2 or 5, we need to swizzle the weights again.

Signed-off-by: Zhigang Gong <[email protected]>
Added a new basic convolution kernel that supports input images with
no padding, so image padding in the host code is no longer needed.

Signed-off-by: Zhiwen Wu <[email protected]>
Change-Id: I392c4e73319fcfc18e628f9476b9bfdcba3cc206
If we simply use the CPU code path to copy the data, we introduce a
race condition between the GPU queue and the CPU. The scenario: when
we call it in an iteration loop, the data blob is a zero-copy blob,
and the first pass may still be pending on the GPU side. The second
pass then modifies the data blob on the CPU side before the first
pass has accessed that data on the GPU side.

We could simply add a synchronization point between the two iterations, but
that is not a good fix, as it forces the GPU queue to flush and waits for it
to finish. The best way is to do the copy on the GPU side, in the same queue.
That way we no longer need to worry about this race condition, and we don't
interfere with the GPU queue at all.

Signed-off-by: Zhigang Gong <[email protected]>
This will cause the ReLU gradient to fail.
Prepare to support varying sizes.

Signed-off-by: Zhigang Gong <[email protected]>
No need to tune a different kernel for each input size.

Signed-off-by: Zhigang Gong <[email protected]>
Add the platform and driver information, and switch to using the
system cache directory where possible. After this change, we can
reuse offline-tuned configurations.

Signed-off-by: Zhigang Gong <[email protected]>
Signed-off-by: Zhigang Gong <[email protected]>
@naibaf7 naibaf7 merged commit 479d3a0 into BVLC:opencl Mar 13, 2017
