OpenCL: some bug fixes and feature enhancements #5394
Merged
Conversation
In the to_gpu function, when the memory is uninitialized we do not need to finish the queue; and if the HEAD is at CPU and zero copy is supported, we also don't need to finish the queue. Signed-off-by: Zhigang Gong <[email protected]>
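As a rough illustration of the intent (this is not the actual Caffe SyncedMemory code; the names `head_state`, `supports_zero_copy`, and `to_gpu_sketch` are hypothetical), the queue only needs to be finished when a real host-to-device transfer must complete before the data is used:

```c
#include <CL/cl.h>
#include <stdbool.h>

enum head_state { HEAD_UNINITIALIZED, HEAD_AT_CPU, HEAD_AT_GPU, HEAD_SYNCED };

/* Hypothetical sketch of the to_gpu decision: only wait on the queue when an
 * actual copy has to complete before the device-side data becomes valid. */
static void to_gpu_sketch(cl_command_queue queue, enum head_state head,
                          bool supports_zero_copy, cl_mem gpu_buf,
                          const void *cpu_ptr, size_t bytes) {
  switch (head) {
  case HEAD_UNINITIALIZED:
    /* Fresh allocation: nothing to copy, so no clFinish() is required. */
    break;
  case HEAD_AT_CPU:
    if (supports_zero_copy) {
      /* Zero-copy buffer: CPU and GPU share the same memory,
       * so no explicit copy or finish is needed either. */
      break;
    }
    /* Otherwise a blocking write already guarantees the transfer has
     * completed on return, so no extra clFinish() is needed here. */
    clEnqueueWriteBuffer(queue, gpu_buf, CL_TRUE, 0, bytes, cpu_ptr,
                         0, NULL, NULL);
    break;
  default:
    break; /* HEAD_AT_GPU / HEAD_SYNCED: data is already on the device. */
  }
}
```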
Caffe's timer has some overhead, and when the kernel being tuned is very small, that overhead may produce very unstable timing results, so the iteration count is increased to reduce this type of overhead. Signed-off-by: Zhigang Gong <[email protected]>
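A minimal sketch of the amortization idea, assuming a generic helper (names and the 2-D work size are illustrative, not the tuner's actual code):

```c
#include <CL/cl.h>
#include <time.h>

/* Sketch: time a batch of launches of a tiny kernel instead of a single
 * launch, so the fixed timer/enqueue overhead is split across iterations. */
static double time_kernel_us(cl_command_queue queue, cl_kernel kernel,
                             const size_t *gws, const size_t *lws,
                             int iterations) {
  struct timespec t0, t1;

  clFinish(queue);                          /* drain any pending work first */
  clock_gettime(CLOCK_MONOTONIC, &t0);
  for (int i = 0; i < iterations; ++i)
    clEnqueueNDRangeKernel(queue, kernel, 2, NULL, gws, lws, 0, NULL, NULL);
  clFinish(queue);                          /* wait for the whole batch */
  clock_gettime(CLOCK_MONOTONIC, &t1);

  double us = (t1.tv_sec - t0.tv_sec) * 1e6 + (t1.tv_nsec - t0.tv_nsec) / 1e3;
  /* Dividing by the iteration count keeps the per-launch estimate stable
   * even when the kernel itself runs for only a few microseconds. */
  return us / iterations;
}
```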
If the spatial dimension is relatively large, we should use the default code path to achieve better parallelism. Signed-off-by: Zhigang Gong <[email protected]>
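For illustration only, the dispatch decision amounts to a heuristic like the following (the threshold and names are invented; the real logic lives in the OpenCL convolution engine):

```c
/* Illustrative heuristic: large output maps already expose plenty of work
 * items, so the generic (default) path parallelizes well; the specialized
 * kernel is only worth using for small spatial sizes. */
static int use_default_path(int out_h, int out_w, int spatial_threshold) {
  return out_h * out_w >= spatial_threshold;
}
```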
Sometimes sub-buffer creation may fail; we need to handle that case. Signed-off-by: Zhigang Gong <[email protected]>
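A sketch of the defensive pattern with the standard OpenCL API (the fallback strategy shown in the comment is an assumption, not necessarily what the PR does):

```c
#include <CL/cl.h>
#include <stddef.h>

/* clCreateSubBuffer can fail, e.g. when the region origin does not satisfy
 * the device's CL_DEVICE_MEM_BASE_ADDR_ALIGN requirement, so check the
 * error code instead of assuming success. */
static cl_mem create_sub_buffer_or_null(cl_mem parent, size_t origin,
                                        size_t size, cl_mem_flags flags) {
  cl_buffer_region region = { origin, size };
  cl_int err = CL_SUCCESS;
  cl_mem sub = clCreateSubBuffer(parent, flags,
                                 CL_BUFFER_CREATE_TYPE_REGION, &region, &err);
  if (err != CL_SUCCESS || sub == NULL) {
    /* Caller can fall back to using the parent buffer plus an explicit
     * offset argument instead of a dedicated sub-buffer. */
    return NULL;
  }
  return sub;
}
```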
Some features, e.g. opencl_unroll_hint, are not supported by the Beignet compiler, so use the __BEIGNET__ macro to choose whether to build with these features. Also add a helper function to make it easier to detect the Beignet driver. Signed-off-by: Zhiwen Wu <[email protected]>
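A hedged sketch of the kernel-side guard in OpenCL C (the actual Caffe kernels and detection helper differ; here __BEIGNET__ is assumed to be supplied by the host, e.g. via a "-D__BEIGNET__" build option, once the Beignet driver has been detected):

```c
/* opencl_unroll_hint is an OpenCL C 2.0 loop attribute; only emit it for
 * compilers that accept it. */
#ifdef __BEIGNET__
#define UNROLL_HINT(n)           /* attribute unsupported: expands to nothing */
#else
#define UNROLL_HINT(n) __attribute__((opencl_unroll_hint(n)))
#endif

/* Illustrative kernel: each work item handles 4 consecutive elements;
 * the global size is assumed to cover count/4 items. */
__kernel void axpy4(__global const float *x, __global float *y, float alpha) {
  const int i = get_global_id(0) * 4;
  UNROLL_HINT(4)
  for (int k = 0; k < 4; ++k)
    y[i + k] += alpha * x[i + k];
}
```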
If the input image size changes at runtime and the kernel type changes to 2 or 5, we need to swizzle the weights again. Signed-off-by: Zhigang Gong <[email protected]>
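For illustration only (the kernel-type numbering comes from the commit message; the bookkeeping structure and names below are hypothetical, not the engine's actual state):

```c
#include <stdbool.h>

/* Hypothetical bookkeeping: remember which (kernel type, input shape) the
 * current swizzled weights were built for, and redo the swizzle whenever a
 * runtime shape change selects a kernel type that consumes swizzled weights. */
struct swizzle_state {
  int kernel_type;   /* type the weights were last swizzled for */
  int in_h, in_w;    /* input shape at that time */
  bool valid;        /* false until the first swizzle has happened */
};

static bool needs_reswizzle(const struct swizzle_state *s,
                            int new_kernel_type, int in_h, int in_w) {
  /* Only kernel types 2 and 5 use swizzled weights. */
  bool wants_swizzled = (new_kernel_type == 2 || new_kernel_type == 5);
  if (!wants_swizzled)
    return false;
  return !s->valid || s->kernel_type != new_kernel_type ||
         s->in_h != in_h || s->in_w != in_w;
}
```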
Added a new basic convolution kernel that supports input images without padding, so that image padding is no longer needed in the host code. Signed-off-by: Zhiwen Wu <[email protected]>
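The key point is that padding is handled by bounds checks inside the kernel rather than by physically padding the input on the host. A plain C reference of that idea (single channel, stride 1, for brevity; this is not the OpenCL kernel added by the PR):

```c
/* Reference 2-D convolution over an unpadded input: out-of-range taps are
 * treated as zero inside the inner loop, so the host never has to allocate
 * and fill a padded copy of the input image. */
static void conv2d_basic(const float *in, int in_h, int in_w,
                         const float *weights, int k_h, int k_w,
                         int pad_h, int pad_w,
                         float *out, int out_h, int out_w) {
  for (int oy = 0; oy < out_h; ++oy) {
    for (int ox = 0; ox < out_w; ++ox) {
      float sum = 0.f;
      for (int ky = 0; ky < k_h; ++ky) {
        for (int kx = 0; kx < k_w; ++kx) {
          int iy = oy - pad_h + ky;
          int ix = ox - pad_w + kx;
          if (iy >= 0 && iy < in_h && ix >= 0 && ix < in_w)
            sum += in[iy * in_w + ix] * weights[ky * k_w + kx];
        }
      }
      out[oy * out_w + ox] = sum;
    }
  }
}
```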
Signed-off-by: Zhigang Gong <[email protected]>
Change-Id: I392c4e73319fcfc18e628f9476b9bfdcba3cc206
Signed-off-by: Zhigang Gong <[email protected]>
If we simply use the CPU code path to copy the data, we introduce a race condition between the GPU queue and the CPU. The scenario: the copy is called inside an iteration loop, the data blob is a zero-copy blob, and the first pass may still be pending on the GPU side; the second pass then modifies the data blob on the CPU side before the first pass has read it on the GPU side. We could simply add a synchronization point between the two iterations, but that is not a good fix, as it forces the GPU queue to flush and waits for it to finish. The better way is to do the copy on the GPU side and in the same queue; then we don't need to worry about this race condition at all, and we don't interfere with the GPU queue. Signed-off-by: Zhigang Gong <[email protected]>
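As an illustration of why an in-queue device copy removes the hazard (generic OpenCL, not the exact Caffe call site): an in-order queue orders the copy after the still-running kernels of the previous iteration, and the CPU never touches the shared memory, so no clFinish() is needed.

```c
#include <CL/cl.h>

/* Sketch: duplicate a (possibly zero-copy) blob entirely on the device.
 * Enqueued into the same in-order command queue as the previous iteration's
 * kernels, the copy cannot start until they have finished, while the host
 * thread continues without blocking. */
static cl_int copy_blob_on_gpu(cl_command_queue queue, cl_mem src, cl_mem dst,
                               size_t bytes) {
  return clEnqueueCopyBuffer(queue, src, dst, /*src_offset=*/0,
                             /*dst_offset=*/0, bytes, 0, NULL, NULL);
}
```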
This would cause the ReLU gradient to fail.
Prepare to support varying sizes. Signed-off-by: Zhigang Gong <[email protected]>
No need to tune a different kernel for each input size. Signed-off-by: Zhigang Gong <[email protected]>
Add the platform and driver information, and switch to the system cache directory if possible. After this change, offline-tuned configurations can be reused. Signed-off-by: Zhigang Gong <[email protected]>
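A hedged sketch of folding platform and driver information into the cached-configuration path (the directory layout and naming below are invented for illustration; a real implementation would also sanitize the strings):

```c
#include <CL/cl.h>
#include <stdio.h>

/* Build a cache-file prefix that includes the platform name, device name and
 * driver version, so tuned configurations are only reused on the same
 * platform/driver combination. */
static void tuned_config_prefix(cl_platform_id platform, cl_device_id device,
                                char *out, size_t out_len) {
  char plat[64] = "", dev[64] = "", drv[64] = "";
  clGetPlatformInfo(platform, CL_PLATFORM_NAME, sizeof(plat), plat, NULL);
  clGetDeviceInfo(device, CL_DEVICE_NAME, sizeof(dev), dev, NULL);
  clGetDeviceInfo(device, CL_DRIVER_VERSION, sizeof(drv), drv, NULL);
  snprintf(out, out_len, "/var/cache/caffe-opencl/%s_%s_%s_", plat, dev, drv);
}
```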
Signed-off-by: Zhigang Gong <[email protected]>
Signed-off-by: Zhigang Gong <[email protected]>
Signed-off-by: Zhigang Gong <[email protected]>
Signed-off-by: Zhigang Gong <[email protected]>
This PR is mainly for bug fixes and also includes some enhancements, as summarized in the commit messages above.