Basic cuDNN v3 support #2737
Conversation
Just out of curiosity, where did you get cuDNN v3 from?
(Force-pushed cffc254 to 7c6d031.)
@philkr Sorry, I missed this question; I'm an engineer at NVIDIA.
I tried this PR with the cuDNN v3 RC on my customised GoogLeNet model and got only a marginal performance increase (16 ms per image vs. 17 ms with cuDNN v2), while memory usage grew 1.5x, from ~2 GB to 3 GB, on an Amazon g2.2xlarge instance.
I also tried this PR and got results similar to what @stas-sl describes. When I comment out the call to cudnnGetConvolutionForwardAlgorithm and force CUDNN_CONVOLUTION_FWD_ALGO_FFT as the algorithm, the call to cudnnGetConvolutionForwardWorkspaceSize returns CUDNN_STATUS_NOT_SUPPORTED. @slayton58, do you know what the limitations on these parameters are for selecting the FFT algorithm?
@stas-sl Most of the potential performance increases in cuDNN v3 come either from tunings for Maxwell or from the new FFT-based convolution algorithms. Given the GPUs you're using, I suspect you're currently not using either of those features. @talda There are a few restrictions: H and W must be packed (wStride = 1, hStride = W), the convolution stride must be 1 (the default for Caffe's convolution layers), and the padded size (H + 2*pad_h, W + 2*pad_w) must be <= (256, 256).
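The restrictions listed above can be expressed as a small predicate. This is a sketch for illustration only; the struct, field names, and the standalone function are hypothetical, since the real eligibility decision happens inside cuDNN:

```cpp
// Sketch: eligibility check for CUDNN_CONVOLUTION_FWD_ALGO_FFT, based on the
// restrictions described in the comment above. All names here are illustrative.
struct ConvDesc {
  int h, w;                // input spatial size
  int h_stride, w_stride;  // tensor strides (packed layout: h_stride == w, w_stride == 1)
  int stride_h, stride_w;  // convolution strides
  int pad_h, pad_w;        // zero padding
};

bool fft_algo_supported(const ConvDesc& d) {
  const bool packed = (d.w_stride == 1) && (d.h_stride == d.w);       // H and W packed
  const bool unit_stride = (d.stride_h == 1) && (d.stride_w == 1);    // stride must be 1
  const bool fits = (d.h + 2 * d.pad_h <= 256) &&                     // padded size
                    (d.w + 2 * d.pad_w <= 256);                       // <= (256, 256)
  return packed && unit_stride && fits;
}
```

For example, a packed 224x224 input with stride 1 and pad 3 passes, while a stride-3 convolution (as in a later comment) or a 300x300 input does not.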
@slayton58, on a Titan X, which is Maxwell, the speed gain for GoogLeNet is ~5%. Is that the maximum I can get for this architecture?
@slayton58 I tried this with my own data and networks (1 convolutional + 2 fully connected layers) and found that GPU + cuDNN v3 is much slower than GPU only (time: CPU >> GPU + cuDNN > GPU only). I tried different batch sizes, but the results showed that cuDNN is at least twice as slow as GPU only, especially on the backward pass (at least 3 times slower). I'm quite new to Caffe and currently working with C++ (VS2013) on Windows; do you have any idea of the possible reasons? Many thanks!
@slayton58 My problem was that, in order to save some memory, I had changed the stride of one of the convolution layers to 3. Now that I've fixed this I don't get the error, but the amount of memory needed as workspace just for the FFT forward convolution layers is so big that I can't really train my network on a K40. Can you post an example of a reasonably sized net that can be trained using the FFT algorithm?
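One way around the workspace-size problem described above is to fall back to a zero-workspace algorithm when the FFT workspace would not fit in a memory budget. The sketch below is hypothetical; it only illustrates the policy, with the workspace size assumed to come from a query such as cudnnGetConvolutionForwardWorkspaceSize:

```cpp
#include <cstddef>

// Illustrative enum standing in for the cuDNN forward-algorithm choices.
enum FwdAlgo { ALGO_IMPLICIT_GEMM, ALGO_FFT };

// Pick FFT only when its reported workspace fits the remaining GPU memory
// budget; otherwise fall back to the zero-workspace implicit-GEMM path.
// (Hypothetical helper; cuDNN itself can do this via its preference flags.)
FwdAlgo choose_fwd_algo(std::size_t workspace_bytes_fft, std::size_t budget_bytes) {
  return (workspace_bytes_fft <= budget_bytes) ? ALGO_FFT : ALGO_IMPLICIT_GEMM;
}
```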
Ping: are there plans to get this merged soon? Or is this the start of a longer-term project?
What's the status on this?
(Force-pushed 6cdf110 to 9fa56f0.)
Is cuDNN v3 currently supported?
Merging as #3160 to include a few last details. Thanks for this integration @slayton58, and sorry for the earlier holdup! I appreciate having the TODO for choosing the algo on
@slayton58 this is failing
@shelhamer Fixed the failing convolution-groups test: weight_offset_ wasn't being set correctly.
Introducing weight_offset_ caused the issue in #2737 (comment), since it was shadowing the BaseConvolutionLayer member that is set in BaseConvolutionLayer::LayerSetup().
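The shadowing bug described here can be reproduced in miniature. In this sketch the class and member names are simplified stand-ins for BaseConvolutionLayer / its cuDNN subclass and weight_offset_; the point is that a re-declared field in the derived class hides the value the base class set:

```cpp
// Sketch of the weight_offset_ shadowing bug: the derived class declares a
// member with the same name as a base-class member, so derived-class code
// reads its own (still-default) copy instead of the value set during setup.
// Names are simplified stand-ins for the Caffe classes involved.
class Base {
 public:
  void LayerSetUp() { weight_offset_ = 42; }  // base writes the real member
 protected:
  int weight_offset_ = 0;
};

class Derived : public Base {
 public:
  // Reads Derived::weight_offset_, NOT Base::weight_offset_.
  int offset_seen_by_derived() const { return weight_offset_; }
 protected:
  int weight_offset_ = 0;  // shadows Base::weight_offset_ -- the bug
};
```

After calling LayerSetUp(), offset_seen_by_derived() still returns the default 0 rather than 42, which is why removing the duplicate declaration (leaving only the base-class member) fixes the test.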
Once again, merging as #3160 to include the little details mentioned there and the minimal fix for the
This PR implements basic support for the new features in cuDNN v3. Summarized, these are: