Conversation

slayton58
Contributor

This PR implements basic support for the new features in cuDNN v3. In summary, these are:

  • Adding cuDNN versions of the LCN and LRN layers
  • Adding support for the new algorithms for both forward and, now, backward convolution
  • Moving algorithm selection and convolution workspace allocation from Forward_gpu to Reshape (see the sketch below)
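
Roughly, the Reshape-time selection looks like this (a simplified sketch, not the exact code in this PR; it assumes Caffe's CUDNN_CHECK/CUDA_CHECK macros and descriptors that Reshape has already configured):

```cpp
// Simplified sketch of Reshape-time algorithm selection and workspace
// allocation (illustrative only; the PR's actual code differs in detail).
#include <cudnn.h>
#include <cuda_runtime.h>

void ReshapeSelectAlgo(cudnnHandle_t handle,
                       cudnnTensorDescriptor_t bottom_desc,
                       cudnnFilterDescriptor_t filter_desc,
                       cudnnConvolutionDescriptor_t conv_desc,
                       cudnnTensorDescriptor_t top_desc,
                       size_t workspace_limit_bytes,
                       cudnnConvolutionFwdAlgo_t* algo,
                       void** workspace, size_t* workspace_size) {
  // Ask cuDNN for the fastest forward algorithm that fits the memory limit.
  CUDNN_CHECK(cudnnGetConvolutionForwardAlgorithm(
      handle, bottom_desc, filter_desc, conv_desc, top_desc,
      CUDNN_CONVOLUTION_FWD_SPECIFY_WORKSPACE_LIMIT,
      workspace_limit_bytes, algo));
  // Size and allocate the workspace once here, so Forward_gpu() only
  // has to launch the convolution.
  CUDNN_CHECK(cudnnGetConvolutionForwardWorkspaceSize(
      handle, bottom_desc, filter_desc, conv_desc, top_desc,
      *algo, workspace_size));
  if (*workspace != NULL) CUDA_CHECK(cudaFree(*workspace));
  if (*workspace_size > 0) CUDA_CHECK(cudaMalloc(workspace, *workspace_size));
}
```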

@philkr
Contributor

philkr commented Jul 10, 2015

Just out of curiosity, where did you get cuDNN v3 from?

slayton58 force-pushed the cudnnV3 branch 3 times, most recently from cffc254 to 7c6d031 on July 13, 2015
@slayton58
Contributor Author

@philkr Sorry, I missed this question. I'm an engineer at NVIDIA.

@stas-sl

stas-sl commented Aug 9, 2015

I tried this PR with the cuDNN v3 RC on my customised GoogLeNet model and saw only a marginal performance increase (16 ms vs. 17 ms per image with cuDNN v2), while memory usage grew about 1.5x, from ~2 GB to ~3 GB, on an Amazon g2.2xlarge instance.

@talda

talda commented Aug 11, 2015

I also tried this PR and got results similar to what @stas-sl describes.
cudnnGetConvolutionForwardAlgorithm always selects either CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_GEMM or CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM.

When I comment out the call to cudnnGetConvolutionForwardAlgorithm and force CUDNN_CONVOLUTION_FWD_ALGO_FFT as the algorithm, the call to cudnnGetConvolutionForwardWorkspaceSize returns CUDNN_STATUS_NOT_SUPPORTED, which means: "The combination of the tensor descriptors, filter descriptor and convolution descriptor is not supported for the specified algorithm."

@slayton58, do you know what the limitations on these parameters are for selecting the FFT algorithm?

@slayton58
Contributor Author

@stas-sl Most of the potential performance increases in cuDNN v3 come from either tunings for Maxwell or the new FFT-based convolution algorithms. Given the GPUs you're using, I suspect you're currently using neither of those features.

@talda There are a few restrictions: H and W must be packed (wStride = 1, hStride = W), the convolution stride must be 1 (the default for Caffe's convolution layers), and the padded size (H + 2*pad_h, W + 2*pad_w) must be <= (256, 256).
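
For anyone probing these limits, a small check like the following works (a hypothetical helper, not code from this PR; it only uses the standard cuDNN v3 workspace query):

```cpp
// Hypothetical helper (not part of this PR): ask whether the FFT forward
// algorithm is usable for a given layer shape before committing to it.
#include <cudnn.h>

bool FftForwardSupported(cudnnHandle_t handle,
                         cudnnTensorDescriptor_t bottom_desc,
                         cudnnFilterDescriptor_t filter_desc,
                         cudnnConvolutionDescriptor_t conv_desc,
                         cudnnTensorDescriptor_t top_desc,
                         size_t* workspace_bytes) {
  cudnnStatus_t status = cudnnGetConvolutionForwardWorkspaceSize(
      handle, bottom_desc, filter_desc, conv_desc, top_desc,
      CUDNN_CONVOLUTION_FWD_ALGO_FFT, workspace_bytes);
  // CUDNN_STATUS_NOT_SUPPORTED here usually means one of the restrictions
  // above is violated: non-unit convolution stride, H/W not packed, or a
  // padded input larger than 256x256.
  return status == CUDNN_STATUS_SUCCESS;
}
```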

@ducha-aiki
Contributor

@slayton58, on a Titan X, which is Maxwell, the speed gain for GoogLeNet is ~5%. Is that the maximum I can get for this architecture?

@xjtuljy

xjtuljy commented Aug 17, 2015

@slayton58 I tried this with my own data and network (1 convolutional + 2 fully connected layers) and found that GPU + cuDNN v3 is much slower than GPU only (time: CPU >> GPU + cuDNN > GPU only). I tried different batch sizes, but cuDNN was always at least twice as slow as GPU only, especially on the backward pass (at least 3x slower). I'm quite new to Caffe and currently working in C++ (VS2013) on Windows. Do you have any idea of the possible reasons? Many thanks!

@talda

talda commented Aug 19, 2015

@slayton58 My problem was that, in order to save some memory, I had changed the stride of one of the convolution layers to 3.

Now that I've fixed this I no longer get the error, but the workspace memory needed just for the FFT forward convolution layers is so large that I can't train my network on a K40.

Can you post an example of a reasonably sized net that can be trained using the FFT algorithm?

@liuyipei

Ping: are there plans to get this merged soon, or is this the start of a longer-term project?

philkr added a commit to philkr/caffe that referenced this pull request Aug 28, 2015
@seanbell

What's the status on this?

slayton58 force-pushed the cudnnV3 branch 2 times, most recently from 6cdf110 to 9fa56f0 on October 1, 2015
@hyojinie

hyojinie commented Oct 5, 2015

Is cuDNN v3 currently supported?

@shelhamer
Member

Merging as #3160 to include a few last details.

Thanks for this integration @slayton58 and sorry for the earlier holdup! I appreciate having the TODO for choosing the algo on Reshape() solved.

shelhamer closed this on Oct 6, 2015
@shelhamer
Member

@slayton58 this is failing CuDNNConvolutionLayerTest/0.TestSimpleConvolutionGroupCuDNN and crashing at CuDNNConvolutionLayerTest/0.TestGradientGroupCuDNN with error code 77 on a Titan X with CUDA 7.0 and driver version 346.46 for me. Please follow up at #3160.

@slayton58
Contributor Author

@shelhamer Fixed the failing convolution group test: weight_offset_ wasn't being set correctly.

Member

Introducing weight_offset_ caused the issue in #2737 (comment) since it was shadowing the BaseConvolutionLayer member that is set in BaseConvolutionLayer::LayerSetup().
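
A minimal standalone illustration of that failure mode (hypothetical names, not Caffe's actual declarations):

```cpp
// Minimal illustration of the member-shadowing bug (hypothetical names).
// The derived class re-declares weight_offset_, so the value the base
// class computes in LayerSetUp() is never the one the derived class reads.
#include <iostream>

struct BaseConvolutionLayer {
  int weight_offset_ = 0;
  void LayerSetUp() { weight_offset_ = 42; }  // base sets its own member
};

struct CuDNNConvolutionLayer : BaseConvolutionLayer {
  int weight_offset_ = 0;  // BUG: shadows the base-class member
  int Forward() { return weight_offset_; }  // reads the shadow, still 0
};

int main() {
  CuDNNConvolutionLayer layer;
  layer.LayerSetUp();
  std::cout << layer.Forward() << std::endl;  // prints 0, not 42
}
```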

@shelhamer
Member

Once again merging as #3160 to include the little details mentioned there and the minimal fix for the weight_offset_. Thanks for taking another look @slayton58.

shelhamer closed this on Oct 16, 2015