
Conversation

borisgin

I implemented a convolutional layer with an FFT-based Forward(). There is no FFT support in Backward() yet.
The implementation is based on the FFTW3 library. It was tested both with native FFTW3 and with MKL.
In addition, it supports OpenMP to utilize all cores; this was tested with native gcc OpenMP and with MKL.
The current version is CPU-only. Is anybody interested in doing a CUDA version?
My impression, based on the current CPU implementation (FFT + OpenMP), is that an FFT-based convolutional layer makes sense only for large kernels (kernel_size / stride >= 7). More details on the benchmark are below:
I modified net_speed_benchmark to test Forward() only, then took the "examples/imagenet" topology and modified the first two convolutional layers:

  • batch = 128

  • stride = 1

  • kernel = {5, 7, 9, 11, 13, 15}

  • 10 forward iterations

For each kernel I slightly changed the crop size in the data layers to make the map size FFT-friendly (128, 256, ...). The results are below (time is in seconds for 10 forward iterations):

| Layer | Kernel | Input           | Output           | base, sec | FFT, sec |
|-------|--------|-----------------|------------------|-----------|----------|
| conv1 | 15     | 128x3x242x242   | 128x96x228x228   | 79        | 28       |
| conv2 | 15     | 128x96x114x114  | 128x256x104x104  | 549       | 168      |
| conv1 | 13     | 128x3x244x244   | 128x96x232x232   | 58        | 30       |
| conv2 | 13     | 128x96x116x116  | 128x256x108x108  | 431       | 170      |
| conv1 | 11     | 128x3x246x246   | 128x96x236x236   | 44        | 28       |
| conv2 | 11     | 128x96x118x118  | 128x256x112x112  | 314       | 168      |
| conv1 | 9      | 128x3x248x248   | 128x96x240x240   | 33        | 29       |
| conv2 | 9      | 128x96x120x120  | 128x256x116x116  | 230       | 170      |
| conv1 | 7      | 128x3x250x250   | 128x96x244x244   | 23        | 29       |
| conv2 | 7      | 128x96x122x122  | 128x256x122x122  | 152       | 170      |
| conv1 | 5      | 128x3x252x252   | 128x96x248x248   | 16        | 28       |
| conv2 | 5      | 128x96x124x124  | 128x256x120x120  | 83        | 167      |
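For reference, the Forward() path described above follows the standard FFT-convolution recipe: zero-pad the input map and the kernel to a common FFT-friendly size, take a real-to-complex FFT of each, multiply the spectra pointwise, and inverse-transform, keeping only the valid region. Below is a minimal single-channel sketch against the FFTW3 single-precision API; the function name, padding strategy, and stride-1 assumption are illustrative only, not the code in this PR (which handles multiple channels, batching, and OpenMP).

```cpp
// Minimal sketch only: single channel, stride 1, no OpenMP, FFTW3 float API.
// Names below are illustrative, not identifiers from this PR.
#include <fftw3.h>
#include <cstring>
#include <vector>

void conv2d_fft_sketch(const float* image, int h, int w,
                       const float* kernel, int kh, int kw,
                       float* output) {  // output is (h-kh+1) x (w-kw+1)
  const int fh = h, fw = w;    // assume h, w are already FFT-friendly
  const int fcw = fw / 2 + 1;  // width of the r2c (half) spectrum

  // Zero-pad image and kernel into fh x fw real buffers.
  std::vector<float> img(fh * fw, 0.f), ker(fh * fw, 0.f);
  for (int y = 0; y < h; ++y)  std::memcpy(&img[y * fw], &image[y * w], w * sizeof(float));
  for (int y = 0; y < kh; ++y) std::memcpy(&ker[y * fw], &kernel[y * kw], kw * sizeof(float));

  fftwf_complex* img_f = (fftwf_complex*)fftwf_malloc(sizeof(fftwf_complex) * fh * fcw);
  fftwf_complex* ker_f = (fftwf_complex*)fftwf_malloc(sizeof(fftwf_complex) * fh * fcw);
  fftwf_plan p1 = fftwf_plan_dft_r2c_2d(fh, fw, img.data(), img_f, FFTW_ESTIMATE);
  fftwf_plan p2 = fftwf_plan_dft_r2c_2d(fh, fw, ker.data(), ker_f, FFTW_ESTIMATE);
  fftwf_execute(p1);
  fftwf_execute(p2);

  // Pointwise product with the conjugated kernel spectrum: this yields
  // cross-correlation, which is what Caffe calls "convolution".
  for (int i = 0; i < fh * fcw; ++i) {
    const float ar = img_f[i][0], ai = img_f[i][1];
    const float br = ker_f[i][0], bi = -ker_f[i][1];  // conjugate
    img_f[i][0] = ar * br - ai * bi;
    img_f[i][1] = ar * bi + ai * br;
  }

  // Inverse transform and copy out the valid region; FFTW transforms are
  // unnormalized, so divide by fh * fw.
  std::vector<float> full(fh * fw);
  fftwf_plan p3 = fftwf_plan_dft_c2r_2d(fh, fw, img_f, full.data(), FFTW_ESTIMATE);
  fftwf_execute(p3);
  const float scale = 1.f / (fh * fw);
  const int oh = h - kh + 1, ow = w - kw + 1;
  for (int y = 0; y < oh; ++y)
    for (int x = 0; x < ow; ++x)
      output[y * ow + x] = full[y * fw + x] * scale;

  fftwf_destroy_plan(p1); fftwf_destroy_plan(p2); fftwf_destroy_plan(p3);
  fftwf_free(img_f); fftwf_free(ker_f);
}
```

The crossover seen in the benchmark (FFT wins only for roughly kernel_size / stride >= 7) is consistent with the usual cost argument: direct convolution does O(K²) work per output pixel, while the FFT path pays a roughly kernel-size-independent transform cost plus a pointwise product, so larger kernels amortize the transforms better. The obvious optimizations (reusing each input map's spectrum across output channels, parallelizing over the batch with OpenMP) are natural here, though the exact scheme in this PR may differ.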

jeffdonahue and others added 30 commits May 9, 2014 19:51
Specify net params in solver; log {Net,Solver} parameters; multiple test nets
Conflicts:

	src/caffe/proto/caffe.proto
Conflicts:

	src/caffe/proto/caffe.proto
Conflicts:

	include/caffe/vision_layers.hpp
	src/caffe/proto/caffe.proto
1701 is the canonical random seed, and as this test makes only one call
for seeding there's no need for a member var.
Reproduce elementwise product layer in more generality.
Add elementwise operation parameter.
Prepare for elementwise sum operation choice.
mavenlin and others added 25 commits June 12, 2014 22:23
cpu/gpu and leveldb/lmdb; now just one copy of each test body
Minor Net::Init refactoring: name loop indices, add helpers
Add support for LMDB (LevelDB alternative)
Otherwise initialization will be performed on whichever device is
default.
Fix Caffe::SetDevice to avoid initializing on default device
flatten Forward_fft_task
added condition when fft is used: kernel_size / stride > 3.
added openmp support
put all fftw wrappers into caffe/util/fft.hpp and caffe.cpp files
cleaned makefile and makefile.config
replaced built-in std::complex multiplication by c-style implementation
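
On the last commit: a plain "c-style" complex multiply over interleaved (re, im) data often vectorizes better than operator* on std::complex, which, depending on the compiler and flags, may compile to a library call that handles Inf/NaN corner cases. The snippet below only illustrates that kind of rewrite; the names and signatures are assumptions, not code from this PR.

```cpp
// Pointwise complex multiply-accumulate over a spectrum, written two ways.
#include <complex>
#include <cstddef>

// Using the built-in std::complex operator*.
void mul_acc_std(const std::complex<float>* a, const std::complex<float>* b,
                 std::complex<float>* out, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) out[i] += a[i] * b[i];
}

// Equivalent c-style version on interleaved (re, im) pairs:
// (ar + i*ai) * (br + i*bi) = (ar*br - ai*bi) + i*(ar*bi + ai*br)
void mul_acc_c(const float* a, const float* b, float* out, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) {
    const float ar = a[2 * i], ai = a[2 * i + 1];
    const float br = b[2 * i], bi = b[2 * i + 1];
    out[2 * i]     += ar * br - ai * bi;
    out[2 * i + 1] += ar * bi + ai * br;
  }
}
```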
@kloudkl
Contributor

kloudkl commented Jun 26, 2014

This is great! But you need to set the target of the PR to BVLC:dev. GitHub does not allow changing it, so you will have to replace this with a new PR.

@borisgin closed this Jun 26, 2014
@borisgin
Author

Done

@borisgin
Author

I used borisgin/dev to rebase. Should I rebase with respect to BVLC/dev?

@sguada
Contributor

sguada commented Jun 26, 2014

Yeah, you should rebase against BVLC/dev.

