Fft based convolutional layer #542
Conversation
multiple test nets
for first/last 500 iterations
Specify net params in solver; log {Net,Solver} parameters; multiple test nets
Conflicts: src/caffe/proto/caffe.proto
Conflicts: include/caffe/vision_layers.hpp src/caffe/proto/caffe.proto
1701 is the canonical random seed, and as this test makes only one call for seeding there's no need for a member var.
Reproduce elementwise product layer in more generality. Add elementwise operation parameter. Prepare for elementwise sum operation choice.
cpu/gpu and leveldb/lmdb; now just one copy of each test body
Minor Net::Init refactoring: name loop indices, add helpers
Add support for LMDB (LevelDB alternative)
Otherwise initialization will be performed on whichever device is default.
Fix Caffe::SetDevice to avoid initializing on default device
No OpenMP.
Flatten Forward_fft_task; add condition for when FFT is used: kernel_size/stride > 3.
Added OpenMP support; put all FFTW wrappers into caffe/util/fft.hpp and caffe.cpp files.
Cleaned Makefile and Makefile.config.
Replaced built-in std::complex multiplication with a C-style implementation.
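As context for the last commit above, here is a minimal sketch (my own illustration with hypothetical names, not the PR's actual code) of what a C-style complex multiply-accumulate looks like compared to std::complex:

```cpp
// Hypothetical sketch (not the PR's code): multiply-accumulate of two complex
// values stored as interleaved (re, im) float pairs. Writing the product out
// by hand avoids the extra NaN/infinity handling some compilers emit for
// complex operator* (e.g. a call to __mulsc3).
inline void cmul_acc(const float a[2], const float b[2], float out[2]) {
  out[0] += a[0] * b[0] - a[1] * b[1];  // real part
  out[1] += a[0] * b[1] + a[1] * b[0];  // imaginary part
}
```

This matters because the pointwise spectrum product is the inner loop of an FFT-based convolution, so even a small per-multiply overhead adds up.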
This is great! But you need to set the target of the PR to BVLC:dev. GitHub does not allow one to change the target, so you will have to replace this PR with a new one.
Done
I used borisgin/dev to rebase. Should I rebase against BVLC/dev?
Yeah, you should rebase against BVLC/dev.
Sergio
I implemented a convolutional layer with an FFT-based Forward(). There is no FFT support in Backward() yet.
The implementation is based on the FFTW3 library and was tested both with native FFTW3 and with MKL.
In addition, it supports OpenMP to utilize all cores; this was tested with native gcc OpenMP and with MKL.
The current version is CPU-only. Is anybody interested in doing a CUDA version?
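For readers unfamiliar with the approach, here is a minimal single-channel sketch of FFT-based convolution with FFTW3 (single precision). This is only an illustration of the idea under my own naming and assumptions (stride 1, map size already FFT-friendly), not the PR's actual caffe/util/fft.hpp code:

```cpp
#include <fftw3.h>
#include <algorithm>
#include <complex>
#include <vector>

// Valid convolution of an H x W image with a K x K kernel via the convolution
// theorem: FFT both, multiply spectra pointwise, inverse FFT, crop.
// NB: this is true convolution; Caffe's conv layer computes correlation,
// which would flip the kernel (or conjugate its spectrum) first.
void conv2d_fft(const float* image, int H, int W,
                const float* kernel, int K,
                float* out /* (H-K+1) x (W-K+1) */) {
  const int fc = W / 2 + 1;  // width of the r2c half-spectrum
  std::vector<float> a(H * W, 0.f), b(H * W, 0.f), c(H * W);
  std::vector<std::complex<float> > A(H * fc), B(H * fc), C(H * fc);

  std::copy(image, image + H * W, a.begin());
  for (int i = 0; i < K; ++i)  // zero-pad the kernel to H x W
    std::copy(kernel + i * K, kernel + (i + 1) * K, b.begin() + i * W);

  fftwf_plan pa = fftwf_plan_dft_r2c_2d(H, W, a.data(),
      reinterpret_cast<fftwf_complex*>(A.data()), FFTW_ESTIMATE);
  fftwf_plan pb = fftwf_plan_dft_r2c_2d(H, W, b.data(),
      reinterpret_cast<fftwf_complex*>(B.data()), FFTW_ESTIMATE);
  fftwf_plan pc = fftwf_plan_dft_c2r_2d(H, W,
      reinterpret_cast<fftwf_complex*>(C.data()), c.data(), FFTW_ESTIMATE);
  fftwf_execute(pa);
  fftwf_execute(pb);

  const float scale = 1.0f / (H * W);  // FFTW transforms are unnormalized
  for (int i = 0; i < H * fc; ++i)
    C[i] = A[i] * B[i] * scale;        // pointwise spectrum product

  fftwf_execute(pc);
  // The circular result equals the linear one away from the wrap-around
  // border, so the valid region starts at offset (K-1, K-1).
  for (int y = 0; y + K <= H; ++y)
    for (int x = 0; x + K <= W; ++x)
      out[y * (W - K + 1) + x] = c[(y + K - 1) * W + (x + K - 1)];

  fftwf_destroy_plan(pa);
  fftwf_destroy_plan(pb);
  fftwf_destroy_plan(pc);
}
```

In a real layer the plans would be created once and reused across iterations (planning is expensive), the kernel spectra would be cached, and OpenMP would parallelize the per-map transforms and products across cores, per the commit messages above.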
My impression, based on the current CPU implementation (FFT + OpenMP), is that an FFT-based convolutional layer makes sense only for large kernels (kernel_size / stride >= 7): the FFT cost is roughly independent of kernel size, so it only beats direct convolution once the kernel is large enough. More details on the benchmark are below:
I modified net_speed_benchmark to test Forward() only, then took the "examples/imagenet" topology and modified the first two convolutional layers:
batch = 128
stride = 1
kernel = {5, 7, 9, 11, 13, 15}
10 forward iterations
For each kernel I slightly changed the crop size in the data layers to make the map size FFT-friendly (128, 256, ...). The results are below (time in seconds for 10 forward iterations):
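As a side note on the "FFT-friendly" sizes mentioned above, here is a minimal sketch (my own, with a hypothetical helper name, not from the PR) of rounding a map size up to the next power of two, which is what the crop-size tweaks achieve:

```cpp
#include <cstddef>

// Round n up to the next power of two, one simple notion of "FFT-friendly".
// FFTW also handles sizes with other small prime factors efficiently, so
// powers of two are just the safest choice.
size_t next_pow2(size_t n) {
  size_t p = 1;
  while (p < n) p <<= 1;
  return p;  // e.g. next_pow2(200) == 256
}
```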