Description
I'm using caffe-rc3 on Ubuntu. The Caffe tests pass and the MNIST sample runs perfectly. I have a trained net with a net definition file and a weights file. Everything works perfectly in CPU mode, but GPU mode crashes. I've spent a few hours with gdb, and the crash happens when caffe_rng_uniform() calls caffe_rng() and rng_stream returns 0x1, a bad pointer.
16 inline rng_t* caffe_rng() {
17   return static_cast<caffe::rng_t*>(Caffe::rng_stream().generator());
18 }
The random_generator_ pointer is 0x1, which causes the crash when it is dereferenced:
(gdb) p *caffe::thread_instance_.get()
$49 = {cublas_handle_ = 0x4df9160, curand_generator_ = 0x4dfab10, random_generator_ = {px = 0x1, pn = {pi_ = 0x0}},
mode_ = caffe::Caffe::CPU, solver_count_ = 1, root_solver_ = true}
However, Caffe::Get() has a good pointer. It seems like the thread-specific data and the singleton data are different. I can't figure out why.
(gdb) p *caffe::Caffe::Get().random_generator_
$46 = (caffe::Caffe::RNG &) @0x4df9160: {generator_ = {px = 0x7fffffff00000200, pn = {pi_ = 0xffff0000ffff}}}
backtrace:
(gdb) bt
#0 caffe::caffe_rng () at ./include/caffe/util/rng.hpp:17
#1 0x00007ffff723f833 in caffe::caffe_rng_uniform (n=81536, a=-0.0686263517, b=0.0686263517, r=0x201200000)
at src/caffe/util/math_functions.cpp:252
#2 0x00007ffff716ae72 in caffe::XavierFiller::Fill (this=0x60a4fb0, blob=0x60a47f0) at ./include/caffe/filler.hpp:161
#3 0x00007ffff71f7d82 in caffe::BaseConvolutionLayer::LayerSetUp (this=0x60a0620,
bottom=std::vector of length 1, capacity 1 = {...}, top=std::vector of length 1, capacity 1 = {...})
at src/caffe/layers/base_conv_layer.cpp:170
#4 0x00007ffff7195c33 in caffe::CuDNNConvolutionLayer::LayerSetUp (this=0x60a0620,
bottom=std::vector of length 1, capacity 1 = {...}, top=std::vector of length 1, capacity 1 = {...})
at src/caffe/layers/cudnn_conv_layer.cpp:20
#5 0x00007ffff7155548 in caffe::Layer::SetUp (this=0x60a0620, bottom=std::vector of length 1, capacity 1 = {...},
top=std::vector of length 1, capacity 1 = {...}) at ./include/caffe/layer.hpp:71
#6 0x00007ffff7295246 in caffe::Net::Init (this=0x4e4a890, in_param=...) at src/caffe/net.cpp:148
#7 0x00007ffff72939e0 in caffe::Net::Net (this=0x4e4a890, param_file="/home/ubuntu/linux/gtpmfgo//golast19.prototxt",
phase=caffe::TEST, root_net=0x0) at src/caffe/net.cpp:36
#8 0x00000000004fb4c6 in caffe_init (path=0x7fffffffd330 "/home/ubuntu/linux/gtpmfgo/", use_gpu=1) at ../src/caffecnn.cpp:63
#9 0x00000000004de674 in uct_init_all (cwd=0x7fffffffd330 "/home/ubuntu/linux/gtpmfgo/", max_memory=965, max_threads=1, use_gpu=1)
at ../src/uct.c:465
#10 0x000000000048cbbc in init_mfgo (cwd=0x7fffffffd330 "/home/ubuntu/linux/gtpmfgo/", max_memory=965, max_threads=1, use_gpu=1)
at ../src/G2init.c:112
#11 0x00000000004fa301 in main (argc=6, argv=0x7fffffffe5d8) at gtpmfgo.cpp:1545
My code invoking Caffe (use_gpu is true):
int caffe_init(const char *path, int use_gpu) {
#ifdef HAVE_CAFFE
int argc = 2;
char *fake_args[] = { "gtpmfgo", "ManyFaces" };
char **argv = fake_args;
GlobalInit(&argc, &argv);
if (use_gpu) {
Caffe::set_mode(Caffe::GPU);
Caffe::SetDevice(0);
Caffe::DeviceQuery();
}
else {
Caffe::set_mode(Caffe::CPU);
}
if (caffe_test_net != NULL) delete caffe_test_net;
string file_path = path;
file_path += "/";
caffe_test_net = new Net<float>(file_path + filename_net, TEST);
caffe_test_net->CopyTrainedLayersFrom(file_path + filename_parameters);