Segfault during caffe::init #3788

@dfotland

I'm using caffe-rc3 on Ubuntu. The Caffe tests pass and the mnist sample runs perfectly. I have a trained net with its net definition and weight files. Everything works perfectly in CPU mode, but GPU mode crashes. I've spent a few hours with gdb: the crash happens when caffe_rng_uniform() calls caffe_rng() and rng_stream() returns 0x1, a bad pointer.

16 inline rng_t* caffe_rng() {
17   return static_cast<caffe::rng_t*>(Caffe::rng_stream().generator());
18 }

The random_generator_ pointer is 0x1, which causes the crash when it is dereferenced:

(gdb) p *caffe::thread_instance_.get()
$49 = {cublas_handle_ = 0x4df9160, curand_generator_ = 0x4dfab10, random_generator_ = {px = 0x1, pn = {pi_ = 0x0}},
mode_ = caffe::Caffe::CPU, solver_count_ = 1, root_solver_ = true}
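
A value like 0x1 is not null, so an ordinary null check would not catch it, but it is also not suitably aligned for any object larger than one byte. A minimal sketch of a debugging check along those lines follows. The helper name LooksPlausible is hypothetical and is not part of Caffe or Boost; it only rejects null and misaligned addresses and cannot prove that a pointer is actually valid.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical diagnostic helper (not a Caffe API).
// Returns true only if p is non-null and its address is a multiple of
// alignof(T). A pointer such as 0x1 fails this test for any T whose
// alignment is greater than 1.
template <typename T>
bool LooksPlausible(const T* p) {
  const auto address = reinterpret_cast<std::uintptr_t>(p);
  return p != nullptr && address % alignof(T) == 0;
}
```

Passing this check says nothing about whether the memory is mapped or holds a live object; it is only a cheap way to flag obviously corrupted values while debugging.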

However, Caffe::Get() has a good pointer. It seems like the thread-specific data and the singleton data are different. I can't figure out why.

(gdb) p *caffe::Caffe::Get().random_generator_
$46 = (caffe::Caffe::RNG &) @0x4df9160: {generator_ = {px = 0x7fffffff00000200, pn = {pi_ = 0xffff0000ffff}}}
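
For readers unfamiliar with thread-specific singletons, here is a minimal sketch of the general pattern, assuming the name thread_instance_ in the gdb session means the Caffe object is held per thread. It is a simplified model and not Caffe's code: Settings, Get and ModeSeenByNewThread are hypothetical names, and it uses the C++11 thread_local keyword. It shows only that state set through a per-thread accessor on one thread is not visible on another; it is not a diagnosis of this crash.

```cpp
#include <thread>

// Hypothetical stand-in for a per-thread settings object (not Caffe's class).
struct Settings {
  enum Mode { CPU, GPU };
  Mode mode = CPU;  // every new per-thread instance starts in CPU mode
};

// Each thread that calls Get() lazily receives its own Settings instance.
Settings& Get() {
  thread_local Settings instance;
  return instance;
}

// Starts a fresh thread, reads the mode that thread sees through Get(),
// and returns it to the caller.
Settings::Mode ModeSeenByNewThread() {
  Settings::Mode seen = Settings::GPU;
  std::thread worker([&seen] { seen = Get().mode; });
  worker.join();
  return seen;
}
```

With this model, setting Get().mode = Settings::GPU on the main thread leaves a newly started thread in Settings::CPU, because that thread gets its own default-constructed instance.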

Backtrace:

(gdb) bt
#0 caffe::caffe_rng () at ./include/caffe/util/rng.hpp:17
#1 0x00007ffff723f833 in caffe::caffe_rng_uniform (n=81536, a=-0.0686263517, b=0.0686263517, r=0x201200000) at src/caffe/util/math_functions.cpp:252
#2 0x00007ffff716ae72 in caffe::XavierFiller::Fill (this=0x60a4fb0, blob=0x60a47f0) at ./include/caffe/filler.hpp:161
#3 0x00007ffff71f7d82 in caffe::BaseConvolutionLayer::LayerSetUp (this=0x60a0620, bottom=std::vector of length 1, capacity 1 = {...}, top=std::vector of length 1, capacity 1 = {...}) at src/caffe/layers/base_conv_layer.cpp:170
#4 0x00007ffff7195c33 in caffe::CuDNNConvolutionLayer::LayerSetUp (this=0x60a0620, bottom=std::vector of length 1, capacity 1 = {...}, top=std::vector of length 1, capacity 1 = {...}) at src/caffe/layers/cudnn_conv_layer.cpp:20
#5 0x00007ffff7155548 in caffe::Layer::SetUp (this=0x60a0620, bottom=std::vector of length 1, capacity 1 = {...}, top=std::vector of length 1, capacity 1 = {...}) at ./include/caffe/layer.hpp:71
#6 0x00007ffff7295246 in caffe::Net::Init (this=0x4e4a890, in_param=...) at src/caffe/net.cpp:148
#7 0x00007ffff72939e0 in caffe::Net::Net (this=0x4e4a890, param_file="/home/ubuntu/linux/gtpmfgo//golast19.prototxt", phase=caffe::TEST, root_net=0x0) at src/caffe/net.cpp:36
#8 0x00000000004fb4c6 in caffe_init (path=0x7fffffffd330 "/home/ubuntu/linux/gtpmfgo/", use_gpu=1) at ../src/caffecnn.cpp:63
#9 0x00000000004de674 in uct_init_all (cwd=0x7fffffffd330 "/home/ubuntu/linux/gtpmfgo/", max_memory=965, max_threads=1, use_gpu=1) at ../src/uct.c:465
#10 0x000000000048cbbc in init_mfgo (cwd=0x7fffffffd330 "/home/ubuntu/linux/gtpmfgo/", max_memory=965, max_threads=1, use_gpu=1) at ../src/G2init.c:112
#11 0x00000000004fa301 in main (argc=6, argv=0x7fffffffe5d8) at gtpmfgo.cpp:1545

My code invoking Caffe (use_gpu is true):

    int caffe_init(const char *path, int use_gpu) {

    #ifdef HAVE_CAFFE

        int argc = 2;
        char *fake_args[] = { "gtpmfgo", "ManyFaces" };
        char **argv = fake_args;
        GlobalInit(&argc, &argv);
        if (use_gpu) {
            Caffe::set_mode(Caffe::GPU);
            Caffe::SetDevice(0);
            Caffe::DeviceQuery();
        }
        else {
            Caffe::set_mode(Caffe::CPU);
        }

        if (caffe_test_net != NULL) delete caffe_test_net;
        string file_path = path;
        file_path += "/";
        caffe_test_net = new Net<float>(file_path + filename_net, TEST);
        caffe_test_net->CopyTrainedLayersFrom(file_path + filename_parameters);
