upgrading Accuracy layer #5836
Conversation
Force-pushed from cda371e to b4ab3c5.
Thanks for boosting the efficiency! Sorry for the total nitpick, but can you fix the file permission changes that switch 100644 → 100755 before merge?
Force-pushed from b4ab3c5 to 31bc9c6.
@shelhamer Thank you for "focus"ing on this PR. Is there anything else I can do to ease the process of accepting this PR?
I just gave this PR a spin but the results don't look right. At first I noticed my DIGITS plots were broken, with accuracy going out of scale. Then I dug a little and found the weird thing: it tends to increase steadily even when the network is already at a loss minimum. It increases by about (accuracy for one minibatch) * (number of minibatches in a test epoch) at every epoch. It's just something I observed after 2 runs on 2 different nets and datasets (no time to investigate it deeper at the moment), so this may be a bit of a far-fetched theory, but something is definitely off. Maybe it's just for me though? Has anyone tested it in action beside doing

EDIT: The CPU version of this works well. Is it possible that you forgot to reset the accumulators (

EDIT2: Yep, I can confirm that is the case. Added a relevant comment below.
// Similarly, this memory is never used elsewhere, and thus we can use it
// to avoid having to allocate additional GPU memory.
Dtype* counts = bottom[1]->mutable_gpu_diff();
acc_data and counts need to be reset between iterations, right here. Add:

caffe_gpu_set(bottom[0]->count(), Dtype(0), acc_data);
caffe_gpu_set(bottom[1]->count(), Dtype(0), counts);
@Noiredd resetting all the arrays is a bit of overkill. The bug is much simpler to fix: in line 32 there is a += for the accumulator:

acc[index] += (num_better_predictions < top_k);

Replacing it with a simple = assignment is enough to fix the bug.
BTW, it might be interesting to see if this fix will run even faster than the fix you proposed.
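To make the difference between the two fixes concrete, here is a minimal CPU-side sketch of the bug discussed above (names and structure are illustrative, not Caffe's actual kernel). Because the diff buffer is reused as scratch space and never cleared between forward passes, `+=` keeps accumulating stale hits, inflating the accuracy each iteration; plain `=` overwrites the stale value, so no explicit reset is needed:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical analogue of the accumulation bug: run two forward passes
// over the same un-cleared scratch buffer and return the mean "accuracy".
// hits[i] is 1 if sample i was a top_k hit, else 0.
double accuracy_two_passes(const std::vector<int>& hits,
                           bool use_plain_assignment) {
  std::vector<int> acc(hits.size(), 0);   // scratch buffer, cleared only once
  for (int pass = 0; pass < 2; ++pass) {  // two passes, buffer NOT reset between
    for (std::size_t i = 0; i < hits.size(); ++i) {
      if (use_plain_assignment)
        acc[i] = hits[i];    // fixed: overwrite the stale value
      else
        acc[i] += hits[i];   // buggy: accumulates across passes
    }
  }
  double sum = 0;
  for (int a : acc) sum += a;
  return sum / hits.size();
}
```

With 3 hits out of 4 samples, the fixed version reports 0.75 on both passes, while the buggy version doubles to 1.5 on the second pass, matching the "accuracy grows by (minibatch accuracy) * (minibatches per epoch)" symptom reported above.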
@shaibagon Yep, this works too and is clearer. Regarding speed, it shaves off about 30 more seconds (though bear in mind I didn't do any serious benchmarks with a clean test environment, repeated runs, averaged times, etc.).
@Noiredd Thank you!!
You really upgraded this PR with all your tests and benchmarks.
I ran a quick benchmark on a TITAN Z using FCN AlexNet on PASCAL VOC 2012 data (1464 training images, 1449 validation, 21 classes) at top-3 accuracy, training for 6 epochs and testing on the entire val set every epoch (7 times). Results look very good:
One thing I noticed is that I keep getting a 100% result at the first test iteration; current master (and #5734) outputs much smaller (and more believable) numbers. Over the next epochs it looks normal again. This seems to only happen for segmentation networks, regardless of params (
@Noiredd Thank you for spotting the bug! I updated
@shelhamer After the very thorough review of @Noiredd, I made these changes:

Thank you @Noiredd for your work to make this PR very high quality!
Force-pushed from 7ca82da to e370424.
@shaibagon I can confirm that accuracies look good now. What I'm still worried about is that first iteration (before training starts) when training FCNs. I load a "convolutionized" AlexNet and initialize it with a converted pretrained model, except the last layer, which is initialized with Gaussian noise. Results at the first iteration should be bad, and that's what the current master build says:
When I switch to this PR, this is what I get instead:
Accuracy = 1 should imply a per-class accuracy of 1 for every class, shouldn't it? In further iterations it drops to about 0.87, which is consistent with master. But this first one just doesn't make sense.

PS: @shelhamer will probably remind you about the
(1) Regarding the high accuracy at first iteration:
Took me a while to get to the bottom of this; it turned out to be a problem with my model. Due to bad initialization, it output all zeros on the first iteration, so your answer (1) explains it. My fault, of course. On the other hand, though: if P out of N elements of the output probability vector have equal, high values, and the real answer is also among them, should that be treated as a hit even if P >
@Noiredd You are making a good point here: the Accuracy layer should only count predictions that are strictly within top_k. If there are other predictions with equal probability, they should be counted as part of top_k. I'll make the necessary changes.
@shaibagon Just modifying the inequality in
@Noiredd does that also solve the issue of Accuracy=1 while per-class accuracy is less than 1?
@shaibagon Yes. Accuracy=1 and per-class accuracy<1 only happened when the network produced an all-zero output. With those changes, both total and per-class accuracy will be 0.
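The tie-handling discussed above can be sketched as follows (a minimal illustration under my own naming, not the final Caffe code): count competitors whose score is greater than or equal to the true label's score, so that ties consume the top_k budget. Under this rule, an all-zero output ties everywhere and is correctly scored as a miss instead of a hit:

```cpp
#include <cstddef>
#include <vector>

// Tie-aware top_k hit test (illustrative sketch). A prediction is a hit
// only if strictly fewer than top_k other classes score >= the true
// label's score; equal-probability competitors count against top_k.
bool top_k_hit_tie_aware(const std::vector<float>& scores,
                         int label, int top_k) {
  int better_or_equal = 0;  // competitors scoring >= the label's score
  for (std::size_t i = 0; i < scores.size(); ++i) {
    if (static_cast<int>(i) != label && scores[i] >= scores[label])
      ++better_or_equal;
  }
  return better_or_equal < top_k;
}
```

For a 21-class all-zero output (the FCN first-iteration case above), all 20 competitors tie with the label, so the prediction is a miss for any reasonable top_k, and both total and per-class accuracy come out 0 as stated.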
Force-pushed from e370424 to 237328d.
Force-pushed from 237328d to ddfd4d3.
@shelhamer @Noiredd Thank you for your efforts on improving this PR!
I have no further objections ;) Great contribution by @shaibagon!
Hi, I've just created a new issue (#5981) which may be related to this merge!
As noted by @Noiredd in PR #5734, the current implementation of the "Accuracy" layer is inefficient. This PR makes two contributions:
(1) An efficient CPU implementation that takes only O(L) per pixel for top_k prediction. Unlike PR #5734, there is no need for a fancy priority_queue or other complex data structures.
(2) A GPU implementation: no need to sync host/device memory.
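The O(L) idea can be sketched in a few lines (an illustrative sketch; the function name and signature are mine, not the PR's). Instead of sorting scores or maintaining a priority_queue, a single linear scan counts how many class scores strictly beat the true label's score; the prediction is a top_k hit iff that count is below top_k, mirroring the `num_better_predictions < top_k` test quoted in the review above:

```cpp
#include <vector>

// O(L) top_k hit test for one pixel, L = number of classes.
// One linear scan: count classes whose score strictly exceeds the
// true label's score; a hit iff fewer than top_k such classes exist.
bool top_k_hit_linear(const std::vector<float>& scores,
                      int label, int top_k) {
  int num_better_predictions = 0;
  for (float s : scores)
    if (s > scores[label]) ++num_better_predictions;
  return num_better_predictions < top_k;
}
```

For a per-pixel segmentation accuracy over H*W pixels this gives O(H*W*L) total, versus O(H*W*L*log k) for a heap-based top_k selection, and the branch-free inner loop maps naturally onto a GPU kernel with one thread per pixel.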