
Conversation

shaibagon (Member) commented Aug 10, 2017

As noted by @Noiredd in PR #5734, the current implementation of the "Accuracy" layer is inefficient.
This PR makes two contributions:
(1) An efficient CPU implementation that takes only O(L) per pixel for a top_k prediction. Unlike PR #5734, it needs no priority_queue or other complex data structures.
(2) A GPU implementation that needs no host/device memory sync.
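
For intuition, the counting idea behind contribution (1) can be sketched as below. This is an illustrative sketch, not the code in this PR; the helper name IsTopK is hypothetical.

```cpp
// The true label is a top-k hit iff fewer than top_k classes score
// strictly higher than it: a single O(L) pass over the L class
// probabilities, with no sorting and no priority_queue.
template <typename Dtype>
bool IsTopK(const Dtype* prob, int num_labels, int true_label, int top_k) {
  const Dtype prob_of_true = prob[true_label];
  int num_better_predictions = 0;
  for (int k = 0; k < num_labels; ++k) {
    num_better_predictions += (prob[k] > prob_of_true);
  }
  return num_better_predictions < top_k;
}
```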

@shaibagon force-pushed the upgrade_accuracy branch 2 times, most recently from cda371e to b4ab3c5 on August 13, 2017 08:54
@shaibagon closed this Aug 13, 2017
@shaibagon reopened this Aug 13, 2017
shelhamer (Member) commented

Thanks for boosting the efficiency!

Sorry for the total nitpick, but can you fix the file permission changes that switch 100644 → 100755 for merge?

shaibagon (Member, Author) commented

@shelhamer Thank you for focusing on this PR.
I just chmod'ed the files and committed.
The auto-tests seem to pass smoothly.

Is there anything else I can do to ease the process of accepting this PR?

Noiredd (Member) commented Aug 21, 2017

I just gave this PR a spin, but the results don't look right. At first I noticed my DIGITS plots were broken, with accuracy going out of scale. Then I dug a little and found something weird: the accuracy tends to increase steadily even when the network is already at a loss minimum, by about (accuracy for one minibatch) * (number of minibatches in a test epoch) every epoch. This is just something I observed after 2 runs on 2 different nets and datasets (no time to investigate deeper at the moment), so it may be a far-fetched theory, but something is definitely off. Or maybe it's just me? Has anyone tested it in action besides doing make runtest?

EDIT: The CPU version of this works well. Is it possible that you forgot to reset the accumulators (acc_data, counts) at the beginning of AccuracyLayer::Forward_gpu(), so that they keep integrating correct answers across epochs?

EDIT2: Yep, I can confirm that is the case. Added a relevant comment below.


// Similarly, this memory is never used elsewhere, and thus we can use it
// to avoid having to allocate additional GPU memory.
Dtype* counts = bottom[1]->mutable_gpu_diff();
Noiredd (Member) commented Aug 22, 2017

acc_data and counts need to be reset between iterations, right here. Add:

caffe_gpu_set(bottom[0]->count(), Dtype(0), acc_data);
caffe_gpu_set(bottom[1]->count(), Dtype(0), counts);

shaibagon (Member, Author) commented Aug 24, 2017

@Noiredd resetting the whole arrays is a bit of overkill. The bug has a much simpler fix: on line 32 there is a += on the accumulator:

acc[index] += (num_better_predictions < top_k);

Replacing it with a simple = assignment is enough to fix the bug.

By the way, it might be interesting to see whether this fix runs even faster than the one you proposed.
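
To see why plain assignment suffices, here is a simplified sketch of the kernel's shape (illustrative only: it ignores spatial dimensions and ignore_label, and the signature is not the actual one). Each thread writes acc[index] exactly once per forward pass, so = overwrites whatever stale value the reused diff buffer holds, whereas += accumulates it across iterations.

```cpp
template <typename Dtype>
__global__ void AccuracyForwardGPU(const int nthreads, const Dtype* prob,
    const Dtype* label, const int num_labels, const int top_k, Dtype* acc) {
  CUDA_KERNEL_LOOP(index, nthreads) {
    const int true_label = static_cast<int>(label[index]);
    const Dtype prob_of_true = prob[index * num_labels + true_label];
    int num_better_predictions = 0;
    for (int k = 0; k < num_labels; ++k) {
      num_better_predictions += (prob[index * num_labels + k] > prob_of_true);
    }
    // '=' rather than '+=': acc lives in a diff buffer that is reused
    // across iterations without being cleared.
    acc[index] = (num_better_predictions < top_k);
  }
}
```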

Noiredd (Member) commented

@shaibagon Yep, this works too and is clearer. Regarding speed, it shaves off about 30 more seconds (though bear in mind I didn't do any serious benchmarking: no clean test environment, repeated runs, averaged times, etc.).

shaibagon (Member, Author) commented

@Noiredd Thank you!!
You really upgraded this PR with all your tests and benchmarks.

Noiredd (Member) commented Aug 23, 2017

I ran a quick benchmark on a TITAN Z using FCN AlexNet on PASCAL VOC 2012 data (1464 training images, 1449 validation, 21 classes) at top-3 accuracy, training for 6 epochs and testing on the entire val set every epoch (7 times in total). The results look very good.

One thing I noticed: I keep getting a 100% result at the first test iteration, whereas current master (and #5734) output much smaller (and more believable) numbers. Over the next epochs it looks normal again. This seems to happen only for segmentation networks, regardless of the params (top_k, ignore_label).

shaibagon (Member, Author) commented

@Noiredd Thank you for spotting the bug! I updated test_accuracy_layer.cpp so that it runs each forward test for several iterations: this way the test reproduces the error, and the fix can be confirmed by the automatic testing.
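
The added test pattern is roughly the following (an illustrative sketch in the style of Caffe's gtest-based layer tests, not the exact committed code; kNumIterations and expected_accuracy are hypothetical names):

```cpp
// Running Forward repeatedly on the same layer instance exposes the bug:
// with the '+=' accumulation, the reported accuracy grows on every pass,
// so the check fails from the second iteration on.
const int kNumIterations = 3;
for (int iter = 0; iter < kNumIterations; ++iter) {
  layer.Forward(this->blob_bottom_vec_, this->blob_top_vec_);
  EXPECT_NEAR(this->blob_top_vec_[0]->data_at(0, 0, 0, 0),
              expected_accuracy, 1e-4);
}
```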

@shaibagon closed this Aug 24, 2017

shaibagon (Member, Author) commented Aug 24, 2017

@shelhamer Following @Noiredd's very thorough review, I made these changes:
(1) Updated test_accuracy_layer.cpp to run all tests for 3 iterations. This change catches the bug in the auto tests.
(2) Fixed the bug in accuracy_layer.cu and verified that the new tests pass.

Thank you @Noiredd for your work in making this PR very high quality!

@shaibagon reopened this Aug 24, 2017
@shaibagon force-pushed the upgrade_accuracy branch 2 times, most recently from 7ca82da to e370424 on August 24, 2017 07:25
Noiredd (Member) commented Aug 24, 2017

@shaibagon I can confirm that the accuracies look good now. What still worries me is the first test iteration (before training starts) when training FCNs. I load a "convolutionized" AlexNet and initialize it from a converted pretrained model, except for the last layer, which is initialized with Gaussian noise. Results at the first iteration should be bad, and that's what the current master build reports:

I0824 10:35:16.934343 29462 solver.cpp:397]     Test net output #0: accuracy = 0.0374341
I0824 10:35:16.934394 29462 solver.cpp:397]     Test net output #1: accuracy_perclass = 0
I0824 10:35:16.934404 29462 solver.cpp:397]     Test net output #2: accuracy_perclass = 0
I0824 10:35:16.934412 29462 solver.cpp:397]     Test net output #3: accuracy_perclass = 0
I0824 10:35:16.934418 29462 solver.cpp:397]     Test net output #4: accuracy_perclass = 0
I0824 10:35:16.934425 29462 solver.cpp:397]     Test net output #5: accuracy_perclass = 0
I0824 10:35:16.934432 29462 solver.cpp:397]     Test net output #6: accuracy_perclass = 0
I0824 10:35:16.934438 29462 solver.cpp:397]     Test net output #7: accuracy_perclass = 0
I0824 10:35:16.934445 29462 solver.cpp:397]     Test net output #8: accuracy_perclass = 0
I0824 10:35:16.934453 29462 solver.cpp:397]     Test net output #9: accuracy_perclass = 0
I0824 10:35:16.934459 29462 solver.cpp:397]     Test net output #10: accuracy_perclass = 0
I0824 10:35:16.934466 29462 solver.cpp:397]     Test net output #11: accuracy_perclass = 0
I0824 10:35:16.934473 29462 solver.cpp:397]     Test net output #12: accuracy_perclass = 0
I0824 10:35:16.934479 29462 solver.cpp:397]     Test net output #13: accuracy_perclass = 0
I0824 10:35:16.934486 29462 solver.cpp:397]     Test net output #14: accuracy_perclass = 0
I0824 10:35:16.934494 29462 solver.cpp:397]     Test net output #15: accuracy_perclass = 0
I0824 10:35:16.934499 29462 solver.cpp:397]     Test net output #16: accuracy_perclass = 0
I0824 10:35:16.934506 29462 solver.cpp:397]     Test net output #17: accuracy_perclass = 0
I0824 10:35:16.934514 29462 solver.cpp:397]     Test net output #18: accuracy_perclass = 0
I0824 10:35:16.934520 29462 solver.cpp:397]     Test net output #19: accuracy_perclass = 0.0621118
I0824 10:35:16.934526 29462 solver.cpp:397]     Test net output #20: accuracy_perclass = 0.057971
I0824 10:35:16.934533 29462 solver.cpp:397]     Test net output #21: accuracy_perclass = 0.0510697

When I switch to this PR, this is what I get instead:

I0824 10:27:48.350045 29098 solver.cpp:397]     Test net output #0: accuracy = 1
I0824 10:27:48.350124 29098 solver.cpp:397]     Test net output #1: accuracy_perclass = 0.995859
I0824 10:27:48.350136 29098 solver.cpp:397]     Test net output #2: accuracy_perclass = 0.0621118
I0824 10:27:48.350144 29098 solver.cpp:397]     Test net output #3: accuracy_perclass = 0.0545204
I0824 10:27:48.350152 29098 solver.cpp:397]     Test net output #4: accuracy_perclass = 0.0710835
I0824 10:27:48.350158 29098 solver.cpp:397]     Test net output #5: accuracy_perclass = 0.0496894
I0824 10:27:48.350164 29098 solver.cpp:397]     Test net output #6: accuracy_perclass = 0.0662526
I0824 10:27:48.350172 29098 solver.cpp:397]     Test net output #7: accuracy_perclass = 0.0510697
I0824 10:27:48.350178 29098 solver.cpp:397]     Test net output #8: accuracy_perclass = 0.0876467
I0824 10:27:48.350185 29098 solver.cpp:397]     Test net output #9: accuracy_perclass = 0.0821256
I0824 10:27:48.350191 29098 solver.cpp:397]     Test net output #10: accuracy_perclass = 0.0848861
I0824 10:27:48.350198 29098 solver.cpp:397]     Test net output #11: accuracy_perclass = 0.0489993
I0824 10:27:48.350205 29098 solver.cpp:397]     Test net output #12: accuracy_perclass = 0.0517598
I0824 10:27:48.350213 29098 solver.cpp:397]     Test net output #13: accuracy_perclass = 0.0883368
I0824 10:27:48.350219 29098 solver.cpp:397]     Test net output #14: accuracy_perclass = 0.0545204
I0824 10:27:48.350225 29098 solver.cpp:397]     Test net output #15: accuracy_perclass = 0.05245
I0824 10:27:48.350232 29098 solver.cpp:397]     Test net output #16: accuracy_perclass = 0.307798
I0824 10:27:48.350239 29098 solver.cpp:397]     Test net output #17: accuracy_perclass = 0.0586611
I0824 10:27:48.350245 29098 solver.cpp:397]     Test net output #18: accuracy_perclass = 0.0393375
I0824 10:27:48.350252 29098 solver.cpp:397]     Test net output #19: accuracy_perclass = 0.0621118
I0824 10:27:48.350260 29098 solver.cpp:397]     Test net output #20: accuracy_perclass = 0.057971
I0824 10:27:48.350265 29098 solver.cpp:397]     Test net output #21: accuracy_perclass = 0.0510697

Accuracy = 1 should imply a per-class accuracy of 1 for every class, shouldn't it? In later iterations it drops to about 0.87, which is consistent with master. But this first one just doesn't make sense.

PS: @shelhamer will probably remind you about the chmod - you're setting 100755 again :P

shaibagon (Member, Author) commented

(1) Regarding the high accuracy at the first iteration: it might be that all predictions have the same probability. This implementation counts only predictions with a strictly greater probability than the true label's, so when all probabilities are equal the prediction is counted as "accurate".
(2) Regarding accuracy = 1 while per-class accuracy is much lower: that's odd. Does it happen on GPU? CPU? Both?
(3) File permissions :( I'll fix them. Again.

Noiredd (Member) commented Aug 24, 2017

It took me a while to get to the bottom of this; it turned out to be a problem with my model. Due to bad initialization, it output all zeros at the first iteration, so your answer (1) explains it. My fault, of course.

On the other hand, if P out of N elements of the output probability vector have equal, high values, and the true answer is also among them, should that be treated as a hit even when P > top_k? From the point of view of a deployed model, such an answer is not very informative.

shaibagon (Member, Author) commented

@Noiredd You are making a good point here: the Accuracy layer should only count predictions that fall strictly within the top_k, and other predictions with equal probability should count against the top_k budget.

I'll make the necessary changes.

Noiredd (Member) commented Aug 25, 2017

@shaibagon Just changing the inequality in the num_better_predictions increment from strict to >= does it. Note that we then need to initialize num_better_predictions to -1, or otherwise account for the one element that will always be equal to prob_of_true_class.
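
A minimal sketch of the tie-aware count, using the names from the discussion above (not the exact committed code; the wrapper function is hypothetical):

```cpp
// Tie-aware top-k test: a prediction is a hit only if fewer than top_k
// classes score >= the true label's probability. Starting at -1 cancels
// the true label comparing equal to itself.
template <typename Dtype>
bool IsTopKHit(const Dtype* prob, int num_labels, int true_label, int top_k) {
  const Dtype prob_of_true_class = prob[true_label];
  int num_better_predictions = -1;
  for (int k = 0; k < num_labels; ++k) {
    num_better_predictions += (prob[k] >= prob_of_true_class);
  }
  // With an all-zero (fully tied) output the count becomes num_labels - 1,
  // which is >= top_k for any reasonable top_k, so the prediction is
  // correctly counted as a miss.
  return num_better_predictions < top_k;
}
```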

shaibagon (Member, Author) commented

@Noiredd does that also solve the issue with Accuracy = 1 while per-class accuracy is less than 1?

Noiredd (Member) commented Aug 25, 2017

@shaibagon Yes. Accuracy=1 and per-class accuracy<1 only happened when the network produced an all-zero output. With those changes, both total and per-class accuracy will be 0.

shaibagon (Member, Author) commented

@shelhamer @Noiredd
(1) The issue with many predictions having the same probability is fixed.
(2) File permissions fixed (again). Hopefully this time it will last.

Thank you for your efforts on improving this PR!

Noiredd (Member) commented Aug 28, 2017

I have no further objections ;) Great contribution by @shaibagon!

@shaibagon merged commit 62e0c85 into BVLC:master on Oct 10, 2017
@shaibagon deleted the upgrade_accuracy branch on October 10, 2017 17:30
vlomonaco commented Oct 15, 2017

Hi, I've just created a new issue (#5981) which may be related to this merge!
