[Tooling] Enable U Test by default, add tooltip about repetition count. #617

LebedevRI · 2018-06-16T20:28:54Z

[Tooling] Enable U Test by default, add tooltip about repetition count.

As previously discussed, let's flip the switch ^^.

Enabling it by default exposes a problem: the U Test will now be run
for everyone, including users who never read the help text
about the recommended repetition count.

This is not good, so I think we can do the smart thing:

$ ./compare.py benchmarks gbench/Inputs/test3_run{0,1}.json
Comparing gbench/Inputs/test3_run0.json to gbench/Inputs/test3_run1.json
Benchmark                   Time             CPU      Time Old      Time New       CPU Old       CPU New
--------------------------------------------------------------------------------------------------------
BM_One                   -0.1000         +0.1000            10             9           100           110
BM_Two                   +0.1111         -0.0111             9            10            90            89
BM_Two                   +0.2500         +0.1125             8            10            80            89
BM_Two_pvalue             0.2207          0.6831      U Test, Repetitions: 2. WARNING: Results unreliable! 9+ repetitions recommended.
BM_Two_stat              +0.0000         +0.0000             8             8            80            80

(old screenshot)

Or, in the good case (noise omitted):

$ ./compare.py benchmarks /tmp/run{0,1}.json
Comparing /tmp/run0.json to /tmp/run1.json
Benchmark                                            Time             CPU      Time Old      Time New       CPU Old       CPU New
---------------------------------------------------------------------------------------------------------------------------------
<99 more rows like this>
./_T012014.RW2/threads:8/real_time                +0.0160         +0.0596            46            47            10            10
./_T012014.RW2/threads:8/real_time_pvalue          0.0000          0.0000      U Test, Repetitions: 100
./_T012014.RW2/threads:8/real_time_mean           +0.0094         +0.0609            46            47            10            10
./_T012014.RW2/threads:8/real_time_median         +0.0104         +0.0613            46            46            10            10
./_T012014.RW2/threads:8/real_time_stddev         -0.1160         -0.1807             1             1             0             0

(old screenshot)
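The two outputs above differ only in the note printed next to the p-values. A minimal sketch of how such a note could be built, under stated assumptions: the function name `utest_note`, the constant `UTEST_RECOMMENDED_REPETITIONS`, and the minimum of 2 are hypothetical choices for illustration, not taken from the PR's code. Only the message text and the recommended count of 9 come from the output shown above.

```python
# Assumed thresholds for this sketch. The output above recommends 9+
# repetitions; the minimum of 2 is a guess based on the first example,
# where a p-value is reported for 2 repetitions.
UTEST_MIN_REPETITIONS = 2
UTEST_RECOMMENDED_REPETITIONS = 9


def utest_note(repetitions):
    """Return the note shown next to the p-values, or None if the test is skipped."""
    if repetitions < UTEST_MIN_REPETITIONS:
        # Too few samples to run the test at all.
        return None
    note = "U Test, Repetitions: {}".format(repetitions)
    if repetitions < UTEST_RECOMMENDED_REPETITIONS:
        # Enough samples to compute a value, but not enough to trust it.
        note += ". WARNING: Results unreliable! {}+ repetitions recommended.".format(
            UTEST_RECOMMENDED_REPETITIONS)
    return note


print(utest_note(2))
print(utest_note(100))
```

With these assumed thresholds, the two calls reproduce the note text from the two example outputs.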

coveralls · 2018-06-16T20:43:08Z

Coverage decreased (-0.01%) to 87.157% when pulling 23df057 on LebedevRI:tools into 151ead6 on google:master.

AppVeyorBot · 2018-06-16T21:30:21Z

✅ Build benchmark 1287 completed (commit a8263d98e1 by @LebedevRI)

dmah42 · 2018-06-18T09:29:21Z

tools/gbench/report.py

+            if ((len(timings_time[0]) >= UTEST_MIN_REPETITIONS) and
+                (len(timings_time[1]) >= UTEST_MIN_REPETITIONS) and
+                (len(timings_cpu[0]) >= UTEST_MIN_REPETITIONS) and
+                    (len(timings_cpu[1]) >= UTEST_MIN_REPETITIONS)):


weird indent here.. is this correct?

This is correct, but the indent is indeed wrong.

Funny, autopep8 actually insists on this weird indent.

as long as it does the right thing :)
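For illustration only, the same four-way check can be written so that the continuation-line indent question does not come up. This is a sketch, not what the PR merged: the helper name `enough_repetitions` and the value of `UTEST_MIN_REPETITIONS` are assumptions, and the inputs are taken to be two-element sequences of sample lists (old run, new run), as the indexing in the diff suggests.

```python
UTEST_MIN_REPETITIONS = 2  # assumed value for this sketch


def enough_repetitions(timings_time, timings_cpu, minimum=UTEST_MIN_REPETITIONS):
    """True if every sample list (old/new, time/CPU) has at least `minimum` entries."""
    samples = (timings_time[0], timings_time[1], timings_cpu[0], timings_cpu[1])
    return all(len(sample) >= minimum for sample in samples)


# Two repetitions on each side: the check passes.
print(enough_repetitions([[9, 8], [10, 10]], [[90, 80], [89, 89]]))
# Only one repetition in the old real-time samples: the check fails.
print(enough_repetitions([[9], [10, 10]], [[90, 80], [89, 89]]))
```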

dmah42 · 2018-06-18T09:30:11Z

tools/gbench/report.py

            ['BM_Two', '+0.1111', '-0.0111', '9', '10', '90', '89'],
            ['BM_Two', '+0.2500', '+0.1125', '8', '10', '80', '89'],
-            ['U-test', '(p-value)', '0.2207', '0.6831'],
+            ['U-test', '(p-value)', '0.2207', '0.6831', 'Repetitions:', '2.', 'WARNING:', 'Results', 'unreliable!', '9+', 'repetitions', 'recommended.'],


A comment on the output: should the 'U-test' row include the benchmark name, so that people running the output through grep/ag/whatever will also catch the U-test row for the benchmark they want?

Ah, good idea..
How do you want it spelled, _utest_pvalue ?

or just BM_Two_p_value ?

I wanted to specify which test the value is for.
But I suppose I can do that next to the Repetitions: field.

ah, you expect we might have more than one test. that's fair. BM_Two_utest_pvalue then?

But now that I have actually looked, a suffix as long as _utest_pvalue means
the columns need to be realigned (since the first column may now be wider).
So _utest_pvalue is better, but more troublesome.

So I have thought about it, and OK, let's just use _pvalue.
That avoids the headache of readjusting the column widths for now.

Pushed the _pvalue variant.
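The trade-off discussed in this thread can be sketched as follows. Both helper names (`pvalue_row_name`, `first_column_width`) are hypothetical and are not taken from report.py. The sketch only shows why a longer suffix forces a wider first column, and why a row name that starts with the benchmark name is caught by a grep for that benchmark.

```python
def pvalue_row_name(benchmark_name, suffix="_pvalue"):
    """Name of the p-value row for a benchmark, e.g. BM_Two -> BM_Two_pvalue."""
    return benchmark_name + suffix


def first_column_width(names, header="Benchmark"):
    """Width the first column needs to fit the header and every row name."""
    return max([len(header)] + [len(name) for name in names])


names = ["BM_One", "BM_Two", pvalue_row_name("BM_Two")]
width = first_column_width(names)
for name in names:
    # Left-justify so the following columns stay aligned.
    print(name.ljust(width) + " |")

# The longer suffix that was considered needs a wider first column.
print(first_column_width(["BM_Two", pvalue_row_name("BM_Two", "_utest_pvalue")]))
```

Because the row name begins with `BM_Two`, filtering the report for that benchmark name also returns its p-value row.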


LebedevRI · 2018-06-18T12:03:37Z

@dominichamon thank you!


googlebot added the cla: yes label Jun 16, 2018

dmah42 reviewed Jun 18, 2018

View reviewed changes

LebedevRI force-pushed the tools branch from 157af34 to abd256b Compare June 18, 2018 11:21

LebedevRI force-pushed the tools branch from abd256b to 23df057 Compare June 18, 2018 11:22

dmah42 merged commit 7d03f2d into google:master Jun 18, 2018

LebedevRI deleted the tools branch June 18, 2018 12:03
