Skip to content

Conversation

LebedevRI
Copy link
Collaborator

@LebedevRI LebedevRI commented Jun 16, 2018

[Tooling] Enable U Test by default, add tooltip about repetition count.

As previously discussed, let's flip the switch ^^.

This exposes the problem that it will now be run
for everyone, even if one did not read the help
about the recommended repetition count.

This is not good. So i think we can do the smart thing:

$ ./compare.py benchmarks gbench/Inputs/test3_run{0,1}.json
Comparing gbench/Inputs/test3_run0.json to gbench/Inputs/test3_run1.json
Benchmark                   Time             CPU      Time Old      Time New       CPU Old       CPU New
--------------------------------------------------------------------------------------------------------
BM_One                   -0.1000         +0.1000            10             9           100           110
BM_Two                   +0.1111         -0.0111             9            10            90            89
BM_Two                   +0.2500         +0.1125             8            10            80            89
BM_Two_pvalue             0.2207          0.6831      U Test, Repetitions: 2. WARNING: Results unreliable! 9+ repetitions recommended.
BM_Two_stat              +0.0000         +0.0000             8             8            80            80

(old screenshot)
image

Or, in the good case (noise omitted):

s$ ./compare.py benchmarks /tmp/run{0,1}.json
Comparing /tmp/run0.json to /tmp/run1.json
Benchmark                                            Time             CPU      Time Old      Time New       CPU Old       CPU New
---------------------------------------------------------------------------------------------------------------------------------
<99 more rows like this>
./_T012014.RW2/threads:8/real_time                +0.0160         +0.0596            46            47            10            10
./_T012014.RW2/threads:8/real_time_pvalue          0.0000          0.0000      U Test, Repetitions: 100
./_T012014.RW2/threads:8/real_time_mean           +0.0094         +0.0609            46            47            10            10
./_T012014.RW2/threads:8/real_time_median         +0.0104         +0.0613            46            46            10            10
./_T012014.RW2/threads:8/real_time_stddev         -0.1160         -0.1807             1             1             0             0

(old screenshot)
image

@coveralls
Copy link

coveralls commented Jun 16, 2018

Coverage Status

Coverage decreased (-0.01%) to 87.157% when pulling 23df057 on LebedevRI:tools into 151ead6 on google:master.

@AppVeyorBot
Copy link

Build benchmark 1287 completed (commit a8263d98e1 by @LebedevRI)

if ((len(timings_time[0]) >= UTEST_MIN_REPETITIONS) and
(len(timings_time[1]) >= UTEST_MIN_REPETITIONS) and
(len(timings_cpu[0]) >= UTEST_MIN_REPETITIONS) and
(len(timings_cpu[1]) >= UTEST_MIN_REPETITIONS)):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

weird indent here.. is this correct?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is correct, but the indent is indeed wrong.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Funny, autopep8 actually insists on this weird indent.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as long as it does the right thing :)

['BM_Two', '+0.1111', '-0.0111', '9', '10', '90', '89'],
['BM_Two', '+0.2500', '+0.1125', '8', '10', '80', '89'],
['U-test', '(p-value)', '0.2207', '0.6831'],
['U-test', '(p-value)', '0.2207', '0.6831', 'Repetitions:', '2.', 'WARNING:', 'Results', 'unreliable!', '9+', 'repetitions', 'recommended.'],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a comment on output: should it have the bm name in the 'u-test' bit so people running the output through grep/ag/whatever will catch the u-test for the bm they want?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good idea..
How do you want it spelled, _utest_pvalue ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or just BM_Two_p_value ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to specify for which test the value is.
But i suppose i can do that next to the Repetitions: .

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, you expect we might have more than one test. that's fair. BM_Two_utest_pvalue then?

Copy link
Collaborator Author

@LebedevRI LebedevRI Jun 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But now that i have actually looked, having such a long suffix as _utest_pvalue, causes
the need to realign the columns (since the first column may now be wider).
So _utest_pvalue is better, but more troublesome.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So i have thought, and ok, let's just use _pvalue.
Avoids the headache with column width readjusting for now.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pushed the _pvalue variant.

As previously discussed, let's flip the switch ^^.

This exposes the problem that it will now be run
for everyone, even if one did not read the help
about the recommended repetition count.

This is not good. So i think we can do the smart thing:
```
$ ./compare.py benchmarks gbench/Inputs/test3_run{0,1}.json
Comparing gbench/Inputs/test3_run0.json to gbench/Inputs/test3_run1.json
Benchmark                   Time             CPU      Time Old      Time New       CPU Old       CPU New
--------------------------------------------------------------------------------------------------------
BM_One                   -0.1000         +0.1000            10             9           100           110
BM_Two                   +0.1111         -0.0111             9            10            90            89
BM_Two                   +0.2500         +0.1125             8            10            80            89
BM_Two_pvalue             0.2207          0.6831      U Test, Repetitions: 2. WARNING: Results unreliable! 9+ repetitions recommended.
BM_Two_stat              +0.0000         +0.0000             8             8            80            80
```
(old screenshot)
![image](https://user-images.githubusercontent.com/88600/41502182-ea25d872-71bc-11e8-9842-8aa049509b14.png)

Or, in the good case (noise omitted):
```
s$ ./compare.py benchmarks /tmp/run{0,1}.json
Comparing /tmp/run0.json to /tmp/run1.json
Benchmark                                            Time             CPU      Time Old      Time New       CPU Old       CPU New
---------------------------------------------------------------------------------------------------------------------------------
<99 more rows like this>
./_T012014.RW2/threads:8/real_time                +0.0160         +0.0596            46            47            10            10
./_T012014.RW2/threads:8/real_time_pvalue          0.0000          0.0000      U Test, Repetitions: 100
./_T012014.RW2/threads:8/real_time_mean           +0.0094         +0.0609            46            47            10            10
./_T012014.RW2/threads:8/real_time_median         +0.0104         +0.0613            46            46            10            10
./_T012014.RW2/threads:8/real_time_stddev         -0.1160         -0.1807             1             1             0             0
```
(old screenshot)
![image](https://user-images.githubusercontent.com/88600/41502185-fb8193f4-71bc-11e8-85fa-cbba83e39db4.png)
@dmah42 dmah42 merged commit 7d03f2d into google:master Jun 18, 2018
@LebedevRI LebedevRI deleted the tools branch June 18, 2018 12:03
@LebedevRI
Copy link
Collaborator Author

@dominichamon thank you!

JBakamovic pushed a commit to JBakamovic/benchmark that referenced this pull request Dec 6, 2018
…t. (google#617)

As previously discussed, let's flip the switch ^^.

This exposes the problem that it will now be run
for everyone, even if one did not read the help
about the recommended repetition count.

This is not good. So i think we can do the smart thing:
```
$ ./compare.py benchmarks gbench/Inputs/test3_run{0,1}.json
Comparing gbench/Inputs/test3_run0.json to gbench/Inputs/test3_run1.json
Benchmark                   Time             CPU      Time Old      Time New       CPU Old       CPU New
--------------------------------------------------------------------------------------------------------
BM_One                   -0.1000         +0.1000            10             9           100           110
BM_Two                   +0.1111         -0.0111             9            10            90            89
BM_Two                   +0.2500         +0.1125             8            10            80            89
BM_Two_pvalue             0.2207          0.6831      U Test, Repetitions: 2. WARNING: Results unreliable! 9+ repetitions recommended.
BM_Two_stat              +0.0000         +0.0000             8             8            80            80
```
(old screenshot)
![image](https://user-images.githubusercontent.com/88600/41502182-ea25d872-71bc-11e8-9842-8aa049509b14.png)

Or, in the good case (noise omitted):
```
s$ ./compare.py benchmarks /tmp/run{0,1}.json
Comparing /tmp/run0.json to /tmp/run1.json
Benchmark                                            Time             CPU      Time Old      Time New       CPU Old       CPU New
---------------------------------------------------------------------------------------------------------------------------------
<99 more rows like this>
./_T012014.RW2/threads:8/real_time                +0.0160         +0.0596            46            47            10            10
./_T012014.RW2/threads:8/real_time_pvalue          0.0000          0.0000      U Test, Repetitions: 100
./_T012014.RW2/threads:8/real_time_mean           +0.0094         +0.0609            46            47            10            10
./_T012014.RW2/threads:8/real_time_median         +0.0104         +0.0613            46            46            10            10
./_T012014.RW2/threads:8/real_time_stddev         -0.1160         -0.1807             1             1             0             0
```
(old screenshot)
![image](https://user-images.githubusercontent.com/88600/41502185-fb8193f4-71bc-11e8-85fa-cbba83e39db4.png)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants