
Conversation

dmah42
Member

@dmah42 dmah42 commented Sep 8, 2017

The only supported reporter in the library outputs JSON. Tooling has
been added to generate CSV and Tabular reports from the JSON output.

This change is incomplete and is an RFC. Known issues: lack of testing
for tooling, some documentation needs to be updated, no colour support
in tabular output tooling.

Enables #441, #190.

@dmah42 dmah42 self-assigned this Sep 8, 2017
@dmah42 dmah42 requested a review from EricWF September 8, 2017 17:37
@coveralls

Coverage Status

Coverage decreased (-5.3%) to 81.765% when pulling e323aed on reportercleanup into 886585a on master.

@ldionne
Contributor

ldionne commented Sep 10, 2017

I really, really want the ability to easily generate charts from Google Benchmark's output (that is how I found this PR in the first place). However, I don't see at all how removing native support for useful reporters is going to help that goal. I'd personally like to keep the console output as the default, as it is super useful. Am I missing something?

@dmah42
Member Author

dmah42 commented Sep 11, 2017

@ldionne I expect that any chart generation (or other custom output) will work best from JSON, for starters, so making that the default makes sense. Once we have this, adding a Python tool to generate charts from a series of JSON outputs is trivial (rather than adding it to the library).

Getting the tabular output is still simple:

$ tools/tabular_report.py <benchmark binary>

but with this change the library itself is simpler and this opens the door to much more functionality on the reporting side.
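For illustration, a minimal sketch of what such a JSON-to-tabular tool could look like. This is not the actual `tools/tabular_report.py`; the `tabulate` helper is hypothetical, and the field names are assumptions based on the shape of the library's existing JSON output (`"benchmarks"` entries with `"name"`, `"real_time"`, `"cpu_time"`, `"iterations"`):

```python
# Hypothetical sketch of a JSON -> tabular converter; field names are
# assumed from the library's JSON reporter output, not guaranteed.
import json

def tabulate(report_json):
    """Render a benchmark JSON report as a fixed-width table string."""
    runs = json.loads(report_json)["benchmarks"]
    header = ("Benchmark", "Time", "CPU", "Iterations")
    rows = [header] + [
        (r["name"], str(r["real_time"]), str(r["cpu_time"]), str(r["iterations"]))
        for r in runs
    ]
    # Pad every column to the widest cell so the columns line up.
    widths = [max(len(row[i]) for row in rows) for i in range(len(header))]
    return "\n".join(
        "  ".join(cell.ljust(w) for cell, w in zip(row, widths)) for row in rows
    )

if __name__ == "__main__":
    sample = ('{"benchmarks": [{"name": "BM_Foo", "real_time": 12, '
              '"cpu_time": 11, "iterations": 1000}]}')
    print(tabulate(sample))
```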

@coveralls

Coverage Status

Coverage decreased (-5.3%) to 81.765% when pulling b7f0a93 on reportercleanup into 886585a on master.

@ldionne
Contributor

ldionne commented Sep 11, 2017

We can already have JSON output, and so we should be able to generate charts from those right now (given scripts to generate charts from JSON). I just don't see what removing the default reporters we currently have will bring to the table, and I'll be very sad if I need to use an additional command-line tool just to translate that to tabular output.

@dmah42
Member Author

dmah42 commented Sep 12, 2017

One thing it brings is a smaller library (370kb instead of 440kb for release builds on Linux), but that's a minor benefit. The broader benefit relates to the requests we keep getting for custom reporting methods. Instead of having two classes of output, those in the library and those supported by external tools, everything is simpler, both in terms of implementation and conceptually, if reporting in the library is done one way and external tooling provides the more sophisticated support.

However, this is a user-facing change, so I'm definitely interested in your (@ldionne) and others' opinions.

One option is to keep the tabular and json reporters in the library, remove the CSV reporter, and then have tooling to support the csv, markdown, HTML use-cases. This would be less of a user-facing change, but would at least start us down the right path.

@coveralls

Coverage Status

Coverage decreased (-5.3%) to 81.765% when pulling 1846e27 on reportercleanup into 886585a on master.

@LebedevRI
Collaborator

While I totally understand the reasoning (less duplication, avoiding the addition of ever more reporters, less code), I'm not sure I like the idea of having to use tools just to get a partially readable tabular report...

One option is to keep the tabular and json reporters in the library, remove the CSV reporter, and then have tooling to support the csv, markdown, HTML use-cases.

So I'd personally prefer this middle ground.

@ldionne
Contributor

ldionne commented Sep 12, 2017

everything is simpler both in terms of implementation and conceptually, if reporting in the library is done one way

I fully agree. On the other hand, not having the tabular format be the default is a huge usability regression for users IMO, and that should weigh more than internal code clarity. My use case is registering benchmarks as CMake targets and running them as part of my unit tests. This includes on Travis CI. Having the tabular output is very convenient because I can see the result of my benchmarks without any post-processing, wherever I run the benchmarks. For anything fancier, of course, I'd use JSON and post-process it myself. I personally don't have a use case for CSV and I don't care about it.

Perhaps one way you could go is to have only the JSON reporter, but then implement the tabular reporting natively on top of JSON. This way, you may get some of the simplifications of having the library pretend it only supports JSON, while still being able to do tabular reports natively. I haven't looked at the code much, so perhaps this makes no sense.

@dmah42
Member Author

dmah42 commented Sep 13, 2017

On the other hand, not having the tabular format be the default is a huge usability regression for users IMO, and that should weigh more than internal code clarity.

Not having it at all would be a usability regression. I'm not removing it, just removing it from the library.

My use case is registering benchmarks as CMake targets and running them as part of my unit tests. This includes on Travis CI. Having the tabular output is very convenient because I can see the result of my benchmarks without any post-processing, wherever I run the benchmarks.

I think this is still possible, but you may need an add_custom_command or execute_process to run the wrapper tool. It's not as clean as running the binary directly, but I do think the reduced complexity of the native code might be worth it.

I'm looking into the compromise of having the tabular and JSON reporters, but not the totally custom reporters, instead supporting that through python tooling.

@ldionne
Contributor

ldionne commented Sep 13, 2017

Not having it at all would be a usability regression. I'm not removing it, just removing it from the library.

Which is a usability regression just the same. For example, what if I don't have Python available on my system? It also means that GoogleBenchmark would somehow need to install those Python scripts somewhere when you install the library; you can't assume that we're using the library checked out and built from source.

I'm looking into the compromise of having the tabular and JSON reporters, but not the totally custom reporters, instead supporting that through python tooling.

The only reason why I'm okay with this is that I don't care about other reports ATM. However, I could easily see someone opposing it with the same argument I use to oppose the removal of the console reporter, and that would be valid.

@pleroy
Contributor

pleroy commented Nov 27, 2017

I am going to side with @ldionne and claim that this PR is a horrible idea. Depending on external tooling is pushing the burden of conversion on the user, and is making the library much less useful. Now I need to install Python on all the machines that run benchmarks, find a random external tool that does the conversion I need, install that, and keep all this mess running. Oh, and it gets worse when someone decides to write a conversion tool in Perl or Ruby or whatever. Also, I frankly don't quite understand why an external Python tool is going to be easier to maintain than a simple C++ reporter.

If this PR went in I would probably be forced to stop upgrading to newer versions of this library, which would annoy me greatly.

@EricWF
Contributor

EricWF commented Nov 27, 2017

@pleroy I hear and understand your concerns. But the approach I'm exploring is different than that proposed in this PR.

Instead of moving toward external tooling, I propose moving away from it. Step one will be to introduce a fully featured JSON library, which will be used to communicate arbitrary data structures between the user and the library.

The library will continue to provide console/tabular reporting as well.

However, w.r.t. #488: the CSV reporter is just fundamentally incompatible with the structure of the benchmark data, which is too complex to emit usefully as a flat list of comma-separated values.
It simply can't be a quality tool, and so I think that it, specifically, still needs to be removed.

@pleroy
Contributor

pleroy commented Nov 28, 2017

Ah, I misunderstood what you were doing.

I agree that CSV is not a great format, and probably not suitable for complex datasets. I could imagine using CSV to provide essentially the same data as the console/tabular reporting, but it's hard to justify the complexity, with all the quoting and text massaging that needs to happen.
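For what it's worth, the quoting and text massaging mentioned above is exactly what an external tooling layer would get for free from a standard library. As a generic sketch (not part of this PR's tooling), Python's csv module handles embedded commas, quotes, and newlines automatically:

```python
# Generic illustration: the csv module applies minimal quoting, so a
# benchmark name containing a comma is escaped correctly without any
# hand-rolled quoting logic in the library itself.
import csv
import io

def rows_to_csv(rows):
    """Serialize a list of rows (lists of strings) to CSV text."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()

if __name__ == "__main__":
    # "BM_Map<int, int>" contains a comma and gets quoted automatically.
    print(rows_to_csv([["name", "real_time"], ["BM_Map<int, int>", "42"]]))
```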

I am a wee bit annoyed by the notion of using JSON as the "pivot" format: I'd have preferred to use protobuf (the alpha and omega of all things Google) for that purpose, as it avoids messy parsing/unparsing. If for instance someone wants to produce XML (an eminently reasonable thing to do on Windows) it seems rather awkward to first produce JSON and then reparse it to produce XML. My sense of good taste is a bit offended ;-)

@dmah42
Member Author

dmah42 commented Dec 1, 2017

I'm going to close this RFC in favour of @EricWF's plan.

FTR, I'm happy to lose CSV and to use something as an intermediate format. JSON is more open-sourcey, protobuf more Googley; I'm not sure there's much between them for this use-case.

@dmah42 dmah42 closed this Dec 1, 2017
@EricWF
Contributor

EricWF commented Dec 13, 2017

I am a wee bit annoyed by the notion of using JSON as the "pivot" format: I'd have preferred to use protobuf (the alpha and omega of all things Google) for that purpose, as it avoids messy parsing/unparsing.

Three points. First, I'm proposing introducing an existing JSON library to handle all the parsing/unparsing of JSON, so we don't need to maintain that code ourselves (in particular: https://github.com/nlohmann/json).

Second, by using the JSON library, we can make the JSON representation less of a pivot format and instead the primary format by which we encode all results and inputs, both internally within the library and via the external APIs. Users won't be given JSON strings, but instead a JSON object representing the structured data.

Third, the real issue I'm trying to address is the limitation of strongly-typed input/output data types such as BenchmarkReporter::Run (which is also why I didn't consider protobuf). The problem with the Run type is that it's limited to reporting only what can be stored in its data members, which prevents users from supplying arbitrary output. Even internally, our usage of Run is a mess. We often use the same members to hold entirely different types of data depending on the report type we're generating (e.g. regular vs. BigO). This makes the meaning of the data members very confusing and prevents users from meaningfully using them.

Therefore, I believe the general solution is that the types passed to the reporters (and, more generally, passed as input to the benchmarks) should allow supplying "weakly typed" data. JSON is one such possibility. Other possibilities include using std::any or std::variant to help manage different or arbitrary report types, but this creates the problem of figuring out how to report them, especially in the case of entirely custom user data passed via std::any. For this reason I went down the path of using JSON to represent benchmark reports, since it essentially type-erases the report to a simple JSON object, and also provides a perfect object-to-text conversion that Google Benchmark doesn't have to supply.
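To illustrate the contrast being drawn here, a language-agnostic sketch (in Python for brevity; the real proposal is a C++ JSON object, and the field names below are made up): a fixed struct can only ever report its known members, while a JSON-style object carries arbitrary user-supplied fields through to any reporter unchanged.

```python
# Sketch of the "weakly typed" report idea: the report is just a
# JSON-like object, so user-supplied fields ride along without the
# library having to know about them. Field names are illustrative,
# not the library's actual schema.
import json

def make_report(name, real_time, **extra):
    """Build a report dict with fixed fields plus arbitrary user data."""
    report = {"name": name, "real_time": real_time}
    report.update(extra)  # arbitrary user data; no fixed struct to outgrow
    return report

if __name__ == "__main__":
    report = make_report("BM_Foo", 12.5, bytes_per_second=1.0e9, label="hot path")
    # The object-to-text conversion comes from the JSON library itself,
    # and round-trips losslessly.
    text = json.dumps(report)
    assert json.loads(text) == report
    print(text)
```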

If for instance someone wants to produce XML (an eminently reasonable thing to do on Windows) it seems rather awkward to first produce JSON and then reparse it to produce XML. My sense of good taste is a bit offended ;-)

The use case I'm proposing wouldn't be as awkward as this suggests. First, you wouldn't have to produce JSON: since it's the primary format, there's nothing to produce it from. Second, you wouldn't really have to re-parse anything. A rough implementation of the case you're suggesting would look like this:

void foo() {
  JSON result = RunBenchmark(...);
  MyXMLBuilder builder;
  for (JSON::iterator it = result.begin(); it != result.end(); ++it) {
    // Assuming all fields are primitive.
    builder.addField(it.key(), it.value().dump());
  }
}

@dmah42 dmah42 deleted the reportercleanup branch May 29, 2018 12:13


7 participants