RFC: Replace reporters with JSON and tooling. #443
The only supported reporter in the library outputs JSON. Tooling has been added to generate CSV and tabular reports from the JSON output. This change is incomplete and is an RFC. Known issues: the tooling lacks tests, some documentation needs to be updated, and the tabular output tooling has no colour support. Enables #441, #190.
I really, really want the ability to easily generate charts from Google Benchmark's output (that is how I found this PR in the first place). However, I don't see at all how removing native support for useful reporters is going to help that goal. I'd personally like to keep the console output by default, as it is super useful. Am I missing something? |
@ldionne I expect that any chart generation (or other custom output) will work best from JSON, for starters, so making that the default makes sense. Once we have this, adding a python tool to generate charts from a series of JSON output is trivial (rather than adding it to the library). Getting the tabular output is still simple:
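For example, a minimal sketch of what such a tool could look like (the JSON field names used here, such as "benchmarks", "real_time" and "cpu_time", are assumptions based on the library's JSON output schema, not a fixed contract):

```python
# Sketch of a tabular-report tool that reads benchmark JSON output.
# The field names are assumptions based on the schema discussed in this PR.
import json

def tabulate(json_text):
    data = json.loads(json_text)
    rows = [("Benchmark", "Time", "CPU", "Iterations")]
    for b in data.get("benchmarks", []):
        unit = b.get("time_unit", "ns")
        rows.append((b["name"],
                     "%g %s" % (b["real_time"], unit),
                     "%g %s" % (b["cpu_time"], unit),
                     str(b["iterations"])))
    # Pad each column to its widest cell so the output lines up.
    widths = [max(len(r[i]) for r in rows) for i in range(4)]
    return "\n".join("  ".join(c.ljust(w) for c, w in zip(r, widths))
                     for r in rows)

example = ('{"benchmarks": [{"name": "BM_Foo", "iterations": 1000,'
           ' "real_time": 42.5, "cpu_time": 41.0, "time_unit": "ns"}]}')
print(tabulate(example))
```

Invoked as a filter over the benchmark's JSON output, this recovers the familiar console-style table without any reporter code in the library.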
but with this change the library itself is simpler and this opens the door to much more functionality on the reporting side. |
We can already have JSON output, and so we should be able to generate charts from those right now (given scripts to generate charts from JSON). I just don't see what removing the default reporters we currently have will bring to the table, and I'll be very sad if I need to use an additional command-line tool just to translate that to tabular output. |
One thing it brings is a smaller library (370kB instead of 440kB for release builds on Linux), but that's a minor benefit. The broader benefit concerns the requests we keep getting for custom reporting methods. Instead of having two classes of output, those in the library and those supported by external tools, everything is simpler, both in implementation and conceptually, if reporting in the library is done one way and external tooling provides the more sophisticated support. However, this is a user-facing change, so I'm definitely interested in your (@ldionne) and others' opinions. One option is to keep the tabular and JSON reporters in the library, remove the CSV reporter, and then have tooling to support the CSV, Markdown, and HTML use-cases. This would be less of a user-facing change, but would at least start us down the right path. |
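A tool covering the CSV use-case could be a small script along these lines (a sketch only, using the Python standard library; the JSON field names are assumptions based on the output schema discussed in this thread):

```python
# Sketch: convert benchmark JSON output to CSV with the Python stdlib.
# Field names are assumptions based on the schema discussed in this PR.
import csv
import io
import json

FIELDS = ("name", "iterations", "real_time", "cpu_time", "time_unit")

def json_to_csv(json_text):
    data = json.loads(json_text)
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(FIELDS)
    for b in data.get("benchmarks", []):
        # Missing fields become empty cells rather than raising.
        writer.writerow([b.get(k, "") for k in FIELDS])
    return out.getvalue()

example = ('{"benchmarks": [{"name": "BM_Foo/8", "iterations": 2000,'
           ' "real_time": 10.5, "cpu_time": 10.1, "time_unit": "ns"}]}')
print(json_to_csv(example))
```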
While I totally understand the reasoning - less duplication, no new reporters to add, less code - I'm not sure I like the idea of having to use external tools just to get a readable tabular report...
So I'd personally prefer to consider this middle ground. |
I fully agree. On the other hand, not having the tabular format as the default is a huge usability regression for users IMO, and that should weigh more than internal code clarity. My use case is registering benchmarks as CMake targets and running them as part of my unit tests, including on Travis CI. Having the tabular output is very convenient because I can see the result of my benchmarks without any post-processing, wherever I run them. For anything fancier, of course, I'd use JSON and post-process it myself. I personally don't have a use case for CSV and I don't care about it. Perhaps one way you could go is to have only the JSON reporter, but then implement the tabular reporting natively on top of JSON. This way, you may get some of the simplification of having the library pretend it only supports JSON, while still being able to do tabular reports natively. I haven't looked at the code much, so perhaps this makes no sense. |
Not having it at all would be a usability regression. I'm not removing it, just removing it from the library.
I think this is still possible, but you may need an external tool. I'm looking into the compromise of keeping the tabular and JSON reporters, but not the totally custom reporters, instead supporting those through Python tooling. |
Which is a usability regression just the same. For example, what if I don't have Python available on my system? It also means that GoogleBenchmark would somehow need to install those Python scripts somewhere when you install the library; you can't assume that we're using the library checked out and built from source.
The only reason why I'm okay with this is that I don't care about other reports ATM. However, I could easily see someone opposing that with the same argument I use to oppose the removal of the console reporter, and that would be valid. |
I am going to side with @ldionne and claim that this PR is a horrible idea. Depending on external tooling is pushing the burden of conversion on the user, and is making the library much less useful. Now I need to install Python on all the machines that run benchmarks, find a random external tool that does the conversion I need, install that, and keep all this mess running. Oh, and it gets worse when someone decides to write a conversion tool in Perl or Ruby or whatever. Also, I frankly don't quite understand why an external Python tool is going to be easier to maintain than a simple C++ reporter. If this PR went in I would probably be forced to stop upgrading to newer versions of this library, which would annoy me greatly. |
@pleroy I hear and understand your concerns, but the approach I'm exploring is different from the one proposed in this PR. Instead of moving toward external tooling, I propose moving away from it. Step one will be to introduce a fully featured JSON library, which will be used to communicate arbitrary data structures between the user and the library. The library will continue to provide console/tabular reporting as well. However, w.r.t. #488: the CSV reporter is just fundamentally incompatible with the structure of the benchmarking data, which is too complex to usefully emit as a flat list of comma-separated values. |
Ah, I misunderstood what you were doing. I agree that CSV is not a great format, and probably not suitable for complex datasets. I could imagine using CSV to provide essentially the same data as the console/tabular reporting, but it's hard to justify the complexity, with all the quoting and text massaging that needs to happen. I am a wee bit annoyed by the notion of using JSON as the "pivot" format: I'd have preferred to use protobuf (the alpha and omega of all things Google) for that purpose, as it avoids messy parsing/unparsing. If for instance someone wants to produce XML (an eminently reasonable thing to do on Windows) it seems rather awkward to first produce JSON and then reparse it to produce XML. My sense of good taste is a bit offended ;-) |
I'm going to close this RFC in favour of @EricWF's plan. FTR, I'm happy to lose CSV and use something as an intermediate format. JSON is more open-sourcey, protobuf more Googley; I'm not sure there's much between them for this use-case. |
> I am a wee bit annoyed by the notion of using JSON as the "pivot" format: I'd have preferred to use protobuf (the alpha and omega of all things Google) for that purpose, as it avoids messy parsing/unparsing.

Three points. First, I'm proposing introducing an existing JSON library to handle all the parsing/unparsing of JSON, so we don't need to maintain that code ourselves (in particular: https://github.com/nlohmann/json). Second, by using the JSON library, we can make the JSON representation less of a pivot format and instead the primary format by which we encode all results and inputs, both internally within the library and via the external APIs. Users won't be given JSON strings, but instead a JSON object representing the structured data. Third, the real issue I'm trying to address is the limitations of strongly-typed input/output data types such as […]. Therefore, I believe the general solution is that the types passed to the reporters (and more generally, passed as input to the benchmarks) should allow supplying "weakly typed" data. JSON is one such possibility. Other possibilities include using […].
The use case I'm proposing wouldn't be as awkward as this suggests. You wouldn't have to first produce JSON; since that's the primary format, there's nothing to produce it from. Second, you wouldn't really have to re-parse anything. A rough implementation of the case you're suggesting would look like this:

```cpp
void foo() {
  JSON result = RunBenchmark(...);
  MyXMLBuilder builder;
  for (JSON::iterator it = result.begin(); it != result.end(); ++it) {
    // Assuming all fields are primitive.
    builder.addField(it.key(), it.value().dump());
  }
}
```
|