Rename feature of the metrics transform processor #347
Conversation
Codecov Report
@@            Coverage Diff             @@
##           master     #347      +/-   ##
==========================================
+ Coverage   83.48%   83.53%   +0.04%
==========================================
  Files         171      169       -2
  Lines        9261     9037     -224
==========================================
- Hits         7732     7549     -183
+ Misses       1199     1161      -38
+ Partials      330      327       -3
Continue to review full report at Codecov.
Is this just for renaming metrics (not labels)? If so, seems similar to spanprocessor -- should this be called metricprocessor? Also, why is this in contrib? This belongs in core.
Hi @flands we chatted a fair bit with @bogdandrutu and @jrcamp about this (mostly in an offline meeting, but part of the conversation is captured in the associated issue, #332). We talked about the value of having the rename processor live in core, but in tension with that is the value of co-locating all of the processing/renaming/aggregating for a single metric in one place. Our consensus coming out of that was that initially this would all live in contrib as a single processor that allows metric, label, and label value renaming as well as basic aggregations (dropping labels, combining label values down to a single value). In the future there is the potential to have a rename-only metric processor in core that could share underlying code with the rename functionality of this processor. (As an aside, we have been using the Draft feature of PRs as a way to get preliminary review from us as intern hosts before sending it out to the community for official review - hope that's OK.)
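For readers skimming the thread, here is a rough sketch of the configuration shape being discussed. This is only an illustration inferred from the Transform, Action, MetricName and NewName identifiers that appear in the review comments below, not the processor's actual exported API:

// Hypothetical sketch of the per-metric transform configuration; names are
// illustrative and may not match the final implementation.
type Action string

const (
	Update Action = "update" // rename/modify the matched metric in place
	Insert Action = "insert" // insert a renamed copy of the matched metric
)

// Transform describes one ordered operation applied to a matched metric.
type Transform struct {
	MetricName string // name of the metric to operate on
	Action     Action // update in place, or insert a copy
	NewName    string // new metric name, when renaming
}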
Left some comments from an initial scan through. Overall structure looks good.
Note you'll also need to make the processor package a Go module since it's in contrib.
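For reference, making the package its own Go module essentially means adding a go.mod next to it. A minimal sketch, with the module path inferred from the file paths in this PR and the Go version and dependency left as placeholders:

module github.com/open-telemetry/opentelemetry-collector-contrib/processor/metricstransformprocessor

go 1.14

// add a require directive for the collector core module here, pinned to the
// same version the rest of contrib uses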
@asuresh4 Please let me know if the changes here look good. If so, I will go ahead and resolve the two conflicting files. I have resolved the conflicting files before, but whenever another PR is merged, these files conflict again. Therefore, I will wait for your approval on the rest of the changes this time, and only resolve the conflicts once right before this merges. Thanks! :)
}

// validNewName determines if the new name is a valid one. An invalid one is one that already exists.
func (mtp *metricsTransformProcessor) validNewName(transform Transform, nameToMetricMapping map[string]*metricspb.Metric) bool {
I still think this function would be more accurately called something like metricNameExists(...) bool.
Also, I didn't catch this before - what was the main justification for keeping this validation? I'm still 50/50 on whether it's needed at all. While it would imply an error to have two metrics with the same name, I don't think there's any other code validating that there aren't already duplicates in this list (which, if there were, could result in unexpected behaviour in your code), and I'm not sure it's worth paying the cost of allocating this extra map, having more complex code, etc.
It's likely an error will be thrown at export time if this situation occurs anyway.
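In other words, a sketch of the renamed helper (the map argument mirrors the existing nameToMetricMapping parameter; the body only illustrates the intended semantics, not the PR's actual implementation):

// metricNameExists reports whether a metric with the given name is already
// present in the current batch. Sketch only.
func (mtp *metricsTransformProcessor) metricNameExists(name string, nameToMetricMapping map[string]*metricspb.Metric) bool {
	_, exists := nameToMetricMapping[name]
	return exists
}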
If you do agree with me about removing this validation, you can remove a huge amount of the code in the core functions of this file, i.e. the transform function can be simplified to:
for _, metric := range data.Metrics {
	transform := nameToTransformMapping[metric.MetricDescriptor.Name]
	if transform.Action == Insert {
		metric = mtp.createCopy(metric)
		mds[i].Metrics = append(mds[i].Metrics, metric)
	}
	mtp.update(metric, transform)
}
The nameToTransformMapping can be made once on initialization.
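A sketch of what building that mapping once at construction could look like, assuming a Config type that exposes the ordered Transforms slice and a struct field to hold the map (names are illustrative, not the PR's actual code):

// newMetricsTransformProcessor builds the name -> transform lookup once so it
// does not have to be rebuilt on every batch. Sketch only.
func newMetricsTransformProcessor(next consumer.MetricsConsumer, cfg *Config) *metricsTransformProcessor {
	nameToTransformMapping := make(map[string]Transform, len(cfg.Transforms))
	for _, t := range cfg.Transforms {
		nameToTransformMapping[t.MetricName] = t
	}
	return &metricsTransformProcessor{
		next:                   next,
		nameToTransformMapping: nameToTransformMapping,
	}
}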
So this validNewName and the other validation functions came out of discussions with Dave and Quentin. Basically, when these unexpected behaviors occur, not only do they show up in the output data, but an error message is also logged, so that the errors are more visible to users.
Also, regarding simplifying the code this way: since we want the transformations to be performed in the order given in the list, I would think we need to iterate through the transforms instead of the metrics, which then involves keeping the metrics mapping up to date as we go through the transformations.
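A sketch of that transform-ordered shape, assuming the OpenCensus-style consumerdata.MetricsData used elsewhere in this thread, an ordered transforms slice on the processor, and the update helper referenced above (the Insert path is elided to keep the illustration short; this is not the PR's actual implementation):

// transform applies the configured transforms to one batch in configuration
// order, keeping the name -> metric map in sync as renames happen. Sketch only.
func (mtp *metricsTransformProcessor) transform(data *consumerdata.MetricsData) {
	nameToMetricMapping := make(map[string]*metricspb.Metric, len(data.Metrics))
	for _, metric := range data.Metrics {
		nameToMetricMapping[metric.MetricDescriptor.Name] = metric
	}
	for _, t := range mtp.transforms {
		metric, ok := nameToMetricMapping[t.MetricName]
		if !ok {
			continue
		}
		mtp.update(metric, t)
		if t.NewName != "" {
			// later transforms must be able to match the renamed metric
			nameToMetricMapping[t.NewName] = metric
			delete(nameToMetricMapping, t.MetricName)
		}
	}
}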
Yea okay, that's true. I was hoping to find a solution that would avoid having to allocate an extra map on each call to ConsumeMetrics, but if we want to guarantee the order of operations that is probably not possible.
I am also slightly concerned that if we ever extend this processor to support filtering beyond just "name", then it won't make sense to have a simple map anymore, but we can figure out how to handle that later.
}
// if name is updated, the map has to be updated
nameToMetricMapping[transform.NewName] = nameToMetricMapping[transform.MetricName]
delete(nameToMetricMapping, transform.MetricName)
This creates a bit of weird behaviour where the order of the actions may matter in terms of determining whether an error will occur. This is another reason to just remove this logic entirely imo (see comment on L182)
I thought the order of transformations here should matter: when I discussed this with Dave and Quentin, the idea was that the transforms list is an ordered list of actions that are applied to the metrics one by one. Do you think that shouldn't be the case?
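As a concrete illustration of why the order matters (metric names and values here are made up, using the hypothetical Transform shape sketched earlier):

// The second transform only matches because the first one has already renamed
// the metric; reversing the list would change the result.
transforms := []Transform{
	{MetricName: "system.cpu.usage", Action: Update, NewName: "cpu.usage"},
	{MetricName: "cpu.usage", Action: Insert, NewName: "cpu.usage.copy"},
}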
	delete(nameToMetricMapping, transform.MetricName)
} else if transform.Action == Insert {
	var newMetric *metricspb.Metric
	mds[i].Metrics, newMetric = mtp.insert(metric, mds[i].Metrics, transform)
Nit: you can put mds[i] into a variable so you don't need to dereference it on each iteration.
I actually tried to use data directly, which is mds[i] from the for loop, but that way the original metric is not updated correctly, since mds is a slice of values, not pointers.
Oh interesting. I guess you could set data := &mds[i] instead.
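For reference, the underlying Go behaviour in a self-contained toy example (not the processor's actual types):

package main

import "fmt"

type metricsData struct{ name string }

func main() {
	mds := []metricsData{{name: "old"}}

	// data is a copy of the slice element, so mutating it does not change mds.
	data := mds[0]
	data.name = "new"
	fmt.Println(mds[0].name) // prints "old"

	// taking a pointer into the slice updates the original element.
	ptr := &mds[0]
	ptr.name = "new"
	fmt.Println(mds[0].name) // prints "new"
}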
}

// validNewLabel determines if the new label is a valid one. An invalid one is one that already exists.
func (mtp *metricsTransformProcessor) validNewLabel(labelKeys []*metricspb.LabelKey, newLabel string) bool {
This code is relatively minimal and straightforward, but even so, similar to the above, I'm a fan of doing little or no validation here and keeping this processor lean and fast. If the user screws up, they will export bad data, or more likely get an error at export time.
Happy to be convinced otherwise on this one though.
This would be for the same reasoning as the other validations. (This one guards against renaming a label to a name that already exists, because that should be done through an aggregation.) These are all there to keep operations within the boundaries of the expected behaviors, so that the processor notices when an operation steps out of bounds. However, after thinking about the single responsibility of the processor, I do see how these validations could be removed: the processor should only worry about transforming the data as the user specifies, even if the user's specification might imply a possible error. I also don't see any other processors that validate their input this way, so from a convention standpoint, removing the validations seems reasonable too. This does mean, however, that users' data may be transformed in ways that result in bad output metric data. If that is acceptable for processors, then with your approval I would love to remove the validations, because it also makes the code a lot more straightforward.
My opinion is let's start without it and keep the code very simple. We can add it in later if people want it. But I'm okay with leaving it there if you feel strongly about it.
No problem! The current plan is to remove the validations for the sake of starting simple, but I will keep this code saved in another branch. If we later decide to add it back in, I will also need to modify the logging process, so before I do any additional work on that, I want to make sure the work is needed.
Description: Rename functionality, including renaming metric names and labels
Link to tracking Issue: #332
Testing: unit tests, 100% coverage within module
Documentation: Detailed comments in code describing the responsibility of each function