Skip to content

Conversation

pmatyjasek-sumo
Copy link
Contributor

Description:
Add ability to associate metadata tags in k8s processor using pod UID rather than just IP

Link to tracking Issue: #1146

Testing:
Unit tests

Documentation:
In-code comments

@pmatyjasek-sumo pmatyjasek-sumo requested a review from a team January 27, 2021 14:23
@linux-foundation-easycla
Copy link

linux-foundation-easycla bot commented Jan 27, 2021

CLA Signed

The committers are authorized under a signed CLA.

  • ✅ Patryk Matyjasek (d90e3af2238249273579227b6f9c46a52ee15ae9)

Base automatically changed from master to main January 28, 2021 00:57
@pmatyjasek-sumo pmatyjasek-sumo force-pushed the pm_k8s_pod_uid_labels_impl branch from d90e3af to 44c8f1e Compare January 28, 2021 12:45
@codecov
Copy link

codecov bot commented Jan 28, 2021

Codecov Report

Merging #2199 (9721178) into main (63b2f33) will increase coverage by 0.00%.
The diff coverage is 99.06%.

Impacted file tree graph

@@           Coverage Diff           @@
##             main    #2199   +/-   ##
=======================================
  Coverage   90.45%   90.46%           
=======================================
  Files         397      397           
  Lines       19563    19614   +51     
=======================================
+ Hits        17695    17743   +48     
- Misses       1406     1408    +2     
- Partials      462      463    +1     
Flag Coverage Δ
integration 69.37% <ø> (+0.06%) ⬆️
unit 89.25% <99.06%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...cessor/k8sprocessor/observability/observability.go 91.66% <50.00%> (-8.34%) ⬇️
processor/k8sprocessor/factory.go 97.36% <100.00%> (+0.03%) ⬆️
processor/k8sprocessor/kube/client.go 100.00% <100.00%> (ø)
processor/k8sprocessor/options.go 100.00% <100.00%> (ø)
processor/k8sprocessor/pod_association.go 100.00% <100.00%> (ø)
processor/k8sprocessor/processor.go 100.00% <100.00%> (ø)
receiver/k8sclusterreceiver/watcher.go 95.29% <0.00%> (-2.36%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 63b2f33...9721178. Read the comment docs.

@pmatyjasek-sumo pmatyjasek-sumo force-pushed the pm_k8s_pod_uid_labels_impl branch from 44c8f1e to 9725609 Compare January 28, 2021 13:07
Copy link
Member

@jpkrohling jpkrohling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks really nice!

return
}
}
c.Pods[pod.Status.PodIP] = newPod
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do I get it right that the same pod will exist twice in the list, once keyed by the IP and once by the ID?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually discussed this with @pmatyjasek-sumo while he was working on the extension :) Having a single map and two references to the pod just makes the things easier, since this are different domains that have no risk of conflicts

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jpkrohling as @pmm-sumo mentioned we had some discussion on that and we decided that it should be ok if 2 keys from map will point to the same reference. With this we avoid to use two loops to iterate through two lists of Pods (one by IP and second by UID)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, that's fine, but you might want to add the current size of this map in a metric somewhere: this is the kind of data that can easily explode in terms of number of elements. It's not unusual to have tens of thousands of pods in a given cluster, so, having this info would help people running the processor to identify any possible memory issue.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this done? If you don't to do it in this PR, create an issue to track this.

pmatyjasek-sumo and others added 2 commits January 29, 2021 11:18
Co-authored-by: Juraci Paixão Kröhling <[email protected]>
- Add more tests for added code
- Refactor Association table
- Remove unnecessary annonymous function
dyladan referenced this pull request in dynatrace-oss-contrib/opentelemetry-collector-contrib Jan 29, 2021
@github-actions
Copy link
Contributor

github-actions bot commented Feb 6, 2021

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Feb 6, 2021
@jpkrohling
Copy link
Member

What's the current state of this PR? Is this ready to be reviews again?

@pmatyjasek-sumo
Copy link
Contributor Author

@jpkrohling yes it is ready for review. I've addressed previous comments in latest commit.

@github-actions github-actions bot removed the Stale label Feb 9, 2021
@jpkrohling
Copy link
Member

I have only one concern about the size the map might get, as it grows twice the rate of the number of pods being used. I believe the size of this map should be part of the set of metrics for the collector, as it would help correlate a possible increase in memory usage with the number of items in this map.

@pmatyjasek-sumo
Copy link
Contributor Author

@jpkrohling that is a good idea, I'll put this metric soon

otelsvc/k8s/pod_table_size

Signed-off-by: Patryk Matyjasek <[email protected]>
@pmatyjasek-sumo
Copy link
Contributor Author

I've updated PR

Copy link
Member

@jpkrohling jpkrohling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just a minor comment about the metrics. If you don't want to work on them here, let me know and I'll approve the PR once the tracking issue is created.

mPodsUpdated = stats.Int64("otelsvc/k8s/pod_updated", "Number of pod update events received", "1")
mPodsAdded = stats.Int64("otelsvc/k8s/pod_added", "Number of pod add events received", "1")
mPodsDeleted = stats.Int64("otelsvc/k8s/pod_deleted", "Number of pod delete events received", "1")
mPodTableSize = stats.Int64("otelsvc/k8s/pod_table_size", "Size of table containing pod info", "1")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, but is there a list of those metrics somewhere, telling users of this processor what they actually mean? Also, it would be a good opportunity to:

  • s/otelsvc/otelcol
  • replace the slashes by underscores, as it seems to be the standard at other metrics

If you don't want to do them here, feel free to open a tracking issue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jpkrohling I've created an issue and also I'll start work on it #2322

@pmm-sumo
Copy link
Contributor

@tigrannajaryan this is the PR we discussed during Log SIG yesterday. It will allow us to tag logs with K8s metadata

Copy link
Contributor

@owais owais left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with a couple of minor comments.

My understanding is that if a user does not update config to specify the new pod association rules, the processor will continue to work with connection/attribute IP, right?

id: string(pod.UID),
name: pod.Name,
ts: time.Now(),
})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: looks very repetitive. could be extracted into a function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch


// If IP is still not available by this point, nothing can be tagged here. Return.
if podIP == "" {
podAttributes := k8sPodAssociationFromAttributes(ctx, resource.Attributes(), kp.podAssociations)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have another attrsToAdd := kp.getAttributesForPod() below and this one is named podAttributes as well. This could be a bit confusing to follow. IIRC, podAttributes here is actually a map of pod identifiers, such as IP, UID, etc, right? If so, let's name it podIdentifiers or something similar.

@pmatyjasek-sumo
Copy link
Contributor Author

LGTM with a couple of minor comments.

My understanding is that if a user does not update config to specify the new pod association rules, the processor will continue to work with connection/attribute IP, right?

Yes it will work as previous

// If a match is found, the cached metadata is added to the data as resource attributes.
// extracted metadata to the relevant spans, metrics and logs. The processor uses the kubernetes API to discover all pods
// running in a cluster, keeps a record of their IP addresses, pod UIDs and interesting metadata.
// The rules for associating the source record with specific Pod Metadata are configured via "pod_association" key.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"source record" is not an entirely clear term. Does this refer to the telemetry in pdata that is passing through the processor?

// It represents a list of rules that are executed in order until the first one is able to do the match.
// Each rule is specified as a pair of from (representing the rule type) and name (representing the extracted key name).
// Following rule types are available:
// from: "labels" - takes the attribute from resource attributes (the value can contain either IP address or Pod UID)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does "resource" here refer to pdata.Resource or to the pod? Is this pdata.Resource attributes or pod labels? I am confused by mixed usage of "labels" and "attributes" here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I'll clarify that