Skip to content

Conversation

braydonk
Copy link
Contributor

@braydonk braydonk commented May 26, 2023

Description:
Adds a new Windows-exclusive metric called process.handles, which represents the handle count of the given process. When enabled, the receiver will make a WMI Query at the beginning of each scrape to update the handle count for all processes on the system. If the metric is enabled on a platform other than Windows, an error will be produced when attempting to refresh handle counts. This matches the rough behaviour of the Linux exclusive process.open_file_descriptors.

Link to tracking Issue:
#21379
#12482

Testing:
Ran the binary with the following configuration:

receivers:
  hostmetrics:
    collection_interval: 2s
    scrapers:
      cpu: {}
      disk: {}
      filesystem: {}
      load: {}
      memory: {}
      network: {}
      paging: {}
      process:
        mute_process_name_error: true
        metrics:
          process.handles:
            enabled: true
      processes: {}

exporters:
  file:
    path: x.json

service:
  pipelines:
    metrics:
      receivers: [hostmetrics]
      exporters: [file]

The following is an example result of a scrape with this configuration.
https://gist.github.com/braydonk/c97996272574319e03111dc79076a1bd

Documentation:
documentation.md was generated by mdatagen from the metadata.yaml updates.

@braydonk braydonk requested a review from a team May 26, 2023 15:38
@braydonk braydonk requested a review from dmitryax as a code owner May 26, 2023 15:38
@braydonk braydonk force-pushed the process-handles-metric branch 3 times, most recently from 17d2f55 to 2ad4bf5 Compare May 26, 2023 18:34
@braydonk braydonk changed the title hostmetricsreceiver: add process.handles metric [receiver/hostmetricsreceiver] add process.handles metric May 26, 2023
@braydonk braydonk force-pushed the process-handles-metric branch from 2ad4bf5 to 3f54a09 Compare May 26, 2023 18:57
@braydonk braydonk changed the title [receiver/hostmetricsreceiver] add process.handles metric [receiver/hostmetrics] add process.handles metric May 29, 2023
@dmitryax dmitryax added the Run Windows Enable running windows test on a PR label Jun 7, 2023
@dmitryax
Copy link
Member

dmitryax commented Jun 7, 2023

Please take a look at the failing CI

@braydonk braydonk force-pushed the process-handles-metric branch from 6f8f077 to a3ce3db Compare June 12, 2023 17:20
@braydonk
Copy link
Contributor Author

CI looks to be in a healthy state (one test failure but it's some kind of timeout test, we can run again before submission just to make sure).

Copy link
Member

@dmitryax dmitryax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Can you also please submit an issue for the new metric name in semantic conventions and link it to open-telemetry/opentelemetry-specification#3556?

@dmitryax
Copy link
Member

Please rebase

@braydonk
Copy link
Contributor Author

Thanks for the review @dmitryax! I've rebased and will wait to ensure CI checks pass again.

I opened a semantic conventions PR and linked it to the working group issue. I've heard that these metrics are being moved to being generated from YAML instead of written directly in the markdown document, so I may delay that PR until that work is done and move it to a YAML definition.

braydonk added 4 commits June 27, 2023 19:46
Adds a new Windows-exclusive metric called process.handles, which
represents the handle count of the given process. When enabled, the
receiver will make an additional syscall, specifically
`NtQuerySystemInformation` at the beginning of each scrape
to update the handle count for all processes on the system. If the
metric is enabled on a platform other than Windows, an error will be
produced when attempting to refresh handle counts. This matches the
rough behaviour of the Linux exclusive `process.open_file_descriptors`.
I put the handle count refresh after all the metrics scrapes instead of
after collecting all process handles, which meant the first scrape would
always be empty.
NtQuerySystemInformation is an unsupported internal API. This means it
can be subject to change. This commit swaps handle count retrieval to
use Windows Management Instrumentation to get the counts instead.
Introduces a more generic interface for handle count querying than what
was there before. This makes it easier to reason about when using an
implementation with different syscalls, so the handleCountManager
implementation can remain the way it was. Fixed the test to reflect the
changes.
braydonk added 5 commits June 27, 2023 19:46
The argument needs to be there to implement the interface but it doesn't
need to be named because it's unused.
Changing the process.handles metric from a gauge to a cumulative
non-monotonic sum.
The `handles` package name actually clashed with a ton of other places
that had variables called `handles`. This commit renames the package to
`handlecount`.
@braydonk braydonk force-pushed the process-handles-metric branch from c2903c9 to 4eefda7 Compare