OCPNODE-3316: Track conmon process #9205
|
Skipping CI for Draft Pull Request. |
|
/test all |
Codecov Report
@@ Coverage Diff @@
## main #9205 +/- ##
==========================================
+ Coverage 66.55% 66.99% +0.44%
==========================================
Files 198 198
Lines 27164 27298 +134
==========================================
+ Hits 18078 18288 +210
+ Misses 7600 7505 -95
- Partials 1486 1505 +19
|
Force-pushed from 3bc2b2b to eb23ec3
Signed-off-by: Ayato Tokubi <[email protected]>
|
/skip |
|
/retest-required |
Signed-off-by: Ayato Tokubi <[email protected]>
|
@cri-o/cri-o-maintainers PTAL |
|
@bitoku: This pull request references OCPNODE-3316 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.20.0" version, but no target version was set. |
|
@cri-o/cri-o-maintainers Can you PTAL? |
Signed-off-by: Ayato Tokubi <[email protected]>
	return err
}

c.state.ContainerMonitorProcess, err = r.getConmonProcess(c)
could this be done in SetMonitorProcess?
I did it intentionally to make SetMonitorProcess reusable with conmon-rs.
	return c.Living() == nil
}

func (r *runtimePod) ProbeMonitor(ctx context.Context, c *Container) error {
We already save the monitor's PID by asking it via RPC, so we could do the same thing as the OCI runtime here if we set the monitor PID to that.
Yeah, but I'd prefer to do it in a separate PR and keep this PR small (though it's already not small).
if r.IsContainerAlive(c) {
	metrics.Instance().MetricContainersStoppedMonitorCountInc(c.Name())
	log.Errorf(ctx, "Conmon for container %s is stopped, although the container is running", c.ID())
Should we restart the conmon process or stop the container?
I'm not sure if we can restart conmon.
I'd rather not stop the container, because I think users want to decide what to do with orphaned containers.
Maybe we can add one more check to see whether the container PID in the pidfile is alive. If the PID in the pidfile is dead, we should remove the container. The pidfile lives in the following directory:
[root@node userdata]# pwd
/var/run/containers/storage/overlay-containers/abb2086e148c22259006ba00dcc360b953edf1277fcbc43d62588d936de0940b/userdata
[root@node userdata]# ls
attach config.json conmon-pidfile ctl pidfile run winsz
I have run into this situation once.
Thanks but IsContainerAlive already checks container liveness with container pid. I don't understand why we need to use pidfile.
Signed-off-by: Ayato Tokubi <[email protected]>
|
@cri-o/cri-o-maintainers PTAL? |
|
/override ci/prow/ci-e2e-evented-pleg
LGTM, good work here! @cri-o/cri-o-maintainers PTAL |
|
@haircommander: Overrode contexts on behalf of haircommander: ci/prow/ci-e2e-evented-pleg |
|
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: bitoku, haircommander, saschagrunert |
|
/retest |
What type of PR is this?
/kind feature
What this PR does / why we need it:
This PR adds a feature that tracks conmon: it logs an error and emits a metric when conmon is stopped.
Which issue(s) this PR fixes:
Fixes https://issues.redhat.com/browse/OCPNODE-1853
Special notes for your reviewer:
benchmark result is here.
Does this PR introduce a user-facing change?