OCPBUGS-28981: Add exponential backoff to the container stop loop #7854
/assign kwilczynski
Force-pushed from 311afbe to 216bb63.
Codecov Report
@@            Coverage Diff             @@
##             main    #7854      +/-   ##
==========================================
- Coverage   48.95%   48.94%   -0.01%
==========================================
  Files         151      151
  Lines       16390    16426      +36
==========================================
+ Hits         8024     8040      +16
- Misses       7394     7411      +17
- Partials      972      975       +3
Force-pushed from 216bb63 to 35c0693.
Force-pushed from 35c0693 to 0bf9852.
Force-pushed from 0fa7d34 to 57117c1.
@kwilczynski: This pull request references Jira Issue OCPBUGS-28981, which is invalid:
The bug has been updated to refer to the pull request using the external bug tracker.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
/jira refresh
@kwilczynski: This pull request references Jira Issue OCPBUGS-28981, which is invalid:
/jira refresh
@kwilczynski: This pull request references Jira Issue OCPBUGS-28981, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
No GitHub users were found matching the public email listed for the QA contact in Jira ([email protected]), skipping review request.
Force-pushed from 50bda88 to 2797cc1.
@haircommander, added the
/retest
[APPROVALNOTIFIER] This PR is APPROVED.
This pull-request has been approved by: haircommander, kwilczynski, saschagrunert.
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
/retest-required
/retest-required
Signed-off-by: Krzysztof Wilczyński <[email protected]>
Force-pushed from 2797cc1 to b8e947a.
/retest-required
/retest
/retest-required
/override ci/prow/e2e-gcp-ovn
@haircommander: Overrode contexts on behalf of haircommander: ci/prow/e2e-gcp-ovn
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@kwilczynski: Jira Issue OCPBUGS-28981: All pull requests linked via external trackers have merged. Jira Issue OCPBUGS-28981 has been moved to the MODIFIED state.
@kwilczynski: #7854 failed to apply on top of branch "release-1.28".
@kwilczynski: #7854 failed to apply on top of branch "release-1.27".
@kwilczynski: #7854 failed to apply on top of branch "release-1.26".
@kwilczynski: #7854 failed to apply on top of branch "release-1.29".
OK. Requires manual backport.
What type of PR is this?
/kind bug
What this PR does / why we need it:
Currently, when CRI-O attempts to stop a container whose process, especially the init process (the so-called "PID 1"), is in an uninterruptible blocked state (for example, sleeping while it waits for a disk I/O operation to complete), CRI-O enters a broken state in which it tries to deliver termination signals to that process as fast as possible.
However, a blocked process might not respond promptly to the delivered signals, so CRI-O enters a "busy loop" as it repeatedly retries signal delivery. This seemingly unbounded loop can render CRI-O unresponsive and cause high CPU usage for as long as it runs.
Thus, add exponential backoff support to the container stop loop to fix this potential busy loop regardless of the current state of the process being terminated. The exponential backoff staggers the delivery of termination signals for as long as the process is still running, allowing it to eventually terminate of its own volition (or crash, whichever comes first).
Related to:
Which issue(s) this PR fixes:
None
Special notes for your reviewer:
None
Does this PR introduce a user-facing change?