Skip to content

Conversation

@saschagrunert
Copy link
Member

@saschagrunert saschagrunert commented Nov 28, 2024

What type of PR is this?

/kind feature

What this PR does / why we need it:

Adding support for the systemd watchdog. Right now it verifies that the CRI socket is reachable and the runtime reports ready status.

Which issue(s) this PR fixes:

None

Special notes for your reviewer:

References:

Does this PR introduce a user-facing change?

Adding support for the systemd watchdog. For now it verifies that the CRI socket is reachable and the runtime reports ready status.

@openshift-ci openshift-ci bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. labels Nov 28, 2024
@openshift-ci openshift-ci bot requested review from klihub and kwilczynski November 28, 2024 13:25
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Nov 28, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: saschagrunert

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 28, 2024
@saschagrunert saschagrunert force-pushed the watchdog branch 2 times, most recently from a8e1406 to 824e11f Compare November 28, 2024 13:28
@codecov
Copy link

codecov bot commented Nov 28, 2024

Codecov Report

Attention: Patch coverage is 66.66667% with 31 lines in your changes missing coverage. Please review.

Project coverage is 46.98%. Comparing base (a0788cc) to head (33dbcc1).
Report is 15 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8791      +/-   ##
==========================================
+ Coverage   46.90%   46.98%   +0.08%     
==========================================
  Files         150      154       +4     
  Lines       21869    21962      +93     
==========================================
+ Hits        10257    10319      +62     
- Misses      10553    10583      +30     
- Partials     1059     1060       +1     

@saschagrunert
Copy link
Member Author

/retest

5 similar comments
@saschagrunert
Copy link
Member Author

/retest

@saschagrunert
Copy link
Member Author

/retest

@saschagrunert
Copy link
Member Author

/retest

@saschagrunert
Copy link
Member Author

/retest

@saschagrunert
Copy link
Member Author

/retest

@saschagrunert saschagrunert force-pushed the watchdog branch 2 times, most recently from 18c75c0 to 7e3865c Compare November 29, 2024 10:14
@saschagrunert saschagrunert changed the title Add systemd watchdog support WIP: Add systemd watchdog support Nov 29, 2024
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 29, 2024
@saschagrunert
Copy link
Member Author

/retest

@saschagrunert saschagrunert changed the title WIP: Add systemd watchdog support Add systemd watchdog support Nov 29, 2024
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 29, 2024
@saschagrunert
Copy link
Member Author

Not exactly sure why the CI fails here.

@saschagrunert
Copy link
Member Author

/retest

@saschagrunert
Copy link
Member Author

/test images

@saschagrunert saschagrunert changed the title Add systemd watchdog support WIP: Add systemd watchdog support Nov 29, 2024
@saschagrunert saschagrunert force-pushed the watchdog branch 5 times, most recently from a526313 to 24067c9 Compare December 2, 2024 09:15
@saschagrunert saschagrunert force-pushed the watchdog branch 2 times, most recently from a98381c to 6524abf Compare December 2, 2024 09:55
@saschagrunert
Copy link
Member Author

/retest

@saschagrunert saschagrunert force-pushed the watchdog branch 2 times, most recently from dc8cefa to b2bf851 Compare December 2, 2024 10:44
@saschagrunert
Copy link
Member Author

/retest

@saschagrunert saschagrunert changed the title WIP: Add systemd watchdog support Add systemd watchdog support Dec 2, 2024
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 2, 2024
@saschagrunert
Copy link
Member Author

/retest

@saschagrunert
Copy link
Member Author

@cri-o/cri-o-maintainers PTAL

@haircommander
Copy link
Member

code LGTM, this is a neat feature! One concern I have is in the case cri-o is acting poorly because of CPU throttling, it feels to me having all this code then having systemd restart it could make the problem worse. What kinds of failures is this intended on addressing? a deadlock is one, but I'm not sure we'd get stuck when reporting status

@saschagrunert
Copy link
Member Author

What kinds of failures is this intended on addressing? a deadlock is one, but I'm not sure we'd get stuck when reporting status

Right now we just check everything around the CRI-O socket: being able to connect, transfer and analyze the data. CRI-O now assumes that it doesn't work correctly if the socket or the transferred data is misbehaving. I would say we could also think about more use cases to follow-up, but I don't have anything specific in mind.

We can also make it configurable if y'all want! :)

Signed-off-by: Sascha Grunert <[email protected]>
@saschagrunert
Copy link
Member Author

We can also make it configurable if y'all want! :)

I mean, it is already configurable by using or omitting WatchdogSec=60s in the service spec.

I rebased the PR on top of the latest fixes, PTAL again @cri-o/cri-o-maintainers

@saschagrunert
Copy link
Member Author

/test ci-fedora-kata

@saschagrunert saschagrunert added this to the 1.32 milestone Dec 9, 2024
@haircommander
Copy link
Member

/lgtm

neat!

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Dec 10, 2024
@saschagrunert
Copy link
Member Author

/retest

@openshift-merge-bot openshift-merge-bot bot merged commit 8d332ee into cri-o:main Dec 11, 2024
70 checks passed
@saschagrunert saschagrunert deleted the watchdog branch December 11, 2024 05:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. kind/feature Categorizes issue or PR as related to a new feature. lgtm Indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants