@adrianreber

What type of PR is this?

/kind bug

What this PR does / why we need it:

The initial implementation of checkpointing in CRI-O was based on Podman, and at that time the default workflow was to stop the container after checkpointing: CRIU simply kills all processes in the container.

As the initial Kubernetes checkpointing feature is based on the Forensic Container Checkpointing KEP, the container keeps on running after checkpointing.

If the container keeps on running after checkpointing, the files in the container may already have changed by the time we put them into the checkpoint archive. This means that the files are different during restore than they were during checkpointing.

CRIU aborts the restore if the size of any open file has changed, because restoring puts the file descriptor offset at the same position as during checkpointing. If the file content is different, however, this can lead to data loss or crashes.
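
The size check described above can be illustrated with a minimal sketch. This is not CRIU's implementation; `openFileRecord`, `checkSize`, and `validateForRestore` are hypothetical names that only model the idea of refusing to restore a saved offset into a file whose size no longer matches.

```go
package main

import (
	"fmt"
	"os"
)

// openFileRecord is a hypothetical, simplified record of one open file
// as it looked at checkpoint time.
type openFileRecord struct {
	Path   string
	Size   int64 // file size in bytes at checkpoint time
	Offset int64 // file descriptor offset at checkpoint time
}

// checkSize returns an error if the current size differs from the size
// recorded at checkpoint time. Restoring the old offset into a changed
// file could silently read or overwrite the wrong data.
func checkSize(rec openFileRecord, currentSize int64) error {
	if currentSize != rec.Size {
		return fmt.Errorf("%s: size changed from %d to %d bytes since checkpoint, refusing to restore offset %d",
			rec.Path, rec.Size, currentSize, rec.Offset)
	}
	return nil
}

// validateForRestore looks up the file on disk and applies checkSize.
func validateForRestore(rec openFileRecord) error {
	info, err := os.Stat(rec.Path)
	if err != nil {
		return fmt.Errorf("stat %s: %w", rec.Path, err)
	}
	return checkSize(rec, info.Size())
}
```

A container that keeps appending to a log file between the dump and the archive step would fail such a check on restore, which is the failure mode this PR avoids.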

To solve this, this commit pauses the container before checkpointing and unpauses it after the checkpoint archive has been written to disk.
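
The ordering can be sketched as follows. This is an illustration under assumptions, not the code from this PR: the `Runtime` interface, `checkpointPaused`, and `fakeRuntime` are hypothetical names, and `Checkpoint` stands for everything up to and including writing the archive to disk.

```go
package main

import (
	"errors"
	"fmt"
)

// Runtime is a hypothetical stand-in for the OCI runtime operations
// needed here; it is not CRI-O's actual interface.
type Runtime interface {
	Pause(id string) error
	Unpause(id string) error
	Checkpoint(id, archive string) error
}

// checkpointPaused pauses the container, writes the checkpoint archive,
// and unpauses the container again, even if checkpointing fails.
func checkpointPaused(rt Runtime, id, archive string) (retErr error) {
	if err := rt.Pause(id); err != nil {
		return fmt.Errorf("pause %s: %w", id, err)
	}
	// Unpause only after the archive has been written (or has failed).
	defer func() {
		if err := rt.Unpause(id); err != nil {
			retErr = errors.Join(retErr, fmt.Errorf("unpause %s: %w", id, err))
		}
	}()
	if err := rt.Checkpoint(id, archive); err != nil {
		return fmt.Errorf("checkpoint %s: %w", id, err)
	}
	return nil
}

// fakeRuntime records the order of calls so the sequence can be inspected.
type fakeRuntime struct {
	calls          []string
	failCheckpoint bool
}

func (f *fakeRuntime) Pause(id string) error {
	f.calls = append(f.calls, "pause")
	return nil
}

func (f *fakeRuntime) Unpause(id string) error {
	f.calls = append(f.calls, "unpause")
	return nil
}

func (f *fakeRuntime) Checkpoint(id, archive string) error {
	f.calls = append(f.calls, "checkpoint")
	if f.failCheckpoint {
		return errors.New("checkpoint failed")
	}
	return nil
}
```

Using `defer` for the unpause keeps the container from staying frozen when the checkpoint step returns an error.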

As checkpointing only works with the OCI runtimes currently, we do not have to handle the pause during restore.

The OCI runtime (runc/crun) uses the cgroup freezer to pause the container's processes. CRIU also uses the cgroup freezer to pause all processes in the container, and it does not change the state of the cgroup freezer if the cgroup is already frozen. So the cgroup was always frozen during checkpointing anyway. With this change, how long the cgroup stays frozen is controlled by CRI-O and not CRIU, but we still do not have to handle it during restore.
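
The freezer interaction can be modeled in a few lines. This is a toy in-memory model, not CRIU or runc code; `freezer` and `dumpWithFreezer` are hypothetical names. It assumes the behavior stated above: a dump that finds the cgroup already frozen leaves the freezer state alone.

```go
package main

// freezer is an in-memory stand-in for the cgroup freezer state.
type freezer struct {
	frozen bool
}

// dumpWithFreezer models a checkpointer that needs the cgroup frozen
// while dumping. It freezes and later thaws the cgroup only if it was
// not frozen already; otherwise the state is left to whoever froze it.
func dumpWithFreezer(f *freezer, dump func() error) error {
	wasFrozen := f.frozen
	if !wasFrozen {
		f.frozen = true
		defer func() { f.frozen = false }()
	}
	return dump()
}
```

In this model, when the caller freezes the cgroup first, it stays frozen after the dump returns, so the caller decides when to thaw it, for example after the archive has been written.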

(cherry picked from commit 3d45027)

Which issue(s) this PR fixes:

Fixes: OCPBUGS-55964

Special notes for your reviewer:

Does this PR introduce a user-facing change?

None

Signed-off-by: Adrian Reber <[email protected]>
(cherry picked from commit 3d45027)
Signed-off-by: Adrian Reber <[email protected]>
@adrianreber adrianreber requested a review from mrunalp as a code owner November 24, 2025 16:13
@openshift-ci openshift-ci bot added the release-note-none Denotes a PR that doesn't merit a release note. label Nov 24, 2025
@openshift-ci-robot openshift-ci-robot added jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Nov 24, 2025
@openshift-ci-robot

@adrianreber: This pull request references Jira Issue OCPBUGS-55964, which is invalid:

  • release note text must be set and not match the template OR release note type must be set to "Release Note Not Required". For more information you can reference the OpenShift Bug Process.
  • expected Jira Issue OCPBUGS-55964 to depend on a bug in one of the following states: VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENT RELEASE), CLOSED (DONE), CLOSED (DONE-ERRATA), but no dependents were found

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.


@openshift-ci openshift-ci bot added dco-signoff: yes Indicates the PR's author has DCO signed all their commits. kind/bug Categorizes issue or PR as related to a bug. labels Nov 24, 2025
@openshift-ci openshift-ci bot requested review from QiWang19 and klihub November 24, 2025 16:13
@openshift-ci

openshift-ci bot commented Nov 24, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: adrianreber
Once this PR has been reviewed and has the lgtm label, please assign umohnani8 for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci

openshift-ci bot commented Nov 24, 2025

@adrianreber: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-gcp | 704efe7 | link | true | /test e2e-gcp |
| ci/prow/images | 704efe7 | link | true | /test images |
| ci/prow/e2e-agnostic | 704efe7 | link | true | /test e2e-agnostic |
