Skip to content

Conversation

@rtreffer
Copy link
Contributor

@rtreffer rtreffer commented Jul 9, 2024

What type of PR is this?

/kind feature

What this PR does / why we need it:

On log rotate or container exit the logs are closed and fsync is used to flush the data to disk.
This results in good crash recovery behavior but it may cause IO locks that can cause pods to wait on logging to stdout/stderr. #6743 describes the issue.

conmon supports --no-sync-log to avoid this issue. This PR adds the same (pass through) no_sync_log option to cri-o.

Related:

Which issue(s) this PR fixes:

Fixes: #6743

Special notes for your reviewer:

This is my first PR to CRI-O. I have been able to test this locally (manually), conmon starts up with --no-sync-log.

Does this PR introduce a user-facing change?

Add `no_sync_log` option to disable fsync on container log rotation and container exit. This can improve performance at the cost of potential data loss on machine crashes.

@rtreffer rtreffer requested a review from mrunalp as a code owner July 9, 2024 20:49
@openshift-ci openshift-ci bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. kind/feature Categorizes issue or PR as related to a new feature. labels Jul 9, 2024
@openshift-ci openshift-ci bot requested review from klihub and littlejawa July 9, 2024 20:49
@openshift-ci openshift-ci bot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jul 9, 2024
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jul 9, 2024

Hi @rtreffer. Thanks for your PR.

I'm waiting for a cri-o member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.


// NoSyncLog if enabled will disable fsync on log rotation and container exit.
// This can improve performance but may result in date loss on hard system crashes.
NoSyncLog bool `toml:"no_sync_log"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cri-o/cri-o-maintainers I wonder if this should be a member of the runtime config structure, as it doesn't work for runtime pod or vm right now

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just checked other log related settings in this struct and they seem to have mixed scopes as well

  • LogSizeMax seems to apply to oci and pod
  • LogToJournald seems to apply to oci only

Would it be acceptable to document this correctly?
E.g. appending Applies to oci containers only to all doc strings for the setting?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have, indeed, overloaded RuntimeConfig a little bit. I still am of a belief that LogLevel should not be there, for example. I suppose it was also done with some of the options for convenience since it's easier to access some of these depending on which objects are being passed around...

@haircommander, I think it would be fine... for now. We then can figure out how to clean this up.

Thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry by runtime config I mean crio.runtime.runtimes runtime (talk about overloaded!) . this only works for runtime type OCI, and I'd love to see it as a member there instead

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, ok, so

  • Move it to RuntimeHandler
  • Add validation that it only applies to OCI

The thing I haven't figured out is what to do about the cli option in that case.
But I'll update this PR once I get that sorted.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personaly don't think the cli option is needed. Do you really want it @rtreffer ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it was convinient, but looks like it might be annoying to expose it as a flag, so I am in favor of dropping it for simplicity.

@kwilczynski
Copy link
Contributor

/ok-to-test

@openshift-ci openshift-ci bot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jul 10, 2024
@codecov
Copy link

codecov bot commented Jul 10, 2024

Codecov Report

Attention: Patch coverage is 66.66667% with 4 lines in your changes missing coverage. Please review.

Project coverage is 49.43%. Comparing base (0450024) to head (9ee9157).
Report is 138 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8363      +/-   ##
==========================================
- Coverage   49.49%   49.43%   -0.07%     
==========================================
  Files         153      153              
  Lines       17084    17166      +82     
==========================================
+ Hits         8456     8486      +30     
- Misses       7564     7610      +46     
- Partials     1064     1070       +6     

@rtreffer-rddt
Copy link

/retest-required

1 similar comment
@rtreffer
Copy link
Contributor Author

/retest-required

@rtreffer
Copy link
Contributor Author

/test ci-fedora-critest

@rtreffer
Copy link
Contributor Author

The remaining test failures (rawhide rpm build) are in line with other PRs / builds and as far as I can tell unrelated.

@kwilczynski
Copy link
Contributor

The remaining test failures (rawhide rpm build) are in line with other PRs / builds and as far as I can tell unrelated.

@rtreffer, that is fine to ignore for now. We know why this is failing. See:

So nothing to worry about for now 😄

@rtreffer
Copy link
Contributor Author

/retest-required

1 similar comment
@rtreffer
Copy link
Contributor Author

/retest-required

@kwilczynski
Copy link
Contributor

/retest

@rtreffer
Copy link
Contributor Author

/retest

@openshift-ci openshift-ci bot added dco-signoff: no Indicates the PR's author has not DCO signed all their commits. and removed dco-signoff: yes Indicates the PR's author has DCO signed all their commits. labels Jul 28, 2024
Signed-off-by: René Treffer <[email protected]>
@rtreffer rtreffer force-pushed the rtreffer/no-sync-log branch from 5dcace4 to a048e15 Compare July 28, 2024 15:40
@openshift-ci openshift-ci bot added dco-signoff: yes Indicates the PR's author has DCO signed all their commits. and removed dco-signoff: no Indicates the PR's author has not DCO signed all their commits. labels Jul 28, 2024
@kwilczynski
Copy link
Contributor

/retest

@rtreffer
Copy link
Contributor Author

rtreffer commented Aug 7, 2024

/retest

3 similar comments
@rtreffer
Copy link
Contributor Author

rtreffer commented Aug 7, 2024

/retest

@rtreffer
Copy link
Contributor Author

rtreffer commented Aug 9, 2024

/retest

@kwilczynski
Copy link
Contributor

/retest

@rtreffer
Copy link
Contributor Author

rtreffer commented Aug 9, 2024

/restest

@rtreffer
Copy link
Contributor Author

/retest

1 similar comment
@rtreffer
Copy link
Contributor Author

/retest

@kwilczynski
Copy link
Contributor

/approve
/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Aug 14, 2024
@haircommander
Copy link
Member

/approve

thanks @rtreffer

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 14, 2024
@kwilczynski
Copy link
Contributor

/retest

@saschagrunert
Copy link
Member

/override ci/prow/ci-fedora-integration

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Aug 15, 2024

@saschagrunert: Overrode contexts on behalf of saschagrunert: ci/prow/ci-fedora-integration

In response to this:

/override ci/prow/ci-fedora-integration

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Aug 15, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: haircommander, kwilczynski, rtreffer, saschagrunert

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [haircommander,saschagrunert]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. kind/feature Categorizes issue or PR as related to a new feature. lgtm Indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note Denotes a PR that will be considered when it comes time to generate release notes.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

fsync() contention when container logs get rotated can block containers writing to stdout/stderr

6 participants