Skip to content

fsync() contention when container logs get rotated can block containers writing to stdout/stderr #6743

@rmt

Description

@rmt

What happened?

When kubelet rotates logs and asks the remote runtime to reopen logs, a blocking fsync() occurs in conmon. The pipe buffers can fill in the meantime while waiting for fsync() to complete, blocking writes to stdout/stderr.

conmon has an option called --no-sync-log which can alleviate this problem (and should arguably be the default)
However, podman does not support setting this.
And neither does cri-o, which uses podman.

The required changes should probably be driven by CRI-O.

This will usually not be noticed on a local disk or with low-volume logs, and very intermittently if the max-log-file-size is sufficiently large. However, we have logs on an AWS EBS (GP3) volume and there is latency involved, and the max log file size was quite low given the amount of logs produced, which made the issue very obvious and got us to dig into it.

I suspect that this is that this is behind a significant number of (usually minor) latency spikes that people see.

What did you expect to happen?

Writing to stdout from a container should not block under normal circumstances

How can we reproduce it (as minimally and precisely as possible)?

Do lots writes and fsync()s on the container log volume while producing lots logs in a tight loop from one or more k8s containers, and set the max-log-file-size lower. During log rotation / when the pipe is blocked, fewer log lines will be produced. In a real world scenario, this might be HTTP requests that are delayed.

Anything else we need to know?

Calling fsync on the critical path when writing log files is probably a bad thing

CRI-O and Kubernetes version

$ crio --version
crio version 1.23.5
Version:          1.23.5
GitCommit:        d12c2842e9e4d051a2496e1740f846ffa22b3f1d
GitTreeState:     clean
BuildDate:        2023-01-10T21:30:25Z
GoVersion:        go1.17.5
Compiler:         gc
Platform:         linux/amd64
Linkmode:         dynamic
BuildTags:        exclude_graphdriver_devicemapper, seccomp
SeccompEnabled:   true
AppArmorEnabled:  false
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.14", GitCommit:"3321ffc07d2f046afdf613796f9032f4460de093", GitTreeState:"clean", BuildDate:"2022-11-09T13:40:19Z", GoVersion:"go1.17.13", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.14", GitCommit:"3321ffc07d2f046afdf613796f9032f4460de093", GitTreeState:"clean", BuildDate:"2022-11-09T13:32:47Z", GoVersion:"go1.17.13", Compiler:"gc", Platform:"linux/arm64"}

OS version

$ cat /etc/os-release 
NAME="Rocky Linux"
VERSION="8.7 (Green Obsidian)"

$ uname -a
Linux ip-10-118-82-234.ec2.internal 4.18.0-425.10.1.el8_7.x86_64 #1 SMP Thu Jan 12 16:32:13 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

Additional environment details (AWS, VirtualBox, physical, etc.)

AWS GP3 with shared volume group for /var/log/pods and /var/log/containers, which doesn't do anything good for I/O contention or consistency, of course.

Metadata

Metadata

Labels

kind/bugCategorizes issue or PR as related to a bug.

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions