gardener/gardener-extension-os-suse-chost#70
Labels: area/ops-productivity (Operator productivity related), kind/bug, priority/4
Description
How to categorize this issue?
/area ops-productivity
/kind bug
What happened:
The node-problem-detector has a hardcoded path it looks for logs in (`logdir: /var/log/`), but `journalctl` on suse-chost is configured to use `/run/log/journal` instead. Therefore, the node-problem-detector fails on suse-chost with `error watching journald: failed to stat the log path "/var/log/journal"`. The node is still considered healthy, though; it is "just" the log-counter binary that exits immediately: https://github.com/kubernetes/node-problem-detector/blob/1e8008bdedbeae39074c93cfe3fcdad7735f4db1/cmd/logcounter/log_counter.go#L44-L48
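For reference, this is roughly what the journald watcher section of the mounted monitor configs looks like. The snippet below is a sketch reconstructed from the `Finish parsing` lines further down, not copied from the deployed chart, and it assumes `cat` is available in the image (output abridged):

```console
$ kubectl -n kube-system exec node-problem-detector-sspw8 -- cat /config/systemd-monitor.json
{
  "plugin": "journald",
  "pluginConfig": { "source": "systemd" },
  "logPath": "/var/log/journal",
  "lookback": "5m",
  "bufferSize": 10,
  "source": "systemd-monitor",
  ...
}
```

The docker-monitor.json config carries the same hardcoded `"logPath": "/var/log/journal"`, as can be seen in the parsed output below.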
What you expected to happen:
Probably something similar to #3917 is needed to feed the correct value to the node-problem-detector?
How to reproduce it (as minimally and precisely as possible):
- Create a cluster with suse-chost
- See that `journalctl` is configured to use `/run/log/journal` instead of `/var/log/journal` (a sketch for checking journald's storage configuration directly on the node follows after the logs below):
```console
$ journalctl --header
File Path: /run/log/journal/2b480512109fc79f07a183da96604459/system.journal
```
- See that the node-problem-detector logs the error above, as it tries to find logs in the wrong directory:
```console
$ kubectl -n kube-system logs node-problem-detector-sspw8 | grep '/var/log/journal'
```
I1103 09:30:18.299709 1 log_monitor.go:79] Finish parsing log monitor config file /config/docker-monitor.json: {WatcherConfig:{Plugin:journald PluginConfig:map[source:dockerd] LogPath:/var/log/journal Lookback:5m Delay:} BufferSize:10 Source:docker-monitor DefaultConditions:[{Type:CorruptDockerOverlay2 Status: Transition:0001-01-01 00:00:00 +0000 UTC Reason:NoCorruptDockerOverlay2 Message:docker overlay2 is functioning properly}] Rules:[{Type:temporary Condition: Reason:CorruptDockerImage Pattern:Error trying v2 registry: failed to register layer: rename /var/lib/docker/image/(.+) /var/lib/docker/image/(.+): directory not empty.*} {Type:permanent Condition:CorruptDockerOverlay2 Reason:CorruptDockerOverlay2 Pattern:returned error: readlink /var/lib/docker/overlay2.*: invalid argument.*} {Type:temporary Condition: Reason:DockerContainerStartupFailure Pattern:OCI runtime start failed: container process is already dead: unknown}] EnableMetricsReporting:0xc000045a7a}
I1103 09:30:18.396959 1 log_monitor.go:79] Finish parsing log monitor config file /config/systemd-monitor.json: {WatcherConfig:{Plugin:journald PluginConfig:map[source:systemd] LogPath:/var/log/journal Lookback:5m Delay:} BufferSize:10 Source:systemd-monitor DefaultConditions:[] Rules:[{Type:temporary Condition: Reason:KubeletStart Pattern:Started Kubernetes kubelet.} {Type:temporary Condition: Reason:DockerStart Pattern:Starting Docker Application Container Engine...} {Type:temporary Condition: Reason:ContainerdStart Pattern:Starting containerd container runtime...}] EnableMetricsReporting:0xc000045e42}
E1103 09:30:18.597642 1 problem_detector.go:55] Failed to start problem daemon &{/config/docker-monitor.json 0xc0004c8200 0xc000551300 {{journald map[source:dockerd] /var/log/journal 5m } 10 docker-monitor [{CorruptDockerOverlay2 {0 0 <nil>} NoCorruptDockerOverlay2 docker overlay2 is functioning properly}] [{temporary CorruptDockerImage Error trying v2 registry: failed to register layer: rename /var/lib/docker/image/(.+) /var/lib/docker/image/(.+): directory not empty.*} {permanent CorruptDockerOverlay2 CorruptDockerOverlay2 returned error: readlink /var/lib/docker/overlay2.*: invalid argument.*} {temporary DockerContainerStartupFailure OCI runtime start failed: container process is already dead: unknown}] 0xc000045a7a} [] <nil> 0xc00048cb40 0xc000549910}: failed to stat the log path "/var/log/journal": stat /var/log/journal: no such file or directory
E1103 09:30:18.597709 1 problem_detector.go:55] Failed to start problem daemon &{/config/systemd-monitor.json 0xc0004c8280 0xc000551640 {{journald map[source:systemd] /var/log/journal 5m } 10 systemd-monitor [] [{temporary KubeletStart Started Kubernetes kubelet.} {temporary DockerStart Starting Docker Application Container Engine...} {temporary ContainerdStart Starting containerd container runtime...}] 0xc000045e42} [] <nil> 0xc00048cde0 0xc000549b50}: failed to stat the log path "/var/log/journal": stat /var/log/journal: no such file or directory
I1103 09:30:20.296566 1 custom_plugin_monitor.go:276] New status generated: &{Source:kernel-monitor Events:[{Severity:info Timestamp:2021-11-03 09:30:20.296511789 +0000 UTC m=+2.499435529 Reason:NoFrequentUnregisterNetDevice Message:Node condition FrequentUnregisterNetDevice is now: Unknown, reason: NoFrequentUnregisterNetDevice}] Conditions:[{Type:FrequentUnregisterNetDevice Status:Unknown Transition:2021-11-03 09:30:20.296511789 +0000 UTC m=+2.499435529 Reason:NoFrequentUnregisterNetDevice Message:error watching journald: failed to stat the log path "/var/log/journal": stat /v}]}
I1103 09:30:20.296957 1 custom_plugin_monitor.go:276] New status generated: &{Source:systemd-monitor Events:[{Severity:info Timestamp:2021-11-03 09:30:20.296886051 +0000 UTC m=+2.499809833 Reason:NoFrequentKubeletRestart Message:Node condition FrequentKubeletRestart is now: Unknown, reason: NoFrequentKubeletRestart}] Conditions:[{Type:FrequentKubeletRestart Status:Unknown Transition:2021-11-03 09:30:20.296886051 +0000 UTC m=+2.499809833 Reason:NoFrequentKubeletRestart Message:error watching journald: failed to stat the log path "/var/log/journal": stat /v} {Type:FrequentDockerRestart Status:False Transition:2021-11-03 09:30:18.69729866 +0000 UTC m=+0.900222393 Reason:NoFrequentDockerRestart Message:docker is functioning properly} {Type:FrequentContainerdRestart Status:False Transition:2021-11-03 09:30:18.697298749 +0000 UTC m=+0.900222472 Reason:NoFrequentContainerdRestart Message:containerd is functioning properly}]}
I1103 09:30:20.798576 1 custom_plugin_monitor.go:276] New status generated: &{Source:systemd-monitor Events:[{Severity:info Timestamp:2021-11-03 09:30:20.798475646 +0000 UTC m=+3.001399403 Reason:NoFrequentDockerRestart Message:Node condition FrequentDockerRestart is now: Unknown, reason: NoFrequentDockerRestart}] Conditions:[{Type:FrequentKubeletRestart Status:Unknown Transition:2021-11-03 09:30:20.296886051 +0000 UTC m=+2.499809833 Reason:NoFrequentKubeletRestart Message:error watching journald: failed to stat the log path "/var/log/journal": stat /v} {Type:FrequentDockerRestart Status:Unknown Transition:2021-11-03 09:30:20.798475646 +0000 UTC m=+3.001399403 Reason:NoFrequentDockerRestart Message:error watching journald: failed to stat the log path "/var/log/journal": stat /v} {Type:FrequentContainerdRestart Status:False Transition:2021-11-03 09:30:18.697298749 +0000 UTC m=+0.900222472 Reason:NoFrequentContainerdRestart Message:containerd is functioning properly}]}
I1103 09:30:21.597931 1 custom_plugin_monitor.go:276] New status generated: &{Source:systemd-monitor Events:[{Severity:info Timestamp:2021-11-03 09:30:21.59787525 +0000 UTC m=+3.800798988 Reason:NoFrequentContainerdRestart Message:Node condition FrequentContainerdRestart is now: Unknown, reason: NoFrequentContainerdRestart}] Conditions:[{Type:FrequentKubeletRestart Status:Unknown Transition:2021-11-03 09:30:20.296886051 +0000 UTC m=+2.499809833 Reason:NoFrequentKubeletRestart Message:error watching journald: failed to stat the log path "/var/log/journal": stat /v} {Type:FrequentDockerRestart Status:Unknown Transition:2021-11-03 09:30:20.798475646 +0000 UTC m=+3.001399403 Reason:NoFrequentDockerRestart Message:error watching journald: failed to stat the log path "/var/log/journal": stat /v} {Type:FrequentContainerdRestart Status:Unknown Transition:2021-11-03 09:30:21.59787525 +0000 UTC m=+3.800798988 Reason:NoFrequentContainerdRestart Message:error watching journald: failed to stat the log path "/var/log/journal": stat /v}]}
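To see why the paths differ, it can also be checked directly on a suse-chost node which storage mode journald uses. The commands below are a sketch of what to look at; the exact output is an assumption, not captured from a real node:

```console
# check whether journald storage is explicitly configured (it defaults to "auto" if unset)
$ grep -i '^Storage=' /etc/systemd/journald.conf /etc/systemd/journald.conf.d/*.conf 2>/dev/null

# with Storage=auto, journald only writes to /var/log/journal if that directory exists
$ ls -d /var/log/journal /run/log/journal
ls: cannot access '/var/log/journal': No such file or directory
/run/log/journal
```

With the systemd default of `Storage=auto`, journald uses `/var/log/journal` only if that directory already exists and falls back to the volatile `/run/log/journal` otherwise, which is presumably what happens on suse-chost.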
Anything else we need to know?:
Thanks to @vpnachev for analyzing this; I'm just writing this down so we actually get to work on it!
There's also kubernetes/node-problem-detector#621, so maybe we don't need to do anything?