Skip to content

node-problem-detector should detect the correct logging on suse-chost #4957

@voelzmo

Description

@voelzmo

How to categorize this issue?

/area ops-productivity
/kind bug

What happened:
The node-problem-detector has a hardcoded path to look for logs:

but journalctl on suse-chost is configured to use /run/log/journal instead. Therefore, the node-problem-detector errors on suse-chost with error watching journald: failed to stat the log path "/var/log/journal" – the node is still considered to be healthy, though, it is "just" the log-counter binary that is immediately exiting: https://github.com/kubernetes/node-problem-detector/blob/1e8008bdedbeae39074c93cfe3fcdad7735f4db1/cmd/logcounter/log_counter.go#L44-L48

What you expected to happen:
Probably something similar to #3917 to feed the correct value to the node-problem-detector?

How to reproduce it (as minimally and precisely as possible):

  • Create a cluster with suse-chost
  • See that journalctl is configured to use /run/log/journal instead of /var/log/journal
$ journalctl --header
File Path: /run/log/journal/2b480512109fc79f07a183da96604459/system.journal
  • See that node-problem-detector logs the error above, as it tries to find logs in the wrong directory
kubectl -n kube-system logs node-problem-detector-sspw8 | grep '/var/log/journal'
I1103 09:30:18.299709       1 log_monitor.go:79] Finish parsing log monitor config file /config/docker-monitor.json: {WatcherConfig:{Plugin:journald PluginConfig:map[source:dockerd] LogPath:/var/log/journal Lookback:5m Delay:} BufferSize:10 Source:docker-monitor DefaultConditions:[{Type:CorruptDockerOverlay2 Status: Transition:0001-01-01 00:00:00 +0000 UTC Reason:NoCorruptDockerOverlay2 Message:docker overlay2 is functioning properly}] Rules:[{Type:temporary Condition: Reason:CorruptDockerImage Pattern:Error trying v2 registry: failed to register layer: rename /var/lib/docker/image/(.+) /var/lib/docker/image/(.+): directory not empty.*} {Type:permanent Condition:CorruptDockerOverlay2 Reason:CorruptDockerOverlay2 Pattern:returned error: readlink /var/lib/docker/overlay2.*: invalid argument.*} {Type:temporary Condition: Reason:DockerContainerStartupFailure Pattern:OCI runtime start failed: container process is already dead: unknown}] EnableMetricsReporting:0xc000045a7a}
I1103 09:30:18.396959       1 log_monitor.go:79] Finish parsing log monitor config file /config/systemd-monitor.json: {WatcherConfig:{Plugin:journald PluginConfig:map[source:systemd] LogPath:/var/log/journal Lookback:5m Delay:} BufferSize:10 Source:systemd-monitor DefaultConditions:[] Rules:[{Type:temporary Condition: Reason:KubeletStart Pattern:Started Kubernetes kubelet.} {Type:temporary Condition: Reason:DockerStart Pattern:Starting Docker Application Container Engine...} {Type:temporary Condition: Reason:ContainerdStart Pattern:Starting containerd container runtime...}] EnableMetricsReporting:0xc000045e42}
E1103 09:30:18.597642       1 problem_detector.go:55] Failed to start problem daemon &{/config/docker-monitor.json 0xc0004c8200 0xc000551300 {{journald map[source:dockerd] /var/log/journal 5m } 10 docker-monitor [{CorruptDockerOverlay2  {0 0 <nil>} NoCorruptDockerOverlay2 docker overlay2 is functioning properly}] [{temporary  CorruptDockerImage Error trying v2 registry: failed to register layer: rename /var/lib/docker/image/(.+) /var/lib/docker/image/(.+): directory not empty.*} {permanent CorruptDockerOverlay2 CorruptDockerOverlay2 returned error: readlink /var/lib/docker/overlay2.*: invalid argument.*} {temporary  DockerContainerStartupFailure OCI runtime start failed: container process is already dead: unknown}] 0xc000045a7a} [] <nil> 0xc00048cb40 0xc000549910}: failed to stat the log path "/var/log/journal": stat /var/log/journal: no such file or directory
E1103 09:30:18.597709       1 problem_detector.go:55] Failed to start problem daemon &{/config/systemd-monitor.json 0xc0004c8280 0xc000551640 {{journald map[source:systemd] /var/log/journal 5m } 10 systemd-monitor [] [{temporary  KubeletStart Started Kubernetes kubelet.} {temporary  DockerStart Starting Docker Application Container Engine...} {temporary  ContainerdStart Starting containerd container runtime...}] 0xc000045e42} [] <nil> 0xc00048cde0 0xc000549b50}: failed to stat the log path "/var/log/journal": stat /var/log/journal: no such file or directory
I1103 09:30:20.296566       1 custom_plugin_monitor.go:276] New status generated: &{Source:kernel-monitor Events:[{Severity:info Timestamp:2021-11-03 09:30:20.296511789 +0000 UTC m=+2.499435529 Reason:NoFrequentUnregisterNetDevice Message:Node condition FrequentUnregisterNetDevice is now: Unknown, reason: NoFrequentUnregisterNetDevice}] Conditions:[{Type:FrequentUnregisterNetDevice Status:Unknown Transition:2021-11-03 09:30:20.296511789 +0000 UTC m=+2.499435529 Reason:NoFrequentUnregisterNetDevice Message:error watching journald: failed to stat the log path "/var/log/journal": stat /v}]}
I1103 09:30:20.296957       1 custom_plugin_monitor.go:276] New status generated: &{Source:systemd-monitor Events:[{Severity:info Timestamp:2021-11-03 09:30:20.296886051 +0000 UTC m=+2.499809833 Reason:NoFrequentKubeletRestart Message:Node condition FrequentKubeletRestart is now: Unknown, reason: NoFrequentKubeletRestart}] Conditions:[{Type:FrequentKubeletRestart Status:Unknown Transition:2021-11-03 09:30:20.296886051 +0000 UTC m=+2.499809833 Reason:NoFrequentKubeletRestart Message:error watching journald: failed to stat the log path "/var/log/journal": stat /v} {Type:FrequentDockerRestart Status:False Transition:2021-11-03 09:30:18.69729866 +0000 UTC m=+0.900222393 Reason:NoFrequentDockerRestart Message:docker is functioning properly} {Type:FrequentContainerdRestart Status:False Transition:2021-11-03 09:30:18.697298749 +0000 UTC m=+0.900222472 Reason:NoFrequentContainerdRestart Message:containerd is functioning properly}]}
I1103 09:30:20.798576       1 custom_plugin_monitor.go:276] New status generated: &{Source:systemd-monitor Events:[{Severity:info Timestamp:2021-11-03 09:30:20.798475646 +0000 UTC m=+3.001399403 Reason:NoFrequentDockerRestart Message:Node condition FrequentDockerRestart is now: Unknown, reason: NoFrequentDockerRestart}] Conditions:[{Type:FrequentKubeletRestart Status:Unknown Transition:2021-11-03 09:30:20.296886051 +0000 UTC m=+2.499809833 Reason:NoFrequentKubeletRestart Message:error watching journald: failed to stat the log path "/var/log/journal": stat /v} {Type:FrequentDockerRestart Status:Unknown Transition:2021-11-03 09:30:20.798475646 +0000 UTC m=+3.001399403 Reason:NoFrequentDockerRestart Message:error watching journald: failed to stat the log path "/var/log/journal": stat /v} {Type:FrequentContainerdRestart Status:False Transition:2021-11-03 09:30:18.697298749 +0000 UTC m=+0.900222472 Reason:NoFrequentContainerdRestart Message:containerd is functioning properly}]}
I1103 09:30:21.597931       1 custom_plugin_monitor.go:276] New status generated: &{Source:systemd-monitor Events:[{Severity:info Timestamp:2021-11-03 09:30:21.59787525 +0000 UTC m=+3.800798988 Reason:NoFrequentContainerdRestart Message:Node condition FrequentContainerdRestart is now: Unknown, reason: NoFrequentContainerdRestart}] Conditions:[{Type:FrequentKubeletRestart Status:Unknown Transition:2021-11-03 09:30:20.296886051 +0000 UTC m=+2.499809833 Reason:NoFrequentKubeletRestart Message:error watching journald: failed to stat the log path "/var/log/journal": stat /v} {Type:FrequentDockerRestart Status:Unknown Transition:2021-11-03 09:30:20.798475646 +0000 UTC m=+3.001399403 Reason:NoFrequentDockerRestart Message:error watching journald: failed to stat the log path "/var/log/journal": stat /v} {Type:FrequentContainerdRestart Status:Unknown Transition:2021-11-03 09:30:21.59787525 +0000 UTC m=+3.800798988 Reason:NoFrequentContainerdRestart Message:error watching journald: failed to stat the log path "/var/log/journal": stat /v}]}

Anything else we need to know?:
Thanks for @vpnachev for analyzing this, I'm just writing this down so we actually get to work on it!
There's also kubernetes/node-problem-detector#621, so maybe we don't need to do anything?

Metadata

Metadata

Assignees

Labels

area/ops-productivityOperator productivity related (how to improve operations)kind/bugBugpriority/4Priority (lower number equals higher priority)

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions