node-problem-detector should detect the correct logging on suse-chost

**How to categorize this issue?**

/area ops-productivity
/kind bug

**What happened**:
The node-problem-detector has a hardcoded path to look for logs: https://github.com/gardener/gardener/blob/a71be722d3346a87ea58e15d3f2ef1b17a7e0b91/charts/shoot-core/components/charts/node-problem-detector/values.yaml#L18 but `journalctl` on suse-chost is configured to use `/run/log/journal` instead. Therefore, the node-problem-detector errors on suse-chost with `error watching journald: failed to stat the log path "/var/log/journal"` – the node is still considered to be healthy, though, it is "just" the log-counter binary that is immediately exiting: https://github.com/kubernetes/node-problem-detector/blob/1e8008bdedbeae39074c93cfe3fcdad7735f4db1/cmd/logcounter/log_counter.go#L44-L48

**What you expected to happen**:
Probably something similar to https://github.com/gardener/gardener/pull/3917 to feed the correct value to the node-problem-detector?

**How to reproduce it (as minimally and precisely as possible)**:
* Create a cluster with suse-chost
* See that `journalctl` is configured to use `/run/log/journal` instead of `/var/log/journal`
```
$ journalctl --header
File Path: /run/log/journal/2b480512109fc79f07a183da96604459/system.journal
```
* See that node-problem-detector logs the error above, as it tries to find logs in the wrong directory
```
kubectl -n kube-system logs node-problem-detector-sspw8 | grep '/var/log/journal'
I1103 09:30:18.299709       1 log_monitor.go:79] Finish parsing log monitor config file /config/docker-monitor.json: {WatcherConfig:{Plugin:journald PluginConfig:map[source:dockerd] LogPath:/var/log/journal Lookback:5m Delay:} BufferSize:10 Source:docker-monitor DefaultConditions:[{Type:CorruptDockerOverlay2 Status: Transition:0001-01-01 00:00:00 +0000 UTC Reason:NoCorruptDockerOverlay2 Message:docker overlay2 is functioning properly}] Rules:[{Type:temporary Condition: Reason:CorruptDockerImage Pattern:Error trying v2 registry: failed to register layer: rename /var/lib/docker/image/(.+) /var/lib/docker/image/(.+): directory not empty.*} {Type:permanent Condition:CorruptDockerOverlay2 Reason:CorruptDockerOverlay2 Pattern:returned error: readlink /var/lib/docker/overlay2.*: invalid argument.*} {Type:temporary Condition: Reason:DockerContainerStartupFailure Pattern:OCI runtime start failed: container process is already dead: unknown}] EnableMetricsReporting:0xc000045a7a}
I1103 09:30:18.396959       1 log_monitor.go:79] Finish parsing log monitor config file /config/systemd-monitor.json: {WatcherConfig:{Plugin:journald PluginConfig:map[source:systemd] LogPath:/var/log/journal Lookback:5m Delay:} BufferSize:10 Source:systemd-monitor DefaultConditions:[] Rules:[{Type:temporary Condition: Reason:KubeletStart Pattern:Started Kubernetes kubelet.} {Type:temporary Condition: Reason:DockerStart Pattern:Starting Docker Application Container Engine...} {Type:temporary Condition: Reason:ContainerdStart Pattern:Starting containerd container runtime...}] EnableMetricsReporting:0xc000045e42}
E1103 09:30:18.597642       1 problem_detector.go:55] Failed to start problem daemon &{/config/docker-monitor.json 0xc0004c8200 0xc000551300 {{journald map[source:dockerd] /var/log/journal 5m } 10 docker-monitor [{CorruptDockerOverlay2  {0 0 <nil>} NoCorruptDockerOverlay2 docker overlay2 is functioning properly}] [{temporary  CorruptDockerImage Error trying v2 registry: failed to register layer: rename /var/lib/docker/image/(.+) /var/lib/docker/image/(.+): directory not empty.*} {permanent CorruptDockerOverlay2 CorruptDockerOverlay2 returned error: readlink /var/lib/docker/overlay2.*: invalid argument.*} {temporary  DockerContainerStartupFailure OCI runtime start failed: container process is already dead: unknown}] 0xc000045a7a} [] <nil> 0xc00048cb40 0xc000549910}: failed to stat the log path "/var/log/journal": stat /var/log/journal: no such file or directory
E1103 09:30:18.597709       1 problem_detector.go:55] Failed to start problem daemon &{/config/systemd-monitor.json 0xc0004c8280 0xc000551640 {{journald map[source:systemd] /var/log/journal 5m } 10 systemd-monitor [] [{temporary  KubeletStart Started Kubernetes kubelet.} {temporary  DockerStart Starting Docker Application Container Engine...} {temporary  ContainerdStart Starting containerd container runtime...}] 0xc000045e42} [] <nil> 0xc00048cde0 0xc000549b50}: failed to stat the log path "/var/log/journal": stat /var/log/journal: no such file or directory
I1103 09:30:20.296566       1 custom_plugin_monitor.go:276] New status generated: &{Source:kernel-monitor Events:[{Severity:info Timestamp:2021-11-03 09:30:20.296511789 +0000 UTC m=+2.499435529 Reason:NoFrequentUnregisterNetDevice Message:Node condition FrequentUnregisterNetDevice is now: Unknown, reason: NoFrequentUnregisterNetDevice}] Conditions:[{Type:FrequentUnregisterNetDevice Status:Unknown Transition:2021-11-03 09:30:20.296511789 +0000 UTC m=+2.499435529 Reason:NoFrequentUnregisterNetDevice Message:error watching journald: failed to stat the log path "/var/log/journal": stat /v}]}
I1103 09:30:20.296957       1 custom_plugin_monitor.go:276] New status generated: &{Source:systemd-monitor Events:[{Severity:info Timestamp:2021-11-03 09:30:20.296886051 +0000 UTC m=+2.499809833 Reason:NoFrequentKubeletRestart Message:Node condition FrequentKubeletRestart is now: Unknown, reason: NoFrequentKubeletRestart}] Conditions:[{Type:FrequentKubeletRestart Status:Unknown Transition:2021-11-03 09:30:20.296886051 +0000 UTC m=+2.499809833 Reason:NoFrequentKubeletRestart Message:error watching journald: failed to stat the log path "/var/log/journal": stat /v} {Type:FrequentDockerRestart Status:False Transition:2021-11-03 09:30:18.69729866 +0000 UTC m=+0.900222393 Reason:NoFrequentDockerRestart Message:docker is functioning properly} {Type:FrequentContainerdRestart Status:False Transition:2021-11-03 09:30:18.697298749 +0000 UTC m=+0.900222472 Reason:NoFrequentContainerdRestart Message:containerd is functioning properly}]}
I1103 09:30:20.798576       1 custom_plugin_monitor.go:276] New status generated: &{Source:systemd-monitor Events:[{Severity:info Timestamp:2021-11-03 09:30:20.798475646 +0000 UTC m=+3.001399403 Reason:NoFrequentDockerRestart Message:Node condition FrequentDockerRestart is now: Unknown, reason: NoFrequentDockerRestart}] Conditions:[{Type:FrequentKubeletRestart Status:Unknown Transition:2021-11-03 09:30:20.296886051 +0000 UTC m=+2.499809833 Reason:NoFrequentKubeletRestart Message:error watching journald: failed to stat the log path "/var/log/journal": stat /v} {Type:FrequentDockerRestart Status:Unknown Transition:2021-11-03 09:30:20.798475646 +0000 UTC m=+3.001399403 Reason:NoFrequentDockerRestart Message:error watching journald: failed to stat the log path "/var/log/journal": stat /v} {Type:FrequentContainerdRestart Status:False Transition:2021-11-03 09:30:18.697298749 +0000 UTC m=+0.900222472 Reason:NoFrequentContainerdRestart Message:containerd is functioning properly}]}
I1103 09:30:21.597931       1 custom_plugin_monitor.go:276] New status generated: &{Source:systemd-monitor Events:[{Severity:info Timestamp:2021-11-03 09:30:21.59787525 +0000 UTC m=+3.800798988 Reason:NoFrequentContainerdRestart Message:Node condition FrequentContainerdRestart is now: Unknown, reason: NoFrequentContainerdRestart}] Conditions:[{Type:FrequentKubeletRestart Status:Unknown Transition:2021-11-03 09:30:20.296886051 +0000 UTC m=+2.499809833 Reason:NoFrequentKubeletRestart Message:error watching journald: failed to stat the log path "/var/log/journal": stat /v} {Type:FrequentDockerRestart Status:Unknown Transition:2021-11-03 09:30:20.798475646 +0000 UTC m=+3.001399403 Reason:NoFrequentDockerRestart Message:error watching journald: failed to stat the log path "/var/log/journal": stat /v} {Type:FrequentContainerdRestart Status:Unknown Transition:2021-11-03 09:30:21.59787525 +0000 UTC m=+3.800798988 Reason:NoFrequentContainerdRestart Message:error watching journald: failed to stat the log path "/var/log/journal": stat /v}]}
```

**Anything else we need to know?**:
Thanks for @vpnachev for analyzing this, I'm just writing this down so we actually get to work on it!
There's also https://github.com/kubernetes/node-problem-detector/issues/621, so maybe we don't need to do anything?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

node-problem-detector should detect the correct logging on suse-chost #4957

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

node-problem-detector should detect the correct logging on suse-chost #4957

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions