Open
Labels
area/plugins, kind/bug, status/0-triage, version/26.1
Description
We've been running a stable 5-node Docker Swarm cluster for over a year without any issues. The cluster is managed using Portainer Community (v2.26.0). After upgrading Filebeat from version 8.13 to 8.17, the cluster started behaving unexpectedly:
- Portainer frequently loses connection to the cluster.
- Attempting to reconnect through Portainer causes the Docker daemon on the manager node to crash, showing the following exception (see logs below).
- Nodes intermittently lose connection with each other but seem to recover automatically after some time.
The cluster uses local volumes, shared NFS volumes, and tmpfs volumes.
Update:
There are 4 NFS volumes in /var/lib/docker/volumes. Running docker volume ls on its own makes the Docker daemon fail with the following exception.
ng:false netPeers:5 entries:49 Queue qLen:0 netMsg/s:0"
Jan 20 12:01:52 swarm01.dev dockerd[1442]: panic: runtime error: invalid memory address or nil pointer dereference
Jan 20 12:01:52 swarm01.dev dockerd[1442]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x55a526d7bd4a]
Jan 20 12:01:52 swarm01.dev dockerd[1442]: goroutine 33934 [running]:
Jan 20 12:01:52 swarm01.dev dockerd[1442]: github.com/docker/docker/pkg/plugins.(*Client).callWithRetry(0x0, {0x55a5280dd2ff, 0x11}, {0x55a528ea5ae0, 0xc003af8750}, 0x1, {0xc00448a318, 0x1, 0x0?})
Jan 20 12:01:52 swarm01.dev dockerd[1442]: /root/rpmbuild/BUILD/src/engine/pkg/plugins/client.go:181 +0x14a
Jan 20 12:01:52 swarm01.dev dockerd[1442]: github.com/docker/docker/pkg/plugins.(*Client).CallWithOptions(0x55a5284c5d60?, {0x55a5280dd2ff, 0x11}, {0x55a528a23780, 0x55a52a4ebf00}, {0x55a52884d740, 0xc003af8720}, {0xc00448a318, 0x1, 0x1})
Jan 20 12:01:52 swarm01.dev dockerd[1442]: /root/rpmbuild/BUILD/src/engine/pkg/plugins/client.go:134 +0x17a
Jan 20 12:01:52 swarm01.dev dockerd[1442]: github.com/docker/docker/volume/drivers.(*volumeDriverProxy).List(0xc002f1f5c0)
Jan 20 12:01:52 swarm01.dev dockerd[1442]: /root/rpmbuild/BUILD/src/engine/volume/drivers/proxy.go:186 +0xe8
Jan 20 12:01:52 swarm01.dev dockerd[1442]: github.com/docker/docker/volume/drivers.(*volumeDriverAdapter).List(0xc003af86f0)
Jan 20 12:01:52 swarm01.dev dockerd[1442]: /root/rpmbuild/BUILD/src/engine/volume/drivers/adapter.go:43 +0x33
Jan 20 12:01:52 swarm01.dev dockerd[1442]: github.com/docker/docker/volume/service.(*VolumeStore).list.func1({0x55a528edcbc8, 0xc003af86f0})
Jan 20 12:01:52 swarm01.dev dockerd[1442]: /root/rpmbuild/BUILD/src/engine/volume/service/store.go:439 +0x4b
Jan 20 12:01:52 swarm01.dev dockerd[1442]: created by github.com/docker/docker/volume/service.(*VolumeStore).list in goroutine 33931
Jan 20 12:01:52 swarm01.dev dockerd[1442]: /root/rpmbuild/BUILD/src/engine/volume/service/store.go:438 +0x2f6
Jan 20 12:01:52 swarm01.dev systemd[1]: docker.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Jan 20 12:01:52 swarm01.dev systemd[1]: docker.service: Failed with result 'exit-code'.
Reproduce
- Create a fresh Docker Swarm cluster
- Run services that use local, tmpfs, and NFS volumes
- Try to connect the Docker Swarm cluster to Portainer using the Portainer Agent
Expected behavior
Portainer should connect to the Docker Swarm cluster without crashing the Docker daemon on the manager node.
docker version
Client: Docker Engine - Community
Version: 26.1.3
API version: 1.45
Go version: go1.21.10
Git commit: b72abbb
Built: Thu May 16 08:34:39 2024
OS/Arch: linux/amd64
Context: default
Server: Docker Engine - Community
Engine:
Version: 26.1.3
API version: 1.45 (minimum version 1.24)
Go version: go1.21.10
Git commit: 8e96db1
Built: Thu May 16 08:33:34 2024
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.6.32
GitCommit: 8b3b7ca2e5ce38e8f31a34f35b2b68ceb8470d89
runc:
Version: 1.1.12
GitCommit: v1.1.12-0-g51d5e94
docker-init:
Version: 0.19.0
GitCommit: de40ad0
docker info
Client: Docker Engine - Community
Version: 26.1.3
Context: default
Debug Mode: false
Plugins:
buildx: Docker Buildx (Docker Inc.)
Version: v0.14.0
Path: /usr/libexec/docker/cli-plugins/docker-buildx
compose: Docker Compose (Docker Inc.)
Version: v2.27.0
Path: /usr/libexec/docker/cli-plugins/docker-compose
Server:
Containers: 10
Running: 5
Paused: 0
Stopped: 5
Images: 10
Server Version: 26.1.3
Storage Driver: overlay2
Backing Filesystem: xfs
Supports d_type: true
Using metacopy: false
Native Overlay Diff: true
userxattr: false
Logging Driver: json-file
Cgroup Driver: cgroupfs
Cgroup Version: 1
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
Swarm: active
NodeID: csweqemxq5gzltgd461qn00fp
Is Manager: true
ClusterID: 23uuam1rourlfm8ni4sgk5exi
Managers: 1
Nodes: 5
Default Address Pool: 10.0.0.0/8
SubnetSize: 24
Data Path Port: 9789
Orchestration:
Task History Retention Limit: 5
Raft:
Snapshot Interval: 10000
Number of Old Snapshots to Retain: 0
Heartbeat Tick: 1
Election Tick: 10
Dispatcher:
Heartbeat Period: 5 seconds
CA Configuration:
Expiry Duration: 3 months
Force Rotate: 0
Autolock Managers: false
Root Rotation In Progress: false
Node Address: 172.31.11.94
Manager Addresses:
172.31.11.94:2377
Runtimes: io.containerd.runc.v2 runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 8b3b7ca2e5ce38e8f31a34f35b2b68ceb8470d89
runc version: v1.1.12-0-g51d5e94
init version: de40ad0
Security Options:
seccomp
Profile: builtin
Kernel Version: 4.18.0-553.22.1.el8_10.x86_64
Operating System: AlmaLinux 8.10 (Cerulean Leopard)
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 7.736GiB
Name: swarm01.dev.bankinglab.io
ID: 9e58c355-712a-42ab-a380-9250f83cbd23
Docker Root Dir: /var/lib/docker
Debug Mode: false
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
Default Address Pools:
Base: 10.20.0.0/16, Size: 24
Additional Info
No response