
Calico DNS Resolution Failure - Bond Interface with VLAN and Incorrect Subnet Mask #7664

@maelamrani

Description

Hello,

I'm experiencing DNS resolution failures between pods on one specific Kubernetes node that has a complex network bonding configuration. The other nodes in the cluster work perfectly.
Environment:

  • Kubernetes 1.32.5
  • Calico CNI with VXLAN backend
  • RHEL 9.4 nodes
  • Kubespray deployment

Problem Node Configuration:

  • Bond interface bond0 with two slaves (eno1, eno2) in active-backup mode
  • VLAN interface bond0.114 with IP 10.172.114.1/26
  • VLAN interface bond0.1083 with IP 10.172.108.129/26
  • Kubernetes node internal IP: 10.172.108.129
Symptoms:

  • DNS resolution times out in pods scheduled on this node
  • nslookup kubernetes.default.svc.cluster.local fails with "connection timed out; no servers could be reached"
  • Pods can communicate via IP but not via DNS
  • CoreDNS pods are healthy and working on other nodes
What I've Tried:

  1. Verified CoreDNS pods are running and healthy
  2. Checked the Calico daemonset configuration
  3. Compared with working nodes (which have a simple ens192 interface without bonding)
  4. Tested network connectivity; basic IP communication works (see the differential DNS test sketched below)
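One differential test I still want to run from a pod on the problem node, to separate service-VIP routing from pod-to-pod reachability (the pod name and IPs below are illustrative; nslookup takes the DNS server as its second argument):

# Query a CoreDNS pod IP directly, bypassing the service VIP:
$ kubectl exec -it <test-pod> -- \
    nslookup kubernetes.default.svc.cluster.local 10.233.64.5
# Then query via the cluster DNS ClusterIP:
$ kubectl exec -it <test-pod> -- \
    nslookup kubernetes.default.svc.cluster.local 10.233.0.3
# If the pod IP answers but the VIP times out, the failure is in
# service routing on this node rather than in CoreDNS itself.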
Current Calico Configuration:

env:
- name: IP_AUTODETECTION_METHOD
  value: can-reach=$(NODEIP)
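Based on the Calico docs, I'm considering replacing can-reach with an explicit autodetection method. A sketch of what I'd try (untested; interface= and cidr= are documented Calico autodetection methods, and the daemonset name/namespace are the Kubespray defaults):

# Pin autodetection to the VLAN interface that carries the node IP
# (interface= takes a regex, hence the escaped dot):
$ kubectl set env daemonset/calico-node -n kube-system \
    IP_AUTODETECTION_METHOD='interface=bond0\.1083'

# Or match by subnet instead of by interface name:
$ kubectl set env daemonset/calico-node -n kube-system \
    IP_AUTODETECTION_METHOD='cidr=10.172.108.128/26'

If I remember correctly, Kubespray exposes this as the calico_ip_auto_method inventory variable, which would survive cluster re-runs where a hand edit of the daemonset would not.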

Key Observations:

  • Working nodes have a simple network interface (ens192)
  • Problem node has a bond interface with VLAN tagging
  • The bond0 interface itself has no IP address
  • IP addresses are assigned to VLAN subinterfaces (bond0.114, bond0.1083)
  • One VLAN interface has an incorrect /2 subnet mask

Questions:

  1. How should I configure Calico's IP autodetection for this bond+VLAN setup?
  2. Should the node IP be on the bond interface itself, or can it remain on the VLAN interface?
  3. What's the recommended approach for Calico with bonded interfaces and VLANs?
  4. Could the incorrect /2 subnet mask be causing the DNS issues even if basic IP communication works?
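On question 4, my working theory (only a theory, and the off-subnet address in the sketch is illustrative): a /2 mask leaves just the top two bits as network bits, so 10.172.114.1/2 makes the kernel treat all of 0.0.0.0/2 (0.0.0.0 through 63.255.255.255) as directly on-link via bond0.114, a range that swallows the entire 10.0.0.0/8 space the cluster lives in.

# Any destination in 0.0.0.0/2 without a more specific route, e.g.
# another node's VXLAN tunnel endpoint on a different subnet, gets
# ARPed on the wrong VLAN and black-holed:
$ ip route get 10.172.120.10        # illustrative off-subnet node IP
10.172.120.10 dev bond0.114 src 10.172.114.1
# Destinations that do have more specific routes (the local /26s,
# Calico's per-block pod routes) keep working, which could explain
# why plain IP tests pass while DNS served from other nodes fails.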
Debug Information:

# On the problem node
$ nmcli device status
bond0.1083       vlan      connected
bond0            bond      connected
eno1             ethernet  connected
eno2             ethernet  connected

# IP addresses
$ ip -4 addr show bond0
# (no IP address)
$ ip -4 addr show bond0.114
inet 10.172.114.1/26
$ ip -4 addr show bond0.1083
inet 10.172.108.129/26

# Kubernetes node
$ kubectl get node dc2spk8sprdma001 -o wide
INTERNAL-IP: 10.172.108.129
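To pin down which address Calico actually autodetected on this node, I plan to check the annotation Calico writes onto the Node object and the calico-node startup log (the pod name is a placeholder):

# Calico records the detected address in a node annotation:
$ kubectl get node dc2spk8sprdma001 \
    -o jsonpath='{.metadata.annotations.projectcalico\.org/IPv4Address}'

# Find the calico-node pod running on this node, then grep its
# startup log for the autodetection message:
$ kubectl get pod -n kube-system -l k8s-app=calico-node \
    --field-selector spec.nodeName=dc2spk8sprdma001
$ kubectl logs -n kube-system <calico-node-pod> | grep -i 'autodetected ipv4'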

Any guidance on proper Calico configuration for this bonded network setup would be greatly appreciated!
