HAMI 2.7.0 Scheduler RBAC Permission Issues After Installation #1378

@xinyang09

Description

What happened:

After installing HAMi 2.7.0 with Helm, the hami-scheduler pod hits RBAC permission errors that prevent normal leader election and scheduling. The specific errors are:

  1. User "system:serviceaccount:kube-system:hami-scheduler" cannot get resource "endpoints" in API group "" in the namespace "kube-system"
  2. User "system:serviceaccount:kube-system:hami-scheduler" cannot list resource "configmaps" in API group "" in the namespace "kube-system"
  3. Post "191": dial tcp 127.0.0.1:443: connect: connection refused
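One way to confirm the first two errors independently of the scheduler logs is to ask the API server directly what the service account is allowed to do. The sketch below is only an illustration: the helper name `print_rbac_checks` is hypothetical, and it assumes the service account and namespace are exactly those named in the error messages. It prints `kubectl auth can-i` commands rather than running them, so they can be reviewed first.

```shell
#!/bin/sh
# Print the permission checks that correspond to the two RBAC errors above.
# Assumption: the service account and namespace are the ones shown in the logs.
print_rbac_checks() {
  sa="system:serviceaccount:kube-system:hami-scheduler"
  ns="kube-system"
  # Each entry is "<verb> <resource>", taken from the error messages.
  for check in "get endpoints" "list configmaps"; do
    echo "kubectl auth can-i $check --as=$sa -n $ns"
  done
}

print_rbac_checks
```

Piping the output to `sh` runs the checks; each answers `yes` or `no`. The caller needs permission to impersonate service accounts for `--as` to work. The third error (connection refused on 127.0.0.1:443) is not covered by these checks.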

What you expected to happen:

The HAMi scheduler should start normally, complete leader election, and schedule GPU workloads onto cluster nodes without permission errors.

How to reproduce it:

  1. Install HAMI 2.7.0 using Helm:

    helm install hami . --set resourceName=nvidia.com/vgpu --set scheduler.kubeScheduler.imageTag=v1.19.6 -n kube-system
  2. Check hami-scheduler pod logs:

    kubectl logs -f -l app=hami-scheduler -n kube-system
  3. Observe the RBAC permission errors listed above: the hami-scheduler service account lacks the permissions needed to access the required Kubernetes resources, so the scheduler cannot operate normally.

Environment:

  • HAMi version: 2.7.0
  • nvidia driver or other AI device driver version: [Please provide nvidia-smi -a output]
  • Docker version from docker version: [Please provide]
  • Docker command, image and tag used: Installed via Helm chart with default images
  • Kernel version from uname -a: [Please provide]
  • Kubernetes version: v1.19.6

Metadata

Labels: kind/bug (Something isn't working)
