Closed
Labels: kind/bug (Something isn't working)
Description
What happened:
After installing HAMi 2.7.0 using Helm, the hami-scheduler pod hits RBAC permission errors that prevent normal leader election and scheduling. Specific errors include:
User "system:serviceaccount:kube-system:hami-scheduler" cannot get resource "endpoints" in API group "" in the namespace "kube-system"
User "system:serviceaccount:kube-system:hami-scheduler" cannot list resource "configmaps" in API group "" in the namespace "kube-system"
Post "191": dial tcp 127.0.0.1:443: connect: connection refused
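The two RBAC denials above can be checked without reading scheduler logs by impersonating the service account with `kubectl auth can-i`. The following is a minimal sketch, not part of the original report: the service account and namespace are copied from the error messages, the helper name `can_i_cmd` is hypothetical, and running the checks assumes the caller is allowed to impersonate service accounts.

```shell
#!/bin/sh
# Service account and namespace as they appear in the error messages.
SA="system:serviceaccount:kube-system:hami-scheduler"
NS="kube-system"

# Hypothetical helper: build the impersonated permission check
# for one verb/resource pair. $1 = verb, $2 = resource.
can_i_cmd() {
  printf 'kubectl auth can-i %s %s --as=%s -n %s\n' "$1" "$2" "$SA" "$NS"
}

# Print each check, and run it only when kubectl is on PATH.
for pair in "get endpoints" "list configmaps"; do
  set -- $pair                 # split "verb resource" into $1 and $2
  cmd=$(can_i_cmd "$1" "$2")
  echo "$cmd"
  if command -v kubectl >/dev/null 2>&1; then
    $cmd || true               # can-i exits non-zero when the answer is "no"
  fi
done
```

A `no` answer for either check would match the denials in the log. This only covers the two RBAC messages; the `connection refused` error is a connectivity failure and is not explained by it.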
What you expected to happen:
The HAMi scheduler should start normally, complete leader election, and schedule GPU workloads to cluster nodes without permission errors.
How to reproduce it:
- Install HAMi 2.7.0 using Helm:
  helm install hami . --set resourceName=nvidia.com/vgpu --set scheduler.kubeScheduler.imageTag=v1.19.6 -n kube-system
- Check hami-scheduler pod logs:
  kubectl logs -f -l app=hami-scheduler -n kube-system
- Observe the RBAC permission errors preventing normal operation
- Insufficient RBAC permissions prevent access to required Kubernetes resources
Environment:
- HAMi version: 2.7.0
- nvidia driver or other AI device driver version: [Please provide `nvidia-smi -a` output]
- Docker version from `docker version`: [Please provide]
- Docker command, image and tag used: Installed via Helm chart with default images
- Kernel version from `uname -a`: [Please provide]
- Kubernetes version: v1.19.6