Description
What happened: One workload used all of the node's resources, and a second workload sat in Kueue's queue due to insufficient resources. When the first workload was terminated, the second workload was admitted by Kueue and handed over to the hami-scheduler for scheduling. However, the pod then stayed pending for around 5 minutes before it was actually scheduled. It turned out that while the first workload was still terminating, the second workload failed filtering, but after the first workload had fully terminated, filtering was not retried for about 5 minutes, leading to an unnecessary wait. The events below show that the filter endpoint was only invoked twice during those ~5 minutes (the failures are 14m old, the successful scheduling 9m old):
Events:
Type     Reason            Age                From            Message
----     ------            ----               ----            -------
Warning  FailedScheduling  14m                hami-scheduler  0/1 nodes are available: 1 NodeUnfitPod. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod.
Warning  FailedScheduling  14m                hami-scheduler  0/1 nodes are available: 1 NodeUnfitPod. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod.
Normal   Scheduled         9m32s              hami-scheduler  Successfully assigned test-project/test-ws-1-0 to ip-172-31-35-36
Warning  FilteringFailed   14m (x2 over 14m)  hami-scheduler  no available node, 1 nodes do not meet
Normal   FilteringSucceed  9m33s              hami-scheduler  find fit node(ip-172-31-35-36), 0 nodes not fit, 1 nodes fit(ip-172-31-35-36:0.00)
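For anyone checking the same symptom, the scheduling events above can be pulled with commands along these lines (pod name and namespace are the ones from this report; adjust as needed):

```sh
# Describe the pending pod to see its scheduling/filtering events
kubectl -n test-project describe pod test-ws-1-0

# Or list only the events attached to that pod, sorted by time
kubectl -n test-project get events \
  --field-selector involvedObject.name=test-ws-1-0 \
  --sort-by=.lastTimestamp
```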
What you expected to happen: The second workload should be scheduled immediately after the first workload is terminated.
How to reproduce it (as minimally and precisely as possible):
- Install both Kueue and HAMi in an NVIDIA cluster (a single-node server is sufficient).
- Create a Kueue ClusterQueue and LocalQueue (example manifests are sketched after this list).
- Submit a GPU workload that uses as much of the node's resources as possible to the Kueue queue.
- Submit another GPU workload to the Kueue queue (this workload will stay pending in the Kueue queue).
- Terminate the first GPU workload.
- The second workload will be admitted by Kueue and handed over to HAMi.
- The second workload will sometimes stay pending for around 5 minutes before it gets scheduled.
- If the above steps do not reproduce the issue, repeat steps 1-7.
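A minimal sketch of the queue setup and GPU workload used in the steps above follows. All names, quotas, the container image, and the GPU resource key are illustrative assumptions for a single-GPU g4dn.2xlarge node with HAMi's scheduler exposed as `hami-scheduler`; adjust them to your environment.

```sh
# Sketch only: illustrative queue setup and GPU workload for reproducing the issue.
# Names, quotas, and the container image are placeholders, not the exact manifests used.

kubectl create namespace test-project

# Kueue ResourceFlavor, ClusterQueue, and LocalQueue covering the node's single GPU.
kubectl apply -f - <<'EOF'
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: default-flavor
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: cluster-queue
spec:
  namespaceSelector: {}          # admit workloads from any namespace
  resourceGroups:
  - coveredResources: ["cpu", "memory", "nvidia.com/gpu"]
    flavors:
    - name: default-flavor
      resources:
      - name: "cpu"
        nominalQuota: 8
      - name: "memory"
        nominalQuota: 32Gi
      - name: "nvidia.com/gpu"
        nominalQuota: 1          # single-GPU node
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  name: local-queue
  namespace: test-project
spec:
  clusterQueue: cluster-queue
EOF

# Submit the same GPU Job twice: the first consumes the whole GPU quota,
# the second stays queued in Kueue until the first is deleted.
kubectl create -f - <<'EOF'
apiVersion: batch/v1
kind: Job
metadata:
  generateName: gpu-job-
  namespace: test-project
  labels:
    kueue.x-k8s.io/queue-name: local-queue
spec:
  suspend: true                  # Kueue unsuspends the Job once it is admitted
  template:
    spec:
      schedulerName: hami-scheduler
      restartPolicy: Never
      containers:
      - name: cuda
        image: nvidia/cuda:12.2.0-base-ubuntu22.04
        command: ["sleep", "infinity"]
        resources:
          limits:
            nvidia.com/gpu: 1    # requests the node's only GPU
EOF
```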
Environment:
- HAMi version: v2.6.1
- NVIDIA driver version: 550.144.03
- Kernel version (from `uname -a`): 6.8.0-1021-aws
- Kubernetes version:
  - Client Version: v1.30.9
  - Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
  - Server Version: v1.31.8+k3s1
- Server: AWS g4dn.2xlarge
- OS: Ubuntu 22.04.5 LTS