HAMi Scheduler Not Trying to Schedule Previously Pending Workload for 5mins #1368

@CcccYxx

Description

What happened: One workload consumed all of the node's resources, and a second workload was held in kueue's queue due to insufficient resources. When the first workload was terminated, the second workload was admitted by kueue and handed over to the hami-scheduler. However, that workload then stayed pending for around 5 minutes before the hami-scheduler finally scheduled it. It turned out that the second workload failed filtering while the first workload was still terminating, but even after the first workload had fully terminated, the scheduler did not retry filtering the second workload for another 5 minutes, causing an unnecessary wait. The events below show that the filter endpoint was invoked only twice during those ~5 minutes (14m vs. 9m):

Events:
  Type     Reason            Age                    From            Message
  ----     ------            ----                   ----            -------
  Warning  FailedScheduling  14m                    hami-scheduler  0/1 nodes are available: 1 NodeUnfitPod. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod.
  Warning  FailedScheduling  14m                    hami-scheduler  0/1 nodes are available: 1 NodeUnfitPod. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod.
  Normal   Scheduled         9m32s                  hami-scheduler  Successfully assigned test-project/test-ws-1-0 to ip-172-31-35-36
  Warning  FilteringFailed   14m (x2 over 14m)      hami-scheduler  no available node, 1 nodes do not meet
  Normal   FilteringSucceed  9m33s                  hami-scheduler  find fit node(ip-172-31-35-36), 0 nodes not fit, 1 nodes fit(ip-172-31-35-36:0.00)

What you expected to happen: The second workload should be scheduled immediately after the first workload is terminated.

How to reproduce it (as minimally and precisely as possible):

  1. Install both kueue and HAMi in an NVIDIA GPU cluster (a single-node cluster is sufficient)
  2. Create a kueue ClusterQueue and LocalQueue
  3. Submit a GPU workload that requests as much of the node's resources as possible to the kueue queue
  4. Submit another GPU workload to the kueue queue (this workload will be pending in the kueue queue)
  5. Terminate the first GPU workload
  6. The second workload will be admitted by kueue and handed over to HAMi
  7. The second workload will sometimes stay pending for around 5 minutes before it gets scheduled
  8. If the above steps do not reproduce the issue, repeat steps 1-7
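The steps above can be sketched with kubectl as follows. This is only an illustrative sketch: the resource names (`gpu-workload-1`, `gpu-local-queue`), the container image, and the GPU count are hypothetical placeholders, not taken from the original report, and must be adapted to the actual cluster.

```shell
# Sketch of the reproduction steps above. All names here are hypothetical.
# Assumes kueue and HAMi are already installed (step 1) and a ClusterQueue
# plus a LocalQueue named gpu-local-queue exist (step 2).

# Step 3: submit a suspended Job that requests the node's full GPU capacity,
# labeled so that kueue admits it through the local queue.
cat <<'EOF' | kubectl apply -f -
apiVersion: batch/v1
kind: Job
metadata:
  name: gpu-workload-1
  labels:
    kueue.x-k8s.io/queue-name: gpu-local-queue
spec:
  suspend: true
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: main
        image: nvidia/cuda:12.4.1-base-ubuntu22.04
        command: ["sleep", "infinity"]
        resources:
          limits:
            nvidia.com/gpu: 1   # adjust to the node's full GPU count
EOF

# Step 4: submit a second, identical Job (e.g. gpu-workload-2); it remains
# queued in kueue because the node's GPUs are exhausted.

# Step 5: terminate the first workload, then watch how long the second one
# stays Pending before hami-scheduler retries filtering.
kubectl delete job gpu-workload-1
kubectl get events --field-selector reason=FilteringFailed --watch
```

The `kueue.x-k8s.io/queue-name` label and `suspend: true` are the standard way to hand a batch Job to kueue for admission; the events watch at the end makes the gap between the last `FilteringFailed` event and the eventual `FilteringSucceed` visible.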

Environment:

  • HAMi version: v2.6.1
  • nvidia driver version: 550.144.03
  • Kernel version from uname -a: 6.8.0-1021-aws
  • k8s version:
Client Version: v1.30.9
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.31.8+k3s1
  • Server: AWS g4dn.2xlarge
  • OS: Ubuntu 22.04.5 LTS

Labels

    kind/bug (Something isn't working)
