archlitchi (Member) commented Jul 24, 2025

Fix

  1. device memory not counted correctly when allocating with 'cuMallocAsync'
  2. device memory not counted correctly when running gpu_burn
  3. segmentation fault in some scenarios
  4. utilization metrics not counted correctly when using multiple devices
  5. initialization error when using vLLM with tp>2

Related issues: Project-HAMi/HAMi#1181
#96
Project-HAMi/HAMi#1055
Project-HAMi/HAMi#1219
Project-HAMi/HAMi#1230
Project-HAMi/HAMi#1191

Signed-off-by: limengxuan <[email protected]>
Signed-off-by: limengxuan <[email protected]>
Signed-off-by: limengxuan <[email protected]>

hami-robot bot (Contributor) commented Jul 24, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: archlitchi

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

hami-robot bot added the size/M label Jul 24, 2025
Signed-off-by: limengxuan <[email protected]>
Signed-off-by: limengxuan <[email protected]>
hami-robot bot added the size/L label and removed the size/M label Jul 28, 2025
Signed-off-by: limengxuan <[email protected]>
Signed-off-by: limengxuan <[email protected]>