This is to match what vLLM is using after vllm-project/vllm#25782. We should also explore using the new `tools/flashinfer-build.sh` script from that PR to simplify the build process. FlashInfer is an important dependency, and we need to update it to match the behavior of vLLM CI.
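A minimal sketch of what the CI pin could look like, assuming we install the prebuilt wheel from PyPI (the version below is a placeholder, not the one pinned by that PR):

```bash
# Hypothetical sketch: keep PyTorch CI on the same FlashInfer build that
# vLLM CI uses. The version pin is a placeholder; the real value should be
# taken from vLLM's CI config (or produced by tools/flashinfer-build.sh).
FLASHINFER_VERSION="0.3.0"  # placeholder, not the actual pinned version
pip install "flashinfer-python==${FLASHINFER_VERSION}"
```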
#164361 attempted to do this while trying to fix trunk, but we encountered several issues along the way that prompted us to abandon the approach:
- Missing dependencies like `cuda-python` and `pynvml` (see the install sketch after this list)
- Compiling the newer version of FlashInfer now requires linking against the CUDA driver (`-lcuda`), which makes it impossible to build on a CPU-only runner; we need to understand why (see the linkage check after this list). On the vLLM side, they avoid this problem by installing a precompiled FlashInfer wheel. That might work for us too, but I know for sure that the wheel needs to be rebuilt when upgrading PyTorch to 2.9: https://github.com/vllm-project/vllm/pull/24994/files#diff-f34da55ca08f1a30591d8b0b3e885bcc678537b2a9a4aadea4f190806b374ddcR415
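For the first point, a minimal sketch of the fix, assuming both packages come straight from PyPI (where `cuda-python` and `pynvml` are published):

```bash
# Sketch: preinstall the dependencies that the newer FlashInfer build
# expects but that our CI images were missing.
pip install cuda-python pynvml
```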
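For the second point, one way to investigate is to check which of FlashInfer's compiled extensions actually pull in the driver library, and a possible (untested) workaround is to link against the `libcuda.so` stub shipped with the CUDA toolkit. The sketch below assumes a standard `/usr/local/cuda` toolkit layout:

```bash
# libcuda.so is installed by the NVIDIA *driver*, not the CUDA toolkit,
# which is why linking -lcuda fails on CPU-only runners. List which of
# FlashInfer's shared objects depend on it:
FLASHINFER_DIR=$(python -c "import flashinfer, os; print(os.path.dirname(flashinfer.__file__))")
find "${FLASHINFER_DIR}" -name '*.so' | while read -r so; do
    echo "== ${so}"
    ldd "${so}" | grep libcuda || echo "   (no libcuda dependency)"
done

# Possible workaround (untested here): link against the driver stub that the
# CUDA toolkit ships, which exists even on machines with no GPU or driver.
export LIBRARY_PATH="/usr/local/cuda/lib64/stubs:${LIBRARY_PATH}"
```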
cc @ezyang @gchanan @zou3519 @kadeng @msaroufim @seemethere @malfet @pytorch/pytorch-dev-infra @yangw-dev