This is to match what vLLM is using after vllm-project/vllm#25782. We should also explore using the new `tools/flashinfer-build.sh` script from that PR to simplify the build process. FlashInfer is an important dependency, and we need to update it to match the behavior of vLLM CI.
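A minimal sketch of what the CI pin could look like, assuming we install the prebuilt wheel from PyPI (the version below is a placeholder, not the one pinned by that PR):

```bash
# Hypothetical sketch: keep PyTorch CI on the same FlashInfer build that
# vLLM CI uses. The version pin is a placeholder; the real value should be
# taken from vLLM's CI config (or produced by tools/flashinfer-build.sh).
FLASHINFER_VERSION="0.3.0"  # placeholder, not the actual pinned version
pip install "flashinfer-python==${FLASHINFER_VERSION}"
```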
#164361 attempted to do this while trying to fix trunk, but we encountered several issues along the way that prompted us to abandon the approach:
- Missing dependencies like `cuda-python` and `pynvml` (see the install sketch after this list)
- Compiling the newer version of FlashInfer now requires linking against the CUDA driver (`-lcuda`), which makes it impossible to build on a CPU-only runner; we need to understand why (see the linkage check after this list). On the vLLM side, they avoid this problem by installing a precompiled FlashInfer wheel. That might work for us too, but I know for sure that the wheel needs to be rebuilt when upgrading PyTorch to 2.9: https://github.com/vllm-project/vllm/pull/24994/files#diff-f34da55ca08f1a30591d8b0b3e885bcc678537b2a9a4aadea4f190806b374ddcR415
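For the first point, a minimal sketch of the fix, assuming both packages come straight from PyPI (where `cuda-python` and `pynvml` are published):

```bash
# Sketch: preinstall the dependencies that the newer FlashInfer build
# expects but that our CI images were missing.
pip install cuda-python pynvml
```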
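For the second point, one way to investigate is to check which of FlashInfer's compiled extensions actually pull in the driver library, and a possible (untested) workaround is to link against the `libcuda.so` stub shipped with the CUDA toolkit. The sketch below assumes a standard `/usr/local/cuda` toolkit layout:

```bash
# libcuda.so is installed by the NVIDIA *driver*, not the CUDA toolkit,
# which is why linking -lcuda fails on CPU-only runners. List which of
# FlashInfer's shared objects depend on it:
FLASHINFER_DIR=$(python -c "import flashinfer, os; print(os.path.dirname(flashinfer.__file__))")
find "${FLASHINFER_DIR}" -name '*.so' | while read -r so; do
    echo "== ${so}"
    ldd "${so}" | grep libcuda || echo "   (no libcuda dependency)"
done

# Possible workaround (untested here): link against the driver stub that the
# CUDA toolkit ships, which exists even on machines with no GPU or driver.
export LIBRARY_PATH="/usr/local/cuda/lib64/stubs:${LIBRARY_PATH}"
```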
cc @ezyang @gchanan @zou3519 @kadeng @msaroufim @seemethere @malfet @pytorch/pytorch-dev-infra @yangw-dev