pytorch / FBGEMM Public

Pull requests: pytorch/FBGEMM

603 Open 4,448 Closed

Pull requests list

Adding support for bias addition + rescaling with token weights to grouped_gemm cla signed fb-exported meta-exported

#5280 opened Dec 29, 2025 by metastableB

Refactor cumem_utils CPU library to cut GPU dependencies cla signed fb-exported meta-exported

#5279 opened Dec 29, 2025 by crypt3lx2k

Add repeat_arange cuda kernel cla signed fb-exported meta-exported

#5278 opened Dec 26, 2025 by yunjiangster

Add tidy fixes (#5268) cla signed fb-exported meta-exported

#5277 opened Dec 24, 2025 by q10

Replace .data_ptr with .mutable_data_ptr or .const_data_ptr (#5267) cla signed fb-exported meta-exported

#5276 opened Dec 24, 2025 by q10

Tune max segment length per cta in triton table batched embeddings, and expose the param via cli cla signed fb-exported meta-exported

#5270 opened Dec 22, 2025 by OmarPavel

Add tidy fixes cla signed

#5268 opened Dec 21, 2025 by cyyever

Replace .data_ptr with .mutable_data_ptr or .const_data_ptr cla signed

#5267 opened Dec 20, 2025 by cyyever

Optimizations for index_select_scalar_cumsum_kernel on ROCm cla signed module: rocm

#5263 opened Dec 18, 2025 by amd-wsung102

Refactor TBE benchmark reporter to use structured data config cla signed fb-exported meta-exported

#5260 opened Dec 18, 2025 by gchalump

Fix blackwell CUTLASS attention meta registration + actually test compile cla signed fb-exported meta-exported

#5259 opened Dec 18, 2025 by jbschlosser

Optimize benchmark index generation with std::sample() cla signed fb-exported meta-exported

#5254 opened Dec 17, 2025 by terdogan

Remove unused dedup_map and associated includes from benchmarks cla signed fb-exported meta-exported

#5253 opened Dec 17, 2025 by terdogan

Move the prefetched info to preallocated buffers cla signed fb-exported meta-exported

#5251 opened Dec 17, 2025 by chouxi

Enable direct MX4→BF16 dequantization to reduce memory (python side) (2/2) cla signed fb-exported meta-exported

#5250 opened Dec 17, 2025 by armandsauzay

Add aarch64 intrinsic-based dequantization to autovec routine cla signed fb-exported meta-exported

#5249 opened Dec 17, 2025 by Nicoshev

Choose _autovec version of GenerateEmbeddingSpMDMRowWiseSparse on AArch64 cla signed fb-exported meta-exported

#5247 opened Dec 17, 2025 by MatzeB

Specialize more cases to improve EmbeddingSpMDMNBitBenchmark cla signed fb-exported meta-exported

#5245 opened Dec 17, 2025 by MatzeB

Add EmbeddingSpMDMNBitRowWiseSparse autovectorized variant cla signed fb-exported meta-exported

#5244 opened Dec 17, 2025 by MatzeB

Optimize group_index_select_or_add_2d_kernel on ROCm by adding a separate codepath for small embedding dimensions cla signed module: rocm

#5233 opened Dec 16, 2025 by aryaman-gupta

support object cache in ssd l2 cache and add more unit tests cla signed fb-exported meta-exported

#5228 opened Dec 16, 2025 by zhaojuanmao

Optimizing 4-bit dequant to FP32 on AArch64 using vectorized intrinsics in EmbeddingSpMDMAutovec cla signed

#5224 opened Dec 15, 2025 by marma01

Update heuristic to support variant batch sizes cla signed fb-exported meta-exported

#5211 opened Dec 10, 2025 by zjing14

Use H100 runners for OSS CI cla signed fb-exported meta-exported

#5205 opened Dec 9, 2025 by q10

Modifying clear_all_staged_data to accommodate KV Tensor Deletion cla signed fb-exported meta-exported

#5202 opened Dec 9, 2025 by Raahul46
