Fastest kernels written from scratch

Cuda 382 53 Updated Sep 18, 2025

Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels

C++ 3,733 277 Updated Oct 25, 2025

SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse–Linear Attention

Python 117 6 Updated Oct 22, 2025

Triton adapter for Ascend. Mirror of https://gitee.com/ascend/triton-ascend

Python 79 7 Updated Sep 27, 2025

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 5,829 720 Updated Oct 15, 2025
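The "fine-grained scaling" in a kernel like this means each small block of the FP8 operands carries its own scale factor, which is folded back in during accumulation. A minimal numpy sketch of the idea (an illustration only, not DeepGEMM's API; the block size, integer rounding as a stand-in for FP8 E4M3 rounding, and shapes are all simplifying assumptions):

```python
import numpy as np

FP8_MAX = 448.0  # max magnitude representable in FP8 E4M3
BLK = 128        # assumed scaling-block size along K

def quantize_blockwise(x, blk):
    """Quantize x along the last axis in blocks of `blk`, returning values and scales."""
    m, k = x.shape
    xb = x.reshape(m, k // blk, blk)
    scales = np.abs(xb).max(axis=-1, keepdims=True) / FP8_MAX
    scales = np.maximum(scales, 1e-12)  # avoid divide-by-zero on all-zero blocks
    q = np.round(xb / scales)           # crude stand-in for FP8 rounding
    return q, scales

def scaled_gemm(a, b):
    """C = A @ B, with per-block dequantization folded into the accumulation."""
    qa, sa = quantize_blockwise(a, BLK)
    qb, sb = quantize_blockwise(b.T, BLK)  # quantize B along K as well
    m, nblk, _ = qa.shape
    c = np.zeros((m, qb.shape[0]))
    for i in range(nblk):  # accumulate one K-block at a time, rescaling each block
        c += (qa[:, i] * sa[:, i]) @ (qb[:, i] * sb[:, i]).T
    return c

np.random.seed(0)
a = np.random.randn(4, 256)
b = np.random.randn(256, 8)
err = np.abs(scaled_gemm(a, b) - a @ b).max()
```

Per-block scales keep quantization error proportional to each block's local magnitude rather than the tensor-wide maximum, which is why fine-grained scaling matters for FP8.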

DeepEP: an efficient expert-parallel communication library

Cuda 8,656 966 Updated Oct 23, 2025

A bidirectional pipeline parallelism algorithm for computation-communication overlap in DeepSeek V3/R1 training.

Python 2,873 305 Updated Mar 10, 2025

FlashMLA: Efficient Multi-head Latent Attention Kernels

C++ 11,830 895 Updated Sep 30, 2025

Repository hosting code for "Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations" (https://arxiv.org/abs/2402.17152).

Python 1,491 296 Updated Oct 16, 2025

Triton kernels for Flux

Python 22 Updated Jul 7, 2025

[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized attention achieving a 2-5x speedup over FlashAttention without losing end-to-end accuracy across language, image, and video models.

Cuda 2,581 247 Updated Oct 25, 2025

A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton.

Python 580 30 Updated Aug 12, 2025

Fast and memory-efficient exact attention

Python 20,171 2,086 Updated Oct 26, 2025
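For reference, the "exact attention" these kernels accelerate is just softmax(QK^T / sqrt(d)) V; flash-style kernels compute the same result while tiling the softmax so the full N x N score matrix is never materialized. A plain numpy version of the math (a sketch of the computation, not the library's API):

```python
import numpy as np

def attention(q, k, v):
    """Exact (non-approximate) scaled dot-product attention."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                 # (N, N) similarity scores
    scores -= scores.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)            # rows are softmax weights summing to 1
    return w @ v                                  # convex combination of value rows

np.random.seed(0)
q = np.random.randn(6, 16)
k = np.random.randn(6, 16)
v = np.random.randn(6, 16)
out = attention(q, k, v)
```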

Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.

Python 266 47 Updated Oct 27, 2025

Development repository for the Triton language and compiler

MLIR 17,363 2,341 Updated Oct 27, 2025

PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT

Python 2,874 368 Updated Oct 26, 2025

Free programming e-books and materials for developers; follow the personal WeChat official account: 编程与实战 (Programming and Practice).

4,712 1,186 Updated Apr 4, 2024

tensorrt-onnx build for Windows

C++ 5 1 Updated Aug 4, 2021

micronet, a model compression and deployment library. Compression: 1. quantization: quantization-aware training (QAT), high-bit (>2b) (DoReFa / Quantization and Training of Neural Networks for Efficient Integer-…

Python 2,262 478 Updated May 6, 2025

Detectron2 Faster R-CNN in C++ with Libtorch

C++ 13 2 Updated Aug 6, 2021

Two-stage CenterNet

Python 1,222 188 Updated Nov 20, 2022

RLE (run-length encoding) vs. Halcon vs. OpenCV

C++ 39 16 Updated Jun 25, 2021
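Run-length encoding, the technique being benchmarked above, compresses a sequence into (value, run-length) pairs; encoding binary-image rows this way is the typical vision use case. A minimal self-contained Python version for reference (the repo's own implementations are in C++):

```python
def rle_encode(seq):
    """Encode a sequence as (value, run_length) pairs."""
    runs = []
    for x in seq:
        if runs and runs[-1][0] == x:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([x, 1])       # start a new run
    return [(v, n) for v, n in runs]

def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original sequence."""
    return [v for v, n in runs for _ in range(n)]

data = [0, 0, 0, 1, 1, 0, 1, 1, 1, 1]
codes = rle_encode(data)   # [(0, 3), (1, 2), (0, 1), (1, 4)]
assert rle_decode(codes) == data
```

RLE only wins when runs are long, which is why it suits mostly-uniform binary masks rather than natural images.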

An easy to use PyTorch to TensorRT converter

Python 4,825 694 Updated Aug 17, 2024

Supports Yolov5 (4.0/5.0), YoloR, YoloX, Yolov4, Yolov3, CenterNet, CenterFace, RetinaFace, classification, and Unet; converts darknet/libtorch/pytorch/mxnet models to ONNX, then to TensorRT.

C++ 210 42 Updated Aug 2, 2021

Libtorch Examples

C++ 42 16 Updated Jul 16, 2021

🔥 (yolov3, yolov4, yolov5, unet, ...) A mini PyTorch inference framework inspired by darknet.

C++ 746 148 Updated Apr 23, 2023

Lightweight, portable, flexible distributed/mobile deep learning with a dynamic, mutation-aware dataflow dependency scheduler; for Python, R, Julia, Scala, Go, JavaScript, and more.

C++ 20,825 6,752 Updated Oct 25, 2023