Stars
ROCm / flash-attention
Forked from Dao-AILab/flash-attention
Fast and memory-efficient exact attention
Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators
Additional utils and helpers to extend TensorFlow when building recommendation systems, contributed and maintained by SIG Recommenders.
DeepRec is a high-performance recommendation deep learning framework based on TensorFlow. It is hosted as an incubation project in the LF AI & Data Foundation.
CUDA Templates and Python DSLs for High-Performance Linear Algebra
BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads.
wyzero / tensorflow
Forked from tensorflow/tensorflow
An Open Source Machine Learning Framework for Everyone
An industrial deep learning framework for high-dimensional sparse data
NVTabular is a feature engineering and preprocessing library for tabular data, designed to quickly and easily manipulate terabyte-scale datasets used to train deep learning based recommender systems.
TNN: developed by Tencent Youtu Lab and Guangying Lab, a uniform deep learning inference framework for mobile, desktop, and server. TNN is distinguished by several outstanding features, including its…
HugeCTR is a high-efficiency GPU framework designed for Click-Through Rate (CTR) estimation training
Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.
A list of ICs and IPs for AI, Machine Learning and Deep Learning.
A tool that profiles OpenCL devices to find their peak capacities
ncnn is a high-performance neural network inference framework optimized for the mobile platform
Tensors and Dynamic neural networks in Python with strong GPU acceleration
NVIDIA / caffe
Forked from BVLC/caffe
Caffe: a fast open framework for deep learning.
Convolutional neural networks C++ framework with CPU and GPU (CUDA) backends