Tesla
- Sunnyvale
- UTC -08:00
- https://linkedin.com/zhuangh
- @zhuangh
Stars
Reference implementation of the Transformer architecture optimized for Apple Neural Engine (ANE)
A playbook for systematically maximizing the performance of deep learning models.
Submanifold sparse convolutional networks
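AISystem mainly refers to AI systems, covering full-stack low-level AI technologies such as AI chips, AI compilers, and AI inference and training frameworks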
Fast and memory-efficient exact attention
Roadmap to becoming an Artificial Intelligence Expert in 2022
Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
List of Computer Science courses with video lectures.
The C++ Core Guidelines are a set of tried-and-true guidelines, rules, and best practices about coding in C++
A Python-level JIT compiler designed to make unmodified PyTorch programs faster.
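Master C++ :punch:. A study repository for the Chinese edition of C++ Primer (5th edition), including notes and answers to the end-of-chapter exercises.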
TAPA compiles task-parallel HLS programs into high-performance FPGA accelerators. UCLA-maintained.
by ex-googlers, for ex-googlers - a lookup table of similar tech & services
Development repository for the Triton language and compiler
A programmer's guide to living longer
FlexASR: A Reconfigurable Hardware Accelerator for Attention-based Seq-to-Seq Networks
Reinforcement learning environments for compiler and program optimization tasks
Brevitas: neural network quantization in PyTorch
Stencil with Optimized Dataflow Architecture Compiler
Training neural models with structured signals.