Nanjing University
Stars
A domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels.
PROPELLER: Profile Guided Optimizing Large Scale LLVM-based Relinker
Training and serving large-scale neural networks with auto parallelization.
PyTorch extensions for high performance and large scale training.
Making large AI models cheaper, faster and more accessible
deepspeedai / Megatron-DeepSpeed
Forked from NVIDIA/Megatron-LM. Ongoing research training transformer language models at scale, including: BERT & GPT-2
LingoDB: A new analytical database system that blurs the lines between databases and compilers.
A cross-platform way to express data transformation, relational algebra, standardized record expression and plans.
Optimizing SGEMM kernel functions on NVIDIA GPUs to close-to-cuBLAS performance.
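The starting point of such SGEMM work is worth a sketch: a minimal shared-memory-tiled kernel, assuming row-major matrices whose dimensions are multiples of the tile size. The kernel name and tile size below are illustrative, not taken from the repository, which layers register blocking, vectorized loads, and double buffering on top of this baseline to approach cuBLAS.

```cuda
// Minimal tiled SGEMM sketch: C = A * B, row-major.
// Assumes M, N, K are multiples of TILE (hypothetical simplification).
#include <cuda_runtime.h>

#define TILE 32

__global__ void sgemm_tiled(const float *A, const float *B, float *C,
                            int M, int N, int K) {
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;  // row of C this thread computes
    int col = blockIdx.x * TILE + threadIdx.x;  // column of C this thread computes
    float acc = 0.0f;

    for (int t = 0; t < K; t += TILE) {
        // Stage one TILE x TILE block of A and B into shared memory.
        As[threadIdx.y][threadIdx.x] = A[row * K + t + threadIdx.x];
        Bs[threadIdx.y][threadIdx.x] = B[(t + threadIdx.y) * N + col];
        __syncthreads();

        // Accumulate the partial dot product for this tile.
        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();
    }
    C[row * N + col] = acc;

    // Launch example (host side):
    //   dim3 block(TILE, TILE);
    //   dim3 grid(N / TILE, M / TILE);
    //   sgemm_tiled<<<grid, block>>>(dA, dB, dC, M, N, K);
}
```

Staging tiles in shared memory cuts the global-memory reads per output element from K to K/TILE, which is why every close-to-cuBLAS SGEMM begins with some variant of this blocking.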
A retargetable MLIR-based machine learning compiler and runtime toolkit.
A lightweight memory allocator for hardware-accelerated machine learning
Keep your bugs contained. A platform for studying historical software bugs.
“Debian 小药盒” (Debian pill box): a box design for packaging Debian installation media, together with an explanatory leaflet.
A massively parallel, optimal functional runtime in Rust
💥💻💥 A data-parallel functional programming language
A massively parallel, high-level programming language
Training materials provided by OpenACC.org.
C/C++ frontend for MLIR. Also features polyhedral optimizations, parallel optimizations, and more!
Mirage Persistent Kernel: Compiling LLMs into a MegaKernel
Tile primitives for speedy kernels
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
A list of awesome compiler projects and papers for tensor computation and deep learning.