Stars
Optimizing inference proxy for LLMs
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
[ICLR 2024] Lemur: Open Foundation Models for Language Agents
A list of startups that have employee-friendly terms for exercising your options past 90 days.
AITemplate is a Python framework that renders neural networks into high-performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
Development repository for the Triton language and compiler
A CPU+GPU Profiling library that provides access to timeline traces and hardware performance counters.
miniz: Single C source file zlib-replacement library, originally from code.google.com/p/miniz
Tensors and Dynamic neural networks in Python with strong GPU acceleration
Write PyTorch code at the level of individual examples, then run it efficiently on minibatches.
Convert TensorFlow, Keras, TensorFlow.js and TFLite models to ONNX
Demo of running NNs across different frameworks
Converts deep learning models between different deep learning frameworks.
Original Python version of Intel® Nervana™ Graph