- LinkedIn
- Bay Area
- https://khosravipasha.github.io
- @pashakho
Stars
High-efficiency floating-point neural network inference operators for mobile, server, and Web
Supporting PyTorch models with the Google AI Edge TFLite runtime.
Run any GGUF SLMs/LLMs locally, on-device on Android
Run SD1.x/2.x/3.x, SDXL, and FLUX.1 on your phone
On-device AI across mobile, embedded, and edge platforms for PyTorch
LiteRT, successor to TensorFlow Lite, is Google's on-device framework for high-performance ML & GenAI deployment on edge platforms, via efficient conversion, runtime, and optimization
Artificial Neural Engine Machine Learning Library
Cross-platform, customizable ML solutions for live and streaming media.
A gallery that showcases on-device ML/GenAI use cases and allows people to try and use models locally.
Universal LLM Deployment Engine with ML Compilation
An MLX port of FLUX and other state-of-the-art diffusion image models, based on the Hugging Face Diffusers implementation.
Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3 (NeurIPS'25).
[ICLR 2024] Efficient Streaming Language Models with Attention Sinks
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal AI, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
🤗 smolagents: a barebones library for agents that think in code.
Fair-code workflow automation platform with native AI capabilities. Combine visual building with custom code, self-host or cloud, 400+ integrations.
A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate.
A debugging and profiling tool that can trace and visualize python code execution
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
SGLang is a fast serving framework for large language models and vision language models.
llama.cpp fork with additional SOTA quants and improved performance
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks