San Jose, CA (UTC -08:00) - www.tylerosterberg.com
Stars
Architectural Metapatterns book and wiki
Flexible and powerful framework for managing multiple AI agents and handling complex conversations
Literature references for “Designing Data-Intensive Applications”
A curated list of software and architecture related design patterns.
Learn Low Level Design (LLD) and prepare for interviews using free resources.
Curated Data Science resources (Free & Paid) to help aspiring and experienced data scientists learn, grow, and advance their careers.
A complete end-to-end pipeline for LLM interpretability with sparse autoencoders (SAEs) using Llama 3.2, written in pure PyTorch and fully reproducible.
AWS CDK Builder is a browser-based tool designed to streamline bootstrapping of Infrastructure as Code (IaC) projects using the AWS Cloud Development Kit (CDK).
SGLang is a fast serving framework for large language models and vision language models.
tosterberg / djl
Forked from deepjavalibrary/djl: An Engine-Agnostic Deep Learning Framework in Java
tosterberg / djl-demo
Forked from deepjavalibrary/djl-demo: Demo applications showcasing DJL
tosterberg / djl-serving
Forked from deepjavalibrary/djl-serving: A universal scalable machine learning model deployment solution
Multiplayer top-down shooter made from scratch in C++. Play in your Browser! https://hypersomnia.io Made in 🇵🇱
Procedural tree generator written with JavaScript and Three.js
A library for training and deploying machine learning models on Amazon SageMaker
Tools to design or visualize the architecture of neural networks.
aws-neuron / upstreaming-to-vllm
Forked from vllm-project/vllm: A high-throughput and memory-efficient inference and serving engine for LLMs
Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3 (NeurIPS'25).
Efficient platform for inference and serving of local LLMs, including an OpenAI-compatible API server.
An online course where you can learn and master the skill of low-level performance analysis and tuning.
Easy, fast and very cheap training and inference on AWS Trainium and Inferentia chips.
[EMNLP'23, ACL'24] To speed up LLM inference and enhance LLMs' perception of key information, compress the prompt and KV-Cache, achieving up to 20x compression with minimal performance loss.
[ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding