- Seattle, WA, USA
- http://soldaini.net/
- https://orcid.org/0000-0001-6998-9863
- @soldni
- @soldaini.net
Stars
Our library for RL environments + evals
📚 Freely available programming books
[ICML 2025] Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale
FULL Augment Code, Claude Code, Cluely, CodeBuddy, Comet, Cursor, Devin AI, Junie, Kiro, Leap.new, Lovable, Manus, NotionAI, Orchids.app, Perplexity, Poke, Qoder, Replit, Same.dev, Trae, Traycer AI…
Bicleaner is a parallel corpus classifier/cleaner that aims at detecting noisy sentence pairs in a parallel corpus.
Open-source infrastructure for Computer-Use Agents. Sandboxes, SDKs, and benchmarks to train and evaluate AI agents that can control full desktops (macOS, Linux, Windows).
PyTorch building blocks for the OLMo ecosystem
OLMost every training recipe you need to perform data interventions with the OLMo family of models.
Curated list of datasets and tools for post-training.
Versatile typeface for code, from code.
👻 Ghostty is a fast, feature-rich, and cross-platform terminal emulator that uses platform-native UI and GPU acceleration.
😸 Soothing pastel theme for the high-spirited!
A curated list of resources and examples of ASCII Art
A high-performance Python-based I/O system for large (and small) deep learning problems, with strong support for PyTorch.
Toolkit for linearizing PDFs for LLM datasets/training
LLM.swift is a simple and readable library that allows you to interact with large language models locally with ease for macOS, iOS, watchOS, tvOS, and visionOS.
Large Language Model (LLM) module for the Spezi Ecosystem
BPE modification that removes intermediate tokens during tokenizer training.
A curated list of awesome model-based RL resources (continually updated)
Dockerized iCloud Client - make a local copy of your iCloud documents and photos, and keep it automatically up-to-date.
Tools for shrinking fastText models (in gensim format)