Stars
🔥 The Web Data API for AI - Turn entire websites into LLM-ready markdown or structured data
Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML
A beautiful config generator for Ghostty terminal.
An extremely fast Python linter and code formatter, written in Rust.
👻 Ghostty is a fast, feature-rich, and cross-platform terminal emulator that uses platform-native UI and GPU acceleration.
🐶 Kubernetes CLI To Manage Your Clusters In Style!
A cloud native embedded storage engine built on object storage.
A curated list to learn about distributed systems
Simple SHA-512, SHA-384, SHA-512/224, and SHA-512/256 hash functions for JavaScript, with UTF-8 encoding support.
Mimesis is a robust data generator for Python that can produce a wide range of fake data in multiple languages.
A collection of learning resources for curious software engineers
A jq clone focussed on correctness, speed, and simplicity
JVector: the most advanced embedded vector search engine
Generic command line non-JVM Apache Kafka producer and consumer
A collection of inspiring resources related to engineering management and tech leadership
Open-source training courses about distributed databases and distributed systems
The Auron accelerator for distributed computing frameworks (e.g., Spark) leverages native vectorized execution to accelerate query processing
A developer toolkit to implement Serverless best practices and increase developer velocity.
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
Milvus is a high-performance, cloud-native vector database built for scalable vector ANN search
A high performance caching library for Java
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
🎨 Diagram as Code for prototyping cloud system architectures
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.