Stars
High-performance distributed multi-level cache system, built in Rust.
A cloud native embedded storage engine built on object storage.
LakeSail's computation framework with a mission to unify batch processing, stream processing, and compute-intensive AI workloads.
Spark integrations for working with Lance datasets
Integration between Lance and Ray for distributed data processing
Build reliable AI and agentic applications with DataFrames
The observability platform for Iceberg lakehouses.
Lightning-Fast RL for LLM Reasoning and Agents. Made Simple & Flexible.
A high-performance distributed file system designed to address the challenges of AI training and inference workloads.
A lightweight data processing framework built on DuckDB and 3FS.
Perforator is a cluster-wide continuous profiling tool designed for large data centers
The official repo of MiniMax-Text-01 and MiniMax-VL-01, large-language-model & vision-language-model based on Linear Attention
antgroup / ant-ray
Forked from ray-project/ray. Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads. AntRay is forked from ray, offering incremental new features on top …
Lakekeeper is an Apache-Licensed, secure, fast and easy to use Apache Iceberg REST Catalog written in Rust.
Data processing for and with foundation models! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷
A collection of RBIR projects and posts for anyone interested in joining this journey.
Open Lakehouse Format for Multimodal AI. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, Pyarrow, a…
Eclipse Theia is a cloud & desktop IDE framework implemented in TypeScript.
World's most powerful open data catalog for building a high-performance, geo-distributed and federated metadata lake.
New file format for storage of large columnar datasets.
High-performance data engine for AI and multimodal workloads. Process images, audio, video, and structured data at any scale
Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.
Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics
Alluxio, data orchestration for analytics and machine learning in the cloud
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
Apache Doris is an easy-to-use, high performance and unified analytics database.