Starred repositories
An Apache Iceberg REST Catalog explorer - view namespaces, tables, stats, metadata, schema evolution, and more.
pg_lake: Postgres with Iceberg and data lake access
Claude Code-generated Parquet metadata visualizer that runs in your browser
dbc is a command-line tool for installing and managing ADBC drivers
Tantivy is a full-text search engine library inspired by Apache Lucene and written in Rust
IndexTables is an experimental open-table format for Apache Spark that enables fast retrieval and full-text search across large-scale data. It integrates seamlessly with Spark SQL, allowing you to …
WIP (out of tree) Rust implementation of TPC-DS generators.
[SIGMOD 2026] F3: The Open-Source Data File Format for the Future
Chronon is a data platform for serving AI/ML applications.
[VLDB 2023 Vol 17] "An Empirical Evaluation of Columnar Storage Formats"
An extensible, state-of-the-art columnar file format. Formerly at @spiraldb, now an Incubation Stage project at LF AI & Data, part of the Linux Foundation.
Build reliable AI and agentic applications with DataFrames
Protocol and libraries for sending and receiving OpenTelemetry data using Apache Arrow
DuckLake is an integrated data lake and catalog format
Native Rust TPC-H support for DataFusion using tpchgen
Spark integrations for working with Lance datasets
Lance Namespace is an open specification on top of the storage-based Lance table and file format to standardize access to a collection of Lance tables
TPC-H benchmark data generation in pure Rust
Olympia is a storage-only open catalog format for big data analytics, ML & AI.
Code used to create text embeddings of all Magic: The Gathering cards.
DataFusion TableProviders for reading data from other systems