Repositories starred by kevinjqliu

[SIGMOD 2026] F3: The Open-Source Data File Format for the Future

Rust 58 1 Updated Oct 1, 2025

Chronon is a data platform for serving AI/ML applications.

Scala 920 80 Updated Oct 2, 2025

Apache Iceberg C++

C++ 145 55 Updated Sep 30, 2025

Code repo for "An Empirical Evaluation of Columnar Storage Formats" VLDB Vol 17

64 9 Updated May 18, 2024

An extensible, state of the art columnar file format. Formerly at @spiraldb, now an Incubation Stage project at LFAI&Data, part of the Linux Foundation.

Rust 1,772 67 Updated Oct 2, 2025

Azure extension for DuckDB

C++ 66 27 Updated Sep 26, 2025

Build reliable AI and agentic applications with DataFrames

Python 315 19 Updated Oct 2, 2025

The Feldera Incremental Computation Engine

Rust 1,628 79 Updated Oct 2, 2025

Protocol and libraries for sending and receiving OpenTelemetry data using Apache Arrow

Rust 236 53 Updated Oct 1, 2025

DuckLake is an integrated data lake and catalog format

C++ 2,100 97 Updated Oct 1, 2025

Native Rust TPCH support for Datafusion using tpchgen

Rust 3 3 Updated Jun 8, 2025

Python 156 5 Updated May 21, 2025

Icebird: JavaScript Iceberg Client

JavaScript 44 1 Updated May 8, 2025

Spark integrations for working with Lance datasets

Java 24 16 Updated Sep 23, 2025

Lance Namespace is an open specification on top of the storage-based Lance table and file format to standardize access to a collection of Lance tables

Java 32 12 Updated Oct 1, 2025

TPC-H benchmark data generation in pure Rust

Rust 183 44 Updated Sep 9, 2025

Olympia is a storage-only open catalog format for big data analytics, ML & AI.

Java 14 3 Updated May 5, 2025

Code used to create text embeddings of all Magic: The Gathering cards.

Jupyter Notebook 56 3 Updated Feb 24, 2025

DataFusion TableProviders for reading data from other systems

Rust 149 51 Updated Sep 29, 2025

Apache Iceberg

Rust 1,093 326 Updated Sep 29, 2025

Analytical database for data-driven Web applications 🪶

Rust 495 17 Updated Feb 25, 2025

The Amazon S3 Tables catalog is a client library that bridges control plane operations provided by S3 Tables to engines like Apache Spark, Flink and others, when used with the Iceberg Table format

Java 139 22 Updated Aug 14, 2025

Apache Iceberg - Go

Go 331 119 Updated Oct 1, 2025

Batteries included CLI, TUI, and server implementations for DataFusion.

Rust 165 18 Updated Jun 24, 2025

🤗 The largest hub of ready-to-use datasets for AI models with fast, easy-to-use and efficient data manipulation tools

Python 20,711 2,966 Updated Oct 1, 2025

Monitoring and insights on your data lakehouse tables

Java 30 9 Updated Sep 22, 2025

data load tool (dlt) is an open source Python library that makes data loading easy 🛠️

Python 4,211 335 Updated Oct 2, 2025