Starred repositories

Apache Iceberg C++

C++ 139 51 Updated Sep 23, 2025

Code repo for "An Empirical Evaluation of Columnar Storage Formats" VLDB Vol 17

63 9 Updated May 18, 2024

An extensible, state of the art columnar file format. Formerly at @spiraldb, now an Incubation Stage project at LFAI&Data, part of the Linux Foundation.

Rust 1,753 66 Updated Sep 23, 2025

Azure extension for DuckDB

C++ 66 27 Updated Sep 18, 2025

Build reliable AI and agentic applications with DataFrames

Python 297 18 Updated Sep 23, 2025

The Feldera Incremental Computation Engine

Rust 1,622 79 Updated Sep 23, 2025

Protocol and libraries for sending and receiving OpenTelemetry data using Apache Arrow

Rust 232 52 Updated Sep 23, 2025

DuckLake is an integrated data lake and catalog format

C++ 2,077 95 Updated Sep 23, 2025

Native Rust TPCH support for Datafusion using tpchgen

Rust 3 3 Updated Jun 8, 2025

Python 156 5 Updated May 21, 2025

Icebird: JavaScript Iceberg Client

JavaScript 43 1 Updated May 8, 2025

Spark integrations for working with Lance datasets

Java 23 15 Updated Sep 23, 2025

Lance Namespace is an open specification on top of the storage-based Lance table and file format to standardize access to a collection of Lance tables

Java 31 11 Updated Sep 23, 2025

TPC-H benchmark data generation in pure Rust

Rust 178 44 Updated Sep 9, 2025

Olympia is a storage-only open catalog format for big data analytics, ML & AI.

Java 14 3 Updated May 5, 2025

Code used to create text embeddings of all Magic: The Gathering cards.

Jupyter Notebook 56 3 Updated Feb 24, 2025

DataFusion TableProviders for reading data from other systems

Rust 146 50 Updated Sep 22, 2025

Apache Iceberg

Rust 1,088 320 Updated Sep 22, 2025

Analytical database for data-driven Web applications 🪶

Rust 496 17 Updated Feb 25, 2025

The Amazon S3 Tables catalog is a client library that bridges control plane operations provided by S3 Tables to engines like Apache Spark, Flink and others, when used with the Iceberg Table format

Java 138 22 Updated Aug 14, 2025

Apache Iceberg - Go

Go 329 118 Updated Sep 22, 2025

Batteries included CLI, TUI, and server implementations for DataFusion.

Rust 164 18 Updated Jun 24, 2025

🤗 The largest hub of ready-to-use datasets for AI models with fast, easy-to-use and efficient data manipulation tools

Python 20,680 2,953 Updated Sep 23, 2025

Monitoring and insights on your data lakehouse tables

Java 30 9 Updated Sep 22, 2025

data load tool (dlt) is an open source Python library that makes data loading easy 🛠️

Python 4,169 331 Updated Sep 23, 2025

The simplest, highest-throughput Python interface to S3, GCS & Azure Storage, powered by Rust.

Python 548 24 Updated Sep 19, 2025

Generate relevant synthetic data quickly for your projects. The Databricks Labs synthetic data generator (aka `dbldatagen`) may be used to generate large simulated / synthetic data sets for test, P…

Python 429 79 Updated Sep 23, 2025