-
lance Public
Forked from lance-format/lance
Modern columnar data format for ML and LLMs implemented in Rust. Convert from parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, Du…
Rust Apache License 2.0 Updated Apr 27, 2025 -
ray Public
Forked from ray-project/ray
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
Python Apache License 2.0 Updated Apr 26, 2025 -
lancedb Public
Forked from lancedb/lancedb
Developer-friendly, serverless vector database for AI applications. Easily add long-term memory to your LLM apps!
Rust Apache License 2.0 Updated Dec 12, 2024 -
data-juicer Public
Forked from datajuicer/data-juicer
A one-stop data processing system to make data higher-quality, juicier, and more digestible for LLMs! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷
Python Apache License 2.0 Updated Dec 7, 2024 -
unitycatalog Public
Forked from unitycatalog/unitycatalog
Open, Multi-modal Catalog for Data & AI
Java Apache License 2.0 Updated Jun 13, 2024 -
datasets Public
Forked from huggingface/datasets
🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
Python Apache License 2.0 Updated Feb 21, 2024 -
candle Public
Forked from huggingface/candle
Minimalist ML framework for Rust
Rust Apache License 2.0 Updated Jan 21, 2024 -
hf-hub Public
Forked from huggingface/hf-hub
Rust client for the huggingface hub aiming for minimal subset of features over `huggingface-hub` python package
Rust Updated Dec 21, 2023 -
dbt-core Public
Forked from dbt-labs/dbt-core
dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
Python Apache License 2.0 Updated Nov 8, 2023 -
incubator-hudi Public
Forked from apache/hudi
Upserts And Incremental Processing on Big Data
-
bitsail Public
Forked from bytedance/bitsail
BitSail is a distributed, high-performance data integration framework that supports both streaming and batch modes. At present, BitSail is mainly designed with the ELT model, which have EB data size a…
-
delta-rs Public
Forked from delta-io/delta-rs
A native Rust library for Delta Lake, with bindings into Python
Rust Apache License 2.0 Updated May 18, 2023 -
doris Public
Forked from apache/doris
Apache Doris is an easy-to-use, high performance and unified analytics database.
Java Apache License 2.0 Updated May 14, 2023 -
risingwave Public
Forked from risingwavelabs/risingwave
RisingWave: the next-generation streaming database in the cloud.
-
flink-cdc-connectors Public
Forked from apache/flink-cdc
CDC Connectors for Apache Flink®
Java Apache License 2.0 Updated Nov 7, 2022 -
debezium Public
Forked from debezium/debezium
Change data capture for a variety of databases. Please log issues at https://issues.redhat.com/browse/DBZ.
Java Apache License 2.0 Updated Nov 7, 2022 -
pulsar Public
Forked from apache/pulsar
Apache Pulsar - distributed pub-sub messaging system
Java Apache License 2.0 Updated Oct 25, 2022 -
delta Public
Forked from delta-io/delta
This connector allows Apache Spark™ to read from and write to Delta Lake.
Scala Apache License 2.0 Updated Jul 21, 2022 -
airbyte Public
Forked from airbytehq/airbyte
Airbyte is an open-source EL(T) platform that helps you replicate your data in your warehouses, lakes and databases.
Java Other Updated Jun 24, 2022 -
arrow Public
Forked from apache/arrow
Apache Arrow is a cross-language development platform for in-memory data. It specifies a standardized language-independent columnar memory format for flat and hierarchical data, organized for effic…
C++ Apache License 2.0 Updated Dec 27, 2020 -
hyperspace Public
Forked from microsoft/hyperspace
An open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads.
Scala Apache License 2.0 Updated Jul 26, 2020 -
presto Public
Forked from prestodb/presto
The official home of the Presto distributed SQL query engine for big data
Java Apache License 2.0 Updated Jun 18, 2020 -
spark-cassandra-connector Public
Forked from apache/cassandra-spark-connector
DataStax Spark Cassandra Connector
Scala Apache License 2.0 Updated May 15, 2020 -
parquet-mr Public
Forked from apache/parquet-java
Apache Parquet
Java Apache License 2.0 Updated Apr 15, 2020 -
kafka-connect-hdfs Public
Forked from confluentinc/kafka-connect-hdfs
Kafka Connect HDFS connector
Java Other Updated Mar 23, 2020 -
pointnet-keras Public
Keras implementation for Pointnet
-
Udacity Self-Driving Car Nanodegree, Term 1: German traffic sign classifier project