- Beijing, China
pg_lake Public
Forked from Snowflake-Labs/pg_lake
pg_lake: Postgres with Iceberg and data lake access
C · Apache License 2.0 · Updated Nov 6, 2025
Datus-agent Public
Forked from Datus-ai/Datus-agent
The Future of Data Engineering: A CLI SQL client for the modern data stack, enabling AI-native context engineering for data.
Python · Other · Updated Oct 21, 2025
automq Public
Forked from AutoMQ/automq
AutoMQ is a diskless Kafka on S3. 10x Cost-Effective. No Cross-AZ Traffic Cost. Autoscale in seconds. Single-digit ms latency. Multi-AZ Availability.
Java · Apache License 2.0 · Updated Aug 12, 2025
foyer Public
Forked from foyer-rs/foyer
Hybrid in-memory and disk cache in Rust
Rust · Apache License 2.0 · Updated Jul 30, 2025
Daft Public
Forked from Eventual-Inc/Daft
Distributed query engine providing simple and reliable data processing for any modality and scale
Rust · Apache License 2.0 · Updated Jun 27, 2025
mem0 Public
Forked from mem0ai/mem0
Memory for AI Agents; SOTA in AI Agent Memory; Announcing OpenMemory MCP - local and secure memory management.
Python · Apache License 2.0 · Updated May 30, 2025
datafusion-ray Public
Forked from apache/datafusion-ray
Apache DataFusion Ray
Rust · Apache License 2.0 · Updated Oct 19, 2024
datachain Public
Forked from datachain-ai/datachain
DataChain 🔗 Process and curate unstructured data using local ML models and LLM calls
omniparse Public
Forked from adithya-s-k/omniparse
Ingest, parse, and optimize any data format ➡️ from documents to multimedia ➡️ for enhanced compatibility with GenAI frameworks
Python · GNU General Public License v3.0 · Updated Jul 5, 2024
graphrag Public
Forked from microsoft/graphrag
A modular graph-based Retrieval-Augmented Generation (RAG) system
Python · MIT License · Updated Jul 4, 2024
unitycatalog Public
Forked from unitycatalog/unitycatalog
Open, Multi-modal Catalog for Data & AI
Java · Apache License 2.0 · Updated Jun 15, 2024
Polycat Public
Forked from DataCakeCloud/Polycat
Polycat is a cutting-edge cloud-native metastore system, purpose-built to cater to the demands of modern data management in lakehouse deployments. It offers a comprehensive solution for organizatio…
Java · Apache License 2.0 · Updated Apr 12, 2024
MediaCrawler Public
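Forked from NanmiCoder/MediaCrawler
Xiaohongshu notes and comments crawler, Douyin videos and comments crawler, Kuaishou videos and comments crawler, Bilibili videos and comments crawler, Weibo posts and comments crawler
Python · Apache License 2.0 · Updated Mar 17, 2024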
nanoGPT Public
Forked from karpathy/nanoGPT
The simplest, fastest repository for training/finetuning medium-sized GPTs.
magika Public
Forked from google/magika
Detect file content types with deep learning
Python · Apache License 2.0 · Updated Feb 19, 2024
OpenLineage Public
Forked from OpenLineage/OpenLineage
An Open Standard for lineage metadata collection
Java · Apache License 2.0 · Updated Jan 19, 2024
llmperf Public
Forked from ray-project/llmperf
LLMPerf is a library for validating and benchmarking LLMs
Python · Apache License 2.0 · Updated Dec 22, 2023
proton Public
Forked from timeplus-io/proton
A unified streaming and historical data processing engine in one single binary, powered by ClickHouse
C++ · Apache License 2.0 · Updated Nov 3, 2023
Jungle Public
Forked from eBay/Jungle
An embedded key-value store library specialized for building state machine and log store
C++ · Apache License 2.0 · Updated Sep 18, 2023
llama2_aided_tesseract Public
Forked from Dicklesworthstone/llm_aided_ocr
Enhance Tesseract OCR output for scanned PDFs by applying Large Language Model (LLM) corrections, complete with options for text validation and hallucination filtering.
Python · Updated Aug 2, 2023
ClickBench Public
Forked from ClickHouse/ClickBench
ClickBench: a Benchmark For Analytical Databases
HTML · Other · Updated Feb 19, 2023
neon Public
Forked from neondatabase/neon
Neon: Serverless Postgres. We separated storage and compute to offer autoscaling, branching, and bottomless storage.
Rust · Apache License 2.0 · Updated Jul 21, 2022
alpa Public
Forked from alpa-projects/alpa
Auto parallelization for large-scale neural networks
Python · Apache License 2.0 · Updated Jul 1, 2022
tigris Public
Forked from tigrisdata-archive/tigris
Tigris is a modern, scalable backend for building real-time websites and apps.
system-design-resources Public
Forked from InterviewReady/system-design-resources
These are the best resources for System Design on the Internet
GNU General Public License v3.0 · Updated May 30, 2022
timely-dataflow Public
Forked from TimelyDataflow/timely-dataflow
A modular implementation of timely dataflow in Rust
Rust · MIT License · Updated May 20, 2022
diagrams Public
Forked from mingrammer/diagrams
🎨 Diagram as Code for prototyping cloud system architectures
Python · MIT License · Updated Apr 29, 2022
antlr4 Public
Forked from antlr/antlr4
ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files.
Java · Other · Updated Mar 26, 2022
modin Public
Forked from modin-project/modin
Modin: Speed up your Pandas workflows by changing a single line of code
Python · Apache License 2.0 · Updated Mar 10, 2022
lux Public
Forked from lux-org/lux
Automatically visualize your pandas dataframe via a single print! 📊 💡
Python · Apache License 2.0 · Updated Mar 8, 2022