- Shanghai, China
Stars
A course on LLM inference serving on Apple Silicon for systems engineers: build a tiny vLLM + Qwen.
An extensible, state-of-the-art columnar file format. Formerly at @spiraldb, now an Incubation Stage project at LFAI&Data, part of the Linux Foundation.
A lightweight library for the RaBitQ algorithm and its applications in vector search.
High-performance data engine for AI and multimodal workloads. Process images, audio, video, and structured data at any scale.
A lightweight data processing framework built on DuckDB and 3FS.
[ACL 2025 Oral] 🔥🔥 MegaPairs: Massive Data Synthesis for Universal Multimodal Retrieval
FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/
Fast Open-Source Search & Clustering engine × for Vectors & Arbitrary Objects × in C++, C, Python, JavaScript, Rust, Java, Objective-C, Swift, C#, GoLang, and Wolfram 🔍
A complement to pgvector for high-performance, cost-efficient vector search on large workloads.
Official software repository of S. Bruch, F. M. Nardini, C. Rulli, and R. Venturini. "Efficient Inverted Indexes for Approximate Retrieval over Learned Sparse Representations." Long Paper @ ACM SIG…
Up to 200x Faster Dot Products & Similarity Metrics — for Python, Rust, C, JS, and Swift, supporting f64, f32, f16 real & complex, i8, and bit vectors using SIMD for both AVX2, AVX-512, NEON, SVE, …
The AI-native database built for LLM applications, providing incredibly fast hybrid search of dense vector, sparse vector, tensor (multi-vector), and full-text.
Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics
CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
A Heterogeneous Benchmark for Information Retrieval. Easy to use, evaluate your models across 15+ diverse IR datasets.
Source code for the X Recommendation Algorithm
An open-source C++ library developed and used at Facebook.
Open Lakehouse Format for Multimodal AI. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, Pyarrow, a…
Apache Doris is an easy-to-use, high performance and unified analytics database.
Web-scale retrieval for knowledge-intensive NLP
PISA: Performant Indexes and Search for Academia
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.