Yichuan Wang yichuan-w

EECS PhD SkyLab@UC Berkeley, Undergraduate ACM Class SJTU

245 followers · 260 following

https://yichuan-w.github.io/

Highlights

Pro

Lists (1)

🚀 My stack

Stars

weaviate / query-agent-benchmarking

Tools for various benchmarking scenarios of the Weaviate Query Agent

Jupyter Notebook 6 1 Updated Sep 24, 2025

rapidsai / cuml

cuML - RAPIDS Machine Learning Library

C++ 4,959 594 Updated Oct 15, 2025

rapidsai / cuvs

cuVS - a library for vector search and clustering on the GPU

Cuda 541 133 Updated Oct 14, 2025

google-deepmind / loft

LOFT: A 1 Million+ Token Long-Context Benchmark

Python 218 17 Updated Jun 13, 2025

StarlightSearch / EmbedAnything

Highly Performant, Modular, Memory Safe and Production-ready Inference, Ingestion and Indexing built in Rust 🦀

Rust 740 67 Updated Oct 5, 2025

fastapi / fastapi

FastAPI framework, high performance, easy to learn, fast to code, ready for production

Python 90,731 8,041 Updated Oct 13, 2025

svg-project / flash-kmeans

Fast and memory-efficient exact kmeans

Python 105 6 Updated Sep 30, 2025

MinorJerry / WebVoyager

Code for "WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models"

Python 933 102 Updated Mar 4, 2024

xlang-ai / instructor-embedding

[ACL 2023] One Embedder, Any Task: Instruction-Finetuned Text Embeddings

Python 2,015 156 Updated Jan 15, 2025

lightonai / fast-plaid

High-Performance Engine for Multi-Vector Search

Rust 170 10 Updated Oct 7, 2025

trotsky1997 / OpenTinker

Tinker, but an open-source one

2 Updated Oct 8, 2025

lennart-finke / gpt-oss

What does gpt-oss tell us about OpenAI's training data?

Python 25 2 Updated Sep 19, 2025

az1326 / advisor-models

How to Train Your Advisor: Steering Black-Box LLMs with Advisor Models

Python 32 Updated Oct 6, 2025

lightonai / pylate

Late Interaction Models Training & Retrieval

Python 619 47 Updated Oct 14, 2025

illuin-tech / vidore-benchmark

Vision Document Retrieval (ViDoRe): Benchmark. Evaluation code for the ColPali paper.

Python 244 32 Updated Aug 4, 2025

haon-chen / mmE5

Python 52 1 Updated Feb 27, 2025

sjtu-zhao-lab / ParaStep

Communication-Efficient Diffusion Denoising Parallelization via Reuse-then-Predict Mechanism (NeurIPS'25)

Python 12 Updated Oct 6, 2025

tonywu71 / colpali-cookbooks

Recipes for learning, fine-tuning, and adapting ColPali to your multimodal RAG use cases. 👨🏻‍🍳

336 27 Updated Jun 2, 2025

Alibaba-NLP / DeepResearch

Tongyi Deep Research, the Leading Open-source Deep Research Agent

Python 15,953 1,199 Updated Oct 11, 2025

sentient-agi / OpenDeepSearch

SOTA search powered LLM

Python 3,686 341 Updated Apr 4, 2025

liujch1998 / infini-gram

Python 71 11 Updated Aug 7, 2025

google-deepmind / xtr

XTR: Rethinking the Role of Token Retrieval in Multi-Vector Retrieval

Jupyter Notebook 58 3 Updated Jun 20, 2024

AnswerDotAI / RAGatouille

Easily use and train state of the art late-interaction retrieval methods (ColBERT) in any RAG pipeline. Designed for modularity and ease-of-use, backed by research.

Python 3,705 255 Updated May 17, 2025

google / langextract

A Python library for extracting structured information from unstructured text using LLMs with precise source grounding and interactive visualization.

Python 16,372 1,134 Updated Oct 4, 2025

google-deepmind / limit

On the Theoretical Limitations of Embedding-Based Retrieval

Jupyter Notebook 578 44 Updated Sep 15, 2025

VectifyAI / PageIndex

📄🧠 PageIndex: Document Index for Reasoning-based RAG

Python 2,764 206 Updated Oct 14, 2025

jlscheerer / xtr-warp

XTR/WARP (SIGIR'25) is an extremely fast and accurate retrieval engine based on Stanford's ColBERTv2/PLAID and Google DeepMind's XTR.

Python 167 13 Updated May 3, 2025

stanford-futuredata / ColBERT

ColBERT: state-of-the-art neural search (SIGIR'20, TACL'21, NeurIPS'21, NAACL'22, CIKM'22, ACL'23, EMNLP'23)

Python 3,655 455 Updated Oct 14, 2025

velocitybolt / open-extract

Structured Data Extractor for AI Agents. Search your documents or the web for specific data and get it back in JSON or Markdown in a single tool call.

Python 179 21 Updated Mar 29, 2025

illuin-tech / colpali

The code used to train and run inference with the ColVision models, e.g. ColPali, ColQwen2, and ColSmol.

Python 2,248 205 Updated Oct 6, 2025