Stars
Gorilla: Training and Evaluating LLMs for Function Calls (Tool Calls)
Get started with building Fullstack Agents using Gemini 2.5 and LangGraph
A version of verl to support diverse tool use
A collection of projects designed to help developers quickly get started with building deployable applications using the Claude API
Towards a Unified View of Large Language Model Post-Training
A Collection of Papers about Memory for Language Agents
A Survey of Reinforcement Learning for Large Reasoning Models
A simple yet powerful agent framework that delivers with open-source models
Google Research
An Open Source implementation of Notebook LM with more flexibility and features
Official Implementation of SynthTIGER (Synthetic Text Image Generator), ICDAR 2021
DataComp: In search of the next generation of multimodal datasets
Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.
Overview of pipelines related to PDF to Markdown document processing.
A synthetic data generator for text recognition
Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022
PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
TexTeller converts images to LaTeX formulas (image2latex, LaTeX OCR) with higher accuracy and superior generalization ability, enabling it to cover most usage scenarios.
Everything about note management. All in Zotero.
OCR, layout analysis, reading order, table recognition in 90+ languages
[ICLR 2025] Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing. Your efficient and high-quality synthetic data generation pipeline!
Dataset and Code for our ACL 2024 paper: "Multimodal Table Understanding". We propose the first large-scale Multimodal IFT and Pre-Train Dataset for table understanding and develop a generalist tab…
Chinese translation of "Designing Data-Intensive Applications" (DDIA), first and second editions