Stars
An open-source AI agent that brings the power of Gemini directly into your terminal.
🔥🔥🔥 ICLR 2025 Oral. Automating Agentic Workflow Generation.
[Survey] A Comprehensive Survey of Self-Evolving AI Agents: A New Paradigm Bridging Foundation Models and Lifelong Agentic Systems
Official repo of paper "Reconstruction Alignment Improves Unified Multimodal Models". Unlocking the Massive Zero-shot Potential in Unified Multimodal Models through Self-supervised Learning.
F1: A Vision Language Action Model Bridging Understanding and Generation to Actions
UniUGG: Unified 3D Understanding and Generation via Geometric-Semantic Encoding
InternVLA-A1: Unifying Understanding, Generation, and Action for Robotic Manipulation
InternVLA-M1: A Spatially Guided Vision-Language-Action Framework for Generalist Robot Policy
[NeurIPS 2025 spotlight] Official implementation for "FutureSightDrive: Thinking Visually with Spatio-Temporal CoT for Autonomous Driving"
OpenDriveVLA: Towards End-to-end Autonomous Driving with Large Vision Language Action Model
Qwen3-VL is the multimodal large language model series developed by the Qwen team, Alibaba Cloud.
Code and implementations for the ACL 2025 paper "AgentGym: Evolving Large Language Model-based Agents across Diverse Environments" by Zhiheng Xi et al.
Code and implementations for the paper "AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning" by Zhiheng Xi et al.
Framework for orchestrating role-playing, autonomous AI agents. By fostering collaborative intelligence, CrewAI empowers agents to work together seamlessly, tackling complex tasks.
💫 Toolkit to help you get started with Spec-Driven Development
LLM agents built for control. Designed for real-world use. Deployed in minutes.
Your AI second brain. Self-hostable. Get answers from the web or your docs. Build custom agents, schedule automations, do deep research. Turn any online or local LLM into your personal, autonomous …
Official PyTorch implementation for "Large Language Diffusion Models"
Awesome things towards foundation agents. Papers / Repos / Blogs / ...
HiMTok: Learning Hierarchical Mask Tokens for Image Segmentation with Large Multimodal Model
ALTo: Adaptive-Length Tokenizer for Autoregressive Mask Generation
Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.
Notebooks using the Hugging Face libraries 🤗
This is the code related to "🔥Effective Training Data Synthesis for Improving MLLM Chart Understanding" (ICCV 2025).
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery 🧑‍🔬