Stars
Kimi K2 is the large language model series developed by the Moonshot AI team
Megvii FILE Library - Work with files in Python the same way as with the standard library
SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning
An open-source coding LLM for software engineering tasks
Kimi-VL: Mixture-of-Experts Vision-Language Model for Multimodal Reasoning, Long-Context Understanding, and Strong Agent Capabilities
R1-onevision, a visual language model capable of deep CoT reasoning.
[ICLR'25] MovieDreamer: Hierarchical Generation for Coherent Long Visual Sequences
[NeurIPS2024] Official code for (IMA) Implicit Multimodal Alignment: On the Generalization of Frozen LLMs to Multimodal Inputs
VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
DeepSeek-VL: Towards Real-World Vision-Language Understanding
[ICLR 2024] Official implementation of DreamCraft3D: Hierarchical 3D Generation with Bootstrapped Diffusion Prior
MMICL, a state-of-the-art VLM with in-context learning ability, from PKU
🐟 Code and models for the NeurIPS 2023 paper "Generating Images with Multimodal Language Models".
✨✨Latest Advances on Multimodal Large Language Models
ChatGPT's explosive popularity marks a key step toward AGI. This project collects open-source alternatives to ChatGPT, including text LLMs and multimodal LLMs, for everyone's convenience.
Recent LLM-based CV and related works. Welcome to comment/contribute!
[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LLMs such as MiniGPT-4, StableLM, and MOSS.
An open-source tool-augmented conversational language model from Fudan University
Tool learning for big models; open-source solutions for ChatGPT plugins
Open-sourced codes for MiniGPT-4 and MiniGPT-v2 (https://minigpt-4.github.io, https://minigpt-v2.github.io/)
AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.
Large-scale text-video dataset. 10 million captioned short videos.
The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
[ICLR'22] Generating Videos with Dynamics-aware Implicit Generative Adversarial Networks