- DeepSeek AI, Peking University
- Beijing
- charlesCXK.github.io
Stars
This repository provides a valuable reference for researchers in the field of multimodality. Please start your exploratory journey into RL-based reasoning MLLMs!
This package contains the original 2012 AlexNet code.
OpenVideo specializes in the domain of text-to-video generation, with the goal of providing high-quality and diverse video datasets to AI researchers globally.
TensorZero is an open-source stack for industrial-grade LLM applications. It unifies an LLM gateway, observability, optimization, evaluation, and experimentation.
Official Repo For "Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos"
DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding
This repository provides the code and model checkpoints for AIMv1 and AIMv2 research projects.
Janus-Series: Unified Multimodal Understanding and Generation Models
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
This repo contains the code for a 1D tokenizer and generator
LLaVA-UHD v3: Progressive Visual Compression for Efficient Native-Resolution Encoding in MLLMs
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
Schedule-Free Optimization in PyTorch
Annotated version of the Mamba paper
VideoSys: An easy and efficient system for video generation
A curated list of recent diffusion models for video generation, editing, and various other applications.
Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train OpenAI gpt-oss, DeepSeek-R1, Qwen3, Gemma 3, TTS 2x faster with 70% less VRAM.
Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference
[ECCV 2024 Oral] LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation.
Official repo for VGen: a holistic video generation ecosystem built on diffusion models
Official Repo For OMG-LLaVA and OMG-Seg codebase [CVPR-24 and NeurIPS-24]
Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models (CVPR 2024 Highlight)
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
Official PyTorch implementation of "EdgeSAM: Prompt-In-the-Loop Distillation for On-Device Deployment of SAM"
Recent LLM-based CV and related works. Welcome to comment/contribute!
Official code for the NeurIPS 2023 paper "Switching Temporary Teachers for Semi-Supervised Semantic Segmentation"
✨✨Latest Advances on Multimodal Large Language Models
The Startup CTO's Handbook, a book covering leadership, management and technical topics for leaders of software engineering teams