Highlights

  • Pro

Organizations

@HRNet @Atten4Vis

This repository provides a valuable reference for researchers in the field of multimodality. Start your exploration of RL-based reasoning MLLMs here!

1,278 58 Updated Nov 16, 2025

This package contains the original 2012 AlexNet code.

Cuda 2,779 360 Updated Mar 12, 2025

OpenVideo specializes in the domain of text-to-video generation, with the goal of providing high-quality and diverse video datasets to AI researchers globally.

Python 112 4 Updated May 22, 2025

TensorZero is an open-source stack for industrial-grade LLM applications. It unifies an LLM gateway, observability, optimization, evaluation, and experimentation.

Rust 10,599 731 Updated Nov 27, 2025

Official Repo For "Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos"

Python 1,443 102 Updated Nov 20, 2025

DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding

Python 5,126 1,811 Updated Feb 26, 2025

This repository provides the code and model checkpoints for AIMv1 and AIMv2 research projects.

Python 1,386 67 Updated Aug 4, 2025

Janus-Series: Unified Multimodal Understanding and Generation Models

Python 17,621 2,235 Updated Feb 1, 2025

Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks

Python 3,424 561 Updated Nov 26, 2025

This repo contains the code for a 1D tokenizer and generator

Jupyter Notebook 1,075 57 Updated Mar 20, 2025

LLaVA-UHD v3: Progressive Visual Compression for Efficient Native-Resolution Encoding in MLLMs

Python 394 20 Updated Nov 26, 2025

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

4,969 532 Updated Sep 25, 2024

One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks

Python 3,320 438 Updated Nov 25, 2025

Schedule-Free Optimization in PyTorch

Python 2,236 68 Updated May 21, 2025

Annotated version of the Mamba paper

Jupyter Notebook 491 19 Updated Feb 27, 2024

VideoSys: An easy and efficient system for video generation

Python 2,009 132 Updated Aug 27, 2025

A curated list of recent diffusion models for video generation, editing, and various other applications.

5,219 320 Updated Oct 15, 2025

Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train OpenAI gpt-oss, DeepSeek-R1, Qwen3, Gemma 3, TTS 2x faster with 70% less VRAM.

Python 48,732 4,007 Updated Nov 27, 2025

Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference

Python 4,588 233 Updated Jun 14, 2024

[ECCV 2024 Oral] LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation.

Python 1,975 133 Updated Aug 20, 2024

Official repo for VGen: a holistic video generation ecosystem built on diffusion models

Python 3,144 272 Updated Jan 10, 2025

Official Repo For OMG-LLaVA and OMG-Seg codebase [CVPR-24 and NeurIPS-24]

Python 1,330 54 Updated Oct 15, 2025

Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models (CVPR 2024 Highlight)

Python 1,934 138 Updated Oct 23, 2025

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.

Python 39,278 4,774 Updated Jun 2, 2025

Official PyTorch implementation of "EdgeSAM: Prompt-In-the-Loop Distillation for On-Device Deployment of SAM"

Jupyter Notebook 1,084 50 Updated May 24, 2025

Recent LLM-based CV and related works. Welcome to comment/contribute!

873 38 Updated Mar 8, 2025

Official code for the NeurIPS 2023 paper "Switching Temporary Teachers for Semi-Supervised Semantic Segmentation"

Python 50 5 Updated Nov 16, 2023

✨✨Latest Advances on Multimodal Large Language Models

16,778 1,080 Updated Nov 12, 2025

The Startup CTO's Handbook, a book covering leadership, management and technical topics for leaders of software engineering teams

13,887 776 Updated Jul 30, 2025