- Tencent
- Shanghai
- (UTC +08:00)
- https://jiangzhengkai.github.io/
- @jiang_zhengkai
Stars
This is the official repo for the paper "LongCat-Flash-Omni Technical Report"
pi-Flow: Policy-Based Few-Step Generation via Imitation Distillation
Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.
Co-Reinforcement Learning for Unified Multimodal Understanding and Generation
rCM: SOTA Diffusion Distillation & Few-Step Video Generation
LongLive: Real-time Interactive Long Video Generation
My learning notes/codes for ML SYS.
HunyuanImage-3.0: A Powerful Native Multimodal Model for Image Generation
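🥢 Cook like Laoxiangji 🐔. The main part was completed in 2024; this is not an official Laoxiangji repository. The text comes from the "Laoxiangji Dish Traceability Report" and has been summarized, edited, and organized. CookLikeHOC.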
Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics
Directly Aligning the Full Diffusion Trajectory with Fine-Grained Human Preference
BranchGRPO: Stable and Efficient GRPO with Structured Branching in Diffusion Models
HunyuanImage-2.1: An Efficient Diffusion Model for High-Resolution (2K) Text-to-Image Generation
Home for "How To Scale Your Model", a short blog-style textbook about scaling LLMs on TPUs
Awesome curated collection of images and prompts generated by GPT-4o and gpt-image-1. Explore AI generated visuals created with ChatGPT and Sora, showcasing OpenAI’s advanced image generation capab…
HunyuanVideo-Foley: Multimodal Diffusion with Representation Alignment for High-Fidelity Foley Audio Generation.
PyTorch distributed training acceleration framework
MixGRPO: Unlocking Flow-based GRPO Efficiency with Mixed ODE-SDE
Official implementation of HPSv3: Towards Wide-Spectrum Human Preference Score (ICCV2025)
Qwen-Image is a powerful image generation foundation model capable of complex text rendering and precise image editing.
High performance inference engine for diffusion models
ICLR2024 Spotlight: curation/training code, metadata, distribution and pre-trained models for MetaCLIP; CVPR 2024: MoDE: CLIP Data Experts via Clustering
Official GitHub repository for FLUX.1 Krea [dev].
Lumos Project: Frontier video unified model research by Alibaba DAMO Academy.
Long-RL: Scaling RL to Long Sequences (NeurIPS 2025)
Cosmos-Predict2 is a collection of general-purpose world foundation models for Physical AI that can be fine-tuned into customized world models for downstream applications.
Locality-aware Parallel Decoding for Efficient Autoregressive Image Generation