Stars
This is the homepage of a new book entitled "Mathematical Foundations of Reinforcement Learning."
A powerful tool for creating fine-tuning datasets for LLMs
Qwen2.5-Omni is an end-to-end multimodal model by the Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, and video, and of performing real-time speech generation.
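A project for training a large language model from scratch, including pretraining, fine-tuning, and direct preference optimization. The model has 1B parameters and supports Chinese and English.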
DeepEP: an efficient expert-parallel communication library
FlashMLA: Efficient Multi-head Latent Attention Kernels
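Provides a practical interactive interface for LLMs such as GPT/GLM, specially optimized for paper reading, polishing, and writing. Modular design; supports custom shortcut buttons and function plugins, project analysis and self-translation for Python, C++, and other projects, PDF/LaTeX paper translation and summarization, parallel querying of multiple LLMs, and local models such as chatglm3. Integrates Tongyi Qianwen, deepseekcoder, iFlytek Spark, ERNIE Bot, llama2, rwkv, claude2, m…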
Recommends new arXiv papers of interest to you daily, based on your Zotero library.
[NeurIPS 2024] SCube: Instant Large-Scale Scene Reconstruction using VoxSplats
[ACM MM24] MotionMaster: Training-free Camera Motion Transfer For Video Generation
In 2024, the strongest open-source implementation of asymmetric magvit_v2 supports inference code but excludes VQVAE. It supports the joint encoding of images and videos, accommodating arbitrary vi…
Official Implementation of "Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining"
Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
Code for the RSS 2023 paper "Energy-based Models are Zero-Shot Planners for Compositional Scene Rearrangement"
Code for "MVOC: a training-free multiple video object composition method with diffusion models"
Code for FreeTraj, a tuning-free method for trajectory-controllable video generation
MuseV: Infinite-length and High Fidelity Virtual Human Video Generation with Visual Conditioned Parallel Denoising
Official PyTorch implementation of our paper "HyperLips: Hyper Control Lips with High Resolution Decoder for Talking Face Generation".
This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to it.
Code for UPST-NeRF: Universal Photorealistic Style Transfer of Neural Radiance Fields for 3D Scene
Public code release for: ColorfulCurves: Palette-Aware Lightness Control and Color Editing via Sparse Optimization (SIGGRAPH 2023) [Ted Chao, Jason Klein, Jianchao Tan, Jose Echevarria, Yotam Gingold]
Curated list of papers and resources focused on 3D Gaussian Splatting, intended to keep pace with the anticipated surge of research in the coming months.
An optimized pipeline for DINet that reduces inference latency by up to 60% 🚀. Kudos to the authors of the original repo for this amazing work.
[CVPR 2024] This is the official source for our paper "SyncTalk: The Devil is in the Synchronization for Talking Head Synthesis"