Stars
Python implementation of Text-Image-Augmentation
This is a PyTorch re-implementation of EAST: An Efficient and Accurate Scene Text Detector.
Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
Sequential Diffusion Language Model (SDLM) enhances pre-trained autoregressive language models by adaptively determining generation length and maintaining KV-cache compatibility, achieving high eff…
A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.
OCRFlux is a lightweight yet powerful multimodal toolkit that significantly advances PDF-to-Markdown conversion, excelling in complex layout handling, complicated table parsing and cross-page conte…
CloudMoe Windows 10/11 Activation Toolkit: obtains a digital license; the best open-source Win 10/11 activator on GitHub.
ObjectClear: Complete Object Removal via Object-Effect Attention
[ICCV 2025] FreeFlux: Understanding and Exploiting Layer-Specific Roles in RoPE-Based MMDiT for Versatile Image Editing
Open-Sora: Democratizing Efficient Video Production for All
Official code for ICCV 2025 paper, X2I: Seamless Integration of Multimodal Understanding into Diffusion Transformer via Attention Distillation
🦉 OWL: Optimized Workforce Learning for General Multi-Agent Assistance in Real-World Task Automation
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, GLM4.5, InternLM3, DeepSeek-R1, ...) and 200+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, Llava, GLM4v, Ph…
Wan: Open and Advanced Large-Scale Video Generative Models
Lumina-Image 2.0: A Unified and Efficient Image Generative Framework
Fully open reproduction of DeepSeek-R1
[CVPR 2025] Official implementation of the paper "SmartEraser: Remove Anything from Images using Masked-Region Guidance".
[CVPR2025] RORem: Training a Robust Object Remover with Human-in-the-Loop
[CVPR 2025] StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text
Official implementation of the paper "Attentive Eraser: Unleashing Diffusion Model’s Object Removal Potential via Self-Attention Redirection Guidance" (AAAI 2025 Oral)
[under review] The official implementation of paper "BrushEdit: All-In-One Image Inpainting and Editing"
OpenOCR: A general OCR system with accuracy and efficiency. Supporting 24 Scene Text Recognition methods trained from scratch on large-scale real datasets, and will continue to add the latest methods.
A curated list of papers, code, and resources pertaining to generative image composition or object insertion.