Starred repositories
Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends
PLM: Efficient Peripheral Language Models Hardware-Co-Designed for Ubiquitous Computing
🤗 A PyTorch-native and flexible inference engine with hybrid cache acceleration and parallelism for DiTs.
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
Embedding model focused on multimodal RAG; ranks first both overall and on VisDoc in the MMEB benchmark.
Genetic Algorithm, Particle Swarm Optimization, Simulated Annealing, Ant Colony Optimization Algorithm, Immune Algorithm, Artificial Fish Swarm Algorithm, Differential Evolution and TSP (Traveling sa…
Blind & invisible watermark: blind image watermarking where extracting the watermark does not require the original image!
FaceChain is a deep-learning toolchain for generating your digital twin.
A new generation of CLIP with fine-grained discrimination capability (ICML 2025).
A PyTorch implementation of SimCLR based on the ICML 2020 paper "A Simple Framework for Contrastive Learning of Visual Representations".
Implementation of CoCa (Contrastive Captioners are Image-Text Foundation Models) in PyTorch.
Awesome diffusion Video-to-Video (V2V): a collection of papers on diffusion-model-based video editing, also known as video-to-video (V2V) translation, plus benchmark code for video editing.
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks.
[NeurIPS'23] "MagicBrush: A Manually Annotated Dataset for Instruction-Guided Image Editing".
Florence-2 is a novel vision foundation model with a unified, prompt-based representation for a variety of computer vision and vision-language tasks.
Retrieval and Retrieval-augmented LLMs
This is the official implementation of our paper: "MiniMax-Remover: Taming Bad Noise Helps Video Object Removal"
UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation
A high-performance Python-based I/O system for large (and small) deep learning problems, with strong support for PyTorch.
verl: Volcano Engine Reinforcement Learning for LLMs
Enriching MS-COCO with Chinese sentences and tags for cross-lingual multimedia tasks
Easily turn large sets of image URLs into an image dataset. Can download, resize and package 100M URLs in 20 hours on one machine.
[CVPR 2025 Oral] Official repo for the paper "AnyEdit: Mastering Unified High-Quality Image Editing for Any Idea"