Stars
Self-Attention based on Fourier Frequency Domain Filter Network for Visual Question Answering
We propose a Multiple-Step Question-Driven VQA (MQVQA) system to improve reasoning and understanding in remote sensing VQA tasks where questions focus not only on image scenes…
Modality Perception Learning based Determinative Factor Discovery model
A paper list of some recent Mamba-based CV works.
BiomedCLIP data pipeline
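The most comprehensive collection of plug-and-play modules on the web for 2025, shared for free! CVPR2025, AAAI2025, ICLR2025, TNNLS2025, arXiv2025... Covers the whole field of artificial intelligence (machine learning, deep learning, etc.) and applies to image classification, object detection, instance segmentation, semantic segmentation, panoptic segmentation, pose recognition, medical image segmentation, video object segmentation, image matting, image editing, single-object tracking, multi-object tracking, person re-identification, RGBT, image denoising, deraining, dehazing, shadow removal, deblurring, super-resolution…
Multimodal sentiment analysis: multiple fusion methods based on BERT + ResNet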
[ICASSP 2025] Official PyTorch code for training and inference pipeline for DepMamba: Progressive Fusion Mamba for Multimodal Depression Detection
PyTorch code for EMNLP 2019 paper "LXMERT: Learning Cross-Modality Encoder Representations from Transformers".
Multimodal Residual Learning for Visual QA (NIPS 2016)
[PRCV-2023, IEEE TMM-2025] Learning Bottleneck Transformer for Event Image-Voxel Feature Fusion based Classification
Exploring multimodal fusion-type transformer models for visual question answering (on DAQUAR dataset)
MMBERT: Multimodal BERT Pretraining for Improved Medical VQA
LaTeX code for making neural network diagrams
MM-IDTarget: a novel deep learning framework for identifying targets using cross-attention based multimodal fusion strategy
Code and trained models for "Attentional Feature Fusion"
Implementation of our CVPR2022 paper, Negative-Aware Attention Framework for Image-Text Matching.
Multimodal Fusion with Co-Attention Networks for Fake News Detection
A reproduction of MCAN from the ACL 2021 paper "Multimodal Fusion with Co-Attention Networks for Fake News Detection"
Pre-trained Diffusion Models for Plug-and-Play Medical Image Enhancement
Deep Modular Co-Attention Networks for Visual Question Answering