Self-Attention based on Fourier Frequency Domain Filter Network for Visual Question Answering

Python 2 Updated Feb 22, 2025
5 1 Updated Jul 14, 2024

We proposed a Multiple-Step Question-Driven VQA (MQVQA) system to improve the reasoning and understanding ability in remote sensing VQA tasks in cases where questions focus on not only image scenes…

Python 9 1 Updated Dec 21, 2022
Python 27 3 Updated Nov 27, 2025

Modality Perception Learning based Determinative Factor Discovery model

Python 3 Updated Feb 27, 2024

A paper list of some recent Mamba-based CV works.

426 22 Updated Nov 10, 2025

BiomedCLIP data pipeline

Jupyter Notebook 93 11 Updated Jan 14, 2025

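The most complete collection of plug-and-play modules on the web for 2025, shared for free! CVPR2025, AAAI2025, ICLR2025, TNNLS2025, arXiv2025... Covers all areas of artificial intelligence (machine learning, deep learning, etc.), applicable to image classification, object detection, instance segmentation, semantic segmentation, panoptic segmentation, pose recognition, medical image segmentation, video object segmentation, image matting, image editing, single-object tracking, multi-object tracking, person re-identification, RGBT, image denoising, deraining, dehazing, shadow removal, deblurring, super-resolution…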

Python 1,297 101 Updated May 24, 2025

Multimodal sentiment analysis: multiple fusion methods based on BERT + ResNet

Python 332 30 Updated Nov 20, 2022
Python 18 Updated Mar 8, 2023

[ICASSP 2025] Official PyTorch code for training and inference pipeline for DepMamba: Progressive Fusion Mamba for Multimodal Depression Detection

Python 82 8 Updated Mar 11, 2025

PyTorch code for EMNLP 2019 paper "LXMERT: Learning Cross-Modality Encoder Representations from Transformers".

Python 965 162 Updated Oct 22, 2022

The implementation of CLVIN, CAAN and MPCCT

Python 8 Updated Nov 27, 2024

Multimodal Residual Learning for Visual QA (NIPS 2016)

Lua 38 5 Updated Dec 27, 2016

[PRCV-2023, IEEE TMM-2025] Learning Bottleneck Transformer for Event Image-Voxel Feature Fusion based Classification

Python 12 1 Updated Jun 3, 2025

not yet

Python 7 5 Updated Dec 5, 2019

Exploring multimodal fusion-type transformer models for visual question answering (on DAQUAR dataset)

Jupyter Notebook 37 14 Updated Jan 20, 2022

MMBERT: Multimodal BERT Pretraining for Improved Medical VQA

Python 38 8 Updated Mar 22, 2021

LaTeX code for making neural network diagrams

TeX 24,144 3,025 Updated Aug 21, 2023

MM-IDTarget: a novel deep learning framework for identifying targets using cross-attention based multimodal fusion strategy

Python 3 Updated Apr 6, 2025

code and trained models for "Attentional Feature Fusion"

Python 802 102 Updated Jul 23, 2021

Source code of SFusion

Python 28 2 Updated Mar 5, 2023

Implementation of our CVPR2022 paper, Negative-Aware Attention Framework for Image-Text Matching.

Python 120 10 Updated Jun 19, 2023

Multimodal Fusion with Co-Attention Networks for Fake News Detection

Python 4 Updated Jul 9, 2024

A reproduction of MCAN from the ACL 2021 paper "Multimodal Fusion with Co-Attention Networks for Fake News Detection"

Python 49 4 Updated Dec 25, 2023

Pre-trained Diffusion Models for Plug-and-Play Medical Image Enhancement

Python 29 3 Updated Oct 3, 2023

Deep Modular Co-Attention Networks for Visual Question Answering

Python 456 89 Updated Dec 16, 2020