Skip to content

Official implementation of the CVPR 2024 paper ViT-CoMer: Vision Transformer with Convolutional Multi-scale Feature Interaction for Dense Predictions.

Notifications You must be signed in to change notification settings

fyting/ViT-CoMer

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 

Repository files navigation

ViT-CoMer: Vision Transformer with Convolutional Multi-scale Feature Interaction for Dense Predictions

The official implementation of the CVPR2024 paper "Vision Transformer with Convolutional Multi-scale Feature Interaction for Dense Predictions"

Paper | ViT-CoMer知乎解读 | 检测排名图paperwithcode |分割排名图paperwithcode|

Highlights

  • We propose a novel dense prediction backbone by combining the plain ViT with CNN features. It effectively leverages various open-source pre-trained ViT weights and incorporates spatial pyramid convolutional features that address the lack of interaction among local ViT features and the challenge of single-scale representation.
  • ViT-CoMer-L achieves 64.3% AP on COCO val2017 without extra training data, and 62.1% mIoU on ADE20K val.

Introduction

We present a plain, pre-training-free, and feature-enhanced ViT backbone with Convolutional Multi-scale feature interaction, named ViT-CoMer, which facilitates bidirectional interaction between CNN and transformer. Compared to the state-of-the-art, ViT-CoMer has the following advantages: (1) We inject spatial pyramid multi-receptive field convolutional features into the ViT architecture, which effectively alleviates the problems of limited local information interaction and single-feature representation in ViT. (2) We propose a simple and efficient CNN-Transformer bidirectional fusion interaction module that performs multi-scale fusion across hierarchical features, which is beneficial for handling dense prediction tasks. 在这里插入图片描述

About

Official implementation of the CVPR 2024 paper ViT-CoMer: Vision Transformer with Convolutional Multi-scale Feature Interaction for Dense Predictions.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published