
Mix-ViT : mixing attentive vision transformer for ultra-fine-grained visual categorization
Yu, Xiaohan, Wang, Jun, Zhao, Yang and Gao, Yongsheng (2023) Mix-ViT : mixing attentive vision transformer for ultra-fine-grained visual categorization. Pattern Recognition, 135, 109131. doi:10.1016/j.patcog.2022.109131 ISSN 0031-3203.
Research output not available from this repository.
Official URL: http://dx.doi.org/10.1016/j.patcog.2022.109131
Abstract
Ultra-fine-grained visual categorization (ultra-FGVC) moves down the taxonomy level to classify sub-granularity categories of fine-grained objects. This inevitably poses a challenge, namely classifying highly similar objects with limited samples, which impedes the performance of recent advanced vision transformer methods. To that end, this paper introduces Mix-ViT, a novel mixing attentive vision transformer that addresses this challenge to improve ultra-FGVC. The core design is a self-supervised module that mixes the high-level sample tokens and, after attentively substituting tokens, learns to predict whether each token has been substituted. This drives the model to understand the contextual discriminative details among inter-class samples. By incorporating such a self-supervised module, the network gains more knowledge from the intrinsic structure of the input data and thus improves its generalization capability with limited training samples. The proposed Mix-ViT achieves competitive performance on seven publicly available datasets, demonstrating for the first time the potential of vision transformers compared to CNNs in addressing the challenging ultra-FGVC tasks. The code is available at https://github.com/Markin-Wang/MixViT
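The token-substitution idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that); it only assumes that the tokens of one sample with the highest attention scores are replaced by the tokens at the same positions in a second sample, and that a 0/1 label per token records whether it was substituted. The function name `mix_tokens`, the top-k selection rule, and the toy values are all hypothetical.

```python
from typing import List, Sequence, Tuple

Token = List[float]


def mix_tokens(
    tokens_a: Sequence[Token],
    tokens_b: Sequence[Token],
    attention_a: Sequence[float],
    num_swap: int,
) -> Tuple[List[Token], List[int]]:
    """Replace the `num_swap` most-attended tokens of sample A with the
    tokens of sample B at the same positions (an assumed selection rule).

    Returns the mixed token sequence and a 0/1 label per token, where 1
    marks a substituted token. The labels are the targets a
    "was this token substituted?" classifier would be trained on.
    """
    if not (len(tokens_a) == len(tokens_b) == len(attention_a)):
        raise ValueError("tokens and attention scores must have equal length")
    if not 0 <= num_swap <= len(tokens_a):
        raise ValueError("num_swap must be between 0 and the number of tokens")

    # Rank positions by attention score, highest first.
    ranked = sorted(
        range(len(attention_a)), key=lambda i: attention_a[i], reverse=True
    )
    swap_positions = set(ranked[:num_swap])

    mixed = [
        list(tokens_b[i]) if i in swap_positions else list(tokens_a[i])
        for i in range(len(tokens_a))
    ]
    labels = [1 if i in swap_positions else 0 for i in range(len(tokens_a))]
    return mixed, labels


# Toy example: four 2-dimensional tokens per sample (hypothetical values).
tokens_a = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]
tokens_b = [[-1.0, -1.0], [-2.0, -2.0], [-3.0, -3.0], [-4.0, -4.0]]
attention_a = [0.10, 0.50, 0.05, 0.35]

mixed, labels = mix_tokens(tokens_a, tokens_b, attention_a, num_swap=2)
```

In a full model, a small classification head on the transformer's output tokens would presumably be trained to predict `labels` alongside the usual class label; that training loop is omitted here.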
| Item Type: | Journal Article |
|---|---|
| Divisions: | Faculty of Science, Engineering and Medicine > Science > Computer Science |
| Journal or Publication Title: | Pattern Recognition |
| Publisher: | Pergamon |
| ISSN: | 0031-3203 |
| Official Date: | March 2023 |
| Volume: | 135 |
| Article Number: | 109131 |
| DOI: | 10.1016/j.patcog.2022.109131 |
| Status: | Peer Reviewed |
| Publication Status: | Published |
| Access rights to Published version: | Open Access (Creative Commons) |