SMAN : Stacked Multi-Modal Attention Network for cross-modal image-text retrieval
Ji, Zhong, Wang, Haoran, Han, Jungong and Pang, Yanwei (2022) SMAN : Stacked Multi-Modal Attention Network for cross-modal image-text retrieval. IEEE Transactions on Cybernetics, 52 (2). pp. 1086-1097. doi:10.1109/TCYB.2020.2985716 ISSN 2168-2267.
PDF: WRAP-SMAN-Stacked-Multi-Modal-Attention-Network-cross-modal-image-text-retrieval-Han-2020.pdf - Accepted Version (4Mb)
Official URL: https://doi.org/10.1109/TCYB.2020.2985716
Abstract
This article focuses on tackling the task of cross-modal image-text retrieval, which has been an interdisciplinary topic in both the computer vision and natural language processing communities. Existing global representation alignment-based methods fail to pinpoint the semantically meaningful portion of images and texts, while the local representation alignment schemes suffer from the huge computational burden of aggregating the similarity of visual fragments and textual words exhaustively. In this article, we propose a stacked multimodal attention network (SMAN) that makes use of the stacked multimodal attention mechanism to exploit the fine-grained interdependencies between image and text, thereby mapping the aggregation of attentive fragments into a common space for measuring cross-modal similarity. Specifically, we sequentially employ intramodal information and multimodal information as guidance to perform multiple-step attention reasoning so that the fine-grained correlation between image and text can be modeled. As a consequence, we are capable of discovering the semantically meaningful visual regions or words in a sentence, which contribute to measuring the cross-modal similarity in a more precise manner. Moreover, we present a novel bidirectional ranking loss that enforces the distance among pairwise multimodal instances to be closer. Doing so allows us to make full use of pairwise supervised information to preserve the manifold structure of heterogeneous pairwise data. Extensive experiments on two benchmark datasets demonstrate that our SMAN consistently yields competitive performance compared to state-of-the-art methods.
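The abstract describes a bidirectional ranking loss over matched image-text pairs. The paper's variant adds extra pairwise terms, but it builds on the standard bidirectional hinge (triplet) ranking loss that is the common baseline in cross-modal retrieval. The sketch below shows that baseline formulation only — the function name, margin value, and example similarity matrix are illustrative assumptions, not the paper's exact loss:

```python
def bidirectional_ranking_loss(sim, margin=0.2):
    """Standard bidirectional hinge ranking loss (baseline form, not
    necessarily SMAN's exact variant). `sim[i][j]` is the similarity
    between image i and sentence j; diagonal entries are matched pairs."""
    n = len(sim)
    loss = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # image -> text: the matched caption should outscore caption j
            # by at least `margin`
            loss += max(0.0, margin - sim[i][i] + sim[i][j])
            # text -> image: the matched image should outscore image i
            # by at least `margin`
            loss += max(0.0, margin - sim[j][j] + sim[i][j])
    return loss

# Well-separated pairs incur no loss; confusable pairs are penalized
# in both retrieval directions.
well_separated = [[0.9, 0.1],
                  [0.2, 0.8]]
confusable = [[0.5, 0.6],
              [0.4, 0.5]]
print(bidirectional_ranking_loss(well_separated))  # 0.0
print(bidirectional_ranking_loss(confusable))      # positive
```

Summing violations in both directions (image-to-text and text-to-image) is what makes the loss "bidirectional": each mismatched pair is pushed below both of its matched counterparts in the shared embedding space.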
Item Type: Journal Article
Subjects: Q Science > QA Mathematics > QA76 Electronic computers. Computer science. Computer software; T Technology > TA Engineering (General). Civil engineering (General); T Technology > TK Electrical engineering. Electronics. Nuclear engineering
Divisions: Faculty of Science, Engineering and Medicine > Engineering > WMG (Formerly the Warwick Manufacturing Group)
Library of Congress Subject Headings (LCSH): Information visualization; Image processing -- Digital techniques; Closed captioning -- Technological innovations; Content-based image retrieval; Pattern recognition systems; Computer vision; Multimodal user interfaces (Computer systems); Keyword searching -- Technological innovations; Natural language processing (Computer science)
Journal or Publication Title: IEEE Transactions on Cybernetics
Publisher: IEEE Computer Society
ISSN: 2168-2267
Official Date: February 2022
Volume: 52
Number: 2
Page Range: pp. 1086-1097
DOI: 10.1109/TCYB.2020.2985716
Status: Peer Reviewed
Publication Status: Published
Reuse Statement (publisher, data, author rights): © 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Access rights to Published version: Restricted or Subscription Access
Date of first compliant deposit: 7 April 2020
Date of first compliant Open Access: 7 April 2020