Coordinated joint multimodal embeddings for generalized audio-visual zero-shot classification and retrieval of videos
Parida, Kranti K., Matiyali, Niraj, Guha, Tanaya and Sharma, Gaurav (2020) Coordinated joint multimodal embeddings for generalized audio-visual zero-shot classification and retrieval of videos. In: IEEE Winter Conference on Applications of Computer Vision (WACV), Aspen, Colorado, 1-5 Mar 2020 pp. 3240-3249. ISBN 9781728165530. doi:10.1109/WACV45572.2020.9093438 ISSN 2642-9381.
Official URL: http://dx.doi.org/10.1109/WACV45572.2020.9093438
Abstract
We present an audio-visual multimodal approach to zero-shot learning (ZSL) for the classification and retrieval of videos. ZSL has been studied extensively in recent years, but mostly for images and the visual modality alone. We demonstrate that both the audio and visual modalities are important for ZSL on videos. Since no dataset for studying this task is currently available, we also construct an appropriate multimodal dataset of 156,416 videos in 33 classes from an existing large-scale audio event dataset. We show empirically that adding the audio modality improves performance on both zero-shot classification and retrieval when using multimodal extensions of embedding learning methods. We also propose a novel method to predict the 'dominant' modality using a jointly learned modality attention network. We learn the attention in a semi-supervised setting and thus do not require any additional explicit labelling of the modalities. We provide qualitative validation of the modality-specific attention, which also generalizes successfully to unseen test classes.
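As a rough illustration of the idea in the abstract, the sketch below fuses an audio embedding and a video embedding with softmax attention weights and assigns the class whose embedding is closest by cosine similarity. It is not the authors' implementation: the function names, the two-dimensional toy vectors, the hand-set attention logits, and the use of cosine similarity are all assumptions made for this example. In the paper the attention is learned jointly with the embeddings.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse(audio_emb, video_emb, attention_logits):
    """Attention-weighted sum of two modality embeddings.

    attention_logits is a hypothetical [audio_logit, video_logit] pair;
    here it is supplied by hand rather than predicted by a network.
    """
    w_audio, w_video = softmax(attention_logits)
    return [w_audio * a + w_video * v for a, v in zip(audio_emb, video_emb)]

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    return dot / (norm_u * norm_v)

def classify(fused, class_embeddings):
    """Return the class name whose embedding is most similar to `fused`."""
    return max(class_embeddings, key=lambda name: cosine(fused, class_embeddings[name]))

# Toy class embeddings (assumed values, standing in for learned ones).
class_embeddings = {"dog": [1.0, 0.0], "train": [0.0, 1.0]}
audio_emb = [0.9, 0.1]   # audio evidence points towards "dog"
video_emb = [0.2, 0.8]   # visual evidence points towards "train"

audio_dominant = classify(fuse(audio_emb, video_emb, [2.0, 0.0]), class_embeddings)
video_dominant = classify(fuse(audio_emb, video_emb, [0.0, 2.0]), class_embeddings)
```

With these toy values the prediction follows whichever modality receives the larger attention weight, which is the behaviour a 'dominant' modality predictor is meant to capture.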
Item Type: | Conference Item (Paper)
---|---
Subjects: | Q Science > Q Science (General); T Technology > TA Engineering (General). Civil engineering (General); T Technology > TK Electrical engineering. Electronics Nuclear engineering
Divisions: | Faculty of Science, Engineering and Medicine > Science > Computer Science
Library of Congress Subject Headings (LCSH): | Machine learning, Digital video -- Standards, Computer vision, Pattern recognition systems, Computer graphics, Visual communication
Publisher: | IEEE
ISBN: | 9781728165530
ISSN: | 2642-9381
Official Date: | 14 May 2020
Page Range: | pp. 3240-3249
DOI: | 10.1109/WACV45572.2020.9093438
Status: | Peer Reviewed
Publication Status: | Published
Access rights to Published version: | Restricted or Subscription Access
Date of first compliant deposit: | 21 November 2019
Date of first compliant Open Access: | 16 February 2021
Conference Paper Type: | Paper
Title of Event: | IEEE Winter Conference on Applications of Computer Vision (WACV)
Type of Event: | Conference
Location of Event: | Aspen, Colorado
Date(s) of Event: | 1-5 Mar 2020