Dense-CaptionNet: a sentence generation architecture for fine-grained description of image semantics
Khurram, I., Fraz, Muhammad Moazam, Shahzad, M. and Rajpoot, Nasir M. (2021) Dense-CaptionNet: a sentence generation architecture for fine-grained description of image semantics. Cognitive Computation, 13, pp. 595-611. doi:10.1007/s12559-019-09697-1. ISSN 1866-9956.
PDF: WRAP-Dense-CaptionNet-sentence-generation-architecture-Rajpoot-2019.pdf (Accepted Version, 2739 KB)
Official URL: https://doi.org/10.1007/s12559-019-09697-1
Abstract
Automatic image captioning, a highly challenging research problem, aims to understand and describe the contents of a complex scene in human-understandable natural language. The majority of recent solutions are based on holistic approaches in which the scene is described as a whole, potentially losing the important semantic relationships among objects in the scene. We propose Dense-CaptionNet, a region-based deep architecture for fine-grained description of image semantics, which localizes and describes each object/region in the image separately and generates a more detailed description of the scene. The proposed network contains three components that work together to generate a fine-grained description of image semantics. Region descriptions and object relationships are generated by the first module, whereas the second generates the attributes of objects present in the scene. The textual descriptions produced by these two modules are concatenated and fed as input to the sentence generation module, which uses an encoder-decoder formulation to generate a grammatically correct, single-sentence, fine-grained description of the whole scene. The proposed Dense-CaptionNet is trained and tested on the Visual Genome, MSCOCO, and IAPR TC-12 datasets. The results establish a new state of the art compared with existing top-performing methods, e.g., Up-Down-Captioner, Show, Attend and Tell, SemStyle, and NeuralTalk, especially on complex scenes.
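The three-stage pipeline described in the abstract can be sketched schematically. The sketch below is an illustrative assumption, not the authors' implementation: the function names and stub outputs are hypothetical, and each stub stands in for a trained deep network.

```python
# Illustrative sketch (assumption) of the Dense-CaptionNet pipeline from the
# abstract. All module bodies are placeholder stubs; in the paper, each is a
# deep network trained on Visual Genome, MSCOCO, and IAPR TC-12.

def region_description_module(image):
    # Module 1: localizes regions and describes objects and their
    # relationships (stubbed output for illustration).
    return ["a man riding a horse", "a dog next to the horse"]

def attribute_module(image):
    # Module 2: generates attributes of the objects present in the scene
    # (stubbed output for illustration).
    return ["brown horse", "small white dog"]

def sentence_generation_module(text_input):
    # Module 3: an encoder-decoder that fuses the concatenated textual
    # descriptions into one grammatically correct sentence (stubbed).
    return ("A man rides a brown horse while a small white dog "
            "walks beside it.")

def dense_captionnet(image):
    # The outputs of modules 1 and 2 are concatenated and fed as input
    # to the sentence generation module, as the abstract describes.
    descriptions = region_description_module(image)
    attributes = attribute_module(image)
    combined = "; ".join(descriptions + attributes)
    return sentence_generation_module(combined)

print(dense_captionnet(image=None))
```

The point of the sketch is the data flow: region/relationship text and attribute text are produced independently, joined as plain text, and only then decoded into a single fluent caption.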
Item Type: Journal Article
Subjects: P Language and Literature > P Philology. Linguistics; Q Science > QA Mathematics; T Technology > TA Engineering (General). Civil engineering (General)
Divisions: Faculty of Science, Engineering and Medicine > Science > Computer Science
Library of Congress Subject Headings (LCSH): Computer vision; Image processing -- Computer programs; Image processing -- Digital techniques; Photograph captions; Semantics -- Data processing; Computer graphics; Neural networks (Computer science)
Journal or Publication Title: Cognitive Computation
Publisher: Springer
ISSN: 1866-9956
Official Date: May 2021
Volume: 13
Page Range: pp. 595-611
DOI: 10.1007/s12559-019-09697-1
Status: Peer Reviewed
Publication Status: Published
Reuse Statement (publisher, data, author rights): This is a post-peer-review, pre-copyedit version of an article published in Cognitive Computation. The final authenticated version is available online at: http://dx.doi.org/10.1007/s12559-019-09697-1
Access rights to Published version: Restricted or Subscription Access
Date of first compliant deposit: 20 November 2019
Date of first compliant Open Access: 2 March 2021