Topic-Based Image Caption Generation

Sandeep Kumar Dash; Shantanu Acharya; Partha Pakray; Ranjita Das; Alexander Gelbukh

doi:10.1007/s13369-019-04262-2


Centro de Investigación en Computación (CIC)

Research output: Peer-reviewed journal article

12 Scopus citations

Abstract

Image captioning is the task of generating a description of a given image based on its content. Describing an image effectively requires extracting as much information from it as possible. Apart from detecting which objects are present and their relative orientation, the topic of the image, that is, the purpose the image is meant to convey, is another vital piece of information that can be incorporated into the model to improve the effectiveness of the caption generation system. The aim is to put extra emphasis on the context of the image, imitating the human approach: objects that are present but unrelated to the context the image represents should not be part of the generated caption. In this work, the focus is on detecting the topic of an image in order to guide a novel deep learning-based encoder–decoder framework that generates captions for it. The method is compared with several earlier state-of-the-art models based on results obtained on the MSCOCO 2017 training data set. BLEU, CIDEr, ROUGE-L, and METEOR scores are used to measure the efficacy of the model, and they show an improvement in caption generation performance.
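The abstract does not state how the detected topic is passed to the encoder–decoder. As a rough illustration only, the Python sketch below shows one generic way a topic signal could condition a caption decoder: topic scores are normalised with a softmax, the image feature and the topic distribution are each projected into the decoder's input space, and the projections are summed. This is an assumption, not the authors' architecture; the names `topic_conditioned_input`, `w_img` and `w_topic` are hypothetical, and the weights are toy values rather than learned parameters.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a matrix, given as a list of rows, by a vector."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]


def softmax(z: Vector) -> Vector:
    """Numerically stable softmax: turns topic scores into a probability distribution."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def topic_conditioned_input(image_feat: Vector, topic_logits: Vector,
                            w_img: Matrix, w_topic: Matrix) -> Vector:
    """Hypothetical fusion step (an assumption, not the paper's exact design).

    The predicted topic scores are normalised into a topic distribution,
    the image feature and the topic distribution are each projected into
    the decoder's input space, and the two projections are summed. The
    result could be fed to a caption decoder, e.g. at its first time step,
    so that word choices are biased towards the image's topic.
    """
    if len(w_img) != len(w_topic):
        raise ValueError("both projections must map to the same decoder size")
    topic_probs = softmax(topic_logits)
    img_part = matvec(w_img, image_feat)
    topic_part = matvec(w_topic, topic_probs)
    return [a + b for a, b in zip(img_part, topic_part)]


if __name__ == "__main__":
    # Toy values: a 2-d image feature, 2 topics with equal scores,
    # an identity image projection and a diagonal topic projection.
    fused = topic_conditioned_input(
        image_feat=[1.0, 2.0],
        topic_logits=[0.0, 0.0],
        w_img=[[1.0, 0.0], [0.0, 1.0]],
        w_topic=[[2.0, 0.0], [0.0, 4.0]],
    )
    print(fused)  # [2.0, 4.0]
```

Summing the two projections is only one option; concatenating the topic vector with the image feature, or supplying it at every decoding step, would be equally plausible readings of the abstract.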

Original language	English
Pages (from-to)	3025-3034
Number of pages	10
Journal	Arabian Journal for Science and Engineering
Volume	45
Issue number	4
DOI	https://doi.org/10.1007/s13369-019-04262-2
Status	Published - 1 Apr 2020

Keywords

Deep learning
Image caption generation
Topic modelling


Cite this

@article{40c2bd6b2a2049d9ba3fa2223fbe0f9b,

title = "Topic-Based Image Caption Generation",

abstract = "Image captioning is the task of generating a description of a given image based on its content. Describing an image effectively requires extracting as much information from it as possible. Apart from detecting which objects are present and their relative orientation, the topic of the image, that is, the purpose the image is meant to convey, is another vital piece of information that can be incorporated into the model to improve the effectiveness of the caption generation system. The aim is to put extra emphasis on the context of the image, imitating the human approach: objects that are present but unrelated to the context the image represents should not be part of the generated caption. In this work, the focus is on detecting the topic of an image in order to guide a novel deep learning-based encoder–decoder framework that generates captions for it. The method is compared with several earlier state-of-the-art models based on results obtained on the MSCOCO 2017 training data set. BLEU, CIDEr, ROUGE-L, and METEOR scores are used to measure the efficacy of the model, and they show an improvement in caption generation performance.",

keywords = "Deep learning, Image caption generation, Topic modelling",

author = "Dash, {Sandeep Kumar} and Shantanu Acharya and Partha Pakray and Ranjita Das and Alexander Gelbukh",

note = "Publisher Copyright: {\textcopyright} 2019, King Fahd University of Petroleum \& Minerals.",

year = "2020",

month = apr,

day = "1",

doi = "10.1007/s13369-019-04262-2",

language = "English",

volume = "45",

pages = "3025--3034",

journal = "Arabian Journal for Science and Engineering",

issn = "2193-567X",

number = "4",

}

TY  - JOUR
T1  - Topic-Based Image Caption Generation
AU  - Dash, Sandeep Kumar
AU  - Acharya, Shantanu
AU  - Pakray, Partha
AU  - Das, Ranjita
AU  - Gelbukh, Alexander
PY  - 2020/4/1
Y1  - 2020/4/1
AB  - Image captioning is the task of generating a description of a given image based on its content. Describing an image effectively requires extracting as much information from it as possible. Apart from detecting which objects are present and their relative orientation, the topic of the image, that is, the purpose the image is meant to convey, is another vital piece of information that can be incorporated into the model to improve the effectiveness of the caption generation system. The aim is to put extra emphasis on the context of the image, imitating the human approach: objects that are present but unrelated to the context the image represents should not be part of the generated caption. In this work, the focus is on detecting the topic of an image in order to guide a novel deep learning-based encoder–decoder framework that generates captions for it. The method is compared with several earlier state-of-the-art models based on results obtained on the MSCOCO 2017 training data set. BLEU, CIDEr, ROUGE-L, and METEOR scores are used to measure the efficacy of the model, and they show an improvement in caption generation performance.
KW  - Deep learning
KW  - Image caption generation
KW  - Topic modelling
UR  - http://www.scopus.com/inward/record.url?scp=85075899332&partnerID=8YFLogxK
U2  - 10.1007/s13369-019-04262-2
DO  - 10.1007/s13369-019-04262-2
M3  - Article
AN  - SCOPUS:85075899332
SN  - 2193-567X
VL  - 45
SP  - 3025
EP  - 3034
JO  - Arabian Journal for Science and Engineering
JF  - Arabian Journal for Science and Engineering
IS  - 4
ER  - 
