Image Dense Captioning of Irregular Regions Based on Visual Saliency

Xiaosheng Wen*, Ping Jian*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Traditional dense captioning aims to describe the local details of an image in natural language. It usually runs object detection first and then describes the contents of each detected bounding box, which makes the descriptions rich in content. However, captioning based on object detection often pays too little attention to the relationships between objects and their environment, or among the objects themselves. Moreover, to date no dense captioning method can handle irregular regions. To address these problems, we propose a region division method based on visual saliency that focuses on areas rather than only on objects. Based on this division, we generate local descriptions of irregular regions. For each region, we combine the image with the target region to generate features, which are fed into the caption model. We use the Visual Genome dataset for training and testing. In experiments, our model is comparable to the baseline under the traditional bounding-box setting, and the descriptions it generates for irregular regions are equally good. Our model also performs well in image retrieval experiments and produces less redundant information. In application, we support manually selecting a region of interest in the image for description, which can assist in expanding the dataset.
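
The abstract does not specify how the saliency-based division or the image-plus-region combination is implemented. As a rough illustration only, the sketch below assumes a precomputed saliency map, divides it by simple thresholding followed by 4-connected component labeling, and pairs the full image with a binary mask of one irregular region. The function names `divide_regions` and `region_input`, the threshold value, and the masking scheme are all hypothetical stand-ins, not the authors' method.

```python
from collections import deque


def divide_regions(saliency, threshold=0.5):
    """Label 4-connected groups of pixels with saliency >= threshold.

    Assumption: thresholding plus connected components stands in for
    the paper's (unspecified) saliency-based region division.
    Returns (labels, n_regions); label 0 means "not salient".
    """
    h, w = len(saliency), len(saliency[0])
    labels = [[0] * w for _ in range(h)]
    n_regions = 0
    for y in range(h):
        for x in range(w):
            if saliency[y][x] < threshold or labels[y][x]:
                continue
            # Start a new region and flood-fill it breadth-first.
            n_regions += 1
            labels[y][x] = n_regions
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and not labels[ny][nx]
                            and saliency[ny][nx] >= threshold):
                        labels[ny][nx] = n_regions
                        queue.append((ny, nx))
    return labels, n_regions


def region_input(image, labels, region_id):
    """Pair the full image with one irregular region.

    Assumption: a binary mask and a masked copy of the image are one
    plausible way to "combine the image with the target region".
    """
    mask = [[1 if v == region_id else 0 for v in row] for row in labels]
    masked = [[px if m else 0 for px, m in zip(img_row, mask_row)]
              for img_row, mask_row in zip(image, mask)]
    return {"image": image, "mask": mask, "masked": masked}


if __name__ == "__main__":
    # Toy 3x4 grayscale image and saliency map (made-up values).
    saliency = [[0.9, 0.8, 0.1, 0.0],
                [0.7, 0.1, 0.1, 0.6],
                [0.0, 0.1, 0.7, 0.9]]
    image = [[10, 20, 30, 40],
             [50, 60, 70, 80],
             [90, 100, 110, 120]]
    labels, n_regions = divide_regions(saliency)
    for region_id in range(1, n_regions + 1):
        print(region_id, region_input(image, labels, region_id)["masked"])
```

In a real pipeline the mask and image would be passed to a feature extractor before captioning; a mask rather than a bounding box is what lets a region be non-rectangular.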

Original language: English
Title of host publication: Proceedings - 2023 International Conference on Pattern Recognition, Machine Vision and Intelligent Algorithms, PRMVIA 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 8-14
Number of pages: 7
ISBN (Electronic): 9798350346596
Publication status: Published - 2023
Event: 2023 International Conference on Pattern Recognition, Machine Vision and Intelligent Algorithms, PRMVIA 2023 - Beihai, China
Duration: 24 Mar 2023 to 26 Mar 2023

Publication series

Name: Proceedings - 2023 International Conference on Pattern Recognition, Machine Vision and Intelligent Algorithms, PRMVIA 2023

Conference

Conference: 2023 International Conference on Pattern Recognition, Machine Vision and Intelligent Algorithms, PRMVIA 2023
Country/Territory: China
City: Beihai
Period: 24/03/23 to 26/03/23

Keywords

  • Image dense captioning
  • image retrieval
  • irregular region
  • redundancy
  • visual saliency
