Image Dense Captioning of Irregular Regions Based on Visual Saliency

Xiaosheng Wen; Ping Jian

doi:10.1109/PRMVIA58252.2023.00008

Xiaosheng Wen*, Ping Jian*

*Corresponding author for this work

School of Computer Science and Technology

Beijing Institute of Technology

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Traditional dense captioning aims to describe local details of an image in natural language. It usually runs object detection first and then describes the contents of each detected bounding box, which makes the descriptions rich in content. However, captioning based on object detection often pays too little attention to the associations between objects and their environment, or among the objects themselves. Moreover, to date no dense captioning method can handle irregular regions. To address these problems, we propose a region division method based on visual saliency. It focuses more on areas than just on objects. Based on this division, we generate local descriptions of irregular regions. For each region, we combine the image with the target area to generate features, which are fed into the caption model. We use the Visual Genome dataset for training and testing. In experiments, our model is comparable to the baseline in the traditional bounding-box setting, and the descriptions it generates for irregular regions are equally good. Our model also performs well in image retrieval experiments and produces less redundant information. In application, we support manually selecting a region of interest on the image for description, which can assist in expanding the dataset.

Original language	English
Title of host publication	Proceedings - 2023 International Conference on Pattern Recognition, Machine Vision and Intelligent Algorithms, PRMVIA 2023
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	8-14
Number of pages	7
ISBN (Electronic)	9798350346596
DOIs	https://doi.org/10.1109/PRMVIA58252.2023.00008
Publication status	Published - 2023
Event	2023 International Conference on Pattern Recognition, Machine Vision and Intelligent Algorithms, PRMVIA 2023 - Beihai, China; Duration: 24 Mar 2023 → 26 Mar 2023

Publication series

Name	Proceedings - 2023 International Conference on Pattern Recognition, Machine Vision and Intelligent Algorithms, PRMVIA 2023

Conference

Conference	2023 International Conference on Pattern Recognition, Machine Vision and Intelligent Algorithms, PRMVIA 2023
Country/Territory	China
City	Beihai
Period	24/03/23 → 26/03/23

Keywords

Image dense captioning
image retrieval
irregular region
redundancy
visual saliency

Cite this

Wen, X., & Jian, P. (2023). Image Dense Captioning of Irregular Regions Based on Visual Saliency. In Proceedings - 2023 International Conference on Pattern Recognition, Machine Vision and Intelligent Algorithms, PRMVIA 2023 (pp. 8-14). (Proceedings - 2023 International Conference on Pattern Recognition, Machine Vision and Intelligent Algorithms, PRMVIA 2023). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/PRMVIA58252.2023.00008

@inproceedings{5edb63766fff4b88b5bcf8b69ee178fc,
title = "Image Dense Captioning of Irregular Regions Based on Visual Saliency",
abstract = "Traditional dense captioning aims to describe local details of an image in natural language. It usually runs object detection first and then describes the contents of each detected bounding box, which makes the descriptions rich in content. However, captioning based on object detection often pays too little attention to the associations between objects and their environment, or among the objects themselves. Moreover, to date no dense captioning method can handle irregular regions. To address these problems, we propose a region division method based on visual saliency. It focuses more on areas than just on objects. Based on this division, we generate local descriptions of irregular regions. For each region, we combine the image with the target area to generate features, which are fed into the caption model. We use the Visual Genome dataset for training and testing. In experiments, our model is comparable to the baseline in the traditional bounding-box setting, and the descriptions it generates for irregular regions are equally good. Our model also performs well in image retrieval experiments and produces less redundant information. In application, we support manually selecting a region of interest on the image for description, which can assist in expanding the dataset.",
keywords = "Image dense captioning, image retrieval, irregular region, redundancy, visual saliency",
author = "Xiaosheng Wen and Ping Jian",
note = "Publisher Copyright: {\textcopyright} 2023 IEEE.; 2023 International Conference on Pattern Recognition, Machine Vision and Intelligent Algorithms, PRMVIA 2023 ; Conference date: 24-03-2023 Through 26-03-2023",
year = "2023",
doi = "10.1109/PRMVIA58252.2023.00008",
language = "English",
series = "Proceedings - 2023 International Conference on Pattern Recognition, Machine Vision and Intelligent Algorithms, PRMVIA 2023",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "8--14",
booktitle = "Proceedings - 2023 International Conference on Pattern Recognition, Machine Vision and Intelligent Algorithms, PRMVIA 2023",
address = "United States",
}

TY - GEN

T1 - Image Dense Captioning of Irregular Regions Based on Visual Saliency

AU - Wen, Xiaosheng

AU - Jian, Ping

PY - 2023

Y1 - 2023

AB - Traditional dense captioning aims to describe local details of an image in natural language. It usually runs object detection first and then describes the contents of each detected bounding box, which makes the descriptions rich in content. However, captioning based on object detection often pays too little attention to the associations between objects and their environment, or among the objects themselves. Moreover, to date no dense captioning method can handle irregular regions. To address these problems, we propose a region division method based on visual saliency. It focuses more on areas than just on objects. Based on this division, we generate local descriptions of irregular regions. For each region, we combine the image with the target area to generate features, which are fed into the caption model. We use the Visual Genome dataset for training and testing. In experiments, our model is comparable to the baseline in the traditional bounding-box setting, and the descriptions it generates for irregular regions are equally good. Our model also performs well in image retrieval experiments and produces less redundant information. In application, we support manually selecting a region of interest on the image for description, which can assist in expanding the dataset.

KW - Image dense captioning

KW - image retrieval

KW - irregular region

KW - redundancy

KW - visual saliency

UR - http://www.scopus.com/inward/record.url?scp=85163846819&partnerID=8YFLogxK

U2 - 10.1109/PRMVIA58252.2023.00008

DO - 10.1109/PRMVIA58252.2023.00008

M3 - Conference contribution

AN - SCOPUS:85163846819

T3 - Proceedings - 2023 International Conference on Pattern Recognition, Machine Vision and Intelligent Algorithms, PRMVIA 2023

SP - 8

EP - 14

BT - Proceedings - 2023 International Conference on Pattern Recognition, Machine Vision and Intelligent Algorithms, PRMVIA 2023

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 2023 International Conference on Pattern Recognition, Machine Vision and Intelligent Algorithms, PRMVIA 2023

Y2 - 24 March 2023 through 26 March 2023

ER -
