Image Dense Captioning of Irregular Regions Based on Visual Saliency

Xiaosheng Wen; Ping Jian

doi:10.1109/PRMVIA58252.2023.00008

Xiaosheng Wen*, Ping Jian*

* Corresponding author for this work

School of Computer Science

Beijing Institute of Technology

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Traditional dense captioning aims to describe the local details of an image in natural language. It usually runs object detection first and then describes the contents of each detected bounding box, which makes the descriptions rich. However, captioning based on object detection often pays little attention to the relationships between objects and their environment, or among the objects themselves. Moreover, no dense captioning method can currently handle irregular regions. To address these problems, we propose a region division method based on visual saliency, which focuses on areas rather than only on objects. Based on this division, we generate local descriptions of the irregular regions. For each region, we combine the image with the target region to generate features, which are fed into the captioning model. We use the Visual Genome dataset for training and testing. In experiments, our model is comparable to the baseline in the traditional bounding-box setting, and the descriptions it generates for irregular regions are equally good. Our model also performs well in image retrieval experiments and produces less redundant information. In the application, we support manually selecting a region of interest on the image for description, which can assist in expanding the dataset.

Original language	English
Title of host publication	Proceedings - 2023 International Conference on Pattern Recognition, Machine Vision and Intelligent Algorithms, PRMVIA 2023
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	8-14
Number of pages	7
ISBN (Electronic)	9798350346596
DOI	https://doi.org/10.1109/PRMVIA58252.2023.00008
Publication status	Published - 2023
Event	2023 International Conference on Pattern Recognition, Machine Vision and Intelligent Algorithms, PRMVIA 2023 - Beihai, China; Duration: 24 Mar 2023 → 26 Mar 2023

Publication series

Name	Proceedings - 2023 International Conference on Pattern Recognition, Machine Vision and Intelligent Algorithms, PRMVIA 2023

Conference

Conference	2023 International Conference on Pattern Recognition, Machine Vision and Intelligent Algorithms, PRMVIA 2023
Country/Territory	China
City	Beihai
Period	24/03/23 → 26/03/23

Access to document

10.1109/PRMVIA58252.2023.00008

Other files and links

Link to publication in Scopus

Cite this

Wen, X., & Jian, P. (2023). Image Dense Captioning of Irregular Regions Based on Visual Saliency. In Proceedings - 2023 International Conference on Pattern Recognition, Machine Vision and Intelligent Algorithms, PRMVIA 2023 (pp. 8-14). (Proceedings - 2023 International Conference on Pattern Recognition, Machine Vision and Intelligent Algorithms, PRMVIA 2023). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/PRMVIA58252.2023.00008

Wen, Xiaosheng ; Jian, Ping. / Image Dense Captioning of Irregular Regions Based on Visual Saliency. Proceedings - 2023 International Conference on Pattern Recognition, Machine Vision and Intelligent Algorithms, PRMVIA 2023. Institute of Electrical and Electronics Engineers Inc., 2023. pp. 8-14 (Proceedings - 2023 International Conference on Pattern Recognition, Machine Vision and Intelligent Algorithms, PRMVIA 2023).

@inproceedings{5edb63766fff4b88b5bcf8b69ee178fc,

title = "Image Dense Captioning of Irregular Regions Based on Visual Saliency",

abstract = "Traditional dense captioning aims to describe the local details of an image in natural language. It usually runs object detection first and then describes the contents of each detected bounding box, which makes the descriptions rich. However, captioning based on object detection often pays little attention to the relationships between objects and their environment, or among the objects themselves. Moreover, no dense captioning method can currently handle irregular regions. To address these problems, we propose a region division method based on visual saliency, which focuses on areas rather than only on objects. Based on this division, we generate local descriptions of the irregular regions. For each region, we combine the image with the target region to generate features, which are fed into the captioning model. We use the Visual Genome dataset for training and testing. In experiments, our model is comparable to the baseline in the traditional bounding-box setting, and the descriptions it generates for irregular regions are equally good. Our model also performs well in image retrieval experiments and produces less redundant information. In the application, we support manually selecting a region of interest on the image for description, which can assist in expanding the dataset.",

keywords = "Image dense captioning, image retrieval, irregular region, redundancy, visual saliency",

author = "Xiaosheng Wen and Ping Jian",

note = "Publisher Copyright: {\textcopyright} 2023 IEEE.; 2023 International Conference on Pattern Recognition, Machine Vision and Intelligent Algorithms, PRMVIA 2023 ; Conference date: 24-03-2023 Through 26-03-2023",

year = "2023",

doi = "10.1109/PRMVIA58252.2023.00008",

language = "English",

series = "Proceedings - 2023 International Conference on Pattern Recognition, Machine Vision and Intelligent Algorithms, PRMVIA 2023",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "8--14",

booktitle = "Proceedings - 2023 International Conference on Pattern Recognition, Machine Vision and Intelligent Algorithms, PRMVIA 2023",

address = "United States",

}

Wen, X & Jian, P 2023, Image Dense Captioning of Irregular Regions Based on Visual Saliency. In Proceedings - 2023 International Conference on Pattern Recognition, Machine Vision and Intelligent Algorithms, PRMVIA 2023. Proceedings - 2023 International Conference on Pattern Recognition, Machine Vision and Intelligent Algorithms, PRMVIA 2023, Institute of Electrical and Electronics Engineers Inc., pp. 8-14, 2023 International Conference on Pattern Recognition, Machine Vision and Intelligent Algorithms, PRMVIA 2023, Beihai, China, 24/03/23. https://doi.org/10.1109/PRMVIA58252.2023.00008

Image Dense Captioning of Irregular Regions Based on Visual Saliency. / Wen, Xiaosheng; Jian, Ping.
Proceedings - 2023 International Conference on Pattern Recognition, Machine Vision and Intelligent Algorithms, PRMVIA 2023. Institute of Electrical and Electronics Engineers Inc., 2023. pp. 8-14 (Proceedings - 2023 International Conference on Pattern Recognition, Machine Vision and Intelligent Algorithms, PRMVIA 2023).

TY - GEN

T1 - Image Dense Captioning of Irregular Regions Based on Visual Saliency

AU - Wen, Xiaosheng

AU - Jian, Ping

PY - 2023

Y1 - 2023

AB - Traditional dense captioning aims to describe the local details of an image in natural language. It usually runs object detection first and then describes the contents of each detected bounding box, which makes the descriptions rich. However, captioning based on object detection often pays little attention to the relationships between objects and their environment, or among the objects themselves. Moreover, no dense captioning method can currently handle irregular regions. To address these problems, we propose a region division method based on visual saliency, which focuses on areas rather than only on objects. Based on this division, we generate local descriptions of the irregular regions. For each region, we combine the image with the target region to generate features, which are fed into the captioning model. We use the Visual Genome dataset for training and testing. In experiments, our model is comparable to the baseline in the traditional bounding-box setting, and the descriptions it generates for irregular regions are equally good. Our model also performs well in image retrieval experiments and produces less redundant information. In the application, we support manually selecting a region of interest on the image for description, which can assist in expanding the dataset.

KW - Image dense captioning

KW - image retrieval

KW - irregular region

KW - redundancy

KW - visual saliency

UR - http://www.scopus.com/inward/record.url?scp=85163846819&partnerID=8YFLogxK

U2 - 10.1109/PRMVIA58252.2023.00008

DO - 10.1109/PRMVIA58252.2023.00008

M3 - Conference contribution

AN - SCOPUS:85163846819

T3 - Proceedings - 2023 International Conference on Pattern Recognition, Machine Vision and Intelligent Algorithms, PRMVIA 2023

SP - 8

EP - 14

BT - Proceedings - 2023 International Conference on Pattern Recognition, Machine Vision and Intelligent Algorithms, PRMVIA 2023

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 2023 International Conference on Pattern Recognition, Machine Vision and Intelligent Algorithms, PRMVIA 2023

Y2 - 24 March 2023 through 26 March 2023

ER -

Wen X, Jian P. Image Dense Captioning of Irregular Regions Based on Visual Saliency. In Proceedings - 2023 International Conference on Pattern Recognition, Machine Vision and Intelligent Algorithms, PRMVIA 2023. Institute of Electrical and Electronics Engineers Inc. 2023. pp. 8-14. (Proceedings - 2023 International Conference on Pattern Recognition, Machine Vision and Intelligent Algorithms, PRMVIA 2023). doi: 10.1109/PRMVIA58252.2023.00008
