Scene captioning with deep fusion of images and point clouds

Qiang Yu*, Chunxia Zhang, Lubin Weng, Shiming Xiang, Chunhong Pan

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

Recently, the fusion of images and point clouds has received considerable attention in various fields, such as autonomous driving, where its advantage over single-modal vision has been verified. However, it has not been extensively exploited in the scene captioning task. In this paper, a novel scene captioning framework with deep fusion of images and point clouds, based on region correlation and attention, is proposed to improve the performance of captioning models. In our model, a symmetrical processing pipeline is designed for point clouds and images. First, 3D and 2D region features are generated through region proposal generation, proposal fusion, and region pooling modules. Then, a feature fusion module integrates the features according to a region correlation rule and an attention mechanism, which increases the interpretability of the fusion process and yields a sequence of fused visual features. Finally, the fused features are transformed into captions by an attention-based caption generation module. Comprehensive experiments show that our model achieves state-of-the-art performance.
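The abstract's region-correlation-plus-attention fusion step can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the pairing of each 3D region with its nearest projected 2D region, the dot-product attention gate, and all function names (`correlate_regions`, `attention_fuse`) are illustrative assumptions standing in for the learned modules described in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def correlate_regions(centers_2d, centers_3d_proj):
    # Hypothetical "region correlation rule": pair each projected 3D
    # region with its nearest 2D region proposal in image coordinates.
    d = np.linalg.norm(centers_2d[None] - centers_3d_proj[:, None], axis=-1)
    return d.argmin(axis=1)  # index of the paired 2D region per 3D region

def attention_fuse(f3d, f2d, pairing):
    # Gate the two modalities per region with softmax attention weights
    # (a stand-in for the paper's learned attention mechanism).
    paired_2d = f2d[pairing]                              # (N3, D)
    scores = np.stack([(f3d * paired_2d).sum(-1),
                       (paired_2d * paired_2d).sum(-1)], axis=-1)
    w = softmax(scores, axis=-1)                          # (N3, 2)
    return w[..., :1] * f3d + w[..., 1:] * paired_2d      # fused (N3, D)

# Toy region features: 6 3D proposals, 8 2D proposals, 16-dim features.
N3, N2, D = 6, 8, 16
f3d = rng.normal(size=(N3, D))
f2d = rng.normal(size=(N2, D))
centers_2d = rng.uniform(0, 1, size=(N2, 2))
centers_3d_proj = rng.uniform(0, 1, size=(N3, 2))

pairing = correlate_regions(centers_2d, centers_3d_proj)
fused = attention_fuse(f3d, f2d, pairing)
print(fused.shape)  # one fused feature per 3D region: (6, 16)
```

The fused sequence would then feed an attention-based caption decoder; the point of the sketch is only the interpretable two-stage structure (explicit geometric pairing, then soft attention weighting) that the abstract attributes to the fusion module.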

Original language: English
Pages (from-to): 9-15
Number of pages: 7
Journal: Pattern Recognition Letters
Volume: 158
DOI: 10.1016/j.patrec.2022.04.017
Publication status: Published - Jun 2022

Keywords

  • Deep fusion
  • Point cloud
  • Scene captioning


Cite this

Yu, Q., Zhang, C., Weng, L., Xiang, S., & Pan, C. (2022). Scene captioning with deep fusion of images and point clouds. Pattern Recognition Letters, 158, 9-15. https://doi.org/10.1016/j.patrec.2022.04.017