Abstract
Salient object detection (SOD) aims to mimic the attention mechanism of the human visual system. Because traditional image data provide limited information, image-based SOD remains very challenging in some complex scenes. Inspired by emerging event cameras, which provide asynchronous measurements of local temporal contrast over a large dynamic range, we propose to extract more effective information from the combination of event flow and RGB images. In this paper, we construct an end-to-end joint network for salient object detection (ERSOD-Net), which simultaneously supervises the RGB image and the event data captured within the corresponding image exposure time. To fully exploit the temporal information in the event data, a Long Short-Term Memory module is used to process the events and learn salient object event surfaces. Moreover, multi-level feature interaction is designed to fuse the two complementary branches, image and event, and to predict the saliency map. Finally, to demonstrate the effectiveness of our model, a real Event-RGB SOD dataset (ERSOD) is built with a DAVIS camera. Experiments on both benchmark and ERSOD datasets show that the proposed event guided network greatly improves SOD performance across different evaluation metrics. The code and datasets will be released later at https://github.com/jxr326/ERSOD-Net.
| Original language | English |
|---|---|
| Pages (from-to) | 317-324 |
| Number of pages | 8 |
| Journal | Pattern Recognition Letters |
| Volume | 151 |
| Publication status | Published - Nov 2021 |
| Externally published | Yes |
Keywords
- Event-based vision
- Image saliency
- Long short-term memory
- Salient object detection
- Visual attention
Fingerprint
Dive into the research topics of 'Learning event guided network for salient object detection'. Together they form a unique fingerprint.