Abstract
Event-based video reconstruction aims to generate images from asynchronous event streams, which record intensity changes that exceed specific contrast thresholds. However, contrast thresholds vary across pixels due to manufacturing imperfections and environmental interference, producing undesirable events that may cause existing methods to output blurry frames with unpleasant artifacts. To address this, we propose a novel two-stage framework that reconstructs images with learnable parameter representations. In the first stage, a transformer network extracts a learnable representation of the contrast threshold from the corresponding asynchronous events. In the second stage, a UNet architecture fuses these representations with the event encoding features to refine the decoding features along spatiotemporal dimensions. Because the representation learned from asynchronous events adapts to the varying contrast thresholds encountered across diverse scenes, the proposed framework can generate high-quality frames. Quantitative and qualitative experimental results on four public datasets show that our approach achieves better performance.
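For illustration only, the following is a minimal PyTorch sketch of the two-stage idea described in the abstract, not the authors' implementation: a small transformer pools an assumed event voxel-grid input into a contrast-threshold representation, and a compact UNet-style network fuses that representation with the event encoding features before decoding a frame. All module names, tensor shapes, and the voxel-grid input format are assumptions.

```python
# Illustrative sketch (not the paper's implementation) of a two-stage pipeline:
# stage 1 learns a per-sample contrast-threshold representation from events,
# stage 2 fuses it into a UNet-style reconstruction network.
import torch
import torch.nn as nn


class ThresholdEncoder(nn.Module):
    """Stage 1 (assumed design): transformer over patch tokens of an event voxel grid."""
    def __init__(self, in_bins=5, dim=64, patch=8, layers=2):
        super().__init__()
        self.patchify = nn.Conv2d(in_bins, dim, kernel_size=patch, stride=patch)
        enc_layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(enc_layer, num_layers=layers)

    def forward(self, events):                                  # events: (B, bins, H, W)
        tokens = self.patchify(events).flatten(2).transpose(1, 2)  # (B, N, dim)
        tokens = self.transformer(tokens)
        return tokens.mean(dim=1)                               # (B, dim) threshold representation


class FusionUNet(nn.Module):
    """Stage 2 (assumed design): tiny UNet whose bottleneck is modulated by the representation."""
    def __init__(self, in_bins=5, base=32, rep_dim=64):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(in_bins, base, 3, padding=1), nn.ReLU())
        self.down = nn.Sequential(nn.Conv2d(base, base * 2, 3, stride=2, padding=1), nn.ReLU())
        self.to_gain = nn.Linear(rep_dim, base * 2)             # representation -> channel gain
        self.up = nn.Sequential(nn.ConvTranspose2d(base * 2, base, 2, stride=2), nn.ReLU())
        self.out = nn.Conv2d(base * 2, 1, 3, padding=1)

    def forward(self, events, rep):
        e1 = self.enc1(events)
        bott = self.down(e1)
        gain = self.to_gain(rep).unsqueeze(-1).unsqueeze(-1)    # (B, C, 1, 1)
        bott = bott * torch.sigmoid(gain)                       # fuse threshold representation
        d1 = self.up(bott)
        return torch.sigmoid(self.out(torch.cat([d1, e1], dim=1)))  # reconstructed frame


if __name__ == "__main__":
    voxels = torch.randn(2, 5, 64, 64)                          # dummy event voxel grid
    rep = ThresholdEncoder()(voxels)
    frame = FusionUNet()(voxels, rep)
    print(frame.shape)                                          # torch.Size([2, 1, 64, 64])
```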
Original language | English |
---|---|
Pages (from-to) | 1950-1954 |
Number of pages | 5 |
Journal | IEEE Signal Processing Letters |
Volume | 31 |
DOI | |
Publication status | Published - 2024 |