Temporal Residual Guided Diffusion Framework for Event-Driven Video Reconstruction

Lin Zhu*, Yunlong Zheng, Yijun Zhang, Xiao Wang, Lizhi Wang, Hua Huang*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Event-based video reconstruction has garnered increasing attention due to its advantages, such as high dynamic range and rapid motion capture capabilities. However, current methods often prioritize the extraction of temporal information from the continuous event stream, overemphasizing low-frequency texture features in the scene and thereby producing over-smoothed results and blurry artifacts. Addressing this challenge requires integrating conditional information, encompassing temporal features, low-frequency texture, and high-frequency events, to guide the Denoising Diffusion Probabilistic Model (DDPM) toward accurate and natural outputs. To this end, we introduce the Temporal Residual Guided Diffusion Framework, which effectively leverages both temporal and frequency-based event priors. Our framework incorporates three key conditioning modules: a pre-trained low-frequency intensity estimation module, a temporal recurrent encoder module, and an attention-based high-frequency prior enhancement module. To capture temporal scene variations from the events at the current moment, we employ a temporal-domain residual image as the target of the diffusion model. By combining these three conditioning paths with the temporal residual design, our framework reconstructs high-quality videos from event flow, mitigating the artifacts and over-smoothing commonly observed in previous approaches. Extensive experiments on multiple benchmark datasets validate the superior performance of our framework over prior event-based reconstruction methods.
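The abstract describes a conditional DDPM whose diffusion target is a temporal-domain residual image, guided by three conditioning paths. The following minimal PyTorch sketch illustrates one plausible wiring of such a training step; it is not the authors' implementation, and all module names, channel counts, the stand-in encoders, and the epsilon-prediction objective are assumptions made for illustration.

```python
# Minimal sketch (not the authors' code): one DDPM training step in which the
# diffusion target is the temporal residual between consecutive frames and the
# denoiser is conditioned on three feature paths, mirroring the abstract.
# All module names, channel counts, and the noise schedule are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Denoiser(nn.Module):
    """Stand-in for a conditional UNet: predicts the added noise from the
    noisy residual, the concatenated condition features, and the timestep."""
    def __init__(self, in_ch: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 1, 3, padding=1),
        )

    def forward(self, x_noisy, cond, t):
        # A real model would use a sinusoidal timestep embedding; here the
        # normalized step is simply broadcast as an extra channel.
        t_map = (t.float() / 1000.0).view(-1, 1, 1, 1).expand_as(x_noisy)
        return self.net(torch.cat([x_noisy, cond, t_map], dim=1))

def ddpm_training_step(events, prev_frame, frame, modules, alphas_cumprod):
    low_freq, rec_enc, high_freq, denoiser = modules
    # Diffusion target: temporal-domain residual image.
    residual = frame - prev_frame                              # (B,1,H,W)
    # Three conditioning paths named in the abstract (stand-in encoders).
    cond = torch.cat([low_freq(events),    # low-frequency intensity estimate
                      rec_enc(events),     # temporal recurrent features
                      high_freq(events)],  # high-frequency event prior
                     dim=1)
    # Standard DDPM forward process: noise the residual at a random step t.
    b = residual.size(0)
    t = torch.randint(0, alphas_cumprod.numel(), (b,), device=residual.device)
    a_bar = alphas_cumprod[t].view(-1, 1, 1, 1)
    noise = torch.randn_like(residual)
    x_t = a_bar.sqrt() * residual + (1.0 - a_bar).sqrt() * noise
    # Epsilon-prediction objective (one common DDPM parameterization).
    return F.mse_loss(denoiser(x_t, cond, t), noise)

# Illustrative wiring: a 5-bin event voxel grid mapped to 8 channels per path.
paths = [nn.Conv2d(5, 8, 3, padding=1) for _ in range(3)]
model = Denoiser(in_ch=1 + 3 * 8 + 1)
alphas_cumprod = torch.cumprod(1.0 - torch.linspace(1e-4, 0.02, 1000), dim=0)
events = torch.randn(2, 5, 64, 64)
prev_frame, frame = torch.randn(2, 1, 64, 64), torch.randn(2, 1, 64, 64)
loss = ddpm_training_step(events, prev_frame, frame, (*paths, model),
                          alphas_cumprod)
```

At inference, the usual DDPM reverse process would sample a residual conditioned on the same three feature paths, and the reconstructed frame would then be obtained by adding that residual to the previous frame.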

Original language: English
Title of host publication: Computer Vision – ECCV 2024 - 18th European Conference, Proceedings
Editors: Aleš Leonardis, Elisa Ricci, Stefan Roth, Olga Russakovsky, Torsten Sattler, Gül Varol
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 411-427
Number of pages: 17
ISBN (Print): 9783031736605
Publication status: Published - 2025
Event: 18th European Conference on Computer Vision, ECCV 2024 - Milan, Italy
Duration: 29 Sept 2024 – 4 Oct 2024

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 15098 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 18th European Conference on Computer Vision, ECCV 2024
Country/Territory: Italy
City: Milan
Period: 29/09/24 – 4/10/24

Keywords

  • diffusion model
  • event camera
  • video reconstruction
