Temporal Residual Guided Diffusion Framework for Event-Driven Video Reconstruction

Lin Zhu*, Yunlong Zheng, Yijun Zhang, Xiao Wang, Lizhi Wang, Hua Huang*

*Corresponding authors of this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Event-based video reconstruction has garnered increasing attention due to its advantages, such as high dynamic range and rapid motion capture capabilities. However, current methods often prioritize extracting temporal information from the continuous event flow, overemphasizing low-frequency texture features of the scene and producing over-smoothed, blurry artifacts. Addressing this challenge requires integrating conditional information, encompassing temporal features, low-frequency texture, and high-frequency events, to guide the Denoising Diffusion Probabilistic Model (DDPM) toward accurate and natural outputs. To this end, we introduce the Temporal Residual Guided Diffusion Framework, a novel approach that effectively leverages both temporal and frequency-based event priors. Our framework incorporates three key conditioning modules: a pre-trained low-frequency intensity estimation module, a temporal recurrent encoder module, and an attention-based high-frequency prior enhancement module. To capture temporal scene variations from the events at the current moment, we employ a temporal-domain residual image as the target of the diffusion model. Through the combination of these three conditioning paths and the temporal residual framework, our method reconstructs high-quality videos from event flow, mitigating the artifacts and over-smoothing commonly observed in previous approaches. Extensive experiments on multiple benchmark datasets validate the superior performance of our framework over prior event-based reconstruction methods.
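
The abstract's central technical choice, diffusing over a temporal-domain residual image rather than the frame itself, can be illustrated with a minimal sketch. Everything below (the TinyDenoiser module, the ddpm_training_step helper, and the single cond tensor standing in for the paper's three conditioning paths) is an illustrative assumption, not the authors' implementation:

```python
# Minimal sketch (illustrative, not the authors' code): a DDPM training step
# whose target is the temporal residual r_t = I_t - I_{t-1}, conditioned on a
# feature map that stands in for the three conditioning paths in the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyDenoiser(nn.Module):
    """Toy conditional denoiser: predicts the injected noise from the noisy
    residual concatenated with the conditioning features."""

    def __init__(self, cond_channels: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1 + cond_channels, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(32, 1, kernel_size=3, padding=1),
        )

    def forward(self, x_noisy: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([x_noisy, cond], dim=1))


def ddpm_training_step(denoiser, frame_prev, frame_curr, cond, alphas_cumprod):
    """One DDPM training step with the temporal residual as diffusion target."""
    residual = frame_curr - frame_prev                    # temporal-domain residual r_t
    t = torch.randint(0, alphas_cumprod.numel(), (residual.shape[0],))
    a_bar = alphas_cumprod[t].view(-1, 1, 1, 1)
    noise = torch.randn_like(residual)
    # Standard forward diffusion: x_t = sqrt(a_bar) * r + sqrt(1 - a_bar) * eps
    x_noisy = a_bar.sqrt() * residual + (1.0 - a_bar).sqrt() * noise
    pred_noise = denoiser(x_noisy, cond)                  # conditioned noise prediction
    return F.mse_loss(pred_noise, noise)


if __name__ == "__main__":
    B, H, W, C = 2, 32, 32, 8
    betas = torch.linspace(1e-4, 0.02, 1000)              # linear noise schedule
    alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)
    denoiser = TinyDenoiser(cond_channels=C)
    loss = ddpm_training_step(
        denoiser,
        frame_prev=torch.rand(B, 1, H, W),
        frame_curr=torch.rand(B, 1, H, W),
        cond=torch.randn(B, C, H, W),                     # stand-in conditioning features
        alphas_cumprod=alphas_cumprod,
    )
    print(loss.item())
```

At inference time, the reverse diffusion would sample a residual from noise under the same conditioning; adding it to the previously reconstructed frame yields the current frame, which is how a residual target propagates scene changes through the video.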

Original language: English
Host publication: Computer Vision – ECCV 2024 - 18th European Conference, Proceedings
Editors: Aleš Leonardis, Elisa Ricci, Stefan Roth, Olga Russakovsky, Torsten Sattler, Gül Varol
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 411-427
Number of pages: 17
ISBN (Print): 9783031736605
DOI: 10.1007/978-3-031-73661-2_23
Publication status: Published - 2025
Event: 18th European Conference on Computer Vision, ECCV 2024 - Milan, Italy
Duration: 29 Sep 2024 - 4 Oct 2024

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 15098 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 18th European Conference on Computer Vision, ECCV 2024
Country/Territory: Italy
City: Milan
Period: 29/09/24 - 4/10/24

Cite this

Zhu, L., Zheng, Y., Zhang, Y., Wang, X., Wang, L., & Huang, H. (2025). Temporal Residual Guided Diffusion Framework for Event-Driven Video Reconstruction. In A. Leonardis, E. Ricci, S. Roth, O. Russakovsky, T. Sattler, & G. Varol (Eds.), Computer Vision – ECCV 2024 - 18th European Conference, Proceedings (pp. 411-427). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 15098 LNCS). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-73661-2_23