TY - JOUR
T1 - Low-Light Raw Video Denoising With a High-Quality Realistic Motion Dataset
AU - Fu, Ying
AU - Wang, Zichun
AU - Zhang, Tao
AU - Zhang, Jun
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - Recently, supervised deep learning methods have demonstrated their effectiveness for raw video denoising in low light. However, existing training datasets have specific drawbacks, e.g., inaccurate noise modeling in synthetic datasets, overly simple hand-made or fixed motion, and limited-quality ground truth caused by the beam splitter in real captured datasets. These defects significantly degrade network performance on real low-light video sequences, where noise distributions and motion patterns are extremely complex. In this paper, we collect a low-light raw video denoising dataset with complex motion and high-quality ground truth, overcoming the drawbacks of previous datasets. Specifically, we capture 210 paired videos, each containing short-/long-exposure pairs of real video frames with dynamic objects and diverse scenes displayed on a high-end monitor. Moreover, while spatial self-similarity has been extensively exploited in image tasks, harnessing this property is even more crucial for video denoising, where temporal redundancy offers additional self-similarity. To effectively exploit the intrinsic temporal-spatial self-similarity of complex motion in real videos, we propose a new Transformer-based network that combines the locality of convolution with the long-range modeling ability of 3D temporal-spatial self-attention. Extensive experiments verify the value of our dataset and the effectiveness of our method on various metrics.
AB - Recently, supervised deep learning methods have demonstrated their effectiveness for raw video denoising in low light. However, existing training datasets have specific drawbacks, e.g., inaccurate noise modeling in synthetic datasets, overly simple hand-made or fixed motion, and limited-quality ground truth caused by the beam splitter in real captured datasets. These defects significantly degrade network performance on real low-light video sequences, where noise distributions and motion patterns are extremely complex. In this paper, we collect a low-light raw video denoising dataset with complex motion and high-quality ground truth, overcoming the drawbacks of previous datasets. Specifically, we capture 210 paired videos, each containing short-/long-exposure pairs of real video frames with dynamic objects and diverse scenes displayed on a high-end monitor. Moreover, while spatial self-similarity has been extensively exploited in image tasks, harnessing this property is even more crucial for video denoising, where temporal redundancy offers additional self-similarity. To effectively exploit the intrinsic temporal-spatial self-similarity of complex motion in real videos, we propose a new Transformer-based network that combines the locality of convolution with the long-range modeling ability of 3D temporal-spatial self-attention. Extensive experiments verify the value of our dataset and the effectiveness of our method on various metrics.
KW - Raw video denoising
KW - convolutional neural network
KW - temporal-spatial self-attention
KW - transformer
UR - http://www.scopus.com/inward/record.url?scp=85146248936&partnerID=8YFLogxK
U2 - 10.1109/TMM.2022.3233247
DO - 10.1109/TMM.2022.3233247
M3 - Article
AN - SCOPUS:85146248936
SN - 1520-9210
VL - 25
SP - 8119
EP - 8131
JO - IEEE Transactions on Multimedia
JF - IEEE Transactions on Multimedia
ER -