I2V-GAN: Unpaired Infrared-to-Visible Video Translation

Shuang Li; Bingfeng Han; Zhenjie Yu; Chi Harold Liu; Kai Chen; Shuigen Wang

doi:10.1145/3474085.3475445

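School of Computer Science

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

32 Citations (Scopus)

Abstract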

Human vision is often adversely affected by complex environmental factors, especially in night vision scenarios. Infrared cameras are therefore often used to enhance visibility by detecting infrared radiation in the surrounding environment, but infrared videos are unsatisfactory because they lack detailed semantic information. An effective video-to-video translation method from the infrared domain to the visible light domain is thus strongly needed, one that overcomes the large intrinsic gap between the infrared and visible fields. To address this challenging problem, we propose I2V-GAN, an infrared-to-visible (I2V) video translation method that generates fine-grained and spatio-temporally consistent visible light videos from unpaired infrared videos. Technically, our model capitalizes on three types of constraints: 1) an adversarial constraint to generate synthetic frames that are similar to the real ones, 2) cyclic consistency with an introduced perceptual loss for effective content conversion as well as style preservation, and 3) similarity constraints across and within domains to enhance content and motion consistency in both spatial and temporal spaces at a fine-grained level. Furthermore, the currently publicly available infrared and visible light datasets are mainly used for object detection or tracking, and some consist of discontinuous images that are not suitable for video tasks. We therefore provide a new dataset for infrared-to-visible video translation, named IRVI. Specifically, it has 12 consecutive video clips of vehicle and monitoring scenes, and both the infrared and visible light videos can be split into 24352 frames. Comprehensive experiments on IRVI validate that I2V-GAN is superior to the compared state-of-the-art methods in translating infrared videos to visible ones, with higher fluency and finer semantic details. Moreover, additional experimental results on the flower-to-flower dataset indicate that I2V-GAN is also applicable to other video translation tasks. The code and the IRVI dataset are available at https://github.com/BIT-DA/I2V-GAN.
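
The abstract names three constraint types but does not give their exact form, so the following is only a toy sketch of how such terms might be combined into one generator objective. The least-squares adversarial term, the L1 cycle term, the cosine-based similarity term, and the weights lambda_cyc and lambda_sim are all assumptions chosen for illustration; they are not taken from the paper or its repository, and the perceptual loss is omitted.

```python
import math

def adversarial_loss(disc_scores):
    """Assumed least-squares form: push discriminator scores on fakes toward 1."""
    return sum((s - 1.0) ** 2 for s in disc_scores) / len(disc_scores)

def cycle_loss(frame, cycled):
    """Assumed L1 form: mean absolute error between a frame and its round trip."""
    return sum(abs(a - b) for a, b in zip(frame, cycled)) / len(frame)

def similarity_loss(feat_src, feat_out):
    """Stand-in similarity term: 1 - cosine similarity of two feature vectors."""
    dot = sum(a * b for a, b in zip(feat_src, feat_out))
    norm = math.sqrt(sum(a * a for a in feat_src)) * math.sqrt(sum(b * b for b in feat_out))
    return 1.0 - dot / norm

def total_generator_loss(disc_scores, frame, cycled, feat_src, feat_out,
                         lambda_cyc=10.0, lambda_sim=1.0):
    """Weighted sum of the three terms; the weights are hypothetical."""
    adv = adversarial_loss(disc_scores)
    cyc = cycle_loss(frame, cycled)
    sim = similarity_loss(feat_src, feat_out)
    return adv + lambda_cyc * cyc + lambda_sim * sim

# Flattened toy "frames" and feature vectors, for illustration only.
loss = total_generator_loss(
    disc_scores=[0.0],
    frame=[0.0, 0.0], cycled=[1.0, 1.0],
    feat_src=[1.0, 0.0], feat_out=[0.0, 1.0],
)
print(loss)
```

In a real model each term would be computed on image tensors and learned features; the sketch only shows that the objective is a weighted sum whose terms each vanish when the corresponding constraint is satisfied.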

Original language	English
Title of host publication	MM 2021 - Proceedings of the 29th ACM International Conference on Multimedia
Publisher	Association for Computing Machinery, Inc
Pages	3061-3069
Number of pages	9
ISBN (Electronic)	9781450386517
DOI	https://doi.org/10.1145/3474085.3475445
Publication status	Published - 17 Oct 2021
Event	29th ACM International Conference on Multimedia, MM 2021 - Virtual, Online, China. Duration: 20 Oct 2021 → 24 Oct 2021

Publication series

Name	MM 2021 - Proceedings of the 29th ACM International Conference on Multimedia

Conference

Conference	29th ACM International Conference on Multimedia, MM 2021
Country/Territory	China
City	Virtual, Online
Period	20/10/21 → 24/10/21

Cite this

Li, S., Han, B., Yu, Z., Liu, C. H., Chen, K., & Wang, S. (2021). I2V-GAN: Unpaired Infrared-to-Visible Video Translation. In MM 2021 - Proceedings of the 29th ACM International Conference on Multimedia (pp. 3061-3069). (MM 2021 - Proceedings of the 29th ACM International Conference on Multimedia). Association for Computing Machinery, Inc. https://doi.org/10.1145/3474085.3475445

@inproceedings{4e4dffb06e184bdb90f0a3b7a9dad141,

title = "I2V-GAN: Unpaired Infrared-to-Visible Video Translation",

keywords = "GANs, infrared-to-visible, video-to-video translation",

author = "Shuang Li and Bingfeng Han and Zhenjie Yu and Liu, {Chi Harold} and Kai Chen and Shuigen Wang",

note = "Publisher Copyright: {\textcopyright} 2021 ACM.; 29th ACM International Conference on Multimedia, MM 2021 ; Conference date: 20-10-2021 Through 24-10-2021",

year = "2021",

month = oct,

day = "17",

doi = "10.1145/3474085.3475445",

language = "English",

series = "MM 2021 - Proceedings of the 29th ACM International Conference on Multimedia",

publisher = "Association for Computing Machinery, Inc",

pages = "3061--3069",

booktitle = "MM 2021 - Proceedings of the 29th ACM International Conference on Multimedia",

}

Li, S, Han, B, Yu, Z, Liu, CH, Chen, K & Wang, S 2021, I2V-GAN: Unpaired Infrared-to-Visible Video Translation. in MM 2021 - Proceedings of the 29th ACM International Conference on Multimedia. MM 2021 - Proceedings of the 29th ACM International Conference on Multimedia, Association for Computing Machinery, Inc, pp. 3061-3069, 29th ACM International Conference on Multimedia, MM 2021, Virtual, Online, China, 20/10/21. https://doi.org/10.1145/3474085.3475445

I2V-GAN: Unpaired Infrared-to-Visible Video Translation. / Li, Shuang; Han, Bingfeng; Yu, Zhenjie et al.
MM 2021 - Proceedings of the 29th ACM International Conference on Multimedia. Association for Computing Machinery, Inc, 2021. p. 3061-3069 (MM 2021 - Proceedings of the 29th ACM International Conference on Multimedia).

TY - GEN

T1 - I2V-GAN

T2 - 29th ACM International Conference on Multimedia, MM 2021

AU - Li, Shuang

AU - Han, Bingfeng

AU - Yu, Zhenjie

AU - Liu, Chi Harold

AU - Chen, Kai

AU - Wang, Shuigen

PY - 2021/10/17

Y1 - 2021/10/17

KW - GANs

KW - infrared-to-visible

KW - video-to-video translation

UR - http://www.scopus.com/inward/record.url?scp=85119329196&partnerID=8YFLogxK

U2 - 10.1145/3474085.3475445

DO - 10.1145/3474085.3475445

M3 - Conference contribution

AN - SCOPUS:85119329196

T3 - MM 2021 - Proceedings of the 29th ACM International Conference on Multimedia

SP - 3061

EP - 3069

BT - MM 2021 - Proceedings of the 29th ACM International Conference on Multimedia

PB - Association for Computing Machinery, Inc

Y2 - 20 October 2021 through 24 October 2021

ER -
