TY - GEN
T1 - ROMA
T2 - 30th ACM International Conference on Multimedia, MM 2022
AU - Yu, Zhenjie
AU - Chen, Kai
AU - Li, Shuang
AU - Han, Bingfeng
AU - Liu, Chi Harold
AU - Wang, Shuigen
N1 - Publisher Copyright:
© 2022 ACM.
PY - 2022/10/10
Y1 - 2022/10/10
N2 - Infrared cameras are often utilized to enhance the night vision since the visible light cameras exhibit inferior efficacy without sufficient illumination. However, infrared data possesses inadequate color contrast and representation ability attributed to its intrinsic heat-related imaging principle, which hinders its application. Although, the domain gaps between unpaired nighttime infrared and daytime visible videos are even huger than paired ones that captured at the same time, establishing an effective translation mapping will greatly contribute to various fields. In this case, the structural knowledge within nighttime infrared videos and semantic information contained in the translated daytime visible pairs could be utilized simultaneously. To this end, we propose a tailored framework ROMA that couples with our introduced cRoss-domain regiOn siMilarity mAtching technique for bridging the huge gaps. To be specific, ROMA could efficiently translate the unpaired nighttime infrared videos into fine-grained daytime visible ones, meanwhile maintain the spatiotemporal consistency via matching the cross-domain region similarity. Furthermore, we design a multiscale region-wise discriminator to distinguish the details from synthesized visible results and real references. Moreover, we provide a new and challenging dataset encouraging further research for unpaired nighttime infrared and daytime visible video translation, named InfraredCity, which is $20$ times larger than the recently released infrared-related dataset IRVI. Codes and datasets are available https://github.com/BIT-DA/ROMA here.
AB - Infrared cameras are often utilized to enhance the night vision since the visible light cameras exhibit inferior efficacy without sufficient illumination. However, infrared data possesses inadequate color contrast and representation ability attributed to its intrinsic heat-related imaging principle, which hinders its application. Although, the domain gaps between unpaired nighttime infrared and daytime visible videos are even huger than paired ones that captured at the same time, establishing an effective translation mapping will greatly contribute to various fields. In this case, the structural knowledge within nighttime infrared videos and semantic information contained in the translated daytime visible pairs could be utilized simultaneously. To this end, we propose a tailored framework ROMA that couples with our introduced cRoss-domain regiOn siMilarity mAtching technique for bridging the huge gaps. To be specific, ROMA could efficiently translate the unpaired nighttime infrared videos into fine-grained daytime visible ones, meanwhile maintain the spatiotemporal consistency via matching the cross-domain region similarity. Furthermore, we design a multiscale region-wise discriminator to distinguish the details from synthesized visible results and real references. Moreover, we provide a new and challenging dataset encouraging further research for unpaired nighttime infrared and daytime visible video translation, named InfraredCity, which is $20$ times larger than the recently released infrared-related dataset IRVI. Codes and datasets are available https://github.com/BIT-DA/ROMA here.
KW - daytime visible
KW - nighttime infrared
KW - video-to-video translation
UR - http://www.scopus.com/inward/record.url?scp=85150977660&partnerID=8YFLogxK
U2 - 10.1145/3503161.3548221
DO - 10.1145/3503161.3548221
M3 - Conference contribution
AN - SCOPUS:85150977660
T3 - MM 2022 - Proceedings of the 30th ACM International Conference on Multimedia
SP - 5294
EP - 5302
BT - MM 2022 - Proceedings of the 30th ACM International Conference on Multimedia
PB - Association for Computing Machinery, Inc
Y2 - 10 October 2022 through 14 October 2022
ER -