TY - CHAP
T1 - Research on Video Super-Resolution Technology Based on Multi-scale Spatiotemporal Information Aggregation
AU - Luo, Xiao
AU - Li, Ang
AU - Han, Baoling
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
PY - 2025
Y1 - 2025
N2 - As a new type of teaching tool, the web course video breaks the limitations of traditional teaching methods and has attracted widespread attention. However, due to the limited storage of recording equipment, web course videos are often compressed, resulting in low resolution. In addition, interference during filming, such as lighting conditions, character movement, and blurred PPT projection, degrades the brightness and clarity of the captured video, which cannot meet users' visual needs. Therefore, this paper applies video super-resolution reconstruction technology to predict and fill in the missing pixel information in low-resolution video frames, thereby obtaining high-resolution videos and improving users' learning efficiency. First, to address problems such as occlusion and uneven illumination in online course videos, as well as the difficulty optical flow estimation networks have in accurately extracting the temporal dependencies between video frames, a Multi-scale Spatiotemporal Information Aggregation network is proposed. The network uses 3D convolutions of different sizes to accurately extract temporal information between video frames at different time intervals while also capturing spatial information within each frame, implicitly completing inter-frame alignment. Second, to address the difficulty conventional super-resolution reconstruction methods have in reconstructing text regions of online course videos with high quality, a hybrid residual self-attention reconstruction network is proposed, comprising a high-precision spatial self-attention module and a high-precision channel self-attention module, which significantly improves the reconstruction quality of text regions in online course videos. Experimental results show that the proposed algorithm achieves excellent results on the online course video super-resolution dataset.
AB - As a new type of teaching tool, the web course video breaks the limitations of traditional teaching methods and has attracted widespread attention. However, due to the limited storage of recording equipment, web course videos are often compressed, resulting in low resolution. In addition, interference during filming, such as lighting conditions, character movement, and blurred PPT projection, degrades the brightness and clarity of the captured video, which cannot meet users' visual needs. Therefore, this paper applies video super-resolution reconstruction technology to predict and fill in the missing pixel information in low-resolution video frames, thereby obtaining high-resolution videos and improving users' learning efficiency. First, to address problems such as occlusion and uneven illumination in online course videos, as well as the difficulty optical flow estimation networks have in accurately extracting the temporal dependencies between video frames, a Multi-scale Spatiotemporal Information Aggregation network is proposed. The network uses 3D convolutions of different sizes to accurately extract temporal information between video frames at different time intervals while also capturing spatial information within each frame, implicitly completing inter-frame alignment. Second, to address the difficulty conventional super-resolution reconstruction methods have in reconstructing text regions of online course videos with high quality, a hybrid residual self-attention reconstruction network is proposed, comprising a high-precision spatial self-attention module and a high-precision channel self-attention module, which significantly improves the reconstruction quality of text regions in online course videos. Experimental results show that the proposed algorithm achieves excellent results on the online course video super-resolution dataset.
KW - 3D convolution
KW - attention mechanism
KW - video super-resolution
UR - http://www.scopus.com/inward/record.url?scp=85205089037&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-71013-1_16
DO - 10.1007/978-3-031-71013-1_16
M3 - Chapter
AN - SCOPUS:85205089037
T3 - Lecture Notes on Data Engineering and Communications Technologies
SP - 165
EP - 174
BT - Lecture Notes on Data Engineering and Communications Technologies
PB - Springer Science and Business Media Deutschland GmbH
ER -