Abstract
Video summarization aims to extract salient segments from videos to construct concise yet comprehensive synopses. Despite significant advances, the diversity of video content and the scarcity of training data make trained models brittle in new scenarios, a difficulty commonly known as domain shift. To address this challenge, we propose a domain adaptation framework tailored to video summarization along two axes: (a) enhancing the generalization ability and (b) improving the adaptive ability of video summarization models. Specifically, we design a simple yet effective regularized feature encoder based on the Transformer, in which an averaging operation on the attention weights serves as a form of regularization. This mitigates overfitting to domain-specific cues and encourages the learning of representations that generalize across diverse domains. Furthermore, we introduce a novel discrepancy reduction loss that aligns the distributions of inter-frame feature similarities and inter-frame prediction similarities, combined with a confidence weighting strategy, to adapt the regularized encoder to target domains and mitigate domain shift. Extensive experiments on multiple benchmark datasets demonstrate the effectiveness of our method: it achieves state-of-the-art performance under various settings on TVSum and SumMe, and obtains the best results in the transfer setting of Mr.HiSum.
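The abstract's two components can be illustrated with a minimal NumPy sketch. Everything below is a hypothetical reading of the abstract, not the paper's actual implementation: `averaged_attention` interprets the "averaging operation on attention weights" as averaging the attention maps across heads, and `discrepancy_reduction_loss` assumes cosine similarities for features, an absolute-difference-based similarity for scalar importance predictions, and a confidence weight derived from each prediction's distance from 0.5; all function names, shapes, and formulas are illustrative assumptions.

```python
import numpy as np

def averaged_attention(attn):
    """One plausible reading of the averaging regularization (assumption):
    replace each head's attention map with the cross-head average, so no
    single head can latch onto domain-specific cues."""
    # attn: (heads, T, T) row-stochastic attention maps
    mean = attn.mean(axis=0, keepdims=True)      # (1, T, T) average map
    return np.broadcast_to(mean, attn.shape)     # every head shares the average

def discrepancy_reduction_loss(feats, preds):
    """Hypothetical sketch of a discrepancy reduction loss: penalize the gap
    between inter-frame feature similarities and inter-frame prediction
    similarities, weighted by prediction confidence."""
    # feats: (T, D) frame features; preds: (T,) importance scores in [0, 1]
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim_feat = f @ f.T                                        # (T, T) cosine
    sim_pred = 1.0 - np.abs(preds[:, None] - preds[None, :])  # (T, T)
    conf = np.abs(preds - 0.5) * 2.0   # confident frames contribute more
    w = conf[:, None] * conf[None, :]
    return float(np.sum(w * (sim_feat - sim_pred) ** 2) / np.sum(w))
```

During adaptation one would minimize such a loss on unlabeled target-domain videos; the confidence weights down-weight frames whose importance predictions are ambiguous.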
| Original language | English |
|---|---|
| Article number | 112059 |
| Journal | Pattern Recognition |
| Volume | 170 |
| DOIs | |
| Publication status | Published - Feb 2026 |
| Externally published | Yes |
Keywords
- Domain adaptation
- Transformer
- Video summarization