AdaSwitch: Adapting switch from Adam to SGDM by exponential function

  • Weidong Zou
  • Yuanqing Xia
  • Bineng Zhong
  • Weipeng Cao*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Optimizers play a critical role in training deep neural networks (DNNs). Adam is known for its fast convergence, while Stochastic Gradient Descent with Momentum (SGDM) is valued for its strong generalization. However, both have limitations: SGDM often converges slowly in early training, whereas Adam tends to generalize poorly in later stages. To address these issues, we propose AdaSwitch, an optimizer that combines the strengths of both methods, achieving rapid convergence early in training and robust generalization later on. AdaSwitch uses a linear combination weighted by an exponential function to transition smoothly from Adam to SGDM as training progresses. We also provide a theoretical convergence guarantee in the non-convex setting. The core idea is to express the network parameters as $\theta_t = \beta_3^t \theta_t^{\mathrm{Adam}} + (1 - \beta_3^t)\,\theta_t^{\mathrm{SGDM}}$, where $\beta_3 \in (0, 1)$ is the base of the adaptive exponential function. Extensive experiments across architectures and tasks show that AdaSwitch outperforms existing methods on image classification, image generation, node classification, and few-shot visual classification, delivering both fast convergence and strong generalization.
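The switching rule in the abstract is simple to prototype. The sketch below is a minimal, illustrative NumPy implementation of the idea, not the authors' reference code: the function name adaswitch_step, the state layout, and all hyperparameter values (including beta3 = 0.98) are assumptions chosen for the example. It blends a standard Adam-style parameter estimate with an SGDM-style one using the exponentially decaying weight beta3**t, so early steps follow Adam and later steps approach SGDM.

```python
import numpy as np

def adaswitch_step(theta, grad, state, t, lr=1e-3, beta1=0.9, beta2=0.999,
                   eps=1e-8, momentum=0.9, beta3=0.98):
    """One AdaSwitch-style update (illustrative sketch, not the paper's exact algorithm)."""
    # Adam moment estimates with bias correction
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad**2
    m_hat = state["m"] / (1 - beta1**t)
    v_hat = state["v"] / (1 - beta2**t)
    theta_adam = theta - lr * m_hat / (np.sqrt(v_hat) + eps)

    # SGDM update with heavy-ball momentum
    state["u"] = momentum * state["u"] + grad
    theta_sgdm = theta - lr * state["u"]

    # Exponential switch: theta_t = beta3^t * theta_adam + (1 - beta3^t) * theta_sgdm
    w = beta3**t
    return w * theta_adam + (1 - w) * theta_sgdm

# Toy usage: minimise f(theta) = ||theta||^2
theta = np.ones(5)
state = {"m": np.zeros(5), "v": np.zeros(5), "u": np.zeros(5)}
for t in range(1, 201):
    grad = 2 * theta  # gradient of ||theta||^2
    theta = adaswitch_step(theta, grad, state, t)
print(theta)
```

With beta3 close to 1, the weight beta3**t stays near 1 for many steps before decaying, which is what lets the Adam-like behaviour dominate early training and the SGDM-like behaviour take over later.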

Original language: English
Article number: 114459
Journal: Applied Soft Computing
Volume: 189
DOIs
Publication status: Published - Mar 2026
Externally published: Yes

Keywords

  • Deep learning
  • Fast convergence
  • Optimizers
  • Robust generalization
