ICSGD-Momentum: SGD Momentum based on Inter-gradient Collision

Weidong Zou, Weipeng Cao*, Yuanqing Xia, Bineng Zhong, Dachuan Li

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Deep neural networks (DNNs) are widely used in fields such as computer vision and natural language processing. A key component of DNN training is the optimizer. SGD-Momentum is popular for training many DNN architectures, such as ResNet and DenseNet, because of its simplicity and effectiveness. However, its slow convergence limits its use. To overcome this, we introduce inter-gradient collision into SGD-Momentum, inspired by the elastic collision model in physics. The resulting method, ICSGD-Momentum, aims to accelerate convergence. We provide a theoretical proof of convergence and establish a regret bound for ICSGD-Momentum. Experiments on benchmarks including function optimization, CIFAR-100, ImageNet, Penn Treebank, COCO, and YCB-Video show that ICSGD-Momentum accelerates training and improves the generalization performance of DNNs compared with optimizers such as SGD-Momentum, Adam, RAdam, AdaBound, and AdaBelief.
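
For context, the baseline that the abstract builds on, SGD with momentum, can be sketched as follows. This is the standard heavy-ball update only; it is not the inter-gradient collision rule, which this record does not describe. The function name, the toy quadratic objective, and the hyperparameter values are illustrative assumptions.

```python
import numpy as np

def sgd_momentum(grad_fn, theta0, lr=0.1, beta=0.9, steps=300):
    """Baseline SGD with heavy-ball momentum (NOT the ICSGD-Momentum update).

    grad_fn returns the gradient at theta; in real training this would be a
    noisy minibatch gradient, here it is exact for simplicity (assumption).
    """
    theta = np.asarray(theta0, dtype=float).copy()
    velocity = np.zeros_like(theta)
    for _ in range(steps):
        grad = grad_fn(theta)
        # Accumulate an exponentially decaying sum of past gradients.
        velocity = beta * velocity + grad
        # Step against the accumulated direction.
        theta = theta - lr * velocity
    return theta

# Toy objective (assumption): f(theta) = 0.5 * ||theta||^2, so grad f = theta.
theta_final = sgd_momentum(lambda t: t, [3.0, -2.0])
print(np.linalg.norm(theta_final))  # distance from the minimizer at the origin
```

On this toy problem the iterate spirals in toward the origin; the abstract's claim is that ICSGD-Momentum reaches good solutions in fewer iterations than this baseline.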

Original language: English
Title of host publication: Proceedings - 2024 IEEE 22nd International Conference on Industrial Informatics, INDIN 2024
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798331527471
DOIs
Publication status: Published - 2024
Event: 22nd IEEE International Conference on Industrial Informatics, INDIN 2024 - Beijing, China
Duration: 18 Aug 2024 → 20 Aug 2024

Publication series

Name: IEEE International Conference on Industrial Informatics (INDIN)
ISSN (Print): 1935-4576

Conference

Conference: 22nd IEEE International Conference on Industrial Informatics, INDIN 2024
Country/Territory: China
City: Beijing
Period: 18/08/24 → 20/08/24

Keywords

  • Adam
  • Deep Neural Networks
  • Optimization Algorithm
  • SGD
