TY - GEN
T1 - ICSGD-Momentum
T2 - 22nd IEEE International Conference on Industrial Informatics, INDIN 2024
AU - Zou, Weidong
AU - Cao, Weipeng
AU - Xia, Yuanqing
AU - Zhong, Bineng
AU - Li, Dachuan
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Deep neural networks (DNNs) are widely used in fields such as computer vision and natural language processing. A key component of DNN training is the optimizer. SGD-Momentum is popular in many DNN architectures, such as ResNet and DenseNet, owing to its simplicity and effectiveness; however, its slow convergence rate limits its use. To overcome this, we introduce inter-gradient collision into SGD-Momentum, inspired by the elastic collision model in physics. The resulting method, called ICSGD-Momentum, aims to improve convergence. We provide a theoretical proof of convergence and establish a regret bound for ICSGD-Momentum. Experiments on benchmarks including function optimization, CIFAR-100, ImageNet, Penn Treebank, COCO, and YCB-Video show that ICSGD-Momentum accelerates training and improves the generalization performance of DNNs compared with optimizers such as SGD-Momentum, Adam, RAdam, AdaBound, and AdaBelief.
AB - Deep neural networks (DNNs) are widely used in fields such as computer vision and natural language processing. A key component of DNN training is the optimizer. SGD-Momentum is popular in many DNN architectures, such as ResNet and DenseNet, owing to its simplicity and effectiveness; however, its slow convergence rate limits its use. To overcome this, we introduce inter-gradient collision into SGD-Momentum, inspired by the elastic collision model in physics. The resulting method, called ICSGD-Momentum, aims to improve convergence. We provide a theoretical proof of convergence and establish a regret bound for ICSGD-Momentum. Experiments on benchmarks including function optimization, CIFAR-100, ImageNet, Penn Treebank, COCO, and YCB-Video show that ICSGD-Momentum accelerates training and improves the generalization performance of DNNs compared with optimizers such as SGD-Momentum, Adam, RAdam, AdaBound, and AdaBelief.
KW - Adam
KW - Deep Neural Networks
KW - Optimization Algorithm
KW - SGD
UR - http://www.scopus.com/inward/record.url?scp=85215518902&partnerID=8YFLogxK
U2 - 10.1109/INDIN58382.2024.10774294
DO - 10.1109/INDIN58382.2024.10774294
M3 - Conference contribution
AN - SCOPUS:85215518902
T3 - IEEE International Conference on Industrial Informatics (INDIN)
BT - Proceedings - 2024 IEEE 22nd International Conference on Industrial Informatics, INDIN 2024
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 18 August 2024 through 20 August 2024
ER -