AdaDerivative optimizer: Adapting step-sizes by the derivative term in past gradient information

Weidong Zou, Yuanqing Xia, Weipeng Cao*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

4 Citations (Scopus)

Abstract

AdaBelief fully utilizes "belief" to iteratively update the parameters of deep neural networks. However, the reliability of the "belief" is determined by the accuracy of the gradient prediction, and the key to this accuracy is the selection of the smoothing parameter $\beta_1$. AdaBelief also suffers from the overshoot problem, which occurs when the parameter values exceed the target values and can no longer be corrected along the gradient direction. In this paper, we propose AdaDerivative to eliminate the overshoot problem of AdaBelief. The key to AdaDerivative is that the "belief" of AdaBelief is replaced by an exponential moving average (EMA) of the derivative term, which can be constructed from the past and current gradients as $(1-\beta_2)\sum_{i=1}^{t}\beta_2^{t-i}(g_i-g_{i-1})^2$. We validate the performance of AdaDerivative on a variety of tasks, including image classification, language modeling, node classification, image generation, and object detection. Extensive experimental results demonstrate that AdaDerivative achieves state-of-the-art performance.
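The abstract specifies only the second-moment term: an EMA of squared successive gradient differences, $d_t = \beta_2 d_{t-1} + (1-\beta_2)(g_t - g_{t-1})^2$, which unrolls to the sum above. The sketch below shows how such a term could slot into an Adam/AdaBelief-style update; the first-moment EMA, bias corrections, step rule, and the function name `adaderivative_step` are assumptions, not the paper's exact algorithm.

```python
import numpy as np

def adaderivative_step(theta, grad, prev_grad, m, d, t,
                       lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One hypothetical AdaDerivative update, sketched from the abstract.

    d tracks (1 - beta2) * sum_i beta2**(t - i) * (g_i - g_{i-1})**2,
    i.e. an EMA of the squared derivative term, replacing AdaBelief's
    (g_t - m_t)**2 "belief". Everything else mirrors Adam and is an
    assumption; consult the paper for the exact algorithm.
    """
    m = beta1 * m + (1 - beta1) * grad                     # EMA of gradients (first moment)
    d = beta2 * d + (1 - beta2) * (grad - prev_grad) ** 2  # EMA of the derivative term
    m_hat = m / (1 - beta1 ** t)                           # bias correction, as in Adam
    d_hat = d / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(d_hat) + eps)    # adaptive step
    return theta, m, d
```

A plausible convention, also an assumption, is to initialize `m` and `d` to zeros and pass `prev_grad = grad` on the first step (t = 1), so the derivative term starts at zero.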

Original language: English
Article number: 105755
Journal: Engineering Applications of Artificial Intelligence
Volume: 119
DOI: 10.1016/j.engappai.2022.105755
Publication status: Published - March 2023

Cite this

Zou, W., Xia, Y., & Cao, W. (2023). AdaDerivative optimizer: Adapting step-sizes by the derivative term in past gradient information. Engineering Applications of Artificial Intelligence, 119, Article 105755. https://doi.org/10.1016/j.engappai.2022.105755