- On the O(√d/T^{1/4}) Convergence Rate of RMSProp and Its Momentum Extension Measured by ℓ1 Norm
Authors: Huan Li, Zhouchen Lin
Summary: Although adaptive gradient methods have been widely used in deep learning, the convergence rates proved for them in the literature are all slower than that of SGD, particularly in their dependence on the dimension. This paper considers classical RMSProp and its momentum extension and establishes the convergence rate (1/T)∑_{k=1}^T E[∥∇f(x_k)∥_1] ≤ O(√d·C/T^{1/4}) measured by the ℓ1 norm, without the bounded-gradient assumption, where d is the dimension of the optimization variable, T is the number of iterations, and C is a constant identical to the one appearing in the optimal convergence rate of SGD. This rate matches the lower bound with respect to all coefficients except the dimension d. Since ∥x∥_2 ≪ ∥x∥_1 ≤ √d·∥x∥_2 for problems with extremely large d, the rate can be regarded as analogous to the (1/T)∑_{k=1}^T E[∥∇f(x_k)∥_2] ≤ O(C/T^{1/4}) rate of SGD in the ideal case where ∥∇f(x)∥_1 = Θ(√d·∥∇f(x)∥_2).
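For context, below is a minimal sketch of the classical RMSProp update with a heavy-ball momentum extension, the algorithm family the paper analyzes. The hyperparameter names (lr, beta, mu, eps) and the exact ordering of the momentum and preconditioning steps are illustrative assumptions, not the paper's precise formulation.

```python
import numpy as np

def rmsprop_momentum(grad, x0, lr=1e-3, beta=0.999, mu=0.9, eps=1e-8, T=1000):
    """Sketch of RMSProp with heavy-ball momentum on a stochastic gradient oracle.

    grad(x) should return a stochastic gradient of f at x.
    """
    x = x0.copy()
    v = np.zeros_like(x)   # exponential moving average of squared gradients
    m = np.zeros_like(x)   # momentum buffer
    for _ in range(T):
        g = grad(x)
        v = beta * v + (1 - beta) * g * g      # RMSProp second-moment estimate
        m = mu * m + g / (np.sqrt(v) + eps)    # momentum on the preconditioned gradient
        x = x - lr * m                         # parameter update
    return x
```

Setting mu = 0 recovers plain RMSProp; the paper's analysis covers both cases and bounds the averaged ℓ1 norm of the gradient along the iterates, as stated in the summary above.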