The following seminar will take place at the Department of Mathematics:
Tuesday 18 July 2023, 14:00. Aula Magna (Dipartimento di Matematica)
https://events.dm.unipi.it/e/201
On the influence of stochastic rounding bias in implementing gradient descent with applications in low-precision training
Lu Xia (Eindhoven University of Technology)
In the context of low-precision computation for the training of neural networks with the
gradient descent method (GD), the occurrence of deterministic rounding errors often leads
to stagnation or adversely affects the convergence of the optimizers. Unbiased
stochastic rounding (SR) can, with a certain probability, partially capture gradient
updates that are smaller than the minimum rounding precision. We
provide a theoretical explanation of the stagnation observed in GD when training neural
networks with low-precision computation. We analyze the impact of floating-point
round-off errors on the convergence behavior of GD, with a particular focus on convex problems.
We propose two biased stochastic rounding methods, signed-SR𝜀 and SR𝜀, which are
shown to eliminate the stagnation of GD and to yield significantly faster
convergence than SR in low-precision floating-point computation.
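
The stagnation effect described above can be illustrated with a minimal sketch. It is not the speaker's implementation: it assumes a uniform grid of spacing u in place of a real 16-bit floating-point format, a one-dimensional convex objective f(x) = x**2 / 2, and hypothetical values for the step size and the number of steps. It compares only round-to-nearest with unbiased SR; the biased variants signed-SR𝜀 and SR𝜀 are not reproduced here.

```python
import math
import random
from functools import partial


def round_nearest(x, u):
    """Deterministic rounding: nearest multiple of the grid spacing u."""
    return u * math.floor(x / u + 0.5)


def round_stochastic(x, u, rng):
    """Unbiased SR: round up with probability equal to the fractional
    distance of x from the grid point just below it, otherwise round down."""
    lower = math.floor(x / u)
    frac = x / u - lower
    return u * (lower + (1 if rng.random() < frac else 0))


def gradient_descent(x0, lr, steps, u, rounder):
    """Minimise f(x) = x**2 / 2 (gradient x), rounding every iterate."""
    x = x0
    for _ in range(steps):
        x = rounder(x - lr * x, u)
    return x


# Hypothetical settings, chosen so that the update lr * x is smaller
# than half the grid spacing at the starting point.
u = 2.0 ** -4
lr = 0.01
steps = 2000
rng = random.Random(0)

x_rn = gradient_descent(1.0, lr, steps, u, round_nearest)
x_sr = gradient_descent(1.0, lr, steps, u, partial(round_stochastic, rng=rng))

print("round-to-nearest:", x_rn)
print("stochastic rounding:", x_sr)
```

With these assumed values the round-to-nearest iterate never leaves its starting point 1.0, because each update (0.01) is smaller than half the grid spacing (0.03125) and is rounded away. Under SR the same update is kept with a probability proportional to its size, so the iterate moves toward the minimizer on average.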
We validate our theoretical analysis by training a binary logistic regression model on
the CIFAR-10 dataset and a 4-layer fully-connected neural network model on the MNIST
database, utilizing a 16-bit floating-point representation and various rounding techniques.
The experiments demonstrate that signed-SR𝜀 and SR𝜀 may achieve higher classification
accuracy than round-to-nearest (RN) and SR, with the same number of training
epochs. The new rounding methods with a 16-bit floating-point representation may
also converge faster than RN with a 32-bit floating-point
representation.