Gradient Descent With Adadelta from Scratch
Gradient descent is an optimization algorithm that follows the negative gradient of an objective function in order to locate the minimum of the function. A limitation of gradient descent is that it uses the same step size (learning rate) for every input variable. AdaGrad and RMSProp are extensions to gradient descent that add a self-adaptive learning rate for each parameter of the objective function. Adadelta can be considered a further extension of gradient descent that builds upon AdaGrad and RMSProp […]
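To make the idea concrete, here is a minimal sketch of the Adadelta update rule applied to a simple two-variable quadratic objective. The objective, starting point, and the rho and eps values are illustrative assumptions for this sketch, not the article's final implementation.

```python
# Sketch of Adadelta: per-parameter step sizes built from decaying averages
# of squared gradients and squared parameter updates (assumed example values).
from math import sqrt

def objective(x, y):
    # simple bowl-shaped test function f(x, y) = x^2 + y^2
    return x ** 2.0 + y ** 2.0

def derivative(x, y):
    # partial derivatives of the objective with respect to x and y
    return [2.0 * x, 2.0 * y]

def adadelta(start, n_iter, rho=0.99, eps=1e-3):
    solution = list(start)
    sq_grad_avg = [0.0] * len(solution)  # running average of squared gradients
    sq_para_avg = [0.0] * len(solution)  # running average of squared updates
    for it in range(n_iter):
        gradient = derivative(solution[0], solution[1])
        for i in range(len(solution)):
            # decaying average of the squared partial derivative
            sq_grad_avg[i] = rho * sq_grad_avg[i] + (1.0 - rho) * gradient[i] ** 2.0
            # per-parameter step size from the two running averages
            step = sqrt(sq_para_avg[i] + eps) / sqrt(sq_grad_avg[i] + eps)
            change = step * gradient[i]
            # decaying average of the squared parameter change
            sq_para_avg[i] = rho * sq_para_avg[i] + (1.0 - rho) * change ** 2.0
            solution[i] -= change
        print(f"> iter {it}: f = {objective(*solution):.5f}")
    return solution

if __name__ == "__main__":
    adadelta(start=[0.8, 1.0], n_iter=50)
```

Note that, unlike AdaGrad and RMSProp, the step size here contains no fixed learning rate; it is derived entirely from the ratio of the two running averages, which is the key idea behind Adadelta.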