Gradient Methods Comparator (SGD, Adam, RMSprop)
Overview
The Gradient Methods Comparator visualizes the optimization algorithms used to train machine learning models. Compare vanilla gradient descent, SGD, momentum, AdaGrad, RMSprop, and Adam on a variety of loss landscapes. Watch each algorithm navigate toward the minimum, and observe how adaptive learning rates and momentum affect convergence. Perfect for understanding optimizer choice in neural network training.
Tips
- Gradient descent follows the negative gradient direction to minimize the loss function
- SGD adds noise from mini-batch sampling, which can help it escape local minima
- Momentum accumulates past gradients to accelerate convergence and reduce oscillation
- AdaGrad adapts the learning rate per parameter based on accumulated squared gradients, so frequently updated parameters take smaller steps
- RMSprop uses an exponential moving average of squared gradients, so the effective learning rate adapts without decaying to zero the way AdaGrad's does
- Adam combines the advantages of momentum and RMSprop and is the most popular optimizer in deep learning (update rules for all of these are sketched in the code after this list)
- Try the different loss landscapes: convex bowl, saddle point, ravine, and multiple minima
- The learning rate is critical: too high causes divergence, too low makes progress slow (see the ravine demo after the code sketch below)
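The tips above map directly onto a few lines of math. The following is a minimal NumPy sketch of one update step for each method, assuming a parameter vector `theta` and a gradient function `grad(theta)`; the hyperparameter names and defaults (`lr`, `beta`, `beta1`, `beta2`, `eps`, the `noise` scale) are illustrative choices, not the visualizer's own API or settings.

```python
import numpy as np

def gd_step(theta, grad, lr=0.1):
    # Vanilla gradient descent: move against the gradient.
    return theta - lr * grad(theta)

def sgd_step(theta, grad, lr=0.1, noise=0.05, rng=None):
    # SGD: mini-batch sampling is modeled here as additive gradient noise.
    rng = rng or np.random.default_rng()
    g = grad(theta) + noise * rng.standard_normal(theta.shape)
    return theta - lr * g

def momentum_step(theta, v, grad, lr=0.1, beta=0.9):
    # Momentum: accumulate an exponentially decaying sum of past gradients.
    v = beta * v + grad(theta)
    return theta - lr * v, v

def adagrad_step(theta, s, grad, lr=0.1, eps=1e-8):
    # AdaGrad: per-parameter step shrinks as squared gradients accumulate.
    g = grad(theta)
    s = s + g**2
    return theta - lr * g / (np.sqrt(s) + eps), s

def rmsprop_step(theta, s, grad, lr=0.01, beta=0.9, eps=1e-8):
    # RMSprop: exponential moving average of squared gradients
    # instead of AdaGrad's ever-growing running sum.
    g = grad(theta)
    s = beta * s + (1 - beta) * g**2
    return theta - lr * g / (np.sqrt(s) + eps), s

def adam_step(theta, m, v, t, grad, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    # Adam: momentum (first moment) plus RMSprop-style scaling (second moment),
    # with bias correction for the zero-initialized moving averages.
    g = grad(theta)
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g**2
    m_hat = m / (1 - beta1**t)
    v_hat = v / (1 - beta2**t)
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps), m, v
```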
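To make the learning-rate tip concrete, this usage sketch reuses `gd_step` and `adam_step` from above on an elongated quadratic, a hypothetical stand-in for the ravine preset. With a Hessian eigenvalue of 25 along y, plain gradient descent is unstable for learning rates above 2/25 = 0.08, so the three rates below show divergence, oscillating convergence, and slow progress; Adam is run from the same start point for comparison.

```python
def ravine(theta):
    x, y = theta
    return 0.5 * x**2 + 12.5 * y**2          # steep in y, shallow in x

def ravine_grad(theta):
    x, y = theta
    return np.array([x, 25.0 * y])

# Gradient descent at three learning rates: diverging, oscillating, and slow.
for lr in (0.1, 0.07, 0.01):
    theta = np.array([-4.0, 2.0])
    for _ in range(200):
        theta = gd_step(theta, ravine_grad, lr=lr)
    print(f"GD   lr={lr:<5} -> final loss {ravine(theta):.3e}")

# Adam from the same start point.
theta, m, v = np.array([-4.0, 2.0]), np.zeros(2), np.zeros(2)
for t in range(1, 201):
    theta, m, v = adam_step(theta, m, v, t, ravine_grad, lr=0.1)
print(f"Adam lr=0.1  -> final loss {ravine(theta):.3e}")
```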