Gradient Methods Comparator (SGD, Adam, RMSprop)
Overview
The Gradient Methods Comparator visualizes the optimization algorithms used to train machine learning models. Compare vanilla gradient descent, SGD, momentum, AdaGrad, RMSprop, and Adam on a variety of loss landscapes. Watch each algorithm navigate toward the minimum, and observe how adaptive learning rates and momentum affect convergence. Perfect for understanding optimizer choice in neural network training.
Tips
- Gradient descent follows the negative gradient direction to minimize the loss function
- SGD adds noise from mini-batch sampling, which can help it escape local minima
- Momentum accumulates past gradients to accelerate convergence and reduce oscillation
- AdaGrad adapts the learning rate per parameter based on accumulated squared gradients, so frequently updated parameters take smaller steps
- RMSprop uses an exponential moving average of squared gradients, so the effective learning rate adapts without decaying to zero the way AdaGrad's does
- Adam combines the advantages of momentum and RMSprop and is the most popular optimizer in deep learning (update rules for all of these are sketched in the code after this list)
- Try the different loss landscapes: convex bowl, saddle point, ravine, and multiple minima
- The learning rate is critical: too high causes divergence, too low makes progress slow (see the ravine demo after the code sketch below)
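The tips above map directly onto a few lines of math. The following is a minimal NumPy sketch of one update step for each method, assuming a parameter vector `theta` and a gradient function `grad(theta)`; the hyperparameter names and defaults (`lr`, `beta`, `beta1`, `beta2`, `eps`, the `noise` scale) are illustrative choices, not the visualizer's own API or settings.

```python
import numpy as np

def gd_step(theta, grad, lr=0.1):
    # Vanilla gradient descent: move against the gradient.
    return theta - lr * grad(theta)

def sgd_step(theta, grad, lr=0.1, noise=0.05, rng=None):
    # SGD: mini-batch sampling is modeled here as additive gradient noise.
    rng = rng or np.random.default_rng()
    g = grad(theta) + noise * rng.standard_normal(theta.shape)
    return theta - lr * g

def momentum_step(theta, v, grad, lr=0.1, beta=0.9):
    # Momentum: accumulate an exponentially decaying sum of past gradients.
    v = beta * v + grad(theta)
    return theta - lr * v, v

def adagrad_step(theta, s, grad, lr=0.1, eps=1e-8):
    # AdaGrad: per-parameter step shrinks as squared gradients accumulate.
    g = grad(theta)
    s = s + g**2
    return theta - lr * g / (np.sqrt(s) + eps), s

def rmsprop_step(theta, s, grad, lr=0.01, beta=0.9, eps=1e-8):
    # RMSprop: exponential moving average of squared gradients
    # instead of AdaGrad's ever-growing running sum.
    g = grad(theta)
    s = beta * s + (1 - beta) * g**2
    return theta - lr * g / (np.sqrt(s) + eps), s

def adam_step(theta, m, v, t, grad, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    # Adam: momentum (first moment) plus RMSprop-style scaling (second moment),
    # with bias correction for the zero-initialized moving averages.
    g = grad(theta)
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g**2
    m_hat = m / (1 - beta1**t)
    v_hat = v / (1 - beta2**t)
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps), m, v
```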
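To make the learning-rate tip concrete, this usage sketch reuses `gd_step` and `adam_step` from above on an elongated quadratic, a hypothetical stand-in for the ravine preset. With a Hessian eigenvalue of 25 along y, plain gradient descent is unstable for learning rates above 2/25 = 0.08, so the three rates below show divergence, oscillating convergence, and slow progress; Adam is run from the same start point for comparison.

```python
def ravine(theta):
    x, y = theta
    return 0.5 * x**2 + 12.5 * y**2          # steep in y, shallow in x

def ravine_grad(theta):
    x, y = theta
    return np.array([x, 25.0 * y])

# Gradient descent at three learning rates: diverging, oscillating, and slow.
for lr in (0.1, 0.07, 0.01):
    theta = np.array([-4.0, 2.0])
    for _ in range(200):
        theta = gd_step(theta, ravine_grad, lr=lr)
    print(f"GD   lr={lr:<5} -> final loss {ravine(theta):.3e}")

# Adam from the same start point.
theta, m, v = np.array([-4.0, 2.0]), np.zeros(2), np.zeros(2)
for t in range(1, 201):
    theta, m, v = adam_step(theta, m, v, t, ravine_grad, lr=0.1)
print(f"Adam lr=0.1  -> final loss {ravine(theta):.3e}")
```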