Visualize how the learning rate affects gradient descent convergence
The learning rate (α) controls how large a step gradient descent takes at each iteration: every update moves the parameters against the gradient, scaled by α (θ ← θ − α·∇J(θ)). It is one of the most important hyperparameters in machine learning optimization.
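A minimal sketch of that single update step, assuming a generic parameter `w` and a gradient function `grad` (both names are illustrative, not part of the visualization):

```python
# Minimal sketch of one gradient descent update (names are illustrative).
def gradient_descent_step(w, grad, alpha):
    """Move the parameter against the gradient, scaled by the learning rate alpha."""
    return w - alpha * grad(w)

# Example: minimizing f(w) = w**2, whose gradient is 2*w.
w = 5.0
w = gradient_descent_step(w, lambda w: 2 * w, alpha=0.1)  # w moves from 5.0 to 4.0
```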
Too Small: Convergence is very slow. Many iterations needed. Safe but inefficient.
Optimal: Fast convergence with stable path. Reaches minimum efficiently.
Too Large: Overshoots the minimum, oscillates, or diverges completely.
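A hedged sketch of these three regimes on an illustrative quadratic loss f(w) = w²; the specific α values below are assumptions chosen only to make each behavior visible:

```python
# Compare learning-rate regimes on f(w) = w**2 (gradient 2*w).
# The alpha values are illustrative assumptions, not tuned recommendations.
def run_gradient_descent(alpha, w0=5.0, steps=20):
    w = w0
    for _ in range(steps):
        w = w - alpha * 2 * w  # update rule: w <- w - alpha * f'(w)
    return w

for label, alpha in [("too small", 0.01), ("reasonable", 0.4), ("too large", 1.1)]:
    final_w = run_gradient_descent(alpha)
    print(f"alpha={alpha:<5} ({label:10s}) -> w after 20 steps: {final_w:.4f}")
```

With α = 0.01 the parameter barely moves from its starting value, with α = 0.4 it collapses toward the minimum at zero within a few steps, and with α = 1.1 the iterates overshoot, flip sign, and grow without bound.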
Tips: Watch how the optimization path changes with different learning rates. Notice how small rates take tiny steps while large rates jump around or explode. The loss curve shows convergence speed and stability.