Compare sigmoid, ReLU, tanh, and other activation functions side by side
Activation functions introduce non-linearity into neural networks, enabling them to learn complex patterns. Without them, a stack of layers collapses into a single linear transformation, so even a very deep network could only represent a linear model.
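To make this concrete, here is a minimal NumPy sketch (with arbitrary, made-up weight shapes) showing that two linear layers with no activation between them are equivalent to one linear layer:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))   # batch of 4 inputs with 3 features (illustrative shapes)
W1 = rng.normal(size=(3, 5))
W2 = rng.normal(size=(5, 2))

two_layers = (x @ W1) @ W2    # two "layers" with no non-linearity in between
one_layer = x @ (W1 @ W2)     # the same map collapsed into a single weight matrix

print(np.allclose(two_layers, one_layer))  # True: depth adds nothing without activations
```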
Sigmoid & Tanh: Classic functions with bounded outputs (sigmoid maps to (0, 1), tanh to (-1, 1)). Their derivatives shrink toward zero for large |x|, so they suffer from vanishing gradients in deep networks.
ReLU: The most popular choice for hidden layers. Fast and effective, but units can get stuck outputting zero with zero gradient (the "dying ReLU" problem).
Leaky ReLU & ELU: Variants that address the dying-ReLU problem by keeping a non-zero output (and gradient) for negative inputs.
Swish & GELU: Modern functions that are smooth and often perform better than ReLU (see the sketches after this list).
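For reference, here are minimal NumPy sketches of the functions listed above. The Leaky ReLU slope (0.01), ELU alpha (1.0), and the tanh-based GELU approximation are common defaults, not the only possible choices:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))          # bounded to (0, 1)

def tanh(x):
    return np.tanh(x)                          # bounded to (-1, 1)

def relu(x):
    return np.maximum(0.0, x)                  # zero for negative inputs

def leaky_relu(x, slope=0.01):
    return np.where(x > 0, x, slope * x)       # small negative slope keeps gradients alive

def elu(x, alpha=1.0):
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))  # smooth negative saturation at -alpha

def swish(x):
    return x * sigmoid(x)                      # also known as SiLU

def gelu(x):
    # Common tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))
```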
Tips: Select multiple functions to compare their shapes and derivatives. Notice how ReLU and its variants keep a gradient of 1 for positive inputs (so gradients do not shrink through active units), while the sigmoid and tanh derivatives approach zero for large |x|.
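As a quick sanity check of that last point, the closed-form derivatives can be evaluated at a moderately large input (x = 5 here is just an illustrative value):

```python
import numpy as np

x = 5.0
sig = 1.0 / (1.0 + np.exp(-x))

print(sig * (1.0 - sig))       # sigmoid'(5) ~ 6.6e-3, shrinking toward zero
print(1.0 - np.tanh(x) ** 2)   # tanh'(5)    ~ 1.8e-4, shrinking toward zero
print(1.0 if x > 0 else 0.0)   # ReLU'(5)    = 1.0, no shrinkage for positive inputs
```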