There's a big difference between being able to calculate the gradient of an activation function you just invented, plug it into a backpropagation algorithm, or analyze it for convergence, and simply understanding what a gradient is and how it is used when training a model.
I find that, for everyday practitioners, understanding the mathematical concepts and being able to visualize them is the important part. You do not need to be able to derive a gradient in order to understand the differences between sigmoid and hyperbolic tangent. Being able to do the calculation on paper is not a prerequisite for understanding the process.
The rigorous mathematics comes into play if you wish to advance the field as a whole, but it is not necessary in order to successfully design and train efficient models.
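To make the sigmoid versus hyperbolic tangent point concrete, here is a minimal sketch that compares the two without deriving anything by hand. It estimates each slope numerically with a central difference; the helper name `numerical_gradient` and the step size `h` are my own choices for illustration, not something from a particular library.

```python
import math


def sigmoid(x):
    """Logistic sigmoid: squashes any real input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def numerical_gradient(f, x, h=1e-5):
    """Estimate the slope of f at x with a central difference.

    Hypothetical helper for illustration: no symbolic derivative is needed.
    """
    return (f(x + h) - f(x - h)) / (2.0 * h)


if __name__ == "__main__":
    # Print the value and the estimated slope of each activation at a few inputs.
    for x in (-4.0, -1.0, 0.0, 1.0, 4.0):
        print(
            f"x={x:+.1f}  "
            f"sigmoid={sigmoid(x):.4f} slope={numerical_gradient(sigmoid, x):.4f}  "
            f"tanh={math.tanh(x):+.4f} slope={numerical_gradient(math.tanh, x):.4f}"
        )
```

Running it shows the practical differences at a glance: sigmoid outputs lie between 0 and 1 and its slope peaks at about 0.25 at zero, while tanh outputs are centered on zero and its slope peaks at about 1. Both slopes shrink toward zero for large positive or negative inputs, which is the intuition behind vanishing gradients.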