I agree, gradient descent implicitly assumes things have a meaningful gradient, which they don't always. And even if we say anything can be approximated by a continuous function, we're learning we don't like approximations in our AI. A discrete alternative to SGD would be nice.
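Just as a rough sketch of what I mean (not any particular method, and `random_local_search` is just a name I made up): something like random local search over discrete parameters never touches a gradient at all, it only needs a black-box loss.

```python
# A minimal sketch of a gradient-free, fully discrete optimizer: random local
# search over a binary parameter vector. The toy loss (Hamming distance to a
# hidden target) is only for illustration -- any black-box loss on discrete
# parameters works, since no gradient is ever computed.
import random

def random_local_search(loss, n_params, steps=10_000, seed=0):
    rng = random.Random(seed)
    params = [rng.randint(0, 1) for _ in range(n_params)]
    best = loss(params)
    for _ in range(steps):
        i = rng.randrange(n_params)   # propose flipping a single bit
        params[i] ^= 1
        candidate = loss(params)
        if candidate <= best:         # keep improving (or equal) moves
            best = candidate
        else:
            params[i] ^= 1            # revert the flip otherwise
    return params, best

if __name__ == "__main__":
    target = [random.Random(42).randint(0, 1) for _ in range(64)]
    hamming = lambda p: sum(a != b for a, b in zip(p, target))
    solution, final_loss = random_local_search(hamming, n_params=64)
    print("final loss:", final_loss)  # reaches 0 on this toy problem
```

Obviously that's nowhere near a drop-in replacement for SGD on big models, but it shows the flavor: purely discrete moves, accept/reject on the loss itself, no continuous relaxation anywhere.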