Most (stochastic) "gradient descent"-type algorithms (such as Nesterov-accelerated gradient descent or Adam) require the objective to be differentiable.
Currently, I've been looking into Clarke's generalized derivative, since it extends the convex subdifferential to non-smooth, non-convex functions. However, in this reference (which I'm following), the author says that when we leave the convex setting we need to trade lower semi-continuity of the function being optimized for local Lipschitz continuity.
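For concreteness (this example is my own, not from the reference): for a locally Lipschitz $f : \mathbb{R}^n \to \mathbb{R}$, Rademacher's theorem says $f$ is differentiable off a null set $S$, and the Clarke subdifferential can be written as

$$
\partial_C f(x) \;=\; \operatorname{conv}\Bigl\{\, \lim_{k\to\infty} \nabla f(x_k) \;:\; x_k \to x,\ x_k \notin S \,\Bigr\}.
$$

This simultaneously recovers the two classical cases: $\partial_C f(x) = \{\nabla f(x)\}$ when $f$ is $C^1$, and $\partial_C f$ equals the convex subdifferential when $f$ is convex. It also handles non-convex kinks, e.g. $f(x) = -|x|$ is locally Lipschitz with gradients $-\operatorname{sign}(x) \in \{-1, +1\}$ near $0$, so

$$
\partial_C f(0) = [-1, 1], \qquad \partial_C f(x) = \{-\operatorname{sign}(x)\} \ \text{ for } x \neq 0.
$$

Note that local Lipschitzness is exactly what makes this definition work, which is why I'm asking whether it can be relaxed.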
Is this requirement necessary? More generally, is there any extension that simultaneously generalizes the subdifferential of convex functions and the usual derivative of smooth functions, and which can be used to optimize discontinuous functions?