What makes Deep RL “fundamentally/mathematically” advantageous?

I’m not sure this will answer your question “mathematically”, but it should give you some idea as to why it is “advantageous”.

Tabular RL requires keeping a state-value table in memory, with one entry per state. When the number of possible states is large, or even infinite, this is simply not practical. In addition, the usual convergence guarantees for tabular RL require visiting each unique state at least a certain number of times; with such a huge state space, most states would go unvisited most of the time, so you wouldn’t really converge (unless you wait a few thousand years, I guess…).
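To make the memory point concrete, here’s a tiny sketch of what that table looks like. The 10×10 grid world is a made-up example; the point is that the table has one entry per state, so its size scales with the number of states:

```python
import itertools

# Tabular value storage for a hypothetical 10x10 grid world:
# one entry per (x, y) state, initialized to 0.
values = {}
for state in itertools.product(range(10), range(10)):
    values[state] = 0.0

print(len(values))  # 100 entries for a 10x10 grid
```

Here 100 entries are trivial, but the table grows multiplicatively with every state dimension you add (e.g. a board game whose states are all piece configurations), which is exactly where the tabular approach breaks down.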

Deep RL tackles this problem by introducing a continuous function approximator (in practice, a neural network) that predicts the “value” of a state given some representation of that state. This means we don’t have to store a huge array mapping states to their values, which makes it far more practical.

It’s also capable of “generalizing” the value of a certain state to other, similar states, instead of computing the value independently for each one.
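Here’s a minimal sketch of both ideas, using linear function approximation with a TD(0)-style update instead of a deep network (the feature map `phi` and the transitions are made up for illustration). A small weight vector replaces the per-state table, and the final line queries a state the updates never saw:

```python
import numpy as np

def phi(state):
    """Toy feature map: represent a scalar state s as [1, s, s^2]."""
    s = float(state)
    return np.array([1.0, s, s * s])

def td0_update(w, s, reward, s_next, alpha=0.1, gamma=0.9):
    """One TD(0) update of w for the approximation v(s) ~= w . phi(s)."""
    target = reward + gamma * (w @ phi(s_next))
    return w + alpha * (target - w @ phi(s)) * phi(s)

w = np.zeros(3)  # three weights stand in for a whole value table
for s, r, s_next in [(0, 1.0, 1), (1, 0.5, 2), (2, 0.0, 0)]:
    w = td0_update(w, s, r, s_next)

# Generalization: estimate the value of state 3, which was never visited.
estimate_for_unseen_state = w @ phi(3)
```

Deep RL replaces the hand-written `phi` and linear weights with a neural network that learns its own features, but the principle is the same: a fixed set of parameters stands in for an arbitrarily large table.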

I hope this gave you some intuition on why Deep RL is so popular and powerful 🙂