It is known that for every $\varepsilon>0$ there is an appropriate neural network architecture such that one can approximate any function $f:(0,1)^n\to(0,1)^m$ by the neural network output $\hat{f}(\bullet;w):(0,1)^n\to(0,1)^m$ as:

$$
|f(x)-\hat{f}(x;w)|\leq \varepsilon, \quad \forall x\in(0,1)^n \tag{1}
$$

**My question is** whether it is possible to derive a similar result, but with a conclusion of the following form: for every $\varepsilon'>0$ there exists a suitable architecture such that the estimate $\hat{f}(x;w)$ (different from the one in (1)) satisfies

$$
|f(x)-\hat{f}(x;w)|\leq \varepsilon' |f(x)|, \quad \forall x\in(0,1)^n \tag{2}
$$

That is, the approximation $\hat{f}(x;w)$ gets better as the true output $f(x)$ gets smaller.

I am not trying to show that (1) implies (2), but that there is an $\hat{f}$ satisfying (2) in its own right. This is a stronger result, since (2) can imply (1) for appropriate $\varepsilon,\varepsilon'$. If you think (2) is too strong to even be true, I think showing

$$
|f(x)-\hat{f}(x;w)|\leq \varepsilon''+\varepsilon'|f(x)|, \quad \forall x\in(0,1)^n
$$

for some $\varepsilon''+\varepsilon'|(1,\dots,1)|$ smaller than (or equal to) the bound $\varepsilon$ in (1) would be interesting too.
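As a sanity check on the claim that (2) can imply (1): since $f$ maps into $(0,1)^m$, every component of $f(x)$ lies in $(0,1)$, so in the Euclidean norm (which I assume throughout) $|f(x)|<|(1,\dots,1)|=\sqrt{m}$. Choosing $\varepsilon'=\varepsilon/\sqrt{m}$ in (2) then gives

$$
|f(x)-\hat{f}(x;w)|\leq \varepsilon'|f(x)|<\varepsilon'\sqrt{m}=\varepsilon, \quad \forall x\in(0,1)^n,
$$

which is exactly the bound in (1).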

I'm not looking for complete proofs, just some discussion and suggestions:

- Do you think a result of this form can be proved?

- Do you have any suggestions on what to look for in order to proceed with a proof?

And one more thing: do you think this question fits here, or should it be posted on stats.stackexchange.com or math.stackexchange.com instead?