calculus and analysis – Numerical Mixed Partial Derivatives: How to Choose Scale?

For illustration purposes, consider the function
$$ f(x, y) = (x – 1)^2 + (y – 1)^2 + sin(25(x+y)) . $$
The mixed partial derivatives at the origin are $f^{(x,x)}(0,0) = f^{(y,y)}(0,0) = 2$ and $f^{(x,y)}(0,0) = 0$ (the $sin$-term does not give any contribution).

I want to determine these derivatives numerically.
That is, let’s imagine I have only access to the function

f((x_)?NumericQ, (y_)?NumericQ) := (x - 1)^2 + (y - 1)^2 + Sin(25*(x + y))

as a numerical black box.
I would first try to solve the problem as follows:

N(D(f(x, y), {x}, {x}) /. x -> 0 /. y -> 0)
(* 2. *)

N(D(f(x, y), {y}, {y}) /. x -> 0 /. y -> 0)
(* 2. *)

N(D(f(x, y), {x}, {y}) /. x -> 0 /. y -> 0)
(* Derivative(1, 1)(f)(0., 0.) *)

In the first two cases, it just works.
However, for the mixed derivative, it is well known that the simple approach fails and one must use nested calls to ND instead. (To keep it short, I will do that the simple way, not using the trick described here to reduce the number of function calls.) Let’s try it:

Needs("NumericalCalculus`");
ND(f(x, 0), {x, 2}, 0)
(* 54.6838 *)

ND(f(0, y), {y, 2}, 0)
(* 54.6838 *)

ND(ND(f(x, y), x, 0), y, 0)
(* -8.40318 *)

Disaster!

The problem, of course, lies in the “scale” used for the differentiation.
We can explore how the results change as a function of the employed scale:

Table({
   ND(f(x, 0), {x, 2}, 0, Scale -> 10^-s),
   ND(f(0, y), {y, 2}, 0, Scale -> 10^-s),
   ND(ND(f(x, y), x, 0, Scale -> 10^-s), y, 0, Scale -> 10^-s)
   }, {s, 0, 6}
)

(*
{{54.6838, 54.6838, -8.40318},
 {2.00013, 2.00013, 7.46312*10^-6},
 {2., 2., -1.89464*10^-7},
 {2.00001, 2.00001, 9.40594*10^-6},
 {2.00063, 2.00063, 0.00106619},
 {2.08502, 2.08502, -0.0788629},
 {-5.68049, -5.68049, -20.9135}}
*)

As we can see, there is only a very narrow window for Scale where ND yields great results.
Yet, using N(D(...) /. ...) in the first example, the results for the non-mixed derivatives were perfect.
My question is: how does Mathematica manage to choose the scale perfectly using that approach?
Is there a way to make Mathematica automatically choose the correct scale also for the mixed derivatives?


Bonus question: could someone explain why the accuracy of the results starts to get worse at a scale of $10^{-3}$ already? I would have expected the accuracy to increase until we get close to the machine precision.
This works, for example:

h = 10^-8;
(f(h, 0) + f(-h, 0) - 2*f(0, 0))/h^2
(* 2 *)