linear algebra – Confusion on notation of multivariable differentiation in n-dimensional space for matrices (gradients)

I am learning gradients on my own.

One of the rules given in the textbook I am reading states this (without proof):

$$∇_mathbf{x}mathbf{A}mathbf{x} = mathbf{A}^⊤$$

and then proceeds onwards as if it’s trivial to see. I haven’t seen this before so I need to go prove that it is correct.

I do not know if the above equation is equal to

$$∇f(mathbf{x}) = mathbf{A}^⊤$$


$$f(mathbf{x}) = mathbf{A}mathbf{x}$$

I decided to go ahead with this anyways and see where it takes me. I figure doing this with a simple $mathbb{R}^{2 times 2}$ like $mathbf{A} = begin{bmatrix}a&b\c&dend{bmatrix}$ and seeing where it takes me when $mathbf{x} = begin{bmatrix}i\jend{bmatrix}$.

This means $mathbf{A}mathbf{x}$ is:

$$begin{bmatrix}ai + bj\ci + djend{bmatrix}$$

The gradient would be (for $mathbf{Ax}$ as I’m passing that in for the parameter):

$$∇f(mathbf{Ax}) = begin{bmatrix} frac{partial f(mathbf{Ax})}{partial i} frac{partial f(mathbf{Ax})}{partial j}end{bmatrix}^⊤$$

Which means we want to evaluate $∇_mathbf{x} begin{bmatrix}ai + bj\ci + djend{bmatrix}$. By the definition of a gradient, then we should get $∇f(mathbf{Ax}) = begin{bmatrix} begin{bmatrix}a\cend{bmatrix} \ begin{bmatrix}b\dend{bmatrix} end{bmatrix}$

This however does not look right, or is it? Is that equal to $begin{bmatrix}a&c\b&dend{bmatrix}$?

I wrote it out that way because I suspect that when we ask for $a_{21}$ we are saying “get the 2nd row in the first column, and in a programming fashion, A(2)(1) (or A(1)(0) for most languages) would get $b$ from the result above. This would make sense because I’d get the transpose I was looking for in the primitive example. My linear algebra class only covered vectors, then jumped to matrices and showed us matrix multiplication without covering matrices as if they were a vector of vectors, so if there is a fundamental gap in my knowledge here and what I’ve described above is correct… then we’ve found an issue.

Moving to an n-dimensional proof after figuring out what is wrong (if anything) seems like it’d be straight forward with more rigorous variable and index naming.