I’m looking at a couple of short articles about whitening transformations:

**Background**

https://theclevermachine.wordpress.com/2013/03/30/the-statistical-whitening-transform/

and

https://andrewcharlesjones.github.io/posts/2020/05/whitening/

In both articles, there comes a step where, given a centered data matrix $X$, we compute its covariance

$$\Sigma = XX^T$$

and come up with a matrix $W$ that satisfies

$$WW^T = \Sigma^{-1}$$

The idea now is that if we transform our data $X$ into $Y = WX$, we can show that

$$\operatorname{cov}(Y) = WX (WX)^T$$

$$= WXX^TW^T$$

$$= W\Sigma W^T$$
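To make these quantities concrete, here is a minimal numerical sketch of the steps above. It assumes NumPy, synthetic data, and one particular choice of $W$ (the symmetric inverse square root of the covariance, built from an eigendecomposition); none of these choices come from the articles, and the variable names are only illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 3 features (rows) by 500 samples (columns),
# correlated through a random mixing matrix A.
A = rng.normal(size=(3, 3))
X = A @ rng.normal(size=(3, 500))
X = X - X.mean(axis=1, keepdims=True)  # center each feature

# Unnormalized covariance, matching the definition used above.
Sigma = X @ X.T

# Assumed choice of W: symmetric inverse square root,
# W = U diag(1 / sqrt(lam)) U^T, from the eigendecomposition of Sigma.
lam, U = np.linalg.eigh(Sigma)
W = U @ np.diag(lam ** -0.5) @ U.T

# Transform the data and recompute the covariance.
Y = W @ X
cov_Y = Y @ Y.T

print("W W^T matches inv(Sigma):", np.allclose(W @ W.T, np.linalg.inv(Sigma)))
print("cov(Y) matches identity: ", np.allclose(cov_Y, np.eye(3)))
```

Note that this particular $W$ is symmetric, which may or may not be what either article intends.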

**Issue**

All of this seems reasonable so far, but the authors of both referenced articles make the following leap:

They claim that the expression above reduces to $I$. The first article,

https://theclevermachine.wordpress.com/2013/03/30/the-statistical-whitening-transform/,

shows some of the work:

It states that $W\Sigma W^T = WW^T\Sigma$, which would then obviously reduce to $I$, since $WW^T = \Sigma^{-1}$.

Why is it OK to swap the order of $W^T$ and $\Sigma$ in the above expression?
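One way to probe this numerically is to build two different matrices that both satisfy $WW^T = \Sigma^{-1}$ and check, for each, whether $W^T$ and $\Sigma$ commute and whether each side of the claimed equality gives $I$. The sketch below assumes NumPy and synthetic data; the two candidate choices of $W$ (a symmetric inverse square root and a Cholesky factor of $\Sigma^{-1}$) are my own picks for the experiment, not something taken from the articles.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical correlated, centered data: 3 features (rows) by 500 samples (columns).
X = rng.normal(size=(3, 3)) @ rng.normal(size=(3, 500))
X = X - X.mean(axis=1, keepdims=True)

Sigma = X @ X.T
Sigma_inv = np.linalg.inv(Sigma)
Sigma_inv = (Sigma_inv + Sigma_inv.T) / 2  # remove rounding asymmetry
eye = np.eye(3)

# Candidate 1 (assumption): symmetric inverse square root of Sigma.
lam, U = np.linalg.eigh(Sigma)
W_sym = U @ np.diag(lam ** -0.5) @ U.T

# Candidate 2 (assumption): lower-triangular Cholesky factor L of Sigma^{-1},
# so that L L^T = Sigma^{-1}.
W_chol = np.linalg.cholesky(Sigma_inv)

for name, W in [("symmetric", W_sym), ("cholesky", W_chol)]:
    print(name)
    print("  W W^T == inv(Sigma):    ", np.allclose(W @ W.T, Sigma_inv))
    print("  W^T Sigma == Sigma W^T: ", np.allclose(W.T @ Sigma, Sigma @ W.T))
    print("  W Sigma W^T == I:       ", np.allclose(W @ Sigma @ W.T, eye))
    print("  W W^T Sigma == I:       ", np.allclose(W @ W.T @ Sigma, eye))
```

If the last three checks come out differently for the two candidates, that would suggest the swap relies on extra structure in $W$ (such as symmetry) rather than on $WW^T = \Sigma^{-1}$ alone.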