I’d like to know if there is a concentration inequality for the sample covariance matrix that doesn’t assume knowledge of the true mean.

**Background.**

Given a probability distribution $\mu$ on $\mathbb R^d$, the covariance matrix of $\mu$ is defined as follows:

$$\Sigma := \mathbb E \big((x - \bar\mu)(x - \bar\mu)^\top\big)$$

where $x \sim \mu$ and $\bar\mu = \mathbb E(x)$.

If $X = (x_1, \ldots, x_m)$ is an i.i.d. sample drawn from $\mu$, then we can define two estimators:

\begin{align*}
& \hat\Sigma_1 := \frac1m \sum_{i=1}^m (x_i - \bar\mu)(x_i - \bar\mu)^\top, \text{ where } \bar\mu = \mathbb E_{x \sim \mu}(x) \\
& \hat\Sigma_2 := \frac1{m-1} \sum_{i=1}^m (x_i - \bar x)(x_i - \bar x)^\top, \text{ where } \bar x = \frac1m (x_1 + \cdots + x_m)
\end{align*}

They both satisfy $\mathbb E_X \hat\Sigma_1 = \mathbb E_X \hat\Sigma_2 = \Sigma$.

The second estimator $\hat\Sigma_2$ is of interest because $\bar\mu$ is often not known in practice.
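
As a quick sanity check of the two definitions, here is a minimal NumPy sketch. The Gaussian distribution, the particular $\bar\mu$ and $\Sigma$, the sample size, and the helper names `sigma_hat_1` and `sigma_hat_2` are arbitrary choices made for illustration only; they are not part of the question. It averages both estimators over many independent samples and compares the averages with $\Sigma$ in spectral norm.

```python
import numpy as np

def sigma_hat_1(X, mu_bar):
    """Known-mean estimator: (1/m) * sum_i (x_i - mu_bar)(x_i - mu_bar)^T."""
    Y = X - mu_bar
    return Y.T @ Y / X.shape[0]

def sigma_hat_2(X):
    """Unknown-mean estimator: (1/(m-1)) * sum_i (x_i - x_bar)(x_i - x_bar)^T."""
    Z = X - X.mean(axis=0)
    return Z.T @ Z / (X.shape[0] - 1)

# Illustrative (assumed) distribution: a 2-dimensional Gaussian.
rng = np.random.default_rng(0)
mu_bar = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
m, trials = 10, 5000

# Monte Carlo averages of both estimators over independent samples of size m.
avg1 = np.zeros((2, 2))
avg2 = np.zeros((2, 2))
for _ in range(trials):
    X = rng.multivariate_normal(mu_bar, Sigma, size=m)
    avg1 += sigma_hat_1(X, mu_bar) / trials
    avg2 += sigma_hat_2(X) / trials

# Spectral-norm distance between each average and the true covariance.
err1 = np.linalg.norm(avg1 - Sigma, 2)
err2 = np.linalg.norm(avg2 - Sigma, 2)
print(err1, err2)
```

Both errors should be small, in line with unbiasedness. Note that `sigma_hat_2(X)` is the same quantity that `np.cov(X, rowvar=False)` computes with its default normalization by $m-1$.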

**Question.**

I’m interested in the concentration of $\hat\Sigma_2$ around $\Sigma$ as $m \rightarrow \infty$. More precisely, given a number $t > 0$, I’d like to know whether there exist a constant $A > 0$ and a rate $\alpha \in (0,1)$, both depending on $\mu$ and $t$, such that

$$\text{Prob}(\| \Sigma - \hat\Sigma_2 \| \ge t) \le A \cdot \alpha^m$$

For the difference $\|\Sigma - \hat\Sigma_1\|$, such a bound can be obtained using the matrix Bernstein inequality. However, I’m less sure about $\|\Sigma - \hat\Sigma_2\|$. I have an idea, which is to use the fact that:

$$\hat\Sigma_1 - \hat\Sigma_2 = \frac1{m(m-1)} \sum_{i \neq j} (x_i-\bar\mu)(x_j-\bar\mu)^\top$$

which follows from:

\begin{align*}
\hat\Sigma_2 =& \frac1m \sum_i x_i x_i^\top - \frac1{m(m-1)} \sum_{i \neq j} x_i x_j^\top \\
=& \frac1m \sum_i (x_i-\bar\mu)(x_i-\bar\mu)^\top - \frac1{m(m-1)} \sum_{i \neq j} (x_i-\bar\mu)(x_j-\bar\mu)^\top \\
=& \hat\Sigma_1 - \frac1{m(m-1)} \sum_{i \neq j} (x_i-\bar\mu)(x_j-\bar\mu)^\top
\end{align*}
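
The identity above can be checked numerically. This is only a sketch under assumed settings: the Gaussian sample, the dimension `d`, the sample size `m`, and the mean vector are arbitrary illustrative choices. The cross term is computed by a brute-force double loop over pairs $i \neq j$, exactly as written in the formula.

```python
import numpy as np

rng = np.random.default_rng(1)
m, d = 50, 3
mu_bar = np.array([0.5, -1.0, 2.0])      # assumed true mean
X = rng.normal(size=(m, d)) + mu_bar     # i.i.d. sample with mean mu_bar

# Known-mean estimator Sigma_hat_1.
Y = X - mu_bar
sigma1 = Y.T @ Y / m

# Unknown-mean estimator Sigma_hat_2.
Z = X - X.mean(axis=0)
sigma2 = Z.T @ Z / (m - 1)

# Cross term: (1 / (m(m-1))) * sum over i != j of (x_i - mu_bar)(x_j - mu_bar)^T.
cross = np.zeros((d, d))
for i in range(m):
    for j in range(m):
        if i != j:
            cross += np.outer(Y[i], Y[j])
cross /= m * (m - 1)

# The two sides of the identity should agree up to floating-point error.
identity_holds = np.allclose(sigma1 - sigma2, cross)
print(identity_holds)
```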

But now I’m not sure how to control the sum of the quantities $(x_i-\bar\mu)(x_j-\bar\mu)^\top$, which are *not independent*.

This should be a fairly standard question with a standard answer, but I couldn’t find one. The only answer to a similar question did not address my case; it treated $\hat\Sigma_1$ instead.