I have been reading up on the Count Sketch algorithm, and I stumbled upon the version explained in section 5 of https://www.cs.dartmouth.edu/~ac/Teach/data-streams-lecnotes.pdf. I then tried to solve some of the exercises; however, exercise 5-3 is giving me trouble.

If we let $\hat{f}_a$ denote the estimated frequency associated with a key $a$ and $f_a$ be the actual frequency associated with key $a$, then it can be shown that $E(\hat{f}_a)=f_a$. Furthermore, for $j\in[n]$, we let $\mathbf{f}_{-j}$ be the $(n-1)$-dimensional vector obtained by dropping the $j$th entry of the frequency vector $\mathbf{f}$. It can then be shown that

$$

\text{Var}(\hat{f}_a)=\frac{\|\mathbf{f}_{-a}\|_2^2}{k}

$$

when we use hash functions $h : [n] \rightarrow [k]$. Thus, using Chebyshev’s inequality, it can be shown that

$$

\text{Pr}(|\hat{f}_a - f_a| \geq \epsilon \|\mathbf{f}_{-a}\|_2) \leq \frac{1}{3}

$$

for $k=\frac{3}{\epsilon^2}$.
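To convince myself of this bound, I ran a small simulation of a single Count Sketch row (the helper name `count_sketch_row` is my own; I use fully random hash and sign functions for simplicity, which are in particular pairwise independent, so the variance bound above still applies):

```python
import math
import random

def count_sketch_row(f, k, seed=0):
    """One row of a Count Sketch over the frequency vector f, with k counters.

    h : [n] -> [k] picks a counter for each key, g : [n] -> {-1,+1} a sign;
    the estimate of f[a] is g(a) * C[h(a)], which is unbiased.
    """
    rng = random.Random(seed)
    n = len(f)
    h = [rng.randrange(k) for _ in range(n)]      # bucket of each key
    g = [rng.choice((-1, 1)) for _ in range(n)]   # sign of each key
    C = [0] * k
    for j in range(n):
        C[h[j]] += g[j] * f[j]                    # counters absorb signed frequencies
    return lambda a: g[a] * C[h[a]]

# One heavy key and a light tail; estimate the frequency of a light key a = 1.
f = [100] + [1] * 99
a = 1
eps = 1.0
k = int(3 / eps**2)                               # k = 3/eps^2
tail = math.sqrt(sum(f[j] ** 2 for j in range(len(f)) if j != a))  # ||f_{-a}||_2

# Empirical rate of the bad event |est - f[a]| >= eps * ||f_{-a}||_2 over
# independent rows; by Chebyshev it should be at most 1/3.
trials = 2000
fails = sum(
    abs(count_sketch_row(f, k, seed=s)(a) - f[a]) >= eps * tail
    for s in range(trials)
)
print(fails / trials)
```

The observed failure rate stays comfortably below $1/3$, consistent with the Chebyshev bound.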

In exercise 5-3, $\mathbf{f}_{-a}^{\text{res}(\ell)}$ denotes the $(n-1)$-dimensional vector obtained by setting the $\ell$ largest (by absolute value) entries of $\mathbf{f}_{-a}$ to zero. The exercise then essentially asks the reader to show that

$$

\text{Pr}(|\hat{f}_a - f_a| \geq \epsilon \|\mathbf{f}_{-a}^{\text{res}(\ell)}\|_2) \leq \frac{1}{3}

$$

where $k=\frac{6}{\epsilon^2}$ and $\ell=1/\epsilon^2$. However, I don’t know how to show that the inequality above holds. I have tried using Chebyshev’s inequality, but I still can’t seem to make it work. Any help would be appreciated.