algorithms – Probability that two specific elements are in reservoir sample

Let us study the event that $x_a$ and $x_b$ are both in the reservoir at the end of the algorithm, where $k < a < b$.

In order for this to occur four things must happen:

  1. On the $a$th iteration the random number must be $leq k$.
  2. Between the $a$th and $b$th iteration $x_a$ must not be replaced.
  3. On the $b$th iteration the random number generator must be $leq k$ but also not equal the location of $x_a$. WLOG we say that the random number must be $leq k-1$.
  4. For the remaining iterations neither $x_a$ or $x_b$ must be replaced.

This gives us the equation

$$p = frac{k}{a} cdot prod_{i=a+1}^{b-1}(1 – 1/i)cdot frac{k-1}{b}cdot prod_{i=b+1}^n(1-2/i).$$

Left as an exercise to the reader, this solves to
$$p = frac{k}{a} cdot frac{a}{b-1}cdot frac{k-1}{b}cdot frac{(b-1)b}{(n-1)n},$$
$$p = frac{(k-1)k}{(n-1)n}.$$

Thus the probability that $x_a, x_b$ are both in the sample is independent of $a, b$ when $k < a < b$. The case where $k geq a$ is once again left as an exercise.