I’m working on a problem 2.3a in Shalev-Shwartz/Ben-David’s Machine learning textbook, which states:

An axis aligned rectangle classifier in the plane is a classifier that assigns 1 to a point if and only if it is inside a certain rectangle. Formally, given real numbers $a_1leq b_1, a_2leq b_2,$ define the classifier $h_{(a_1, b_1, a_2, b_2)}$ by

$$ h_{(a_1, b_1, a_2, b_2)}(x_1, x_2) = begin{cases}1&textrm{if $a_1leq x_1leq b_1$ and $a_2leq x_2leq b_2$}\ 0&textrm{otherwise}end{cases} $$

The class of all axis aligned rectangles in the plane is defined as $mathcal{H}_mathrm{rec}^2 = {h_{(a_1, b_1, a_2, b_2)}:textrm{$a_1leq b_1$ and $a_2leq b_2$}}$…rely on realizability assumption.

Let $A$ be an algorithm that returns the smallest rectangle enclosing all positive examples in the training set. Show that $A$ is ERM (empirical risk minimizer).

I don’t understand why $A$ would be ERM, when the domain/instance space is infinite. Below is the counterexample I considered to this claim.

For simplicity, consider the one-dimensional case, where the rectangles would be intervals. Sample instance $xsim mathcal{U}_{(0, 1)}$ from a uniform distribution over $(0, 1)$, where the labeling function $f:(0, 1)to{0, 1}$ is defined by

$$

f(x) = begin{cases}

1 &textrm{x = 0 or x = 1}\

0 &textrm{otherwise,}

end{cases}

$$

then the realizability assumption holds as there exists an $h^*$ whose rectangle described by the singleton ${1}$ has an error set ${0}$ of measure $0$. However, in the case of a particular sample

$$

S = {0, 1/4, 1/2, 3/4, 1},

$$

the smallest rectangle $A$ is the interval $(0, 1)$, and the consequent classifier $h_A$ has an empirical error of $L_S(h_A) = 3/5$. In contrast, the classifier $h^*$ has empirical error $L_S(h^*) = 1/5$, a lower error than the classifier described by $A$, even though it doesn’t contain all positive examples. Therefore $A$ does not minimize empirical error.