In section **5 VC-Dimension of the Set of Functions** of the paper **Principles of Risk Minimization for Learning Theory** by V. Vapnik, the author says the following:

The theory of uniform convergence of empirical risk to actual risk developed in the 70’s and 80’s includes a description of necessary and sufficient conditions as well as bounds for the rate of convergence (Vapnik, 1982). These bounds, which are independent of the distribution function $P(x, y)$, are based on a quantitative measure of the capacity of the set of functions implemented by the learning machine: the VC-dimension of the set.

For simplicity, these bounds will be discussed here only for the case of binary pattern recognition, for which $y \in \{0, 1\}$ and $f(x, w)$, $w \in W$, is the class of indicator functions. The loss function takes only two values: $L(y, f(x, w)) = 0$ if $y = f(x, w)$ and $L(y, f(x, w)) = 1$ otherwise. In this case, the risk functional (2) is the probability of error, denoted by $p(w)$. The empirical risk functional (3), denoted by $v(w)$, is the frequency of error in the training set.
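Concretely, I understand the setup as something like the following sketch (a hypothetical family of 1-D threshold classifiers; here $w$ is just a scalar threshold, which is my own simplification, not anything from the paper):

```python
import numpy as np

# Hypothetical example: a family of 1-D threshold classifiers
# f(x, w) = 1 if x >= w, else 0, indexed by the parameter w.
# Each fixed w gives one indicator (0/1-valued) function; letting w
# range over all reals sweeps out a whole set of such functions.

def f(x, w):
    """Indicator function: 1 where x >= w, 0 otherwise."""
    return (np.asarray(x) >= w).astype(int)

def empirical_risk(x, y, w):
    """v(w): frequency of error of f(., w) on the training set."""
    return float(np.mean(f(x, w) != y))

x_train = np.array([0.1, 0.4, 0.6, 0.9])
y_train = np.array([0, 0, 1, 1])

print(empirical_risk(x_train, y_train, 0.5))  # 0.0 (threshold separates the sample)
print(empirical_risk(x_train, y_train, 0.0))  # 0.5 (two of four points misclassified)
```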

It is this part that I am confused about:

For simplicity, these bounds will be discussed here only for the case of binary pattern recognition, for which $y \in \{0, 1\}$ and $f(x, w)$, $w \in W$, is the class of indicator functions.

What exactly is meant by “class of indicator functions”, and which one of $y \in \{0, 1\}$ or $f(x, w)$ is this “class of indicator functions”? $f$ is the only function I see here, but I’m not sure how it’s a “class”. Could this be in reference to “equivalence classes”?