# pr.probability – How to estimate a mapping between two empirical cumulative distribution functions?

I have a question that is somewhat a mix of probability/statistics and operator theory. The goal is to learn/estimate the mapping between two empirical cumulative distribution functions (ECDFs), so that in future we can use this mapping to make estimates of other ECDFs.

Suppose we measure the temperature in room $$R_1$$ every day at time $$t_0$$, and we do this for $$n=10^6$$ days. Call these measurements $$X_1^{t_0},X_2^{t_0},\dots,X_n^{t_0}$$; we assume they are iid with common CDF $$F^{t_0}$$. Using these measurements, we can construct the empirical distribution function $$\hat F_n^{t_0}$$ associated with the room temperature at time $$t_0$$.
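For concreteness, here is a minimal sketch of how such an ECDF can be built with NumPy. The helper name `ecdf` and the synthetic normal sample are assumptions made for illustration only; they stand in for the real temperature readings.

```python
import numpy as np

def ecdf(sample):
    """Return a function x -> fraction of sample values that are <= x."""
    sorted_sample = np.sort(np.asarray(sample, dtype=float))
    n = sorted_sample.size

    def F_hat(x):
        # side="right" counts how many sorted values are <= x
        return np.searchsorted(sorted_sample, x, side="right") / n

    return F_hat

# Synthetic stand-in for the t_0 temperatures (assumed, not real data)
rng = np.random.default_rng(0)
x_t0 = rng.normal(loc=25.0, scale=2.0, size=1000)
F_hat_t0 = ecdf(x_t0)

# Evaluate the ECDF at a few temperatures
values = F_hat_t0(np.array([20.0, 25.0, 30.0]))
```

The returned function is a right-continuous step function that jumps by $$1/n$$ at each observation.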

Now suppose that, instead of just taking a measurement every day at time $$t_0$$, we do the following. Every day we (i) take a measurement at $$t_0$$, (ii) turn on the air conditioning, (iii) let one hour pass, and (iv) take a new measurement at time $$t_1$$. Call these new measurements $$X_1^{t_1},X_2^{t_1},\dots,X_n^{t_1}$$; we again assume they are iid among themselves (but of course not with respect to the $$t_0$$ measurements) with common CDF $$F^{t_1}$$. Again we can construct the empirical distribution function $$\hat F_n^{t_1}$$ associated with the room temperature at time $$t_1$$.

Now, based on the measurements, we have an idea of how the ECDF $$\hat F_n^{t_0}$$ gets mapped to the ECDF $$\hat F_n^{t_1}$$ by the act of applying air conditioning for one hour between times $$t_0$$ and $$t_1$$. Let $$T$$ be the operator that maps $$\hat F_n^{t_0}$$ to $$\hat F_n^{t_1}$$:
$$T(\hat F_n^{t_0})(x) = \hat F_n^{t_1}(x).$$
$$T$$ is an $$n\times n$$ matrix, since the ECDFs can be considered $$n$$-dimensional vectors.

Now suppose we consider a new room $$R_2$$ that is very similar to the room $$R_1$$ considered above. Let us take measurements in room $$R_2$$ every day at time $$t_0$$ for $$n=10^6$$ days. Call these measurements $$Y_1^{t_0},Y_2^{t_0},\dots,Y_n^{t_0}$$; we assume they are iid with common CDF $$G^{t_0}$$. Let $$\hat G_n^{t_0}$$ be the ECDF associated with these measurements.

Now, since we have an idea of how $$\hat F_n^{t_0}$$ maps to $$\hat F_n^{t_1}$$ in room $$R_1$$, can we use this information to estimate/predict the ECDF $$\tilde G_n^{t_1}$$ that we would expect to find if we took measurements in room $$R_2$$ at time $$t_1$$, after air conditioning has been applied for one hour starting at time $$t_0$$?

If we knew $$T$$, we could make the estimate
$$\tilde G_n^{t_1}(x) = T(\hat G_n^{t_0})(x).$$
But how can we find a reasonable representation of the matrix $$T$$, given that there are infinitely many possibilities for it? All we know is that it maps $$\hat F_n^{t_0}$$ to $$\hat F_n^{t_1}$$ in room $$R_1$$.
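The non-uniqueness is easy to see on a toy example. The sketch below assumes that the ECDFs are evaluated on a small common grid (5 points rather than $$n$$) and that $$T$$ is linear; the vectors `f0`, `f1` and `g0` are made-up numbers, not measurements. Two different matrices both send `f0` to `f1`, yet they disagree on a nearby input.

```python
import numpy as np

# Toy ECDF values on a common 5-point grid (made-up numbers)
f0 = np.array([0.1, 0.3, 0.6, 0.9, 1.0])  # stands in for the t_0 ECDF in R_1
f1 = np.array([0.2, 0.5, 0.8, 1.0, 1.0])  # stands in for the t_1 ECDF in R_1

# Solution A: the rank-one matrix that sends f0 to f1
T_a = np.outer(f1, f0) / (f0 @ f0)

# Solution B: add any matrix N with N @ f0 == 0
rng = np.random.default_rng(0)
M = rng.normal(size=(5, 5))
N = M - np.outer(M @ f0, f0) / (f0 @ f0)  # remove each row's component along f0
T_b = T_a + N

# Both matrices satisfy the single constraint we actually observe
ok_a = np.allclose(T_a @ f0, f1)
ok_b = np.allclose(T_b @ f0, f1)

# A hypothetical ECDF for room R_2, close to f0 but not equal to it
g0 = np.array([0.05, 0.35, 0.55, 0.95, 1.0])
pred_a = T_a @ g0
pred_b = T_b @ g0  # differs from pred_a and need not even be a valid CDF
```

One observed pair gives $$n$$ linear equations for the $$n^2$$ entries of $$T$$, so some additional structure has to be imposed before a prediction for $$R_2$$ is determined.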

One constraint we could impose is some form of continuity, e.g., if $$\hat G_n^{t_0}$$ is close to $$\hat F_n^{t_0}$$, then $$\tilde G_n^{t_1} = T(\hat G_n^{t_0})$$ should be close to $$\hat F_n^{t_1}$$. But even that leaves infinitely many representations of the operator $$T$$. So is there any reasonable way to estimate the effect on room $$R_2$$ of applying air conditioning for one hour starting at time $$t_0$$? That is, is there a reasonable way to estimate $$T$$?

If it is not completely impossible, does bootstrapping seem like a way we could estimate $$T$$? I was thinking that in the ‘bootstrap world’ we could create many bootstrap ECDFs $$F_n^{t_0,(i)}$$ and $$F_n^{t_1,(i)}$$. The idea would be that this allows us to learn how the mapping between ECDFs occurs in lots of ‘bootstrap rooms’ rather than just the single room $$R_1$$ that we have in the real world. Could we then use these bootstrap ECDFs to estimate $$T^*$$ in the bootstrap world, which ultimately could serve as our estimate of $$T$$ in the real world? It seems, though, that the same problem exists in the bootstrap world: on each bootstrap sample there are infinitely many ways $$T^*$$ could map $$F_n^{t_0,(i)}$$ to $$F_n^{t_1,(i)}$$.
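A minimal sketch of what generating such bootstrap pairs could look like is given below. It assumes that days are resampled as pairs (the same resampled days are used for $$t_0$$ and $$t_1$$) and that the ECDFs are evaluated on a fixed grid; the data, the grid, and the helper `ecdf_on_grid` are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500  # far fewer days than 10^6, to keep the sketch fast

# Synthetic paired readings from the same days (made-up cooling effect)
x_t0 = rng.normal(25.0, 2.0, size=n)
x_t1 = 0.5 * x_t0 + 9.0 + rng.normal(0.0, 0.5, size=n)

grid = np.linspace(10.0, 35.0, 101)  # common temperature grid (assumed)

def ecdf_on_grid(sample, grid):
    """ECDF of `sample` evaluated at every point of `grid`."""
    s = np.sort(sample)
    return np.searchsorted(s, grid, side="right") / s.size

B = 200  # number of bootstrap 'rooms'
boot_F0 = np.empty((B, grid.size))
boot_F1 = np.empty((B, grid.size))
for b in range(B):
    idx = rng.integers(0, n, size=n)  # resample days with replacement
    boot_F0[b] = ecdf_on_grid(x_t0[idx], grid)
    boot_F1[b] = ecdf_on_grid(x_t1[idx], grid)

# Row b of (boot_F0, boot_F1) is one bootstrap input/output ECDF pair
```

Because every resample is drawn from the same observed days, the resulting pairs all scatter around the original pair of ECDFs, so they mostly reflect sampling variability in room $$R_1$$.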

I also thought of optimal mass transport, but I only really know the general idea, and I have no idea whether it could be of use in my problem.
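To make the transport idea concrete: on the real line, when $$F^{t_0}$$ is continuous, the increasing map $$(F^{t_1})^{-1}\circ F^{t_0}$$ pushes $$F^{t_0}$$ forward to $$F^{t_1}$$ and is the optimal transport map for the squared-distance cost. Note that it acts on temperature values rather than on ECDF vectors. The sketch below estimates it from samples and applies it to room $$R_2$$ under the strong, untested assumption that the same map is valid there. The function `quantile_map` and all data are hypothetical.

```python
import numpy as np

def quantile_map(x_src, x_dst):
    """Empirical increasing map sending the x_src distribution onto x_dst's."""
    src_sorted = np.sort(np.asarray(x_src, dtype=float))
    dst_sorted = np.sort(np.asarray(x_dst, dtype=float))
    # Mid-point plotting positions for the two samples
    p_src = (np.arange(src_sorted.size) + 0.5) / src_sorted.size
    p_dst = (np.arange(dst_sorted.size) + 0.5) / dst_sorted.size

    def T(y):
        u = np.interp(y, src_sorted, p_src)      # interpolated source CDF at y
        return np.interp(u, p_dst, dst_sorted)   # interpolated target quantile
    return T

rng = np.random.default_rng(0)
# Synthetic room R_1 data before and after cooling (made-up effect)
x_t0 = rng.normal(25.0, 2.0, size=2000)
x_t1 = 0.5 * x_t0 + 9.0 + rng.normal(0.0, 0.5, size=2000)
# Synthetic room R_2 data at t_0
y_t0 = rng.normal(26.0, 2.5, size=2000)

T_hat = quantile_map(x_t0, x_t1)
# Assumption: the map learned in R_1 also describes R_2.
# Inputs outside the observed R_1 range are clamped by np.interp.
y_t1_pred = T_hat(y_t0)
```

The ECDF of `y_t1_pred` would then play the role of $$\tilde G_n^{t_1}$$. This choice singles out one map by requiring it to be monotone in temperature; whether that restriction is physically reasonable for the rooms is a separate question.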
