# Is mutual information symmetric even when support size changes?

I have some time series data, $\mathbf{x}_{1:T} = \{ x_1, \dots, x_T \}$, where the observation at time $t$, $X_t$, is a continuous random variable. Let $Y_t$ denote a discrete random variable at time $t$ that, conditioned on the previous $t$ observations, has support over $t$ values. (Its distribution is an estimate over the previous time points, with one value per time point.)

Is this mutual information well-defined?

$$\text{MI}(Y_t, X_t) = \mathbb{H}\big(p(\color{red}{Y_{t-1}} \mid \mathbf{x}_{1:t-1})\big) - \mathbb{E}_{X_t}\Big[\mathbb{H}\big(p(Y_{t} \mid \mathbf{x}_{1:t-1}, X_t = x_t)\big)\Big].$$

In words, I want to know how much information I gain about $Y_t$ by observing $X_t$.
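To make the two terms concrete, here is a minimal sketch that assumes $X_t$ has been discretized into a few bins, so that the joint $p(x_t, y_t \mid \mathbf{x}_{1:t-1})$ becomes a small table. Under that assumption the first term is the entropy of the $Y_t$-marginal of the table, and the second is the average, over the predictive distribution of $X_t$, of the entropy of the posterior $p(Y_t \mid \mathbf{x}_{1:t-1}, X_t = x_t)$. The table `joint` and the helper names are hypothetical, chosen only for illustration.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    return float(-np.sum(nz * np.log(nz)))

def mi_from_joint(joint):
    """MI(Y; X) = H(Y) - E_X[H(Y | X)] for a table joint[x, y] that sums to 1.

    Assumes every X bin (row) has positive probability.
    """
    joint = np.asarray(joint, dtype=float)
    p_x = joint.sum(axis=1)                 # predictive distribution of the X_t bins
    p_y = joint.sum(axis=0)                 # first term: Y_t-marginal of the SAME joint
    cond_y_given_x = joint / p_x[:, None]   # each row is a posterior p(Y_t | X_t = x)
    expected_post = sum(px * entropy(row) for px, row in zip(p_x, cond_y_given_x))
    return entropy(p_y) - expected_post

# Hypothetical joint: 3 bins for X_t (rows), t = 4 support values for Y_t (columns).
joint = np.array([
    [0.10, 0.05, 0.05, 0.00],
    [0.05, 0.20, 0.05, 0.10],
    [0.00, 0.05, 0.15, 0.20],
])
print(mi_from_joint(joint))
```

Both entropies here are taken over the same $t$ support values of $Y_t$, and the first term is the marginal of the distribution that the second term averages over.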

1. The maximum entropy of a discrete distribution is a function of the size of its support (it is $\log n$ for $n$ support values, attained by the uniform distribution). So I am not sure whether the left and right terms above are on the same “scale”. I wonder if I should have $Y_t$ instead of $Y_{t-1}$ (in red above), or if there is another way to handle this (assuming it is a problem).
2. When I approximate $\text{MI}(X_t, Y_t)$ (with the arguments swapped) using some code, I get a slightly different answer, which is always slightly larger. Also, the value I compute for $\text{MI}(Y_t, X_t)$ is sometimes negative, even though I know that MI is non-negative.
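The two points above can be illustrated with a minimal sketch that assumes $X_t$ is discretized, so everything reduces to small probability tables; all arrays and helper names below are hypothetical. It checks that the maximum entropy over $n$ values is $\log n$ and that appending a zero-probability value leaves the entropy unchanged; that $\mathbb{H}(Y) - \mathbb{E}_X[\mathbb{H}(Y \mid X)]$ and $\mathbb{H}(X) - \mathbb{E}_Y[\mathbb{H}(X \mid Y)]$ agree when both come from one joint table; and that using, as the first term, the entropy of a distribution that is not the $Y$-marginal of that joint (here `prev`, a three-value stand-in for $p(Y_{t-1} \mid \mathbf{x}_{1:t-1})$) can give a negative number.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    return float(-np.sum(nz * np.log(nz)))

def expected_cond_entropy(joint):
    """E_X[H(Y | X)] for a table joint[x, y]; assumes every row has positive mass."""
    p_x = joint.sum(axis=1)
    return sum(px * entropy(row / px) for px, row in zip(p_x, joint))

def mi_y_given_x(joint):
    """H(Y) - E_X[H(Y | X)], with H(Y) taken from the Y-marginal of the same joint."""
    return entropy(joint.sum(axis=0)) - expected_cond_entropy(joint)

def mi_x_given_y(joint):
    """H(X) - E_Y[H(X | Y)]: the same quantity with the roles of X and Y swapped."""
    return mi_y_given_x(joint.T)

# Point 1: over n support values the maximum entropy is log(n), reached by the uniform.
for n in (3, 4):
    assert np.isclose(entropy(np.full(n, 1.0 / n)), np.log(n))

# Appending a zero-probability support value does not change the entropy.
prev = np.array([0.5, 0.3, 0.2])   # hypothetical p(Y_{t-1} | x_{1:t-1}) over 3 values
assert np.isclose(entropy(prev), entropy(np.append(prev, 0.0)))

# Point 2: computed from one joint table, both orderings agree and are >= 0.
joint = np.array([                  # hypothetical joint: 3 X_t bins, 4 Y_t values
    [0.10, 0.05, 0.05, 0.00],
    [0.05, 0.20, 0.05, 0.10],
    [0.00, 0.05, 0.15, 0.20],
])
print(mi_y_given_x(joint), mi_x_given_y(joint))

# A first term that is NOT the Y-marginal of this joint breaks the identity,
# so the difference is no longer an MI and can come out negative.
mismatched = entropy(prev) - expected_cond_entropy(joint)
print(mismatched)
```

This only shows one way a negative value can arise under the stated assumptions; it does not diagnose any particular implementation.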