Is mutual information symmetric even when support size changes?

I have some time series data, $\mathbf{x}_{1:T} = \{ x_1, \dots, x_T \}$, where the observation at time $t$, $X_t$, is a continuous random variable. Let $Y_t$ denote a discrete random variable at time $t$ that, conditioned on the previous $t$ observations, has support over $t$ values. (It is an estimate for each of the previous time points.)

Is this mutual information well-defined?

$$
\text{MI}(Y_t, X_t) = \mathbb{H}\big(p(\color{red}{Y_{t-1}} \mid \mathbf{x}_{1:t-1})\big) - \mathbb{E}_{X_t}\Big[\mathbb{H}\big(p(Y_{t} \mid \mathbf{x}_{1:t-1}, X_t = x_t)\big)\Big].
$$

In words, I want to know how much information I gain about $Y_t$ by observing $X_t$.
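For reference, the standard (symmetric) definition I am comparing against, with every distribution implicitly conditioned on $\mathbf{x}_{1:t-1}$ and an integral over $x_t$ because $X_t$ is continuous, is

$$
\text{MI}(X_t, Y_t)
= \sum_{y} \int p(x_t, y) \log \frac{p(x_t, y)}{p(x_t)\, p(y)} \, dx_t
= \mathbb{H}(Y_t) - \mathbb{E}_{X_t}\big[\mathbb{H}(Y_t \mid X_t = x_t)\big],
$$

and swapping the roles of $X_t$ and $Y_t$ in the last decomposition gives the same quantity.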

I ask for two reasons:

  1. The maximum entropy of a discrete distribution is a function of the size of its support, so I am not sure whether the two terms above are on the same “scale” (see the note after this list). I wonder if I should have $Y_t$ instead of $Y_{t-1}$ (in red above), or if there is another way to handle this (assuming it is a problem at all).

  2. When I approximate $\text{MI}(X_t, Y_t)$ in code, I get a slightly different answer from $\text{MI}(Y_t, X_t)$ (always slightly larger). And sometimes the value for $\text{MI}(Y_t, X_t)$ is negative, even though I know that MI is non-negative. (See the sketch after this list.)
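To make point 1 concrete: the maximum entropy of a discrete variable with $n$ support values is $\log n$, so with the setup above the two entropy terms obey different bounds,

$$
\mathbb{H}\big(p(Y_{t-1} \mid \mathbf{x}_{1:t-1})\big) \le \log(t-1),
\qquad
\mathbb{H}\big(p(Y_{t} \mid \mathbf{x}_{1:t-1}, X_t = x_t)\big) \le \log t,
$$

which is what makes me doubt that their difference behaves like a mutual information.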
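For point 2, here is a stripped-down sketch of the sanity check I have in mind (not my actual estimator; it discretizes $X_t$ into a handful of bins and works from a single joint table). Computed this way, the two orderings agree and the value is non-negative, so I assume the asymmetry and occasional negative values in my real code come from how the individual entropy terms are approximated.

```python
import numpy as np

def entropy(p):
    """Shannon entropy (in nats) of a probability vector; zero entries are skipped."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def mi_from_joint(pxy):
    """MI as H(X) + H(Y) - H(X, Y); symmetric in the two arguments by construction."""
    px = pxy.sum(axis=1)  # marginal over rows (binned X)
    py = pxy.sum(axis=0)  # marginal over columns (Y)
    return entropy(px) + entropy(py) - entropy(pxy.ravel())

def mi_decomposed(pxy):
    """MI as H(Y) - E_X[H(Y | X = x)], mirroring the decomposition in the question."""
    px = pxy.sum(axis=1)
    py = pxy.sum(axis=0)
    expected_cond_entropy = sum(
        px[i] * entropy(pxy[i] / px[i]) for i in range(pxy.shape[0]) if px[i] > 0
    )
    return entropy(py) - expected_cond_entropy

# Toy joint table over 5 bins of X_t (rows) and a 3-valued Y_t (columns).
rng = np.random.default_rng(0)
pxy = rng.random((5, 3))
pxy /= pxy.sum()

print(mi_from_joint(pxy))   # the two functions print the same value
print(mi_decomposed(pxy))   # (up to floating-point error), and it is >= 0

# A plug-in estimate from a finite sample of the same joint is still >= 0,
# because both terms come from one empirical table, but it is biased and
# will not exactly match the exact value above.
idx = rng.choice(pxy.size, size=500, p=pxy.ravel())
counts = np.bincount(idx, minlength=pxy.size).reshape(pxy.shape)
print(mi_from_joint(counts / counts.sum()))
```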