## Probability Distributions – Arguing that two random variables are not independent although their covariance is zero

Consider two independent random variables $$X \sim \mathrm{Unif}(-1, 1)$$ and $$Z \sim \mathrm{Unif}(0, \frac{1}{10})$$. Let $$Y = X^2 + Z$$.

The first question was to show that the conditional density of $$Y$$ given $$X = x$$ is $$\mathrm{Unif}(x^2, x^2 + \frac{1}{10})$$, which I did.
So
$$f_{Y|X}(y|x) = 10\, I(x^2 \leq y \leq x^2 + \frac{1}{10}).$$

The next question was to calculate the joint density of $$X$$ and $$Y$$, which I computed as
$$f_{X,Y}(x, y) = 5\, I(-1 < x < 1)\, I(x^2 \leq y \leq x^2 + \frac{1}{10}).$$

Now I have to argue that $$X$$ and $$Y$$ are not independent, but I cannot see how the joint density helps to prove that. Furthermore, I have to show that the covariance of $$X$$ and $$Y$$ is zero. I have found a way to do it with iterated expectation, but we did not learn that in class, so I think there is another way.
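A numerical sanity check can complement the analytic argument. The sketch below uses only the Python standard library; the sample size, the seed, the cut-offs 0.1 and 0.5, and all function names are illustrative choices and not part of the exercise. It estimates the covariance and compares $$P(Y \leq 0.1)$$ with $$P(Y \leq 0.1 \mid |X| > 0.5)$$. Because $$Y \geq X^2$$, the conditional probability is exactly zero when $$|X| > 0.5$$, while the unconditional one is positive, which is incompatible with independence.

```python
import random

def simulate(n_samples=200_000, seed=0):
    """Draw (X, Y) pairs with X ~ Unif(-1, 1), Z ~ Unif(0, 1/10), Y = X^2 + Z."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_samples):
        x = rng.uniform(-1.0, 1.0)
        z = rng.uniform(0.0, 0.1)
        pairs.append((x, x * x + z))
    return pairs

def sample_covariance(pairs):
    """Unbiased sample covariance of the two coordinates."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in pairs) / (n - 1)

def dependence_check(pairs, y_cut=0.1, x_cut=0.5):
    """Empirical P(Y <= y_cut) and P(Y <= y_cut | |X| > x_cut)."""
    p_y = sum(y <= y_cut for _, y in pairs) / len(pairs)
    cond = [(x, y) for x, y in pairs if abs(x) > x_cut]
    p_y_given_x = sum(y <= y_cut for _, y in cond) / len(cond)
    return p_y, p_y_given_x

pairs = simulate()
print("sample covariance:", sample_covariance(pairs))
print("P(Y <= 0.1), P(Y <= 0.1 | |X| > 0.5):", dependence_check(pairs))
```

For the covariance itself, one route that avoids iterated expectation is to expand $$\mathrm{Cov}(X, Y) = E[X^3] + E[XZ] - E[X]E[Y]$$ and note that $$E[X] = E[X^3] = 0$$ by symmetry, while $$E[XZ] = E[X]E[Z] = 0$$ by independence of $$X$$ and $$Z$$.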

## Visualizing a probability space with two events in Wolfram?

Say that for two events $$A$$ and $$B$$ we have the data $$P(A)$$, $$P(B)$$, $$P(A \mid B)$$ and $$P(B \mid A)$$. Is there any way to generate something like a "probability Venn diagram" from this information? What I have in mind is the kind of diagram that can be found online.

If there are other ways to visualize this in Wolfram, those approaches would be of interest to me as well.
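Whatever tool draws the picture, the four given numbers first have to be turned into region probabilities. Below is a minimal sketch of that arithmetic, written in Python rather than Wolfram Language; the function name `venn_regions` and the example inputs are hypothetical. The overlap is $$P(A \cap B) = P(A \mid B)P(B)$$, which must agree with $$P(B \mid A)P(A)$$, and the remaining regions follow by subtraction.

```python
import math

def venn_regions(p_a, p_b, p_a_given_b, p_b_given_a, tol=1e-9):
    """Turn P(A), P(B), P(A|B), P(B|A) into the four region
    probabilities of a two-set Venn diagram."""
    both = p_a_given_b * p_b
    # Consistency: P(A|B) P(B) and P(B|A) P(A) are both P(A and B).
    if not math.isclose(both, p_b_given_a * p_a, abs_tol=tol):
        raise ValueError("inconsistent inputs: P(A|B)P(B) != P(B|A)P(A)")
    regions = {
        "A only": p_a - both,
        "B only": p_b - both,
        "A and B": both,
        "neither": 1.0 - (p_a + p_b - both),
    }
    if any(v < -tol for v in regions.values()):
        raise ValueError("inputs do not define a valid probability space")
    return regions

# Hypothetical example: P(A) = 0.5, P(B) = 0.4, P(A|B) = 0.5, P(B|A) = 0.4
print(venn_regions(0.5, 0.4, 0.5, 0.4))
```

The four values can then be used as areas (or labels) for the regions of the diagram in any plotting environment. Note that the inputs are redundant: any three of them determine the fourth whenever the given probabilities are nonzero.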

## Optimization – Maximizing the log marginal likelihood

Given the random vector $$y = A^{1/2} \epsilon$$ with $$\epsilon \sim N(0, I)$$, where $$A$$ is a positive definite covariance matrix, I want to maximize the log marginal likelihood
$$\log p(y \mid X) = -\frac{1}{2} y^T \Sigma^{-1} y - \frac{1}{2} \log|\Sigma| - \frac{n}{2} \log 2\pi$$

with respect to $$\Sigma$$. Setting the gradient to $$0$$, I get:

$$\Sigma = yy^T = A^{1/2} \epsilon \epsilon^T (A^{1/2})^T$$

which is clearly not positive definite, because it is the outer product of the vector $$y$$ with itself, so it has rank 1. So I should impose the constraint $$\det(\Sigma) > 0$$ and solve the constrained optimization problem with Lagrange multipliers. I know that $$E(\epsilon \epsilon^T) = I$$, so on average, for a generic observation of $$yy^T$$, I will have $$\epsilon \epsilon^T = I$$, and therefore $$\Sigma = A^{1/2} (A^{1/2})^T$$ maximizes the log likelihood of a generic observation subject to $$\det(\Sigma) > 0$$. Is there an elegant way of saying that while avoiding the Lagrangian maximization with the constraint $$\det(\Sigma) > 0$$?
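One way to make the "on average" argument precise is sketched here, under the assumption that the quantity of interest is the expected log-likelihood over $$y \sim N(0, A)$$ rather than the likelihood of a single draw. Using $$y^T \Sigma^{-1} y = \operatorname{tr}(\Sigma^{-1} yy^T)$$ and $$E(yy^T) = A$$,

$$E\left(\log p(y \mid X)\right) = -\frac{1}{2} \operatorname{tr}(\Sigma^{-1} A) - \frac{1}{2} \log|\Sigma| - \frac{n}{2} \log 2\pi.$$

Setting the gradient with respect to $$\Sigma$$ to zero gives

$$\frac{1}{2} \Sigma^{-1} A \Sigma^{-1} - \frac{1}{2} \Sigma^{-1} = 0 \quad \Longrightarrow \quad \Sigma = A,$$

which is positive definite by assumption, so no explicit constraint is needed in this formulation. With a single observation, the unconstrained stationary point is instead the rank-one matrix $$yy^T$$ discussed above.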

## Research – Confidence intervals, confidence levels and probability of simple tests

It seems like a simple problem, but I cannot figure it out.

Let's say I'd like to know whether it makes sense to implement a new feature, that is, whether we need to focus on that feature or not. Suppose there is no way to survey the users.
The feature is simple, such as "webcam for e-commerce for users who pay for a premium account".

Specifically, I have 1500 premium users. I can say, "The feature counts as used if at least 75% of customers use it". Great! We want to run a fake-door test where we implement only the button for the webcam. If a user clicks on it, we show a message saying that we are working on the feature, ask them to stay with us, or whatever (I know, fake doors are not the best method, but that's not the point). I will "test" it for 14 days. In those 14 days, 350 customers come to my website and see the feature; 265 of them click on the button.

What can I say about this feature? Apparently I can say, "Yes, we have to implement it because 75% of users use this feature" (75% of 350 is 262.5 < 265), so H0 (at least 75% use this feature) seems to be OK. But that is not the whole truth, because there can be HUGE errors (I tested ONLY 23% of the customers).

What I'm trying to achieve is:
I want to say, "With 95% confidence, at least 75% of customers will use this feature, so we should implement it."

I am lost among confidence intervals, confidence levels, sample sizes, etc. Can someone help me build up the confidence statement step by step and explain what I can calculate from these numbers (1500 premium users in total, 350 users have seen the feature, 265 users have used it)?
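As a starting point, the numbers above can be plugged into a textbook normal-approximation interval for a proportion. The sketch below is one possible calculation, not the only valid one. It assumes that the 350 visitors behave like a simple random sample drawn without replacement from the 1500 premium users (which a 14-day window does not guarantee), and it applies a finite population correction for that reason. The function name and the returned keys are illustrative.

```python
import math
from statistics import NormalDist

def proportion_summary(successes, n, population, p0=0.75, confidence=0.95):
    """Normal-approximation interval and one-sided test for a sample proportion,
    with a finite population correction (sampling without replacement assumed)."""
    p_hat = successes / n
    # Finite population correction: shrinks the standard error when the
    # sample is a sizeable fraction of the population.
    fpc = math.sqrt((population - n) / (population - 1))
    se = math.sqrt(p_hat * (1.0 - p_hat) / n) * fpc
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    interval = (p_hat - z * se, p_hat + z * se)
    # One-sided test of H0: p <= p0 against H1: p > p0, using the null variance.
    se0 = math.sqrt(p0 * (1.0 - p0) / n) * fpc
    z_stat = (p_hat - p0) / se0
    p_value = 1.0 - NormalDist().cdf(z_stat)
    return {"p_hat": p_hat, "interval": interval, "z": z_stat, "p_value": p_value}

result = proportion_summary(successes=265, n=350, population=1500)
print(result)
```

With these inputs the observed share is about 75.7%, but the lower end of the two-sided 95% interval is roughly 72%, below the 75% target. Under the stated assumptions the data are therefore compatible with the target without establishing "at least 75%" at 95% confidence. To support that statement, the claim "at least 75%" would be treated as the alternative hypothesis, as in the one-sided test in the code.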

## Probability – What exactly is an Ito integral?

When the Ito integral is constructed for simple processes, there is a natural intuition about "betting strategies" that can only change at discrete times.

If we now construct the Ito integral for a general continuous integrand, we approximate the integrand in $$L^2$$ by simple processes and then take the $$L^2$$ limit of the corresponding integrals.

What does that mean? What exactly is an $$L^2$$ limit of simple integrals? And why $$L^2$$ in particular?
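A small simulation can make "$$L^2$$ limit" concrete for one specific integrand. The sketch below is illustrative only: the choice of integrand $$W$$ itself, the grid sizes, the number of paths, and the function names are assumptions of the example. It approximates $$\int_0^T W\, dW$$ by left-endpoint sums over $$n$$ steps, which are integrals of simple processes, and estimates the mean-square distance to the known closed form $$(W_T^2 - T)/2$$. For this integrand the distance can be computed by hand as $$T^2/(2n)$$, so it should shrink as the grid is refined.

```python
import math
import random

def left_point_error(n_steps, rng, horizon=1.0):
    """Simulate one Brownian path on [0, horizon] and return the difference
    between the left-endpoint sum  sum_i W(t_i) (W(t_{i+1}) - W(t_i))  and the
    closed form (W(T)^2 - T) / 2 of the Ito integral of W against itself."""
    dt = horizon / n_steps
    w = 0.0
    simple_integral = 0.0
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))
        simple_integral += w * dw  # integrand frozen at the left endpoint
        w += dw
    return simple_integral - (w * w - horizon) / 2.0

def mean_square_error(n_steps, n_paths=1000, seed=0):
    """Monte Carlo estimate of E[(I_n - I)^2], the squared L^2 distance."""
    rng = random.Random(seed)
    total = sum(left_point_error(n_steps, rng) ** 2 for _ in range(n_paths))
    return total / n_paths

for n in (10, 100, 1000):
    print(n, mean_square_error(n))
```

The quantity being driven to zero is an expectation of a squared difference, taken over all paths at once, not a path-by-path difference. That is the sense in which the limit is an $$L^2$$ limit.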

## Probability – existence and uniqueness of a stationary measure

The same question was also asked on MSE: https://math.stackexchange.com/questions/3327007/existence-and-uniqueness-of-a-stationary-measure.

Recently I asked the following question on MO: Attractors in Random Dynamics.

Let $$\Delta$$ be the interval $$(-1, 1)$$. Then we can look at the probability space $$(\Delta, \mathcal{B}(\Delta), \nu)$$, where $$\mathcal{B}(\Delta)$$ is the Borel $$\sigma$$-algebra and $$\nu$$ is equal to half the Lebesgue measure.

Then we can equip the space $$\Delta^{\mathbb{N}} := \{(\omega_n)_{n \in \mathbb{N}};\ \omega_n \in \Delta\ \forall n \in \mathbb{N}\}$$ with the $$\sigma$$-algebra $$\mathcal{B}(\Delta^{\mathbb{N}})$$ (the Borel $$\sigma$$-algebra of $$\Delta^{\mathbb{N}}$$ induced by the product topology) and the probability measure $$\nu^{\mathbb{N}}$$ on the measurable space $$(\Delta^{\mathbb{N}}, \mathcal{B}(\Delta^{\mathbb{N}}))$$, such that
$$\nu^{\mathbb{N}}\left(A_1 \times A_2 \times \ldots \times A_n \times \prod_{i=n+1}^{\infty} \Delta\right) = \nu(A_1) \cdot \ldots \cdot \nu(A_n).$$

Now let $$\sigma > 2/(3\sqrt{3})$$ be a real number and define
$$x_-^*(\sigma) = \text{the unique real root of the polynomial } x^3 + \sigma = x,$$
$$x_+^*(\sigma) = \text{the unique real root of the polynomial } x^3 - \sigma = x.$$
It is easy to see that $$x_+^*(\sigma) = -x_-^*(\sigma)$$.

We can then define the function
$$h: \mathbb{N} \times \Delta^{\mathbb{N}} \times (x_-^*(\sigma), x_+^*(\sigma)) \to (x_-^*(\sigma), x_+^*(\sigma)),$$
in the following recursive way:

• $$h(0, (\omega_n)_n, x) = x$$, $$\forall (\omega_n)_n \in \Delta^{\mathbb{N}}$$ and $$\forall x \in \mathbb{R}$$;
• $$h(i+1, (\omega_n)_n, x) = \sqrt[3]{h(i, (\omega_n)_n, x) + \sigma \omega_i}.$$

In this way, for every $$x \in \mathbb{R}$$ and $$(\omega_n)_n \in \Delta^{\mathbb{N}}$$, we define the following sequence
$$\left\{x,\ \sqrt[3]{x + \sigma \omega_1},\ \sqrt[3]{\sqrt[3]{x + \sigma \omega_1} + \sigma \omega_2},\ \sqrt[3]{\sqrt[3]{\sqrt[3]{x + \sigma \omega_1} + \sigma \omega_2} + \sigma \omega_3},\ \ldots\right\}.$$

Now define the following family of Markov kernels
$$P_n(x, A) = \nu^{\mathbb{N}}\left(\left\{(\omega_n)_{n \in \mathbb{N}} \in \Delta^{\mathbb{N}};\ h(n, (\omega_n)_{n \in \mathbb{N}}, x) \in A\right\}\right).$$

A probability measure $$\mu$$ on $$((x_-^*(\sigma), x_+^*(\sigma)), \mathcal{B}((x_-^*(\sigma), x_+^*(\sigma))))$$ is called a stationary measure if

$$\mu(A) = \int_{(x_-^*(\sigma), x_+^*(\sigma))} P_1(x, A)\, \mathrm{d}\mu(x), \quad \forall A \in \mathcal{B}((x_-^*(\sigma), x_+^*(\sigma))),$$
where $$\mathcal{B}((x_-^*(\sigma), x_+^*(\sigma)))$$ is the Borel $$\sigma$$-algebra. Moreover, on $$(x_-^*(\sigma), x_+^*(\sigma))$$ it is easy to prove that there is at least one stationary measure.

The answer I received on MO suggests that there is only one stationary measure.

Does anyone know if that's true? A reference for such a result is sufficient for my purposes.
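A numerical experiment cannot settle uniqueness, but it can give a feel for the dynamics. The sketch below makes several illustrative choices that are not part of the question: the value $$\sigma = 0.5$$, the number of steps, the starting points, and all function names. It iterates $$x \mapsto \sqrt[3]{x + \sigma \omega}$$ from two different initial conditions driven by the same noise sequence and reports how far apart the two orbits end up.

```python
import math
import random

def cbrt(y):
    """Real cube root, valid for negative arguments as well."""
    return math.copysign(abs(y) ** (1.0 / 3.0), y)

def x_plus(sigma, iterations=200):
    """Bisection for the real root of x^3 - sigma = x,
    assuming sigma > 2 / (3 sqrt(3))."""
    lo, hi = 1.0, 1.0 + sigma  # x^3 - x - sigma is negative at lo, positive at hi
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if mid ** 3 - mid - sigma < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def orbit(x0, sigma, noise):
    """Iterate x -> cbrt(x + sigma * omega) along a fixed noise sequence."""
    xs = [x0]
    for omega in noise:
        xs.append(cbrt(xs[-1] + sigma * omega))
    return xs

sigma = 0.5  # illustrative value above 2 / (3 sqrt(3)), which is about 0.385
bound = x_plus(sigma)
rng = random.Random(0)
noise = [rng.uniform(-1.0, 1.0) for _ in range(5000)]  # i.i.d. draws from nu
a = orbit(-0.9 * bound, sigma, noise)
b = orbit(0.9 * bound, sigma, noise)
print("x_+^*(sigma) is approximately", bound)
print("gap between the two orbits after 5000 steps:", abs(a[-1] - b[-1]))
```

If the gap shrinks for many noise realizations and starting points, that is consistent with a single stationary measure, but it is only heuristic evidence. Histograms of long orbits from different starting points can be compared in the same spirit.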

## How to calculate this conditional probability

How do you calculate the values in the table?

## Probability Theory – Markov Kernels for Continuous Processes?

From Wikipedia we define a Markov kernel as:

For cases where the state space is discrete, we can obviously construct the transition rate matrix of the CTMC. In a more general setting (for example, if the state space is not discrete), if one is able to define a Markov kernel between the spaces at the discrete times $$\{n, n+1\}$$, is this Markov kernel sufficient for continuous-time processes in general?

Minimal working example: consider the process $$(X, A)_t$$, where $$X$$ is a Poisson process and $$A$$ is a timer that shows how long the Poisson process has been in its current state. Clearly the state space has the form $$(\mathbb{N}, (0, \infty))$$.

In my case (as above), only a limited number of things can change over my non-discrete state space in a very short time interval ($$(t, t + \delta)$$). If my kernel captures this, is it enough?

## Probability – concentration of the scaled $$l_p$$ norm of a correlation matrix

Background:

Among Hermitian random matrices, the correlation matrix finds a variety of applications in statistics. People have studied the "empirical spectral distribution (ESD)" of a correlation matrix, the largest eigenvalue of a correlation matrix, the logarithmic determinant of a correlation matrix, and the largest off-diagonal entries of a correlation matrix, all with applications to statistics. However, there seems to be no work on the scaled $$l_p$$ norm of a correlation matrix. The $$l_p$$ norm of a correlation matrix provides a criterion for whether the "strong law of large numbers (SLLN)" associated with the classical normal means problem holds.

Problem:

Let $$x_i, i = 1, \ldots, n$$ be $$n$$ i.i.d. observations of an $$m$$-dimensional random vector $$x \in \mathbb{R}^m$$ such that $$x$$ has correlation matrix $$\Sigma \in \mathbb{R}^{m \times m}$$. Let $$R = (r_{ij}) \in \mathbb{R}^{m \times m}$$ be the sample correlation matrix of $$\{x_i\}_{i=1}^n$$, so that every $$r_{ij}$$ is a Pearson correlation coefficient, and let the $$l_p$$ norm of $$R$$ be $$\Vert R \Vert_p = \sum_{i,j=1}^{m} \vert r_{ij} \vert^p$$ for $$0 < p \ldots$$

Consider the case $$p = 1$$. Then $$\Vert R \Vert_1 \ge m$$. Suppose that for some $$\alpha > 0$$ the scaled $$l_p$$ norm $$m^{-\alpha} \Vert R \Vert_1$$ lies, almost surely or with probability tending to $$1$$, in a compact interval, say $$[a, b]$$. What can be said about $$\Pr(\vert m^{-\alpha} \Vert R \Vert_1 - \mathbb{E}(m^{-\alpha} \Vert R \Vert_1) \vert > t)$$ for a fixed $$t > 0$$ when $$m$$ is large (and $$n$$ satisfies some relationship with $$m$$), where $$\mathbb{E}$$ denotes expectation?
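Before looking for a theorem, the behaviour of $$m^{-\alpha} \Vert R \Vert_1$$ can be explored by simulation. The sketch below is only an illustration under assumptions that the question does not fix: Gaussian data with identity correlation matrix, the sizes $$m = 30$$ and $$n = 60$$, the exponent $$\alpha = 2$$, and all function names. It computes the sample correlation matrix entry by entry with the standard library and reports the mean and the spread of the scaled norm over repeated draws.

```python
import math
import random

def pearson(u, v):
    """Pearson correlation coefficient of two equally long samples."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)

def scaled_l1_norm(m, n, alpha, rng):
    """Draw n i.i.d. N(0, I_m) vectors and return m^(-alpha) * sum_{i,j} |r_ij|."""
    columns = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
    total = float(m)  # the m diagonal entries r_ii are all equal to 1
    for i in range(m):
        for j in range(i + 1, m):
            total += 2.0 * abs(pearson(columns[i], columns[j]))  # R is symmetric
    return total / m ** alpha

rng = random.Random(0)
draws = [scaled_l1_norm(m=30, n=60, alpha=2.0, rng=rng) for _ in range(20)]
mean = sum(draws) / len(draws)
spread = max(draws) - min(draws)
print("mean:", mean, "spread:", spread)
```

Repeating the experiment for growing $$m$$ (with $$n$$ tied to $$m$$ in whatever way is of interest) gives an empirical picture of how quickly the spread shrinks, which can guide the choice of $$\alpha$$ and the form of the concentration bound one tries to prove.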

Specialization to Gaussian $$x$$:

If $$x$$ follows a Gaussian distribution, then every entry $$r_{ij}$$, as a Pearson correlation coefficient, has an explicit marginal distribution, and the covariance between $$r_{ij}$$ and $$r_{jk}$$ is given in J. H. Steiger's article here (https://psycnet.apa.org/record/1980-08757-001). These results were also provided by Pearson and Filon (and Steiger's article cites them). However, the notation of Pearson and Filon is very outdated (compared to modern notation) and difficult to digest.

To study $$m^{-\alpha} \Vert R \Vert_1$$ and its concentration properties, and in particular to show the SLLN for $$m^{-\alpha} \Vert R \Vert_1$$, we need the covariance between the absolute values of $$r_{ij}$$ and $$r_{jk}$$, i.e., we need $$\mathsf{cov}(\vert r_{ij} \vert, \vert r_{jk} \vert)$$. This in turn requires the joint distribution of $$r_{ij}$$ and $$r_{jk}$$, for which I am not able to find an existing formula (and its derivation would be very tedious and cumbersome).

Specific questions and attempts:

Does anyone know the joint distribution of $$r_{ij}$$ and $$r_{jk}$$? Caution: you should not try to obtain the joint distribution of $$r_{ij}$$ and $$r_{jk}$$ from the distribution of $$R$$ when $$x$$ is Gaussian (though in this case $$R$$ has an analytic density that involves its determinant), because this causes more mess than directly computing the covariance using a trivariate normal distribution.

## Probability Distributions – Bound on the central moments of a $$\chi^2_d$$ r.v.

Let $$d \geq 1$$, and let $$X$$ be a chi-square r.v. with $$d$$ degrees of freedom. For an integer $$k \geq 1$$, I would like to get a bound on the $$k$$-th central moment of $$X$$, i.e.
$$\mathbb{E}((X - d)^k) \leq C \cdot (k!)^k \tag{1}$$
for an absolute constant $$C > 0$$. I think I can prove it by combining properties of sub-exponential r.v.s, with $$C = 1/4$$. Is there a more direct proof?

(Maybe I've made a mistake; in that case, what is the best bound that can be achieved in (1)? Also, I would still be glad if $$C$$ is replaced by $$C^k$$, if the proof is simple.)
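To test a candidate bound against exact values, the central moments can be computed in closed form from the raw moments $$\mathbb{E}(X^j) = d(d+2)\cdots(d+2(j-1))$$ of the chi-square distribution. The following is a small sketch with hypothetical function names; it uses exact integer arithmetic, so there is no rounding error.

```python
from math import comb

def chi2_raw_moment(d, j):
    """E[X^j] for X ~ chi^2_d: the product d (d + 2) ... (d + 2 (j - 1))."""
    value = 1
    for i in range(j):
        value *= d + 2 * i
    return value

def chi2_central_moment(d, k):
    """E[(X - d)^k] via the binomial expansion of (X - d)^k in raw moments."""
    return sum(
        comb(k, j) * chi2_raw_moment(d, j) * (-d) ** (k - j)
        for j in range(k + 1)
    )

for d in (1, 5, 50):
    print(d, [chi2_central_moment(d, k) for k in range(1, 7)])
```

For instance, the second central moment is the variance $$2d$$, which grows with $$d$$. It is therefore worth checking how the right-hand side of (1) is meant to depend on $$d$$ before looking for a proof.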