information theory – Shared randomness does not increase capacity of a noisy channel – Why?

Why is it the case that when Alice and Bob use a noisy channel for communication, the capacity of the channel does not increase even if they are allowed to share pre-distributed randomness?

This is mentioned in some notes (see the paragraph before Section 4 of the notes), but I have not yet seen a proof or an intuitive argument for it. Any reference to where this is covered would also be appreciated!