information theory – Shared randomness does not increase capacity of a noisy channel – Why?

When Alice and Bob communicate over a noisy channel, why does the capacity of the channel not increase even if they are allowed to share pre-distributed randomness?
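
To make the question concrete, here is a sketch of the statement as I understand it. The notation ($W$, $K$, $f_n$, $g_n$) and the restriction to a discrete memoryless channel without feedback are my own assumptions and are not taken from the notes.

Let $W(y|x)$ be the channel, let the message $M$ be uniform on $\{1, \dots, 2^{nR}\}$, and let $K$ be the shared randomness, known to both Alice and Bob and independent of $M$ and of the channel noise. Alice encodes and Bob decodes as

$$
x^n = f_n(m, k), \qquad \hat{m} = g_n(y^n, k),
$$

and the unassisted capacity is

$$
C = \max_{p(x)} I(X;Y).
$$

The claim I am asking about is then that shared randomness cannot push the achievable rate above $C$:

$$
\Pr\big[\, g_n(Y^n, K) \neq M \,\big] \to 0 \ \text{ as } n \to \infty \quad \Longrightarrow \quad R \le C.
$$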

This is mentioned in some notes (see the paragraph before Section 4 of https://cds.cern.ch/record/613098/files/0304102.pdf), but I have not yet seen a proof or an intuitive argument for it. Any reference that covers this would also be appreciated!