How to synthesize a vocal filler with Google Cloud Text-to-Speech?


I am synthesizing some text into audio. Vocal fillers are mispronounced: “Mmm…” is spelled out as “M M M”, and “Oooh” is read with an extra “h” sound. The Speech Synthesis Markup Language (SSML) documentation does not mention vocal fillers; it covers only pauses, which are silences.
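
For reference, here is a minimal sketch of the kind of request involved, assuming the JSON body of the REST `text:synthesize` method. The helper name `build_synthesis_request`, the sample sentence, the `en-US` language code, and the MP3 encoding are illustrative assumptions, not my exact setup. It only builds the request body and sends nothing.

```python
import json
from xml.sax.saxutils import escape


def build_synthesis_request(text, language_code="en-US"):
    """Wrap plain text in <speak> and build a text:synthesize request body.

    Hypothetical helper for illustration; the voice and audio settings
    are placeholder assumptions.
    """
    # Escape &, < and > so the text is valid inside the SSML document.
    ssml = "<speak>" + escape(text) + "</speak>"
    return {
        "input": {"ssml": ssml},
        "voice": {"languageCode": language_code},
        "audioConfig": {"audioEncoding": "MP3"},
    }


# The fillers below are the ones that get mispronounced.
body = build_synthesis_request("Mmm… Oooh, that is good & warm.")
print(json.dumps(body, ensure_ascii=False, indent=2))
```

The fillers are passed as ordinary text inside `<speak>`, with no markup of their own, which is where the pronunciation goes wrong.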

How can I indicate a vocal filler in SSML with Google Cloud Text-to-Speech?