The time, in milliseconds, from the beginning of the audio stream to the start of the UtteranceEvent.
UtteranceEvent
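Because the offset is counted in milliseconds from the beginning of the audio stream, it is often easier to read as a clock-style timestamp. The following is a minimal sketch of that conversion; the function name `format_offset` and its parameter name are hypothetical and are not part of the API.

```python
def format_offset(begin_offset_millis: int) -> str:
    """Render a millisecond offset from stream start as HH:MM:SS.mmm.

    `begin_offset_millis` is a hypothetical name for the offset value
    described above; it is assumed to be a non-negative integer.
    """
    if begin_offset_millis < 0:
        raise ValueError("offset must be non-negative")
    # Split the total into whole seconds and leftover milliseconds,
    # then peel off minutes and hours.
    seconds, millis = divmod(begin_offset_millis, 1000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


if __name__ == "__main__":
    # An utterance starting 83,450 ms into the stream.
    print(format_offset(83450))
```

An offset of 83450, for example, corresponds to 1 minute, 23 seconds, and 450 milliseconds after the stream began.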