The time, in milliseconds, from the beginning of the audio stream to the end of the identified entity.