Latency
The delay between a word being spoken and the corresponding caption appearing on the audience's screen.
End-to-end latency in live captioning is the sum of four stages: audio capture, network transit to the transcription service, transcription processing, and distribution back to viewer devices. For modern automated systems, the total typically falls between 1.5 and 3 seconds.
Latency above ~4 seconds breaks the perception of synchrony: the captions feel like they belong to the previous sentence, attendees stop reading them, and the value of live captioning is lost.
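The arithmetic behind these figures can be sketched as a simple budget check. The snippet below is a minimal illustration, not a measurement tool: the class name, field names, and per-stage numbers are hypothetical, and only the 1.5 to 3 second typical range and the roughly 4 second synchrony limit come from the text above.

```python
from dataclasses import dataclass

# Thresholds taken from the entry above, expressed in milliseconds.
TYPICAL_RANGE_MS = (1500, 3000)  # typical total for modern automated systems
SYNCHRONY_LIMIT_MS = 4000        # approximate point where captions feel out of sync


@dataclass(frozen=True)
class LatencyBudget:
    """Per-stage latency in milliseconds (hypothetical structure and values)."""
    audio_capture_ms: int
    network_transit_ms: int
    transcription_ms: int
    distribution_ms: int

    def total_ms(self) -> int:
        # End-to-end latency is the sum of the four stages.
        return (self.audio_capture_ms + self.network_transit_ms
                + self.transcription_ms + self.distribution_ms)

    def assess(self) -> str:
        # Classify the total against the thresholds above.
        total = self.total_ms()
        low, high = TYPICAL_RANGE_MS
        if total > SYNCHRONY_LIMIT_MS:
            return "out of sync"
        if total > high:
            return "above typical range"
        if total < low:
            return "below typical range"
        return "within typical range"


if __name__ == "__main__":
    # Illustrative stage values only; real numbers must be measured on site.
    example = LatencyBudget(audio_capture_ms=50, network_transit_ms=150,
                            transcription_ms=1200, distribution_ms=400)
    print(example.total_ms(), example.assess())
```

Keeping the stages as separate fields makes it easy to see which one dominates the total when a setup drifts toward the 4 second limit.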
Hardware and audio routing decisions affect latency more than the choice of captioning vendor does. A clean wired audio feed from an AV soundboard produces lower latency than Bluetooth audio capture from a laptop microphone, which in turn produces lower latency than re-streaming through a virtual mixer.