Live captioning
Real-time text rendering of spoken audio, displayed alongside an event as it happens.
Live captioning produces text for speech while it is being spoken, typically with 1.5 to 3 seconds of end-to-end latency between the spoken words and the on-screen text. The audience reads along while the speaker is still talking, rather than reading a transcript afterwards.
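As a rough illustration of what end-to-end latency means here, the sketch below computes the delay between speech and display for a few caption segments and checks whether the median falls inside the range quoted above. The `CaptionEvent` type, its field names, and the sample timestamps are hypothetical and chosen only for this example; a real system would take these timings from its own audio and rendering logs.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class CaptionEvent:
    """One caption segment with hypothetical timing fields, in seconds."""
    text: str
    spoken_at: float     # when the speaker finished the phrase
    displayed_at: float  # when the text appeared on screen


def latencies(events):
    """Return the speech-to-screen delay for each caption segment."""
    return [e.displayed_at - e.spoken_at for e in events]


def within_typical_range(events, low=1.5, high=3.0):
    """Return True if the median latency lies within [low, high] seconds."""
    return low <= median(latencies(events)) <= high


# Illustrative sample data, not measurements from any real event.
events = [
    CaptionEvent("Welcome everyone", spoken_at=2.0, displayed_at=4.0),
    CaptionEvent("to the keynote", spoken_at=3.5, displayed_at=5.5),
    CaptionEvent("on speech recognition", spoken_at=6.0, displayed_at=9.0),
]

print(latencies(events))
print(within_typical_range(events))
```

The median is used rather than the mean so that a single slow segment does not dominate the result; that is a design choice for this sketch, not a standard definition.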
Live captioning serves multiple audiences: deaf and hard-of-hearing attendees, non-native speakers of the event's language, attendees in noisy or shared environments, and, increasingly, anyone who prefers reading while listening as a default mode of consumption.
Modern live captioning is usually delivered by automatic speech recognition (ASR) systems, often supplemented with a per-event custom vocabulary for technical terms. Human captioning, known as CART (Communication Access Realtime Translation), remains in use for high-stakes content where the required accuracy exceeds what automated systems can deliver.
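To make the idea of a per-event custom vocabulary concrete, here is a minimal sketch that corrects near-miss spellings in ASR output against a list of event terms. This is a post-processing approximation using fuzzy string matching from Python's standard `difflib` module; production ASR systems more commonly bias recognition toward the vocabulary during decoding, which this sketch does not attempt. The function name `apply_custom_vocabulary`, the `cutoff` value, and the sample terms are assumptions made for illustration.

```python
import difflib


def apply_custom_vocabulary(transcript, vocabulary, cutoff=0.8):
    """Replace words that closely resemble a vocabulary term with that term.

    Matching is case-insensitive; the vocabulary's own casing is restored.
    """
    lookup = {term.lower(): term for term in vocabulary}
    corrected = []
    for word in transcript.split():
        # get_close_matches returns the best candidates scoring >= cutoff.
        match = difflib.get_close_matches(
            word.lower(), list(lookup), n=1, cutoff=cutoff
        )
        corrected.append(lookup[match[0]] if match else word)
    return " ".join(corrected)


# Hypothetical event vocabulary and a misrecognized ASR line.
event_terms = ["Kubernetes", "PostgreSQL"]
result = apply_custom_vocabulary("we deploy with kubernetis today", event_terms)
print(result)
```

A lower `cutoff` catches more misrecognitions but risks rewriting ordinary words that happen to resemble a vocabulary term, so the threshold would need tuning per event. The sketch also splits on whitespace only, so punctuation attached to a word can prevent a match.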