Diarization

Also known as: speaker diarization

The process of identifying which speaker is talking at any given moment in a multi-speaker recording, answering 'who spoke when'. Combined with a transcript, it answers 'who said what'.

Speaker diarization analyzes acoustic features (voice pitch, timbre, speaking rhythm) to segment audio by speaker. The output is typically a generic label such as 'Speaker 1' or 'Speaker 2' attached to each spoken segment; these labels can then be mapped to actual names in post-processing.
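As a minimal sketch of that post-processing step, the snippet below assumes diarization output arrives as (start, end, label) segments and that someone supplies a label-to-name mapping by hand. The Segment type, the rename_speakers helper, and the sample data are all hypothetical illustrations, not the format of any particular diarization system.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    start: float   # segment start, in seconds
    end: float     # segment end, in seconds
    speaker: str   # generic label emitted by the diarizer


def rename_speakers(segments, names):
    """Swap generic labels for real names; labels with no mapping are kept as-is."""
    return [
        seg._replace(speaker=names.get(seg.speaker, seg.speaker))
        for seg in segments
    ]


# Hypothetical diarization output for a short two-person exchange.
segments = [
    Segment(0.0, 4.2, "Speaker 1"),
    Segment(4.2, 9.8, "Speaker 2"),
    Segment(9.8, 12.0, "Speaker 1"),
]

# Only one speaker has been identified so far.
names = {"Speaker 1": "Alice"}
named = rename_speakers(segments, names)
```

Leaving unmapped labels untouched means a partially identified recording still reads sensibly: 'Speaker 2' stays 'Speaker 2' until a name is known.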

For live captioning, diarization puts speaker labels in the transcript, so panels, interviews, and Q&A sessions read as attributed dialogue rather than one undifferentiated block of text.
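One way to picture how speaker labels end up in a transcript is to assign each timestamped word to the speaker turn it overlaps most, then group consecutive words from the same speaker into a caption line. This is a sketch under assumptions: the tuple layouts, the label_words helper, and the sample timings are hypothetical, and real captioning systems may combine the two streams differently.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def label_words(words, turns):
    """Build caption lines from words and speaker turns.

    words: list of (start, end, text) from the recognizer
    turns: list of (start, end, speaker) from the diarizer
    A word that overlaps no turn falls back to the first turn.
    """
    lines = []  # list of (speaker, [tokens])
    for w_start, w_end, text in words:
        best = max(turns, key=lambda t: overlap(w_start, w_end, t[0], t[1]))
        speaker = best[2]
        if lines and lines[-1][0] == speaker:
            lines[-1][1].append(text)        # same speaker: extend current line
        else:
            lines.append((speaker, [text]))  # speaker changed: start a new line
    return [f"{speaker}: {' '.join(tokens)}" for speaker, tokens in lines]


# Hypothetical recognizer and diarizer output for a short exchange.
words = [
    (0.0, 0.4, "Welcome"), (0.4, 0.9, "everyone."),
    (1.2, 1.6, "Thanks"), (1.6, 2.0, "for"),
    (2.0, 2.5, "having"), (2.5, 2.8, "me."),
]
turns = [(0.0, 1.0, "Speaker 1"), (1.0, 3.0, "Speaker 2")]

captions = label_words(words, turns)
# captions == ["Speaker 1: Welcome everyone.", "Speaker 2: Thanks for having me."]
```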

Modern ASR systems often include built-in diarization. Quality depends on how well the speakers are separated in the audio: clean audio with a separate microphone per speaker produces the most reliable results, while mixed audio from a single room mic produces noisier ones.

Related terms

Speaker labels
ASR
Live captioning
