Engineering · April 28, 2026

Why per-device captioning beats room displays

It seems obvious to project captions onto the main screen alongside the slides. We tried it, and the result was worse than putting captions on every attendee's phone. Here's why.

EventRecast Team

Design

Early in the design process, we built a captioned-room demo: speaker on stage, slides on the projector, live captions rendered as a strip below the slides. It was the obvious thing. Every attendee in the room could read along without doing anything.

Then we shipped it to a real conference and watched the result. Three things went wrong, all at once, and they convinced us that the obvious answer was the wrong one. We pulled it the next quarter and committed entirely to per-device captioning. This is what we learned.

Problem one: the captions visibly lag the speaker

Real-time captioning typically carries 1.5–3 seconds of end-to-end latency. On a phone screen in your hand, that latency is invisible: your brain treats the captions as part of the speaker's voice. On a 30-foot projector screen at the front of the room, the same latency is glaring. The speaker says a word, the audience hears it, and the caption appears two seconds later, in 24-inch type, exactly where everyone is looking.

It pulled focus. Attendees stopped watching the speaker and started watching the caption strip, waiting for it to catch up. That's the opposite of what captions are supposed to do.

Problem two: one rendering can't serve everyone

An attendee with low vision wants captions in 32-point text. An attendee with no visual impairment wants captions in 14-point so they don't dominate the slides. An attendee from Brazil wants Portuguese. An attendee from Japan wants Japanese. An attendee with light sensitivity wants high-contrast white-on-black. An attendee in normal lighting wants subtle gray-on-cream.

A room display is a single rendering, so whatever you pick fails most of the room. Per-device captioning lets every attendee choose their own text size, language, and contrast, and accessibility teams no longer have to defend a single compromise that nobody loves.
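To make the contrast concrete, here is a minimal TypeScript sketch of the per-device idea: one shared caption stream, with each device applying its own preferences at render time. Everything in it is a hypothetical illustration, not EventRecast's actual API: the names CaptionPrefs, CaptionSegment, and renderForDevice, the two theme colors, and the assumption that each segment arrives with translations keyed by language tag, falling back to the source language when a translation is missing.

```typescript
// Hypothetical per-device caption rendering. Names and values are
// illustrative assumptions, not EventRecast's real API.
type Theme = "high-contrast" | "subtle";

interface CaptionPrefs {
  fontSizePt: number; // e.g. 32 for low vision, 14 to stay unobtrusive
  language: string;   // language tag, e.g. "en", "pt-BR"
  theme: Theme;
}

// One shared stream: each segment carries its text keyed by language tag.
interface CaptionSegment {
  textByLanguage: Record<string, string>;
}

interface RenderedCaption {
  text: string;
  fontSizePt: number;
  color: string;
  background: string;
}

// Assumed color pairs for the two example themes.
const THEMES: Record<Theme, { color: string; background: string }> = {
  "high-contrast": { color: "#FFFFFF", background: "#000000" },
  subtle: { color: "#555555", background: "#FFF8E7" },
};

// Renders a shared segment using one device's local preferences.
// Falls back to the source language if the requested one is missing.
function renderForDevice(
  segment: CaptionSegment,
  prefs: CaptionPrefs,
  sourceLanguage = "en",
): RenderedCaption {
  const text =
    segment.textByLanguage[prefs.language] ??
    segment.textByLanguage[sourceLanguage] ??
    "";
  const { color, background } = THEMES[prefs.theme];
  return { text, fontSizePt: prefs.fontSizePt, color, background };
}

// The same segment, rendered for two different attendees.
const segment: CaptionSegment = {
  textByLanguage: { en: "Welcome, everyone.", "pt-BR": "Bem-vindos a todos." },
};

const lowVision = renderForDevice(segment, {
  fontSizePt: 32,
  language: "en",
  theme: "high-contrast",
});
const brazil = renderForDevice(segment, {
  fontSizePt: 14,
  language: "pt-BR",
  theme: "subtle",
});
```

The point of the shape is that preferences never travel upstream: the stream is identical for everyone, and each device decides how to draw it. A room display would have to pick a single CaptionPrefs value for the whole audience.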

Problem three: it broke the in-room / remote symmetry

Hybrid events have two audiences sharing one program. If the in-room audience reads captions on the main display and the remote audience reads them on a viewer page, you have two different experiences. Speaker eye contact, attention patterns, the social dynamics of the room — all of these subtly diverge.

Per-device captioning collapses that gap. Every attendee, in-room or remote, opens the same viewer URL. The shared frame isn't 'we both watched the same screen' — it's 'we both read the same captions on our own device.' That sounds smaller. It isn't.

What we replaced it with

We print QR codes on room signage, embed the viewer URL in the conference app's session detail, and show the QR code on the speaker's intro slide for ten seconds. Attendees pull up captions on the device they already have open. The marginal effort is about twenty seconds; the experience is dramatically better.

For events where someone genuinely needs a fixed display (a small room with one deaf attendee in the front, or an event explicitly framed around captions by default), a dedicated room-side monitor next to the speaker still works. The pattern is 'optional supplemental display,' not 'caption strip on the main projector.'

What this means for the product

EventRecast's viewer page is the canonical caption surface, and it's mobile-first. Every layout decision optimizes for someone reading on a phone in their hand: large tap targets for bookmark and share, generous line height for skim-reading, persistent language picker, contrast controls accessible without leaving the page.

We don't ship a 'project to the main screen' mode. The viewer page is the screen, and every attendee gets one. That's the design.

design · accessibility · captions · ux
