Why putting captions on the main room display is usually a mistake
There's a temptation to project captions onto the main room display alongside the speaker's slides, so in-room attendees can read along without pulling out their phones. This usually doesn't work as well as it sounds: it adds visual clutter to the slide deck, the captions become a single shared rendering that can't be customized per attendee (font size, color, language), and the captioning latency means the on-screen captions are always slightly behind the speaker, which is more distracting at projector scale than at phone scale.
The pattern that works better is per-device captioning: every attendee in the room pulls captions up on their own phone or tablet by scanning a QR code. They control their own font size, contrast, and language, so accessibility settings are chosen per attendee rather than imposed on the whole room. This is also closer to how remote attendees experience the captions, which keeps the experience consistent across the two audiences.
The exception is a small room with a clear accessibility need (a single deaf attendee in the front row, or an event explicitly framed as captioned by default), where a dedicated room-side display can be useful. That display is typically a separate monitor next to the speaker, not the main slide projection.
Audio routing in practice
The cleanest setup is a balanced line-level feed from the AV soundboard's matrix output into a USB audio interface plugged into the broadcaster laptop. This gives you the speakers' audio without the room reverb, audience coughs, or HVAC noise. If the venue AV team is unfamiliar with this request, the simpler alternative is a 3.5mm tap from the soundboard's headphone monitoring output: lower fidelity, but almost always available.
Things to avoid: capturing audio from the laptop's built-in microphone (it picks up room noise and reverb far more clearly than the speakers), using Bluetooth audio (it adds its own latency on top of the captioning delay), or relying on the live stream's audio (it introduces an extra latency hop and assumes the live stream is running cleanly).
If the event uses a software mixer or production tool such as StreamYard or vMix, take its program audio output and route it the same way as a soundboard feed.
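The preference order above can be written down as a small pre-event check. This is a minimal sketch, not part of any captioning tool: the source labels (`matrix_line_out`, `headphone_tap`, and so on) and the function name are hypothetical, chosen only to mirror the options described in this section.

```python
from typing import List, Optional

# Preferred feeds, best first. Labels are hypothetical names for the
# options described in the text, not identifiers from any real product.
PREFERRED_SOURCES = [
    "matrix_line_out",  # balanced line-level feed from the soundboard matrix,
                        # or the program output of a software mixer
    "headphone_tap",    # 3.5mm tap from the headphone monitoring output
]

# Sources the text recommends against, with the reason for each.
AVOID_SOURCES = {
    "laptop_builtin_mic": "picks up the room rather than the speakers",
    "bluetooth": "adds latency",
    "live_stream_audio": "extra latency hop; depends on the stream running cleanly",
}


def pick_caption_source(available: List[str]) -> Optional[str]:
    """Return the best recommended source that is available, or None."""
    for source in PREFERRED_SOURCES:
        if source in available:
            return source
    # Only discouraged (or unknown) sources are on offer: return None so the
    # operator goes back to the AV team instead of silently falling back.
    return None
```

For example, `pick_caption_source(["bluetooth", "headphone_tap"])` returns `"headphone_tap"`, while a venue offering only the laptop microphone returns `None`, which is the cue to ask the AV team for a proper feed.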
Q&A and audience mics
Hybrid events have two audience sources for Q&A: in-room (handheld or fixed audience mics) and remote (chat, raised-hand, or unmuted audio from the streaming platform). Both need to land in the captioned transcript, or the post-event record will have gaps where the audience questions should be.
If audience mics are routed through the same AV mix as the speakers, captioning catches them automatically. If audience mics are on a separate sub-mix, route that sub-mix to the broadcaster as well.
For remote audience questions, two patterns work: (1) the host repeats the remote question out loud before answering, which puts it in the captioned audio, sounds natural, and adds little overhead; (2) a moderator types the remote question into the captioning platform's chat or live editor, which inserts it into the transcript with a [Remote question] label. Pattern 1 is the default; pattern 2 is for events with high remote question volume, where repeating each one is cumbersome.
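To make pattern 2 concrete, here is a minimal sketch of what inserting a labeled question into a transcript amounts to. It does not reflect any particular captioning platform's API: the `Entry` type, the `insert_remote_question` function, and the use of seconds-from-start timestamps are all assumptions made for illustration, and the transcript is assumed to be already sorted by time.

```python
import bisect
from dataclasses import dataclass
from typing import List


@dataclass
class Entry:
    start_s: float  # seconds since the start of the session (assumed unit)
    text: str


def insert_remote_question(transcript: List[Entry], at_s: float, question: str) -> Entry:
    """Insert a typed remote question at its place in a time-sorted transcript."""
    entry = Entry(at_s, f"[Remote question] {question.strip()}")
    starts = [e.start_s for e in transcript]
    # bisect_right places the question after any caption with the same timestamp.
    index = bisect.bisect_right(starts, at_s)
    transcript.insert(index, entry)
    return entry


transcript = [
    Entry(0.0, "Welcome, everyone."),
    Entry(12.5, "Let's take some questions."),
    Entry(30.0, "Yes, it does, and here is how."),
]
insert_remote_question(transcript, 20.0, "  Does this work offline? ")
```

After the call, the typed question sits between the host's invitation and the answer, so the post-event record reads in order even though the question was never spoken aloud.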