Reducing Perceived Latency of the Translated Speech

As soon as Palabra Broadcaster receives your stream, it starts the translation pipeline and immediately begins re‑streaming your original stream.

Our translation pipeline typically needs about 3–7 seconds before it can output translated speech (TTS). By default, this creates a 3–7 second gap between the original speech and the translated speech.

Default Broadcaster Behavior

Output:
- Original video and audio start playing as soon as they arrive.
- Translated audio starts playing about 3–7 seconds later.

Dubbing Effect

You can add a configurable delay to the outgoing audio and video to keep them synchronized with the translated speech.

Broadcaster delays the original media by the duration you specify in the original_delay_seconds setting. Translated speech is played at the point in the delayed stream where the corresponding original speech is heard, even if the TTS audio is ready earlier. If the TTS audio is not ready before the delay expires, it is mixed into the translated audio track as soon as it becomes available.

  curl -X POST 'https://api.palabra.ai/broadcasts' \
    -H "ClientID: $API_CLIENT_ID" \
    -H "ClientSecret: $API_CLIENT_SECRET" \
    -H 'Accept: application/json' \
    -H 'Content-Type: application/json' \
    --data-raw '{
      "title": "My Stream",
      // ...
      "original_delay_seconds": 3
    }'
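If you create broadcasts from code rather than from the shell, the same request can be sketched in Python as follows. This is a minimal sketch, not an official client: the URL, header names, and the two JSON fields come from the curl example above, while the function name build_broadcast_request is hypothetical and the fields elided in the example are left out, so a real request may need more of them.

```python
import json
import urllib.request

# Endpoint taken from the curl example above.
API_URL = "https://api.palabra.ai/broadcasts"


def build_broadcast_request(title, original_delay_seconds, client_id, client_secret):
    """Build (but do not send) a POST request that sets original_delay_seconds.

    Hypothetical helper: only the fields shown in the curl example are included.
    The fields elided there are unknown and are not filled in here.
    """
    payload = {
        "title": title,
        "original_delay_seconds": original_delay_seconds,
    }
    # json.dumps produces strict JSON (no comments, no trailing commas).
    body = json.dumps(payload).encode("utf-8")
    headers = {
        "ClientID": client_id,
        "ClientSecret": client_secret,
        "Accept": "application/json",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(API_URL, data=body, headers=headers, method="POST")


# Usage (sends a real request, so it is left commented out):
# import os
# req = build_broadcast_request(
#     "My Stream", 3, os.environ["API_CLIENT_ID"], os.environ["API_CLIENT_SECRET"]
# )
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)
```

Building the body with json.dumps avoids hand-written JSON mistakes such as a trailing comma after the last field.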

Broadcaster Behavior with original_delay_seconds: 3

Output:
- Original video and audio are delayed by 3 seconds before playback.
- Translated audio now plays roughly 0–4 seconds after the original instead of 3–7 seconds behind it.

This reduces the perceived gap between the original speech and the translation and can improve the listening experience.
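The arithmetic behind these numbers can be sketched in a few lines of Python. This is an illustrative model of the behavior described above, not Broadcaster's implementation: the function names are hypothetical, and the model assumes the translated speech is heard at whichever comes later, the delayed original or the moment the TTS audio is ready.

```python
def translated_playback_time(spoken_at, tts_ready_at, original_delay):
    """Time at which the translated speech is heard on the output.

    Assumption: it is aligned with the delayed original when the TTS audio
    is ready in time, and played as soon as it is ready otherwise.
    """
    return max(spoken_at + original_delay, tts_ready_at)


def perceived_gap(pipeline_latency, original_delay):
    """Seconds between hearing the original and hearing its translation."""
    spoken_at = 0.0
    original_heard_at = spoken_at + original_delay
    translated_heard_at = translated_playback_time(
        spoken_at, spoken_at + pipeline_latency, original_delay
    )
    return translated_heard_at - original_heard_at


# Pipeline latency of 3 to 7 seconds with original_delay_seconds = 3:
print(perceived_gap(3.0, 3.0), perceived_gap(7.0, 3.0))  # prints "0.0 4.0"
```

With a delay of 0 the same function returns the full pipeline latency, and with a delay at least as long as the pipeline latency it returns 0.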

However, note that the original video is now delayed by the same amount (3 seconds in this example). For most online viewers watching via a player, this extra delay is usually acceptable and often not even noticeable.

For offline events, where some viewers see the stage in person (in the room) and listen to audio from Broadcaster, this shift can be confusing because the original speech they hear from Broadcaster lags behind what they see.


Guidelines

  • For online events where all viewers watch through a stream, you can safely use original_delay_seconds (we recommend values between 3 and 8 seconds; around 8 seconds gives a strong dubbing‑style effect).
  • For offline events where some viewers see the original event in person, set original_delay_seconds to 0 so they hear the original without extra delay.
  • Remember that original_delay_seconds affects only audio and video. Captions arrive from the Centrifuge server without extra delay (as soon as they are available). If you need captions to appear in sync with the shifted audio/video, add an additional display delay on the client side.
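The client-side caption delay mentioned in the last guideline could look roughly like the sketch below. It is a minimal sketch under stated assumptions: the class DelayedCaptionBuffer and its methods are hypothetical, captions are assumed to arrive in order, and receiving them from the Centrifuge server is out of scope here. The client only records when each caption arrived and releases it once the display delay has passed.

```python
from collections import deque


class DelayedCaptionBuffer:
    """Hold captions for a fixed display delay before showing them.

    Hypothetical helper. Assumes captions are pushed in arrival order, so the
    oldest pending caption is always the next one due.
    """

    def __init__(self, display_delay_seconds):
        if display_delay_seconds < 0:
            raise ValueError("display_delay_seconds must be non-negative")
        self._delay = display_delay_seconds
        self._pending = deque()  # items are (display_at, text)

    def push(self, text, received_at):
        """Queue a caption that arrived at time received_at (in seconds)."""
        self._pending.append((received_at + self._delay, text))

    def pop_due(self, now):
        """Return, in order, every caption whose display time has been reached."""
        due = []
        while self._pending and self._pending[0][0] <= now:
            due.append(self._pending.popleft()[1])
        return due


# Example: match captions to original_delay_seconds = 3.
captions = DelayedCaptionBuffer(display_delay_seconds=3.0)
captions.push("Hello, everyone.", received_at=10.0)
print(captions.pop_due(now=12.0))  # prints "[]"
print(captions.pop_due(now=13.0))  # prints "['Hello, everyone.']"
```

In a real client, received_at and now would typically come from a monotonic clock such as time.monotonic(), and pop_due would be called from the render loop or a timer.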

Note: Palabra Broadcaster starts re‑streaming your original stream as soon as it receives it, but end‑to‑end delay also depends on the input/output protocols. For offline events, using WebRTC for both input and output is recommended to keep protocol‑related latency as close to zero as possible.