AI Dubbing
Dub videos into different languages or replace speech with custom text using voice cloning.
Local Pipeline
Video dubbing runs with a local pipeline combining Whisper for transcription, MarianMT or Qwen3 for translation, Chatterbox Multilingual TTS for speech synthesis, and Demucs for source separation. Translation backend selection is automatic by default — see Translation Backend for details.
VideoDubber
Main class for video dubbing and voice revoicing.
Basic Dubbing
Translate speech to another language while preserving the original speaker's voice:
from videopython.ai.dubbing import VideoDubber
from videopython.base import Video
video = Video.from_path("video.mp4")
dubber = VideoDubber()
# Dub to Spanish with voice cloning
result = dubber.dub(
    video=video,
    target_lang="es",
    source_lang="en",
    preserve_background=True,  # Keep music and sound effects
    voice_clone=True,  # Clone original speaker's voice
)
# Save dubbed video
dubbed_video = video.add_audio(result.dubbed_audio, overlay=False)
dubbed_video.save("dubbed_video.mp4")
# Or use convenience method
dubbed_video = dubber.dub_and_replace(video, target_lang="es")
dubbed_video.save("dubbed_video.mp4")
Voice Revoicing
Replace speech with completely different text using the original speaker's voice:
from videopython.ai.dubbing import VideoDubber
from videopython.base import Video
video = Video.from_path("video.mp4")
dubber = VideoDubber()
# Make the person say something different
result = dubber.revoice(
    video=video,
    text="Hello everyone! This is a completely different message.",
    preserve_background=True,
)
print(f"Original duration: {result.original_duration:.1f}s")
print(f"New speech duration: {result.speech_duration:.1f}s")
# Save revoiced video (trimmed to speech length)
revoiced_video = dubber.revoice_and_replace(
    video=video,
    text="Hello everyone! This is a completely different message.",
)
revoiced_video.save("revoiced_video.mp4")
Progress Tracking
Pass progress_callback to receive the current stage name and a progress fraction (0.0 to 1.0):
def on_progress(stage: str, progress: float) -> None:
    print(f"[{progress*100:5.1f}%] {stage}")
result = dubber.dub(
    video=video,
    target_lang="es",
    progress_callback=on_progress,
)
Memory-Efficient Dubbing
The default pipeline keeps all four models (Whisper, Demucs, the translation
backend, Chatterbox) resident in memory and operates on Video objects that
hold every frame in RAM.
For long or high-resolution sources, or on memory-constrained hardware, two options
lower the memory ceiling considerably; low_memory=True does so at the cost of some
extra model-loading latency.
Unload models between stages with low_memory=True:
# Each stage's model is released after it runs, so only one is resident at a time.
# Recommended for GPUs with <=12GB VRAM or hosts with <32GB RAM.
dubber = VideoDubber(low_memory=True)
dubbed_video = dubber.dub_and_replace(video, target_lang="es")
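One way to picture low_memory=True is a scope per stage that loads the stage's model, runs it, and releases it before the next stage starts. The sketch below is plain Python with no real models: ModelTracker, run_low_memory, and the placeholder "weights" strings are hypothetical and only demonstrate that at most one model is resident at a time.

```python
import gc
from contextlib import contextmanager
from typing import Callable, Iterator


class ModelTracker:
    """Counts how many (hypothetical) stage models are loaded at once."""

    def __init__(self) -> None:
        self.resident: set[str] = set()
        self.peak = 0

    @contextmanager
    def stage(self, name: str, loader: Callable[[], object]) -> Iterator[object]:
        model = loader()  # stand-in for loading Whisper / Demucs / translator / TTS
        self.resident.add(name)
        self.peak = max(self.peak, len(self.resident))
        try:
            yield model
        finally:
            # Release this stage's model before the next stage loads its own.
            self.resident.discard(name)
            del model
            gc.collect()  # a GPU pipeline would also empty the CUDA cache here


def run_low_memory(audio: str, tracker: ModelTracker) -> str:
    result = audio
    for name in ("whisper", "demucs", "translator", "chatterbox"):
        with tracker.stage(name, lambda n=name: f"<{n} weights>"):
            result = f"{name}({result})"  # placeholder for the stage's real work
    return result
```

The trade-off is the one described above: every stage pays its load cost on each run, but peak memory is bounded by the largest single model rather than the sum of all four.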
Skip frame loading with dub_file():
# Operates on file paths; extracts audio via ffmpeg, runs the pipeline on the
# audio only, and muxes the dubbed audio back into the source video using
# ffmpeg stream-copy (no video re-encode). Peak memory is bounded by model
# weights and the audio track, independent of video length and resolution.
dubber = VideoDubber(low_memory=True)
result = dubber.dub_file(
    input_path="long_video.mp4",
    output_path="dubbed.mp4",
    target_lang="es",
)
Use dub_file() when you don't need frame-level access in Python. Combine with
low_memory=True for the smallest memory footprint. See
Processing Large Videos
for a worked example.
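The extract-then-mux flow behind dub_file() can be approximated with two ffmpeg invocations. This sketch only builds the argument lists; the helper names, the 16-bit WAV intermediate, and the AAC encode of the dubbed track are assumptions rather than the commands videopython actually runs. The video stream is stream-copied (-c:v copy), and the optional 0:s? map copies subtitle streams only when the source has any.

```python
from pathlib import Path


def extract_audio_cmd(src: Path, wav: Path) -> list[str]:
    """ffmpeg args that pull the audio out as 16-bit WAV, skipping video (-vn)."""
    return ["ffmpeg", "-y", "-i", str(src), "-vn", "-acodec", "pcm_s16le", str(wav)]


def mux_cmd(src: Path, dub_wav: Path, out: Path, keep_original: bool = False) -> list[str]:
    """ffmpeg args that stream-copy the video and attach the dubbed audio first."""
    args = ["ffmpeg", "-y", "-i", str(src), "-i", str(dub_wav),
            "-map", "0:v", "-map", "1:a"]
    if keep_original:
        args += ["-map", "0:a"]  # source audio follows the dub as a second track
    args += ["-map", "0:s?"]  # subtitles if present; the '?' tolerates none
    args += ["-c:v", "copy", "-c:s", "copy", "-c:a:0", "aac"]
    if keep_original:
        # Copy the original track untouched and keep the dub as default playback.
        args += ["-c:a:1", "copy", "-disposition:a:0", "default", "-disposition:a:1", "0"]
    args.append(str(out))
    return args
```

Each list can be run with subprocess.run(cmd, check=True). Because the video is never decoded, memory stays proportional to the audio, which is the property dub_file() relies on.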
Whisper Model Selection
Pick the Whisper model size used for transcription. Larger models are more
accurate but use more VRAM and run slower. The default is turbo, which offers
near large-v3 quality at roughly 8x the speed of large (and about 2x the speed
of small), making the default path both accurate and fast.
# Even higher accuracy on very noisy or heavily accented audio
dubber = VideoDubber(whisper_model="large")
# Lower VRAM footprint for short clips
dubber = VideoDubber(whisper_model="tiny")
Supported sizes: tiny, base, small, medium, large, turbo.
Anti-Hallucination Knobs
VideoDubber forwards three Whisper decoder kwargs (condition_on_previous_text,
no_speech_threshold, and logprob_threshold) to AudioToText, so dubbing inherits
the same defaults. The most important is condition_on_previous_text=False,
which prevents a single hallucinated filler from cascading through the whole
dubbed track on noisy or sparse-speech inputs.
# Defaults already protect against the cascading-hallucination failure mode.
dubber = VideoDubber()
# Tighter no-speech gate for a film with heavy ambient music.
dubber = VideoDubber(no_speech_threshold=0.85)
See AudioToText for the full rationale.
Brand-Name Vocabulary
Pass a list of brand names, product names, or proper nouns that may appear in
the source audio. The list is forwarded to AudioToText and biases Whisper's
first-window decoder via initial_prompt, recovering near-mishears (e.g.
Klarna → "carna") on brand-monitoring inputs.
See Brand-name vocabulary biasing for normalization rules and the token-budget guard.
Translation Backend
Two translation backends ship with the dubbing pipeline:
- MarianMT (Helsinki-NLP): fast on CPU, segment-isolated translation. Covers ~30 high-resource language pairs out of the box.
- Qwen3: Qwen3-4B-Instruct via llama-cpp-python (Q4_K_M GGUF, ~2.4 GB, downloaded on first use). Context-aware: prompts include a per-segment character budget derived from the source duration and a low_confidence hint derived from Whisper's avg_logprob. If both Qwen3 parse retries fail for a segment and MarianMT supports the language pair, that segment falls back to Marian.
# Auto resolver: Qwen3 on GPU when supported, MarianMT on CPU.
dubber = VideoDubber(translator="auto")
# Force MarianMT (e.g. CPU machines where Qwen3 wall time is impractical).
dubber = VideoDubber(translator="marian")
# Force Qwen3. Logs a WARNING on CPU because Qwen3-4B Q4_K_M runs ~10-15x
# slower than Marian without GPU acceleration.
dubber = VideoDubber(translator="qwen3")
Hard failures from Qwen3, where both the primary call and the per-segment Marian
fallback fail, are surfaced on DubbingResult.translation_failures as a list of
segment indices; those segments appear on the result with empty translated
text. The list is always empty under MarianTranslator.
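The failure accounting above amounts to a per-segment fallback chain. In this hypothetical sketch, qwen and marian are stand-in callables that return None when they cannot produce a usable translation, marian is None when MarianMT does not cover the pair, and attempts=2 reflects the "both parse retries" wording; none of these names are videopython APIs.

```python
from typing import Callable, Optional

# Returns None when the backend's output could not be parsed/used.
Translate = Callable[[str], Optional[str]]


def translate_segments(
    texts: list[str],
    qwen: Translate,
    marian: Optional[Translate],
    attempts: int = 2,
) -> tuple[list[str], list[int]]:
    """Return (translations, failure_indices); failed segments get ''."""
    out: list[str] = []
    failures: list[int] = []
    for idx, text in enumerate(texts):
        result: Optional[str] = None
        for _ in range(attempts):  # primary Qwen3 call plus retry
            result = qwen(text)
            if result is not None:
                break
        if result is None and marian is not None:  # pair covered by Marian
            result = marian(text)
        if result is None:  # both paths failed: record index, dub empty text
            failures.append(idx)
            result = ""
        out.append(result)
    return out, failures
```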
If neither backend covers the requested pair the auto resolver raises
UnsupportedLanguageError (importable from videopython.ai.dubbing).
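The auto resolver's documented preference (Qwen3 on GPU, MarianMT on CPU, an error when neither backend covers the pair) can be sketched as below. The coverage tables and resolve_translator are made up for illustration, and falling through to the other backend when the preferred one lacks the pair is an assumption. UnsupportedLanguageError is redefined locally with the documented source_lang/target_lang attributes so the snippet runs without videopython installed.

```python
class UnsupportedLanguageError(ValueError):
    """Local stand-in carrying the requested pair, as the real error does."""

    def __init__(self, source_lang: str, target_lang: str) -> None:
        super().__init__(f"No translation backend supports {source_lang}->{target_lang}")
        self.source_lang = source_lang
        self.target_lang = target_lang


# Hypothetical coverage tables; the real ones live inside videopython.
MARIAN_PAIRS = {("en", "es"), ("en", "de"), ("es", "en")}
QWEN_LANGS = {"en", "es", "de", "ja", "pl"}


def resolve_translator(source: str, target: str, has_gpu: bool) -> str:
    marian_ok = (source, target) in MARIAN_PAIRS
    qwen_ok = source in QWEN_LANGS and target in QWEN_LANGS
    # Preferred backend first; the cross-fallback order is an assumption.
    order = ["qwen3", "marian"] if has_gpu else ["marian", "qwen3"]
    for name in order:
        if (name == "qwen3" and qwen_ok) or (name == "marian" and marian_ok):
            return name
    raise UnsupportedLanguageError(source, target)
```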
Output Options for dub_file
dub_file() writes the dubbed video by stream-copying the source video and
muxing the new audio. Two extras carry through automatically and one is opt-in:
- Subtitles pass-through (automatic). Subtitle streams from the source video are stream-copied into the output by default. Sources without subtitles are tolerated.
- Source loudness match (automatic). The dubbed audio is gain-matched to the source via BS.1770 integrated-loudness measurement (pyloudnorm, BSD-3), so the dub lands within ~1 LU of the source on dialogue-heavy mixes. Clips shorter than 400 ms fall back to peak-amplitude matching; post-gain peaks are clamped to 0.99.
- keep_original_audio=True (opt-in). Retains the source audio as a secondary audio track behind the dubbed one. Useful for editorial A/B; the dubbed track stays the default-playback track.
result = dubber.dub_file(
    input_path="interview.mp4",
    output_path="interview_es.mp4",
    target_lang="es",
    keep_original_audio=True,  # source audio rides along as track #2
)
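Source loudness matching boils down to turning a loudness difference into a linear gain: LUFS and LU are decibel-scaled, so a difference of d LU corresponds to an amplitude factor of 10^(d/20). The sketch takes already-measured integrated loudness values as input (pyloudnorm's Meter(rate).integrated_loudness(data) would supply them) and adds the documented short-clip peak fallback and 0.99 ceiling. The function name is hypothetical, and "clamped" is read here as hard-clipping samples to ±0.99. BS.1770 gating works on 400 ms blocks, which is presumably why shorter clips need the peak fallback.

```python
from typing import Optional


def match_loudness(
    dub: list[float],
    sample_rate: int,
    source_peak: float,
    source_lufs: Optional[float] = None,
    dub_lufs: Optional[float] = None,
    ceiling: float = 0.99,
) -> list[float]:
    """Scale `dub` toward the source level, then clip samples to +/- ceiling."""
    long_enough = len(dub) / sample_rate >= 0.4  # one BS.1770 gating block
    if long_enough and source_lufs is not None and dub_lufs is not None:
        # Loudness is dB-scaled: +6 LU is roughly a 2x amplitude gain.
        gain = 10 ** ((source_lufs - dub_lufs) / 20)
    else:
        # Short-clip (or unmeasured) fallback: match peak amplitude instead.
        dub_peak = max((abs(s) for s in dub), default=0.0)
        gain = source_peak / dub_peak if dub_peak > 0 else 1.0
    return [max(-ceiling, min(ceiling, s * gain)) for s in dub]
```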
Transcript Quality Gating
Even with condition_on_previous_text=False, sufficiently degenerate input
(ambient music, mostly-silent windows misread as speech) can still produce
unusable transcripts. The pipeline runs a cheap heuristic over the Whisper
output and exposes the assessment on every result.
Three checks can raise flags:
- Dominant phrase: one phrase covers ≥70% of segment characters (catches cascades like the Japanese YouTube outro 「ご視聴ありがとうございました」, "thank you for watching").
- Low decoder confidence: median avg_logprob < -1.5.
- Sparse speech: speech-region duration is <5% of clip duration on inputs longer than 30 s.
The recommendation is "reject" when the dominance flag fires together
with at least one other flag, "warn" when flags fire without meeting that
reject rule, and "ok" when none fire. Dominance alone (chants, song lyrics)
therefore only warns.
result = dubber.dub(video, target_lang="es")
q = result.transcript_quality
if q is not None:
    print(q.recommendation)  # "ok" | "warn" | "reject"
    print(q.dominant_phrase_fraction)  # 0.0-1.0
    print(q.flags)  # ["dominant_phrase", ...]
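The three checks and the recommendation rule translate directly into code. A minimal sketch, assuming segments arrive as (text, start, end, avg_logprob) tuples and that a "normalized phrase" is lower-cased text with collapsed whitespace; the flag names other than "dominant_phrase" are hypothetical, and videopython's actual normalization may differ.

```python
from collections import Counter
from statistics import median

Segment = tuple[str, float, float, float]  # (text, start_s, end_s, avg_logprob)


def assess(segments: list[Segment], clip_duration: float) -> tuple[str, list[str]]:
    flags: list[str] = []
    # Dominant phrase: character share of the most common normalized phrase.
    norm = [" ".join(text.lower().split()) for text, _, _, _ in segments]
    total_chars = sum(len(p) for p in norm)
    if total_chars:
        chars_by_phrase: Counter[str] = Counter()
        for phrase in norm:
            chars_by_phrase[phrase] += len(phrase)
        if chars_by_phrase.most_common(1)[0][1] / total_chars >= 0.70:
            flags.append("dominant_phrase")
    # Low decoder confidence: median avg_logprob below -1.5.
    if segments and median(s[3] for s in segments) < -1.5:
        flags.append("low_confidence")
    # Sparse speech: <5% of a clip longer than 30 s is covered by segments.
    speech = sum(end - start for _, start, end, _ in segments)
    if clip_duration > 30 and speech / clip_duration < 0.05:
        flags.append("sparse_speech")
    if "dominant_phrase" in flags and len(flags) > 1:
        return "reject", flags
    return ("warn" if flags else "ok"), flags
```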
Use strict_quality=True to refuse low-quality transcripts before paying for
Demucs, translation, and TTS:
from videopython.ai.dubbing import GarbageTranscriptError
dubber = VideoDubber(strict_quality=True)
try:
    dubber.dub(video, target_lang="es")
except GarbageTranscriptError as exc:
    print("Refused:", exc.quality.flags)
Timing Summary
DubbingResult.timing_summary aggregates the per-segment timing adjustments
the synchronizer applied to fit translated speech into source durations. High
truncation rates indicate translation produced text that was too long for the
source's spoken regions — a quality red flag worth surfacing in eval harnesses
or product UI.
result = dubber.dub(video, target_lang="es")
ts = result.timing_summary
if ts is not None:
    print(f"{ts.clean_count}/{ts.total_segments} clean")
    print(f"{ts.truncated_count} truncated, worst {ts.max_truncation_seconds:.2f}s")
    print(f"mean speed factor {ts.mean_speed_factor:.3f}")
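A rough model of the aggregation behind TimingSummary.from_adjustments follows. The Adjustment record, its two fields, the reading of speed_factor > 1.0 as "sped up", and the definition of "clean" (speed factor exactly 1.0 and nothing truncated) are assumptions; only the summary field names are taken from the example above.

```python
from dataclasses import dataclass


@dataclass
class Adjustment:  # hypothetical stand-in for a per-segment TimingAdjustment
    speed_factor: float  # assumed: 1.0 = untouched, >1.0 = sped up to fit
    truncated_seconds: float  # assumed: audio cut off to fit the slot


@dataclass
class Summary:
    total_segments: int
    clean_count: int
    truncated_count: int
    max_truncation_seconds: float
    mean_speed_factor: float


def summarize(adjs: list[Adjustment]) -> Summary:
    truncated = [a.truncated_seconds for a in adjs if a.truncated_seconds > 0]
    clean = sum(1 for a in adjs if a.speed_factor == 1.0 and a.truncated_seconds == 0)
    mean_speed = sum(a.speed_factor for a in adjs) / len(adjs) if adjs else 1.0
    return Summary(
        total_segments=len(adjs),
        clean_count=clean,
        truncated_count=len(truncated),
        max_truncation_seconds=max(truncated, default=0.0),
        mean_speed_factor=mean_speed,
    )
```

A high truncated_count relative to total_segments is the red flag described above: the translation was too long even after speed-up.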
Source-Prosody Expressiveness
ChatterboxMultilingualTTS.generate() exposes exaggeration, cfg_weight,
and temperature knobs. The dubbing pipeline derives an Expressiveness
profile per segment from source vocals RMS (relative to whole-vocals baseline)
and forwards it to Chatterbox, so the dub tracks the source's loud/quiet shape
instead of using flat defaults on every segment.
Three buckets, tuned by ear on cam1_1min.mp4:
| RMS ratio vs baseline | exaggeration | cfg_weight |
|---|---|---|
| < 0.7× (calm) | 0.3 | 0.7 |
| 0.7×–1.3× (normal) | Chatterbox default | Chatterbox default |
| > 1.3× (dramatic) | 0.85 | 0.35 |
The Expressiveness dataclass is exported from videopython.ai.dubbing.
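The bucket rule from the table can be sketched as follows. The thresholds and knob values are the ones listed above; Knobs is a local stand-in for Expressiveness, and the RMS helper and the fallback to defaults on a silent baseline are assumptions. As with as_kwargs, None entries are dropped so Chatterbox keeps its own defaults.

```python
import math
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Knobs:  # local stand-in for videopython's Expressiveness
    exaggeration: Optional[float] = None
    cfg_weight: Optional[float] = None
    temperature: Optional[float] = None

    def as_kwargs(self) -> dict[str, float]:
        # Drop None so Chatterbox falls back to its own defaults.
        return {k: v for k, v in asdict(self).items() if v is not None}


def rms(samples: list[float]) -> float:
    return math.sqrt(sum(s * s for s in samples) / len(samples)) if samples else 0.0


def knobs_for_segment(segment: list[float], whole_vocals: list[float]) -> Knobs:
    baseline = rms(whole_vocals)  # whole-vocals baseline
    if baseline == 0.0:
        return Knobs()  # silent track: keep defaults (assumed behaviour)
    ratio = rms(segment) / baseline
    if ratio < 0.7:
        return Knobs(exaggeration=0.3, cfg_weight=0.7)  # calm
    if ratio > 1.3:
        return Knobs(exaggeration=0.85, cfg_weight=0.35)  # dramatic
    return Knobs()  # normal: Chatterbox defaults
```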
Supplying a Pre-Computed Transcription
dub(), dub_and_replace(), and dub_file() accept an optional transcription
argument. Pass a pre-computed Transcription to skip the internal Whisper step
— useful when you've already transcribed (and possibly hand-edited) the source.
Per-speaker voice cloning is driven by speaker labels on the supplied transcription. Three cases:
| Supplied transcription | enable_diarization | Behavior |
|---|---|---|
| Has speaker labels | any | Use supplied speakers; enable_diarization ignored |
| No speakers | True | Run pyannote on the audio, attach speakers to supplied words |
| No speakers | False | Use as-is; all segments share a single voice clone |
The diarize-on-supplied path requires word-level timings on the supplied transcription — transcriptions loaded from SRT (one synthetic word per block) are rejected.
# Workflow: transcribe, edit, then dub with per-speaker cloning
from videopython.ai.dubbing import VideoDubber
from videopython.ai.understanding.audio import AudioToText
from videopython.base import Video
video = Video.from_path("video.mp4")
# 1. Transcribe with diarization
transcriber = AudioToText(enable_diarization=True)
transcription = transcriber.transcribe(video)
# 2. Edit segment text in-place (correct misrecognitions, etc.)
for seg in transcription.segments:
    if "incorrect word" in seg.text:
        seg.text = seg.text.replace("incorrect word", "correct word")
# 3. Dub using the edited transcription. Speaker labels from step 1 are
# preserved, so each speaker gets their own cloned voice.
dubber = VideoDubber()
dubbed_video = dubber.dub_and_replace(
    video=video,
    target_lang="es",
    transcription=transcription,
)
If you have a transcription without speakers and want per-speaker cloning, pass
enable_diarization=True — pyannote will run standalone (skipping the Whisper
re-transcription).
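The three rows of the table reduce to a small decision function. Sketch only: Seg, speaker_plan, and the returned labels are hypothetical, and detecting an SRT-derived transcription by "at most one word per segment" is an assumption based on the note about synthetic words.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Seg:  # hypothetical stand-in for a transcription segment
    text: str
    speaker: Optional[str] = None
    words: list[str] = field(default_factory=list)


def speaker_plan(segments: list[Seg], enable_diarization: bool) -> str:
    if any(s.speaker for s in segments):
        return "use-supplied-speakers"  # enable_diarization is ignored
    if not enable_diarization:
        return "single-voice"  # one clone shared by every segment
    # Diarizing needs word timings; SRT imports carry one synthetic word per block.
    if all(len(s.words) <= 1 for s in segments):
        raise ValueError("diarization needs word-level timings (SRT input?)")
    return "diarize-supplied-words"  # run pyannote, attach speakers to words
```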
VideoDubber
Dubs videos into different languages using the local pipeline.
Accepts either a DubbingConfig or the same knobs as flat kwargs
(device, low_memory, whisper_model, translator, etc.); the flat path
builds a DubbingConfig internally. See DubbingConfig for the full
knob list and defaults.
Source code in src/videopython/ai/dubbing/dubber.py
dub
dub(
    video: Video,
    target_lang: str,
    source_lang: str | None = None,
    preserve_background: bool = True,
    voice_clone: bool = True,
    enable_diarization: bool = False,
    progress_callback: Callable[[str, float], None] | None = None,
    transcription: Any = None,
) -> DubbingResult
Dub a video into a target language.
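Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| enable_diarization | bool | Enable speaker diarization to clone each speaker's voice separately. See Supplying a Pre-Computed Transcription for how this interacts with a supplied transcription. | False |
| transcription | Any | Optional pre-computed Transcription; when supplied, the internal Whisper step is skipped. | None |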
Source code in src/videopython/ai/dubbing/dubber.py
dub_and_replace
dub_and_replace(
    video: Video,
    target_lang: str,
    source_lang: str | None = None,
    preserve_background: bool = True,
    voice_clone: bool = True,
    enable_diarization: bool = False,
    progress_callback: Callable[[str, float], None] | None = None,
    transcription: Any = None,
) -> Video
Dub a video and return a new video with the dubbed audio.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| transcription | Any | Optional pre-computed Transcription; when supplied, the internal Whisper step is skipped. | None |
Source code in src/videopython/ai/dubbing/dubber.py
dub_file
dub_file(
    input_path: str | Path,
    output_path: str | Path,
    target_lang: str,
    source_lang: str | None = None,
    preserve_background: bool = True,
    voice_clone: bool = True,
    enable_diarization: bool = False,
    progress_callback: Callable[[str, float], None] | None = None,
    transcription: Any = None,
    keep_original_audio: bool = False,
) -> DubbingResult
Dub a video file on disk without loading video frames into memory; the result is written to output_path.
Extracts the audio track via ffmpeg, runs the dubbing pipeline on the audio only, then muxes the dubbed audio back into the source video using ffmpeg stream-copy (no video re-encode). Peak memory is bounded by model weights and the audio track — independent of video length and resolution.
Use this instead of dub_and_replace when the source video is long
or high-resolution and you don't need frame-level access in Python.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| input_path | str \| Path | Path to the source video file. | required |
| output_path | str \| Path | Path to write the dubbed video. Overwritten if it exists. | required |
| target_lang | str | Target language code (e.g. "es"). | required |
| source_lang | str \| None | Source language code, or None to auto-detect. | None |
| preserve_background | bool | Preserve background music/effects via source separation. | True |
| voice_clone | bool | Clone the source speaker's voice for the dubbed track. | True |
| enable_diarization | bool | Enable speaker diarization for per-speaker voice cloning. See Supplying a Pre-Computed Transcription for how this interacts with a supplied transcription. | False |
| progress_callback | Callable[[str, float], None] \| None | Optional callback receiving (stage, progress) updates. | None |
| transcription | Any | Optional pre-computed Transcription; when supplied, the internal Whisper step is skipped. | None |
| keep_original_audio | bool | If True, retain the source audio in the output as a secondary track behind the dubbed one (editorial A/B). | False |
Returns:
| Type | Description |
|---|---|
| DubbingResult | Dubbing metadata, including the translated segments and the source transcription. The output video is written to output_path. |
Source code in src/videopython/ai/dubbing/dubber.py
revoice
revoice(
    video: Video,
    text: str,
    preserve_background: bool = True,
    progress_callback: Callable[[str, float], None] | None = None,
) -> RevoiceResult
Replace speech in a video with new text using voice cloning.
Source code in src/videopython/ai/dubbing/dubber.py
revoice_and_replace
revoice_and_replace(
    video: Video,
    text: str,
    preserve_background: bool = True,
    progress_callback: Callable[[str, float], None] | None = None,
) -> Video
Revoice a video and return a new video with the revoiced audio.
Source code in src/videopython/ai/dubbing/dubber.py
DubbingConfig
Knobs shared by VideoDubber and LocalDubbingPipeline. Accept either
config=DubbingConfig(...) or pass the same knobs as flat kwargs — the
constructor builds a DubbingConfig internally.
from videopython.ai.dubbing import DubbingConfig, VideoDubber
# Flat kwargs (recommended for ad-hoc calls)
dubber = VideoDubber(device="cuda", low_memory=True, whisper_model="large")
# Explicit config (recommended for reusable presets)
config = DubbingConfig(
    device="cuda",
    low_memory=True,
    whisper_model="large",
    translator="qwen3",
    vocabulary=["Klarna", "Allegro"],
)
dubber = VideoDubber(config=config)
DubbingConfig
Bases: BaseModel
Attributes:
| Name | Type | Description |
|---|---|---|
| device | str \| None | Execution device (e.g. "cuda"). |
| low_memory | bool | When True, each pipeline stage (Whisper, Demucs, MarianMT, Chatterbox TTS) is unloaded from memory after it runs, so only one model is resident at a time. Trades per-run latency (~10-30s of extra model loads) for a much lower memory ceiling. Recommended for GPUs with <=12GB VRAM or hosts with <32GB RAM. Default False. |
| whisper_model | WhisperModel | Whisper model size used for transcription. Larger models give better accuracy at the cost of VRAM and latency. One of tiny, base, small, medium, large, turbo (default turbo). |
| condition_on_previous_text | bool | Forwarded to AudioToText. |
| no_speech_threshold | float | Forwarded to AudioToText. |
| logprob_threshold | float \| None | Forwarded to AudioToText. |
| vocabulary | list[str] \| None | Forwarded to AudioToText, which uses it to bias Whisper via initial_prompt. |
| strict_quality | bool | When True, the pipeline raises GarbageTranscriptError if the transcript-quality heuristic recommends "reject". |
| translator | TranslatorChoice | Translation backend to use: "auto" (default), "marian", or "qwen3". |
Source code in src/videopython/ai/dubbing/config.py
init_log_fields
Subset of fields surfaced in the init-log line.
Hand-picked so log noise stays bounded as the config grows.
Source code in src/videopython/ai/dubbing/config.py
DubbingResult
Result of a dubbing operation containing the dubbed audio and metadata.
result = dubber.dub(video, target_lang="es")
print(f"Translated {result.num_segments} segments")
print(f"Source language: {result.source_lang}")
print(f"Target language: {result.target_lang}")
# Access translated segments
for segment in result.translated_segments:
    print(f"'{segment.original_text}' -> '{segment.translated_text}'")
# Access voice samples used for cloning
for speaker, sample in result.voice_samples.items():
    print(f"{speaker}: {sample.metadata.duration_seconds:.1f}s sample")
DubbingResult
Bases: BaseModel
Result of a video dubbing operation.
Attributes:
| Name | Type | Description |
|---|---|---|
| dubbed_audio | Audio | The final dubbed audio track. |
| translated_segments | list[TranslatedSegment] | List of translated segments with timing. |
| source_transcription | Transcription | Original transcription of the source audio. |
| source_lang | str | Detected or specified source language. |
| target_lang | str | Target language for dubbing. |
| separated_audio | SeparatedAudio \| None | Separated audio components (if preserve_background=True). |
| voice_samples | dict[str, Audio] | Dictionary mapping speaker IDs to voice sample Audio. |
| timing_summary | TimingSummary \| None | Aggregate stats over per-segment timing adjustments. |
| transcript_quality | TranscriptQuality \| None | Heuristic quality assessment of the transcription (None when the pipeline returned early on an empty transcription). |
| translation_failures | list[int] | Indices of segments where translation failed entirely. Used by Qwen3Translator when both the primary call and the per-segment Marian fallback fail; those segments are dubbed with empty text. Empty list under MarianTranslator (Marian has no failure mode that drops segments). |
Source code in src/videopython/ai/dubbing/models.py
get_segments_by_speaker
Group translated segments by speaker.
Returns:
| Type | Description |
|---|---|
| dict[str, list[TranslatedSegment]] | Dictionary mapping speaker IDs to their segments. |
Source code in src/videopython/ai/dubbing/models.py
RevoiceResult
Result of a revoicing operation.
result = dubber.revoice(video, text="New message here")
print(f"Text: {result.text}")
print(f"Speech duration: {result.speech_duration:.1f}s")
print(f"Voice sample: {result.voice_sample.metadata.duration_seconds:.1f}s")
RevoiceResult
Bases: BaseModel
Result of a voice replacement operation.
Attributes:
| Name | Type | Description |
|---|---|---|
| revoiced_audio | Audio | The final audio with new speech. |
| text | str | The text that was spoken. |
| separated_audio | SeparatedAudio \| None | Separated audio components (if preserve_background=True). |
| voice_sample | Audio \| None | Voice sample used for cloning. |
| original_duration | float | Duration of the original audio. |
| speech_duration | float | Duration of the generated speech. |
Source code in src/videopython/ai/dubbing/models.py
TranslatedSegment
Individual translated speech segment with timing information.
TranslatedSegment
Bases: BaseModel
A segment of translated text with timing information.
Attributes:
| Name | Type | Description |
|---|---|---|
| original_segment | _TranscriptionSegmentField | The original transcription segment. |
| translated_text | str | The translated text. |
| source_lang | str | Source language code (e.g., "en"). |
| target_lang | str | Target language code (e.g., "es"). |
| speaker | str \| None | Speaker identifier if available. |
| start | float | Start time in seconds. |
| end | float | End time in seconds. |
Source code in src/videopython/ai/dubbing/models.py
SeparatedAudio
Audio separated into vocals and background components.
SeparatedAudio
Bases: BaseModel
Audio separated into different components.
Attributes:
| Name | Type | Description |
|---|---|---|
| vocals | Audio | Isolated vocal/speech track. |
| background | Audio | Combined background audio (music + effects). |
| music | Audio \| None | Isolated music track (if available). |
| effects | Audio \| None | Isolated sound effects track (if available). |
| original | Audio | The original unseparated audio. |
Source code in src/videopython/ai/dubbing/models.py
Expressiveness
Per-segment Chatterbox generate() knobs (exaggeration, cfg_weight,
temperature). None on any field means "let Chatterbox use its default".
The dubbing pipeline derives this from source vocals RMS automatically; the
type is exposed for users who want to inspect or override per-segment values.
Expressiveness
Bases: BaseModel
Chatterbox generate() knobs derived from source-segment prosody.
None on any field means "let Chatterbox use its own default"; this
avoids pinning the dub against future Chatterbox default changes.
Attributes:
| Name | Type | Description |
|---|---|---|
| exaggeration | float \| None | Emotional intensity. None uses the Chatterbox default. |
| cfg_weight | float \| None | Classifier-free guidance weight. None uses the Chatterbox default. |
| temperature | float \| None | Sampling temperature. None uses the Chatterbox default. |
Source code in src/videopython/ai/dubbing/models.py
as_kwargs
Knobs as a dict, dropping None entries.
Suitable for **-expansion into Chatterbox.
Source code in src/videopython/ai/dubbing/models.py
TimingSummary
Aggregate stats over per-segment timing adjustments applied by the synchronizer. Surfaces truncation and speed-change counts that translation quality eval harnesses can compare across backends.
TimingSummary
Bases: BaseModel
Aggregate stats over per-segment timing adjustments.
Surfaces how aggressively the timing synchronizer had to compress or truncate dubbed segments to fit the source's spoken regions. High truncation rates indicate translation produced text too long for the source duration.
Source code in src/videopython/ai/dubbing/models.py
from_adjustments
classmethod
Aggregate a list of TimingAdjustments into a TimingSummary.
Source code in src/videopython/ai/dubbing/models.py
TranscriptQuality
Heuristic quality assessment over a Whisper transcription. Surfaced on every
DubbingResult; drives the optional strict_quality reject path.
TranscriptQuality
Bases: BaseModel
Quality assessment of a Whisper transcription.
Attributes:
| Name | Type | Description |
|---|---|---|
| recommendation | Recommendation | One of "ok", "warn", or "reject". |
| dominant_phrase | str \| None | The repeating phrase that triggered the dominance flag, or None when the flag didn't fire. |
| dominant_phrase_fraction | float | Character-count share of the most common normalized segment phrase. 0.0 when no segments. |
| median_avg_logprob | float \| None | Median of the segments' Whisper avg_logprob values. |
| speech_fraction | float | Sum of segment durations divided by the audio's wall-clock duration. |
| flags | list[str] | Human-readable list of which checks fired. |
Source code in src/videopython/ai/dubbing/quality.py
GarbageTranscriptError
Raised by the pipeline when strict_quality=True and the transcript-quality
heuristic returns recommendation="reject". Carries the triggering
TranscriptQuality as error.quality for caller introspection.
GarbageTranscriptError
Bases: RuntimeError
Raised by the dubbing pipeline when strict_quality=True and the
transcript heuristic returns recommendation="reject".
The triggering TranscriptQuality is attached as quality so
callers can introspect the flags without re-running the pipeline.
Source code in src/videopython/ai/dubbing/quality.py
UnsupportedLanguageError
Raised by the translator auto-resolver when neither MarianMT nor Qwen3 covers
the requested (source_lang, target_lang) pair. Carries both fields for
caller introspection without parsing the message.
UnsupportedLanguageError
Bases: ValueError
Raised when no available translation backend supports a given
(source, target) language pair.
Carries the requested pair so callers can introspect:
from videopython.ai.dubbing import UnsupportedLanguageError
try:
    dubber.dub(video, target_lang="xh")
except UnsupportedLanguageError as e:
    print(f"No backend covers {e.source_lang}->{e.target_lang}")
Source code in src/videopython/ai/generation/translation.py
Supported Languages
Get the list of supported languages:
languages = VideoDubber.get_supported_languages()
# {'en': 'English', 'es': 'Spanish', 'fr': 'French', ...}
Supported languages include: English, Spanish, French, German, Italian, Portuguese, Polish, Hindi, Arabic, Czech, Danish, Dutch, Finnish, Greek, Hebrew, Indonesian, Japanese, Korean, Malay, Norwegian, Romanian, Russian, Slovak, Swedish, Tamil, Thai, Turkish, Ukrainian, Vietnamese, Chinese.