# AI Dubbing

Dub videos into different languages, or replace speech with custom text, using voice cloning.
## Local Pipeline

Video dubbing runs entirely with a local pipeline combining Whisper (transcription), translation models, XTTS (voice-cloned speech synthesis), and Demucs (source separation).
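The data flow through those stages can be sketched in a few lines. Everything below is illustrative only: `transcribe`, `translate`, and `synthesize` are hypothetical stand-ins for the Whisper, translation, and XTTS steps, not functions in videopython.

```python
# Illustrative sketch of the dubbing pipeline's data flow.
# These stage functions are hypothetical stand-ins, not the library's API.

def transcribe(path: str) -> list[dict]:
    # Whisper-style ASR: audio -> timed text segments
    return [{"text": "hello world", "start": 0.0, "end": 1.5}]

def translate(segments: list[dict], target_lang: str) -> list[dict]:
    # Per-segment machine translation, timing preserved
    table = {"hello world": "hola mundo"}
    return [{**s, "text": table.get(s["text"], s["text"])} for s in segments]

def synthesize(segments: list[dict]) -> list[str]:
    # XTTS-style voice-cloned synthesis, one clip per segment
    return [f"speech({s['text']})" for s in segments]

def dub_pipeline(path: str, target_lang: str) -> list[str]:
    segments = transcribe(path)
    translated = translate(segments, target_lang)
    return synthesize(translated)

clips = dub_pipeline("video.mp4", "es")
print(clips)  # ['speech(hola mundo)']
```

Demucs enters before synthesis in the real pipeline, splitting vocals from background so the speech track can be replaced without losing music and effects.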
## VideoDubber

The main class for video dubbing and voice revoicing.
### Basic Dubbing

Translate speech into another language while preserving the original speaker's voice:
```python
from videopython.ai.dubbing import VideoDubber
from videopython.base import Video

video = Video.from_path("video.mp4")
dubber = VideoDubber()

# Dub to Spanish with voice cloning
result = dubber.dub(
    video=video,
    target_lang="es",
    source_lang="en",
    preserve_background=True,  # Keep music and sound effects
    voice_clone=True,          # Clone the original speaker's voice
)

# Save the dubbed video
dubbed_video = video.add_audio(result.dubbed_audio, overlay=False)
dubbed_video.save("dubbed_video.mp4")

# Or use the convenience method
dubbed_video = dubber.dub_and_replace(video, target_lang="es")
dubbed_video.save("dubbed_video.mp4")
```
### Voice Revoicing

Replace the speech with completely different text, spoken in the original speaker's voice:
```python
from videopython.ai.dubbing import VideoDubber
from videopython.base import Video

video = Video.from_path("video.mp4")
dubber = VideoDubber()

# Make the speaker say something different
result = dubber.revoice(
    video=video,
    text="Hello everyone! This is a completely different message.",
    preserve_background=True,
)

print(f"Original duration: {result.original_duration:.1f}s")
print(f"New speech duration: {result.speech_duration:.1f}s")

# Save the revoiced video (trimmed to the speech length)
revoiced_video = dubber.revoice_and_replace(
    video=video,
    text="Hello everyone! This is a completely different message.",
)
revoiced_video.save("revoiced_video.mp4")
```
### Progress Tracking

Pass a `progress_callback` to monitor the pipeline; it receives a stage name and a completion fraction between 0 and 1:

```python
def on_progress(stage: str, progress: float) -> None:
    print(f"[{progress * 100:5.1f}%] {stage}")

result = dubber.dub(
    video=video,
    target_lang="es",
    progress_callback=on_progress,
)
```
## VideoDubber

Dubs videos into different languages using the local pipeline.

Source code in `src/videopython/ai/dubbing/dubber.py`
### dub

```python
dub(
    video: Video,
    target_lang: str,
    source_lang: str | None = None,
    preserve_background: bool = True,
    voice_clone: bool = True,
    progress_callback: Callable[[str, float], None] | None = None,
) -> DubbingResult
```

Dub a video into a target language.
### dub_and_replace

```python
dub_and_replace(
    video: Video,
    target_lang: str,
    source_lang: str | None = None,
    preserve_background: bool = True,
    voice_clone: bool = True,
    progress_callback: Callable[[str, float], None] | None = None,
) -> Video
```

Dub a video and return a new video with the dubbed audio.
### revoice

```python
revoice(
    video: Video,
    text: str,
    preserve_background: bool = True,
    progress_callback: Callable[[str, float], None] | None = None,
) -> RevoiceResult
```

Replace speech in a video with new text using voice cloning.
### revoice_and_replace

```python
revoice_and_replace(
    video: Video,
    text: str,
    preserve_background: bool = True,
    progress_callback: Callable[[str, float], None] | None = None,
) -> Video
```

Revoice a video and return a new video with the revoiced audio.
## DubbingResult

Result of a dubbing operation, containing the dubbed audio and metadata.

```python
result = dubber.dub(video, target_lang="es")

print(f"Translated {result.num_segments} segments")
print(f"Source language: {result.source_lang}")
print(f"Target language: {result.target_lang}")

# Access the translated segments
for segment in result.translated_segments:
    print(f"'{segment.original_text}' -> '{segment.translated_text}'")

# Access the voice samples used for cloning
for speaker, sample in result.voice_samples.items():
    print(f"{speaker}: {sample.metadata.duration_seconds:.1f}s sample")
```
### DubbingResult (dataclass)

Result of a video dubbing operation.

Attributes:

| Name | Type | Description |
|---|---|---|
| `dubbed_audio` | `Audio` | The final dubbed audio track. |
| `translated_segments` | `list[TranslatedSegment]` | List of translated segments with timing. |
| `source_transcription` | `Transcription` | Original transcription of the source audio. |
| `source_lang` | `str` | Detected or specified source language. |
| `target_lang` | `str` | Target language for dubbing. |
| `separated_audio` | `SeparatedAudio \| None` | Separated audio components (if `preserve_background=True`). |
| `voice_samples` | `dict[str, Audio]` | Dictionary mapping speaker IDs to voice-sample `Audio`. |
Source code in `src/videopython/ai/dubbing/models.py`
### get_segments_by_speaker

Group translated segments by speaker.

Returns:

| Type | Description |
|---|---|
| `dict[str, list[TranslatedSegment]]` | Dictionary mapping speaker IDs to their segments. |
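The grouping itself is a straightforward bucket-by-key. A plain-Python sketch of the same behavior, using dicts in place of `TranslatedSegment` objects (illustrative only, not the library's implementation):

```python
from collections import defaultdict

# Plain-dict stand-ins for TranslatedSegment objects
segments = [
    {"speaker": "SPEAKER_00", "translated_text": "Hola"},
    {"speaker": "SPEAKER_01", "translated_text": "Buenos días"},
    {"speaker": "SPEAKER_00", "translated_text": "Adiós"},
]

def group_by_speaker(segments: list[dict]) -> dict[str, list[dict]]:
    # Bucket each segment under its speaker ID, preserving order
    grouped: dict[str, list[dict]] = defaultdict(list)
    for segment in segments:
        grouped[segment["speaker"]].append(segment)
    return dict(grouped)

grouped = group_by_speaker(segments)
print({speaker: len(segs) for speaker, segs in grouped.items()})
# {'SPEAKER_00': 2, 'SPEAKER_01': 1}
```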
## RevoiceResult

Result of a revoicing operation.

```python
result = dubber.revoice(video, text="New message here")

print(f"Text: {result.text}")
print(f"Speech duration: {result.speech_duration:.1f}s")
print(f"Voice sample: {result.voice_sample.metadata.duration_seconds:.1f}s")
```
### RevoiceResult (dataclass)

Result of a voice replacement operation.

Attributes:

| Name | Type | Description |
|---|---|---|
| `revoiced_audio` | `Audio` | The final audio with the new speech. |
| `text` | `str` | The text that was spoken. |
| `separated_audio` | `SeparatedAudio \| None` | Separated audio components (if `preserve_background=True`). |
| `voice_sample` | `Audio \| None` | Voice sample used for cloning. |
| `original_duration` | `float` | Duration of the original audio, in seconds. |
| `speech_duration` | `float` | Duration of the generated speech, in seconds. |
## TranslatedSegment

Individual translated speech segment with timing information.

### TranslatedSegment (dataclass)

A segment of translated text with timing information.

Attributes:

| Name | Type | Description |
|---|---|---|
| `original_segment` | `TranscriptionSegment` | The original transcription segment. |
| `translated_text` | `str` | The translated text. |
| `source_lang` | `str` | Source language code (e.g. `"en"`). |
| `target_lang` | `str` | Target language code (e.g. `"es"`). |
| `speaker` | `str \| None` | Speaker identifier, if available. |
| `start` | `float` | Start time in seconds. |
| `end` | `float` | End time in seconds. |
### `__post_init__`

Sets `start` and `end` from the original segment if they are not provided.
## SeparatedAudio

Audio separated into vocals and background components.

### SeparatedAudio (dataclass)

Audio separated into different components.

Attributes:

| Name | Type | Description |
|---|---|---|
| `vocals` | `Audio` | Isolated vocal/speech track. |
| `background` | `Audio` | Combined background audio (music + effects). |
| `music` | `Audio \| None` | Isolated music track (if available). |
| `effects` | `Audio \| None` | Isolated sound-effects track (if available). |
| `original` | `Audio` | The original, unseparated audio. |
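With `preserve_background=True`, the final track is conceptually the newly synthesized speech mixed back over the separated `background`. A toy sketch with plain sample lists (the real pipeline works on `Audio` objects, and this `mix` helper is an assumption for illustration):

```python
def mix(speech: list[float], background: list[float]) -> list[float]:
    # Pad the shorter track with silence, then sum samples element-wise
    n = max(len(speech), len(background))
    speech = speech + [0.0] * (n - len(speech))
    background = background + [0.0] * (n - len(background))
    return [s + b for s, b in zip(speech, background)]

speech = [0.5, 0.5, 0.5]        # new synthesized speech
background = [0.25, 0.25, 0.25, 0.25]  # music + effects from Demucs

print(mix(speech, background))  # [0.75, 0.75, 0.75, 0.25]
```

Real mixers also resample, align timing, and clamp or normalize to avoid clipping; the sketch shows only the additive-mix idea.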
## Supported Languages

Get the mapping of supported language codes to names:

```python
languages = VideoDubber.get_supported_languages()
# {'en': 'English', 'es': 'Spanish', 'fr': 'French', ...}
```

Supported languages include English, Spanish, French, German, Italian, Portuguese, Polish, Hindi, Arabic, Czech, Danish, Dutch, Finnish, Greek, Hebrew, Indonesian, Japanese, Korean, Malay, Norwegian, Romanian, Russian, Slovak, Swedish, Tamil, Thai, Turkish, Ukrainian, Vietnamese, and Chinese.