Processing Large Videos
Process videos that are too large to fit in memory using streaming APIs.
Goal
Analyze or process long videos (hours of footage) without running out of RAM.
The Problem
Loading a full video into memory can be expensive:
# This loads ALL frames into RAM - problematic for long videos
video = Video.from_path("2_hour_movie.mp4") # Could use 50GB+ RAM
A 2-hour video at 1080p30 has 216,000 frames. At ~6.2 MB per frame (uncompressed 8-bit RGB, 1920 x 1080 x 3 bytes), that's about 1.3 TB of data.
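The estimate above is plain arithmetic, and it can be reproduced for any resolution or duration. The sketch below assumes 8-bit RGB frames with no padding; `raw_video_bytes` is a hypothetical helper for illustration, not part of videopython.

```python
def raw_video_bytes(width: int, height: int, fps: float, duration_seconds: float,
                    channels: int = 3, bytes_per_channel: int = 1) -> int:
    """Estimate the size of a fully decoded video held as raw frames."""
    frame_count = round(fps * duration_seconds)
    bytes_per_frame = width * height * channels * bytes_per_channel
    return frame_count * bytes_per_frame


# 2 hours of 1080p at 30 fps
total = raw_video_bytes(1920, 1080, fps=30, duration_seconds=2 * 60 * 60)
print(f"{total / 1e12:.2f} TB")  # 1.34 TB
```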
Solution: Streaming Editing Pipeline
For editing workflows, VideoEdit.run_to_file() streams frames one at a time from
ffmpeg decode through per-frame effect processing to ffmpeg encode. Memory usage is
constant (~250 MB) regardless of video length.
from videopython.editing import VideoEdit
plan = {
    "segments": [
        {
            "source": "2_hour_movie.mp4",
            "start": 0,
            "end": 7200,
            "transforms": [
                {"op": "resize", "args": {"width": 1920, "height": 1080}},
            ],
            "effects": [
                {"op": "color_adjust", "args": {"saturation": 0, "contrast": 1.15}},
                {"op": "fade", "args": {"mode": "in_out", "duration": 1.0}},
                {"op": "volume_adjust", "args": {"volume": 1.5}},
            ],
        }
    ],
}
edit = VideoEdit.from_dict(plan)
edit.run_to_file("output.mp4", crf=20, preset="medium")
# Peak memory: ~250 MB regardless of video length
When all operations are streamable, frames are never loaded into memory. If any operation
is not streamable (e.g. reverse, speed_change), the pipeline falls back to eager
mode automatically.
Check VideoEdit.json_schema() for x-streamable: true on each operation to see which
ones support streaming.
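One way to scan a schema for that flag is sketched below. The exact layout of the dictionary returned by VideoEdit.json_schema() is not described here, so the helper makes no assumption about it: it walks any nested dict/list structure and reports the key path of every object that carries an "x-streamable" key. Both `find_streamable_flags` and the toy schema are hypothetical and only illustrate the idea.

```python
from typing import Any, Iterator


def find_streamable_flags(node: Any, path: tuple = ()) -> Iterator[tuple[tuple, bool]]:
    """Yield (key_path, flag) for every dict that carries an "x-streamable" key."""
    if isinstance(node, dict):
        if "x-streamable" in node:
            yield path, bool(node["x-streamable"])
        for key, value in node.items():
            yield from find_streamable_flags(value, path + (key,))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from find_streamable_flags(value, path + (index,))


# Toy schema for illustration only; the real layout may differ.
toy_schema = {
    "$defs": {
        "resize": {"type": "object", "x-streamable": True},
        "reverse": {"type": "object", "x-streamable": False},
    }
}

flags = {path[-1]: flag for path, flag in find_streamable_flags(toy_schema)}
print(flags)  # {'resize': True, 'reverse': False}
```

If the real schema is a plain dictionary, passing it to the helper in place of `toy_schema` should list every flagged operation along with where it sits in the schema.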
Solution: FrameIterator
FrameIterator streams frames one at a time with O(1) memory usage:
from videopython.base import FrameIterator
# Process frames without loading entire video
with FrameIterator("long_video.mp4") as frames:
    for frame_idx, frame in frames:
        # frame is a numpy array (H, W, 3) in RGB
        # Only one frame in memory at a time
        process_frame(frame)
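To keep the whole pass at O(1) memory, whatever process_frame computes should also be accumulated incrementally rather than appended to a list. A minimal sketch of that pattern follows, using a running mean; `RunningMean` is a hypothetical helper, and the plain floats stand in for a per-frame value such as `float(frame.mean())`.

```python
class RunningMean:
    """Track a mean over a stream without storing the individual values."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0

    def update(self, value: float) -> None:
        self.count += 1
        # Incremental update: shift the mean toward the new value.
        self.mean += (value - self.mean) / self.count


brightness = RunningMean()
# With a real video this loop would be `for frame_idx, frame in frames:` and
# the value would be something like float(frame.mean()).
for value in [10.0, 20.0, 60.0]:
    brightness.update(value)

print(brightness.count, brightness.mean)  # 3 30.0
```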
Full Example: Extract Thumbnails
Extract one thumbnail per minute from a long video:
from videopython.base import FrameIterator, VideoMetadata
from PIL import Image
import os
def extract_thumbnails(video_path: str, output_dir: str, interval_seconds: float = 60.0):
    """Extract thumbnails at regular intervals from a video."""
    os.makedirs(output_dir, exist_ok=True)
    # Get metadata without loading video
    metadata = VideoMetadata.from_path(video_path)
    fps = metadata.fps
    # Clamp to at least 1 frame so the modulo below never divides by zero
    interval_frames = max(1, int(interval_seconds * fps))
    with FrameIterator(video_path) as frames:
        for frame_idx, frame in frames:
            if frame_idx % interval_frames == 0:
                # Save thumbnail
                timestamp = frame_idx / fps
                img = Image.fromarray(frame)
                img.thumbnail((320, 180))  # Shrink in place, preserving aspect ratio
                img.save(os.path.join(output_dir, f"thumb_{timestamp:.0f}s.jpg"))
                print(f"Saved thumbnail at {timestamp:.0f}s")
# Extract one thumbnail per minute
extract_thumbnails("2_hour_movie.mp4", "thumbnails/", interval_seconds=60)
Scene Detection on Large Videos
Use streaming scene detection for memory-efficient processing:
from videopython.base import SceneDetector
detector = SceneDetector(threshold=0.3, min_scene_length=1.0)
# Streaming: O(1) memory, processes frames one at a time
scenes = detector.detect_streaming("long_video.mp4")
# Or parallel: Faster on multi-core systems
scenes = detector.detect_parallel("long_video.mp4", num_workers=8)
for scene in scenes:
    print(f"Scene: {scene.start:.1f}s - {scene.end:.1f}s")
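Once scenes are detected, a common follow-up is to discard very short ones before further processing. The sketch below assumes only what the loop above uses, namely that each scene exposes `start` and `end` in seconds; the `Scene` dataclass is a stand-in for illustration and `scenes_longer_than` is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass
class Scene:
    """Stand-in for a detected scene; only start/end (seconds) are assumed."""
    start: float
    end: float


def scenes_longer_than(scenes, min_seconds: float):
    """Return the scenes whose duration is at least min_seconds."""
    return [s for s in scenes if (s.end - s.start) >= min_seconds]


detected = [Scene(0.0, 4.5), Scene(4.5, 5.2), Scene(5.2, 31.0)]
long_scenes = scenes_longer_than(detected, min_seconds=3.0)
print([(s.start, s.end) for s in long_scenes])  # [(0.0, 4.5), (5.2, 31.0)]
```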
AI Video Analysis (Scene-First)
VideoAnalyzer.analyze_path() returns scene-centered outputs:
from videopython.ai import VideoAnalyzer, VideoAnalysisConfig
config = VideoAnalysisConfig(
enabled_analyzers={"audio_to_text", "semantic_scene_detector", "scene_vlm"},
)
analysis = VideoAnalyzer(config=config).analyze_path("long_video.mp4")
for scene in (analysis.scenes.samples if analysis.scenes else []):
    print(scene.scene_index, scene.start_second, scene.end_second)
    for chunk in scene.visual_segments:
        print("  ", chunk.start_second, chunk.end_second, chunk.caption)
Processing a Segment
Process only a portion of a large video:
from videopython.base import FrameIterator
# Only iterate frames from 1:00:00 to 1:10:00
with FrameIterator("movie.mp4", start_second=3600, end_second=4200) as frames:
    for frame_idx, frame in frames:
        process_frame(frame)
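Segment boundaries are often known as timestamps rather than raw seconds. A small conversion helper, sketched here under the assumption that timestamps look like "H:MM:SS", "MM:SS", or "SS", produces the values used above; `timestamp_to_seconds` is hypothetical and not part of videopython.

```python
def timestamp_to_seconds(timestamp: str) -> float:
    """Convert "H:MM:SS", "MM:SS", or "SS" (fractions allowed) to seconds."""
    seconds = 0.0
    for part in timestamp.split(":"):
        # Each colon-separated field is worth 60x the one to its right.
        seconds = seconds * 60 + float(part)
    return seconds


start = timestamp_to_seconds("1:00:00")
end = timestamp_to_seconds("1:10:00")
print(start, end)  # 3600.0 4200.0
```

The results can then be passed as the start_second and end_second arguments shown in the example above.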
Dubbing Large Videos
VideoDubber.dub_and_replace() loads every frame into RAM via Video.from_path(),
which is impractical for long sources. dub_file() operates on paths instead:
it extracts audio via ffmpeg, runs the dubbing pipeline on the audio only, and
muxes the dubbed audio back into the source video using ffmpeg stream-copy
(no video re-encode). Peak memory is bounded by model weights and the audio
track — independent of video length and resolution.
Combine with low_memory=True so each pipeline stage's model (Whisper, Demucs,
MarianMT, Chatterbox) is unloaded between stages:
from videopython.ai.dubbing import VideoDubber
dubber = VideoDubber(low_memory=True)
result = dubber.dub_file(
    input_path="2_hour_movie.mp4",
    output_path="dubbed.mp4",
    target_lang="es",
    voice_clone=True,
    preserve_background=True,
)
# Peak memory: model weights + audio track (~hundreds of MB for a 2-hour stereo
# source), no matter the video resolution. The video stream is copied
# unchanged into the output.
print(f"Translated {result.num_segments} segments")
See AI Dubbing for more on the low_memory flag and dub_file().
Method Comparison
| Approach | Memory | Speed | Use Case |
|---|---|---|---|
| VideoEdit.run_to_file() | O(1) | Fast | Editing long videos with effects/transforms |
| Video.from_path() | O(all frames) | Fast access | Short videos, need random access |
| FrameIterator | O(1) | Sequential | Long videos, single pass analysis |
| SceneDetector.detect_streaming() | O(1) | Slower | Memory-constrained |
| SceneDetector.detect_parallel() | O(workers) | Fastest | Multi-core systems |
| VideoDubber.dub_file() | O(audio + model weights) | Same as dub_and_replace | Dubbing long/high-res videos without loading frames |
Tips
- Check metadata first: Use VideoMetadata.from_path() to check video size before deciding how to process it.
- Sample wisely: For AI analysis, 0.1-0.5 FPS is usually sufficient. Higher rates waste compute.
- Use parallel detection: detect_parallel() is 3-4x faster than streaming on multi-core machines.
- Process segments: If you only need part of a video, use start_second/end_second parameters.
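For the sampling tip, a target rate such as 0.5 FPS translates into a frame stride for a streaming loop. The following is a minimal sketch; `sample_stride` is a hypothetical helper, and the source frame rate would normally come from the video's metadata.

```python
def sample_stride(source_fps: float, target_fps: float) -> int:
    """Number of source frames to advance between samples (at least 1)."""
    if source_fps <= 0 or target_fps <= 0:
        raise ValueError("frame rates must be positive")
    return max(1, round(source_fps / target_fps))


stride = sample_stride(source_fps=30, target_fps=0.5)
print(stride)  # 60

# Inside a streaming loop, keep a frame only when frame_idx % stride == 0.
kept = [idx for idx in range(300) if idx % stride == 0]
print(kept)  # [0, 60, 120, 180, 240]
```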