Processing Large Videos

Process videos that are too large to fit in memory using streaming APIs.

Goal

Analyze or process long videos (hours of footage) without running out of RAM.

The Problem

Loading a full video into memory can be expensive:

from videopython.base import Video

# This loads ALL frames into RAM - problematic for long videos
video = Video.from_path("2_hour_movie.mp4")  # Over 1 TB of uncompressed frames at 1080p30

A 2-hour video at 1080p30 has 216,000 frames (7,200 s × 30 fps). At about 6.2 MB per uncompressed RGB frame (1920 × 1080 × 3 bytes), that is roughly 1.3 TB of data.
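
The arithmetic behind that estimate is easy to check. The snippet below is a back-of-the-envelope sketch; the function name and its parameters are illustrative and not part of videopython.

```python
def uncompressed_video_bytes(
    width: int,
    height: int,
    fps: float,
    duration_seconds: float,
    channels: int = 3,           # RGB
    bytes_per_channel: int = 1,  # 8-bit samples
) -> tuple[int, int, int]:
    """Return (frame_count, bytes_per_frame, total_bytes) for raw decoded frames."""
    frame_count = round(fps * duration_seconds)
    bytes_per_frame = width * height * channels * bytes_per_channel
    return frame_count, bytes_per_frame, frame_count * bytes_per_frame


frame_count, bytes_per_frame, total_bytes = uncompressed_video_bytes(
    1920, 1080, fps=30, duration_seconds=2 * 60 * 60
)
print(f"{frame_count:,} frames")                    # 216,000 frames
print(f"{bytes_per_frame / 1e6:.1f} MB per frame")  # 6.2 MB per frame
print(f"{total_bytes / 1e12:.2f} TB total")         # 1.34 TB total
```

Running the same calculation on your own resolution, frame rate, and duration gives a quick upper bound on what an in-memory load would need.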

Solution: Streaming Editing Pipeline

For editing workflows, VideoEdit.run_to_file() streams frames one at a time from ffmpeg decode through per-frame effect processing to ffmpeg encode. Memory usage is constant (~250 MB) regardless of video length.

from videopython.editing import VideoEdit

plan = {
    "segments": [{
        "source": "2_hour_movie.mp4",
        "start": 0,
        "end": 7200,
        "operations": [
            {"op": "resize", "width": 1920, "height": 1080},
            {"op": "color_adjust", "saturation": 0, "contrast": 1.15},
            {"op": "fade", "mode": "in_out", "duration": 1.0},
            {"op": "volume_adjust", "volume": 1.5},
        ],
    }],
}

edit = VideoEdit.from_dict(plan)
edit.run_to_file("output.mp4", crf=20, preset="medium")
# Peak memory: ~250 MB regardless of video length
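
Because the plan is a plain dictionary, it can be generated rather than written by hand. The helper below is a minimal sketch that builds the plan shape shown above from a list of cuts. `build_plan` is a hypothetical name, and the sketch assumes that the "segments" list accepts more than one entry and that reusing the same operations for every segment is what you want.

```python
from typing import Any


def build_plan(
    source: str,
    cuts: list[tuple[float, float]],
    operations: list[dict[str, Any]],
) -> dict[str, Any]:
    """Build a plan dict with one segment per (start, end) cut, in seconds."""
    segments = []
    for start, end in cuts:
        if start < 0 or end <= start:
            raise ValueError(f"invalid cut: start={start}, end={end}")
        segments.append(
            {
                "source": source,
                "start": start,
                "end": end,
                # Copy each op so segments never share mutable dicts.
                "operations": [dict(op) for op in operations],
            }
        )
    return {"segments": segments}


# Assumption: two cuts from the same source, each resized the same way.
plan = build_plan(
    "2_hour_movie.mp4",
    cuts=[(0, 600), (3600, 4200)],
    operations=[{"op": "resize", "width": 1280, "height": 720}],
)
```

The resulting dictionary can then be passed to VideoEdit.from_dict(plan) exactly as in the example above.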

Streaming is the only execution engine: every op compiles to an ffmpeg filter or a per-frame effect, and frames are never accumulated in memory. A plan shape with no streaming strategy (e.g. a frame effect ordered after burned-in subtitles) is rejected with structured STREAMING_UNSUPPORTED errors before any decode. edit.streamability() returns the per-op classification without touching the disk; edit.check(meta) reports the same errors alongside the validity checks.

Effects that require context stream as well. Pass context= to run_to_file; add_subtitles, for example, then burns word-level subtitles on the streaming path, with the transcription re-based onto each segment's local timeline:

# `transcription` must already exist, e.g. from an earlier speech-to-text step
edit.run_to_file("output.mp4", context={"transcription": transcription})

See the Operations table for the list of streamable ops. To check a single op programmatically, inspect Operation.get(op_id).streamable.

Solution: FrameIterator

FrameIterator streams frames one at a time with O(1) memory usage:

from videopython.base import FrameIterator

# Process frames without loading entire video
with FrameIterator("long_video.mp4") as frames:
    for frame_idx, frame in frames:
        # frame is a numpy array (H, W, 3) in RGB
        # Only one frame in memory at a time
        process_frame(frame)
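
To keep memory constant, whatever process_frame does should hold only running aggregates and never a list of frames. The following is a minimal sketch of that pattern using a running mean of per-frame brightness. `summarize_brightness` and `synthetic_frames` are hypothetical names; the synthetic generator stands in for a real video so the example runs without any file, and it assumes only that frames arrive as (frame_idx, numpy array) pairs as shown above.

```python
from typing import Iterable, Iterator, Optional, Tuple

import numpy as np


def summarize_brightness(frames: Iterable[Tuple[int, np.ndarray]]) -> dict:
    """Single pass over (frame_idx, frame) pairs, keeping only scalar state."""
    count = 0
    running_mean = 0.0
    brightest_idx: Optional[int] = None
    brightest_value = -1.0
    for frame_idx, frame in frames:
        value = float(frame.mean())  # mean pixel value of this frame
        count += 1
        # Incremental mean: no per-frame history is stored.
        running_mean += (value - running_mean) / count
        if value > brightest_value:
            brightest_idx, brightest_value = frame_idx, value
    return {
        "frames": count,
        "mean_brightness": running_mean,
        "brightest_frame": brightest_idx,
    }


def synthetic_frames(n: int, height: int = 4, width: int = 4) -> Iterator[Tuple[int, np.ndarray]]:
    """Stand-in for a video: frame i is a flat image with pixel value i * 10."""
    for i in range(n):
        yield i, np.full((height, width, 3), i * 10, dtype=np.uint8)


stats = summarize_brightness(synthetic_frames(5))
print(stats)
```

With a real file, the iterator obtained from `with FrameIterator(path) as frames:` can be passed in place of `synthetic_frames(5)`.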

Full Example: Extract Thumbnails

Extract one thumbnail per minute from a long video:

from videopython.base import FrameIterator, VideoMetadata
from PIL import Image
import os

def extract_thumbnails(video_path: str, output_dir: str, interval_seconds: float = 60.0):
    """Extract thumbnails at regular intervals from a video."""
    os.makedirs(output_dir, exist_ok=True)

    # Get metadata without loading video
    metadata = VideoMetadata.from_path(video_path)
    fps = metadata.fps
    # At least 1 frame, so the modulo below never divides by zero
    interval_frames = max(1, round(interval_seconds * fps))

    with FrameIterator(video_path) as frames:
        for frame_idx, frame in frames:
            if frame_idx % interval_frames == 0:
                # Save thumbnail
                timestamp = frame_idx / fps
                img = Image.fromarray(frame)
                img.thumbnail((320, 180))  # Shrink in place to fit 320x180, keeping aspect ratio
                img.save(f"{output_dir}/thumb_{timestamp:.0f}s.jpg")
                print(f"Saved thumbnail at {timestamp:.0f}s")

# Extract one thumbnail per minute
extract_thumbnails("2_hour_movie.mp4", "thumbnails/", interval_seconds=60)

AI Video Analysis (Scene-First)

VideoAnalyzer.analyze_path() returns scene-centered outputs. For long videos, use sampling="low" to keep wall time down (8-frame budget per scene, 20-second adjacent-merge threshold):

from videopython.ai import VideoAnalyzer, VideoAnalysisConfig

config = VideoAnalysisConfig(
    enabled_analyzers={"audio_to_text", "semantic_scene_detector", "scene_vlm"},
)

analysis = VideoAnalyzer(config=config, sampling="low").analyze_path("long_video.mp4")
for scene in (analysis.scenes.samples if analysis.scenes else []):
    print(scene.scene_index, scene.start_second, scene.end_second)
    if scene.scene_description:
        print("   caption:", scene.scene_description.caption)
        print("  subjects:", scene.scene_description.subjects)
        print(" shot_type:", scene.scene_description.shot_type)

Processing a Segment

Process only a portion of a large video:

from videopython.base import FrameIterator

# Only iterate frames from 1:00:00 to 1:10:00
with FrameIterator("movie.mp4", start_second=3600, end_second=4200) as frames:
    for frame_idx, frame in frames:
        process_frame(frame)
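
Segment bounds are given in seconds, while positions in a long video are usually noted as H:MM:SS. The helper below is a small sketch for converting between the two; `timestamp_to_seconds` is a hypothetical name and not part of videopython.

```python
def timestamp_to_seconds(timestamp: str) -> float:
    """Convert "SS", "MM:SS", or "H:MM:SS" (fractions allowed) to seconds."""
    parts = timestamp.strip().split(":")
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"unsupported timestamp: {timestamp!r}")
    seconds = 0.0
    for part in parts:
        # Each step to the right is worth 1/60 of the previous field.
        seconds = seconds * 60 + float(part)
    return seconds


start_second = timestamp_to_seconds("1:00:00")
end_second = timestamp_to_seconds("1:10:00")
print(start_second, end_second)  # 3600.0 4200.0
```

The two values can then be supplied as start_second and end_second, as in the example above.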

Dubbing Large Videos

VideoDubber.dub_and_replace() loads every frame into RAM via Video.from_path(), which is impractical for long sources. dub_file() operates on paths instead: it extracts audio via ffmpeg, runs the dubbing pipeline on the audio only, and muxes the dubbed audio back into the source video using ffmpeg stream-copy (no video re-encode). Peak memory is bounded by model weights and the audio track — independent of video length and resolution.

Combine with low_memory=True so each pipeline stage's model (Whisper, Demucs, the translation backend, Chatterbox) is unloaded between stages:

from videopython.ai.dubbing import VideoDubber

dubber = VideoDubber(low_memory=True)
result = dubber.dub_file(
    input_path="2_hour_movie.mp4",
    output_path="dubbed.mp4",
    target_lang="es",
    voice_clone=True,
    preserve_background=True,
)
# Peak memory: model weights + audio track (~hundreds of MB for a 2-hour stereo
# source), no matter the video resolution. The video stream is copied
# unchanged into the output.

print(f"Translated {result.num_segments} segments")

See AI Dubbing for more on the low_memory flag and dub_file().

Method Comparison

| Approach | Memory | Speed | Use Case |
|---|---|---|---|
| VideoEdit.run_to_file() | O(1) | Fast | Editing long videos with effects/transforms |
| Video.from_path() | O(all frames) | Fast access | Short videos, need random access |
| FrameIterator | O(1) | Sequential | Long videos, single pass analysis |
| VideoDubber.dub_file() | O(audio + model weights) | Same as dub_and_replace | Dubbing long/high-res videos without loading frames |

Tips

  • Check metadata first: Use VideoMetadata.from_path() to check video size before deciding how to process it.
  • Sample wisely: For AI analysis, 0.1-0.5 FPS is usually sufficient. Higher rates waste compute.
  • Process segments: If you only need part of a video, use start_second/end_second parameters.
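
When sampling frames yourself, a target rate such as 0.1-0.5 FPS translates into "keep every Nth frame". The sketch below shows that conversion; `sample_stride` is a hypothetical helper, and it assumes frame indices count source frames starting from 0.

```python
def sample_stride(source_fps: float, target_fps: float) -> int:
    """Number of source frames to advance between two sampled frames."""
    if source_fps <= 0 or target_fps <= 0:
        raise ValueError("frame rates must be positive")
    # Never below 1: a target above the source rate simply keeps every frame.
    return max(1, round(source_fps / target_fps))


stride = sample_stride(source_fps=30, target_fps=0.5)
print(stride)  # 60, i.e. one frame every 2 seconds

# Indices that would be kept from the first 5 seconds of a 30 fps video.
kept = [idx for idx in range(150) if idx % stride == 0]
print(kept)  # [0, 60, 120]
```

Inside a FrameIterator loop, the same test (`frame_idx % stride == 0`) decides whether a frame is passed on for analysis.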