Automatic Editing
AutoEditor turns source videos plus a one-line brief into a finished cut, with
a local vision-language model (via Ollama) doing the
editorial selection. No cloud API keys, no timestamps to hand-author.
from videopython.ai import AutoEditor, OllamaVisionLLM
# The planner is a local Ollama vision model. Pull it first: `ollama pull qwen3.6:27b`.
editor = AutoEditor(planner=OllamaVisionLLM(model="qwen3.6:27b"))
edit = editor.edit(
["clip_a.mp4", "clip_b.mp4", "clip_c.mp4"],
brief="A punchy 15-second teaser; lead with the most dynamic shot.",
)
edit.run_to_file("teaser.mp4")
editor.edit(...) returns a validated VideoEdit — the same
plan object you can author by hand — so you can inspect, tweak, or re-render it.
How it works
The model is bad at one thing (precise timestamps) and good at another (judging shots). The design plays to that split:
- Analyze — each source is run through
VideoAnalyzer: scene boundaries, a per-scene caption (SceneVLM), and a transcript. - Catalog — the scenes are projected into a flat catalog of candidate clips,
each with a stable
scene_id, its exact source/start/end, caption, transcript, and a keyframe image. - Plan — the planner sees the catalog (text + keyframes) and the brief, then
authors an
EditPlanthat references scenes by id and adds operations (resize, crossfades, ...). It never invents timestamps. - Resolve + run — each
scene_idmaps back to its exact bounds, producing a realVideoEdit. The plan is repaired, dimension-normalized, and validated before rendering.
Because the model selects by id, its temporal imprecision never reaches the render — the precise bounds come from scene detection, the editorial judgment from the model.
Refine loop
AutoEditor runs an internal repair/validate loop (max_rounds, default 3). If a
plan references an unknown id or violates a bound, the structured error is fed
back to the planner for another attempt. If no valid edit emerges within the
budget it raises AutoEditError.
editor = AutoEditor(
planner=OllamaVisionLLM(model="qwen3.6:27b"),
max_rounds=3,
normalize_target="largest", # unify segment dimensions to the largest source
)
Reusing analysis
Analysis is the expensive step. If you already have VideoAnalysis objects (or
want to analyze once and try several briefs), skip straight to planning:
from videopython.ai import VideoAnalyzer
analyses = [VideoAnalyzer().analyze_path(p) for p in ["a.mp4", "b.mp4"]]
edit = editor.edit_from_analyses(analyses, brief="A calm, scenic 20s intro.")
Choosing a planner model
The planner must be vision-capable (it is sent keyframes) and support
Ollama's structured-output format (schema-conditioned decoding). qwen3.6:27b
is the default (Apache-2.0). Some builds — e.g. certain MLX vision models — accept
images but ignore format, and planning fails; if a model returns prose instead
of JSON, pick another tag. OllamaVisionLLM is model-agnostic via model=, and
StructuredVisionLLM is the seam if you want to back the planner with something
other than Ollama.
When to use which LLM mode
| You want… | Use |
|---|---|
| videopython to edit for you, fully local | AutoEditor (this page) |
| your own frontier model / harness to drive the tools | MCP server |
| to author/validate plans from your own LLM integration | LLM Integration |