LLM & AI Agent Integration
videopython is designed to be controlled by LLMs. Every operation is a
Pydantic BaseModel whose fields ARE the JSON wire format, so structural
rules, parameter types, and value constraints surface as standard JSON
Schema. An LLM can generate, validate, and execute editing plans without
needing to learn the surface from examples.
Workflow
- Generate: pass VideoEdit.json_schema() to the LLM as a tool / structured-output schema.
- Validate: call edit.validate() for a dry run via metadata. No frames load.
- Execute: edit.run() returns a Video; edit.run_to_file() streams directly to disk.
from videopython.editing import VideoEdit
schema = VideoEdit.json_schema()
# call_your_llm is a placeholder for your own LLM client call
plan = call_your_llm(
    schema=schema,
    prompt="Create a 15s highlight reel from input.mp4",
)
edit = VideoEdit.from_dict(plan)
predicted = edit.validate() # catches bad plans before any I/O
print(predicted)
video = edit.run()
video.save("output.mp4")
Passing the Schema
VideoEdit.json_schema() returns a JSON Schema (Draft-07 compatible)
covering segments, post-operations, the matching flags, and a
discriminated union over every registered Operation. AI ops appear in
the union only after import videopython.ai.
Anthropic tool use
import anthropic
from videopython.editing import VideoEdit
client = anthropic.Anthropic()
schema = VideoEdit.json_schema()
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=[{
        "name": "create_video_edit",
        "description": "Create a video editing plan",
        "input_schema": schema,
    }],
    messages=[{
        "role": "user",
        "content": "Cut input.mp4 to the first 10 seconds, resize to 1080x1920, fade in.",
    }],
)
tool_block = next(b for b in response.content if b.type == "tool_use")
edit = VideoEdit.from_dict(tool_block.input)
edit.validate()
edit.run().save("output.mp4")
OpenAI function calling
import json
from openai import OpenAI
from videopython.editing import VideoEdit
client = OpenAI()
schema = VideoEdit.json_schema()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a video editor."},
        {"role": "user", "content":
            "Cut input.mp4 to the first 10 seconds, resize to 1080x1920, fade in."},
    ],
    tools=[{
        "type": "function",
        "function": {
            "name": "create_video_edit",
            "description": "Create a video editing plan",
            "parameters": schema,
        },
    }],
)
plan = json.loads(response.choices[0].message.tool_calls[0].function.arguments)
edit = VideoEdit.from_dict(plan)
edit.validate()
edit.run().save("output.mp4")
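The indexing on tool_calls[0] above assumes the model always calls the tool. A minimal sketch of a more defensive extraction follows. extract_plan is a hypothetical helper, not part of videopython or the OpenAI SDK. It relies only on the response shape used above (message.tool_calls, each with function.name and a JSON string in function.arguments). The fake message and its arguments are placeholders, not a real plan.

```python
import json
from types import SimpleNamespace


def extract_plan(message, tool_name="create_video_edit"):
    """Return the parsed arguments of the first matching tool call.

    Raises ValueError if the model answered in plain text or produced
    arguments that are not valid JSON, so the caller can retry.
    """
    for call in getattr(message, "tool_calls", None) or []:
        if call.function.name != tool_name:
            continue
        try:
            return json.loads(call.function.arguments)
        except json.JSONDecodeError as exc:
            raise ValueError(f"tool arguments are not valid JSON: {exc}") from exc
    raise ValueError(f"model did not call {tool_name!r}")


# Stand-in for response.choices[0].message, for illustration only.
fake_message = SimpleNamespace(
    tool_calls=[
        SimpleNamespace(
            function=SimpleNamespace(
                name="create_video_edit",
                arguments='{"placeholder": 1}',
            )
        )
    ]
)
plan = extract_plan(fake_message)
```

Raising ValueError keeps this failure mode on the same path as a plan that fails validation, so one except clause can feed either error back to the model.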
Discovering Operations
from videopython.editing import Operation, OpCategory
# All registered ops
for op_id, cls in Operation.registry().items():
    # First docstring line, or "" when the op has no docstring
    doc = (cls.__doc__ or "").strip().partition("\n")[0].strip()
    print(f"{op_id}: {doc}")
# By category
transforms = {k: v for k, v in Operation.registry().items()
              if v.category is OpCategory.TRANSFORM}
# Per-op JSON Schema (standard Pydantic)
Operation.get("color_adjust").model_json_schema()
For per-op tool definitions:
tools = []
for op_id, cls in Operation.registry().items():
    if cls.category is not OpCategory.TRANSFORM:
        continue
    tools.append({
        "name": f"transform_{op_id}",
        # First docstring line, or "" when the op has no docstring
        "description": (cls.__doc__ or "").strip().partition("\n")[0],
        "input_schema": cls.model_json_schema(),
    })
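When the model calls one of these per-op tools, the tool name has to be mapped back to an op ID. A minimal sketch, assuming the transform_ prefix used above. op_id_from_tool_name is a hypothetical helper, and color_adjust is used only as an illustrative ID.

```python
TOOL_PREFIX = "transform_"


def op_id_from_tool_name(tool_name: str, prefix: str = TOOL_PREFIX) -> str:
    """Recover the op ID from a tool name built as f"{prefix}{op_id}"."""
    if not tool_name.startswith(prefix) or len(tool_name) == len(prefix):
        raise ValueError(f"not a {prefix}* tool: {tool_name!r}")
    return tool_name[len(prefix):]


op_id = op_id_from_tool_name("transform_color_adjust")
```

The recovered ID can then be looked up with Operation.get(op_id).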
Validation Before Execution
VideoEdit.validate() chains each op's predict_metadata across the
plan and checks segment bounds, effect windows, and concatenation
compatibility. Catches:
- Invalid time ranges (start >= end, end > source duration)
- Effect window outside the predicted segment duration
- Incompatible segment dimensions/fps for concatenation
- Unknown operation IDs (Pydantic ValidationError raised by from_dict)
- Out-of-range parameter values (also at from_dict time)
try:
    # from_dict raises Pydantic's ValidationError (a ValueError subclass)
    edit = VideoEdit.from_dict(plan)
    predicted = edit.validate()
    print(f"Output: {predicted.width}x{predicted.height}, "
          f"{predicted.total_seconds:.1f}s")
except ValueError as e:
    # Feed `e` back to the LLM to retry
    print(f"Invalid plan: {e}")
This makes it cheap to let an LLM retry: validate, return the error, ask the LLM to fix it.
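The retry cycle can be sketched as a small loop. This is one possible shape, not a videopython API: plan_with_retries, generate, and check are hypothetical names, and the fake generator and checker stand in for a real LLM call and for from_dict plus validate().

```python
from typing import Any, Callable, Optional


def plan_with_retries(
    generate: Callable[[Optional[str]], Any],
    check: Callable[[Any], Any],
    max_attempts: int = 3,
) -> Any:
    """Call generate(feedback) until check(plan) stops raising ValueError.

    generate receives None on the first attempt and the previous error
    message afterwards. The last error is re-raised once attempts run out.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    feedback: Optional[str] = None
    for attempt in range(max_attempts):
        plan = generate(feedback)
        try:
            check(plan)
        except ValueError as exc:
            if attempt == max_attempts - 1:
                raise
            feedback = str(exc)
        else:
            return plan


# Stand-ins for illustration: a fake "LLM" that fixes its plan after feedback.
def fake_generate(feedback):
    return {"end": 5.0} if feedback else {"end": 99.0}


def fake_check(plan):
    if plan["end"] > 10.0:
        raise ValueError("end > source duration (10.0s)")


good_plan = plan_with_retries(fake_generate, fake_check)
```

With videopython, check could be something like lambda plan: VideoEdit.from_dict(plan).validate(), assuming both steps report problems as ValueError, as the example above does.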
Context Data
Operations that need side-channel data declare it via
requires: ClassVar[tuple[str, ...]]. The runner pulls matching keys
out of the context dict and threads them into the op:
# silence_removal and add_subtitles both need a transcription
edit = VideoEdit.from_dict(plan)
video = edit.run(context={"transcription": transcription})
Discover requires-aware ops via the registry:
needs_transcript = [op_id for op_id, cls in Operation.registry().items()
                    if "transcription" in cls.requires]
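To make the requires mechanism concrete, here is a toy sketch of how a runner could select context keys for an op. It is not videopython's implementation: ToyOp, ToySubtitles, and select_context are hypothetical stand-ins, and only the requires class attribute mirrors the description above.

```python
from typing import Any, ClassVar


class ToyOp:
    # Names of the context keys this op needs; empty means none.
    requires: ClassVar[tuple[str, ...]] = ()


class ToySubtitles(ToyOp):
    requires: ClassVar[tuple[str, ...]] = ("transcription",)


def select_context(op_cls: type, context: dict[str, Any]) -> dict[str, Any]:
    """Return only the context entries op_cls declares, failing on missing keys."""
    missing = [key for key in op_cls.requires if key not in context]
    if missing:
        raise KeyError(f"{op_cls.__name__} needs context keys: {missing}")
    return {key: context[key] for key in op_cls.requires}


context = {"transcription": "hello world", "unused": 123}
selected = select_context(ToySubtitles, context)
```

Declaring requirements on the class lets a caller check that the context is complete before any op runs.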
AI Operations
AI-powered ops (face_crop, ...) are registered only when
videopython.ai is imported. If your plans use them, import AI first
so the schema includes them:
import videopython.ai # registers AI ops
from videopython.editing import VideoEdit
schema = VideoEdit.json_schema() # now includes face_crop
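The import-order requirement is what you would expect if registration happens as a side effect of class definition. The toy registry below sketches that general pattern with __init_subclass__. It is an assumption about the mechanism, not videopython's actual code, and ToyOperation, op_id, and both op classes are hypothetical.

```python
from typing import ClassVar


class ToyOperation:
    _registry: ClassVar[dict[str, type]] = {}
    op_id: ClassVar[str] = ""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Defining a subclass with an op_id is what registers it.
        if cls.op_id:
            ToyOperation._registry[cls.op_id] = cls

    @classmethod
    def registry(cls) -> dict[str, type]:
        # Return a copy so callers cannot mutate the registry.
        return dict(cls._registry)


class ToyCut(ToyOperation):
    op_id = "cut"


before = set(ToyOperation.registry())


# In a real package this class would live in a separate module, so it
# would only be defined (and registered) once that module is imported.
class ToyFaceCrop(ToyOperation):
    op_id = "face_crop"


after = set(ToyOperation.registry())
```

A schema built from the registry before the second class is defined cannot include it, which matches the advice to import the AI module first.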
Tips
- Start with the schema. Pass VideoEdit.json_schema() as the tool schema; it encodes all structural rules so the LLM doesn't need examples.
- Always validate. Call edit.validate() before edit.run(). Validation is fast and catches most errors.
- Use the error loop. If validation fails, feed the error back to the LLM and ask it to fix the plan. Most issues correct in one retry.
- Provide source metadata. Tell the LLM the source duration, dimensions, and fps so it can generate sensible time ranges and resize targets.
- Expose the registry. For agents, let the LLM call into Operation.registry() instead of hardcoding the op list.
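For the source-metadata tip, a small helper can format the facts the model needs into the prompt. This is a sketch: describe_source is hypothetical, and the values are passed in as plain numbers because how you obtain them is not covered here.

```python
def describe_source(path: str, duration_s: float, width: int, height: int, fps: float) -> str:
    """Format source facts as one prompt sentence for the LLM."""
    return (
        f"Source {path}: {duration_s:.1f}s long, "
        f"{width}x{height} pixels, {fps:g} fps."
    )


prompt = (
    describe_source("input.mp4", 42.5, 1920, 1080, 30.0)
    + " Create a 15s highlight reel."
)
```

Putting the duration in the prompt gives the model a bound for time ranges, which should reduce plans rejected for running past the end of the source.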