AI Effects
These effects are AI-powered and shape-preserving. Unlike plain effects,
they run a model on sampled frames, so they live in videopython.ai and the
core editing layer keeps no AI dependency.
ObjectDetectionOverlay
Detects objects in every frame with a YOLOv8-COCO model and composites tidy,
colour-coded bounding boxes with class labels (and optional confidence). The
detector (ObjectDetector) is constructed
internally; the box/label drawing is done by the AI-free renderer
videopython.base.draw_detections.
```python
from videopython.ai import ObjectDetectionOverlay
from videopython.editing import VideoEdit, SegmentConfig

# Default: per-class colours, confidence shown, detect every 2nd frame.
edit = VideoEdit(segments=[SegmentConfig(source="street.mp4", start=0, end=5, operations=[
    ObjectDetectionOverlay(),
])])
edit.run_to_file("annotated.mp4")

# Only people and cars, detect every frame, larger model for accuracy.
edit = VideoEdit(segments=[SegmentConfig(source="street.mp4", start=0, end=5, operations=[
    ObjectDetectionOverlay(
        class_filter=["person", "car"],
        detection_interval=1,
        model_size="s",
    ),
])])
edit.run_to_file("annotated.mp4")
```
In a JSON editing plan (it is exposed in the LLM-facing schema):
```json
{
  "op": "object_detection_overlay",
  "class_filter": ["person", "car", "dog"],
  "confidence_threshold": 0.4,
  "detection_interval": 2,
  "window": {"start": 0, "stop": 5}
}
```
Performance
object_detection_overlay is streamable — memory stays bounded on long
clips — but detection is compute-bound: a YOLO forward pass runs per
sampled frame. To cap cost:
- window: restrict the overlay (and therefore detection) to a time range.
- detection_interval: run detection every Nth frame and hold the boxes in between (default 2). Higher is faster; fast-moving objects show more lag.
- class_filter: fewer classes to draw.
- model_size: "n" (nano, default, fastest) → "s" → "m" (most accurate).
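To make the effect of window and detection_interval concrete, the following is a rough back-of-the-envelope sketch, not library code. The helper name `estimated_detector_passes` is hypothetical, and it assumes detection runs on frame indices 0, N, 2N, ... inside the window, which is an assumption about the cadence rather than something confirmed by the source.

```python
import math


def estimated_detector_passes(duration_s: float, fps: float, detection_interval: int = 2) -> int:
    """Rough count of detector forward passes for a windowed overlay.

    Assumption: detection runs on frames 0, N, 2N, ... within the window.
    """
    if detection_interval < 1:
        raise ValueError("detection_interval must be >= 1")
    total_frames = math.ceil(duration_s * fps)
    return math.ceil(total_frames / detection_interval)


# A 5 s window at 30 fps covers 150 frames.
print(estimated_detector_passes(5, 30, detection_interval=1))  # 150
print(estimated_detector_passes(5, 30, detection_interval=2))  # 75
print(estimated_detector_passes(5, 30, detection_interval=5))  # 30
```

Under these assumptions the number of forward passes scales linearly with the window length and inversely with detection_interval, while memory use is unaffected by either.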
ObjectDetectionOverlay
Bases: Effect
Detect objects per frame and overlay labelled bounding boxes.
Runs a YOLOv8-COCO detector and composites tidy, colour-coded boxes with class labels (and optional confidence) onto every frame in the window.
Detection runs on a detection_interval cadence in the streaming path and
boxes are held between detections, so the cost is compute-bound, not
memory-bound: "streamable" here means bounded memory, not bounded
compute. On long clips, cap cost with window (limit the time range),
a larger detection_interval, a class_filter, and/or a smaller
model_size. Only streaming_init and process_frame are
overridden; the streaming engine drives that contract for bounded-memory
execution.
Source code in src/videopython/ai/effects.py
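The hold-between-detections behaviour described above can be sketched with a small stateful class. This is an illustration only: `HeldDetectionSketch` is a hypothetical stand-in, the `streaming_init` and `process_frame` signatures shown here are assumed rather than taken from the library, and the detector is a plain callable instead of a real model.

```python
from typing import Any, Callable, List, Tuple


class HeldDetectionSketch:
    """Hypothetical stand-in for a streaming effect; not the library class."""

    def __init__(self, detect: Callable[[Any], List[Any]], detection_interval: int = 2):
        if detection_interval < 1:
            raise ValueError("detection_interval must be >= 1")
        self.detect = detect
        self.detection_interval = detection_interval
        self.streaming_init()

    def streaming_init(self) -> None:
        # Reset per-run state before the first frame arrives.
        self._frame_index = 0
        self._held: List[Any] = []

    def process_frame(self, frame: Any) -> Tuple[Any, List[Any]]:
        # Refresh detections only on the cadence; otherwise reuse the held ones.
        if self._frame_index % self.detection_interval == 0:
            self._held = self.detect(frame)
        self._frame_index += 1
        return frame, self._held


calls = []


def fake_detect(frame):
    # Stub detector that records which frames triggered a "forward pass".
    calls.append(frame)
    return [f"box@{frame}"]


effect = HeldDetectionSketch(fake_detect, detection_interval=2)
held = [effect.process_frame(i)[1] for i in range(5)]
print(calls)    # [0, 2, 4]
print(held[1])  # ['box@0']
```

Only the most recent detection list is kept, which is why memory stays bounded regardless of clip length, while the number of detector calls still grows with the number of frames.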
Renderer
The drawing is a pure, AI-free function reusable with any list of
DetectedObject. Colours are deterministic
per class, so a class is the same colour in every frame and across runs.
```python
from videopython.base import DetectionStyle, class_color, draw_detections

# `frame` is an (H, W, 3) uint8 array; `detections` is a list of DetectedObject.
frame = draw_detections(frame, detections, DetectionStyle(show_confidence=False))
```
draw_detections
```python
draw_detections(
    frame: ndarray,
    detections: list[DetectedObject],
    style: DetectionStyle = DetectionStyle(),
) -> ndarray
```
Return a copy of frame with detections drawn as labelled boxes.
Shape-preserving: the result is the same (H, W, 3) uint8 array. An
empty detections list (or one filtered out by min_confidence) is a
no-op that returns frame unchanged. Boxes are clamped to the frame, so
off-frame coordinates clip cleanly instead of raising. Label chips flip
inside the box when they would overflow the top edge and clamp horizontally
so they never leave the frame.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| frame | ndarray | Source frame as an (H, W, 3) uint8 array. | required |
| detections | list[DetectedObject] | Objects to draw; each uses its normalized bounding box. | required |
| style | DetectionStyle | Visual styling (colours, stroke width, label options). | DetectionStyle() |
Returns:
| Type | Description |
|---|---|
| ndarray | A new (H, W, 3) uint8 array with the detections drawn. |
Source code in src/videopython/base/draw_detections.py
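The clamping and label-chip rules described for draw_detections can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the renderer's code: the helper names `to_pixel_box` and `place_label_chip` are hypothetical, and boxes are assumed to be normalized (x1, y1, x2, y2) corners, which may differ from the actual DetectedObject layout.

```python
def clamp(value: int, low: int, high: int) -> int:
    """Limit value to the inclusive range [low, high]."""
    return max(low, min(high, value))


def to_pixel_box(norm_box, width: int, height: int):
    """Scale an assumed normalized (x1, y1, x2, y2) box to pixels, clamped to the frame."""
    x1, y1, x2, y2 = norm_box
    return (
        clamp(round(x1 * width), 0, width - 1),
        clamp(round(y1 * height), 0, height - 1),
        clamp(round(x2 * width), 0, width - 1),
        clamp(round(y2 * height), 0, height - 1),
    )


def place_label_chip(box, chip_w: int, chip_h: int, frame_w: int):
    """Return the (left, top) corner of the label chip for a pixel box."""
    x1, y1, _, _ = box
    # Prefer sitting above the box; flip inside when that would overflow the top edge.
    top = y1 - chip_h if y1 - chip_h >= 0 else y1
    # Clamp horizontally so the chip never leaves the frame.
    left = clamp(x1, 0, max(0, frame_w - chip_w))
    return left, top


# A box that extends past the left and bottom edges of a 1920x1080 frame.
box = to_pixel_box((-0.1, 0.0, 0.5, 1.2), width=1920, height=1080)
print(box)                                                       # (0, 0, 960, 1079)
print(place_label_chip(box, 200, 30, 1920))                      # (0, 0)
print(place_label_chip((1800, 100, 1900, 200), 200, 30, 1920))   # (1720, 70)
```

The first chip flips inside the box because there is no room above it; the second sits above its box but is shifted left so its right edge stays within the frame.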
DetectionStyle
dataclass
Styling for draw_detections.
Lengths expressed as a fraction of the frame's longer side are resolution-independent: the same style reads consistently at 1080p and 4K.
Source code in src/videopython/base/draw_detections.py
box_color
class-attribute
instance-attribute
Fixed (R, G, B) for every box, or None for per-class colours.
line_thickness
class-attribute
instance-attribute
Box stroke width as a fraction of max(height, width) (~3px at 1080p).
show_confidence
class-attribute
instance-attribute
Append the confidence as a whole-number percent to each label.
label_font_size
class-attribute
instance-attribute
Label text height as a fraction of max(height, width) (~24px at 1080p).
label_text_color
class-attribute
instance-attribute
Colour of the label text drawn on the chip.
label_bg_alpha
class-attribute
instance-attribute
Opacity (0-255) of the label chip background.
min_confidence
class-attribute
instance-attribute
Detections below this confidence are skipped.
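As a worked example of how fractional lengths, the confidence label, and min_confidence might combine, here is a small sketch. `StyleSketch` is a hypothetical mirror of a few DetectionStyle fields; the fraction values are assumptions back-derived from the "~3px at 1080p" and "~24px at 1080p" notes, not the library's actual defaults, and the rounding rule is likewise assumed.

```python
from dataclasses import dataclass


@dataclass
class StyleSketch:
    """Hypothetical mirror of a few DetectionStyle fields; values are assumed."""
    line_thickness: float = 3 / 1920     # assumed from "~3px at 1080p"
    label_font_size: float = 24 / 1920   # assumed from "~24px at 1080p"
    show_confidence: bool = True
    min_confidence: float = 0.0


def to_pixels(fraction: float, height: int, width: int) -> int:
    """Resolve a fraction of the frame's longer side to at least one pixel."""
    return max(1, round(fraction * max(height, width)))


def label_text(name: str, confidence: float, style: StyleSketch) -> str:
    """Class label, optionally followed by a whole-number percent."""
    return f"{name} {round(confidence * 100)}%" if style.show_confidence else name


style = StyleSketch(min_confidence=0.5)
print(to_pixels(style.line_thickness, 1080, 1920))   # 3
print(to_pixels(style.line_thickness, 2160, 3840))   # 6
print(to_pixels(style.label_font_size, 2160, 3840))  # 48

detections = [("person", 0.87), ("dog", 0.31)]
# Detections below min_confidence are skipped before drawing.
kept = [(name, conf) for name, conf in detections if conf >= style.min_confidence]
print([label_text(name, conf, style) for name, conf in kept])  # ['person 87%']
```

Because every length is a fraction of max(height, width), doubling the resolution doubles the stroke and text size in pixels, so the overlay looks the same relative to the frame.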
class_color
Deterministic RGB colour for a class label.
Common COCO classes get a reserved Material hue; everything else maps
md5(label) -> HSV hue at fixed saturation/value. md5 (not the
salted built-in hash) is used so colours are stable across processes
and test runs.
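The md5-to-hue mapping can be sketched with the standard library. This is an illustration of the idea, not the library's implementation: `class_color_sketch` is a hypothetical name, the reserved palette entries (Material Red 500 and Blue 500) and their assignment to classes are assumptions, and so are the saturation and value constants and the choice of digest bytes.

```python
import colorsys
import hashlib

# Assumed reserved palette; the real class-to-hue assignments may differ.
RESERVED = {
    "person": (244, 67, 54),   # Material Red 500
    "car": (33, 150, 243),     # Material Blue 500
}


def class_color_sketch(label: str, saturation: float = 0.75, value: float = 0.95):
    """Deterministic (R, G, B) for a label: reserved hue, else md5 -> HSV hue."""
    if label in RESERVED:
        return RESERVED[label]
    digest = hashlib.md5(label.encode("utf-8")).digest()
    # Map the first four digest bytes to a hue in [0, 1).
    hue = int.from_bytes(digest[:4], "big") / 2**32
    r, g, b = colorsys.hsv_to_rgb(hue, saturation, value)
    return (round(r * 255), round(g * 255), round(b * 255))


print(class_color_sketch("person"))
print(class_color_sketch("zebra"))
```

Unlike the built-in hash, which is salted per process for strings unless PYTHONHASHSEED is fixed, an md5 digest of the label bytes is the same in every process, so a colour computed this way does not change between runs.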