AI Understanding
Analyze videos, transcribe audio, and describe visual content.
For a single aggregate, serializable analysis object across multiple analyzers, see Video Analysis.
Local Model Support
| Class | Local Model Family |
|---|---|
| ImageToText | BLIP |
| AudioToText | Whisper |
| AudioClassifier | AST |
| ObjectDetector | YOLO |
| TextDetector | EasyOCR |
| FaceDetector | OpenCV / YOLOv8-face |
| CameraMotionDetector | OpenCV |
| MotionAnalyzer | OpenCV |
| ActionRecognizer | VideoMAE |
| SemanticSceneDetector | TransNetV2 |
AudioToText
AudioToText
Transcription service for audio and video using local Whisper models.
Source code in src/videopython/ai/understanding/audio.py
transcribe
Transcribe audio or video to text.
Source code in src/videopython/ai/understanding/audio.py
AudioClassifier
Detect and classify sounds, music, and audio events with timestamps using Audio Spectrogram Transformer (AST), a state-of-the-art model achieving 0.485 mAP on AudioSet.
Basic Usage
```python
from videopython.ai import AudioClassifier
from videopython.base import Video

classifier = AudioClassifier(confidence_threshold=0.3)
video = Video.from_path("video.mp4")
result = classifier.classify(video)

# Clip-level predictions (overall audio content)
for label, confidence in result.clip_predictions.items():
    print(f"{label}: {confidence:.2f}")

# Timestamped events
for event in result.events:
    print(f"{event.start:.1f}s - {event.end:.1f}s: {event.label} ({event.confidence:.2f})")
```
AudioClassifier
Audio event and sound classification using AST.
Source code in src/videopython/ai/understanding/audio.py
classify
Classify audio events in audio or video.
Source code in src/videopython/ai/understanding/audio.py
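The `confidence_threshold` passed to `AudioClassifier` determines which labels are reported at all. A minimal sketch of that filtering idea, using a hypothetical helper (`filter_predictions` is illustrative, not part of the library):

```python
def filter_predictions(clip_predictions: dict[str, float], threshold: float) -> dict[str, float]:
    """Keep only labels whose confidence meets the threshold, highest first."""
    kept = {label: conf for label, conf in clip_predictions.items() if conf >= threshold}
    # Sort descending by confidence so the strongest labels come first.
    return dict(sorted(kept.items(), key=lambda kv: -kv[1]))

preds = {"Music": 0.82, "Speech": 0.41, "Dog": 0.12}
print(filter_predictions(preds, 0.3))  # {'Music': 0.82, 'Speech': 0.41}
```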
ImageToText
ImageToText
Generates text descriptions of images using BLIP.
Source code in src/videopython/ai/understanding/image.py
describe_image
Generate a text description of an image.
Source code in src/videopython/ai/understanding/image.py
Detection Classes
ObjectDetector
ObjectDetector
Detects objects in images using local YOLO models.
Source code in src/videopython/ai/understanding/detection.py
detect
Detect objects in an image.
Source code in src/videopython/ai/understanding/detection.py
FaceDetector
FaceDetector
Detects faces in images using OpenCV (CPU) or YOLOv8-face (GPU).
Source code in src/videopython/ai/understanding/detection.py
detect
Detect faces in an image.
Source code in src/videopython/ai/understanding/detection.py
detect_batch
Detect faces in a batch of images.
Source code in src/videopython/ai/understanding/detection.py
TextDetector
TextDetector supports two output modes:
- `detect(image) -> list[str]`: backward-compatible plain text
- `detect_detailed(image) -> list[DetectedText]`: text + confidence + bounding box
```python
from videopython.ai import TextDetector

detector = TextDetector(languages=["en"])
texts = detector.detect(frame)
regions = detector.detect_detailed(frame)
for region in regions:
    print(region.text, region.confidence, region.bounding_box)
```
TextDetector
Detects text in images using local EasyOCR.
Source code in src/videopython/ai/understanding/detection.py
detect
Detect text in an image.
Returns plain text strings for backward compatibility.
detect_detailed
Detect text in an image with confidence and region boxes.
Source code in src/videopython/ai/understanding/detection.py
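The plain-text mode is conceptually just the detailed mode with metadata dropped. A small sketch of that relationship, using a simplified stand-in for the `DetectedText` dataclass (field names follow the documented attributes; the helper is hypothetical):

```python
from dataclasses import dataclass

@dataclass
class DetectedText:  # simplified stand-in for videopython's dataclass
    text: str
    confidence: float

def texts_from_detailed(regions: list[DetectedText]) -> list[str]:
    # detect()'s list[str] output keeps only the OCR text content.
    return [r.text for r in regions]

regions = [DetectedText("EXIT", 0.97), DetectedText("Open 24h", 0.88)]
print(texts_from_detailed(regions))  # ['EXIT', 'Open 24h']
```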
CameraMotionDetector
CameraMotionDetector
Detects camera motion between frames using optical flow.
Source code in src/videopython/ai/understanding/detection.py
detect
Detect camera motion between two consecutive frames.
Source code in src/videopython/ai/understanding/detection.py
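To give a feel for optical-flow-based classification: if the mean flow vector between two frames is small, the camera is static; otherwise its dominant axis suggests a pan (horizontal) or tilt (vertical). This is only an illustrative sketch of the idea, not the library's implementation, and the threshold value is an assumption:

```python
import math

def classify_flow(mean_dx: float, mean_dy: float, motion_threshold: float = 2.0) -> str:
    """Classify a mean optical-flow displacement into a coarse motion type."""
    magnitude = math.hypot(mean_dx, mean_dy)
    if magnitude < motion_threshold:
        return "static"
    # The dominant axis decides pan (horizontal) vs tilt (vertical).
    return "pan" if abs(mean_dx) >= abs(mean_dy) else "tilt"

print(classify_flow(6.0, 1.0))  # pan
```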
MotionAnalyzer
Analyze motion in video frames using optical flow. Detects camera motion types (pan, tilt, zoom) and measures motion magnitude.
```python
from videopython.ai import MotionAnalyzer
from videopython.base import Video

analyzer = MotionAnalyzer()
video = Video.from_path("video.mp4")

# Analyze motion between two frames
motion = analyzer.analyze_frames(video.frames[0], video.frames[1])
print(f"Motion type: {motion.motion_type}, magnitude: {motion.magnitude:.2f}")

# Analyze entire video (memory-efficient)
results = analyzer.analyze_video_path("video.mp4", frames_per_second=1.0)
for timestamp, motion in results:
    print(f"{timestamp:.1f}s: {motion.motion_type} ({motion.magnitude:.2f})")
```
MotionAnalyzer
Analyzes motion characteristics in video using optical flow.
Detects both camera motion (pan, tilt, zoom) and overall motion magnitude, which is useful for identifying dynamic vs static scenes.
Example
```python
from videopython.ai import MotionAnalyzer
from videopython.base import Video

analyzer = MotionAnalyzer()
video = Video.from_path("video.mp4")

# Analyze motion between two frames
motion = analyzer.analyze_frames(video.frames[0], video.frames[1])
print(f"Motion type: {motion.motion_type}, magnitude: {motion.magnitude:.2f}")

# Analyze motion for a list of frames (returns list of MotionInfo)
motions = analyzer.analyze_frame_sequence(video.frames[:10])
```
Source code in src/videopython/ai/understanding/motion.py
__init__
```python
__init__(
    motion_threshold: float = 2.0,
    zoom_threshold: float = 0.1,
    magnitude_cap: float = 50.0,
)
```
Initialize motion analyzer.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `motion_threshold` | `float` | Minimum average flow magnitude (pixels/frame) to count as motion. Values below this are classified as "static". | `2.0` |
| `zoom_threshold` | `float` | Threshold for detecting zoom from the flow pattern (fractional difference between center and edges). | `0.1` |
| `magnitude_cap` | `float` | Cap for normalizing magnitude to the 0-1 range. Motion above this value (pixels/frame) maps to 1.0. | `50.0` |
Source code in src/videopython/ai/understanding/motion.py
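The `magnitude_cap` parameter implies a simple saturating normalization from raw flow magnitude to the reported 0-1 `magnitude`. A sketch of that mapping (the helper name is hypothetical, but the arithmetic follows the parameter's documented meaning):

```python
def normalize_magnitude(raw: float, cap: float = 50.0) -> float:
    """Map raw flow magnitude (pixels/frame) to [0, 1]; values above cap saturate at 1.0."""
    return min(raw / cap, 1.0)

print(normalize_magnitude(25.0))  # 0.5
print(normalize_magnitude(80.0))  # 1.0
```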
analyze_frames
Analyze motion between two consecutive frames.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `frame1` | `ndarray` | First frame as numpy array (H, W, 3) RGB. | required |
| `frame2` | `ndarray` | Second frame as numpy array (H, W, 3) RGB. | required |
Returns:
| Type | Description |
|---|---|
| `MotionInfo` | MotionInfo with motion type and magnitude. |
Source code in src/videopython/ai/understanding/motion.py
analyze_frame_sequence
Analyze motion for a sequence of frames.
Returns motion info for each pair of consecutive frames. Result list has length len(frames) - 1.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `frames` | `list[ndarray]` | List of frames as numpy arrays. | required |
Returns:
| Type | Description |
|---|---|
| `list[MotionInfo]` | List of MotionInfo objects for each frame transition. |
Source code in src/videopython/ai/understanding/motion.py
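The "length `len(frames) - 1`" contract follows from analyzing consecutive pairs. A minimal sketch of that pairing (the helper is illustrative; any pairwise `analyze` callable stands in for `analyze_frames`):

```python
def pairwise_motion(frames, analyze):
    """Apply analyze(a, b) to each consecutive pair; yields len(frames) - 1 results."""
    return [analyze(a, b) for a, b in zip(frames, frames[1:])]

frames = ["f0", "f1", "f2", "f3"]
diffs = pairwise_motion(frames, lambda a, b: f"{a}->{b}")
print(diffs)  # ['f0->f1', 'f1->f2', 'f2->f3']
```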
analyze_video
Analyze motion throughout a video.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `video` | `Video` | Video object to analyze. | required |
| `sample_interval` | `int` | Analyze every Nth frame pair (1 = all frames). | `1` |
Returns:
| Type | Description |
|---|---|
| `list[MotionInfo]` | List of MotionInfo objects for sampled frame transitions. |
Source code in src/videopython/ai/understanding/motion.py
analyze_video_path
```python
analyze_video_path(
    path: str | Path, frames_per_second: float = 1.0
) -> list[tuple[float, MotionInfo]]
```
Analyze motion from a video file with minimal memory usage.
Streams frames from the video file instead of loading the entire video. Returns timestamped motion info.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `str \| Path` | Path to video file. | required |
| `frames_per_second` | `float` | How many frames per second to analyze. | `1.0` |
Returns:
| Type | Description |
|---|---|
| `list[tuple[float, MotionInfo]]` | List of (timestamp, MotionInfo) tuples. |
Source code in src/videopython/ai/understanding/motion.py
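To illustrate how `frames_per_second` relates to the returned timestamps: sampling one frame per second from a 30 fps source means taking every 30th frame, each at `frame_index / video_fps` seconds. This is only a sketch of the timestamp arithmetic, not the library's streaming implementation:

```python
def sampled_timestamps(duration: float, video_fps: float, frames_per_second: float = 1.0) -> list[float]:
    """Timestamps (seconds) of the frames analyzed at the requested sampling rate."""
    step = max(1, round(video_fps / frames_per_second))  # frame stride between samples
    total_frames = int(duration * video_fps)
    return [i / video_fps for i in range(0, total_frames, step)]

print(sampled_timestamps(duration=3.0, video_fps=30.0))  # [0.0, 1.0, 2.0]
```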
aggregate_motion
staticmethod
Aggregate motion info into scene-level statistics.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `motions` | `list[MotionInfo]` | List of MotionInfo objects from frames in a scene. | required |
Returns:
| Type | Description |
|---|---|
| `tuple[float, str]` | Tuple of (average_magnitude, dominant_motion_type). |
Source code in src/videopython/ai/understanding/motion.py
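The `(average_magnitude, dominant_motion_type)` aggregation can be pictured as a mean plus a majority vote. A sketch under that assumption, with `(magnitude, motion_type)` tuples standing in for `MotionInfo` objects (the actual method may aggregate differently):

```python
from collections import Counter

def aggregate(motions: list[tuple[float, str]]) -> tuple[float, str]:
    """Average magnitude and most common motion type over a scene."""
    magnitudes = [mag for mag, _ in motions]
    types = [mtype for _, mtype in motions]
    dominant = Counter(types).most_common(1)[0][0]
    return sum(magnitudes) / len(magnitudes), dominant

avg, dom = aggregate([(0.2, "pan"), (0.4, "pan"), (0.9, "zoom")])
print(avg, dom)  # 0.5 pan
```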
ActionRecognizer
Recognize actions and activities in video clips using VideoMAE, a masked autoencoder fine-tuned on Kinetics-400 (400 action classes like "walking", "running", "dancing", "answering questions").
```python
from videopython.ai import ActionRecognizer

recognizer = ActionRecognizer(model_size="base", confidence_threshold=0.1)

# Recognize actions in entire video
actions = recognizer.recognize_path("video.mp4", top_k=5)
for action in actions:
    print(f"{action.label}: {action.confidence:.1%}")
# Output: answering questions: 37.2%
#         using computer: 12.2%
```
ActionRecognizer
Recognizes actions/activities in video clips using VideoMAE.
VideoMAE is a masked autoencoder pre-trained on video data and fine-tuned for action recognition on Kinetics-400 (400 action classes).
Example
```python
from videopython.base import Video
from videopython.ai.understanding import ActionRecognizer

video = Video.from_path("video.mp4")
recognizer = ActionRecognizer()
actions = recognizer.recognize(video)
for action in actions:
    print(f"{action.label}: {action.confidence:.2f}")
```
Source code in src/videopython/ai/understanding/temporal.py
__init__
```python
__init__(
    model_size: MODEL_VARIANTS = "base",
    device: str | None = None,
    confidence_threshold: float = 0.1,
    num_frames: int = 16,
)
```
Initialize the action recognizer.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `model_size` | `MODEL_VARIANTS` | Model size: "base" (faster) or "large" (more accurate). | `'base'` |
| `device` | `str \| None` | Device to run on ('cuda', 'cpu', or None for auto). | `None` |
| `confidence_threshold` | `float` | Minimum confidence for reported actions. | `0.1` |
| `num_frames` | `int` | Number of frames to sample per clip (default 16 for VideoMAE). | `16` |
Source code in src/videopython/ai/understanding/temporal.py
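Since VideoMAE consumes a fixed `num_frames` per clip, a longer clip must be subsampled to that budget. A sketch of uniform index sampling (illustrative only; the library's sampling strategy may differ):

```python
def sample_indices(total_frames: int, num_frames: int = 16) -> list[int]:
    """Spread num_frames indices uniformly over [0, total_frames)."""
    if total_frames <= num_frames:
        return list(range(total_frames))  # short clip: use every frame
    step = total_frames / num_frames
    return [int(i * step) for i in range(num_frames)]

idx = sample_indices(160, 16)
print(idx[:3], idx[-1])  # [0, 10, 20] 150
```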
recognize
Recognize actions in a video.
Processes the entire video as a single clip and returns top-k predictions.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `video` | `Video` | Video object to analyze. | required |
| `top_k` | `int` | Number of top predictions to return. | `5` |
Returns:
| Type | Description |
|---|---|
| `list[DetectedAction]` | List of DetectedAction objects with recognized activities. |
Source code in src/videopython/ai/understanding/temporal.py
recognize_path
```python
recognize_path(
    path: str | Path,
    top_k: int = 5,
    start_second: float | None = None,
    end_second: float | None = None,
) -> list[DetectedAction]
```
Recognize actions from a video file with memory-efficient loading.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `str \| Path` | Path to video file. | required |
| `top_k` | `int` | Number of top predictions to return. | `5` |
| `start_second` | `float \| None` | Optional start time for analysis. | `None` |
| `end_second` | `float \| None` | Optional end time for analysis. | `None` |
Returns:
| Type | Description |
|---|---|
| `list[DetectedAction]` | List of DetectedAction objects with recognized activities. |
Source code in src/videopython/ai/understanding/temporal.py
SemanticSceneDetector
ML-based scene boundary detection using TransNetV2. More accurate than histogram-based detection, especially for gradual transitions like fades and dissolves.
```python
from videopython.ai import SemanticSceneDetector

detector = SemanticSceneDetector(threshold=0.5, min_scene_length=1.0)
scenes = detector.detect_streaming("video.mp4")
for scene in scenes:
    print(f"Scene: {scene.start:.1f}s - {scene.end:.1f}s ({scene.duration:.1f}s)")
```
SemanticSceneDetector
ML-based scene detection using TransNetV2.
TransNetV2 is a neural network specifically designed for shot boundary detection, providing more accurate scene boundaries than histogram-based methods, especially for gradual transitions.
Uses the transnetv2-pytorch package with pretrained weights.
Example
```python
from videopython.ai.understanding import SemanticSceneDetector

detector = SemanticSceneDetector()
scenes = detector.detect_streaming("video.mp4")
for scene in scenes:
    print(f"Scene: {scene.start:.2f}s - {scene.end:.2f}s")
```
Source code in src/videopython/ai/understanding/temporal.py
__init__
Initialize the semantic scene detector.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `threshold` | `float` | Confidence threshold for scene boundaries (0.0-1.0). Higher values yield fewer, more confident boundaries. | `0.5` |
| `min_scene_length` | `float` | Minimum scene duration in seconds. | `0.5` |
| `device` | `str \| None` | Device to run on ('cuda', 'mps', 'cpu', or None for auto). Note: MPS may have numerical inconsistencies; use 'cpu' for reproducible results. | `None` |
Source code in src/videopython/ai/understanding/temporal.py
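Conceptually, a shot-boundary model emits a per-frame boundary probability, and the detector thresholds those probabilities into cuts, then drops scenes shorter than `min_scene_length`. A self-contained sketch of that post-processing (not TransNetV2 itself; the helper and its exact rules are assumptions):

```python
def scenes_from_probs(probs: list[float], fps: float, threshold: float = 0.5,
                      min_scene_length: float = 0.5) -> list[tuple[float, float]]:
    """Turn per-frame boundary probabilities into (start, end) scenes in seconds."""
    cuts = [i for i, p in enumerate(probs) if p >= threshold]
    edges = [0] + cuts + [len(probs)]
    scenes = []
    for a, b in zip(edges, edges[1:]):
        start, end = a / fps, b / fps
        if end - start >= min_scene_length:  # enforce minimum scene duration
            scenes.append((start, end))
    return scenes

# One strong boundary at frame 30 of a 60-frame, 30 fps clip -> two 1 s scenes.
probs = [0.0] * 30 + [0.9] + [0.0] * 29
print(scenes_from_probs(probs, fps=30.0))  # [(0.0, 1.0), (1.0, 2.0)]
```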
detect
Detect scenes in a video using ML-based boundary detection.
Note: This method requires saving video to a temporary file for TransNetV2 processing. For better performance, use detect_streaming() with a file path directly.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `video` | `Video` | Video object to analyze. | required |
Returns:
| Type | Description |
|---|---|
| `list[SceneBoundary]` | List of SceneBoundary objects representing detected scenes. |
Source code in src/videopython/ai/understanding/temporal.py
detect_streaming
```python
detect_streaming(
    path: str | Path,
    start_second: float | None = None,
    end_second: float | None = None,
) -> list[SceneBoundary]
```
Detect scenes from a video file.
Uses TransNetV2 with pretrained weights for accurate shot boundary detection.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `str \| Path` | Path to video file. | required |
| `start_second` | `float \| None` | Optional start time for analysis (not yet supported). | `None` |
| `end_second` | `float \| None` | Optional end time for analysis (not yet supported). | `None` |
Returns:
| Type | Description |
|---|---|
| `list[SceneBoundary]` | List of SceneBoundary objects representing detected scenes. |
Source code in src/videopython/ai/understanding/temporal.py
detect_from_path
classmethod
```python
detect_from_path(
    path: str | Path,
    threshold: float = 0.5,
    min_scene_length: float = 0.5,
) -> list[SceneBoundary]
```
Convenience method for one-shot scene detection.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `str \| Path` | Path to video file. | required |
| `threshold` | `float` | Scene boundary threshold (0.0-1.0). | `0.5` |
| `min_scene_length` | `float` | Minimum scene duration in seconds. | `0.5` |
Returns:
| Type | Description |
|---|---|
| `list[SceneBoundary]` | List of SceneBoundary objects representing detected scenes. |
Source code in src/videopython/ai/understanding/temporal.py
Scene Data Classes
These classes are used by SceneDetector to represent analysis results:
SceneBoundary
SceneBoundary
dataclass
Timing information for a detected scene.
A lightweight structure representing scene boundaries detected by SceneDetector. This is a backbone type - higher-level scene analysis belongs in orchestration packages.
Attributes:
| Name | Type | Description |
|---|---|---|
| `start` | `float` | Scene start time in seconds |
| `end` | `float` | Scene end time in seconds |
| `start_frame` | `int` | Index of the first frame in this scene |
| `end_frame` | `int` | Index of the last frame in this scene (exclusive) |
Source code in src/videopython/base/description.py
to_dict
Convert to dictionary for JSON serialization.
from_dict
classmethod
Create SceneBoundary from dictionary.
Source code in src/videopython/base/description.py
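The `to_dict`/`from_dict` pair is a plain serialization round trip over the documented fields. A simplified stand-in showing the contract (field names follow the attribute table above; the implementation details are assumed):

```python
from dataclasses import dataclass, asdict

@dataclass
class SceneBoundary:  # simplified stand-in with the documented fields
    start: float
    end: float
    start_frame: int
    end_frame: int

    def to_dict(self) -> dict:
        """JSON-serializable representation."""
        return asdict(self)

    @classmethod
    def from_dict(cls, data: dict) -> "SceneBoundary":
        """Inverse of to_dict."""
        return cls(**data)

scene = SceneBoundary(start=0.0, end=4.2, start_frame=0, end_frame=126)
print(SceneBoundary.from_dict(scene.to_dict()) == scene)  # True
```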
BoundingBox
BoundingBox
dataclass
A bounding box for detected objects in an image.
Coordinates are normalized to [0, 1] range relative to image dimensions.
Attributes:
| Name | Type | Description |
|---|---|---|
| `x` | `float` | Left edge of the box (0 = left edge of image) |
| `y` | `float` | Top edge of the box (0 = top edge of image) |
| `width` | `float` | Width of the box |
| `height` | `float` | Height of the box |
Source code in src/videopython/base/description.py
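Because coordinates are normalized to [0, 1], drawing or cropping requires scaling by the image dimensions. A sketch of that conversion, with a simplified stand-in dataclass and a hypothetical `to_pixels` helper (not a library method):

```python
from dataclasses import dataclass

@dataclass
class BoundingBox:  # simplified stand-in with the documented fields
    x: float
    y: float
    width: float
    height: float

def to_pixels(box: BoundingBox, image_width: int, image_height: int) -> tuple[int, int, int, int]:
    """Convert a normalized [0, 1] box to integer pixel coordinates (left, top, w, h)."""
    return (round(box.x * image_width), round(box.y * image_height),
            round(box.width * image_width), round(box.height * image_height))

print(to_pixels(BoundingBox(0.25, 0.5, 0.5, 0.25), 640, 480))  # (160, 240, 320, 120)
```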
to_dict
from_dict
classmethod
DetectedObject
DetectedObject
dataclass
An object detected in a video frame.
Attributes:
| Name | Type | Description |
|---|---|---|
| `label` | `str` | Name/class of the detected object (e.g., "person", "car", "dog") |
| `confidence` | `float` | Detection confidence score between 0 and 1 |
| `bounding_box` | `BoundingBox \| None` | Optional bounding box location of the object |
Source code in src/videopython/base/description.py
to_dict
Convert to dictionary for JSON serialization.
from_dict
classmethod
Create DetectedObject from dictionary.
Source code in src/videopython/base/description.py
DetectedText
DetectedText
dataclass
Text detected in a video frame.
Attributes:
| Name | Type | Description |
|---|---|---|
| `text` | `str` | OCR text content |
| `confidence` | `float` | Detection confidence score between 0 and 1 |
| `bounding_box` | `BoundingBox \| None` | Optional normalized bounding box for the text region |
Source code in src/videopython/base/description.py
to_dict
Convert to dictionary for JSON serialization.
from_dict
classmethod
Create DetectedText from dictionary.
Source code in src/videopython/base/description.py
AudioEvent
AudioEvent
dataclass
A detected audio event with timestamp.
Attributes:
| Name | Type | Description |
|---|---|---|
| `start` | `float` | Start time in seconds |
| `end` | `float` | End time in seconds |
| `label` | `str` | Name of the detected sound (e.g., "Music", "Speech", "Dog bark") |
| `confidence` | `float` | Detection confidence score between 0 and 1 |
Source code in src/videopython/base/description.py
to_dict
Convert to dictionary for JSON serialization.
from_dict
classmethod
Create AudioEvent from dictionary.
AudioClassification
AudioClassification
dataclass
Complete audio classification results.
Attributes:
| Name | Type | Description |
|---|---|---|
| `events` | `list[AudioEvent]` | List of detected audio events with timestamps |
| `clip_predictions` | `dict[str, float]` | Overall class probabilities for the entire audio clip |
Source code in src/videopython/base/description.py
to_dict
Convert to dictionary for JSON serialization.
from_dict
classmethod
Create AudioClassification from dictionary.
Source code in src/videopython/base/description.py
MotionInfo
MotionInfo
dataclass
Motion characteristics between consecutive frames.
Attributes:
| Name | Type | Description |
|---|---|---|
| `motion_type` | `str` | Classification of camera/scene motion: "static" (no significant motion), "pan" (horizontal camera movement), "tilt" (vertical camera movement), "zoom" (camera zoom in/out), or "complex" (mixed or irregular motion) |
| `magnitude` | `float` | Normalized motion magnitude (0.0 = no motion, 1.0 = high motion) |
| `raw_magnitude` | `float` | Raw optical flow magnitude (pixels/frame) |
Source code in src/videopython/base/description.py
to_dict
Convert to dictionary for JSON serialization.
from_dict
classmethod
Create MotionInfo from dictionary.
DetectedAction
DetectedAction
dataclass
An action/activity detected in a video segment.
Attributes:
| Name | Type | Description |
|---|---|---|
| `label` | `str` | Name of the detected action (e.g., "walking", "running", "dancing") |
| `confidence` | `float` | Detection confidence score between 0 and 1 |
| `start_frame` | `int \| None` | Start frame index of the action |
| `end_frame` | `int \| None` | End frame index of the action (exclusive) |
| `start_time` | `float \| None` | Start time in seconds |
| `end_time` | `float \| None` | End time in seconds |
Source code in src/videopython/base/description.py
to_dict
Convert to dictionary for JSON serialization.
Source code in src/videopython/base/description.py
from_dict
classmethod
Create DetectedAction from dictionary.
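DetectedAction carries both frame indices and times; the two are related by the source frame rate (with `end_frame` exclusive). A sketch of that conversion (the helper is hypothetical, not a library function):

```python
def frames_to_seconds(start_frame: int, end_frame: int, fps: float) -> tuple[float, float]:
    """Derive (start_time, end_time) in seconds from frame indices; end_frame is exclusive."""
    return start_frame / fps, end_frame / fps

print(frames_to_seconds(30, 90, 30.0))  # (1.0, 3.0)
```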