Backends

scenedetect.backends Module

This module contains VideoStream implementations backed by various Python multimedia libraries. In addition to creating backend objects directly, scenedetect.open_video() can be used to open a video with a specified backend, falling back to OpenCV if not available.

All backends available on the current system can be found via AVAILABLE_BACKENDS.

If you already have a cv2.VideoCapture object you want to use for scene detection, you can use a VideoCaptureAdapter instead of a backend. This is useful when working with devices or streams, for example.

Video Files

Assuming we have a file video.mp4 in our working directory, we can load it and perform scene detection on it using open_video():

from scenedetect import open_video
video = open_video('video.mp4')

An optional backend from AVAILABLE_BACKENDS can be passed to open_video() (e.g. backend='opencv'). Additional keyword arguments passed to open_video() will be forwarded to the backend constructor. If the specified backend is unavailable, or loading the video fails, opencv will be tried as a fallback.

Lastly, to use a specific backend directly:

# Manually importing and constructing a backend:
from scenedetect.backends.opencv import VideoStreamCv2
video = VideoStreamCv2('video.mp4')

In both examples above, the resulting video can be used with SceneManager.detect_scenes().
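If you want to know in advance which backend will be requested, the fallback idea can be approximated in user code. The following is a minimal sketch, not the library's own implementation: choose_backend and open_with_preferred are hypothetical helpers, and 'pyav' is only an example preference.

```python
def choose_backend(preferred, available, fallback="opencv"):
    """Return `preferred` if it is a key of `available`, else `fallback`."""
    return preferred if preferred in available else fallback


def open_with_preferred(path, preferred="pyav"):
    """Open `path` with the preferred backend when it is installed."""
    # Imported lazily so choose_backend() can be used without PySceneDetect.
    from scenedetect import open_video
    from scenedetect.backends import AVAILABLE_BACKENDS

    backend = choose_backend(preferred, AVAILABLE_BACKENDS)
    return open_video(path, backend=backend)
```

Calling open_with_preferred('video.mp4') would then request pyav only when it appears in AVAILABLE_BACKENDS, and opencv otherwise.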

Devices / Cameras / Pipes

You can use an existing cv2.VideoCapture object with the PySceneDetect API using a VideoCaptureAdapter. For example, to use a SceneManager with a webcam device:

import cv2
from scenedetect import SceneManager, ContentDetector
from scenedetect.backends import VideoCaptureAdapter
# Open device ID 2.
cap = cv2.VideoCapture(2)
video = VideoCaptureAdapter(cap)
total_frames = 1000
scene_manager = SceneManager()
scene_manager.add_detector(ContentDetector())
scene_manager.detect_scenes(video=video, duration=total_frames)

When working with live inputs, note that you can pass a callback to detect_scenes() to be called on every scene detection event. See the SceneManager examples for details.
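As a rough illustration of the callback mechanism, the sketch below records the frame number of each detection event. It assumes the callback is invoked with the frame image and the frame number; on_new_scene, detect_live, and detected_frames are hypothetical names introduced here.

```python
detected_frames = []


def on_new_scene(frame_img, frame_num):
    """Record the frame number at which each new scene was detected."""
    detected_frames.append(frame_num)


def detect_live(video, max_frames=1000):
    """Run content detection on `video`, reporting cuts as they are found."""
    # Imported lazily so the callback can be exercised on its own.
    from scenedetect import SceneManager, ContentDetector

    scene_manager = SceneManager()
    scene_manager.add_detector(ContentDetector())
    scene_manager.detect_scenes(
        video=video, duration=max_frames, callback=on_new_scene)
    return detected_frames
```

Here video would be a VideoCaptureAdapter wrapping an opened cv2.VideoCapture, as in the example above.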

scenedetect.backends.AVAILABLE_BACKENDS: Dict[str, Type] = {'opencv': <class 'scenedetect.backends.opencv.VideoStreamCv2'>, 'pyav': <class 'scenedetect.backends.pyav.VideoStreamAv'>}

All available backends that scenedetect.open_video() can consider for the backend parameter. These backends must support construction with the following signature:

BackendType(path: str, framerate: Optional[float])

VideoStreamCv2 is backed by the OpenCV VideoCapture object. This is the default backend. Works with video files, image sequences, and network streams/URLs.

For wrapping input devices or pipes, there is also VideoCaptureAdapter which can be constructed from an existing cv2.VideoCapture. This allows performing scene detection on inputs which do not support seeking.

class scenedetect.backends.opencv.VideoCaptureAdapter(cap, framerate=None, max_read_attempts=5)

Adapter for existing VideoCapture objects. Unlike VideoStreamCv2, this class supports VideoCaptures which may not support seeking.

Create from an existing OpenCV VideoCapture object. Used for webcams, live streams, pipes, or other inputs which may not support seeking.

Parameters:
  • cap (VideoCapture) – The cv2.VideoCapture object to wrap. Must already be opened and ready to have cap.read() called on it.

  • framerate (float | None) – If set, overrides the detected framerate.

  • max_read_attempts (int) – Number of attempts to continue decoding the video after a frame fails to decode. This allows processing videos that have a few corrupted frames or metadata (in which case accuracy of detection algorithms may be lower). Once this limit is passed, decoding will stop and emit an error.

Raises:

ValueError – capture is not open, framerate or max_read_attempts is invalid

read(decode=True, advance=True)

Read and decode the next frame as a np.ndarray. Returns False when video ends, or the maximum number of decode attempts has passed.

Parameters:
  • decode (bool) – Decode and return the frame.

  • advance (bool) – Seek to the next frame. If False, will return the current (last) frame.

Returns:

If decode = True, the decoded frame (np.ndarray), or False (bool) if end of video. If decode = False, a bool indicating if advancing to the next frame succeeded.

Return type:

ndarray | bool

reset()

Not supported.

seek(target)

The underlying VideoCapture is assumed to not support seeking.

Parameters:

target (FrameTimecode | float | int) –

BACKEND_NAME = 'opencv_adapter'

Unique name used to identify this backend.

property aspect_ratio: float

Display/pixel aspect ratio as a float (1.0 represents square pixels).

property capture: VideoCapture

Returns reference to underlying VideoCapture object. Use with caution.

Prefer to use this property only to take ownership of the underlying cv2.VideoCapture object backing this object. Using the read/grab methods through this property is unsupported and will leave this object in an inconsistent state.

property duration: FrameTimecode | None

Duration of the stream as a FrameTimecode, or None if non-terminating.

property frame_number: int

Current position within stream in frames as an int.

1 indicates the first frame was just decoded by the last call to read with advance=True, whereas 0 indicates that no frames have been read.

This property will always return 0 if no frames have been read.

property frame_rate: float

Framerate in frames/sec.

property frame_size: Tuple[int, int]

Reported size of each video frame in pixels as a tuple of (width, height).

property is_seekable: bool

Always False, as the underlying VideoCapture is assumed to not support seeking.

property name: str

Always 'CAP_ADAPTER'.

property path: str

Always 'CAP_ADAPTER'.

property position: FrameTimecode

Current position within stream as FrameTimecode. Use position_ms if an accurate duration of elapsed time is required, as position is currently based on the number of frames, and may not be accurate for devices or live streams.

This property will always return 0 (i.e. be equal to base_timecode) if no frames have been read.

property position_ms: float

Current position within stream as a float of the presentation time in milliseconds. The first frame has a time of 0.0 ms.

This property will always return 0.0 if no frames have been read.

class scenedetect.backends.opencv.VideoStreamCv2(path=None, framerate=None, max_decode_attempts=5, path_or_device=None)

OpenCV cv2.VideoCapture backend.

Open a video file, image sequence, or network stream.

Parameters:
  • path (AnyStr) – Path to the video. Can be a file, image sequence ('folder/DSC_%04d.jpg'), or network stream.

  • framerate (float | None) – If set, overrides the detected framerate.

  • max_decode_attempts (int) – Number of attempts to continue decoding the video after a frame fails to decode. This allows processing videos that have a few corrupted frames or metadata (in which case accuracy of detection algorithms may be lower). Once this limit is passed, decoding will stop and emit an error.

  • path_or_device (bytes | str | int) – [DEPRECATED] Specify path for files, image sequences, or network streams/URLs. Use VideoCaptureAdapter for devices/pipes.

Raises:
  • OSError – file could not be found or access was denied

  • VideoOpenFailure – video could not be opened (may be corrupted)

  • ValueError – specified framerate is invalid

read(decode=True, advance=True)

Read and decode the next frame as a np.ndarray. Returns False when video ends, or the maximum number of decode attempts has passed.

Parameters:
  • decode (bool) – Decode and return the frame.

  • advance (bool) – Seek to the next frame. If False, will return the current (last) frame.

Returns:

If decode = True, the decoded frame (np.ndarray), or False (bool) if end of video. If decode = False, a bool indicating if advancing to the next frame succeeded.

Return type:

ndarray | bool

reset()

Close and re-open the VideoStream (should be equivalent to calling seek(0)).

seek(target)

Seek to the given timecode. If given as a frame number, represents the current seek pointer (e.g. if seeking to 0, the next frame decoded will be the first frame of the video).

For 1-based indices (first frame is frame #1), the target frame number needs to be converted to 0-based by subtracting one. For example, if we want to seek to the first frame, we call seek(0) followed by read(). If we want to seek to the 5th frame, we call seek(4) followed by read(), at which point frame_number will be 5.

Not supported if the VideoStream is a device/camera. Untested with web streams.

Parameters:

target (FrameTimecode | float | int) – Target position in video stream to seek to. If float, interpreted as time in seconds. If int, interpreted as frame number.

Raises:
  • SeekError – An error occurs while seeking, or seeking is not supported.

  • ValueError – target is not a valid value (i.e. it is negative).
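The 1-based to 0-based conversion described for seek() can be captured in a small helper. This is only a sketch: seek_target_for_frame and read_nth_frame are hypothetical names, and video is assumed to be an already opened, seekable stream.

```python
def seek_target_for_frame(frame_index_1_based):
    """Convert a 1-based frame index into the 0-based value seek() expects."""
    if frame_index_1_based < 1:
        raise ValueError("1-based frame indices start at 1")
    return frame_index_1_based - 1


def read_nth_frame(video, n):
    """Seek so that the next read() returns the n-th frame (1-based)."""
    if not video.is_seekable:
        raise RuntimeError("this stream does not support seeking")
    video.seek(seek_target_for_frame(n))
    frame = video.read()
    # After this read, video.frame_number is expected to equal n.
    return frame
```

For example, read_nth_frame(video, 5) calls seek(4) followed by read(), matching the description above.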

BACKEND_NAME = 'opencv'

Unique name used to identify this backend.

property aspect_ratio: float

Display/pixel aspect ratio as a float (1.0 represents square pixels).

property capture: VideoCapture

Returns reference to underlying VideoCapture object. Use with caution.

Prefer to use this property only to take ownership of the underlying cv2.VideoCapture object backing this object. Seeking or using the read/grab methods through this property is unsupported and will leave this object in an inconsistent state.

property duration: FrameTimecode | None

Duration of the stream as a FrameTimecode, or None if non-terminating.

property frame_number: int

Current position within stream in frames as an int.

1 indicates the first frame was just decoded by the last call to read with advance=True, whereas 0 indicates that no frames have been read.

This property will always return 0 if no frames have been read.

property frame_rate: float

Framerate in frames/sec.

property frame_size: Tuple[int, int]

Size of each video frame in pixels as a tuple of (width, height).

property is_seekable: bool

True if seek() is allowed, False otherwise.

Always False if opening a device/webcam.

property name: str

Name of the video, without extension, or device.

property path: bytes | str

Video or device path.

property position: FrameTimecode

Current position within stream as FrameTimecode.

This can be interpreted as the presentation timestamp of the last frame which was decoded by calling read with advance=True.

This property will always return 0 (i.e. be equal to base_timecode) if no frames have been read.

property position_ms: float

Current position within stream as a float of the presentation time in milliseconds. The first frame has a time of 0.0 ms.

This property will always return 0.0 if no frames have been read.

VideoStreamAv provides an adapter for the PyAV av.InputContainer object.

class scenedetect.backends.pyav.VideoStreamAv(path_or_io, framerate=None, name=None, threading_mode=None, suppress_output=False)

PyAV av.InputContainer backend.

Open a video by path.

Warning

Using threading_mode with suppress_output = True can cause lockups in your application. See the PyAV documentation for details: https://pyav.org/docs/stable/overview/caveats.html#sub-interpeters

Parameters:
  • path_or_io (AnyStr | BinaryIO) – Path to the video, or a file-like object.

  • framerate (float | None) – If set, overrides the detected framerate.

  • name (str | None) – Overrides the name property derived from the video path. Should be set if path_or_io is a file-like object.

  • threading_mode (str | None) – The PyAV video stream thread_type. See av.codec.context.ThreadType for valid threading modes ('AUTO', 'FRAME', 'NONE', and 'SLICE'). If this mode is 'AUTO' or 'FRAME' and not all frames have been decoded, the video will be reopened if seekable, and the remaining frames decoded in single-threaded mode.

  • suppress_output (bool) – If False, ffmpeg output will be sent to stdout/stderr by calling av.logging.restore_default_callback() before any other library calls. If True the application may deadlock if threading_mode is set. See the PyAV documentation for details: https://pyav.org/docs/stable/overview/caveats.html#sub-interpeters

Raises:
  • OSError – file could not be found or access was denied

  • VideoOpenFailure – video could not be opened (may be corrupted)

  • ValueError – specified framerate is invalid

read(decode=True, advance=True)

Read and decode the next frame as a np.ndarray. Returns False when video ends.

Parameters:
  • decode (bool) – Decode and return the frame.

  • advance (bool) – Seek to the next frame. If False, will return the current (last) frame.

Returns:

If decode = True, the decoded frame (np.ndarray), or False (bool) if end of video. If decode = False, a bool indicating if advancing to the next frame succeeded.

Return type:

ndarray | bool

reset()

Close and re-open the VideoStream (should be equivalent to calling seek(0)).

seek(target)

Seek to the given timecode. If given as a frame number, represents the current seek pointer (e.g. if seeking to 0, the next frame decoded will be the first frame of the video).

For 1-based indices (first frame is frame #1), the target frame number needs to be converted to 0-based by subtracting one. For example, if we want to seek to the first frame, we call seek(0) followed by read(). If we want to seek to the 5th frame, we call seek(4) followed by read(), at which point frame_number will be 5.

May not be supported on all input codecs (see is_seekable).

Parameters:

target (FrameTimecode | float | int) – Target position in video stream to seek to. If float, interpreted as time in seconds. If int, interpreted as frame number.

Raises:

ValueError – target is not a valid value (i.e. it is negative).

Return type:

None

BACKEND_NAME = 'pyav'

Unique name used to identify this backend.

property aspect_ratio: float

Pixel aspect ratio as a float (1.0 represents square pixels).

property duration: FrameTimecode

Duration of the video as a FrameTimecode.

property frame_number: int

Current position within stream as the frame number.

Will return 0 until the first frame is read.

property frame_rate: float

Frame rate in frames/sec.

property frame_size: Tuple[int, int]

Size of each video frame in pixels as a tuple of (width, height).

property is_seekable: bool

True if seek() is allowed, False otherwise.

property name: bytes | str

Name of the video, without extension.

property path: bytes | str

Video path.

property position: FrameTimecode

Current position within stream as FrameTimecode.

This can be interpreted as presentation time stamp, thus frame 1 corresponds to the presentation time 0. Returns 0 even if frame_number is 1.

property position_ms: float

Current position within stream as a float of the presentation time in milliseconds. The first frame has a PTS of 0.
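The relationship between frame_number and presentation time can be illustrated with simple arithmetic. The helper below is a hypothetical sketch that assumes a constant frame rate; a real backend may report timestamps taken from the container instead, so treat the result as an approximation.

```python
def frame_to_pts_ms(frame_number, frame_rate):
    """Approximate presentation time in ms of a 1-based frame number."""
    if frame_number < 1:
        return 0.0  # no frames have been read yet
    # Frame 1 is presented at time 0, so subtract one before scaling.
    return (frame_number - 1) * 1000.0 / frame_rate
```

With a frame rate of 25.0, frame 1 maps to 0.0 ms and frame 26 maps to 1000.0 ms.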