Backends

scenedetect.backends Module

This module contains concrete VideoStream implementations. In addition to creating backend objects directly, scenedetect.open_video() can be used to open a video with a specified backend, falling back to OpenCV if not available. All backends available on the current system can be found via AVAILABLE_BACKENDS.

Usage Example

Assuming we have a file video.mp4 in our working directory, we can load it and iterate through all of the frames:

from scenedetect import open_video
video = open_video('video.mp4')
while True:
    frame = video.read()
    if frame is False:
        break
print("Read %d frames" % video.frame_number)

If we want to use a specific backend from AVAILABLE_BACKENDS, we can pass it to open_video():

# Specifying a backend via `open_video`:
from scenedetect import open_video
video = open_video('video.mp4', backend='opencv')

If the specified backend is not available, OpenCV will be used as a fallback. Other keyword arguments passed to open_video() will be forwarded to the specified backend. Lastly, we can import and use a specific backend directly:

# Manually importing and constructing a backend:
from scenedetect.backends.opencv import VideoStreamCv2
video = VideoStreamCv2('video.mp4')

The opencv backend (VideoStreamCv2) is guaranteed to be available.

scenedetect.backends.AVAILABLE_BACKENDS: Dict[str, Type] = {'opencv': <class 'scenedetect.backends.opencv.VideoStreamCv2'>, 'pyav': <class 'scenedetect.backends.pyav.VideoStreamAv'>}

All available backends that scenedetect.open_video() can consider for the backend parameter. These backends must support construction with the following signature:

BackendType(path: str, framerate: Optional[float])
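The lookup-with-fallback behaviour described above can be sketched without a real video file. This is a minimal illustration under stated assumptions, not the library's implementation: `FakeBackend`, `FakeOpenCv`, `FakePyAv`, `BACKENDS`, and `pick_backend` are hypothetical names. Only the constructor signature `(path, framerate)` and the fallback to `'opencv'` are taken from the text.

```python
from typing import Dict, Optional, Type


class FakeBackend:
    """Hypothetical stand-in for a backend class (not part of scenedetect)."""
    BACKEND_NAME = 'fake'

    def __init__(self, path: str, framerate: Optional[float] = None):
        self.path = path
        self.framerate = framerate


class FakeOpenCv(FakeBackend):
    BACKEND_NAME = 'opencv'


class FakePyAv(FakeBackend):
    BACKEND_NAME = 'pyav'


# Hypothetical registry with the same shape as AVAILABLE_BACKENDS:
# a mapping from string identifier to backend type.
BACKENDS: Dict[str, Type[FakeBackend]] = {
    'opencv': FakeOpenCv,
    'pyav': FakePyAv,
}


def pick_backend(name: str, path: str, framerate: Optional[float] = None) -> FakeBackend:
    """Construct the named backend, or the 'opencv' one if the name is unknown."""
    backend_type = BACKENDS.get(name, BACKENDS['opencv'])
    return backend_type(path, framerate)


# An unknown identifier falls back to the 'opencv' entry.
video = pick_backend('not-installed', 'video.mp4')
print(video.BACKEND_NAME)
```

Because every registered type accepts the same two arguments, the caller does not need to know which backend was actually constructed.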

OpenCV

VideoStreamCv2 provides an adapter for the OpenCV cv2.VideoCapture object.

Uses string identifier 'opencv'.

class scenedetect.backends.opencv.VideoStreamCv2(path_or_device, framerate=None, max_decode_attempts=5)

OpenCV cv2.VideoCapture backend.

Open a video or device.

Parameters
  • path_or_device (Union[bytes, str, int]) – Path to video, or device ID as integer.

  • framerate (Optional[float]) – If set, overrides the detected framerate.

  • max_decode_attempts (int) – Number of attempts to continue decoding the video after a frame fails to decode. This allows processing videos that have a few corrupted frames or metadata (in which case accuracy of detection algorithms may be lower). Once this limit is passed, decoding will stop and emit an error.

Raises
  • OSError – file could not be found or access was denied

  • VideoOpenFailure – video could not be opened (may be corrupted)

  • ValueError – specified framerate is invalid

BACKEND_NAME = 'opencv'

Unique name used to identify this backend.

property aspect_ratio: float

Display/pixel aspect ratio as a float (1.0 represents square pixels).

property capture: cv2.VideoCapture

Returns reference to underlying VideoCapture object. Use with caution.

Prefer to use this property only to take ownership of the underlying cv2.VideoCapture object backing this object. Seeking or calling the read/grab methods through this property is unsupported and will leave this object in an inconsistent state.

property duration: Optional[scenedetect.frame_timecode.FrameTimecode]

Duration of the stream as a FrameTimecode, or None if non-terminating.

property frame_number: int

Current position within stream in frames as an int.

1 indicates the first frame was just decoded by the last call to read with advance=True, whereas 0 indicates that no frames have been read.

This property always returns 0 if no frames have been read.

property frame_rate: float

Framerate in frames/sec.

property frame_size: Tuple[int, int]

Size of each video frame in pixels as a tuple of (width, height).
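As a worked example of how frame_size and aspect_ratio combine, the sketch below computes a display size for non-square pixels. It assumes aspect_ratio is a pixel aspect ratio (pixel width divided by pixel height) and that only the width is scaled. `display_size` is a hypothetical helper, not part of scenedetect.

```python
from typing import Tuple


def display_size(frame_size: Tuple[int, int], aspect_ratio: float) -> Tuple[int, int]:
    """Scale the stored width by the pixel aspect ratio; height is unchanged."""
    width, height = frame_size
    return (round(width * aspect_ratio), height)


# Square pixels (aspect_ratio == 1.0) leave the frame size untouched.
print(display_size((1920, 1080), 1.0))

# Anamorphic example: 720x480 stored frames with a 32:27 pixel aspect ratio.
print(display_size((720, 480), 32 / 27))
```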

property is_seekable: bool

True if seek() is allowed, False otherwise.

Always False if opening a device/webcam.

property name: Union[bytes, str]

Name of the video, without extension, or device.

property path: Union[bytes, str]

Video or device path.

property position: scenedetect.frame_timecode.FrameTimecode

Current position within stream as FrameTimecode.

This can be interpreted as the presentation timestamp of the last frame that was decoded by calling read with advance=True.

This property always returns 0 (i.e. is equal to base_timecode) if no frames have been read.

property position_ms: float

Current position within stream as a float of the presentation time in milliseconds. The first frame has a time of 0.0 ms.

This property always returns 0.0 if no frames have been read.
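The relationship between frame_number and position_ms can be illustrated with a little arithmetic. The sketch below assumes a constant frame rate, which real videos do not always have, so treat it as an approximation of what the property reports. `expected_position_ms` is a hypothetical helper.

```python
def expected_position_ms(frame_number: int, frame_rate: float) -> float:
    """Presentation time in ms of the last decoded frame, assuming constant frame rate."""
    if frame_number <= 0:
        # No frames have been read yet.
        return 0.0
    # Frame numbers are 1-based, but the first frame is presented at 0.0 ms.
    return (frame_number - 1) * 1000.0 / frame_rate


print(expected_position_ms(0, 25.0))   # nothing read yet
print(expected_position_ms(1, 25.0))   # first frame
print(expected_position_ms(26, 25.0))  # one second into a 25 fps video
```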

read(decode=True, advance=True)

Return next frame (or current if advance = False), or False if end of video.

Parameters
  • decode (bool) – Decode and return the frame.

  • advance (bool) – Seek to the next frame. If False, will remain on the current frame.

Returns

If decode = True, returns either the decoded frame, or False if end of video. If decode = False, returns a boolean indicating whether the stream advanced to the next frame.

Return type

Union[ndarray, bool]
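To illustrate the decode = False case, the sketch below counts frames without decoding them. `StubStream` is a hypothetical stand-in that mimics only the read() signature and frame_number, so the example runs without a video file. `count_frames` is likewise a hypothetical helper that should work with any object following the same read() contract.

```python
class StubStream:
    """Hypothetical stand-in for a VideoStream with a fixed number of frames."""

    def __init__(self, total_frames: int):
        self._total_frames = total_frames
        self.frame_number = 0

    def read(self, decode: bool = True, advance: bool = True):
        if advance:
            if self.frame_number >= self._total_frames:
                return False  # End of video.
            self.frame_number += 1
        # A real backend would return an ndarray here when decode=True.
        return 'frame-%d' % self.frame_number if decode else True


def count_frames(stream) -> int:
    """Advance to the end of the stream without decoding any frames."""
    count = 0
    # With decode=False the return value is a plain bool, so it is safe to
    # use directly as the loop condition.
    while stream.read(decode=False):
        count += 1
    return count


print(count_frames(StubStream(total_frames=3)))
```

When decode = True the result may be an ndarray, whose truth value is ambiguous, which is why the usage example at the top of this page compares against False with `is`.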

reset()

Close and re-open the VideoStream (should be equivalent to calling seek(0)).

seek(target)

Seek to the given timecode. If given as a frame number, represents the current seek pointer (e.g. if seeking to 0, the next frame decoded will be the first frame of the video).

For 1-based indices (first frame is frame #1), the target frame number needs to be converted to 0-based by subtracting one. For example, if we want to seek to the first frame, we call seek(0) followed by read(). If we want to seek to the 5th frame, we call seek(4) followed by read(), at which point frame_number will be 5.

Not supported if the VideoStream is a device/camera. Untested with web streams.

Parameters

target (Union[FrameTimecode, float, int]) – Target position in video stream to seek to. If float, interpreted as time in seconds. If int, interpreted as frame number.

Raises
  • SeekError – An error occurs while seeking, or seeking is not supported.

  • ValueError – target is not a valid value (i.e. it is negative).
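The 1-based to 0-based conversion described for seek() can be wrapped in a small helper. This is a sketch only: `read_frame` is a hypothetical function, and `StubSeekableStream` is a hypothetical stand-in that models just the seek pointer so the example runs without a video file.

```python
class StubSeekableStream:
    """Hypothetical stand-in that models only the seek pointer."""

    def __init__(self, total_frames: int):
        self._total_frames = total_frames
        self.frame_number = 0

    def seek(self, target: int) -> None:
        if target < 0:
            raise ValueError('target must be non-negative')
        self.frame_number = target

    def read(self):
        if self.frame_number >= self._total_frames:
            return False  # End of video.
        self.frame_number += 1
        return self.frame_number  # Stand-in for the decoded frame data.


def read_frame(stream, frame_number: int):
    """Return the frame with the given 1-based number (the first frame is #1)."""
    if frame_number < 1:
        raise ValueError('frame_number is 1-based and must be >= 1')
    # seek() takes a 0-based pointer, so subtract one before seeking.
    stream.seek(frame_number - 1)
    return stream.read()


stream = StubSeekableStream(total_frames=10)
read_frame(stream, 5)
print(stream.frame_number)
```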

scenedetect.backends.opencv.get_aspect_ratio(cap, epsilon=0.0001)

Display/pixel aspect ratio of the VideoCapture as a float (1.0 represents square pixels).

Parameters
  • cap (VideoCapture) –

  • epsilon (float) –

Return type

float

PyAV

VideoStreamAv provides an adapter for the PyAV av.InputContainer object.

Uses string identifier 'pyav'.

class scenedetect.backends.pyav.VideoStreamAv(path_or_io, framerate=None, name=None, threading_mode=None, suppress_output=False)

PyAV av.InputContainer backend.

Open a video by path or from a file-like object.

Warning

Using threading_mode with suppress_output = True can cause lockups in your application. See the PyAV documentation for details: https://pyav.org/docs/stable/overview/caveats.html#sub-interpeters

Parameters
  • path_or_io (Union[AnyStr, BinaryIO]) – Path to the video, or a file-like object.

  • framerate (Optional[float]) – If set, overrides the detected framerate.

  • name (Optional[str]) – Overrides the name property derived from the video path. Should be set if path_or_io is a file-like object.

  • threading_mode (Optional[str]) – The PyAV video stream thread_type. See av.codec.context.ThreadType for valid threading modes ('AUTO', 'FRAME', 'NONE', and 'SLICE'). If this mode is 'AUTO' or 'FRAME' and not all frames have been decoded, the video will be reopened if seekable, and the remaining frames decoded in single-threaded mode.

  • suppress_output (bool) – If False, ffmpeg output will be sent to stdout/stderr by calling av.logging.restore_default_callback() before any other library calls. If True the application may deadlock if threading_mode is set. See the PyAV documentation for details: https://pyav.org/docs/stable/overview/caveats.html#sub-interpeters

Raises
  • OSError – file could not be found or access was denied

  • VideoOpenFailure – video could not be opened (may be corrupted)

  • ValueError – specified framerate is invalid
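The warning about combining threading_mode with suppress_output can be turned into a check that runs before a stream is constructed. The sketch below is an assumption-laden illustration: `has_deadlock_risk` and `VALID_THREADING_MODES` are hypothetical names, and treating any non-None threading_mode as "set" is one reading of the parameter descriptions above.

```python
from typing import Optional

# Mode names as listed for the threading_mode parameter.
VALID_THREADING_MODES = ('AUTO', 'FRAME', 'NONE', 'SLICE')


def has_deadlock_risk(threading_mode: Optional[str], suppress_output: bool) -> bool:
    """Return True for the combination the warning above describes."""
    if threading_mode is not None and threading_mode not in VALID_THREADING_MODES:
        raise ValueError('unknown threading mode: %r' % (threading_mode,))
    # Assumption: the risk applies whenever a threading mode is given
    # together with suppress_output=True.
    return suppress_output and threading_mode is not None


print(has_deadlock_risk('AUTO', suppress_output=True))
print(has_deadlock_risk(None, suppress_output=True))
```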

BACKEND_NAME = 'pyav'

Unique name used to identify this backend.

property aspect_ratio: float

Pixel aspect ratio as a float (1.0 represents square pixels).

property duration: scenedetect.frame_timecode.FrameTimecode

Duration of the video as a FrameTimecode.

property frame_number: int

Current position within stream as the frame number.

Will return 0 until the first frame is read.

property frame_rate: float

Frame rate in frames/sec.

property frame_size: Tuple[int, int]

Size of each video frame in pixels as a tuple of (width, height).

property is_seekable: bool

True if seek() is allowed, False otherwise.

property name: Union[bytes, str]

Name of the video, without extension.

property path: Union[bytes, str]

Video path.

property position: scenedetect.frame_timecode.FrameTimecode

Current position within stream as FrameTimecode.

This can be interpreted as presentation time stamp, thus frame 1 corresponds to the presentation time 0. Returns 0 even if frame_number is 1.

property position_ms: float

Current position within stream as a float of the presentation time in milliseconds. The first frame has a PTS of 0.

read(decode=True, advance=True)

Return next frame (or current if advance = False), or False if end of video.

Parameters
  • decode (bool) – Decode and return the frame.

  • advance (bool) – Seek to the next frame. If False, will remain on the current frame.

Returns

If decode = True, returns either the decoded frame, or False if end of video. If decode = False, returns a boolean indicating whether the stream advanced to the next frame.

Return type

Union[ndarray, bool]

reset()

Close and re-open the VideoStream (should be equivalent to calling seek(0)).

seek(target)

Seek to the given timecode. If given as a frame number, represents the current seek pointer (e.g. if seeking to 0, the next frame decoded will be the first frame of the video).

For 1-based indices (first frame is frame #1), the target frame number needs to be converted to 0-based by subtracting one. For example, if we want to seek to the first frame, we call seek(0) followed by read(). If we want to seek to the 5th frame, we call seek(4) followed by read(), at which point frame_number will be 5.

May not be supported on all input codecs (see is_seekable).

Parameters

target (Union[FrameTimecode, float, int]) – Target position in video stream to seek to. If float, interpreted as time in seconds. If int, interpreted as frame number.

Raises

ValueError – target is not a valid value (i.e. it is negative).

Return type

None