Detection Algorithms¶

scenedetect.detectors Module

This module contains the following scene detection algorithms:

  • ContentDetector: Detects shot changes using weighted average of pixel changes in the HSV colorspace.

  • ThresholdDetector: Detects slow transitions using average pixel intensity in RGB (fade in/fade out).

  • AdaptiveDetector: Performs a rolling average on differences in the HSV colorspace. In some cases, this can improve handling of fast motion.

  • HistogramDetector: Uses histogram differences for the Y channel in YUV space to find fast cuts.

  • HashDetector: Uses perceptual hashing to calculate similarity between adjacent frames.

Detection algorithms are created by implementing the SceneDetector interface. Detectors are typically attached to a SceneManager when processing videos; however, they can also be used to process frames directly.
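Example (a minimal sketch of the typical workflow, assuming the v0.6+ API with open_video and SceneManager; the video path is a placeholder):

from scenedetect import SceneManager, open_video
from scenedetect.detectors import ContentDetector

video = open_video("video.mp4")  # placeholder path
scene_manager = SceneManager()
scene_manager.add_detector(ContentDetector())
scene_manager.detect_scenes(video)
for start, end in scene_manager.get_scene_list():
    print(f"Scene from {start.get_timecode()} to {end.get_timecode()}")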

AdaptiveDetector compares the difference in content between adjacent frames, similar to ContentDetector, except that the threshold isn't fixed: it is a rolling average of adjacent frame changes. This can help mitigate false detections in situations such as fast camera motion.

This detector is available from the command-line as the detect-adaptive command.

class scenedetect.detectors.adaptive_detector.AdaptiveDetector(adaptive_threshold=3.0, min_scene_len=15, window_width=2, min_content_val=15.0, weights=Components(delta_hue=1.0, delta_sat=1.0, delta_lum=1.0, delta_edges=0.0), luma_only=False, kernel_size=None, video_manager=None, min_delta_hsv=None)¶

Two-pass detector that calculates frame scores with ContentDetector, and then applies a rolling average when processing the result that can help mitigate false detections in situations such as camera movement.

Parameters:
  • adaptive_threshold (float) – Threshold that the score ratio must exceed to trigger a new scene (see the frame metric adaptive_ratio in the stats file).

  • min_scene_len (int) – Once a cut is detected, this many frames must pass before a new one can be added to the scene list. Can be an int or FrameTimecode type.

  • window_width (int) – Size of window (number of frames) before and after each frame to average together in order to detect deviations from the mean. Must be at least 1.

  • min_content_val (float) – Minimum threshold that content_val must exceed in order to register as a new scene. This is calculated the same way that detect-content calculates frame score, based on weights/luma_only/kernel_size.

  • weights (Components) – Weight to place on each component when calculating frame score (content_val in a statsfile, the value threshold is compared against). If omitted, the default ContentDetector weights are used.

  • luma_only (bool) – If True, only considers changes in the luminance channel of the video. Equivalent to specifying weights as ContentDetector.LUMA_ONLY. Overrides weights if both are set.

  • kernel_size (int | None) – Size of kernel to use for post edge detection filtering. If None, automatically set based on video resolution.

  • video_manager – [DEPRECATED] DO NOT USE. For backwards compatibility only.

  • min_delta_hsv (float | None) – [DEPRECATED] DO NOT USE. Use min_content_val instead.
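Example (a construction sketch using the parameters above; the values shown are the signature defaults, not tuning recommendations):

from scenedetect.detectors import AdaptiveDetector

# A wider window smooths the rolling average; a higher adaptive_threshold
# requires a larger spike relative to neighboring frames before a cut fires.
detector = AdaptiveDetector(
    adaptive_threshold=3.0,
    window_width=2,
    min_content_val=15.0,
)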

get_content_val(frame_num)¶

Returns the average content change for a frame.

Parameters:

frame_num (int) –

Return type:

float | None

get_metrics()¶

Combines base ContentDetector metric keys with the AdaptiveDetector one.

Return type:

List[str]

post_process(_unused_frame_num)¶

Not required for AdaptiveDetector.

Parameters:

_unused_frame_num (int) –

process_frame(frame_num, frame_img)¶

Process the next frame. frame_num is assumed to be sequential.

Parameters:
  • frame_num (int) – Frame number of frame that is being passed. Can start from any value but must remain sequential.

  • frame_img (numpy.ndarray or None) – Video frame corresponding to frame_num.

Returns:

List of frames where scene cuts have been detected. There may be 0 or more frames in the list, and they are not necessarily the same as frame_num.

Return type:

List[int]

stats_manager_required()¶

Not required for AdaptiveDetector.

Return type:

bool

property event_buffer_length: int¶

Number of frames any detected cuts will be behind the current frame due to buffering.

ContentDetector compares the difference in content between adjacent frames against a set threshold/score which, if exceeded, triggers a scene cut.

This detector is available from the command-line as the detect-content command.

class scenedetect.detectors.content_detector.ContentDetector(threshold=27.0, min_scene_len=15, weights=Components(delta_hue=1.0, delta_sat=1.0, delta_lum=1.0, delta_edges=0.0), luma_only=False, kernel_size=None, filter_mode=Mode.MERGE)¶

Detects fast cuts using changes in colour and intensity between frames.

The difference is calculated in the HSV color space, and compared against a set threshold to determine when a fast cut has occurred.

Parameters:
  • threshold (float) – Threshold the average change in pixel intensity must exceed to trigger a cut.

  • min_scene_len (int) – Once a cut is detected, this many frames must pass before a new one can be added to the scene list. Can be an int or FrameTimecode type.

  • weights (ContentDetector.Components) – Weight to place on each component when calculating frame score (content_val in a statsfile, the value threshold is compared against).

  • luma_only (bool) – If True, only considers changes in the luminance channel of the video. Equivalent to specifying weights as ContentDetector.LUMA_ONLY. Overrides weights if both are set.

  • kernel_size (int | None) – Size of kernel for expanding detected edges. Must be odd integer greater than or equal to 3. If None, automatically set using video resolution.

  • filter_mode (Mode) – Mode to use when filtering cuts to meet min_scene_len.
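Example (a sketch constructing ContentDetector with custom component weights; the specific values are illustrative, not recommendations):

from scenedetect.detectors import ContentDetector

# Include edge information and de-emphasize hue/saturation. Since edge
# differences tend to be larger than the other components, the threshold
# may need to be raised when delta_edges is non-zero.
detector = ContentDetector(
    threshold=30.0,
    weights=ContentDetector.Components(
        delta_hue=0.5, delta_sat=0.5, delta_lum=1.0, delta_edges=1.0
    ),
)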

class Components(delta_hue=1.0, delta_sat=1.0, delta_lum=1.0, delta_edges=0.0)¶

Components that make up a frame’s score, and their default values.

Create new instance of Components(delta_hue, delta_sat, delta_lum, delta_edges)

Parameters:
  • delta_hue (float) –

  • delta_sat (float) –

  • delta_lum (float) –

  • delta_edges (float) –

delta_edges: float¶

Difference between calculated edges of adjacent frames.

Edge differences are typically larger than the other components, so the detection threshold may need to be adjusted accordingly.

delta_hue: float¶

Difference between pixel hue values of adjacent frames.

delta_lum: float¶

Difference between pixel luma (brightness) values of adjacent frames.

delta_sat: float¶

Difference between pixel saturation values of adjacent frames.

get_metrics()¶

Get a list of all metric names/keys used by the detector.

Returns:

List of strings of frame metric key names that will be used by the detector when a StatsManager is passed to process_frame.

is_processing_required(frame_num)¶

[DEPRECATED] DO NOT USE

Test if all calculations for a given frame are already done.

Returns:

False if the SceneDetector has assigned _metric_keys, and the stats_manager property is set to a valid StatsManager object containing the required frame metrics/calculations for the given frame; in that case, the frame itself is not needed to perform scene detection.

True otherwise (i.e. frame_img must be passed to process_frame for the given frame_num).

process_frame(frame_num, frame_img)¶

Process the next frame. frame_num is assumed to be sequential.

Parameters:
  • frame_num (int) – Frame number of frame that is being passed. Can start from any value but must remain sequential.

  • frame_img (numpy.ndarray or None) – Video frame corresponding to frame_num.

Returns:

List of frames where scene cuts have been detected. There may be 0 or more frames in the list, and they are not necessarily the same as frame_num.

Return type:

List[int]
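Since detectors can also be driven without a SceneManager, here is a minimal sketch of feeding decoded frames to process_frame directly (using OpenCV for decoding; the video path is a placeholder):

import cv2
from scenedetect.detectors import ContentDetector

detector = ContentDetector()
cap = cv2.VideoCapture("video.mp4")  # placeholder path
frame_num = 0
cuts = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    # frame is a BGR numpy.ndarray; frame_num must remain sequential.
    cuts += detector.process_frame(frame_num, frame)
    frame_num += 1
cap.release()
print("Cuts detected at frames:", cuts)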

DEFAULT_COMPONENT_WEIGHTS = Components(delta_hue=1.0, delta_sat=1.0, delta_lum=1.0, delta_edges=0.0)¶

Default component weights. Actual default values are specified in Components to allow adding new components without breaking existing usage.

FRAME_SCORE_KEY = 'content_val'¶

Key in statsfile representing the final frame score after being weighted by the specified components.

LUMA_ONLY_WEIGHTS = Components(delta_hue=0.0, delta_sat=0.0, delta_lum=1.0, delta_edges=0.0)¶

Component weights to use if luma_only is set.

METRIC_KEYS = ['content_val', 'delta_hue', 'delta_sat', 'delta_lum', 'delta_edges']¶

All statsfile keys this detector produces.

property event_buffer_length: int¶

The number of frames a given event can be buffered for. Represents the maximum number of frames any event can be behind frame_num in the result of process_frame().

scenedetect.detectors.hash_detector Module

This module implements the HashDetector, which calculates a hash value for each frame of a video using a perceptual hashing algorithm. The difference in hash values between adjacent frames is then calculated. If this difference exceeds a set threshold, a scene cut is triggered.

This detector is available from the command-line interface by using the detect-hash command.

class scenedetect.detectors.hash_detector.HashDetector(threshold=0.395, size=16, lowpass=2, min_scene_len=15)¶

Detects cuts using a perceptual hashing algorithm. Applies a discrete cosine transform (DCT) and lowpass filter, followed by binary thresholding on the median. See references below:

  1. https://www.hackerfactor.com/blog/index.php?/archives/432-Looks-Like-It.html

  2. https://github.com/JohannesBuchner/imagehash

Parameters:
  • threshold (float) – Value between 0.0 and 1.0 representing the relative Hamming distance between the perceptual hashes of adjacent frames. A distance of 0 means the images are identical, and 1 means no correlation. Smaller threshold values thus require more correlation, making the detector more sensitive. The Hamming distance is divided by size x size before comparing to threshold for normalization.

  • size (int) – Size of the square of low-frequency data to use for the DCT.

  • lowpass (int) – How much high-frequency information to filter from the DCT. A value of 2 means keep the lower 1/2 of the frequency data, 4 means keep only 1/4, etc.

  • min_scene_len (int) – Once a cut is detected, this many frames must pass before a new one can be added to the scene list. Can be an int or FrameTimecode type.
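Example (a construction sketch using the parameters above; the values shown are the signature defaults):

from scenedetect.detectors.hash_detector import HashDetector

# Smaller threshold values are more sensitive; a larger size retains more
# detail in each hash, and a larger lowpass value discards more
# high-frequency data.
detector = HashDetector(threshold=0.395, size=16, lowpass=2)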

get_metrics()¶

Get a list of all metric names/keys used by the detector.

Returns:

List of strings of frame metric key names that will be used by the detector when a StatsManager is passed to process_frame.

static hash_frame(frame_img, hash_size, factor)¶

Calculates the perceptual hash of a frame and returns it. Based on phash from https://github.com/JohannesBuchner/imagehash.

Return type:

ndarray
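As a sketch of how the normalized distance described above can be computed from two hashes (placeholder image paths; this assumes the constructor's size and lowpass correspond to hash_size and factor here):

import cv2
import numpy as np
from scenedetect.detectors.hash_detector import HashDetector

frame_a = cv2.imread("frame_a.png")  # placeholder paths
frame_b = cv2.imread("frame_b.png")
hash_a = HashDetector.hash_frame(frame_a, hash_size=16, factor=2)
hash_b = HashDetector.hash_frame(frame_b, hash_size=16, factor=2)
# Hamming distance between the binary hashes, normalized by size x size.
distance = np.count_nonzero(hash_a != hash_b) / float(16 * 16)
print(f"Normalized hash distance: {distance:.3f}")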

is_processing_required(frame_num)¶

[DEPRECATED] DO NOT USE

Test if all calculations for a given frame are already done.

Returns:

False if the SceneDetector has assigned _metric_keys, and the stats_manager property is set to a valid StatsManager object containing the required frame metrics/calculations for the given frame; in that case, the frame itself is not needed to perform scene detection.

True otherwise (i.e. frame_img must be passed to process_frame for the given frame_num).

process_frame(frame_num, frame_img)¶

Similar to ContentDetector, but uses a perceptual hashing algorithm to calculate a hash for each frame, then calculates the hash difference from frame to frame.

Parameters:
  • frame_num (int) – Frame number of frame that is being passed.

  • frame_img (Optional[numpy.ndarray]) – Decoded frame image (numpy.ndarray) to perform scene detection on. Can be None only if the self.is_processing_required() method (inherited from the base SceneDetector class) returns True.

Returns:

List of frames where scene cuts have been detected. There may be 0 or more frames in the list, and they are not necessarily the same as frame_num.

Return type:

List[int]

HistogramDetector compares the difference in the YUV histograms of subsequent frames. If the difference exceeds a given threshold, a cut is detected.

This detector is available from the command-line as the detect-hist command.

class scenedetect.detectors.histogram_detector.HistogramDetector(threshold=0.05, bins=256, min_scene_len=15)¶

Compares the difference in the Y channel of YUV histograms for adjacent frames. When the difference exceeds a given threshold, a cut is detected.

Parameters:
  • threshold (float) – Maximum relative difference, between 0.0 and 1.0, by which the histograms may differ. Histograms are calculated on the Y channel after converting the frame to YUV, and normalized based on the number of bins. Higher differences imply greater change in content, so larger threshold values are less sensitive to cuts.

  • bins (int) – Number of bins to use for the histogram.

  • min_scene_len (int) – Once a cut is detected, this many frames must pass before a new one can be added to the scene list. Can be an int or FrameTimecode type.
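Example (a construction sketch using the parameters above; the values shown are the signature defaults):

from scenedetect.detectors.histogram_detector import HistogramDetector

# A lower threshold is more sensitive (smaller histogram differences
# register as cuts); fewer bins make the comparison coarser.
detector = HistogramDetector(threshold=0.05, bins=256)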

static calculate_histogram(frame_img, bins=256, normalize=True)¶

Calculates and optionally normalizes the histogram of the luma (Y) channel of an image converted from BGR to YUV color space.

This function extracts the Y channel from the given BGR image, computes its histogram with the specified number of bins, and optionally normalizes this histogram to have a sum of one across all bins.

Args:¶

frame_img : np.ndarray

The input image in BGR color space, assumed to have shape (height, width, 3) where the last dimension represents the BGR channels.

bins : int, optional (default=256)

The number of bins to use for the histogram.

normalize : bool, optional (default=True)

A boolean flag that determines whether the histogram should be normalized such that the sum of all histogram bins equals 1.

Returns:¶

np.ndarray

A 1D numpy array of length equal to bins, representing the histogram of the luma channel. Each element in the array represents the count (or frequency) of a particular luma value in the image. If normalized, these values represent the relative frequency.

Examples:¶

>>> img = cv2.imread("path_to_image.jpg")
>>> hist = calculate_histogram(img, bins=256, normalize=True)
>>> print(hist.shape)
(256,)

Parameters:
  • frame_img (ndarray) –

  • bins (int) –

  • normalize (bool) –

Return type:

ndarray

get_metrics()¶

Get a list of all metric names/keys used by the detector.

Returns:

List of strings of frame metric key names that will be used by the detector when a StatsManager is passed to process_frame.

Return type:

List[str]

is_processing_required(frame_num)¶

[DEPRECATED] DO NOT USE

Test if all calculations for a given frame are already done.

Returns:

False if the SceneDetector has assigned _metric_keys, and the stats_manager property is set to a valid StatsManager object containing the required frame metrics/calculations for the given frame; in that case, the frame itself is not needed to perform scene detection.

True otherwise (i.e. frame_img must be passed to process_frame for the given frame_num).

Parameters:

frame_num (int) –

Return type:

bool

process_frame(frame_num, frame_img)¶

Computes the histogram of the luma channel of the frame image and compares it with the histogram of the luma channel of the previous frame. If the difference between the histograms exceeds the threshold, a scene cut is detected. Histogram difference is computed using the correlation metric.

Parameters:
  • frame_num (int) – Frame number of frame that is being passed.

  • frame_img (ndarray) – Decoded frame image (numpy.ndarray) to perform scene detection on.

Returns:

List of frames where scene cuts have been detected. There may be 0 or more frames in the list, and they are not necessarily the same as frame_num.

Return type:

List[int]

ThresholdDetector uses a set intensity as a threshold to detect cuts, which are triggered when the average pixel intensity exceeds or falls below this threshold.

This detector is available from the command-line as the detect-threshold command.

class scenedetect.detectors.threshold_detector.ThresholdDetector(threshold=12, min_scene_len=15, fade_bias=0.0, add_final_scene=False, method=Method.FLOOR, block_size=None)¶

Detects fast cuts/slow fades in from and out to a given threshold level.

Detects both fast cuts and slow fades so long as an appropriate threshold is chosen (especially taking into account the minimum grey/black level).

Parameters:
  • threshold (float) – 8-bit intensity value that each pixel value (R, G, and B) must be less than or equal to in order to trigger a fade in/out.

  • min_scene_len (int) – Once a cut is detected, this many frames must pass before a new one can be added to the scene list. Can be an int or FrameTimecode type.

  • fade_bias (float) – Float between -1.0 and +1.0 representing the percentage of timecode skew for the start of a scene (-1.0 causing a cut at the fade-to-black, 0.0 in the middle, and +1.0 causing the cut to be right at the position where the threshold is passed).

  • add_final_scene (bool) – If True and the video ends on a fade-out, generate an additional scene at that timecode.

  • method (Method) – How to treat threshold when detecting fade events.

  • block_size – [DEPRECATED] DO NOT USE. For backwards compatibility.

class Method(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)¶

Method for ThresholdDetector to use when comparing frame brightness to the threshold.

CEILING = 1¶

Fade out happens when frame brightness rises above threshold.

FLOOR = 0¶

Fade out happens when frame brightness falls below threshold.
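Putting the parameters and Method together, a construction sketch (values are illustrative):

from scenedetect.detectors import ThresholdDetector

# Place each cut in the middle of the fade (fade_bias=0.0), emit a final
# scene if the video ends on a fade-out, and treat frames whose average
# brightness falls below the threshold as faded out (Method.FLOOR).
detector = ThresholdDetector(
    threshold=12,
    fade_bias=0.0,
    add_final_scene=True,
    method=ThresholdDetector.Method.FLOOR,
)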

get_metrics()¶

Get a list of all metric names/keys used by the detector.

Returns:

List of strings of frame metric key names that will be used by the detector when a StatsManager is passed to process_frame.

Return type:

List[str]

post_process(frame_num)¶

Writes a final scene cut if the last detected fade was a fade-out.

Only writes the scene cut if add_final_scene is true, and the last fade that was detected was a fade-out. There is no bias applied to this cut (since there is no corresponding fade-in) so it will be located at the exact frame where the fade-out crossed the detection threshold.

Parameters:

frame_num (int) –

process_frame(frame_num, frame_img)¶

Process the next frame. frame_num is assumed to be sequential.

Parameters:
  • frame_num (int) – Frame number of frame that is being passed. Can start from any value but must remain sequential.

  • frame_img (numpy.ndarray or None) – Video frame corresponding to frame_num.

Returns:

List of frames where scene cuts have been detected. There may be 0 or more frames in the list, and they are not necessarily the same as frame_num.

Return type:

List[int]