Getting Started

Detecting and Splitting Scenes in a Movie Clip

To become familiar with PySceneDetect, let's work through a concrete example using a short clip from the James Bond movie GoldenEye (Copyright © 1995 MGM).

You can download the clip from here (you may have to right-click and choose save-as); put the video in your working directory as goldeneye.mp4. We will first run detection with the default parameters, then show how to find the optimal threshold/sensitivity for a given video, and lastly use the PySceneDetect output to split the video into individual scenes/clips.

Content-Aware Detection with Default Parameters

In this case, we want to split the clip into individual scenes, that is, at each location where a fast cut occurs. This means we need to use content-aware detection mode (detect-content) or adaptive mode (detect-adaptive). The alternative, detect-threshold, detects fade-ins and fade-outs instead.

Using the following command, let's run PySceneDetect on the video using the default threshold/sensitivity:

scenedetect --input goldeneye.mp4 detect-content list-scenes save-images

After running the above command, you should see a file goldeneye.scenes.csv in the working directory, as well as thumbnails for the start/middle/end of each scene as goldeneye-XXXX-00/01.jpg. The output directory can be specified with the -o/--output option after the save-images command, or after scenedetect to set the output location for all files. The results should appear as follows:

Scene # Start Time
1 00:00:00.000
2 00:00:03.754
3 00:00:08.759
4 00:00:10.802
5 00:00:15.599
6 00:00:27.110
7 00:00:34.117
8 00:00:36.536
9 00:00:42.501
10 00:00:44.002
11 00:00:45.837
12 00:00:48.966
13 00:00:51.134
14 00:00:52.552
15 00:00:53.428
16 00:00:55.639
17 00:00:56.932
18 00:01:10.779
19 00:01:18.036
20 00:01:19.913
21 00:01:21.999

This is almost perfect; however, one of the scene cuts/breaks within scene 17 was not detected (yielding a total of 21 scenes). To find the proper threshold, we need to generate a statistics file.
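A quick way to see where a cut may have been missed is to compare scene lengths, since an unusually long scene is a candidate (only a hint: long scenes can be legitimate). The sketch below is a minimal illustration using only the Python standard library and a few start times copied from the scene list above. The helper names are hypothetical and not part of PySceneDetect.

```python
def timecode_to_seconds(timecode):
    """Convert an HH:MM:SS.mmm timecode string into seconds."""
    hours, minutes, seconds = timecode.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def scene_lengths(start_times):
    """Length in seconds of each scene that is followed by another listed scene."""
    lengths = {}
    for number, start in start_times.items():
        following = start_times.get(number + 1)
        if following is not None:
            lengths[number] = round(
                timecode_to_seconds(following) - timecode_to_seconds(start), 3
            )
    return lengths


# Start times for scenes 15-19, copied from the scene list above.
starts = {
    15: "00:00:53.428",
    16: "00:00:55.639",
    17: "00:00:56.932",
    18: "00:01:10.779",
    19: "00:01:18.036",
}

for number, length in scene_lengths(starts).items():
    print(f"scene {number}: {length:.3f} s")
```

Scene 17 stands out as much longer than its neighbours, which matches the missed cut described above.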

Finding Optimal Threshold/Sensitivity Value

We now know that the default threshold of 30 does not work in all cases for our video, which is clear if we look at the generated images for scene 17 (note the last image is from a different scene):

We can determine the proper threshold in this case by generating a statistics file (with the -s / --stats option) for the video goldeneye.mp4, and looking at the behaviour of the values where we expect the scene break/cut to occur in scene 17:

scenedetect --input goldeneye.mp4 --stats goldeneye.stats.csv detect-content list-scenes save-images
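Examining the statistics file amounts to looking at the per-frame score around the time of the missed cut and choosing a threshold that the score at the cut reaches while the surrounding frames stay below it. The sketch below assumes the file is a CSV with a Timecode column and a content_val column; these column names are assumptions, and the sample rows are made-up numbers for illustration, not values taken from goldeneye.stats.csv. It also assumes a frame counts as a cut candidate when its score is at or above the threshold. Check the header of your own file before relying on it.

```python
import csv
import io

# Made-up sample rows for illustration only; a real statistics file may use
# different column names, extra columns, or additional header lines.
SAMPLE_STATS = """Frame Number,Timecode,content_val
1589,00:01:06.233,4.1
1590,00:01:06.274,3.8
1591,00:01:06.316,28.6
1592,00:01:06.358,5.0
"""


def frames_at_or_above(stats_file, threshold, score_column="content_val"):
    """Return (timecode, score) for every frame whose score meets the threshold."""
    reader = csv.DictReader(stats_file)
    return [
        (row["Timecode"], float(row[score_column]))
        for row in reader
        if float(row[score_column]) >= threshold
    ]


# Compare how many frames would qualify as cuts at two candidate thresholds.
for threshold in (30, 27):
    hits = frames_at_or_above(io.StringIO(SAMPLE_STATS), threshold)
    print(threshold, hits)
```

To try this on a real file, pass an open file object instead of the in-memory sample, for example open("goldeneye.stats.csv", newline=""), adjusting the column names if they differ.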

After examining the file and determining an optimal value of 27 for detect-content, we can set the threshold for the detector via:

scenedetect --input goldeneye.mp4 --stats goldeneye.stats.csv detect-content --threshold 27 list-scenes save-images

Note that specifying the same --stats file again will make parsing the scenes significantly quicker, as the frame metrics stored in this file are re-used as a cache instead of computing them again. Finally, our updated scene list appears as follows (similar entries skipped for brevity):

Scene # Start Time
... ...
17 00:00:56.932
18 00:01:06.316
19 00:01:10.779
... ...

Now the missing scene (scene number 18, in this case) has been detected properly, and our scene list is longer due to the added cut. There should now be a total of 22 detected scenes.

Splitting/Cutting Video into Clips

The last step, to automatically split the input file into clips, is to add the split-video command. This passes the list of detected scene timecodes to ffmpeg (if installed), which splits the input video into scenes.

You may also want to use the -c/--copy option to ensure that no re-encoding is performed (using mkvmerge instead). This comes at the expense of frame-accurate scene cuts, since when copying, cuts can sometimes only be made on keyframes. You can also pass the -hq/--high-quality option to ensure the output videos are visually identical to the input, at the expense of longer processing time and larger file size.

Thus, to generate a sequence of files goldeneye-scene-001.mp4, goldeneye-scene-002.mp4, goldeneye-scene-003.mp4..., our full command becomes:

scenedetect -i goldeneye.mp4 -o output_dir detect-content -t 27 list-scenes save-images split-video

The scene number (-001, -002, and so on) is added to the output filename automatically.
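If you want to run the same command from a script, one option is to assemble the argument list in Python and hand it to subprocess. This is a minimal sketch that uses only the options shown in this section; build_command is a hypothetical helper, not part of PySceneDetect, and actually running the command assumes the scenedetect executable is on your PATH.

```python
import subprocess


def build_command(video, output_dir, threshold, split=True):
    """Assemble the scenedetect argument list used in this section."""
    command = [
        "scenedetect",
        "-i", video,
        "-o", output_dir,
        "detect-content", "-t", str(threshold),
        "list-scenes",
        "save-images",
    ]
    if split:
        # split-video goes last so it acts on the detected scene list.
        command.append("split-video")
    return command


command = build_command("goldeneye.mp4", "output_dir", 27)
print(" ".join(command))

# Uncomment to actually run the command (requires scenedetect and the clip):
# subprocess.run(command, check=True)
```

Passing the arguments as a list rather than a single string avoids shell quoting problems when the video path contains spaces.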