Duplicate Image Removal

The system uses several methods to remove duplicate slides and obtain a set of unique frames containing the single best representation of each slide in the presentation. One method, applied at several steps of the procedure (black border removal, perspective cropping, and clustering), is image hashing. Conventional hash functions, such as cryptographic hashes like MD5, produce completely different outputs for two images that differ by a single byte, even when both depict the same content. Image hashing algorithms, in contrast, produce similar hashes for similar inputs, so the number of differing bits between two hashes indicates how alike the images are. The system supports four hashing methods: average, perceptual (the default), difference, and wavelet hashing. These algorithms analyze image structure based on luminance only, without color information. Because only near-identical images produce near-identical hashes, this step removes only extremely similar frames and can be applied with very little risk of false positives. However, if the presenter moves slightly between frames, the hashes can differ enough that two frames showing the same slide are treated as unique. To handle such cases, we also employ clustering (see Slide Clustering) and feature matching (see SIFT Matcher & Perspective Cropping).
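To make the idea of "similar inputs give similar hashes" concrete, the following is a minimal pure-Python sketch of two of the four methods (average and difference hashing) plus a greedy near-duplicate filter. It assumes frames have already been converted to grayscale and are given as 2D lists of luminance values. The function names (average_hash, difference_hash, dedupe), the 8x8 hash size, and the 5-bit distance threshold are hypothetical illustrations, not the system's actual API or settings; perceptual (DCT-based) and wavelet hashing follow the same shrink-then-threshold pattern but are not shown.

```python
from typing import Callable, List, Sequence

# A grayscale image as rows of luminance values (no color channels).
Image = List[List[float]]


def _resize(img: Image, width: int, height: int) -> Image:
    """Shrink img to width x height by averaging each rectangular block.

    Assumes img is at least width x height so every block is non-empty.
    """
    h, w = len(img), len(img[0])
    out: Image = []
    for r in range(height):
        y0, y1 = r * h // height, (r + 1) * h // height
        row = []
        for c in range(width):
            x0, x1 = c * w // width, (c + 1) * w // width
            block = [img[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def _bits_to_int(bits: Sequence[bool]) -> int:
    """Pack a sequence of booleans into an integer, first bit most significant."""
    value = 0
    for bit in bits:
        value = (value << 1) | int(bit)
    return value


def average_hash(img: Image, size: int = 8) -> int:
    """Average hash: bit is 1 where a downscaled pixel is brighter than the mean."""
    small = _resize(img, size, size)
    pixels = [p for row in small for p in row]
    mean = sum(pixels) / len(pixels)
    return _bits_to_int([p > mean for p in pixels])


def difference_hash(img: Image, size: int = 8) -> int:
    """Difference hash: bit is 1 where a pixel is brighter than its left neighbour."""
    small = _resize(img, size + 1, size)
    return _bits_to_int([row[x + 1] > row[x] for row in small for x in range(size)])


def hamming(a: int, b: int) -> int:
    """Number of bits that differ between two hashes."""
    return bin(a ^ b).count("1")


def dedupe(frames: Sequence[Image],
           hash_fn: Callable[[Image], int] = average_hash,
           max_distance: int = 5) -> List[int]:
    """Return indices of frames kept after dropping near-duplicates.

    A frame is dropped if its hash is within max_distance bits of a frame
    that was already kept (hypothetical threshold, not the system's value).
    """
    kept: List[int] = []
    kept_hashes: List[int] = []
    for i, frame in enumerate(frames):
        h = hash_fn(frame)
        if all(hamming(h, k) > max_distance for k in kept_hashes):
            kept.append(i)
            kept_hashes.append(h)
    return kept
```

For example, a 32x32 horizontal gradient and a copy with one pixel nudged produce identical 64-bit hashes under both functions, so dedupe keeps only the first; a horizontally mirrored gradient differs in all 64 bits and is kept. Because each hash is a single integer, comparing two frames costs one XOR and a bit count, which is why hashing is cheap enough to run at several stages. A small threshold like this one only merges near-identical frames, which matches the limitation described above: a presenter stepping into the frame changes many bits, so those duplicates are left for the clustering and feature-matching stages.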