Input

The input for undouble.undouble.Undouble.import_data() can be one of the following three types:

  • Directory path

  • File locations

  • Numpy array containing images

The scanned files and directories can also be filtered on extension type, and directories can be black-listed. Note that these settings must be set during initialization. The black_list directory is set to undouble by default to make sure that files that have already been moved are not included in the analysis.
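
The filtering described above can be pictured with a small standard-library sketch. It is not undouble's implementation: the helper name collect_images and its parameters are hypothetical and only mirror the idea of an extension filter combined with a black-listed directory name.

```python
import os
import tempfile

def collect_images(root, ext=("png", "tiff", "jpg", "jfif"), black_list=("undouble",)):
    """Recursively collect files under root that match ext, skipping black-listed directories."""
    allowed = {"." + e.lower().lstrip(".") for e in ext}
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune black-listed directories in place so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if d not in black_list]
        for name in filenames:
            if os.path.splitext(name)[1].lower() in allowed:
                found.append(os.path.join(dirpath, name))
    return sorted(found)

# Small demonstration on a throw-away directory tree.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "undouble"))
    for rel in ("a.png", "b.txt", os.path.join("undouble", "moved.png")):
        open(os.path.join(root, rel), "w").close()
    demo = [os.path.basename(p) for p in collect_images(root)]
    print(demo)
```

In this sketch the text file is dropped by the extension filter, and the image inside the undouble directory is skipped because that directory name is black-listed.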

The following parameters can be changed during initialization:

  • Images are imported with the extensions (['png','tiff','jpg','jfif']).

  • Input images can be grayscaled during import.

  • Images can be resized to save memory, such as to (128, 128).

Directory

Images can be imported recursively from a target directory.

# Import libraries
import os
from undouble import Undouble

# Init with default settings
model = Undouble()

# Import data
input_list_of_files = model.import_example(data='flowers')
input_directory, _ = os.path.split(input_list_of_files[0])

# The target directory looks as follows:
print(input_directory)
# 'C:\\TEMP\\flower_images'

# Importing the files from disk, cleaning and pre-processing
model.import_data(input_directory)

# [clustimage] >INFO> Extracting images from: [C:\\TEMP\\flower_images]
# [clustimage] >INFO> [214] files are collected recursively from path: [C:\\TEMP\\flower_images]
# [clustimage] >INFO> [214] images are extracted.
# [clustimage] >INFO> Reading and checking images.
# [clustimage] >INFO> Reading and checking images.
# [clustimage]: 100%|██████████| 214/214 [00:01<00:00, 133.25it/s]

# Compute hash
model.compute_hash()

# Find images whose image-hash difference is <= threshold
model.group(threshold=0)

# Plot the images
model.plot()

File locations

Images can also be imported from a list of file locations.

# Import library
from undouble import Undouble

# Init with default settings
model = Undouble()

# Import data; Pathnames to the images.
input_list_of_files = model.import_example(data='flowers')

# [undouble] >INFO> Store examples at [..\undouble\data]..
# [undouble] >INFO> Downloading [flowers] dataset from github source..
# [undouble] >INFO> Extracting files..
# [undouble] >INFO> [214] files are collected recursively from path: [..\undouble\undouble\data\flower_images]

# The list of image path locations looks as follows but may differ on your machine.
print(input_list_of_files)

# ['\\repos\\undouble\\undouble\\data\\flower_images\\0001.png',
#  '\\repos\\undouble\\undouble\\data\\flower_images\\0002.png',
#  '\\repos\\undouble\\undouble\\data\\flower_images\\0003.png',
#  ...]

model.import_data(input_list_of_files)

# [clustimage] >INFO> Reading and checking images.
# [clustimage] >INFO> Reading and checking images.
# [clustimage]: 100%|██████████| 214/214 [00:02<00:00, 76.44it/s]

# Compute hash
model.compute_hash()

# Find images whose image-hash difference is <= threshold
model.group(threshold=0)

# Plot the images
model.plot()

Numpy Array

Images can also be provided as a numpy array.

# Import library
from undouble import Undouble

# Init with default settings
model = Undouble()

# Import data; numpy array containing images.
X, y = model.import_example(data='mnist')

print(X)
# array([[ 0.,  0.,  5., ...,  0.,  0.,  0.],
#        [ 0.,  0.,  0., ..., 10.,  0.,  0.],
#        [ 0.,  0.,  0., ..., 16.,  9.,  0.],
#        ...,
#        [ 0.,  0.,  1., ...,  6.,  0.,  0.],
#        [ 0.,  0.,  2., ..., 12.,  0.,  0.],
#        [ 0.,  0., 10., ..., 12.,  1.,  0.]])

# Import the numpy array, cleaning and pre-processing
model.import_data(X)

# Compute hash
model.compute_hash()

# Find images whose image-hash difference is <= threshold
model.group(threshold=0)

# Plot the images
model.plot()

Output

The output is stored in model.results. The example below continues with a model on which group() has been run, as in the examples above.

# Import library
from undouble import Undouble

# Print all keys
print(model.results.keys())

# dict_keys(['img',
#            'pathnames',
#            'url',
#            'filenames',
#            'img_hash_bin',
#            'img_hash_hex',
#            'adjmat',
#            'select_pathnames',
#            'select_scores',
#            'select_idx',
#            'stats'])

# Pathnames
model.results['pathnames']

# array(['D:\\REPOS\\undouble\\undouble\\data\\flower_images\\0001.png',
#        'D:\\REPOS\\undouble\\undouble\\data\\flower_images\\0002.png',
#        'D:\\REPOS\\undouble\\undouble\\data\\flower_images\\0003.png',...

# Filenames
model.results['filenames']
# array(['0001.png', '0002.png', '0003.png',...

# Adjacency matrix
model.results['adjmat']
# array([[ 0, 24, 24, ..., 30, 28, 26],
#        [24,  0, 28, ..., 28, 18, 36],
#        [24, 28,  0, ..., 28, 28, 28],
#        ...,
#        [30, 28, 28, ...,  0, 24, 34],
#        [28, 18, 28, ..., 24,  0, 34],
#        [26, 36, 28, ..., 34, 34,  0]])

# Select groupings
model.results['select_idx']
# [array([81, 82], dtype=int64),
#  array([90, 91, 92], dtype=int64),
#  array([169, 170], dtype=int64)]
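
To make the relation between the hashes, the adjacency matrix, and the groups more concrete, here is a small standard-library sketch. It assumes that each entry of the adjacency matrix is the number of differing bits between two image hashes, and that images whose difference is at or below the threshold end up in one group. The hash strings and the helper names hamming and group_by_threshold are made up for illustration and are not part of undouble.

```python
def hamming(hex_a, hex_b):
    """Number of differing bits between two equally long hex-encoded hashes."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")

def group_by_threshold(hashes, threshold=0):
    """Group indices whose hash difference to a seed image is <= threshold (greedy pass)."""
    groups, seen = [], set()
    for i in range(len(hashes)):
        if i in seen:
            continue
        members = [j for j in range(i, len(hashes))
                   if j not in seen and hamming(hashes[i], hashes[j]) <= threshold]
        # Only report groups that contain more than one image.
        if len(members) > 1:
            groups.append(members)
            seen.update(members)
    return groups

# Hypothetical 64-bit hashes: two identical, one off by a single bit, one unrelated.
hashes = ["ff00ff00ff00ff00", "ff00ff00ff00ff00", "ff00ff00ff00ff01", "0123456789abcdef"]
adjmat = [[hamming(a, b) for b in hashes] for a in hashes]
groups_exact = group_by_threshold(hashes, threshold=0)
groups_near = group_by_threshold(hashes, threshold=1)
print(adjmat[0][:3], groups_exact, groups_near)
```

With threshold=0 only the two identical hashes are grouped; raising the threshold to 1 also pulls in the hash that differs by a single bit.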

Extract Groups

The groups can be extracted by combining the group indices with the pathnames (or filenames).

# Import library
from undouble import Undouble

# Init with default settings
model = Undouble()

# Import data; Pathnames to the images.
input_list_of_files = model.import_example(data='flowers')

# Import data from files.
model.import_data(input_list_of_files)

# Compute hash
model.compute_hash()

# Find images whose image-hash difference is <= threshold
model.group(threshold=0)

# [undouble] >INFO> [3] groups with similar image-hash.
# [undouble] >INFO> [3] groups are detected for [7] images.

# Plot the images
model.plot()

# Extract the pathnames for each group
for idx_group in model.results['select_idx']:
    print(idx_group)
    print(model.results['pathnames'][idx_group])


# [81 82]
# ['D:\\REPOS\\undouble\\undouble\\data\\flower_images\\0082 - Copy.png'
#  'D:\\REPOS\\undouble\\undouble\\data\\flower_images\\0082.png']
# [90 91 92]
# ['D:\\REPOS\\undouble\\undouble\\data\\flower_images\\0090 - Copy (2).png'
#  'D:\\REPOS\\undouble\\undouble\\data\\flower_images\\0090 - Copy.png'
#  'D:\\REPOS\\undouble\\undouble\\data\\flower_images\\0090.png']
# [169 170]
# ['D:\\REPOS\\undouble\\undouble\\data\\flower_images\\0167 - Copy.png'
#  'D:\\REPOS\\undouble\\undouble\\data\\flower_images\\0167.png']
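
Once the groups are available, a common next step is to keep one image per group and treat the rest as duplicates. The sketch below does this with plain Python containers that stand in for model.results['filenames'] and model.results['select_idx']; the values are shortened copies of the output above, and the rule of keeping the shortest filename in each group is an arbitrary choice for illustration.

```python
# Stand-ins for model.results['filenames'] and model.results['select_idx'].
filenames = {81: "0082 - Copy.png", 82: "0082.png",
             90: "0090 - Copy (2).png", 91: "0090 - Copy.png", 92: "0090.png"}
select_idx = [[81, 82], [90, 91, 92]]

keep, duplicates = [], []
for idx_group in select_idx:
    names = [filenames[i] for i in idx_group]
    # Keep the shortest name (here: the one without a "Copy" suffix), flag the rest.
    names.sort(key=len)
    keep.append(names[0])
    duplicates.extend(names[1:])

print(keep)
print(duplicates)
```

The duplicates list can then be reviewed by hand before any file is moved or deleted.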