Performance
''''''''''''''''''''''

To measure the performance of the various methods implemented in ``clustimage``, we can use the **digits** dataset and determine how well the clustered samples match the true labels. The results below show that many different parameter combinations lead to similarly good performance.
The following piece of code clusters the **digits** images, compares the detected cluster labels with the true labels, and finally computes the accuracy.

.. code:: python

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.datasets import load_digits
    from sklearn.metrics import accuracy_score
    from clustimage import Clustimage
    import classeval as clf
    import itertools as it

    # Load example data
    digits = load_digits(n_class=10)
    X, y_true = digits.data, digits.target

    # Parameters to compare
    param_grid = {
        'method': ['pca', 'hog', None],
        'evaluate': ['silhouette', 'dbindex', 'derivative'],
        'cluster_space': ['low', 'high'],
        }

    scores = []
    labels = []
    allNames = param_grid.keys()
    combinations = list(it.product(*(param_grid[Name] for Name in allNames)))

    # Iterate over all combinations
    for combination in combinations:
        # Initialize with the feature-extraction method
        cl = Clustimage(method=combination[0])
        # Preprocessing, feature extraction and cluster evaluation
        results = cl.fit_transform(X, cluster_space=combination[2], evaluate=combination[1])

        # Compute confusion matrix
        cm = clf.confmatrix.eval(y_true, results['labels'], normalize=False)

        # Map each cluster label to the true label it overlaps with most, so that the labels are comparable
        y_pred = results['labels']
        cm_argmax = cm['confmat'].argmax(axis=0)
        y_pred_ = np.array([cm_argmax[i] for i in y_pred])

        # Recompute the confusion matrix with the mapped labels and plot it
        cm = clf.confmatrix.eval(y_true, y_pred_, normalize=False)
        fig, ax = clf.confmatrix.plot(cm)
        ax.set_title('Feature extraction: [%s]\nCluster evaluation with [%s] in [%s] dimension' %(combination[0], combination[1], combination[2]), fontsize=16)
        plt.pause(0.1)

        # Store scores and labels
        scores.append(accuracy_score(y_true, y_pred_))
        labels.append(str(combination[0]) + ' - ' + combination[1] + ' - ' + combination[2])

    # Make plot
    scores = np.array(scores)
    labels = np.array(labels)
    isort = np.argsort(scores)
    plt.figure(figsize=(12, 6))
    plt.plot(np.arange(0, len(scores)), scores[isort])
    plt.xticks(np.arange(0, len(scores)), labels[isort], rotation='vertical')
    plt.margins(0.2)
    plt.title("Comparison of various approaches.", fontsize=14)
    plt.grid(True)


.. |figP1| image:: ../figs/performance_approaches.png

.. table:: Comparison of the performance for the digits dataset using various methods.
    :align: center

    +----------+
    | |figP1|  |
    +----------+


.. |figP2| image:: ../figs/best_digits.png
.. |figP3| image:: ../figs/digits_pca_dbindex.png

.. table:: Results of the best two approaches.
    :align: center

    +----------+----------+
    | |figP2|  | |figP3|  |
    +----------+----------+


.. include:: add_bottom.add
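
Note that the confusion-matrix argmax used above assigns each cluster to its most frequent true label, which can map two clusters onto the same digit. A one-to-one assignment can instead be obtained with the Hungarian algorithm. The snippet below is only a minimal sketch and not part of the ``clustimage`` API; it assumes ``scipy`` (>=1.4) and ``scikit-learn`` are available, and the helper name ``align_cluster_labels`` is illustrative.

.. code:: python

    # Minimal sketch (not part of clustimage): align cluster labels with the true
    # labels via an optimal one-to-one assignment instead of the argmax mapping.
    import numpy as np
    from scipy.optimize import linear_sum_assignment
    from sklearn.metrics import confusion_matrix, accuracy_score

    def align_cluster_labels(y_true, y_cluster):
        """Map each cluster label to a unique true label that maximizes the overlap."""
        labels_all = np.unique(np.concatenate([y_true, y_cluster]))
        cm = confusion_matrix(y_true, y_cluster, labels=labels_all)
        # Hungarian algorithm: assign clusters to true labels so that the total overlap is maximal
        row_ind, col_ind = linear_sum_assignment(cm, maximize=True)
        mapping = {labels_all[c]: labels_all[r] for r, c in zip(row_ind, col_ind)}
        return np.array([mapping[label] for label in y_cluster])

    # Example usage with the labels of a single clustimage run:
    # y_aligned = align_cluster_labels(y_true, results['labels'])
    # print(accuracy_score(y_true, y_aligned))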