Background
#############

Clustering is an unsupervised machine learning approach in which the aim is to determine “natural” or “data-driven” groups in the data without using a priori knowledge about labels or categories. The challenge in unsupervised clustering is that a method always produces a partitioning of the samples, whether or not meaningful groups exist, because each clustering method implicitly imposes a structure on the data. The question is therefore: what is a “good” clustering? To answer it, the results need to be evaluated based on the **clustering tendency**, the **number of clusters** and the **clustering quality**.

Aim
#############

``clusteval`` is a Python package developed to evaluate the **clustering tendency**, the **number of clusters** and the **clustering quality**. ``clusteval`` returns the cluster labels for the optimal number of clusters, that is, the number that produces the best partitioning of the samples. The following evaluation strategies are implemented: **silhouette**, **dbindex**, and **derivative**, which can be used in combination with **agglomerative** and **kmeans** clustering. In addition, **dbscan** and **hdbscan** are implemented, for which an internal grid search scheme determines the best partitioning.

.. note::
	The ``clusteval`` library performs a grid search across the number of clusters and the method parameters to determine the optimal number of clusters for the input dataset.
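
The idea of searching across the number of clusters can be illustrated with a minimal sketch. It does not show how ``clusteval`` is implemented internally; it assumes ``scikit-learn`` is installed and uses its ``KMeans`` and ``silhouette_score`` directly. The synthetic dataset, the candidate range of 2 to 10 clusters, and the variable names ``scores`` and ``best_k`` are assumptions chosen for illustration only.

.. code:: python

	from sklearn.cluster import KMeans
	from sklearn.datasets import make_blobs
	from sklearn.metrics import silhouette_score

	# Synthetic data: four well-separated groups (illustrative assumption)
	centers = [(-5, -5), (-5, 5), (5, -5), (5, 5)]
	X, _ = make_blobs(n_samples=300, centers=centers, cluster_std=0.6, random_state=0)

	# Cluster the data for every candidate number of clusters and
	# store the mean silhouette score of each partitioning
	scores = {}
	for k in range(2, 11):
	    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
	    scores[k] = silhouette_score(X, labels)

	# The candidate with the highest silhouette score is selected
	best_k = max(scores, key=scores.get)
	print(f"Number of clusters with the highest silhouette score: {best_k}")

A higher silhouette score indicates that samples lie closer to their own cluster than to neighboring clusters, which is why the maximum is taken as the preferred number of clusters in this sketch.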


Quickstart
################

A quick example of how to fit a model on a given dataset.


.. code:: python

	# Import library
	from clusteval import clusteval

	# Initialize with default settings
	ce = clusteval()

	# Generate example data
	X, y = ce.import_example(data='blobs')

	# Fit data X
	results = ce.fit(X)

	# Plot the results
	ce.plot()
	ce.plot_silhouette()
	ce.scatter()
	ce.dendrogram()


.. _schematic_overview:

.. figure:: ../figs/cluster.png


.. include:: add_bottom.add