Comparison high dimensional embedding: PCA vs tSNE
#########################################################

In the following example we load the mnist dataset and make a PCA and tSNE embedding for which we will analyze the distribution of samples in the embedding. The comparison between the top 50D of PCA vs. 2D tSNE resulted in high similarities on local and global scales. The axis are the number of "neirest neighbors" (nn). What we see is that on local scales (low nn) high similarity is seen between the maps but also in higher scales.

.. code:: python
	
	# Load libraries
	from sklearn import (manifold, decomposition)
	import pandas as pd
	import numpy as np

	# Import library
	import flameplot as flameplot

	# Load mnist example data
	X,y = flameplot.import_example()

	# PCA: 50 PCs
	X_pca_50 = decomposition.TruncatedSVD(n_components=50).fit_transform(X)

	# tSNE: 2D
	X_tsne = manifold.TSNE(n_components=2, init='pca').fit_transform(X)

	# Compare PCA(50) vs. tSNE
	scores = flameplot.compare(X_pca_50, X_tsne, n_steps=5)

	# Plot
	fig, ax = flameplot.plot(scores1, xlabel='PCA (50d)', ylabel='tSNE (2d)')


.. |fig1| image:: ../figs/pca50_tsne.png

.. table:: PCA 50D vs t-SNE 2D
   :align: center

   +----------+
   | |fig1|   |
   +----------+


Comparison 2D embeddings: PCA vs tSNE
#########################################################

The comparison between the top 2D of PCA vs. 2D tSNE resulted in much lower similarities compared to the 50D on local and global scales. What we see is that on local scales (low nn) there is low similarity which depicts that samples have different neighbors. On larger scale it becomes a bit more greenish and slightly more similarities are seen on average between the neighbors. This would basically suggest that the same digits are detected globally but are differently ordered on local scales.


.. code:: python

	# PCA top 2 PCs
	X_pca_2 = decomposition.TruncatedSVD(n_components=2).fit_transform(X)
	
	# tSNE
	X_tsne = manifold.TSNE(n_components=2, init='pca').fit_transform(X)

	# Compare PCA(2) vs. tSNE
	scores = flameplot.compare(X_pca_2, X_tsne, n_steps=5)

	# Plot
	fig, ax = flameplot.plot(scores, xlabel='PCA (2d)', ylabel='tSNE (2d)')


.. |fig2| image:: ../figs/pca2_tsne.png

.. table:: PCA 2D vs t-SNE 2D
   :align: center

   +----------+
   | |fig2|   |
   +----------+


Comparison Random data vs. t-SNE
#########################################################

The comparison between the Random data points vs. 2D tSNE resulted in low similarities on both local and global scales. This what we expect to see as we permuted the data.


.. code:: python

	# Random
	X_rand=np.c_[np.random.permutation(X_tsne[:,0]), np.random.permutation(X_tsne[:,1])]

	# Compare random vs. tSNE
	scores = flameplot.compare(X_rand, X_tsne, n_steps=5)

	# Plot
	fig, ax = flameplot.plot(scores, xlabel='Random (2d)', ylabel='tSNE (2d)')


.. |fig3| image:: ../figs/random_tsne.png

.. table:: Random data vs t-SNE
   :align: center

   +----------+
   | |fig3|   |
   +----------+


Scatterplots
#########################################################

Scatter plots can also being created:

.. code:: python
	
	# Create scatterplot of PCA
	fig, ax = flameplot.scatter(X_pca_2[:,0], X_pca_2[:,1], labels=y, title='PCA', density=False)

	# Create scatterplot of t-SNE
	fig, ax = flameplot.scatter(X_tsne[:,0],  X_tsne[:,1],  labels=y, title='tSNE')

	# Create scatterplot of the random data
	fig, ax = flameplot.scatter(X_rand[:,0],  X_rand[:,1],  labels=y, title='Random')


.. |fig4| image:: ../figs/scatter_pca.png
.. |fig5| image:: ../figs/scatter_tsne.png
.. |fig6| image:: ../figs/scatter_random.png

.. table:: Scatterplots
   :align: center

   +----------+
   | |fig4|   |
   +----------+
   | |fig5|   |
   +----------+
   | |fig6|   |
   +----------+


.. include:: add_bottom.add