pca: A Python Package for Principal Component Analysis. The core of pca is built on sklearn functionality for maximum compatibility when combining with other packages.

But this package can do a lot more. Besides regular PCA, it can also perform SparsePCA and TruncatedSVD. Depending on your input data, the best approach is chosen automatically.

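If you want to force a particular decomposition rather than rely on the automatic choice, the constructor exposes a method argument. A minimal sketch, assuming the option names 'pca', 'sparse_pca', and 'trunc_svd' (check help(pca) in your installed version):

from pca import pca

# Assumed option names for the method argument; verify against your version.
model = pca(method='sparse_pca', n_components=3)  # SparsePCA
model = pca(method='trunc_svd', n_components=3)   # TruncatedSVD
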
Other functionalities are:

  • Biplot to plot the loadings
  • Determine the explained variance
  • Extract the best performing features
  • Scatter plot with the loadings
  • Outlier detection using Hotelling's T2 and/or SPE/DmodX

This notebook will show some examples.

More information can be found here:

  • pca on GitHub: https://github.com/erdogant/pca
In [1]:
!pip install pca
Requirement already satisfied: pca in /usr/local/lib/python3.7/dist-packages (1.7.2)
Requirement already satisfied: matplotlib in /usr/local/lib/python3.7/dist-packages (from pca) (3.2.2)
Requirement already satisfied: wget in /usr/local/lib/python3.7/dist-packages (from pca) (3.2)
Requirement already satisfied: tqdm in /usr/local/lib/python3.7/dist-packages (from pca) (4.63.0)
Requirement already satisfied: pandas in /usr/local/lib/python3.7/dist-packages (from pca) (1.3.5)
Requirement already satisfied: colourmap in /usr/local/lib/python3.7/dist-packages (from pca) (1.1.4)
Requirement already satisfied: sklearn in /usr/local/lib/python3.7/dist-packages (from pca) (0.0)
Requirement already satisfied: scatterd in /usr/local/lib/python3.7/dist-packages (from pca) (1.1.1)
Requirement already satisfied: numpy in /usr/local/lib/python3.7/dist-packages (from pca) (1.21.5)
Requirement already satisfied: scipy in /usr/local/lib/python3.7/dist-packages (from pca) (1.4.1)
Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.7/dist-packages (from matplotlib->pca) (0.11.0)
Requirement already satisfied: python-dateutil>=2.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib->pca) (2.8.2)
Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib->pca) (1.4.0)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib->pca) (3.0.7)
Requirement already satisfied: typing-extensions in /usr/local/lib/python3.7/dist-packages (from kiwisolver>=1.0.1->matplotlib->pca) (3.10.0.2)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.7/dist-packages (from python-dateutil>=2.1->matplotlib->pca) (1.15.0)
Requirement already satisfied: pytz>=2017.3 in /usr/local/lib/python3.7/dist-packages (from pandas->pca) (2018.9)
Requirement already satisfied: seaborn in /usr/local/lib/python3.7/dist-packages (from scatterd->pca) (0.11.2)
Requirement already satisfied: scikit-learn in /usr/local/lib/python3.7/dist-packages (from sklearn->pca) (1.0.2)
Requirement already satisfied: threadpoolctl>=2.0.0 in /usr/local/lib/python3.7/dist-packages (from scikit-learn->sklearn->pca) (3.1.0)
Requirement already satisfied: joblib>=0.11 in /usr/local/lib/python3.7/dist-packages (from scikit-learn->sklearn->pca) (1.1.0)

Let's check the version.

In [2]:
import pca
print(pca.__version__)
1.7.2

Import the pca library

In [3]:
from pca import pca
import numpy as np
import pandas as pd

Here we will load the iris example dataset into a DataFrame.

In [4]:
# Load the iris dataset as a DataFrame
from sklearn.datasets import load_iris
iris = load_iris()
X = pd.DataFrame(data=iris.data, columns=iris.feature_names, index=iris.target)

Initialize the model with explicit parameters. The values used here are the defaults.

In [5]:
# Initialize
model = pca(n_components=3, normalize=True)
In [6]:
# Fit transform
out = model.fit_transform(X)
[pca] >Processing dataframe..
[pca] >Normalizing input data per feature (zero mean and unit variance)..
[pca] >The PCA reduction is performed on the [4] columns of the input dataframe.
[pca] >Fit using PCA.
[pca] >Compute loadings and PCs.
[pca] >Compute explained variance.
[pca] >Outlier detection using Hotelling T2 test with alpha=[0.05] and n_components=[3]
[pca] >Outlier detection using SPE/DmodX with n_std=[2]
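
The returned out is a dictionary. A brief sketch of how to inspect it; the key names below are taken from the log output and the package documentation, so verify them with out.keys() on your version:

# Inspect the fitted results (assumed keys: 'PC', 'loadings', 'explained_var', 'outliers')
print(out.keys())
print(out['explained_var'])  # cumulative explained variance per component
print(out['PC'].head())      # sample scores in PC space
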
In [7]:
# Make biplot with the loading directions and feature text labels
fig, ax = model.biplot(textlabel=True, legend=False, figsize=(10, 6))
[pca] >Plot PC1 vs PC2 with loadings.
In [8]:
# Make plot with only the directions (no scatter)
fig, ax = model.biplot(cmap=None, textlabel=False, legend=False, figsize=(10, 6))
[pca] >Plot PC1 vs PC2 with loadings.
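
The arrows in the biplot are the feature loadings. They can also be read out directly; this sketch assumes the 'loadings' and 'topfeat' keys of the results dictionary:

print(out['loadings'])  # one row per PC, one column per input feature (assumed layout)
print(out['topfeat'])   # best performing feature per PC (assumed key)
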
In [9]:
from pca import pca
# Load example data
from sklearn.datasets import load_iris
iris = load_iris()
X = pd.DataFrame(data=iris.data, columns=iris.feature_names, index=iris.target)
In [10]:
# Initialize
model = pca(n_components=3)
# Fit using PCA
results = model.fit_transform(X)
[pca] >Processing dataframe..
[pca] >The PCA reduction is performed on the [4] columns of the input dataframe.
[pca] >Fit using PCA.
[pca] >Compute loadings and PCs.
[pca] >Compute explained variance.
[pca] >Outlier detection using Hotelling T2 test with alpha=[0.05] and n_components=[3]
[pca] >Outlier detection using SPE/DmodX with n_std=[2]
In [11]:
# Make plots

# Scree plot
fig, ax = model.plot(figsize=(10, 6))

# Scatter plot
fig, ax = model.scatter(figsize=(10, 6))

# Biplot
fig, ax = model.biplot(figsize=(10, 6))
[pca] >Plot PC1 vs PC2 with loadings.
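
The scree plot helps decide how many components to keep. Alternatively, a variance threshold can be passed directly; this is a hedged sketch assuming the constructor accepts a fraction in (0, 1), analogous to sklearn's PCA:

# Assumption: a float in (0, 1) keeps as many components as needed
# to reach that fraction of explained variance.
model95 = pca(n_components=0.95)
results95 = model95.fit_transform(X)
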
In [17]:
# Color the samples by density to focus on the core of the cloud.

# Biplot
fig, ax = model.biplot(figsize=(10, 6), gradient='#ffffff')
[pca] >Plot PC1 vs PC2 with loadings.
In [18]:
# Outliers are stored in results:
results['outliers']

# Outliers based on T2
print('--------T2------------')
print(results['outliers'].loc[results['outliers']['y_bool'], :])

# Outliers based on SPE
print('--------SPE------------')
print(results['outliers'].loc[results['outliers']['y_bool_spe'], :])
--------T2------------
    y_proba    y_score  y_bool  y_bool_spe  y_score_spe
2  0.040551  13.160880    True        True     3.679935
2  0.034521  13.593534    True       False     3.804358
--------SPE------------
    y_proba    y_score  y_bool  y_bool_spe  y_score_spe
0  0.156357   9.320166   False        True     2.895547
0  0.179867   8.889872   False        True     2.735616
0  0.210964   8.389033   False        True     2.745924
0  0.332472   6.875952   False        True     2.366655
0  0.195760   8.625509   False        True     2.770926
0  0.177125   8.937469   False        True     2.819291
0  0.127482   9.933473   False        True     3.000718
1  0.807398   3.011533   False        True     1.253392
1  0.737415   3.549180   False        True     1.364033
1  0.800469   3.066391   False        True     1.233293
2  0.697011   3.849676   False        True     1.301674
2  0.123935  10.017078   False        True     3.019967
2  0.040551  13.160880    True        True     3.679935
2  0.054265  12.366968   False        True     3.510781
2  0.108525  10.407018   False        True     3.152509
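
For intuition: Hotelling's T2 is essentially the squared Mahalanobis distance of a sample in PC space, i.e. each score squared, scaled by the variance of its component, and summed. A minimal sketch of that computation (illustrative only, not the library's exact implementation):

import numpy as np

scores = results['PC'].values  # PC scores from the fit above
t2 = np.sum(scores**2 / scores.var(axis=0, ddof=1), axis=1)
print(t2[:5])                  # large values flag candidate outliers
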
In [20]:
# Outlier detection
fig, ax = model.biplot(SPE=True, HT2=True, figsize=(10, 6))
[pca] >Plot PC1 vs PC2 with loadings.
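
The two detectors flag different samples, so it can be useful to restrict attention to points flagged by both. The boolean columns shown above combine directly:

# Samples flagged by both Hotelling T2 and SPE/DmodX
both = results['outliers']['y_bool'] & results['outliers']['y_bool_spe']
print(results['outliers'].loc[both, :])
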
In [21]:
# 3D plots
fig, ax = model.scatter3d(figsize=(10, 6))
fig, ax = model.biplot3d(figsize=(10, 6))
fig, ax = model.biplot3d(SPE=True, HT2=True, figsize=(10, 6))
[pca] >Plot PC1 vs PC2 vs PC3 with loadings.
[pca] >Plot PC1 vs PC2 vs PC3 with loadings.
In [22]:
fig, ax = model.biplot3d(SPE=True, HT2=True, visible=False, figsize=(10, 6))

# Set the figure visibility back to True and show it.
fig.set_visible(True)
fig
[pca] >Plot PC1 vs PC2 vs PC3 with loadings.
Out[22]:
In [23]:
# Normalize out PCs
X_norm = model.norm(X)
[pca] >Cleaning previous fitted model results..
[pca] >Column labels are auto-completed.
[pca] >Row labels are auto-completed.
[pca] >The PCA reduction is performed on the [4] columns of the input dataframe.
[pca] >Fit using PCA.
[pca] >Compute loadings and PCs.
[pca] >Compute explained variance.
[pca] >Outlier detection using Hotelling T2 test with alpha=[0.05] and n_components=[4]
[pca] >Outlier detection using SPE/DmodX with n_std=[2]
In [24]:
print(X_norm)
[[2.49827932 2.78822199 2.73760805 2.64042916]
 [2.17418189 2.28972391 2.19907228 2.29205175]
 [2.37043166 2.24256734 2.25379844 2.32729489]
 [2.43101742 2.10641429 2.20334321 2.2064176 ]
 [2.62149842 2.76834241 2.78567422 2.65152651]
 [2.94051298 3.3602201  3.31931842 3.14971943]
 [2.67073929 2.31133341 2.41744878 2.48386043]
 [2.50867666 2.64467154 2.64574665 2.53260032]
 [2.30124683 1.7971212  1.89540188 2.02990525]
 [2.26048037 2.34694131 2.3476342  2.22636441]
 [2.58633591 3.1829288  3.08815008 2.85878304]
 [2.64229309 2.48124151 2.60195141 2.43586883]
 [2.17050087 2.18859607 2.17296045 2.14463246]
 [2.22850563 1.73933529 1.83573841 1.97457016]
 [2.51342988 3.69899386 3.40633499 3.22411108]
 [3.12251957 3.92933711 3.81764254 3.56603525]
 [2.73975929 3.33063051 3.15369372 3.2019132 ]
 [2.54367435 2.80393612 2.72111917 2.74600693]
 [2.68847085 3.535213   3.37294253 3.11672737]
 [2.83837804 3.0079359  3.03452596 2.8917751 ]
 [2.44219749 3.0011212  2.89896181 2.67386954]
 [2.80226798 2.9581159  2.92737021 2.944414  ]
 [2.58760075 2.39709795 2.4496467  2.53635417]
 [2.6220195  2.7264883  2.63102619 2.8121394 ]
 [2.79285836 2.5034337  2.72616994 2.3967235 ]
 [2.23284473 2.38993242 2.32448533 2.30779639]
 [2.64965514 2.68349719 2.65417506 2.73070742]
 [2.50675374 2.8810331  2.82161493 2.66922224]
 [2.37506023 2.80810158 2.68954188 2.62933181]
 [2.52099692 2.26475953 2.37801696 2.28814956]
 [2.39777783 2.28463912 2.32995079 2.27705221]
 [2.4326107  3.01775466 2.7831717  2.91112196]
 [2.95038924 3.25852376 3.38210504 2.8812777 ]
 [2.90195892 3.58861577 3.54267897 3.17836736]
 [2.3058754  2.36265544 2.33114532 2.33194218]
 [2.19510122 2.49141109 2.34019438 2.4658679 ]
 [2.28123488 3.12247946 2.86660469 2.8208437 ]
 [2.61781739 2.66721456 2.75956239 2.50410721]
 [2.33256349 1.85525793 1.94466258 2.09589257]
 [2.46696266 2.73008526 2.68834735 2.57444185]
 [2.53519993 2.711125   2.63711229 2.71721384]
 [1.76570889 1.49764686 1.3361063  1.87273976]
 [2.49557367 1.98632619 2.12599633 2.20177031]
 [2.82195028 2.78045957 2.71186418 2.99480183]
 [3.08452676 3.05323962 3.18366178 2.9451591 ]
 [2.26129092 2.22002432 2.13998269 2.35578799]
 [2.84317143 2.99961918 3.09242101 2.77314889]
 [2.46233408 2.16455102 2.25260391 2.27240492]
 [2.62804991 3.09751508 3.04554937 2.81694151]
 [2.37698315 2.57174001 2.5136736  2.49270989]
 [3.66215623 4.64716384 4.44355803 4.11293616]
 [3.85745844 4.13560087 4.08865258 3.99356167]
 [3.76813702 4.52672492 4.37661393 4.09763665]
 [3.15760652 2.70865502 2.71519127 2.99462477]
 [3.5399125  3.96627547 3.80999197 3.81059926]
 [3.73264607 3.24414009 3.46075792 3.27775997]
 [4.12644941 4.14623021 4.20304222 4.0841399 ]
 [3.0018916  2.16278269 2.30987734 2.57112028]
 [3.48891353 4.08579506 3.9762373  3.69422412]
 [3.6039755  2.72286712 2.8921616  3.19948189]
 [2.73453409 2.00085468 2.0726229  2.37510943]
 [3.75245302 3.55527184 3.57009679 3.71762162]
 [2.73134633 3.02304709 2.88699456 2.83416021]
 [3.79306701 3.68183801 3.78815108 3.57754582]
 [3.40416936 3.15768392 3.13616851 3.40629329]
 [3.55522789 4.30319637 4.10087052 3.97361804]
 [4.02816031 3.32122289 3.5665132  3.55295172]
 [3.2724882  3.18728771 3.29653369 3.00212308]
 [3.12583556 3.30943214 3.09678245 3.3804899 ]
 [3.13792422 2.88631135 2.93069731 2.94423694]
 [4.35277882 3.77786686 3.95040095 4.06194202]
 [3.31484793 3.54880797 3.42412986 3.51036828]
 [3.52939052 3.62103784 3.57700847 3.52895427]
 [3.62077186 3.58487563 3.73046196 3.31345141]
 [3.42177628 3.89277544 3.76681737 3.6496864 ]
 [3.5154368  4.15224852 3.96760295 3.87883765]
 [3.46975231 4.22159729 4.03709531 3.80444918]
 [3.91103842 4.329189   4.20917406 4.15912183]
 [3.7797992  3.59734362 3.64624915 3.66737895]
 [2.93156658 2.99195548 2.91482906 2.98563334]
 [3.04794471 2.7279661  2.75602355 2.86250498]
 [2.95236126 2.70485458 2.73110626 2.76997566]
 [3.26290141 3.20392116 3.18074358 3.2393755 ]
 [3.96331458 3.52637387 3.69686357 3.58878832]
 [4.11158832 3.15039545 3.48131179 3.46926867]
 [4.23271967 3.9407284  4.08309463 4.03765108]
 [3.75118818 4.34110269 4.20860017 4.04005049]
 [3.02464817 3.42155434 3.22162161 3.2771632 ]
 [3.73661656 3.26020504 3.43386626 3.39398995]
 [3.3206167  2.83972328 2.89652502 3.10050251]
 [3.55748045 2.91913288 3.16930547 2.99566985]
 [3.82438367 3.73997474 3.83741178 3.64353314]
 [3.23158475 3.14578443 3.13148288 3.17338819]
 [2.87867251 2.18266228 2.26181117 2.56002293]
 [3.54228972 3.07100005 3.20327182 3.22212489]
 [3.69969595 3.33730203 3.53436202 3.31720527]
 [3.66358589 3.28748203 3.42720627 3.36984416]
 [3.50520429 3.72194801 3.68161596 3.56600335]
 [2.89479844 2.39266618 2.34503821 2.85246529]
 [3.53189238 3.2145505  3.29513322 3.32995373]
 [5.18745415 4.38382351 4.59292258 4.86471007]
 [4.18292767 3.40268882 3.56219553 3.82183858]
 [4.37745831 4.80027695 4.68627694 4.63136302]
 [4.34291491 3.98209852 4.18005255 3.96610397]
 [4.62294894 4.29611138 4.37277766 4.49894008]
 [4.52020724 5.27912732 5.18912368 4.74923156]
 [4.00342295 2.42708447 2.78199616 3.20652223]
 [4.27709381 4.88801747 4.89590281 4.29318013]
 [3.95041538 4.07641166 4.07060022 3.8956177 ]
 [5.10673179 5.35654674 5.28973571 5.38705197]
 [4.34385011 4.3439696  4.29724594 4.48500138]
 [4.03302049 3.92996591 3.9006121  4.04679085]
 [4.30184663 4.51444621 4.39285013 4.55803222]
 [4.05683811 3.19452357 3.28036603 3.79274552]
 [4.4914079  3.54679357 3.57041801 4.4026663 ]
 [4.62212604 4.32049306 4.28799095 4.73379628]
 [4.29080357 4.21106269 4.31451465 4.11577433]
 [5.22611739 5.9119256  5.98197666 5.30711338]
 [4.3938282  5.15602496 4.96029766 4.75132781]
 [3.46020569 3.1755917  3.21861191 3.23156464]
 [4.6143097  4.77715123 4.66661917 4.89081014]
 [4.29287896 3.29831484 3.46835976 3.92276906]
 [4.32027645 5.22515604 5.10828569 4.56656913]
 [3.82858578 3.79924848 3.70887558 3.95156532]
 [4.68845275 4.64042968 4.70506239 4.64891042]
 [4.41275782 4.97701395 5.0010842  4.44930054]
 [3.90161645 3.77197149 3.71553557 3.97571111]
 [4.15652905 3.82502343 3.89567479 4.02669889]
 [4.3558809  4.04912048 4.08251974 4.27173993]
 [4.05858074 4.79972264 4.76991586 4.15836415]
 [4.0988929  4.90881638 4.74853541 4.41375743]
 [4.90133406 6.02913258 5.9759373  5.21878622]
 [4.40127593 4.06483461 4.06603086 4.3773177 ]
 [3.87428263 3.83243503 3.93182143 3.661674  ]
 [4.00024754 3.5518122  3.88880603 3.30129323]
 [4.31834117 5.35898229 4.99171575 5.06747083]
 [5.02281052 4.40405392 4.53445364 4.86426494]
 [4.41402266 4.1911831  4.36258082 4.12687168]
 [4.14805463 3.73221232 3.81166791 3.9979058 ]
 [4.29144929 4.65799666 4.48471153 4.66586106]
 [4.61143923 4.5491064  4.43285583 4.87281443]
 [4.23167408 4.66723271 4.32751525 4.91616192]
 [4.18292767 3.40268882 3.56219553 3.82183858]
 [4.75640055 4.70653231 4.70683081 4.82287173]
 [4.87003286 4.70328618 4.63910688 5.0712215 ]
 [4.28378543 4.43826855 4.19305314 4.76649156]
 [3.76115905 3.69129174 3.55245913 3.93821691]
 [4.23102835 4.22029874 4.15731837 4.3660752 ]
 [4.91875265 4.28813129 4.42552946 4.74294253]
 [4.34033391 3.6689908  3.89328573 3.91691895]]
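
For intuition, normalizing out PCs amounts to subtracting a sample's reconstruction from the excluded components, leaving the residual variance. A rough sketch of that idea with plain sklearn; this illustrates the general technique, not necessarily the exact implementation of model.norm:

import numpy as np
from sklearn.decomposition import PCA

Xc = X.values - X.values.mean(axis=0)   # center the data
p = PCA(n_components=4).fit(Xc)
T, W = p.transform(Xc), p.components_   # scores (n x 4) and loadings (4 x 4)
excl = [0]                              # e.g. project out PC1
X_resid = Xc - T[:, excl] @ W[excl, :]  # data with PC1's variance removed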