pca: A Python Package for Principal Component Analysis. The core of pca is built on sklearn functionality for maximum compatibility when combining it with other packages.
But this package can do a lot more. Besides regular PCA, it can also perform SparsePCA and TruncatedSVD; depending on your input data, the best approach is chosen.
Other functionalities are:
- Biplot to plot the loadings
- Determine the explained variance
- Extract the best performing features
- Scatter plot with the loadings
- Outlier detection using Hotelling's T2 and/or SPE/DmodX
This notebook will show some examples.
More information can be found here:
In [1]:
!pip install pca
Requirement already satisfied: pca in /usr/local/lib/python3.7/dist-packages (1.7.2)
Requirement already satisfied: matplotlib in /usr/local/lib/python3.7/dist-packages (from pca) (3.2.2)
Requirement already satisfied: wget in /usr/local/lib/python3.7/dist-packages (from pca) (3.2)
Requirement already satisfied: tqdm in /usr/local/lib/python3.7/dist-packages (from pca) (4.63.0)
Requirement already satisfied: pandas in /usr/local/lib/python3.7/dist-packages (from pca) (1.3.5)
Requirement already satisfied: colourmap in /usr/local/lib/python3.7/dist-packages (from pca) (1.1.4)
Requirement already satisfied: sklearn in /usr/local/lib/python3.7/dist-packages (from pca) (0.0)
Requirement already satisfied: scatterd in /usr/local/lib/python3.7/dist-packages (from pca) (1.1.1)
Requirement already satisfied: numpy in /usr/local/lib/python3.7/dist-packages (from pca) (1.21.5)
Requirement already satisfied: scipy in /usr/local/lib/python3.7/dist-packages (from pca) (1.4.1)
Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.7/dist-packages (from matplotlib->pca) (0.11.0)
Requirement already satisfied: python-dateutil>=2.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib->pca) (2.8.2)
Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib->pca) (1.4.0)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib->pca) (3.0.7)
Requirement already satisfied: typing-extensions in /usr/local/lib/python3.7/dist-packages (from kiwisolver>=1.0.1->matplotlib->pca) (3.10.0.2)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.7/dist-packages (from python-dateutil>=2.1->matplotlib->pca) (1.15.0)
Requirement already satisfied: pytz>=2017.3 in /usr/local/lib/python3.7/dist-packages (from pandas->pca) (2018.9)
Requirement already satisfied: seaborn in /usr/local/lib/python3.7/dist-packages (from scatterd->pca) (0.11.2)
Requirement already satisfied: scikit-learn in /usr/local/lib/python3.7/dist-packages (from sklearn->pca) (1.0.2)
Requirement already satisfied: threadpoolctl>=2.0.0 in /usr/local/lib/python3.7/dist-packages (from scikit-learn->sklearn->pca) (3.1.0)
Requirement already satisfied: joblib>=0.11 in /usr/local/lib/python3.7/dist-packages (from scikit-learn->sklearn->pca) (1.1.0)
Let's check the version.
In [2]:
import pca
print(pca.__version__)
1.7.2
Import the pca library
In [3]:
from pca import pca
import numpy as np
import pandas as pd
Here we will load the Iris dataset as example data.
In [4]:
# Dataset
from sklearn.datasets import load_iris
X = pd.DataFrame(data=load_iris().data, columns=load_iris().feature_names, index=load_iris().target)
Initialize with the specified parameters. The values used here are the defaults.
In [5]:
# Initialize
model = pca(n_components=3, normalize=True)
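The outlier-detection settings that appear in the log below can also be set explicitly at initialization. A minimal sketch, assuming alpha (significance level for Hotelling's T2) and n_std (threshold for SPE/DmodX) are accepted as constructor arguments; the values simply repeat the defaults reported in the log:
# Assumed constructor arguments for outlier detection (alpha, n_std);
# values match the defaults shown in the fit_transform log below.
model = pca(n_components=3, normalize=True, alpha=0.05, n_std=2)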
In [6]:
# Fit transform
out = model.fit_transform(X)
[pca] >Processing dataframe..
[pca] >Normalizing input data per feature (zero mean and unit variance)..
[pca] >The PCA reduction is performed on the [4] columns of the input dataframe.
[pca] >Fit using PCA.
[pca] >Compute loadings and PCs.
[pca] >Compute explained variance.
[pca] >Outlier detection using Hotelling T2 test with alpha=[0.05] and n_components=[3]
[pca] >Outlier detection using SPE/DmodX with n_std=[2]
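fit_transform returns a dictionary with the fitted results, which can be inspected directly. A minimal sketch; the key names used here (explained_var, topfeat, loadings) are assumptions based on the package documentation:
# Inspect the results dictionary (key names are assumed)
print(out.keys())
print(out['explained_var'])   # cumulative explained variance per PC (assumed key)
print(out['topfeat'])         # best performing features per PC (assumed key)
print(out['loadings'])        # feature loadings on the PCs (assumed key)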
In [7]:
# Biplot with the loading directions and their text labels
fig, ax = model.biplot(textlabel=True, legend=False, figsize=(10, 6))
[pca] >Plot PC1 vs PC2 with loadings.
In [8]:
# Make plot with only the directions (no scatter)
fig, ax = model.biplot(cmap=None, textlabel=False, legend=False, figsize=(10, 6))
[pca] >Plot PC1 vs PC2 with loadings.
In [9]:
from pca import pca
# Load example data
from sklearn.datasets import load_iris
X = pd.DataFrame(data=load_iris().data, columns=load_iris().feature_names, index=load_iris().target)
In [10]:
# Initialize
model = pca(n_components=3)
# Fit using PCA
results = model.fit_transform(X)
[pca] >Processing dataframe..
[pca] >The PCA reduction is performed on the [4] columns of the input dataframe.
[pca] >Fit using PCA.
[pca] >Compute loadings and PCs.
[pca] >Compute explained variance.
[pca] >Outlier detection using Hotelling T2 test with alpha=[0.05] and n_components=[3]
[pca] >Outlier detection using SPE/DmodX with n_std=[2]
In [11]:
# Make plots
# Scree plot
fig, ax = model.plot(figsize=(10, 6))
# Scatter plot
fig, ax = model.scatter(figsize=(10, 6))
# Biplot
fig, ax = model.biplot(figsize=(10, 6))
[pca] >Plot PC1 vs PC2 with loadings.
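The plot functions return standard matplotlib figure and axes handles, so a figure can be written to disk with the usual matplotlib API. A minimal sketch (the filename is illustrative):
# Save the last biplot using plain matplotlib
fig.savefig('pca_biplot.png', dpi=150, bbox_inches='tight')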
In [17]:
# Color by density to focus on the core samples.
# Biplot
fig, ax = model.biplot(figsize=(10, 6), gradient='#ffffff')
[pca] >Plot PC1 vs PC2 with loadings.
In [18]:
# Outliers are stored in results:
results['outliers']
# Outliers based on T2
print('--------T2------------')
print(results['outliers'].loc[results['outliers']['y_bool'], :])
# Outliers based on SPE
print('--------SPE------------')
print(results['outliers'].loc[results['outliers']['y_bool_spe'], :])
--------T2------------
    y_proba    y_score  y_bool  y_bool_spe  y_score_spe
2  0.040551  13.160880    True        True     3.679935
2  0.034521  13.593534    True       False     3.804358
--------SPE------------
    y_proba    y_score  y_bool  y_bool_spe  y_score_spe
0  0.156357   9.320166   False        True     2.895547
0  0.179867   8.889872   False        True     2.735616
0  0.210964   8.389033   False        True     2.745924
0  0.332472   6.875952   False        True     2.366655
0  0.195760   8.625509   False        True     2.770926
0  0.177125   8.937469   False        True     2.819291
0  0.127482   9.933473   False        True     3.000718
1  0.807398   3.011533   False        True     1.253392
1  0.737415   3.549180   False        True     1.364033
1  0.800469   3.066391   False        True     1.233293
2  0.697011   3.849676   False        True     1.301674
2  0.123935  10.017078   False        True     3.019967
2  0.040551  13.160880    True        True     3.679935
2  0.054265  12.366968   False        True     3.510781
2  0.108525  10.407018   False        True     3.152509
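Because results['outliers'] is a regular pandas DataFrame (indexed here by the class labels used as row index), it can be post-processed with plain pandas. A minimal sketch for counting and combining the two outlier criteria:
# Count and combine the outlier flags with plain pandas
outliers = results['outliers']
print('T2 outliers :', outliers['y_bool'].sum())
print('SPE outliers:', outliers['y_bool_spe'].sum())
# Samples flagged by both Hotelling's T2 and SPE/DmodX
print(outliers[outliers['y_bool'] & outliers['y_bool_spe']])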
In [20]:
# Outlier detection
fig, ax = model.biplot(SPE=True, HT2=True, figsize=(10, 6))
[pca] >Plot PC1 vs PC2 with loadings.
In [21]:
# 3D plots
fig, ax = model.scatter3d(figsize=(10, 6))
fig, ax = model.biplot3d(figsize=(10, 6))
fig, ax = model.biplot3d(SPE=True, HT2=True, figsize=(10, 6))
[pca] >Plot PC1 vs PC2 vs PC3 with loadings.
[pca] >Plot PC1 vs PC2 vs PC3 with loadings.
In [22]:
fig, ax = model.biplot3d(SPE=True, HT2=True, visible=False, figsize=(10, 6))
# Set the figure visibility back to True and show the figure.
fig.set_visible(True)
fig
[pca] >Plot PC1 vs PC2 vs PC3 with loadings.
Out[22]:
In [23]:
# Normalize out PCs
X_norm = model.norm(X)
[pca] >Cleaning previous fitted model results..
[pca] >Column labels are auto-completed.
[pca] >Row labels are auto-completed.
[pca] >The PCA reduction is performed on the [4] columns of the input dataframe.
[pca] >Fit using PCA.
[pca] >Compute loadings and PCs.
[pca] >Compute explained variance.
[pca] >Outlier detection using Hotelling T2 test with alpha=[0.05] and n_components=[4]
[pca] >Outlier detection using SPE/DmodX with n_std=[2]
In [24]:
print(X_norm)
[output: 150 × 4 array of normalized values; full matrix not reproduced here]
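Since the normalized output is a NumPy array with the same shape as the input, it can be wrapped back into a DataFrame for further analysis. A minimal sketch using plain pandas:
# Re-attach the original column names and class-label index
X_norm_df = pd.DataFrame(X_norm, columns=X.columns, index=X.index)
print(X_norm_df.head())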