API References

pca: A Python Package for Principal Component Analysis.

pca.pca.convert_verbose_to_old(verbose)

Convert the new verbosity levels to the old ones.
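This reference uses two verbosity scales: logging-style levels (names such as 'info', or 60/40/30/20/10) and the older 0..5 scale. The sketch below shows how such a conversion could work with a lookup table. The helper `to_old_verbosity` is hypothetical and is not the package's implementation. Its mapping is an assumption read off the level tables listed under import_example() and the pca class.

```python
from typing import Union

# Hypothetical mapping, assumed from the level tables on this page:
# logging-style levels (numbers or names) -> old 0..5 scale.
_LEVEL_TO_OLD = {60: 0, 40: 1, 30: 2, 20: 3, 10: 4}
_NAME_TO_OLD = {"error": 1, "warning": 2, "info": 3, "debug": 4}


def to_old_verbosity(verbose: Union[str, int]) -> int:
    """Convert a new-style verbosity value to the old 0..5 scale (sketch)."""
    if isinstance(verbose, str):
        return _NAME_TO_OLD[verbose.lower()]
    if verbose in _LEVEL_TO_OLD:
        return _LEVEL_TO_OLD[verbose]
    if 0 <= verbose <= 5:
        return verbose  # already on the old scale; pass through unchanged
    raise ValueError(f"Unknown verbosity level: {verbose!r}")
```

The old scale's 5 (Trace) has no logging-style counterpart in the tables above, so this sketch maps nothing onto it.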

pca.pca.hotellingsT2(X, alpha=0.05, df=1, n_components=5, multipletests='fdr_bh', param=None, verbose=3)

Test for outliers using the Hotelling T2 test.

Test for outliers using chi-square tests for each of the n_components. The resulting P-value matrix is then combined per sample using Fisher's method. The results can be used to prioritize outliers: samples that are outliers across multiple dimensions will be more significant than others.

Parameters:
  • X (numpy-array) – Principal components.

  • alpha (float, (default: 0.05)) – Alpha level threshold to determine outliers.

  • df (int, (default: 1)) – Degrees of freedom.

  • n_components (int, (default: 5)) – Number of PCs used to compute the P-value.

  • multipletests (str, default: 'fdr_bh') –

    Multiple testing method. Options are:
    • None : No multiple testing

    • 'bonferroni' : one-step correction

    • 'sidak' : one-step correction

    • 'holm-sidak' : step-down method using Sidak adjustments

    • 'holm' : step-down method using Bonferroni adjustments

    • 'simes-hochberg' : step-up method (independent)

    • 'hommel' : closed method based on Simes tests (non-negative)

    • 'fdr_bh' : Benjamini/Hochberg (non-negative)

    • 'fdr_by' : Benjamini/Yekutieli (negative)

    • 'fdr_tsbh' : two-stage FDR correction (non-negative)

    • 'fdr_tsbky' : two-stage FDR correction (non-negative)

  • param (2-element tuple (default: None)) – Pre-computed mean and variance from a previous run. None: compute them from scratch using X.

  • verbose (int (default: 3)) – Verbosity level for printing to screen. 0: None, 1: Error, 2: Warning, 3: Info, 4: Debug, 5: Trace

Returns:

  • outliers (pd.DataFrame) – DataFrame containing the probability, test statistic and boolean outlier flag.

  • y_bools (array-like) – Boolean values indicating significance per PC.

  • param (2-element tuple) – computed mean and variance from X.
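The procedure described above (a chi-square test per component, combined per sample with Fisher's method) can be sketched with NumPy and the standard library. This is an illustration under stated assumptions, not the package's code. The hypothetical helpers standardize each PC score with the sample mean and standard deviation, use df=1, and omit the multiple-testing step.

```python
import math

import numpy as np


def chi2_sf_df1(stat: float) -> float:
    """P(chi-square with 1 degree of freedom > stat)."""
    # With df=1 the statistic is Z**2, so P(chi2 > s) = P(|Z| > sqrt(s)) = erfc(sqrt(s / 2)).
    return math.erfc(math.sqrt(stat / 2.0))


def fisher_combine(pvalues) -> float:
    """Combine k independent P-values with Fisher's method."""
    p = np.clip(np.asarray(pvalues, dtype=float), 1e-300, 1.0)  # avoid log(0)
    x = -2.0 * float(np.sum(np.log(p)))  # chi-square with 2k dof under H0
    k = p.size
    # Closed-form survival function for an even number (2k) of degrees of freedom:
    # P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)**i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= (x / 2.0) / i
        total += term
    return min(1.0, math.exp(-x / 2.0) * total)


def outlier_pvalues(PC: np.ndarray, n_components: int = 5) -> np.ndarray:
    """Per-PC chi-square tests combined per sample (hypothetical helper)."""
    scores = PC[:, :n_components]
    z = (scores - scores.mean(axis=0)) / scores.std(axis=0, ddof=1)
    p_per_pc = np.vectorize(chi2_sf_df1)(z ** 2)  # shape (n_samples, n_components)
    return np.array([fisher_combine(row) for row in p_per_pc])


rng = np.random.default_rng(0)
PC = rng.normal(size=(100, 3))
PC[0] = [8.0, 8.0, 8.0]  # one sample that is extreme on every component
p = outlier_pvalues(PC, n_components=3)
most_significant = int(np.argmin(p))
```

A sample that is extreme on several components gets a small P-value from each test. Fisher's statistic sums their logarithms, so such a sample ends up far more significant than one that is extreme on a single component. This matches the prioritization described above.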

pca.pca.import_example(data='iris', url=None, sep=',', verbose=3)

Import an example dataset from GitHub.

Import one of several example datasets from GitHub, or specify your own download URL.

Parameters:
  • data (str) –

    Name of datasets
    • 'iris'

    • 'sprinkler'

    • 'titanic'

    • 'student'

    • 'fifa'

    • 'cancer'

    • 'waterpump'

    • 'retail'

  • url (str) – URL link to the dataset.

  • verbose (int, (default: 3)) – Print progress to screen. 60: None, 40: Error, 30: Warn, 20: Info, 10: Debug

Returns:

Dataset containing mixed features.

Return type:

pd.DataFrame()

pca.pca.multitest_correction(Praw, multipletests='fdr_bh', verbose=3)

Multiple testing correction for input P-values.

Parameters:
  • Praw (list of float) – P-values.

  • multipletests (str, default: 'fdr_bh') –

    Multiple testing method. Options are:
    • None : No multiple testing

    • 'bonferroni' : one-step correction

    • 'sidak' : one-step correction

    • 'holm-sidak' : step-down method using Sidak adjustments

    • 'holm' : step-down method using Bonferroni adjustments

    • 'simes-hochberg' : step-up method (independent)

    • 'hommel' : closed method based on Simes tests (non-negative)

    • 'fdr_bh' : Benjamini/Hochberg (non-negative)

    • 'fdr_by' : Benjamini/Yekutieli (negative)

    • 'fdr_tsbh' : two-stage FDR correction (non-negative)

    • 'fdr_tsbky' : two-stage FDR correction (non-negative)

Returns:

Corrected P-values.

Return type:

list of float.
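The option names above match those of `statsmodels.stats.multitest.multipletests`. To show what the default 'fdr_bh' correction computes, here is a plain-Python sketch of the Benjamini/Hochberg step-up adjustment. The helper `fdr_bh` is hypothetical and covers only this one method. It does not reproduce the package's code path.

```python
from typing import List, Sequence


def fdr_bh(pvalues: Sequence[float]) -> List[float]:
    """Benjamini/Hochberg adjusted P-values (step-up FDR correction)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending P-value
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest P-value down so adjusted values stay monotone in rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


corrected = fdr_bh([0.01, 0.04, 0.03, 0.2])
```

Each raw P-value at rank r (out of m) is scaled by m / r, and a running minimum from the largest rank down keeps the adjusted values monotone. With statsmodels installed, `multipletests(pvals, method='fdr_bh')[1]` returns the corrected values for the same input.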

pca.pca.normalize_size(getsizes, minscale: int | float = 0.5, maxscale: int | float = 4, scaler: str = 'zscore')

Scale values to lie between a minimum and a maximum value.

Parameters:
  • getsizes (array-like) – Array of values that need to be scaled.

  • minscale (Union[int, float], optional) – Minimum value. The default is 0.5.

  • maxscale (Union[int, float], optional) – Maximum value. The default is 4.

  • scaler (str, optional) –

    Type of scaler. The default is ‘zscore’.
    • 'zscore'

    • 'minmax'

Returns:

getsizes – values scaled between minscale and maxscale.

Return type:

array-like
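The 'minmax' idea amounts to a linear map from the observed range onto [minscale, maxscale]. The sketch below uses a hypothetical helper, `scale_minmax`, and does not reproduce the package's exact handling of the 'zscore' option. The midpoint fallback for constant input is also an assumption.

```python
import numpy as np


def scale_minmax(values, minscale: float = 0.5, maxscale: float = 4.0) -> np.ndarray:
    """Linearly map values onto [minscale, maxscale] (sketch of the 'minmax' idea)."""
    x = np.asarray(values, dtype=float)
    span = x.max() - x.min()
    if span == 0:
        # All values are equal: return the midpoint to avoid dividing by zero (assumption).
        return np.full_like(x, (minscale + maxscale) / 2.0)
    return minscale + (x - x.min()) / span * (maxscale - minscale)


sizes = scale_minmax([0, 5, 10])
```

The smallest input maps to minscale, the largest maps to maxscale, and everything in between keeps its relative position.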

class pca.pca.pca(n_components=0.95, n_feat=25, method='pca', alpha=0.05, multipletests='fdr_bh', n_std=3, onehot=False, normalize=False, detect_outliers=['ht2', 'spe'], random_state=None, verbose='info')

pca module.

Parameters:
  • n_components ([0..1] or [1..number of samples-1], (default: 0.95)) – Number of PCs to be returned. When n_components >= 1, the specified number of PCs is returned. When n_components is between 0 and 1, the smallest number of PCs that together cover at least this fraction of the variance is returned. n_components=None: return all PCs. n_components=0.95: return the number of PCs that cover at least 95% of the variance. n_components=3: return the top 3 PCs.

  • n_feat (int, default: 25) – Number of features that explain the space the most, derived from the loadings. This parameter is used for visualization purposes only.

  • method (str, default: 'pca') – 'pca': Principal Component Analysis. 'sparse_pca': Sparse Principal Component Analysis. 'trunc_svd': truncated SVD (aka LSA).

  • alpha (float, default: 0.05) – Alpha level used as the threshold to determine the outliers with the Hotelling T2 test.

  • multipletests (str, default: 'fdr_bh') –

    Multiple testing method to correct for the Hotelling T2 test:
    • None : No multiple testing

    • 'bonferroni' : one-step correction

    • 'sidak' : one-step correction

    • 'holm-sidak' : step-down method using Sidak adjustments

    • 'holm' : step-down method using Bonferroni adjustments

    • 'simes-hochberg' : step-up method (independent)

    • 'hommel' : closed method based on Simes tests (non-negative)

    • 'fdr_bh' : Benjamini/Hochberg (non-negative)

    • 'fdr_by' : Benjamini/Yekutieli (negative)

    • 'fdr_tsbh' : two-stage FDR correction (non-negative)

    • 'fdr_tsbky' : two-stage FDR correction (non-negative)

  • n_std (int, default: 3) – Number of standard deviations used to determine the outliers with the SPE/DmodX method.

  • onehot (bool, optional, (default: False)) – Set to True if X is a sparse dataset (many zeros and few non-zero values), such as the output of a TF-IDF model. Note that this is different from a sparse matrix; in case of a sparse matrix, use method='trunc_svd'.

  • normalize (bool (default: False)) – Normalize the data using the z-score.

  • detect_outliers (list (default: ['ht2', 'spe'])) – None: do not compute outliers. 'ht2': compute outliers based on Hotelling T2. 'spe': compute outliers based on the SPE/DmodX method.

  • random_state (int, optional) – Random state.

  • verbose (str or int (default: 'info')) – Print to screen. 0: None, 1: Error, 2: Warning, 3: Info, 4: Debug, 5: Trace
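When n_components is a fraction such as 0.95, the number of PCs follows from the cumulative explained variance. The NumPy sketch below assumes the PCs come from an SVD of the column-centered data. Both helpers, `explained_variance_ratio` and `n_pcs_for_fraction`, are hypothetical and are not part of the package.

```python
import numpy as np


def explained_variance_ratio(X: np.ndarray) -> np.ndarray:
    """Fraction of the total variance captured by each PC, via SVD of centered data."""
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)  # singular values, in descending order
    var = s ** 2
    return var / var.sum()


def n_pcs_for_fraction(ratio, fraction: float = 0.95) -> int:
    """Smallest number of leading PCs whose cumulative variance reaches `fraction`."""
    cumulative = np.cumsum(ratio)
    k = int(np.searchsorted(cumulative, fraction)) + 1  # first index with cumulative >= fraction
    return min(k, len(ratio))  # guard against round-off leaving the total just below 1.0


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
ratio = explained_variance_ratio(X)
k = n_pcs_for_fraction(ratio, 0.95)
```

With ratios [0.7, 0.2, 0.06, 0.04], the cumulative sums are [0.7, 0.9, 0.96, 1.0], so a fraction of 0.95 selects 3 PCs.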


biplot(labels=None, c=[0, 0.1, 0.4], s=150, marker='o', edgecolor='#000000', jitter=None, n_feat=None, PC=None, SPE=None, HT2=None, alpha=0.8, gradient=None, density=False, density_on_top=False, fontcolor=[0, 0, 0], fontsize=18, fontweight='normal', color_arrow=None, arrowdict={'alpha': None, 'color_strong': '#880808', 'color_text': None, 'color_weak': '#002a77', 'fontsize': None, 'scale_factor': None, 'weight': None}, cmap='tab20c', title=None, legend=None, figsize=(25, 15), visible=True, fig=None, ax=None, dpi=100, grid=True, y=None, label=None, verbose=None)

Create a biplot.

Plots the principal components with the samples and the best-performing features. For each PC, the feature with the highest absolute loading is selected. As a result, some features can be selected for multiple PCs, while others may never be selected. For visualization purposes, only the unique feature names are kept and plotted as arrows colored with color_strong (default: '#880808'). The features that were never selected (described as weak) are colored with color_weak (default: '#002a77').

Parameters:
  • labels (array-like, default: None) – Label for each sample. The labeling is used for coloring the samples.

  • c (list/array of RGB colors for each sample.) –

    The marker colors. Possible values:
    • A scalar or sequence of n numbers to be mapped to colors using cmap and norm.

    • A 2D array in which the rows are RGB or RGBA.

    • A sequence of colors of length n.

    • A single color format string.

  • s (int or list/array (default: 150)) – Size(s) of the scatter points. [20, 10, 50, …]: In case of a list, it should have the same length as the number of samples (rows in .results['PC']). 50: all points get this size.

  • marker (list/array of strings (default: 'o').) –

    Marker for the samples.
    • 'x' : All data points get this marker

    • ('.', 'o', 'v', '^', '<', '>', '8', 's', 'p', '*', 'h', 'H', 'D', 'd', 'P', 'X') : Specify the marker type per sample.

  • n_feat (int, default: None) – Number of features that explain the space the most, derived from the loadings. This parameter is used for visualization purposes only.

  • jitter (float, default: None) – Add jitter to the data points as random normal noise. A value of 0.01 usually works well to separate one-hot data.

  • PC (tuple, default: None) – Plot the selected principal components. Note that counting starts from 0: PC1=0, PC2=1, PC3=2, etc. None: automatically take the first 2 components, or the first 3 in case d3=True. [0, 1]: define the PCs for 2D. [0, 1, 2]: define the PCs for 3D.

  • SPE (bool, default: None) –

    Show the outliers based on the SPE/DmodX method.
    • None : Auto-detect. If outliers are detected, it is set to True.

    • True : Show outliers

    • False : Do not show outliers

  • HT2 (bool, default: None) –

    Show the outliers based on the Hotelling T2 test.
    • None : Auto-detect. If outliers are detected, it is set to True.

    • True : Show outliers

    • False : Do not show outliers

  • alpha (float or array-like of floats (default: 0.8)) – The alpha blending value, between 0 (transparent) and 1 (opaque). 1: all data points get this alpha. [1, 0.8, 0.2, …]: specify the alpha per sample.

  • gradient (str, (default: None)) – Hex color at the end of the gradient for the scatter-point colors, e.g., '#FFFFFF'.

  • density (bool (default: False)) – Include the kernel density in the scatter plot.

  • density_on_top (bool, (default: False)) – True: the density is the top layer. False: the density is the bottom layer.

  • fontsize (int (default: 18)) – The font size of the labels.

  • fontcolor (list/array of RGB colors with the same size as X (default: [0, 0, 0])) – None: use the same color scheme as for c. '#000000': if the input is a single color, all fonts get this color.

  • fontweight (String, (default: 'normal')) –

    The fontweight of the labels.
    • 'normal'

    • 'bold'

  • color_arrow (str, (default: None)) –

    Color for the arrows.
    • None: Color arrows based on strength using 'color_strong' and 'color_weak'.

    • '#000000'

    • 'r'

  • arrowdict (dict.) –

    Dictionary containing properties for the arrow font-text. {‘fontsize’: None, ‘color_text’: None, ‘weight’: None, ‘alpha’: None, ‘color_strong’: ‘#880808’, ‘color_weak’: ‘#002a77’, ‘ha’: ‘center’, ‘va’: ‘center’}

    • 'fontsize': None: adjusted automatically based on fontsize. Specify a value otherwise.

    • 'color_text': None: adjusted automatically based on color_strong and color_weak. Specify a hex color otherwise.

    • 'weight': None: adjusted automatically based on fontweight. Specify 'normal' or 'bold' otherwise.

    • 'alpha': None: adjusted automatically based on the loading value.

    • 'color_strong': Hex color for strong loadings (color_arrow needs to be set to None).

    • 'color_weak': Hex color for weak loadings (color_arrow needs to be set to None).

    • 'scale_factor': Scale factor for the arrow length. None: set automatically, with different values for 2D and 3D plots.

  • cmap (str, optional, default: 'tab20c') – Colormap. If set to None, no points are shown.

  • title (str, default: None) – Title of the figure. None: Automatically create title text based on results. ‘’ : Remove all title text. ‘title text’ : Add custom title text.

  • legend (int, default: None) – None: Set automatically based on number of labels. False : No legend. True : Best position. 1 : ‘upper right’ 2 : ‘upper left’ 3 : ‘lower left’ 4 : ‘lower right’

  • figsize ((int, int), optional, default: (25, 15)) – (width, height) in inches.

  • visible (Bool, default: True) – Visible status of the Figure. When False, figure is created on the background.

  • fig (Figure, optional (default: None)) – Matplotlib figure.

  • ax (Axes, optional (default: None)) – Matplotlib Axes object

  • verbose (int (default: None)) – The higher the number, the more information is printed. 0: None, 1: Error, 2: Warning, 3: Info, 4: Debug, 5: Trace

Return type:

tuple containing (fig, ax)


biplot3d(PC=[0, 1, 2], alpha=0.8, figsize=(30, 25), **args)

3D biplot.

Parameters:

Input parameters are described under <scatter>.

compute_outliers(PC, n_std=3, verbose=3)

Compute outliers.

Parameters:
  • PC (Array-like) – Principal Components.

  • n_std (int, (default: 3)) – Number of standard deviations used as the outlier threshold.

  • verbose (int (default: 3)) – Print to screen. 0: None, 1: Error, 2: Warning, 3: Info, 4: Debug, 5: Trace

Returns:

  • outliers (numpy array) – Array containing outliers.

  • outliers_params (dict) – Parameters for hotellingsT2() and spe_dmodx(), which can be reused in later runs.

compute_topfeat(loadings=None, verbose=3)

Compute the top-scoring features.

The biplot shows the loadings (arrows) together with the samples (scatter plot). The loadings are colored red or blue to indicate the strength of the particular feature in the PC.

For each principal component (PC), the feature with the largest absolute loading is determined. This indicates which feature contributes the most to each PC; the same feature can occur in multiple PCs. The features with the highest loadings are colored red in the biplot and described as “best” in the output DataFrame. The features that were not the top loading for any PC are considered weaker features and are colored blue in the biplot. In the output DataFrame, these features are described as “weak”.

Parameters:
  • loadings (array-like) – The array containing the loading information of the Principal Components.

  • verbose (int (default: 3)) – Print to screen. 0: None, 1: Error, 2: Warning, 3: Info, 4: Debug, 5: Trace

Returns:

topfeat – Best performing features per PC.

Return type:

pd.DataFrame
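The selection rule above can be sketched in a few lines of NumPy: for each PC, take the argmax of the absolute loadings, then call every feature that was never selected "weak". The helper `top_features` and its return format are hypothetical. The package returns a DataFrame with more columns. This sketch assumes loadings are laid out with one row per PC and one column per feature.

```python
import numpy as np


def top_features(loadings: np.ndarray, feature_names):
    """Per PC, pick the feature with the largest absolute loading (sketch).

    loadings: array of shape (n_components, n_features), one row per PC.
    Returns (best, weak): best maps 'PC1', 'PC2', ... to a feature name;
    weak lists the features that were never the top feature of any PC.
    """
    idx = np.argmax(np.abs(loadings), axis=1)  # column of the strongest loading per PC
    best = {f"PC{i + 1}": feature_names[j] for i, j in enumerate(idx)}
    selected = set(best.values())
    weak = [name for name in feature_names if name not in selected]
    return best, weak


loadings = np.array([[0.9, 0.1, -0.2],
                     [0.1, -0.95, 0.3]])
best, weak = top_features(loadings, ["a", "b", "c"])
```

The absolute value matters here. Feature 'b' wins PC2 through a large negative loading.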

fit_transform(X, row_labels=None, col_labels=None, verbose=None)

Fit PCA on data.

Parameters:
  • X (array-like : Can be of type Numpy or DataFrame) – [NxM] array with columns as features and rows as samples.

  • row_labels ([list of integers or strings] optional) – Used for coloring the samples.

  • col_labels ([list of string] optional) – Names of the features that represent the data features and loadings, as a NumPy array or list of strings. This should match the number of columns in the data. Use this option when using a NumPy array. For a pandas DataFrame, the column names are used, but they are overridden when this parameter is set.

  • verbose (int (default: None)) – Set verbose during initialization.

Returns:

  • dict.

  • loadings (pd.DataFrame) – Structured dataframe containing loadings for PCs

  • X (array-like) – Reduced-dimensionality space, i.e., the principal components (PCs).

  • explained_var (array-like) – Explained variance for each of the PCs (same ordering as the PCs).

  • model_pca (object) – Fitted model for further use.

  • topn (int) – Top n components

  • pcp (int) – pcp

  • col_labels (array-like) – Name of the features

  • y (array-like) – Determined class labels

Examples

>>> import pandas as pd
>>> from pca import pca
>>> # Load example data
>>> from sklearn.datasets import load_iris
>>> X = pd.DataFrame(data=load_iris().data, columns=load_iris().feature_names, index=load_iris().target)
>>>
>>> # Initialize
>>> model = pca(n_components=3)
>>> # Fit using PCA
>>> results = model.fit_transform(X)
>>>
>>> # Make plots
>>> fig, ax = model.scatter()
>>> fig, ax = model.plot()
>>> fig, ax = model.biplot()
>>> fig, ax = model.biplot(SPE=True, HT2=True)
>>>
>>> # 3D plots
>>> fig, ax = model.scatter3d()
>>> fig, ax = model.biplot3d()
>>> fig, ax = model.biplot3d(SPE=True, HT2=True)
>>>
>>> # Normalize out PCs
>>> X_norm = model.norm(X)

import_example(data='iris', url=None, sep=',')

Import an example dataset from GitHub.

Import one of several example datasets from GitHub, or specify your own download URL.

Parameters:
  • data (str) –

    Name of datasets
    • 'iris'

    • 'sprinkler'

    • 'titanic'

    • 'student'

    • 'fifa'

    • 'cancer'

    • 'waterpump'

    • 'retail'

  • url (str) – URL link to the dataset.

Returns:

Dataset containing mixed features.

Return type:

pd.DataFrame()

norm(X, n_components=None, pcexclude=[1])

Normalize out PCs.

Normalize your data using the variance seen in the principal components. This allows you to remove (technical) variation in the data by normalizing out, e.g., the 1st or 2nd component. This function transforms the original data using the PCs that you want to normalize out. For example, if you aim to remove the variation seen in the 1st PC, the returned dataset will only contain the variance from the 2nd PC onward.

Parameters:
  • X (numpy array) – Data set.

  • n_components (int [0..1], optional) – Number of PCs that are returned. None: All PCs.

  • pcexclude (list of int, optional) – The PCs to exclude. The default is [1].

Return type:

Normalized numpy array.
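Normalizing out a PC amounts to subtracting that component's rank-one reconstruction from the centered data. The NumPy sketch below makes some assumptions. It treats pcexclude as 1-based PC numbers (as the default [1] for "the 1st PC" suggests), uses a plain SVD, and returns the data in the original feature space. The helper `remove_pcs` is hypothetical and may differ from what norm() does internally.

```python
import numpy as np


def remove_pcs(X: np.ndarray, pcexclude=(1,)) -> np.ndarray:
    """Return X with the variation along the listed PCs (1-based) removed (sketch)."""
    mean = X.mean(axis=0)
    Xc = X - mean
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U * S  # PC scores, one column per PC
    drop = [pc - 1 for pc in pcexclude]  # 1-based PC numbers -> 0-based indices
    # Subtract the reconstruction of the excluded PCs only; the remaining PCs are untouched.
    Xc_clean = Xc - scores[:, drop] @ Vt[drop, :]
    return Xc_clean + mean


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4)) @ np.array([[3.0, 1.0, 0.0, 0.0],
                                         [0.0, 1.0, 0.5, 0.0],
                                         [0.0, 0.0, 1.0, 0.2],
                                         [0.0, 0.0, 0.0, 1.0]])
X_norm = remove_pcs(X, pcexclude=[1])
```

After removing PC1, projecting the centered result onto the original first PC direction gives zeros. Only the variance from PC2 onward remains.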

plot(n_components=None, xsteps=None, title=None, visible=True, figsize=(15, 10), fig=None, ax=None, verbose=None)

Scree plot together with the explained variance.

Parameters:
  • n_components (int [0..1], optional) – Number of PCs that are returned for the plot. None: All PCs.

  • xsteps (int, optional) – Set the number of xticklabels.

  • title (str, default: None) – Title of the figure. None: Automatically create title text based on results. ‘’ : Remove all title text. ‘title text’ : Add custom title text.

  • visible (bool, default: True) – Visible status of the figure. True: the figure is shown. False: the figure is created in the background.

  • figsize ((int, int), default: (15, 10)) – (width, height) in inches.

  • fig (Figure, optional (default: None)) – Matplotlib figure.

  • ax (Axes, optional (default: None)) – Matplotlib Axes object

  • verbose (int (default: None)) – The higher the number, the more information is printed. 0: None, 1: Error, 2: Warning, 3: Info, 4: Debug, 5: Trace

Return type:

tuple containing (fig, ax)

scatter(labels=None, c=[0, 0.1, 0.4], s=150, marker='o', edgecolor='#000000', SPE=False, HT2=False, jitter=None, PC=None, alpha=0.8, gradient=None, density=False, density_on_top=False, fontcolor=[0, 0, 0], fontsize=18, fontweight='normal', cmap='tab20c', title=None, legend=None, figsize=(25, 15), dpi=100, visible=True, fig=None, ax=None, grid=True, y=None, label=None, verbose=3)

Scatter 2d plot.

Parameters:
  • labels (array-like, default: None) – Label for each sample. The labeling is used for coloring the samples.

  • c (list/array of RGB colors for each sample.) –

    The marker colors. Possible values:
    • A scalar or sequence of n numbers to be mapped to colors using cmap and norm.

    • A 2D array in which the rows are RGB or RGBA.

    • A sequence of colors of length n.

    • A single color format string.

  • s (int or list/array (default: 150)) – Size(s) of the scatter points. [20, 10, 50, …]: In case of a list, it should have the same length as the number of samples (rows in .results['PC']). 50: all points get this size.

  • marker (list/array of strings (default: 'o').) –

    Marker for the samples.
    • 'x' : All data points get this marker

    • ('.', 'o', 'v', '^', '<', '>', '8', 's', 'p', '*', 'h', 'H', 'D', 'd', 'P', 'X') : Specify the marker type per sample.

  • jitter (float, default: None) – Add jitter to the data points as random normal noise. A value of 0.01 usually works well to separate one-hot data.

  • PC (tuple, default: None) – Plot the selected principal components. Note that counting starts from 0: PC1=0, PC2=1, PC3=2, etc. None: automatically take the first 2 components, or the first 3 in case d3=True. [0, 1]: define the PCs for 2D. [0, 1, 2]: define the PCs for 3D.

  • SPE (bool, default: False) –

    Show the outliers based on the SPE/DmodX method.
    • None : Auto-detect. If outliers are detected, it is set to True.

    • True : Show outliers

    • False : Do not show outliers

  • HT2 (bool, default: False) –

    Show the outliers based on the Hotelling T2 test.
    • None : Auto-detect. If outliers are detected, it is set to True.

    • True : Show outliers

    • False : Do not show outliers

  • alpha (float or array-like of floats (default: 0.8)) – The alpha blending value, between 0 (transparent) and 1 (opaque). 1: all data points get this alpha. [1, 0.8, 0.2, …]: specify the alpha per sample.

  • gradient (str, (default: None)) – Hex color used to create a linear gradient for the scatter plot, e.g., '#FFFFFF'.

  • density (bool (default: False)) – Include the kernel density in the scatter plot.

  • density_on_top (bool, (default: False)) – True: the density is the top layer. False: the density is the bottom layer.

  • fontsize (int, optional (default: 18)) – The font size of the y labels that are plotted in the graph.

  • fontcolor (list/array of RGB colors with the same size as X (default: [0, 0, 0])) – None: use the same color scheme as for c. '#000000': if the input is a single color, all fonts get this color.

  • cmap (str, optional, default: 'tab20c') – Colormap. If set to None, no points are shown.

  • title (str, default: None) – Title of the figure. None: Automatically create title text based on results. ‘’ : Remove all title text. ‘title text’ : Add custom title text.

  • legend (int, default: None) – None: Set automatically based on number of labels. False : No legend. True : Best position. 1 : ‘upper right’ 2 : ‘upper left’ 3 : ‘lower left’ 4 : ‘lower right’

  • figsize ((int, int), optional, default: (25, 15)) – (width, height) in inches.

  • visible (Bool, default: True) – Visible status of the Figure. When False, figure is created on the background.

  • verbose (int (default: 3)) – Print to screen. 0: None, 1: Error, 2: Warning, 3: Info, 4: Debug, 5: Trace

Return type:

tuple containing (fig, ax)

scatter3d(PC=[0, 1, 2], **args)

3D scatter plot.

Parameters:

Input parameters are described under <scatter>.

transform(X, row_labels=None, col_labels=None, update_outlier_params=True, verbose=None)

Transform new input data with fitted model.

Parameters:
  • X (array-like : Can be of type Numpy or DataFrame) – [NxM] array with columns as features and rows as samples.

  • update_outlier_params (bool (default: True)) – True: update the parameters for outlier detection so that the model learns from the new unseen input. As a consequence, samples that were initially detected as outliers may no longer be outliers after a certain point. False: do not update the outlier parameters; samples that were initially detected as outliers will always remain outliers.

  • row_labels ([list of integers or strings] optional) – Used for colors.

  • col_labels ([list of string] optional) – Names of the features that represent the data features and loadings, as a NumPy array or list of strings. This should match the number of columns in the data. Use this option when using a NumPy array. For a pandas DataFrame, the column names are used, but they are overridden when this parameter is set.

  • verbose (int (default: None)) – Set verbose during initialization.

Examples

>>> import matplotlib.pyplot as plt
>>> from sklearn.datasets import load_iris
>>> import pandas as pd
>>> from pca import pca
>>>
>>> # Initialize
>>> model = pca(n_components=2, normalize=True)
>>> # Dataset
>>> X = pd.DataFrame(data=load_iris().data, columns=load_iris().feature_names, index=load_iris().target)
>>>
>>> # Gather some random samples across the classes.
>>> idx=[0,1,2,3,4,50,51,52,53,54,55,100,101,102,103,104,105]
>>> X_unseen = X.iloc[idx, :]
>>>
>>> # Label the unseen samples differently (the Index itself is immutable, so copy its values first).
>>> y = X.index.values.copy()
>>> y[idx] = 3
>>> X.index = y
>>>
>>> # Fit transform
>>> model.fit_transform(X)
>>>
>>> # Transform the "unseen" data with the fitted model. Note that these data points are not really unseen, as they were already fitted above.
>>> # But for the sake of the example, you can see that these samples are transformed exactly on top of the original ones.
>>> PCnew = model.transform(X_unseen)
>>>
>>> # Plot PC space
>>> model.scatter()
>>> # Plot the new "unseen" samples on top of the existing space
>>> plt.scatter(PCnew.iloc[:, 0], PCnew.iloc[:, 1], marker='x')

Return type:

PCA-transformed data.
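The example comment above notes that already-fitted samples are transformed exactly on top of the original ones. That follows from how a fitted PCA projects data: it reuses the mean and PC directions learned during fitting and re-estimates nothing. The hypothetical `TinyPCA` class below sketches this with NumPy. It omits normalization and outlier handling, and it is not the package's implementation.

```python
import numpy as np


class TinyPCA:
    """Minimal SVD-based PCA, used only to illustrate fit vs. transform (hypothetical)."""

    def fit(self, X: np.ndarray, n_components: int = 2) -> "TinyPCA":
        self.mean_ = X.mean(axis=0)
        _, _, Vt = np.linalg.svd(X - self.mean_, full_matrices=False)
        self.components_ = Vt[:n_components]  # rows are the PC directions
        return self

    def transform(self, X_new: np.ndarray) -> np.ndarray:
        # Center with the *training* mean and project onto the *training* directions.
        return (X_new - self.mean_) @ self.components_.T


rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))
model = TinyPCA().fit(X, n_components=2)
PC_all = model.transform(X)
idx = [0, 5, 7]
PC_subset = model.transform(X[idx])  # lands exactly on the rows PC_all[idx]
```

Because the transform for a given sample depends only on the stored mean and components, the same sample always maps to the same point, whether it is transformed alone or as part of the full dataset.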

pca.pca.spe_dmodx(X, n_std=3, param=None, calpha=0.3, color='green', visible=False, verbose=3)

Compute SPE/distance to model (DmodX).

Outliers can be detected using SPE/DmodX (distance to model), based on the mean and covariance of the first 2 dimensions of X. Samples that lie on the model plane have SPE ≈ 0. Note that SPE and Hotelling's T2 are complementary to each other.

Parameters:
  • X (Array-like) – Input data, in this case the Principal components.

  • n_std (int, (default: 3)) – Number of standard deviations that defines the size of the ellipse.

  • param (2-element tuple (default: None)) – Pre-computed g_ell_center and cov from a previous run. None: compute them from scratch using X.

  • calpha (float, (default: 0.3)) – Transparency of the ellipse color.

  • color (String, (default: 'green')) – Color of the ellipse.

  • visible (bool, (default: False)) – Scatter the points with the ellipse and mark the outliers.

Returns:

  • outliers (pd.DataFrame()) – DataFrame with a column of boolean outlier flags and the Euclidean distance of each sample to the center of the ellipse.

  • ax (object) – Figure axis.

  • param (2-element tuple) – computed g_ell_center and cov from X.
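The ellipse described above can be sketched with NumPy. The center and covariance come from the first two PCs, and a sample lies outside the n_std ellipse when its Mahalanobis distance exceeds n_std. This sketch assumes the ellipse is the n_std Mahalanobis contour, which may differ in detail from how spe_dmodx() builds it. The helper `ellipse_outliers` is hypothetical and returns plain arrays rather than a DataFrame.

```python
import numpy as np


def ellipse_outliers(PC: np.ndarray, n_std: float = 3.0):
    """Flag samples outside an n_std covariance ellipse on the first two PCs (sketch)."""
    X2 = PC[:, :2]
    center = X2.mean(axis=0)  # plays the role of g_ell_center
    cov = np.cov(X2, rowvar=False)
    diff = X2 - center
    # Squared Mahalanobis distance; the n_std ellipse is the contour d2 == n_std**2.
    d2 = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
    is_outlier = d2 > n_std ** 2
    euclidean = np.linalg.norm(diff, axis=1)  # distance of each sample to the center
    return is_outlier, euclidean


rng = np.random.default_rng(0)
PC = rng.normal(size=(200, 3))
PC[0, :2] = [50.0, 50.0]  # one sample far from the bulk of the data
flags, dist = ellipse_outliers(PC, n_std=3)
```

The ellipse's semi-axes are n_std times the square roots of the covariance eigenvalues. Testing d2 > n_std**2 is therefore the same as asking whether a point falls outside that ellipse.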