Multivariate Parameter Fitting
''''''''''''''''''''''''''''''''

The ``distfit`` library provides multivariate distribution fitting that enables modeling **complex dependencies between multiple variables** using **copula-based methods**.
Rather than assuming a single multivariate parametric distribution, ``distfit`` decomposes the problem into:

* Univariate marginal distribution fitting
* Dependence modeling via a Gaussian copula

This separation allows flexible modeling of heterogeneous marginals while still capturing multivariate structure.

Core Features
==============

* Multivariate distribution fitting with automatic marginal estimation
* Gaussian copula-based dependence modeling
* Joint density evaluation for relative likelihood comparison
* Multivariate outlier detection using joint log-density
* Synthetic data generation preserving marginals and dependence
* Extensive visualization tools for copula diagnostics


Marginal Distribution Fitting
====================================

Each variable is fitted independently with a univariate distribution. Set ``multivariate=True`` to enable this mode; all other ``distfit`` parameters can be set as usual.

.. code-block:: python

   from distfit import distfit

   dfit = distfit(
       multivariate=True,
       distr='norm',
       method='mle',
       bins=50,
       alpha=0.05
   )
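
To illustrate what fitting each variable independently means, the following minimal sketch estimates a normal marginal per column by maximum likelihood using NumPy only. It is not ``distfit``'s internal code: the helper ``fit_normal_marginals`` and the simulated data are hypothetical and only serve as an illustration.

.. code-block:: python

   import numpy as np

   def fit_normal_marginals(X):
       """Return per-column (mean, std) maximum-likelihood estimates (hypothetical helper)."""
       X = np.asarray(X, dtype=float)
       # Each column is treated on its own; dependence between columns is ignored here
       return [(col.mean(), col.std(ddof=0)) for col in X.T]

   # Simulated data with two differently scaled columns (assumption for the demo)
   rng = np.random.default_rng(0)
   X = np.column_stack([
       rng.normal(loc=2.0, scale=0.5, size=5000),
       rng.normal(loc=-1.0, scale=3.0, size=5000),
   ])

   params = fit_normal_marginals(X)
   for i, (mu, sigma) in enumerate(params):
       print(f"column {i}: mu={mu:.2f}, sigma={sigma:.2f}")

The dependence between the columns is deliberately left out at this stage; it is handled by the copula.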

Copula Dependence Modeling
====================================

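Dependence is modeled using a Gaussian copula, where :math:`\Sigma` is the estimated correlation matrix, :math:`\Phi_\Sigma` is the CDF of the zero-mean multivariate normal distribution with correlation matrix :math:`\Sigma`, and :math:`\Phi^{-1}` is the inverse standard normal CDF.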

.. math::

   C(u_1, \dots, u_d) =
   \Phi_\Sigma\left(\Phi^{-1}(u_1), \dots, \Phi^{-1}(u_d)\right)

Joint Density Evaluation
====================================

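The joint density combines the copula density :math:`c` with the fitted marginal densities :math:`f_i`, where :math:`u_i = F_i(x_i)` and :math:`F_i` is the fitted marginal CDF: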

.. math::

   f(\mathbf{x}) =
   c(\mathbf{u}) \prod_{i=1}^{d} f_i(x_i)

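with the Gaussian copula density, where :math:`\phi_\Sigma` is the multivariate normal density with correlation matrix :math:`\Sigma` and :math:`\phi` is the standard normal density: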

.. math::

   c(\mathbf{u}) =
   \frac{\phi_\Sigma(\mathbf{z})}
        {\prod_{i=1}^{d} \phi(z_i)},
   \quad z_i = \Phi^{-1}(u_i)
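
The two formulas above can be checked numerically. The sketch below is a minimal illustration using NumPy and the standard library, not ``distfit``'s implementation; the function names are hypothetical, and standard normal marginals are assumed so that the result can be compared against the closed-form bivariate normal density.

.. code-block:: python

   import numpy as np
   from statistics import NormalDist

   STD_NORMAL = NormalDist()  # standard normal: Phi, Phi^{-1} and phi

   def gaussian_copula_density(u, corr):
       """Gaussian copula density c(u) at a point u in (0, 1)^d."""
       z = np.array([STD_NORMAL.inv_cdf(ui) for ui in u])
       corr = np.asarray(corr, dtype=float)
       # phi_Sigma(z) / prod(phi(z_i)) simplifies to
       # |Sigma|^(-1/2) * exp(-0.5 * z^T (Sigma^{-1} - I) z)
       quad = z @ (np.linalg.inv(corr) - np.eye(len(z))) @ z
       return float(np.exp(-0.5 * quad) / np.sqrt(np.linalg.det(corr)))

   def joint_density(x, corr):
       """f(x) = c(u) * prod f_i(x_i), assuming standard normal marginals."""
       u = [STD_NORMAL.cdf(xi) for xi in x]
       marginals = np.prod([STD_NORMAL.pdf(xi) for xi in x])
       return gaussian_copula_density(u, corr) * float(marginals)

   def bivariate_normal_pdf(x, corr):
       """Closed-form density of a zero-mean, unit-variance bivariate normal."""
       x = np.asarray(x, dtype=float)
       corr = np.asarray(corr, dtype=float)
       quad = x @ np.linalg.inv(corr) @ x
       return float(np.exp(-0.5 * quad) / (2 * np.pi * np.sqrt(np.linalg.det(corr))))

   corr = [[1.0, 0.6], [0.6, 1.0]]
   c_center = gaussian_copula_density([0.5, 0.5], corr)   # z = 0, so c = 1 / sqrt(det)
   c_indep = gaussian_copula_density([0.2, 0.9], np.eye(2))
   f_copula = joint_density([0.3, -1.2], corr)
   f_direct = bivariate_normal_pdf([0.3, -1.2], corr)
   print(c_center, c_indep, f_copula, f_direct)

With an identity correlation matrix the copula density is 1 everywhere, so the joint density reduces to the product of the marginals. With normal marginals the decomposition recovers the ordinary multivariate normal density.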


Quick Example for Multivariate Fitting
==========================================

.. code-block:: python

   from distfit import distfit

   # Initialize with multivariate mode
   dfit = distfit(multivariate=True)

   # Load example data
   X = dfit.import_example(data='multi_normal')
   # X = dfit.import_example(data='multi_t')

   # Fit model
   dfit.fit_transform(X)

   # Access estimated correlation matrix (Gaussian copula)
   print(dfit.model.corr)

   # Evaluate joint density
   results = dfit.evaluate_pdf(X)
   print(results['score'])
   print(results['copula_density'])

   # Generate synthetic samples
   Xnew = dfit.generate(n=10)

   # Detect multivariate outliers
   bool_outliers = dfit.predict_outliers(X)


Interpreting the Output
==========================================

.. code-block:: python

   results = dfit.evaluate_pdf(X)

   # Output
   results['copula_density']
   results['score']

* ``copula_density``
  Vector of joint density values, one per observation. These are **relative likelihoods**, not probabilities.

* ``score``
  Mean log joint density, where higher values indicate a better model fit when comparing models on the same data.

  .. math::

     \text{score} = \frac{1}{n} \sum_{i=1}^{n} \log f(\mathbf{x}_i)


Plots
'''''''''''''''''''''''''''''

Copula Gaussian Density
==================================

This visualization shows the data transformed to **Gaussian copula space**, where :math:`F_i` are fitted marginal CDFs and :math:`\Phi^{-1}` is the inverse standard normal CDF.

.. math::

   U_i = F_i(X_i), \quad Z_i = \Phi^{-1}(U_i)

**Interpretation**
    * Each point represents an observation in latent Gaussian space
    * Elliptical contours indicate linear dependence
    * Structure reflects dependence only, not marginal shape
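
The transformation to Gaussian copula space can be sketched as follows. As a simplifying assumption, the fitted marginal CDFs :math:`F_i` are replaced here by rank-based empirical CDFs, so the code does not depend on any particular marginal fit; the simulated data and variable names are hypothetical.

.. code-block:: python

   import numpy as np
   from statistics import NormalDist

   rng = np.random.default_rng(1)
   latent = rng.multivariate_normal([0, 0], [[1, 0.6], [0.6, 1]], size=2000)

   # Strictly increasing transforms change the marginals but not the copula
   X = np.column_stack([np.exp(latent[:, 0]), latent[:, 1] ** 3])

   n = X.shape[0]
   ranks = np.argsort(np.argsort(X, axis=0), axis=0) + 1  # ranks 1..n per column
   U = ranks / (n + 1)                                    # empirical stand-in for U_i = F_i(X_i)
   Z = np.vectorize(NormalDist().inv_cdf)(U)              # Z_i = Phi^{-1}(U_i)

   corr_raw = np.corrcoef(X, rowvar=False)[0, 1]
   corr_z = np.corrcoef(Z, rowvar=False)[0, 1]
   print(f"correlation in data space:   {corr_raw:.3f}")
   print(f"correlation in copula space: {corr_z:.3f}")

The correlation in copula space should land close to the 0.6 used to simulate the latent variables, whereas the correlation in data space is distorted by the skewed marginals.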


.. code-block:: python

   fig, ax = dfit.plot_copulaDensity(plot_type='gaussian', pairplot=False)

.. figure:: ../figs/copulaDensity_gaussian.png
   :scale: 80%


Copula Gaussian Density Pairplot
==================================

**Interpretation**
    * Diagonal panels show marginal distributions in Gaussian space
    * Off-diagonal panels show pairwise dependence
    * Linear structure indicates strong dependence
    * Circular scatter indicates weak or no dependence

.. code-block:: python

   fig, ax = dfit.plot_copulaDensity(plot_type='gaussian', pairplot=True)

.. figure:: ../figs/copulaDensity_gaussian_pairplot.png
   :scale: 80%


Copula Uniform Density
==================================

This visualization shows the data in **copula (uniform) space**.

.. math::

   U_i = F_i(X_i)

**Interpretation**
    * All marginals are uniform on :math:`[0,1]`
    * Structure reflects dependence only
    * Uniform scatter implies independence
    * Clustering near corners suggests tail dependence

.. code-block:: python

   fig, ax = dfit.plot_copulaDensity(plot_type='uniform', pairplot=False)

.. figure:: ../figs/copulaDensity_uniform.png
   :scale: 80%

.. figure:: ../figs/copulaDensity_uniformB.png
   :scale: 80%


Copula Uniform Density Pairplot
==================================

**Interpretation**
    * Diagonal panels check whether the probability integral transform (PIT) values :math:`U_i` are uniform
    * Off-diagonal panels show empirical copula structure
    * Deviations indicate marginal misfit or dependence
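
PIT uniformity can also be checked numerically. The sketch below is an illustration outside of ``distfit`` (the helper ``pit_ks_distance`` is hypothetical): it fits a normal marginal, applies the fitted CDF, and measures the Kolmogorov-Smirnov distance between the resulting values and the uniform distribution.

.. code-block:: python

   import numpy as np
   from statistics import NormalDist

   def pit_ks_distance(x):
       """KS distance between U = F(x) and Uniform(0, 1) for a fitted normal F."""
       x = np.asarray(x, dtype=float)
       fitted = NormalDist(mu=x.mean(), sigma=x.std())
       u = np.sort(np.vectorize(fitted.cdf)(x))
       n = len(u)
       # Largest gap between the empirical CDF of u and the diagonal
       upper = np.arange(1, n + 1) / n - u
       lower = u - np.arange(0, n) / n
       return float(max(upper.max(), lower.max()))

   rng = np.random.default_rng(2)
   d_good = pit_ks_distance(rng.normal(size=2000))       # marginal family matches
   d_bad = pit_ks_distance(rng.exponential(size=2000))   # marginal family is wrong
   print(f"normal data: {d_good:.3f}, exponential data: {d_bad:.3f}")

A small distance is consistent with a well-fitting marginal; a large distance points to marginal misfit, which would also distort the off-diagonal panels.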

.. code-block:: python

   fig, ax = dfit.plot_copulaDensity(plot_type='uniform', pairplot=True)

.. figure:: ../figs/copulaDensity_uniform_pair.png
   :scale: 80%


Joint Density Plot
==================================

**Interpretation**
    * Displays bivariate slices of the joint density
    * Combines marginal distributions and dependence
    * Higher dimensions are visualized via pairwise projections

.. code-block:: python

   fig, ax = dfit.plot_jointDensity(X)

.. figure:: ../figs/jointDensity.png
   :scale: 80%


PDF Plot
==================================

**Interpretation**
    * Shows fitted marginal probability density functions
    * Used to assess marginal distribution fit

.. code-block:: python

   fig, ax = dfit.plot(chart='pdf')

.. figure:: ../figs/multi_PDF.png
   :scale: 80%


CDF Plot
==================================

**Interpretation**
    * Shows fitted marginal cumulative distribution functions
    * Used to validate probability integral transforms

.. code-block:: python

   fig, ax = dfit.plot(chart='cdf')

.. figure:: ../figs/multi_CDF.png
   :scale: 80%


QQ Plot (Multivariate)
==================================

**Interpretation**
    * Compares empirical quantiles to fitted marginals
    * Large deviations indicate poor marginal fit
    * Multivariate outliers often appear at extremes

.. code-block:: python

   fig, ax = dfit.qqplot(X)

.. figure:: ../figs/multi_QQ.png
   :scale: 80%


Outlier Detection
''''''''''''''''''''''''''''''''

Outliers are defined as observations with low joint log-density.
This detects observations unlikely under the **full multivariate model**, even if they are not marginal outliers.

.. code-block:: python

   outliers = dfit.predict_outliers(X)

Outliers are expected to have a lower likelihood than the remaining observations, as the following example demonstrates.

.. code-block:: python

    import numpy as np
    from distfit import distfit

    rng = np.random.default_rng(42)
    mean = [0, 0]
    cov = [[1, 0.6],
           [0.6, 1]]

    X = rng.multivariate_normal(mean, cov, size=2000)

    # Fit model on multivariate normal random data
    dfit = distfit(multivariate=True, verbose=False)
    dfit.fit_transform(X)

    # Evaluate the copula density
    pdf = dfit.evaluate_pdf(X)["copula_density"]

    # Get outliers
    outliers = dfit.predict_outliers(X)

    # Outliers have lower likelihood
    print(np.mean(pdf[outliers]))
    # 0.0014758104978686533

    print(np.mean(pdf[~outliers]))
    # 0.10025029900211244

    print(np.mean(pdf[outliers]) < np.mean(pdf[~outliers]))
    # True
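
The following sketch illustrates why a point can be a multivariate outlier without being a marginal one. It does not reproduce the decision rule used by ``predict_outliers``; as an assumption, observations are flagged when their joint log-density falls below the empirical ``alpha`` quantile, and standard normal marginals are used for simplicity.

.. code-block:: python

   import numpy as np

   def log_density(points, corr):
       """Log-density of a zero-mean, unit-variance Gaussian with correlation `corr`."""
       points = np.atleast_2d(points)
       inv = np.linalg.inv(corr)
       maha = np.einsum("ij,jk,ik->i", points, inv, points)  # squared Mahalanobis distance
       _, logdet = np.linalg.slogdet(corr)
       d = points.shape[1]
       return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)

   rng = np.random.default_rng(3)
   X = rng.multivariate_normal([0, 0], [[1, 0.9], [0.9, 1]], size=2000)
   corr = np.corrcoef(X, rowvar=False)

   # Assumed rule: flag the alpha fraction with the lowest joint log-density
   alpha = 0.05
   threshold = np.quantile(log_density(X, corr), alpha)

   # Both points lie within two standard deviations on each axis
   candidates = np.array([[1.8, -1.8],   # contradicts the positive dependence
                          [1.8, 1.8]])   # follows the positive dependence
   flagged = log_density(candidates, corr) < threshold
   print(flagged)

Only the first candidate is flagged: each coordinate is unremarkable on its own, but the combination is very unlikely under strong positive dependence.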


Generate Synthetic Data
''''''''''''''''''''''''''''''''

Generate synthetic multivariate samples from the fitted model, preserving the fitted marginals and the dependence structure.

.. code-block:: python

   # Generate synthetic samples
   Xnew = dfit.generate(n=10)

   # Example output:
   # array([[ 0.61334212,  0.55326009,  0.15892912, -0.08668606],
   #        [ 1.12584863,  1.14758074,  0.18494332, -0.80220606],
   #        [ 3.72283115,  0.62819404,  0.31963464, -0.13226541],
   #        [ 1.05816854,  0.52648982,  0.30748156, -0.10778112],
   #        [ 0.48590115,  0.5370091 ,  0.31400217,  0.08802375],
   #        [ 0.51329513,  0.34469918,  0.12943172,  0.74397221],
   #        [ 1.3917044 ,  1.17482342,  0.30421591, -0.09497158],
   #        [ 0.42975052,  0.6232065 ,  0.25283493, -0.31761824],
   #        [ 0.27751107,  0.5779773 ,  0.35859482,  1.66407101],
   #        [ 1.13505836,  0.41056057,  0.24425488, -0.18984279]])
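
Conceptually, sampling from a Gaussian copula model takes three steps: draw correlated standard normals, map them to uniforms with :math:`\Phi`, and push the uniforms through the inverse marginal CDFs. The sketch below shows these steps with NumPy and the standard library; it is not ``distfit``'s ``generate`` implementation, and the correlation matrix and the two marginals (exponential and normal) are assumptions chosen for the demonstration.

.. code-block:: python

   import numpy as np
   from statistics import NormalDist

   rng = np.random.default_rng(4)
   corr = np.array([[1.0, 0.6],
                    [0.6, 1.0]])  # assumed copula correlation

   # 1) Correlated standard normals in latent Gaussian space
   Z = rng.multivariate_normal(np.zeros(2), corr, size=5000)

   # 2) Map to copula (uniform) space: U_i = Phi(Z_i)
   U = np.vectorize(NormalDist().cdf)(Z)

   # 3) Apply inverse marginal CDFs (hypothetical marginals)
   x0 = -np.log1p(-U[:, 0])                                        # Exponential(1)
   x1 = np.vectorize(NormalDist(mu=10, sigma=2).inv_cdf)(U[:, 1])  # Normal(10, 2)
   Xnew = np.column_stack([x0, x1])

   print(Xnew[:3])
   print(np.corrcoef(Xnew, rowvar=False)[0, 1])

Each column follows its own marginal, while the dependence between the columns is inherited from the latent correlation matrix.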


Model Comparison
''''''''''''''''''''''''''''''''

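Use the mean log-density score to compare models.
Higher scores indicate a better fit, provided both models are evaluated on the same data.

.. code-block:: python

   # model1 and model2 are two distfit models fitted with multivariate=True
   res1 = model1.evaluate_pdf(X)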
   res2 = model2.evaluate_pdf(X)

   print(res1['score'], res2['score'])
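
The idea behind the comparison can be reproduced without ``distfit``. The sketch below is an illustration under stated assumptions (standard normal marginals, a hypothetical helper ``mean_log_density``): it scores the same simulated data under an independence model and under a model with the estimated correlation.

.. code-block:: python

   import numpy as np

   def mean_log_density(X, corr):
       """Mean log-density of X under a zero-mean, unit-variance Gaussian model."""
       X = np.asarray(X, dtype=float)
       inv = np.linalg.inv(corr)
       maha = np.einsum("ij,jk,ik->i", X, inv, X)
       _, logdet = np.linalg.slogdet(corr)
       d = X.shape[1]
       return float(np.mean(-0.5 * (d * np.log(2 * np.pi) + logdet + maha)))

   rng = np.random.default_rng(5)
   X = rng.multivariate_normal([0, 0], [[1, 0.6], [0.6, 1]], size=2000)

   # Model 1 ignores dependence; model 2 uses the estimated correlation
   score_independent = mean_log_density(X, np.eye(2))
   score_dependent = mean_log_density(X, np.corrcoef(X, rowvar=False))
   print(score_independent, score_dependent)

The model that accounts for the dependence obtains the higher score. The scores are only comparable because both are computed on the same ``X``.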


Connected variables
''''''''''''''''''''''''''''''''

In a Gaussian copula model, all dependencies between variables are encoded in the
**correlation matrix** stored in ``dfit.model.corr``. Each entry
``corr[i, j]`` represents the linear dependence between variable *i* and *j* in
Gaussian copula space.

This correlation matrix induces a graph structure where:
    - **Nodes** correspond to variables (columns of ``X``)
    - **Edges** exist when two variables have a non-zero (or sufficiently large) correlation

By analysing this graph, we can identify **connected components**: groups of variables
that are mutually dependent (directly or indirectly). Under the Gaussian copula model,
variables in different components are independent when their cross-correlations are exactly
zero, and approximately independent when edges are defined by a threshold.

Identifying connected variables helps to:
    - Interpret the dependency structure learned by the model
    - Detect independent sub-copulas in high-dimensional data
    - Explain block-diagonal or near block-diagonal correlation matrices
    - Simplify diagnostics and model validation


Connected components can be extracted from ``dfit.model.corr`` with a graph traversal
such as depth-first search (DFS). A small threshold can be used to avoid spurious
connections caused by numerical noise. The simplest check thresholds a single column of
the correlation matrix to list the variables connected to the first variable.

.. code-block:: python

    print(dfit.model.corr)
    # [[1.         0.57622997]
    #  [0.57622997 1.        ]]

    # Variables connected to the first variable (absolute correlation above 0.8)
    abs(dfit.model.corr[:, 0]) > 0.8
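
A depth-first search over the thresholded correlation matrix can be sketched as follows. The helper ``connected_components``, its default threshold, and the example matrix are hypothetical; with a fitted model, ``dfit.model.corr`` would presumably take the place of ``corr``.

.. code-block:: python

   import numpy as np

   def connected_components(corr, threshold=0.1):
       """Group variables linked by |correlation| > threshold (hypothetical helper)."""
       corr = np.asarray(corr, dtype=float)
       adjacency = np.abs(corr) > threshold
       np.fill_diagonal(adjacency, False)  # ignore self-correlation

       seen, components = set(), []
       for start in range(corr.shape[0]):
           if start in seen:
               continue
           # Iterative depth-first search from an unvisited variable
           stack, component = [start], []
           seen.add(start)
           while stack:
               node = stack.pop()
               component.append(node)
               for neighbor in np.flatnonzero(adjacency[node]):
                   neighbor = int(neighbor)
                   if neighbor not in seen:
                       seen.add(neighbor)
                       stack.append(neighbor)
           components.append(sorted(component))
       return components

   # Near block-diagonal example: variables {0, 1} and {2, 3} form two groups
   corr = np.array([
       [1.00, 0.70, 0.02, 0.00],
       [0.70, 1.00, 0.01, 0.00],
       [0.02, 0.01, 1.00, -0.60],
       [0.00, 0.00, -0.60, 1.00],
   ])
   components = connected_components(corr, threshold=0.1)
   print(components)  # [[0, 1], [2, 3]]

Taking the absolute value ensures that strong negative correlations also count as connections.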


Caveats and Considerations
''''''''''''''''''''''''''''''''

* The Gaussian copula assumes elliptical dependence
* The Gaussian copula has no tail dependence, so joint extremes may be underestimated
* Computational cost increases with dimensionality
* Density values are relative likelihoods, not probabilities
* Covariance regularization is applied for numerical stability


References
=============

The Gaussian copula relies on the multivariate normal distribution [#mvn1]_ [#mvn2]_.

.. [#mvn1] Multivariate normal distribution,
          https://en.wikipedia.org/wiki/Multivariate_normal_distribution

.. [#mvn2] Estimate a multivariate distribution,
          https://openturns.github.io/openturns/latest/auto_data_analysis/distribution_fitting/plot_estimate_multivariate_distribution.html


.. include:: add_bottom.add