Modelling Continuous Datasets

Learning Bayesian networks from continuous data is a challenging task. There are different ways to work with continuous and/or hybrid datasets, each with its own advantages and disadvantages.

In bnlearn, the following options are available for working with continuous datasets:

    1. Discretize continuous datasets manually using domain knowledge (a minimal sketch follows this list).

    2. Discretize continuous datasets using probability density fitting.

    3. Discretize continuous datasets using a principled Bayesian discretization method.

    4. Model continuous and hybrid datasets with a semi-parametric approach that assumes linear relationships.
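As a minimal illustration of option 1, the sketch below discretizes a continuous column into labeled bins with pandas. The column name 'age' and the bin edges are hypothetical stand-ins for domain knowledge; bnlearn itself is not required for this step.

import numpy as np
import pandas as pd

# Hypothetical continuous column; the bin edges below stand in for domain knowledge
df = pd.DataFrame({'age': np.random.default_rng(1).uniform(0, 90, size=1000)})

# Option 1: manually map the continuous values onto labeled bins
df['age_cat'] = pd.cut(df['age'],
                       bins=[0, 18, 40, 65, 90],
                       labels=['child', 'young adult', 'middle aged', 'senior'])
print(df['age_cat'].value_counts())

The resulting categorical column can then be used with the standard discrete structure-learning methods.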

LiNGAM-based Methods

Bnlearn includes LiNGAM-based methods, which estimate a Linear Non-Gaussian Acyclic Model (LiNGAM) from observed data. The model assumes non-Gaussianity of the noise terms in the causal model. Various methods have been developed and published, of which bnlearn includes two: ICA-based LiNGAM [1] and DirectLiNGAM [2]. The following three are not included: VAR-LiNGAM [3], RCD [4], and CAM-UV [5].

To demonstrate how LiNGAM works, it is best to use a small toy example. Let's create test data containing six variables.

The goal of this dataset is to demonstrate the contribution of different variables and their causal impact on other variables. As in any other dataset, all variables must have the same number of samples. The sample size is set to n=1000, with the root node drawn from a uniform distribution. If the number of samples is much smaller, say in the tens, the method becomes less reliable due to insufficient information to determine causality.

We will establish dependencies between the variables and then let the model recover the original coefficients.

  1. Step 1: x3 is the root node and is initialized with a uniform distribution.

  2. Step 2: x0 and x2 are created as linear functions of x3 (plus uniform noise), making them dependent on x3.

  3. Step 3: x5 is created as a linear function of x0, making it dependent on x0.

  4. Step 4: x1 and x4 are created as linear functions of both x0 and x2, making them dependent on both.

[Figure fig8a: the true dependency structure of the six-variable toy dataset]

import numpy as np
import pandas as pd
from lingam.utils import make_dot

# Number of samples
n = 1000

# Step 1: x3 is the root node, drawn from a uniform distribution
x3 = np.random.uniform(size=n)
# Step 2: x0 and x2 depend linearly on x3
x0 = 3.0*x3 + np.random.uniform(size=n)
x2 = 6.0*x3 + np.random.uniform(size=n)
# Step 3: x5 depends linearly on x0
x5 = 4.0*x0 + np.random.uniform(size=n)
# Step 4: x1 and x4 depend linearly on x0 and x2
x1 = 3.0*x0 + 2.0*x2 + np.random.uniform(size=n)
x4 = 8.0*x0 - 1.0*x2 + np.random.uniform(size=n)
df = pd.DataFrame(np.array([x0, x1, x2, x3, x4, x5]).T, columns=['x0', 'x1', 'x2', 'x3', 'x4', 'x5'])
df.head()

# Ground-truth adjacency matrix: entry [i, j] is the coefficient of xj in the equation for xi
m = np.array([[0.0, 0.0,  0.0, 3.0, 0.0, 0.0],
              [3.0, 0.0,  2.0, 0.0, 0.0, 0.0],
              [0.0, 0.0,  0.0, 6.0, 0.0, 0.0],
              [0.0, 0.0,  0.0, 0.0, 0.0, 0.0],
              [8.0, 0.0, -1.0, 0.0, 0.0, 0.0],
              [4.0, 0.0,  0.0, 0.0, 0.0, 0.0]])
dot = make_dot(m)
dot

Structure learning can now be applied with the direct-lingam method.

import bnlearn as bn

model = bn.structure_learning.fit(df, methodtype='direct-lingam')

# When we now look at the output, we can see that the dependency values are recovered very well for the various variables.

print(model['adjmat'])
# target        x0        x1       x2   x3        x4       x5
# source
# x0      0.000000  2.987320  0.00000  0.0  8.057757  3.99624
# x1      0.000000  0.000000  0.00000  0.0  0.000000  0.00000
# x2      0.000000  2.010043  0.00000  0.0 -0.915306  0.00000
# x3      2.971198  0.000000  5.98564  0.0 -0.704964  0.00000
# x4      0.000000  0.000000  0.00000  0.0  0.000000  0.00000
# x5      0.000000  0.000000  0.00000  0.0  0.000000  0.00000
bn.plot(model)

# Compute edge strength with the chi_square test statistic
model = bn.independence_test(model, df, prune=False)

print(model['adjmat'])
# target        x0        x1       x2   x3        x4       x5
# source
# x0      0.000000  2.987320  0.00000  0.0  8.057757  3.99624
# x1      0.000000  0.000000  0.00000  0.0  0.000000  0.00000
# x2      0.000000  2.010043  0.00000  0.0 -0.915306  0.00000
# x3      2.971198  0.000000  5.98564  0.0 -0.704964  0.00000
# x4      0.000000  0.000000  0.00000  0.0  0.000000  0.00000
# x5      0.000000  0.000000  0.00000  0.0  0.000000  0.00000

# Using the causal_order property, we can inspect the causal ordering that results from the causal discovery.

print(model['causal_order'])
# ['x3', 'x0', 'x5', 'x2', 'x1', 'x4']

# We can draw the causal graph with the plot utility function.
bn.plot(model)

[Figure fig7a: causal graph learned with direct-lingam on the toy dataset]

This example nicely demonstrates that the dependencies and their causal factors are captured accurately.

Direct-LiNGAM method

The Direct-LiNGAM method 'direct-lingam' is a semi-parametric approach that assumes a linear relationship among observed variables, requires the error terms to follow a non-Gaussian distribution, and constrains the graph to remain acyclic. The method involves repeated regression analysis and independence assessments using least-squares linear regression. In each regression, one variable serves as the dependent variable (outcome) while another acts as the independent variable (predictor), and this process is applied to each pair of variables.

When regression analysis is conducted in the correct causal order, the independent variable and the error term are independent. Conversely, if the regression is performed under an incorrect causal order, this independence is disrupted because both the residuals and the explanatory variable share common error terms. By leveraging this dependency property, it becomes possible to infer the causal order among the variables. Furthermore, an explanatory variable that remains independent of the residuals, regardless of which other variables are considered, can be inferred to be first in the causal hierarchy.
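To make this concrete, the following sketch (not bnlearn's internal code) illustrates the principle on a two-variable example with uniform, hence non-Gaussian, noise. The dependence function is a crude proxy for an independence measure, correlating squared centered values; independent variables score near zero, while variables sharing a non-Gaussian source do not. DirectLiNGAM itself uses more refined measures.

import numpy as np

rng = np.random.default_rng(0)
n = 10000
x = rng.uniform(size=n)              # cause, with a non-Gaussian distribution
y = 2.0*x + rng.uniform(size=n)      # effect: linear in x plus non-Gaussian noise

def residual(a, b):
    # Least-squares residual of regressing a on b
    coef = np.cov(a, b)[0, 1] / np.var(b)
    return a - coef*b

def dependence(a, b):
    # Crude dependence proxy: correlation between squared, centered values.
    # Independent variables give ~0; shared non-Gaussian sources do not.
    a, b = a - a.mean(), b - b.mean()
    return abs(np.corrcoef(a**2, b**2)[0, 1])

# Correct causal order: the residual of y ~ x is (nearly) independent of x
print(dependence(residual(y, x), x))   # close to 0
# Incorrect causal order: the residual of x ~ y remains dependent on y
print(dependence(residual(x, y), y))   # clearly larger than 0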

In other words, the direct-lingam method allows you to model continuous and mixed datasets. A disadvantage is that causal discovery through structure learning is the end point when using this method: it is not possible to perform parameter learning and inference on the resulting model.

# Import
import bnlearn as bn

# Load data set
df = bn.import_example(data='auto_mpg')
del df['origin']

# Structure learning
model = bn.structure_learning.fit(df, methodtype='direct-lingam', params_lingam = {'random_state': 2})

# Compute edge strength
model = bn.independence_test(model, df, prune=True)

# Plot
bn.plot(model)

# Plot with graphviz
dotgraph = bn.plot_graphviz(model)
dotgraph
dotgraph.view(filename=r'dotgraph_auto_mpg_lingam_direct')

With the LiNGAM methods, the value on an edge is the estimated linear coefficient from the source variable to the target variable. For example, an edge with weight -10 tells us that a one-unit increase in the source variable corresponds to a decrease of roughly 10 units in the target variable.
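To inspect a single coefficient, the learned adjacency matrix can be indexed by source and target. The variable names below are columns of the auto_mpg dataset; whether this particular edge is present depends on the learned structure.

# Hypothetical example: coefficient of the edge weight -> mpg (source row, target column)
print(model['adjmat'].loc['weight', 'mpg'])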

[Figure fig4a: structure learned with direct-lingam on the auto_mpg dataset]

[Figure fig4b: the same graph rendered with Graphviz]

ICA-LiNGAM method

The ICA-LiNGAM method 'ica-lingam' is also taken from the lingam package; it estimates the model using independent component analysis (ICA) and follows the same usage as above.

# Import
import bnlearn as bn

# Load data set
df = bn.import_example(data='auto_mpg')
del df['origin']


# Structure learning
model = bn.structure_learning.fit(df, methodtype='ica-lingam')

# Compute edge strength
model = bn.independence_test(model, df, prune=True)

# Plot
bn.plot(model)

# Plot with graphviz
dotgraph = bn.plot_graphviz(model)
dotgraph
dotgraph.view(filename=r'dotgraph_auto_mpg_lingam_ica')

[Figure fig6a: structure learned with ica-lingam on the auto_mpg dataset]

[Figure fig6b: the same graph rendered with Graphviz]

PC method

A different, but quite straightforward approach to building a DAG from data is to identify independencies in the dataset using hypothesis tests and then construct a DAG (pattern) according to the identified (conditional) independencies. Independencies in the data can be identified using chi-square conditional independence tests.

The constraint-based PC algorithm (named after Peter and Clark, its inventors) is a popular method in causal inference and Bayesian network learning. It uses conditional independence tests to build a causal graph from data, and it is widely used to learn the structure of Bayesian networks and causal graphs by identifying relationships between variables.
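As an illustration of such a test, the sketch below implements a basic chi-square conditional independence test with pandas and scipy by summing the chi-square statistics over the strata of the conditioning set. This is a simplified stand-in for the tests used internally, not bnlearn's own implementation, and it assumes discrete columns.

import pandas as pd
from scipy.stats import chi2, chi2_contingency

def ci_test(df, x, y, z=None):
    # p-value for the hypothesis that X and Y are independent given Z
    if z is None:
        _, p, _, _ = chi2_contingency(pd.crosstab(df[x], df[y]))
        return p
    stat_total, dof_total = 0.0, 0
    for _, group in df.groupby(z):
        # One contingency table per stratum of the conditioning set
        table = pd.crosstab(group[x], group[y])
        if table.shape[0] < 2 or table.shape[1] < 2:
            continue  # stratum carries no information on X vs Y
        stat, _, dof, _ = chi2_contingency(table)
        stat_total += stat
        dof_total += dof
    return chi2.sf(stat_total, dof_total)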

DAG (pattern) construction

With a method for independence testing at hand, we can construct a DAG from the dataset in three steps:
    1. Construct an undirected skeleton (a naive sketch of this step follows the list).

    2. Orient compelled edges to obtain a partially directed acyclic graph (PDAG).

    3. Extend the DAG pattern to a DAG by conservatively orienting the remaining edges in some way.
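The sketch below gives a naive version of step 1, reusing the ci_test helper from above: an edge between two variables is removed as soon as some conditioning set renders them independent. Real PC implementations restrict conditioning sets to the neighbors of the tested pair and grow them incrementally; this brute-force variant is for illustration only and scales exponentially.

from itertools import combinations

def skeleton(df, alpha=0.05):
    # Start from the complete undirected graph over all columns
    cols = list(df.columns)
    edges = set(frozenset(pair) for pair in combinations(cols, 2))
    for x, y in combinations(cols, 2):
        others = [c for c in cols if c not in (x, y)]
        # Drop the edge x - y if any conditioning set makes x and y independent
        for k in range(len(others) + 1):
            if any(ci_test(df, x, y, list(z) or None) > alpha
                   for z in combinations(others, k)):
                edges.discard(frozenset((x, y)))
                break
    return edges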

# Import
import bnlearn as bn

# Load data set
df = bn.import_example(data='auto_mpg')

# Structure learning
model = bn.structure_learning.fit(df, methodtype='pc')

# Compute edge strength
model = bn.independence_test(model, df)

# Plot
bn.plot(model, edge_labels='pvalue')

# Plot with graphviz
dotgraph = bn.plot_graphviz(model, edge_labels='pvalue')
dotgraph
dotgraph.view(filename=r'dotgraph_auto_mpg_PC')

PC PDAG construction is only guaranteed to work under the assumption that the identified set of independencies is faithful, i.e., there exists a DAG that corresponds exactly to it. Spurious dependencies in the dataset can cause the reported independencies to violate faithfulness. It can then happen that the estimated PDAG does not have any faithful completions (i.e., edge orientations that do not introduce new v-structures); in that case, a warning is issued.

[Figure fig5a: structure learned with the PC algorithm on the auto_mpg dataset, edges labeled with p-values]

[Figure fig5b: the same graph rendered with Graphviz]