Bnlearn for Python¶
Welcome to the bnlearn notebook. bnlearn is a Python package for learning the graphical structure of Bayesian networks, parameter learning, inference, and sampling. Because probabilistic graphical models can be difficult to use, bnlearn (this package) is built on top of the pgmpy package and contains the most-wanted pipelines. Navigate to the API documentation for more detailed information.
The core functionalities are:
* Causal Discovery
* Structure Learning
* Parameter Learning
* Inferences using do-calculus
Read the Medium blogs.¶
Support¶
This library is free, but it runs on coffee! :)
You can support it in various ways; have a look at the sponsor page. Report bugs, issues, and feature requests on the GitHub page.
# Install the packages
!pip install bnlearn
!pip install distfit
Collecting distfit ... (pip output truncated) ... Successfully installed colourmap-1.1.21 distfit-1.8.6
# Import the packages
from distfit import distfit
import bnlearn as bn
print(bn.__version__)
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
0.11.1
# Import library
import bnlearn as bn
# Load dataset
df = bn.import_example('predictive_maintenance')
df.head()
| | UDI | Product ID | Type | Air temperature [K] | Process temperature [K] | Rotational speed [rpm] | Torque [Nm] | Tool wear [min] | Machine failure | TWF | HDF | PWF | OSF | RNF |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | M14860 | M | 298.1 | 308.6 | 1551 | 42.8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 2 | L47181 | L | 298.2 | 308.7 | 1408 | 46.3 | 3 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 3 | L47182 | L | 298.1 | 308.5 | 1498 | 49.4 | 5 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 4 | L47183 | L | 298.2 | 308.6 | 1433 | 39.5 | 7 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 5 | L47184 | L | 298.2 | 308.7 | 1408 | 40.0 | 9 | 0 | 0 | 0 | 0 | 0 | 0 |
# Remove ID columns from the DataFrame
del df['UDI']
del df['Product ID']
# Discretize the following columns
colnames = ['Air temperature [K]', 'Process temperature [K]', 'Rotational speed [rpm]', 'Torque [Nm]', 'Tool wear [min]']
colors = ['#87CEEB', '#FFA500', '#800080', '#FF4500', '#A9A9A9']
fig, axes = plt.subplots(1, len(colnames), figsize=(5 * len(colnames), 4))  # adjust figsize as needed

# Apply distribution fitting to each variable
for idx, (colname, color) in enumerate(zip(colnames, colors)):
    # Initialize and set 95% confidence interval
    if colname == 'Tool wear [min]' or colname == 'Process temperature [K]':
        # Set model parameters to determine the medium-high ranges
        dist = distfit(alpha=0.05, bound='up', stats='RSS')
        labels = ['medium', 'high']
    else:
        # Set model parameters to determine the low-medium-high ranges
        dist = distfit(alpha=0.05, stats='RSS')
        labels = ['low', 'medium', 'high']

    # Distribution fitting
    dist.fit_transform(df[colname])
    # Plot
    dist.plot(title=colname, bar_properties={'color': color}, ax=axes[idx], fontsize=8)

    # Define bins based on the fitted distribution's confidence intervals
    bins = [df[colname].min(), dist.model['CII_min_alpha'], dist.model['CII_max_alpha'], df[colname].max()]
    # Remove None (the lower interval is None when bound='up')
    bins = [x for x in bins if x is not None]

    # Discretize using the defined bins and add to the DataFrame
    df[colname + '_category'] = pd.cut(df[colname], bins=bins, labels=labels, include_lowest=True)
    # Delete the original column
    del df[colname]
[distfit] Distribution fitting logs (truncated). Best estimated distribution per variable:
- Air temperature [K]: Beta(loc=295.233, scale=9.468)
- Process temperature [K]: Dweibull(loc=309.949, scale=1.357)
- Rotational speed [rpm]: Lognorm(loc=1121.830, scale=385.733)
- Torque [Nm]: Loggamma(loc=-1900.076, scale=288.365)
- Tool wear [min]: Beta(loc=-0.000, scale=253.298)
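As a quick, self-contained illustration of the discretization step above (with made-up boundary values, not the fitted confidence intervals), this is how `pd.cut` maps continuous measurements onto the labeled bins:

```python
import pandas as pd

# Hypothetical temperature readings and bin edges (low/medium/high boundaries)
values = pd.Series([296.0, 300.5, 304.0])
bins = [295.0, 299.0, 302.0, 305.0]

# Each value falls into exactly one labeled interval
cats = pd.cut(values, bins=bins, labels=['low', 'medium', 'high'], include_lowest=True)
print(list(cats))  # → ['low', 'medium', 'high']
```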
# Structure learning
model = bn.structure_learning.fit(df, methodtype='hc', scoretype='bic')
# [bnlearn] >Warning: Computing DAG with 12 nodes can take a very long time!
# [bnlearn] >Computing best DAG using [hc]
# [bnlearn] >Set scoring type at [bic]
# [bnlearn] >Compute structure scores for model comparison (higher is better).
print(model['structure_scores'])
# {'k2': -22648.785589229912,
#  'bic': -22717.975709504244,
#  'bdeu': -22696.58525478851,
#  'bds': -22795.52799206102}
[bnlearn] >Warning: Computing DAG with 12 nodes can take a very long time!
[bnlearn] >Computing best DAG using [hc]
[bnlearn] >Set scoring type at [bic]
[bnlearn] >Compute structure scores for model comparison (higher is better).
{'k2': -22648.785589229912, 'bic': -22717.975709504244, 'bdeu': -22696.58525478851, 'bds': -22795.52799206102}
# Compute edge weights using the chi-square independence test
model = bn.independence_test(model, df, test='chi_square', prune=True)
# Plot the best DAG
# bn.plot(model, edge_labels='pvalue', params_static={'maxscale': 4, 'figsize': (6, 6), 'font_size': 6, 'arrowsize': 6})
dotgraph = bn.plot_graphviz(model, edge_labels='pvalue')
dotgraph
# Store to pdf
#dotgraph.view(filename='bnlearn_predictive_maintenance')
[06-05-2025 08:49:59] [setgraphviz.setgraphviz] [INFO] The OS is not supported to automatically set Graphviz in the system env.
[06-05-2025 08:49:59] [setgraphviz.setgraphviz] [INFO] Graphviz path found in environment.
[bnlearn] >Compute edge strength with [chi_square]
[bnlearn] >Setting edge labels to pvalue.
[bnlearn] >Converting source-target into adjacency matrix..
[bnlearn] >Making the matrix symmetric..
How do I know my causal model is right?¶
If you solely used data to compute the causal diagram, it is hard to fully verify its validity and completeness. However, some approaches can help build trust in the causal diagram. For example, it may be possible to empirically test certain conditional independence or dependence relationships that the model implies. If those relationships do not hold in the data, that is an indication that the causal model is incorrect. Alternatively, prior expert knowledge, such as a known DAG or CPTs, can be added to increase trust in the model when making inferences.
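As a sketch of such an empirical check (the column names and data here are made up), a chi-square test on a contingency table can test whether two discrete variables are independent, similar to what `bn.independence_test` does per edge:

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Two variables that are independent by construction
df_check = pd.DataFrame({'A': [0, 0, 1, 1] * 25, 'B': [0, 1, 0, 1] * 25})

# Contingency table of observed counts
table = pd.crosstab(df_check['A'], df_check['B'])

# Chi-square test of independence: a high p-value means
# independence cannot be rejected
chi2, pvalue, dof, expected = chi2_contingency(table)
print(round(pvalue, 2))  # → 1.0
```

If the model claims A and B are independent but this test rejects independence on the data, the DAG deserves a second look.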
A weakness of Bayesian networks is that finding the optimal DAG is computationally expensive, since an exhaustive search over all possible structures must be performed. Exhaustive search already becomes infeasible at around 15 nodes, depending also on the number of states per variable. For larger graphs, alternative methods that combine a scoring function with a search algorithm are required. To deal with problems of hundreds or even thousands of variables, a different approach, such as a tree-based or constraint-based method, is necessary, combined with black/whitelisting of variables. Such an approach first determines a node ordering and then finds the optimal BN structure for that ordering. This restricts the search to the space of possible orderings, which is convenient because it is smaller than the space of network structures.
# Learn inference model
model = bn.parameter_learning.fit(model, df, methodtype="bayes")
[bnlearn] >Parameter learning> Computing parameters using [bayes]
[bnlearn] >Converting [<class 'pgmpy.base.DAG.DAG'>] to BayesianNetwork model.
[bnlearn] >Converting adjmat to BayesianNetwork.
[bnlearn] >CPDs computed for: Machine failure, Tool wear [min]_category, TWF, HDF, Air temperature [K]_category, Process temperature [K]_category, PWF, Torque [Nm]_category, OSF, Type, Rotational speed [rpm]_category (CPD tables truncated).
[bnlearn] >Compute structure scores for model comparison (higher is better).
q = bn.inference.fit(model, variables=['Machine failure'],
evidence={'Torque [Nm]_category': 'high'},
plot=True)
[bnlearn] >Variable Elimination.
+----+-------------------+----------+
|    |   Machine failure |        p |
+====+===================+==========+
|  0 |                 0 | 0.584588 |
+----+-------------------+----------+
|  1 |                 1 | 0.415412 |
+----+-------------------+----------+
Summary for variables: ['Machine failure']
Given evidence: Torque [Nm]_category=high

Machine failure outcomes:
- Machine failure: 0 (58.5%)
- Machine failure: 1 (41.5%)
q = bn.inference.fit(model, variables=['HDF'],
evidence={'Air temperature [K]_category': 'medium'},
plot=True)
[bnlearn] >Variable Elimination.
+----+-------+-----------+
|    |   HDF |         p |
+====+=======+===========+
|  0 |     0 |  0.972256 |
+----+-------+-----------+
|  1 |     1 | 0.0277441 |
+----+-------+-----------+
Summary for variables: ['HDF']
Given evidence: Air temperature [K]_category=medium

HDF outcomes:
- HDF: 0 (97.2%)
- HDF: 1 (2.8%)
q = bn.inference.fit(model, variables=['TWF', 'HDF', 'PWF', 'OSF'],
evidence={'Machine failure': 1},
plot=True)
[bnlearn] >Variable Elimination.
+----+-------+-------+-------+-------+-------------+
|    |   TWF |   HDF |   PWF |   OSF |           p |
+====+=======+=======+=======+=======+=============+
|  0 |     0 |     0 |     0 |     0 |   0.0240521 |
+----+-------+-------+-------+-------+-------------+
|  1 |     0 |     0 |     0 |     1 |    0.210243 |
+----+-------+-------+-------+-------+-------------+
|  2 |     0 |     0 |     1 |     0 |    0.207443 |
+----+-------+-------+-------+-------+-------------+
|  3 |     0 |     0 |     1 |     1 |   0.0321357 |
+----+-------+-------+-------+-------+-------------+
|  4 |     0 |     1 |     0 |     0 |    0.245374 |
+----+-------+-------+-------+-------+-------------+
|  5 |     0 |     1 |     0 |     1 |   0.0177909 |
+----+-------+-------+-------+-------+-------------+
|  6 |     0 |     1 |     1 |     0 |   0.0185796 |
+----+-------+-------+-------+-------+-------------+
|  7 |     0 |     1 |     1 |     1 |  0.00499062 |
+----+-------+-------+-------+-------+-------------+
|  8 |     1 |     0 |     0 |     0 |     0.21378 |
+----+-------+-------+-------+-------+-------------+
|  9 |     1 |     0 |     0 |     1 |  0.00727977 |
+----+-------+-------+-------+-------+-------------+
| 10 |     1 |     0 |     1 |     0 |  0.00693896 |
+----+-------+-------+-------+-------+-------------+
| 11 |     1 |     0 |     1 |     1 |  0.00148291 |
+----+-------+-------+-------+-------+-------------+
| 12 |     1 |     1 |     0 |     0 |  0.00786678 |
+----+-------+-------+-------+-------+-------------+
| 13 |     1 |     1 |     0 |     1 | 0.000854361 |
+----+-------+-------+-------+-------+-------------+
| 14 |     1 |     1 |     1 |     0 | 0.000927891 |
+----+-------+-------+-------+-------+-------------+
| 15 |     1 |     1 |     1 |     1 | 0.000260654 |
+----+-------+-------+-------+-------+-------------+
Summary for variables: ['TWF', 'HDF', 'PWF', 'OSF']
Given evidence: Machine failure=1

TWF outcomes:
- TWF: 0 (76.1%)
- TWF: 1 (23.9%)
HDF outcomes:
- HDF: 0 (70.3%)
- HDF: 1 (29.7%)
PWF outcomes:
- PWF: 0 (72.7%)
- PWF: 1 (27.3%)
OSF outcomes:
- OSF: 0 (72.5%)
- OSF: 1 (27.5%)
Summary¶
In summary, we moved from a raw dataset to a DAG, which enabled us to go beyond descriptive statistics toward prescriptive analysis. I demonstrated a data-driven approach to learn the causal structure of a dataset and to identify which aspects of the system can be adjusted to reduce failure rates. Before making interventions, we must also perform inferences, which give us the updated probabilities when we fix (or observe) certain variables. Without this step, an intervention is just guessing, because actions in one part of the system often ripple through and affect others (interdependence). This interconnectedness is exactly why understanding causal relationships is so important.