Parameter learning


Parameter learning is the task of estimating the values of the conditional probability distributions (CPDs). To make sense of the given data, we can start by counting how often each state of a variable occurs. If the variable depends on parents, the counts are done conditionally on the parents' states, i.e. separately for each parent configuration.

Currently, the library supports parameter learning for discrete nodes:
  • Maximum Likelihood Estimation

  • Bayesian Estimation

bnlearn.parameter_learning.fit(model, df, methodtype='bayes', scoretype='bdeu', smooth=None, n_jobs=-1, verbose=3)

Learn the parameters given the DAG and data.


Maximum Likelihood Estimation

A natural estimate for the CPDs is to simply use the relative frequencies with which the variable states have occurred. For example, if the sprinkler is on in half of the samples in which it is cloudy, we would estimate that P(sprinkler | cloudy) is about 50%. According to MLE, we should fill the CPDs in such a way that P(data|model) is maximal. This is achieved by using the relative frequencies.
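The relative-frequency estimate can be sketched in a few lines of plain Python. This is only an illustration, not the library's implementation: the helper `mle_cpd` and the toy (cloudy, sprinkler) observations are hypothetical and chosen purely to show the counting.

```python
from collections import Counter, defaultdict

# Hypothetical toy observations of (cloudy, sprinkler); not real data.
samples = [
    (1, 0), (1, 0), (1, 1), (1, 0),
    (0, 1), (0, 1), (0, 0), (0, 1),
]

def mle_cpd(samples):
    """Estimate P(child | parent) as relative frequencies per parent state."""
    counts = defaultdict(Counter)
    for parent, child in samples:
        counts[parent][child] += 1  # state counts, conditional on the parent state
    cpd = {}
    for parent, child_counts in counts.items():
        total = sum(child_counts.values())
        # Normalize the counts within each parent configuration.
        cpd[parent] = {state: n / total for state, n in child_counts.items()}
    return cpd

cpd = mle_cpd(samples)
print(cpd[1])  # estimated P(sprinkler | cloudy=1)
print(cpd[0])  # estimated P(sprinkler | cloudy=0)
```

Each parent state gets its own normalized column, which is why the counts have to be kept separately per parent configuration.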

While very straightforward, the ML estimator has the problem of overfitting to the data. If the observed data is not representative of the underlying distribution, ML estimates can be extremely far off. When estimating parameters for Bayesian networks, lack of data is a frequent problem. Even if the total sample size is very large, the fact that state counts are done conditionally for each parent configuration causes immense fragmentation. If a variable has 3 parents that can each take 10 states, then state counts will be done separately for 10^3 = 1000 parent configurations. This makes MLE very fragile and unstable for learning Bayesian network parameters. A way to mitigate MLE's overfitting is Bayesian Parameter Estimation.
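To make the fragmentation concrete, the following sketch computes how many parent configurations there are and how thinly a data set is spread over them. The sample size of 5000 and the helper name `n_parent_configurations` are assumptions made for illustration only.

```python
import math

def n_parent_configurations(parent_cardinalities):
    """Number of distinct joint states of the parents."""
    return math.prod(parent_cardinalities)

# Three parents with 10 states each, as in the text above.
n_configs = n_parent_configurations([10, 10, 10])

# Hypothetical data set size, for illustration only.
sample_size = 5000
avg_samples_per_config = sample_size / n_configs

print(n_configs)               # number of parent configurations
print(avg_samples_per_config)  # average samples available per configuration
```

With only a handful of samples per configuration on average, many configurations will have few or no observations, so their relative frequencies are unreliable.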

Bayesian Parameter Estimation

The Bayesian Parameter Estimator starts with already existing prior CPDs that express our beliefs about the variables before the data was observed. Those “priors” are then updated using the state counts from the observed data.

One can think of the priors as consisting of pseudo state counts that are added to the actual counts before normalization. Unless one wants to encode specific beliefs about the distributions of the variables, one commonly chooses uniform priors, i.e. ones that deem all states equiprobable.

A very simple prior is the so-called K2 prior, which simply adds 1 to the count of every single state. A somewhat more sensible choice of prior is BDeu (Bayesian Dirichlet equivalent uniform prior). For BDeu we need to specify an equivalent sample size N and then the pseudo-counts are the equivalent of having observed N uniform samples of each variable (and each parent configuration).
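The effect of pseudo counts can be sketched as follows. This is a simplified illustration rather than the library's code: the helper names are hypothetical, and the BDeu pseudo count is assumed to follow the common formulation in which the equivalent sample size N is spread uniformly over all cells of the CPD, i.e. N / (number of states × number of parent configurations).

```python
def smoothed_column(counts, pseudo_count):
    """Add the same pseudo count to every state, then normalize."""
    total = sum(counts) + pseudo_count * len(counts)
    return [(c + pseudo_count) / total for c in counts]

def bdeu_pseudo_count(equivalent_sample_size, n_states, n_parent_configs):
    """Assumed BDeu pseudo count per cell: N spread uniformly over the CPD."""
    return equivalent_sample_size / (n_states * n_parent_configs)

# Hypothetical counts for a binary variable under one parent configuration.
counts = [3, 0]

mle = smoothed_column(counts, 0)  # no prior: plain relative frequencies
k2 = smoothed_column(counts, 1)   # K2 prior: add 1 to every state count

# BDeu with N = 10, a binary variable, and 4 parent configurations.
alpha = bdeu_pseudo_count(10, n_states=2, n_parent_configs=4)
bdeu = smoothed_column(counts, alpha)

print(mle, k2, bdeu)
```

Without a prior, the unobserved state gets probability zero. Both priors pull the estimate toward the uniform distribution, and the pull weakens as the real counts grow.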

param model:

Contains a model object with a key ‘adjmat’ (adjacency matrix).

type model:


param df:

Pandas DataFrame containing the data.

type df:


param methodtype:
Strategy for parameter learning.
  • ‘ml’, ‘maximumlikelihood’: Learning CPDs using Maximum Likelihood Estimators.

  • ‘bayes’: Bayesian Parameter Estimation.

type methodtype:

str, (default: ‘bayes’)

param scoretype:
Prior type used for Bayesian parameter estimation.
  • ‘bdeu’

  • ‘dirichlet’

  • ‘k2’

type scoretype:

str, (default: ‘bdeu’)

param smooth:

The smoothing value (α) for Bayesian parameter estimation. Should be non-negative.

type smooth:

float (default: None)

param verbose:

Print progress to screen. 0: None, 1: ERROR, 2: WARN, 3: INFO (default), 4: DEBUG, 5: TRACE

type verbose:

int (default: 3)


returns:

dict with model.


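>>> import bnlearn as bn
>>>
>>> # Example 1: sprinkler DAG without CPDs, parameters learned from the example data
>>> df = bn.import_example()
>>> model = bn.import_DAG('sprinkler', CPD=False)
>>> # Parameter learning
>>> model_update = bn.parameter_learning.fit(model, df)
>>> bn.plot(model_update)
>>>
>>> # Example 2: sample data from the alarm network, then learn its parameters again
>>> model = bn.import_DAG('alarm')
>>> df = bn.sampling(model, n=1000)
>>> model_update = bn.parameter_learning.fit(model, df)
>>> G = bn.plot(model_update)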