Bnlearn for Python

Welcome to the bnlearn notebook. bnlearn is a Python package for learning the graphical structure of Bayesian networks, parameter learning, inference, and sampling. Because probabilistic graphical models can be difficult to use, bnlearn for Python (this package) is built on the pgmpy package and contains the most-wanted pipelines. See the API documentation for more detailed information.

The core functionalities are:
* Causal Discovery
* Structure Learning
* Parameter Learning
* Inferences using do-calculus


Read the Medium blogs

1. A Step-by-Step Guide in detecting causal relationships using Bayesian Structure Learning in Python
2. A step-by-step guide in designing knowledge-driven models using Bayesian theorem.
3. The Power of Bayesian Causal Inference: A Comparative Analysis of Libraries to Reveal Hidden Causality in Your Dataset.
4. Chat with Your Dataset using Bayesian Inferences.

Support

This library is free, but it runs on coffee! :)

You can support it in various ways; have a look at the sponsor page. Report bugs, issues, and feature requests on the GitHub page.

Follow me on Medium

In [ ]:
# Install packages
!pip install bnlearn
!pip install datazets
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting bnlearn
  Downloading bnlearn-0.8.0-py3-none-any.whl (66 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 66.8/66.8 kB 2.7 MB/s eta 0:00:00
Collecting pgmpy>=0.1.18 (from bnlearn)
  Downloading pgmpy-0.1.22-py3-none-any.whl (1.9 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.9/1.9 MB 28.2 MB/s eta 0:00:00
Requirement already satisfied: networkx>=2.7.1 in /usr/local/lib/python3.10/dist-packages (from bnlearn) (3.1)
Requirement already satisfied: matplotlib>=3.3.4 in /usr/local/lib/python3.10/dist-packages (from bnlearn) (3.7.1)
Collecting numpy>=1.24.1 (from bnlearn)
  Downloading numpy-1.24.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (17.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17.3/17.3 MB 59.9 MB/s eta 0:00:00
Requirement already satisfied: pandas==1.5.3 in /usr/local/lib/python3.10/dist-packages (from bnlearn) (1.5.3)
Requirement already satisfied: tqdm in /usr/local/lib/python3.10/dist-packages (from bnlearn) (4.65.0)
Collecting ismember (from bnlearn)
  Downloading ismember-1.0.2-py3-none-any.whl (7.5 kB)
Requirement already satisfied: scikit-learn in /usr/local/lib/python3.10/dist-packages (from bnlearn) (1.2.2)
Collecting funcsigs (from bnlearn)
  Downloading funcsigs-1.0.2-py2.py3-none-any.whl (17 kB)
Requirement already satisfied: statsmodels in /usr/local/lib/python3.10/dist-packages (from bnlearn) (0.13.5)
Requirement already satisfied: python-louvain in /usr/local/lib/python3.10/dist-packages (from bnlearn) (0.16)
Requirement already satisfied: packaging in /usr/local/lib/python3.10/dist-packages (from bnlearn) (23.1)
Collecting df2onehot (from bnlearn)
  Downloading df2onehot-1.0.6-py3-none-any.whl (14 kB)
Requirement already satisfied: fsspec in /usr/local/lib/python3.10/dist-packages (from bnlearn) (2023.4.0)
Collecting pypickle (from bnlearn)
  Downloading pypickle-1.1.0-py3-none-any.whl (5.1 kB)
Requirement already satisfied: tabulate in /usr/local/lib/python3.10/dist-packages (from bnlearn) (0.8.10)
Requirement already satisfied: ipywidgets in /usr/local/lib/python3.10/dist-packages (from bnlearn) (7.7.1)
Collecting datazets (from bnlearn)
  Downloading datazets-0.1.6-py3-none-any.whl (10 kB)
Requirement already satisfied: python-dateutil>=2.8.1 in /usr/local/lib/python3.10/dist-packages (from pandas==1.5.3->bnlearn) (2.8.2)
Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.10/dist-packages (from pandas==1.5.3->bnlearn) (2022.7.1)
Requirement already satisfied: contourpy>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib>=3.3.4->bnlearn) (1.0.7)
Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.10/dist-packages (from matplotlib>=3.3.4->bnlearn) (0.11.0)
Requirement already satisfied: fonttools>=4.22.0 in /usr/local/lib/python3.10/dist-packages (from matplotlib>=3.3.4->bnlearn) (4.39.3)
Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib>=3.3.4->bnlearn) (1.4.4)
Requirement already satisfied: pillow>=6.2.0 in /usr/local/lib/python3.10/dist-packages (from matplotlib>=3.3.4->bnlearn) (8.4.0)
Requirement already satisfied: pyparsing>=2.3.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib>=3.3.4->bnlearn) (3.0.9)
Requirement already satisfied: scipy in /usr/local/lib/python3.10/dist-packages (from pgmpy>=0.1.18->bnlearn) (1.10.1)
Requirement already satisfied: torch in /usr/local/lib/python3.10/dist-packages (from pgmpy>=0.1.18->bnlearn) (2.0.1+cu118)
Requirement already satisfied: joblib in /usr/local/lib/python3.10/dist-packages (from pgmpy>=0.1.18->bnlearn) (1.2.0)
Requirement already satisfied: opt-einsum in /usr/local/lib/python3.10/dist-packages (from pgmpy>=0.1.18->bnlearn) (3.3.0)
Requirement already satisfied: requests in /usr/local/lib/python3.10/dist-packages (from datazets->bnlearn) (2.27.1)
Requirement already satisfied: ipykernel>=4.5.1 in /usr/local/lib/python3.10/dist-packages (from ipywidgets->bnlearn) (5.5.6)
Requirement already satisfied: ipython-genutils~=0.2.0 in /usr/local/lib/python3.10/dist-packages (from ipywidgets->bnlearn) (0.2.0)
Requirement already satisfied: traitlets>=4.3.1 in /usr/local/lib/python3.10/dist-packages (from ipywidgets->bnlearn) (5.7.1)
Requirement already satisfied: widgetsnbextension~=3.6.0 in /usr/local/lib/python3.10/dist-packages (from ipywidgets->bnlearn) (3.6.4)
Requirement already satisfied: ipython>=4.0.0 in /usr/local/lib/python3.10/dist-packages (from ipywidgets->bnlearn) (7.34.0)
Requirement already satisfied: jupyterlab-widgets>=1.0.0 in /usr/local/lib/python3.10/dist-packages (from ipywidgets->bnlearn) (3.0.7)
Requirement already satisfied: threadpoolctl>=2.0.0 in /usr/local/lib/python3.10/dist-packages (from scikit-learn->bnlearn) (3.1.0)
Requirement already satisfied: patsy>=0.5.2 in /usr/local/lib/python3.10/dist-packages (from statsmodels->bnlearn) (0.5.3)
Requirement already satisfied: jupyter-client in /usr/local/lib/python3.10/dist-packages (from ipykernel>=4.5.1->ipywidgets->bnlearn) (6.1.12)
Requirement already satisfied: tornado>=4.2 in /usr/local/lib/python3.10/dist-packages (from ipykernel>=4.5.1->ipywidgets->bnlearn) (6.3.1)
Requirement already satisfied: setuptools>=18.5 in /usr/local/lib/python3.10/dist-packages (from ipython>=4.0.0->ipywidgets->bnlearn) (67.7.2)
Collecting jedi>=0.16 (from ipython>=4.0.0->ipywidgets->bnlearn)
  Downloading jedi-0.18.2-py2.py3-none-any.whl (1.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.6/1.6 MB 70.9 MB/s eta 0:00:00
Requirement already satisfied: decorator in /usr/local/lib/python3.10/dist-packages (from ipython>=4.0.0->ipywidgets->bnlearn) (4.4.2)
Requirement already satisfied: pickleshare in /usr/local/lib/python3.10/dist-packages (from ipython>=4.0.0->ipywidgets->bnlearn) (0.7.5)
Requirement already satisfied: prompt-toolkit!=3.0.0,!=3.0.1,<3.1.0,>=2.0.0 in /usr/local/lib/python3.10/dist-packages (from ipython>=4.0.0->ipywidgets->bnlearn) (3.0.38)
Requirement already satisfied: pygments in /usr/local/lib/python3.10/dist-packages (from ipython>=4.0.0->ipywidgets->bnlearn) (2.14.0)
Requirement already satisfied: backcall in /usr/local/lib/python3.10/dist-packages (from ipython>=4.0.0->ipywidgets->bnlearn) (0.2.0)
Requirement already satisfied: matplotlib-inline in /usr/local/lib/python3.10/dist-packages (from ipython>=4.0.0->ipywidgets->bnlearn) (0.1.6)
Requirement already satisfied: pexpect>4.3 in /usr/local/lib/python3.10/dist-packages (from ipython>=4.0.0->ipywidgets->bnlearn) (4.8.0)
Requirement already satisfied: six in /usr/local/lib/python3.10/dist-packages (from patsy>=0.5.2->statsmodels->bnlearn) (1.16.0)
Requirement already satisfied: notebook>=4.4.1 in /usr/local/lib/python3.10/dist-packages (from widgetsnbextension~=3.6.0->ipywidgets->bnlearn) (6.4.8)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests->datazets->bnlearn) (1.26.15)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests->datazets->bnlearn) (2022.12.7)
Requirement already satisfied: charset-normalizer~=2.0.0 in /usr/local/lib/python3.10/dist-packages (from requests->datazets->bnlearn) (2.0.12)
Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests->datazets->bnlearn) (3.4)
Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from torch->pgmpy>=0.1.18->bnlearn) (3.12.0)
Requirement already satisfied: typing-extensions in /usr/local/lib/python3.10/dist-packages (from torch->pgmpy>=0.1.18->bnlearn) (4.5.0)
Requirement already satisfied: sympy in /usr/local/lib/python3.10/dist-packages (from torch->pgmpy>=0.1.18->bnlearn) (1.11.1)
Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from torch->pgmpy>=0.1.18->bnlearn) (3.1.2)
Requirement already satisfied: triton==2.0.0 in /usr/local/lib/python3.10/dist-packages (from torch->pgmpy>=0.1.18->bnlearn) (2.0.0)
Requirement already satisfied: cmake in /usr/local/lib/python3.10/dist-packages (from triton==2.0.0->torch->pgmpy>=0.1.18->bnlearn) (3.25.2)
Requirement already satisfied: lit in /usr/local/lib/python3.10/dist-packages (from triton==2.0.0->torch->pgmpy>=0.1.18->bnlearn) (16.0.5)
Requirement already satisfied: parso<0.9.0,>=0.8.0 in /usr/local/lib/python3.10/dist-packages (from jedi>=0.16->ipython>=4.0.0->ipywidgets->bnlearn) (0.8.3)
Requirement already satisfied: pyzmq>=17 in /usr/local/lib/python3.10/dist-packages (from notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets->bnlearn) (23.2.1)
Requirement already satisfied: argon2-cffi in /usr/local/lib/python3.10/dist-packages (from notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets->bnlearn) (21.3.0)
Requirement already satisfied: jupyter-core>=4.6.1 in /usr/local/lib/python3.10/dist-packages (from notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets->bnlearn) (5.3.0)
Requirement already satisfied: nbformat in /usr/local/lib/python3.10/dist-packages (from notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets->bnlearn) (5.8.0)
Requirement already satisfied: nbconvert in /usr/local/lib/python3.10/dist-packages (from notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets->bnlearn) (6.5.4)
Requirement already satisfied: nest-asyncio>=1.5 in /usr/local/lib/python3.10/dist-packages (from notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets->bnlearn) (1.5.6)
Requirement already satisfied: Send2Trash>=1.8.0 in /usr/local/lib/python3.10/dist-packages (from notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets->bnlearn) (1.8.0)
Requirement already satisfied: terminado>=0.8.3 in /usr/local/lib/python3.10/dist-packages (from notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets->bnlearn) (0.17.1)
Requirement already satisfied: prometheus-client in /usr/local/lib/python3.10/dist-packages (from notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets->bnlearn) (0.16.0)
Requirement already satisfied: ptyprocess>=0.5 in /usr/local/lib/python3.10/dist-packages (from pexpect>4.3->ipython>=4.0.0->ipywidgets->bnlearn) (0.7.0)
Requirement already satisfied: wcwidth in /usr/local/lib/python3.10/dist-packages (from prompt-toolkit!=3.0.0,!=3.0.1,<3.1.0,>=2.0.0->ipython>=4.0.0->ipywidgets->bnlearn) (0.2.6)
Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from jinja2->torch->pgmpy>=0.1.18->bnlearn) (2.1.2)
Requirement already satisfied: mpmath>=0.19 in /usr/local/lib/python3.10/dist-packages (from sympy->torch->pgmpy>=0.1.18->bnlearn) (1.3.0)
Requirement already satisfied: platformdirs>=2.5 in /usr/local/lib/python3.10/dist-packages (from jupyter-core>=4.6.1->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets->bnlearn) (3.3.0)
Requirement already satisfied: argon2-cffi-bindings in /usr/local/lib/python3.10/dist-packages (from argon2-cffi->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets->bnlearn) (21.2.0)
Requirement already satisfied: lxml in /usr/local/lib/python3.10/dist-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets->bnlearn) (4.9.2)
Requirement already satisfied: beautifulsoup4 in /usr/local/lib/python3.10/dist-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets->bnlearn) (4.11.2)
Requirement already satisfied: bleach in /usr/local/lib/python3.10/dist-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets->bnlearn) (6.0.0)
Requirement already satisfied: defusedxml in /usr/local/lib/python3.10/dist-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets->bnlearn) (0.7.1)
Requirement already satisfied: entrypoints>=0.2.2 in /usr/local/lib/python3.10/dist-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets->bnlearn) (0.4)
Requirement already satisfied: jupyterlab-pygments in /usr/local/lib/python3.10/dist-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets->bnlearn) (0.2.2)
Requirement already satisfied: mistune<2,>=0.8.1 in /usr/local/lib/python3.10/dist-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets->bnlearn) (0.8.4)
Requirement already satisfied: nbclient>=0.5.0 in /usr/local/lib/python3.10/dist-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets->bnlearn) (0.7.4)
Requirement already satisfied: pandocfilters>=1.4.1 in /usr/local/lib/python3.10/dist-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets->bnlearn) (1.5.0)
Requirement already satisfied: tinycss2 in /usr/local/lib/python3.10/dist-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets->bnlearn) (1.2.1)
Requirement already satisfied: fastjsonschema in /usr/local/lib/python3.10/dist-packages (from nbformat->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets->bnlearn) (2.16.3)
Requirement already satisfied: jsonschema>=2.6 in /usr/local/lib/python3.10/dist-packages (from nbformat->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets->bnlearn) (4.3.3)
Requirement already satisfied: attrs>=17.4.0 in /usr/local/lib/python3.10/dist-packages (from jsonschema>=2.6->nbformat->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets->bnlearn) (23.1.0)
Requirement already satisfied: pyrsistent!=0.17.0,!=0.17.1,!=0.17.2,>=0.14.0 in /usr/local/lib/python3.10/dist-packages (from jsonschema>=2.6->nbformat->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets->bnlearn) (0.19.3)
Requirement already satisfied: cffi>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from argon2-cffi-bindings->argon2-cffi->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets->bnlearn) (1.15.1)
Requirement already satisfied: soupsieve>1.2 in /usr/local/lib/python3.10/dist-packages (from beautifulsoup4->nbconvert->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets->bnlearn) (2.4.1)
Requirement already satisfied: webencodings in /usr/local/lib/python3.10/dist-packages (from bleach->nbconvert->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets->bnlearn) (0.5.1)
Requirement already satisfied: pycparser in /usr/local/lib/python3.10/dist-packages (from cffi>=1.0.1->argon2-cffi-bindings->argon2-cffi->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets->bnlearn) (2.21)
Installing collected packages: funcsigs, pypickle, numpy, jedi, ismember, datazets, df2onehot, pgmpy, bnlearn
  Attempting uninstall: numpy
    Found existing installation: numpy 1.22.4
    Uninstalling numpy-1.22.4:
      Successfully uninstalled numpy-1.22.4
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
numba 0.56.4 requires numpy<1.24,>=1.18, but you have numpy 1.24.3 which is incompatible.
tensorflow 2.12.0 requires numpy<1.24,>=1.22, but you have numpy 1.24.3 which is incompatible.
Successfully installed bnlearn-0.8.0 datazets-0.1.6 df2onehot-1.0.6 funcsigs-1.0.2 ismember-1.0.2 jedi-0.18.2 numpy-1.24.3 pgmpy-0.1.22 pypickle-1.1.0
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Requirement already satisfied: datazets in /usr/local/lib/python3.10/dist-packages (0.1.6)
Requirement already satisfied: numpy in /usr/local/lib/python3.10/dist-packages (from datazets) (1.24.3)
Requirement already satisfied: pandas in /usr/local/lib/python3.10/dist-packages (from datazets) (1.5.3)
Requirement already satisfied: requests in /usr/local/lib/python3.10/dist-packages (from datazets) (2.27.1)
Requirement already satisfied: python-dateutil>=2.8.1 in /usr/local/lib/python3.10/dist-packages (from pandas->datazets) (2.8.2)
Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.10/dist-packages (from pandas->datazets) (2022.7.1)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests->datazets) (1.26.15)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests->datazets) (2022.12.7)
Requirement already satisfied: charset-normalizer~=2.0.0 in /usr/local/lib/python3.10/dist-packages (from requests->datazets) (2.0.12)
Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests->datazets) (3.4)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.10/dist-packages (from python-dateutil>=2.8.1->pandas->datazets) (1.16.0)
In [ ]:
# Import libraries
import bnlearn as bn
import datazets as dz
import numpy as np
In [ ]:
# Get the data science salary data set
df = dz.get('ds_salaries')
[datazets] >INFO> Import dataset [ds_salaries]
[datazets] >INFO> Downloading [ds_salaries.zip] dataset from github source..

Pre-processing steps

Complexity is a major limitation

When features contain many categories, the number of probability distributions required to populate a conditional probability table (CPT) in a Bayesian network grows exponentially with the number of parent nodes associated with that table. In other words, the more categories a feature has, the more data is needed to obtain reliable results. Intuitively: each time the data is split hierarchically into categories, the number of samples within a single category becomes smaller, and a low number of samples per category directly reduces the statistical power. In our example, the feature job_title contains 99 unique titles, of which only 14 have more than 25 samples. To keep this feature and use it reliably, we need to aggregate some of the job titles. In the code section below, we aggregate the job titles into 7 main categories, which results in a good proportion of samples per category for modeling.
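To make the growth concrete, the small sketch below counts the cells in a CPT: one row per child state and one column per combination of parent states, so k parents with c states each give c^k columns. The helpers `cpt_cells` and `samples_per_column` are hypothetical (not part of bnlearn), the parent set (salary with parents experience_level and job_title) is chosen only for illustration, and the 4,000-record data set size is an assumed round number. The 5 salary bins and 4 experience levels match the pre-processing in this notebook.

```python
from math import prod

def cpt_cells(child_card, parent_cards):
    """Number of cells in a CPT: child states x product of parent cardinalities."""
    return child_card * prod(parent_cards)

def samples_per_column(n_samples, parent_cards):
    """Average number of records available per parent configuration (CPT column)."""
    return n_samples / prod(parent_cards)

# Hypothetical parent set: salary (5 bins) <- experience_level (4 levels), job_title
print(cpt_cells(5, [4, 99]))  # 99 raw job titles  -> 1980 cells
print(cpt_cells(5, [4, 7]))   # 7 grouped titles   -> 140 cells

# Assuming roughly 4,000 records (hypothetical size)
print(samples_per_column(4000, [4, 99]))  # about 10 records per column
print(samples_per_column(4000, [4, 7]))   # about 143 records per column
```

Grouping job_title from 99 to 7 categories shrinks the table by the same factor and leaves roughly 14 times more records to estimate each column.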

In [ ]:
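# Create new feature: country (Europe vs. the rest)
# Note: every location not in the list below, not only the US, is labeled 'USA'.
df['country'] = 'USA'
# ISO 3166-1 alpha-2 codes treated as Europe (note: 'CF' is the Central African Republic)
countries_europe = ['SM', 'DE', 'GB', 'ES', 'FR', 'RU', 'IT', 'NL', 'CH', 'CF', 'FI', 'UA', 'IE', 'GR', 'MK', 'RO', 'AL', 'LT', 'BA', 'LV', 'EE', 'AM', 'HR', 'SI', 'PT', 'HU', 'AT', 'SK', 'CZ', 'DK', 'BE', 'MD', 'MT']
# Use .loc instead of chained indexing to avoid SettingWithCopyWarning
df.loc[df['company_location'].isin(countries_europe), 'country'] = 'europe'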
In [ ]:
# Rename categorical codes to readable labels (exact-value matches, so no regex is needed)
df['experience_level'] = df['experience_level'].replace({'EN': 'Entry-level', 'MI': 'Junior Mid-level', 'SE': 'Intermediate Senior-level', 'EX': 'Expert Executive-level / Director'})
df['employment_type'] = df['employment_type'].replace({'PT': 'Part-time', 'FT': 'Full-time', 'CT': 'Contract', 'FL': 'Freelance'})
df['company_size'] = df['company_size'].replace({'S': 'Small (less than 50)', 'M': 'Medium (50 to 250)', 'L': 'Large (>250)'})
df['remote_ratio'] = df['remote_ratio'].replace({0: 'No remote', 50: 'Partially remote', 100: '>80% remote'})
In [ ]:
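# Group similar job titles; the first entry of each list is the group label.
# Matching is by substring and later groups overwrite earlier ones, so e.g.
# 'lead data scientist' ends up in 'lead/principal'. Short keys can also match
# unintended titles ('bi' matches 'big data engineer').
titles = [['data scientist', 'data science', 'research', 'applied', 'specialist', 'ai', 'machine learning'],
          ['engineer', 'etl'],
          ['analyst', 'bi', 'business', 'product', 'modeler', 'analytics'],
          ['manager', 'head', 'director'],
          ['architect', 'cloud', 'aws'],
          ['lead/principal', 'lead', 'principal'],
          ]

job_title = df['job_title'].str.lower().copy()
df['job_title'] = 'Other'
for t in titles:
    for name in t:
        # Use .loc with a literal (non-regex) substring match instead of chained indexing
        df.loc[job_title.str.contains(name, regex=False), 'job_title'] = t[0]

print(df['job_title'].value_counts())
# engineer          1654
# data scientist    1238
# analyst            902
# manager            158
# architect          118
# lead/principal      55
# Other                9
# Name: job_title, dtype: int64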

Discretize the continuous variable salary

We also need to discretize salary_in_usd, which can be done manually or with the discretize function in bnlearn. For demonstration purposes, the code supports both approaches (set discretize_method). For the bnlearn approach, we assume that salary depends on experience_level and country; based on these input variables, salary is then partitioned into bins (see code section below).

In [ ]:
import pandas as pd

# 'manual': fixed salary bins; any other value: let bnlearn discretize salary
discretize_method = 'manual'

if discretize_method == 'manual':
    salary_in_usd = df['salary_in_usd']
    # Remove redundant salary variables (salary_in_usd is re-added below as categories)
    df.drop(labels=['salary_currency', 'salary', 'salary_in_usd'], inplace=True, axis=1)
    # Bin salary into five ranges; right=False makes every bin [lower, upper)
    df['salary_in_usd'] = pd.cut(salary_in_usd,
                                 bins=[-np.inf, 60000, 100000, 160000, 250000, np.inf],
                                 labels=['<60K', '60-100K', '100-160K', '160-250K', '>250K'],
                                 right=False).astype(str)
else:
    # Assume salary depends on experience_level and country
    tmpdf = df[['experience_level', 'salary_in_usd', 'country']]
    # Create edges
    edges = [('experience_level', 'salary_in_usd'), ('country', 'salary_in_usd')]
    # Create and plot the DAG based on these edges
    DAG = bn.make_DAG(edges)
    bn.plot(DAG)
    # Discretize the continuous column
    df_disc = bn.discretize(tmpdf, edges, ['salary_in_usd'], max_iterations=1)
    df['salary_in_usd'] = df_disc['salary_in_usd']
    print(df['salary_in_usd'].value_counts())
    # Remove redundant salary variables
    df.drop(labels=['salary_currency', 'salary'], inplace=True, axis=1)

Learn the causal DAG from the data

At this point, the data set is pre-processed and we are ready to learn the causal structure. bnlearn implements six methods for this task:

* 'hc' or 'hillclimbsearch' (default)
* 'ex' or 'exhaustivesearch'
* 'cs' or 'constraintsearch'
* 'cl' or 'chow-liu' (requires the root_node parameter)
* 'nb' or 'naivebayes' (requires the root_node parameter)
* 'tan' (requires the root_node and class_node parameters)
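Exhaustive search ('ex') scores every possible DAG, which is only feasible for a handful of variables because the number of DAGs grows super-exponentially with the number of nodes. The sketch below counts labeled DAGs on n nodes with Robinson's recurrence; `count_dags` is a hypothetical helper for illustration and not part of bnlearn.

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def count_dags(n: int) -> int:
    """Number of labeled DAGs on n nodes (Robinson's recurrence)."""
    if n == 0:
        return 1
    # Inclusion-exclusion over the k nodes that have no incoming edges
    return sum((-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * count_dags(n - k)
               for k in range(1, n + 1))

for n in range(1, 7):
    print(n, count_dags(n))  # 1, 3, 25, 543, 29281, 3781503
```

Calling `count_dags(10)` for the roughly ten columns used here already gives a 19-digit number, which is why a local search such as hill climbing is the practical default.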
In [ ]:
# %% Learn the causal DAG from the data

# Structure learning
model = bn.structure_learning.fit(df, methodtype='hc', scoretype='bic')
[bnlearn] >Computing best DAG using [hc]
[bnlearn] >Set scoring type at [bic]
[bnlearn] >Compute structure scores for model comparison (higher is better).
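With scoretype='bic', hill climbing compares candidate DAGs by their BIC score, which decomposes into one local score per node: the log-likelihood of the node given its parents, minus a penalty of 0.5 x log(N) for each free parameter, where a node with r states and q parent configurations has (r - 1) x q free parameters. The stdlib sketch below computes this standard local score on a list of records; it illustrates the formula and is not bnlearn's or pgmpy's code, and `local_bic` is a hypothetical name.

```python
from collections import Counter
from math import log, prod

def local_bic(rows, child, parents):
    """BIC local score of `child` given `parents`, computed from a list of dict records.

    score = sum_{j,k} N_jk * log(N_jk / N_j) - 0.5 * log(N) * (r - 1) * q
    """
    n = len(rows)
    r = len({row[child] for row in rows})                      # number of child states
    q = prod(len({row[p] for row in rows}) for p in parents)   # parent configurations (1 if none)
    joint = Counter((tuple(row[p] for p in parents), row[child]) for row in rows)
    marginal = Counter(tuple(row[p] for p in parents) for row in rows)
    loglik = sum(n_jk * log(n_jk / marginal[cfg]) for (cfg, _), n_jk in joint.items())
    return loglik - 0.5 * log(n) * (r - 1) * q

# Toy data: B is an exact copy of A
rows = [{'A': a, 'B': a} for a in ['x', 'y'] * 50]
print(local_bic(rows, 'B', []))     # B without parents
print(local_bic(rows, 'B', ['A']))  # B with parent A: higher (less negative), so the edge is kept
```

The extra parent doubles the penalty, but the gain in log-likelihood is much larger, so the score prefers the edge A -> B; with weakly related variables the penalty would win instead.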
In [ ]:
# Test the learned edges for (in)dependence; prune=False keeps every edge, even non-significant ones
model = bn.independence_test(model, df, prune=False)
[bnlearn] >Compute edge strength with [chi_square]
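For two categorical variables, the core of a chi-square independence test is Pearson's statistic on their contingency table; the p-value then follows from a chi-square distribution with (rows - 1) x (columns - 1) degrees of freedom. The minimal stdlib sketch below computes the statistic and the degrees of freedom only. It is an illustration under that assumption; the test bnlearn runs may differ in details (for example conditioning sets or corrections), and `chi_square` is a hypothetical name.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table.

    `table` is a list of rows of observed counts; rows or columns whose total is zero
    must be removed first, otherwise an expected count of zero causes a division by zero.
    """
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n  # count expected under independence
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

stat, dof = chi_square([[10, 20], [20, 10]])
print(stat, dof)  # about 6.67 with 1 degree of freedom
```

A large statistic relative to its degrees of freedom (and hence a small p-value) indicates that the two variables of an edge are dependent.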
In [ ]:
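# Check best scoring type
# Note: the log above states that higher scores are better, so min() returns the type with
# the lowest score. Scores of different types are also on different scales.
best_scoretype = min(model['structure_scores'], key=lambda k: model['structure_scores'][k])
print(best_scoretype)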
bic
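The next cell estimates the CPDs with methodtype="bayes". Bayesian estimation adds Dirichlet pseudo-counts to the observed counts, so parent/child combinations that never occur still get a small non-zero probability. The sketch below shows the idea with a BDeu prior, where each cell receives a pseudo-count of ESS / (r x q). It assumes an equivalent sample size (ESS) of 5, which is pgmpy's BayesianEstimator default; the value bnlearn actually uses may differ, and `bdeu_cpt` is a hypothetical name.

```python
from collections import Counter
from itertools import product
from math import prod

def bdeu_cpt(rows, child, parents, ess=5.0):
    """CPT of `child` given `parents` with a BDeu prior, from a list of dict records.

    P(child=k | parents=j) = (N_jk + alpha) / (N_j + r * alpha), alpha = ess / (r * q)
    """
    states = {v: sorted({row[v] for row in rows}) for v in [child, *parents]}
    r = len(states[child])
    q = prod(len(states[p]) for p in parents)  # 1 when there are no parents
    alpha = ess / (r * q)
    counts = Counter((tuple(row[p] for p in parents), row[child]) for row in rows)
    cpt = {}
    # One column per parent configuration, including configurations never observed
    for cfg in product(*(states[p] for p in parents)):
        n_cfg = sum(counts[(cfg, s)] for s in states[child])
        cpt[cfg] = {s: (counts[(cfg, s)] + alpha) / (n_cfg + r * alpha) for s in states[child]}
    return cpt

rows = [{'A': 'x', 'B': 'y'}] * 3 + [{'A': 'x', 'B': 'n'}] + [{'A': 'z', 'B': 'n'}] * 2
cpt = bdeu_cpt(rows, 'B', ['A'])
print(cpt[('z',)])  # B='y' was never observed with A='z' but still gets probability > 0
```

This smoothing is also why rare categories, such as the many company locations with only a few records, receive small but non-zero probabilities in the CPDs below.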
In [ ]:
# Parameter learning
model = bn.parameter_learning.fit(model, df, methodtype="bayes")
[bnlearn] >Parameter learning> Computing parameters using [bayes]
[bnlearn] >Converting [<class 'pgmpy.base.DAG.DAG'>] to BayesianNetwork model.
[bnlearn] >Converting adjmat to BayesianNetwork.
[bnlearn] >CPD of work_year:
+-----------------+-----+--------------------------------+
| remote_ratio    | ... | remote_ratio(Partially remote) |
+-----------------+-----+--------------------------------+
| work_year(2020) | ... | 0.19636135508155586            |
+-----------------+-----+--------------------------------+
| work_year(2021) | ... | 0.2998745294855709             |
+-----------------+-----+--------------------------------+
| work_year(2022) | ... | 0.27728983688833125            |
+-----------------+-----+--------------------------------+
| work_year(2023) | ... | 0.22647427854454205            |
+-----------------+-----+--------------------------------+
[bnlearn] >CPD of company_size:
+-----+--------------------------------+
| ... | remote_ratio(Partially remote) |
+-----+--------------------------------+
| ... | work_year(2023)                |
+-----+--------------------------------+
| ... | 0.39704524469067404            |
+-----+--------------------------------+
| ... | 0.322253000923361              |
+-----+--------------------------------+
| ... | 0.2807017543859649             |
+-----+--------------------------------+
[bnlearn] >CPD of remote_ratio:
+--------------------------------+-----+----------------------+
| country                        | ... | country(europe)      |
+--------------------------------+-----+----------------------+
| salary_in_usd                  | ... | salary_in_usd(>250K) |
+--------------------------------+-----+----------------------+
| remote_ratio(>80% remote)      | ... | 0.31152647975077885  |
+--------------------------------+-----+----------------------+
| remote_ratio(No remote)        | ... | 0.3769470404984424   |
+--------------------------------+-----+----------------------+
| remote_ratio(Partially remote) | ... | 0.31152647975077885  |
+--------------------------------+-----+----------------------+
[bnlearn] >CPD of employment_type:
+----------------------------+-----+--------------------------------+
| remote_ratio               | ... | remote_ratio(Partially remote) |
+----------------------------+-----+--------------------------------+
| employment_type(Contract)  | ... | 0.1606022584692597             |
+----------------------------+-----+--------------------------------+
| employment_type(Freelance) | ... | 0.164366373902133              |
+----------------------------+-----+--------------------------------+
| employment_type(Full-time) | ... | 0.5031367628607276             |
+----------------------------+-----+--------------------------------+
| employment_type(Part-time) | ... | 0.17189460476787952            |
+----------------------------+-----+--------------------------------+
[bnlearn] >CPD of company_location:
+----------------------+------------+
| company_location(AE) | 0.00341128 |
+----------------------+------------+
| company_location(AL) | 0.00282694 |
+----------------------+------------+
| company_location(AM) | 0.00282694 |
+----------------------+------------+
| company_location(AR) | 0.00360606 |
+----------------------+------------+
| company_location(AS) | 0.0032165  |
+----------------------+------------+
| company_location(AT) | 0.00380084 |
+----------------------+------------+
| company_location(AU) | 0.00574864 |
+----------------------+------------+
| company_location(BA) | 0.00282694 |
+----------------------+------------+
| company_location(BE) | 0.00341128 |
+----------------------+------------+
| company_location(BO) | 0.00282694 |
+----------------------+------------+
| company_location(BR) | 0.00555386 |
+----------------------+------------+
| company_location(BS) | 0.00282694 |
+----------------------+------------+
| company_location(CA) | 0.021331   |
+----------------------+------------+
| company_location(CF) | 0.00302172 |
+----------------------+------------+
| company_location(CH) | 0.00360606 |
+----------------------+------------+
| company_location(CL) | 0.00282694 |
+----------------------+------------+
| company_location(CN) | 0.00282694 |
+----------------------+------------+
| company_location(CO) | 0.0041904  |
+----------------------+------------+
| company_location(CR) | 0.00282694 |
+----------------------+------------+
| company_location(CZ) | 0.0032165  |
+----------------------+------------+
| company_location(DE) | 0.0141242  |
+----------------------+------------+
| company_location(DK) | 0.00341128 |
+----------------------+------------+
| company_location(DZ) | 0.00282694 |
+----------------------+------------+
| company_location(EE) | 0.00302172 |
+----------------------+------------+
| company_location(EG) | 0.00282694 |
+----------------------+------------+
| company_location(ES) | 0.0187989  |
+----------------------+------------+
| company_location(FI) | 0.0032165  |
+----------------------+------------+
| company_location(FR) | 0.00964424 |
+----------------------+------------+
| company_location(GB) | 0.042562   |
+----------------------+------------+
| company_location(GH) | 0.00302172 |
+----------------------+------------+
| company_location(GR) | 0.00535908 |
+----------------------+------------+
| company_location(HK) | 0.00282694 |
+----------------------+------------+
| company_location(HN) | 0.00282694 |
+----------------------+------------+
| company_location(HR) | 0.0032165  |
+----------------------+------------+
| company_location(HU) | 0.00302172 |
+----------------------+------------+
| company_location(ID) | 0.00302172 |
+----------------------+------------+
| company_location(IE) | 0.00399562 |
+----------------------+------------+
| company_location(IL) | 0.00302172 |
+----------------------+------------+
| company_location(IN) | 0.0139294  |
+----------------------+------------+
| company_location(IQ) | 0.00282694 |
+----------------------+------------+
| company_location(IR) | 0.00282694 |
+----------------------+------------+
| company_location(IT) | 0.00380084 |
+----------------------+------------+
| company_location(JP) | 0.0041904  |
+----------------------+------------+
| company_location(KE) | 0.00302172 |
+----------------------+------------+
| company_location(LT) | 0.00302172 |
+----------------------+------------+
| company_location(LU) | 0.0032165  |
+----------------------+------------+
| company_location(LV) | 0.00341128 |
+----------------------+------------+
| company_location(MA) | 0.00282694 |
+----------------------+------------+
| company_location(MD) | 0.00282694 |
+----------------------+------------+
| company_location(MK) | 0.00282694 |
+----------------------+------------+
| company_location(MT) | 0.00282694 |
+----------------------+------------+
| company_location(MX) | 0.00457996 |
+----------------------+------------+
| company_location(MY) | 0.00282694 |
+----------------------+------------+
| company_location(NG) | 0.0041904  |
+----------------------+------------+
| company_location(NL) | 0.0051643  |
+----------------------+------------+
| company_location(NZ) | 0.00282694 |
+----------------------+------------+
| company_location(PH) | 0.00302172 |
+----------------------+------------+
| company_location(PK) | 0.00341128 |
+----------------------+------------+
| company_location(PL) | 0.00360606 |
+----------------------+------------+
| company_location(PR) | 0.00341128 |
+----------------------+------------+
| company_location(PT) | 0.00535908 |
+----------------------+------------+
| company_location(RO) | 0.00302172 |
+----------------------+------------+
| company_location(RU) | 0.00341128 |
+----------------------+------------+
| company_location(SE) | 0.00302172 |
+----------------------+------------+
| company_location(SG) | 0.00380084 |
+----------------------+------------+
| company_location(SI) | 0.00341128 |
+----------------------+------------+
| company_location(SK) | 0.00282694 |
+----------------------+------------+
| company_location(SM) | 0.00282694 |
+----------------------+------------+
| company_location(TH) | 0.0032165  |
+----------------------+------------+
| company_location(TR) | 0.00360606 |
+----------------------+------------+
| company_location(UA) | 0.00341128 |
+----------------------+------------+
| company_location(US) | 0.654366   |
+----------------------+------------+
| company_location(VN) | 0.00282694 |
+----------------------+------------+
| company_location(ZA) | 0.00282694 |
+----------------------+------------+
[bnlearn] >CPD of country:
+------------------+-----+----------------------+
| company_location | ... | company_location(ZA) |
+------------------+-----+----------------------+
| country(USA)     | ... | 0.5344506517690876   |
+------------------+-----+----------------------+
| country(europe)  | ... | 0.4655493482309125   |
+------------------+-----+----------------------+
[bnlearn] >CPD of employee_residence:
+------------------------+-----------------------+----------------------+
| country                | country(USA)          | country(europe)      |
+------------------------+-----------------------+----------------------+
| employee_residence(AE) | 0.002478839177750907  | 0.006256256256256257 |
+------------------------+-----------------------+----------------------+
| employee_residence(AM) | 0.0015114873035066505 | 0.007257257257257258 |
+------------------------+-----------------------+----------------------+
| employee_residence(AR) | 0.0036880290205562275 | 0.006256256256256257 |
+------------------------+-----------------------+----------------------+
| employee_residence(AS) | 0.0019951632406287785 | 0.006256256256256257 |
+------------------------+-----------------------+----------------------+
| employee_residence(AT) | 0.0015114873035066505 | 0.012262262262262262 |
+------------------------+-----------------------+----------------------+
| employee_residence(AU) | 0.00441354292623942   | 0.007257257257257258 |
+------------------------+-----------------------+----------------------+
| employee_residence(BA) | 0.0015114873035066505 | 0.007257257257257258 |
+------------------------+-----------------------+----------------------+
| employee_residence(BE) | 0.0017533252720677147 | 0.01026026026026026  |
+------------------------+-----------------------+----------------------+
| employee_residence(BG) | 0.0017533252720677147 | 0.006256256256256257 |
+------------------------+-----------------------+----------------------+
| employee_residence(BO) | 0.0022370012091898427 | 0.007257257257257258 |
+------------------------+-----------------------+----------------------+
| employee_residence(BR) | 0.005864570737605804  | 0.006256256256256257 |
+------------------------+-----------------------+----------------------+
| employee_residence(CA) | 0.02400241837968561   | 0.007257257257257258 |
+------------------------+-----------------------+----------------------+
| employee_residence(CF) | 0.0015114873035066505 | 0.008258258258258258 |
+------------------------+-----------------------+----------------------+
| employee_residence(CH) | 0.0015114873035066505 | 0.01026026026026026  |
+------------------------+-----------------------+----------------------+
| employee_residence(CL) | 0.0019951632406287785 | 0.006256256256256257 |
+------------------------+-----------------------+----------------------+
| employee_residence(CN) | 0.0017533252720677147 | 0.006256256256256257 |
+------------------------+-----------------------+----------------------+
| employee_residence(CO) | 0.0034461910519951633 | 0.006256256256256257 |
+------------------------+-----------------------+----------------------+
| employee_residence(CR) | 0.0017533252720677147 | 0.006256256256256257 |
+------------------------+-----------------------+----------------------+
| employee_residence(CY) | 0.0015114873035066505 | 0.007257257257257258 |
+------------------------+-----------------------+----------------------+
| employee_residence(CZ) | 0.0015114873035066505 | 0.008258258258258258 |
+------------------------+-----------------------+----------------------+
| employee_residence(DE) | 0.0019951632406287785 | 0.05630630630630631  |
+------------------------+-----------------------+----------------------+
| employee_residence(DK) | 0.0015114873035066505 | 0.009259259259259259 |
+------------------------+-----------------------+----------------------+
| employee_residence(DO) | 0.0015114873035066505 | 0.007257257257257258 |
+------------------------+-----------------------+----------------------+
| employee_residence(DZ) | 0.0017533252720677147 | 0.006256256256256257 |
+------------------------+-----------------------+----------------------+
| employee_residence(EE) | 0.0015114873035066505 | 0.007257257257257258 |
+------------------------+-----------------------+----------------------+
| employee_residence(EG) | 0.0019951632406287785 | 0.006256256256256257 |
+------------------------+-----------------------+----------------------+
| employee_residence(ES) | 0.0022370012091898427 | 0.08933933933933934  |
+------------------------+-----------------------+----------------------+
| employee_residence(FI) | 0.0015114873035066505 | 0.008258258258258258 |
+------------------------+-----------------------+----------------------+
| employee_residence(FR) | 0.002478839177750907  | 0.044294294294294295 |
+------------------------+-----------------------+----------------------+
| employee_residence(GB) | 0.0017533252720677147 | 0.20445445445445445  |
+------------------------+-----------------------+----------------------+
| employee_residence(GE) | 0.0015114873035066505 | 0.007257257257257258 |
+------------------------+-----------------------+----------------------+
| employee_residence(GH) | 0.0019951632406287785 | 0.006256256256256257 |
+------------------------+-----------------------+----------------------+
| employee_residence(GR) | 0.0017533252720677147 | 0.02127127127127127  |
+------------------------+-----------------------+----------------------+
| employee_residence(HK) | 0.0017533252720677147 | 0.007257257257257258 |
+------------------------+-----------------------+----------------------+
| employee_residence(HN) | 0.0017533252720677147 | 0.006256256256256257 |
+------------------------+-----------------------+----------------------+
| employee_residence(HR) | 0.0015114873035066505 | 0.009259259259259259 |
+------------------------+-----------------------+----------------------+
| employee_residence(HU) | 0.0017533252720677147 | 0.008258258258258258 |
+------------------------+-----------------------+----------------------+
| employee_residence(ID) | 0.0017533252720677147 | 0.006256256256256257 |
+------------------------+-----------------------+----------------------+
| employee_residence(IE) | 0.0015114873035066505 | 0.013263263263263263 |
+------------------------+-----------------------+----------------------+
| employee_residence(IL) | 0.0017533252720677147 | 0.006256256256256257 |
+------------------------+-----------------------+----------------------+
| employee_residence(IN) | 0.018198307134220073  | 0.008258258258258258 |
+------------------------+-----------------------+----------------------+
| employee_residence(IQ) | 0.0017533252720677147 | 0.006256256256256257 |
+------------------------+-----------------------+----------------------+
| employee_residence(IR) | 0.0017533252720677147 | 0.006256256256256257 |
+------------------------+-----------------------+----------------------+
| employee_residence(IT) | 0.0019951632406287785 | 0.014264264264264264 |
+------------------------+-----------------------+----------------------+
| employee_residence(JE) | 0.0017533252720677147 | 0.006256256256256257 |
+------------------------+-----------------------+----------------------+
| employee_residence(JP) | 0.0034461910519951633 | 0.006256256256256257 |
+------------------------+-----------------------+----------------------+
| employee_residence(KE) | 0.0019951632406287785 | 0.006256256256256257 |
+------------------------+-----------------------+----------------------+
| employee_residence(KW) | 0.0017533252720677147 | 0.006256256256256257 |
+------------------------+-----------------------+----------------------+
| employee_residence(LT) | 0.0015114873035066505 | 0.008258258258258258 |
+------------------------+-----------------------+----------------------+
| employee_residence(LU) | 0.0017533252720677147 | 0.006256256256256257 |
+------------------------+-----------------------+----------------------+
| employee_residence(LV) | 0.0015114873035066505 | 0.01026026026026026  |
+------------------------+-----------------------+----------------------+
| employee_residence(MA) | 0.0017533252720677147 | 0.006256256256256257 |
+------------------------+-----------------------+----------------------+
| employee_residence(MD) | 0.0015114873035066505 | 0.007257257257257258 |
+------------------------+-----------------------+----------------------+
| employee_residence(MK) | 0.0015114873035066505 | 0.007257257257257258 |
+------------------------+-----------------------+----------------------+
| employee_residence(MT) | 0.0015114873035066505 | 0.007257257257257258 |
+------------------------+-----------------------+----------------------+
| employee_residence(MX) | 0.003929866989117291  | 0.006256256256256257 |
+------------------------+-----------------------+----------------------+
| employee_residence(MY) | 0.0017533252720677147 | 0.006256256256256257 |
+------------------------+-----------------------+----------------------+
| employee_residence(NG) | 0.003929866989117291  | 0.006256256256256257 |
+------------------------+-----------------------+----------------------+
| employee_residence(NL) | 0.0017533252720677147 | 0.02027027027027027  |
+------------------------+-----------------------+----------------------+
| employee_residence(NZ) | 0.0017533252720677147 | 0.006256256256256257 |
+------------------------+-----------------------+----------------------+
| employee_residence(PH) | 0.0022370012091898427 | 0.006256256256256257 |
+------------------------+-----------------------+----------------------+
| employee_residence(PK) | 0.002962515114873035  | 0.008258258258258258 |
+------------------------+-----------------------+----------------------+
| employee_residence(PL) | 0.002720677146311971  | 0.007257257257257258 |
+------------------------+-----------------------+----------------------+
| employee_residence(PR) | 0.002720677146311971  | 0.006256256256256257 |
+------------------------+-----------------------+----------------------+
| employee_residence(PT) | 0.0022370012091898427 | 0.02127127127127127  |
+------------------------+-----------------------+----------------------+
| employee_residence(RO) | 0.0017533252720677147 | 0.008258258258258258 |
+------------------------+-----------------------+----------------------+
| employee_residence(RS) | 0.0015114873035066505 | 0.007257257257257258 |
+------------------------+-----------------------+----------------------+
| employee_residence(RU) | 0.0019951632406287785 | 0.008258258258258258 |
+------------------------+-----------------------+----------------------+
| employee_residence(SE) | 0.0019951632406287785 | 0.006256256256256257 |
+------------------------+-----------------------+----------------------+
| employee_residence(SG) | 0.002720677146311971  | 0.006256256256256257 |
+------------------------+-----------------------+----------------------+
| employee_residence(SI) | 0.0015114873035066505 | 0.01026026026026026  |
+------------------------+-----------------------+----------------------+
| employee_residence(SK) | 0.0015114873035066505 | 0.007257257257257258 |
+------------------------+-----------------------+----------------------+
| employee_residence(TH) | 0.0022370012091898427 | 0.006256256256256257 |
+------------------------+-----------------------+----------------------+
| employee_residence(TN) | 0.0015114873035066505 | 0.007257257257257258 |
+------------------------+-----------------------+----------------------+
| employee_residence(TR) | 0.002720677146311971  | 0.006256256256256257 |
+------------------------+-----------------------+----------------------+
| employee_residence(UA) | 0.0015114873035066505 | 0.01026026026026026  |
+------------------------+-----------------------+----------------------+
| employee_residence(US) | 0.8005441354292624    | 0.008258258258258258 |
+------------------------+-----------------------+----------------------+
| employee_residence(UZ) | 0.0019951632406287785 | 0.007257257257257258 |
+------------------------+-----------------------+----------------------+
| employee_residence(VN) | 0.0019951632406287785 | 0.007257257257257258 |
+------------------------+-----------------------+----------------------+
| employee_residence(ZA) | 0.0017533252720677147 | 0.006256256256256257 |
+------------------------+-----------------------+----------------------+
[bnlearn] >CPD of salary_in_usd:
+-------------------------+---------------------+---------------------+
| country                 | country(USA)        | country(europe)     |
+-------------------------+---------------------+---------------------+
| salary_in_usd(100-160K) | 0.39492140266021764 | 0.1841841841841842  |
+-------------------------+---------------------+---------------------+
| salary_in_usd(160-250K) | 0.31632406287787185 | 0.12512512512512514 |
+-------------------------+---------------------+---------------------+
| salary_in_usd(60-100K)  | 0.13905683192261184 | 0.2882882882882883  |
+-------------------------+---------------------+---------------------+
| salary_in_usd(<60K)     | 0.07351874244256348 | 0.2952952952952953  |
+-------------------------+---------------------+---------------------+
| salary_in_usd(>250K)    | 0.0761789600967352  | 0.10710710710710711 |
+-------------------------+---------------------+---------------------+
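Each column of a CPD is the conditional distribution of the child given one parent state, so every column should sum to 1. The minimal sketch below copies the values from the `salary_in_usd` table above into a plain dict, checks that each column sums to 1, and picks the most likely salary band for each country. The dict layout is only an illustration. It is not a bnlearn or pgmpy object.

```python
# P(salary_in_usd | country), values copied from the CPD printed above.
# Illustrative dict layout only; not a bnlearn/pgmpy data structure.
cpd_salary = {
    "USA": {
        "100-160K": 0.39492140266021764,
        "160-250K": 0.31632406287787185,
        "60-100K": 0.13905683192261184,
        "<60K": 0.07351874244256348,
        ">250K": 0.0761789600967352,
    },
    "europe": {
        "100-160K": 0.1841841841841842,
        "160-250K": 0.12512512512512514,
        "60-100K": 0.2882882882882883,
        "<60K": 0.2952952952952953,
        ">250K": 0.10710710710710711,
    },
}

for country, dist in cpd_salary.items():
    # Each column is a probability distribution over the child's states
    total = sum(dist.values())
    assert abs(total - 1.0) < 1e-9, f"column {country} sums to {total}"
    # Most probable salary band given this country
    best = max(dist, key=dist.get)
    print(f"{country}: most likely band = {best} (p={dist[best]:.3f})")
```

In a live session, the fitted pgmpy model should be available as `model['model']` (assumed from bnlearn's usual return dictionary), and pgmpy's `get_cpds('salary_in_usd')` returns the same table as a `TabularCPD`.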
[bnlearn] >CPD of experience_level:
+-----+----------------------+
| ... | salary_in_usd(>250K) |
+-----+----------------------+
| ... | 0.12085308056872038  |
+-----+----------------------+
| ... | 0.17298578199052134  |
+-----+----------------------+
| ... | 0.542654028436019    |
+-----+----------------------+
| ... | 0.16350710900473933  |
+-----+----------------------+
[bnlearn] >CPD of job_title:
+---------------------------+-----+----------------------+
| salary_in_usd             | ... | salary_in_usd(>250K) |
+---------------------------+-----+----------------------+
| job_title(Other)          | ... | 0.06770480704129994  |
+---------------------------+-----+----------------------+
| job_title(analyst)        | ... | 0.0961408259986459   |
+---------------------------+-----+----------------------+
| job_title(architect)      | ... | 0.0914014895057549   |
+---------------------------+-----+----------------------+
| job_title(data scientist) | ... | 0.21699390656736628  |
+---------------------------+-----+----------------------+
| job_title(engineer)       | ... | 0.3425863236289777   |
+---------------------------+-----+----------------------+
| job_title(lead/principal) | ... | 0.07718348002708192  |
+---------------------------+-----+----------------------+
| job_title(manager)        | ... | 0.10798916723087339  |
+---------------------------+-----+----------------------+
[bnlearn] >Compute structure scores for model comparison (higher is better).
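With the "higher is better" convention, BIC is the log-likelihood of the data minus a complexity penalty of 0.5 * log(N) * (number of free parameters). The minimal sketch below computes the local BIC score of one node given a set of parents from raw counts. The function `local_bic` and the toy data are hypothetical. pgmpy's own scorer may differ in details, for example in how unobserved parent configurations are counted.

```python
import math
from collections import Counter

def local_bic(rows, child, parents):
    """Local BIC score of `child` given `parents` (higher is better).

    rows: list of dicts mapping variable name -> observed state (hypothetical format).
    """
    n = len(rows)
    # N_jk: counts of (parent configuration j, child state k)
    joint = Counter((tuple(r[p] for p in parents), r[child]) for r in rows)
    # N_j: counts of each parent configuration j
    parent_counts = Counter(tuple(r[p] for p in parents) for r in rows)
    # Maximum-likelihood log-likelihood: sum_jk N_jk * log(N_jk / N_j)
    loglik = sum(n_jk * math.log(n_jk / parent_counts[j]) for (j, _), n_jk in joint.items())
    # Free parameters: (r - 1) per observed parent configuration
    r = len({row[child] for row in rows})
    q = len(parent_counts)
    penalty = 0.5 * math.log(n) * (r - 1) * q
    return loglik - penalty

# Hypothetical toy data: y is fully determined by x
rows = [{"x": "a", "y": "u"}] * 4 + [{"x": "b", "y": "v"}] * 4
print(local_bic(rows, "y", []))     # y without parents
print(local_bic(rows, "y", ["x"]))  # y with parent x scores higher
```

The total score of a network is the sum of these local scores over all nodes. That sum is what lets a search method such as hill climbing compare candidate structures.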
In [ ]:
# Plot
# bn.plot(model, params_static={'layout': 'planar_layout'}, title='method=hc and score=bic')
bn.plot(model, title='method=hc and score=bic', params_static={'figsize': (8, 6), 'arrowsize': 20, 'font_size': 10});
[bnlearn] >Set node properties.
[bnlearn]> Set edge weights based on the [chi_square] test statistic.
[bnlearn] >Set edge properties.
[bnlearn] >Plot based on Bayesian model
[Figure: learned network, method=hc and score=bic]
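The plot log states that edge weights are set from the chi-square test statistic. As a rough illustration only, and not bnlearn's internal code, the sketch below computes the Pearson chi-square statistic for a single edge from a contingency table using `scipy.stats.chi2_contingency`. The data is hypothetical. A larger statistic indicates a stronger dependence between the two nodes.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical observations for two connected nodes
data = pd.DataFrame({
    "x": ["a"] * 40 + ["b"] * 40,
    "y": ["u"] * 30 + ["v"] * 10 + ["u"] * 10 + ["v"] * 30,
})

# Contingency table of observed counts: [[30, 10], [10, 30]]
table = pd.crosstab(data["x"], data["y"])

# correction=False gives the plain Pearson statistic (no Yates continuity correction)
chi2, p_value, dof, expected = chi2_contingency(table, correction=False)
print(f"chi2={chi2:.1f}, dof={dof}, p={p_value:.2e}")
```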
In [ ]:
!pip install d3blocks
# bn.plot(model, interactive=True, params_interactive={'notebook': True});

Make inferences using do-calculus

In [ ]:
query = bn.inference.fit(model, variables=['job_title'],
                         evidence={'company_size': 'Large (>250)'})
[bnlearn] >Variable Elimination..
[bnlearn] >Data is stored in [query.df]
+----+----------------+-----------+
|    | job_title      |         p |
+====+================+===========+
|  0 | Other          | 0.0314356 |
+----+----------------+-----------+
|  1 | analyst        | 0.207313  |
+----+----------------+-----------+
|  2 | architect      | 0.0511753 |
+----+----------------+-----------+
|  3 | data scientist | 0.267323  |
+----+----------------+-----------+
|  4 | engineer       | 0.343019  |
+----+----------------+-----------+
|  5 | lead/principal | 0.0405911 |
+----+----------------+-----------+
|  6 | manager        | 0.0591429 |
+----+----------------+-----------+
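As the log shows, `bn.inference.fit` answers the query with variable elimination. The output is P(job_title | company_size='Large (>250)'). The sketch below works through the same idea by hand on a hypothetical three-node chain A -> B -> C with made-up CPDs. To get P(A | C=c1), it sums out B, multiplies by the prior of A, and normalizes. Strictly speaking, conditioning on evidence like this is an observational query. An intervention do(X=x) would instead cut the edges into X before running the same computation.

```python
# Hypothetical CPDs for a chain A -> B -> C (numbers are made up for illustration)
p_a = {"a0": 0.6, "a1": 0.4}
p_b_given_a = {"a0": {"b0": 0.7, "b1": 0.3}, "a1": {"b0": 0.2, "b1": 0.8}}
p_c_given_b = {"b0": {"c0": 0.9, "c1": 0.1}, "b1": {"c0": 0.3, "c1": 0.7}}

def posterior_a_given_c(c_obs):
    """P(A | C=c_obs) computed by eliminating (summing out) B."""
    # Eliminate B: tau(a) = sum_b P(b | a) * P(c_obs | b)
    tau = {a: sum(p_b_given_a[a][b] * p_c_given_b[b][c_obs] for b in p_c_given_b) for a in p_a}
    # Multiply by the prior P(a), then normalize over the states of A
    unnormalized = {a: p_a[a] * tau[a] for a in p_a}
    z = sum(unnormalized.values())  # equals P(C=c_obs)
    return {a: v / z for a, v in unnormalized.items()}

print(posterior_a_given_c("c1"))  # approximately {'a0': 0.42, 'a1': 0.58}
```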
In [ ]:
# Print the unique states of a few variables; evidence values must match these strings exactly
from tabulate import tabulate
print(tabulate([df['experience_level'].unique()], tablefmt="grid", headers="keys"))
print(tabulate([df['job_title'].unique()], tablefmt="grid", headers="keys"))
print(tabulate([df['remote_ratio'].unique()], tablefmt="grid", headers="keys"))
+------------------+---------------------------+-------------+-----------------------------------+
| 0                | 1                         | 2           | 3                                 |
+==================+===========================+=============+===================================+
| Junior Mid-level | Intermediate Senior-level | Entry-level | Expert Executive-level / Director |
+------------------+---------------------------+-------------+-----------------------------------+
+-----------+----------------+----------+---------+---------+----------------+-------+
| 0         | 1              | 2        | 3       | 4       | 5              | 6     |
+===========+================+==========+=========+=========+================+=======+
| architect | data scientist | engineer | analyst | manager | lead/principal | Other |
+-----------+----------------+----------+---------+---------+----------------+-------+
+-------------+-----------+------------------+
| 0           | 1         | 2                |
+=============+===========+==================+
| >80% remote | No remote | Partially remote |
+-------------+-----------+------------------+
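The evidence values passed to `bn.inference.fit` must match these state names exactly, including case and spacing. The small helper `invalid_evidence` below is hypothetical and not part of bnlearn. It compares an evidence dict against the unique values in a DataFrame, so typos can be caught before a query is run. The toy DataFrame reuses some of the state names listed above.

```python
import pandas as pd

def invalid_evidence(df, evidence):
    """Return (variable, value) pairs that do not occur in df (hypothetical helper)."""
    problems = []
    for var, value in evidence.items():
        # Flag unknown variable names and unknown states of known variables
        if var not in df.columns or value not in set(df[var].unique()):
            problems.append((var, value))
    return problems

# Toy DataFrame using state names printed above
toy = pd.DataFrame({
    "job_title": ["data scientist", "engineer", "analyst"],
    "remote_ratio": ["No remote", "Partially remote", ">80% remote"],
})
print(invalid_evidence(toy, {"job_title": "Data Scientist", "remote_ratio": "Partially remote"}))
# prints [('job_title', 'Data Scientist')]
```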
In [ ]:
query = bn.inference.fit(model,
                         variables=['salary_in_usd'],
                         evidence={'employment_type': 'Full-time',
                                   'remote_ratio': 'Partially remote',
                                   'job_title': 'data scientist',
                                   'employee_residence': 'DE',
                                   'experience_level': 'Intermediate Senior-level',
                                   },
                         )
[bnlearn] >Variable Elimination..
[bnlearn] >Data is stored in [query.df]
+----+-----------------+----------+
|    | salary_in_usd   |        p |
+====+=================+==========+
|  0 | 100-160K        | 0.254975 |
+----+-----------------+----------+
|  1 | 160-250K        | 0.25903  |
+----+-----------------+----------+
|  2 | 60-100K         | 0.177658 |
+----+-----------------+----------+
|  3 | <60K            | 0.184311 |
+----+-----------------+----------+
|  4 | >250K           | 0.124026 |
+----+-----------------+----------+
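The log notes that the result is stored in `query.df`. In the sketch below, the printed posterior is rebuilt as a DataFrame so it runs standalone. In a live session, `query.df` could be used directly, assuming it has the same `salary_in_usd` and `p` columns shown above. The code reads off the most likely band and the probability of earning at least 100K.

```python
import pandas as pd

# Posterior P(salary_in_usd | evidence), copied from the printed table above
posterior = pd.DataFrame({
    "salary_in_usd": ["100-160K", "160-250K", "60-100K", "<60K", ">250K"],
    "p": [0.254975, 0.25903, 0.177658, 0.184311, 0.124026],
})

# Most probable salary band
most_likely = posterior.loc[posterior["p"].idxmax(), "salary_in_usd"]

# Total probability of the bands at or above 100K
high_bands = ["100-160K", "160-250K", ">250K"]
p_100k_plus = posterior.loc[posterior["salary_in_usd"].isin(high_bands), "p"].sum()

print(most_likely, round(p_100k_plus, 3))  # 160-250K 0.638
```

The two top bands are nearly tied (0.259 vs 0.255), so with this evidence the model is fairly uncertain. The probability of earning at least 100K is a more stable summary than the single most likely band.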
