Impute

The Bnlearn library provides two different imputation methods. In both methods, categorical columns are excluded first, and missing numerical values are imputed using either the KNN or MICE approach. Once the numerical values are imputed, the resulting DataFrame is used to build a Nearest Neighbors (NN) model. Finally, missing categorical values are imputed using the 1-NN model based on the imputed numerical data.

KNN Imputer

Impute missing values in a DataFrame using KNN imputation. Lets load a dataframe with missing values and perform the imputation.

# Initialize libraries
import bnlearn as bn
import pandas as pd

# Load the dataset
df = pd.read_csv('http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data-original', delim_whitespace=True, header=None, names=['mpg', 'cylinders', 'displacement', 'horsepower', 'weight', 'acceleration', 'model year', 'origin', 'car name'])

# Create some identifical rows as test-case
df.loc[1]=df.loc[0]
df.loc[11]=df.loc[10]
df.loc[50]=df.loc[20]

# Set few rows to None
index_nan = [0, 10, 20]
carnames = df['car name'].loc[index_nan]
df['car name'].loc[index_nan]=None
df.isna().sum()

# KNN imputer
dfnew = bn.knn_imputer(df, n_neighbors=3, weights='distance', string_columns=['car name'])
# Results
np.all(dfnew['car name'].loc[index_nan].values==carnames.values)

MICE Imputer

Impute missing values in a DataFrame using MICE imputation. It implements MICE using the function mice_imputer function that performs Multiple Imputation by Chained Equations (MICE) on numeric columns while handling string/categorical columns.

Key features include:

  • Supports MICE imputation for numeric columns.

  • String/categorical columns are encoded before imputation and restored post-imputation.

  • Includes options to specify the imputation estimator, number of iterations (max_iter), and verbosity level for logging.

  • Numeric columns are auto-identified and converted for imputation where necessary.

  • This enhancement improves missing data handling and supports mixed-type datasets.

Lets load a dataframe with missing values and perform the imputation.

# Initialize libraries
import bnlearn as bn
import pandas as pd

# Load the dataset
df = pd.read_csv('http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data-original', delim_whitespace=True, header=None, names=['mpg', 'cylinders', 'displacement', 'horsepower', 'weight', 'acceleration', 'model year', 'origin', 'car name'])

# Create some identifical rows as test-case
df.loc[1]=df.loc[0]
df.loc[11]=df.loc[10]
df.loc[50]=df.loc[20]

# Set few rows to None
index_nan = [0, 10, 20]
carnames = df['car name'].loc[index_nan]
df['car name'].loc[index_nan]=None
df.isna().sum()

# MICE imputer
dfnew = bn.mice_imputer(df, max_iter=5, string_columns='car name')
# Results
np.all(dfnew['car name'].loc[index_nan].values==carnames.values)