Impute
========================

The ``bnlearn`` library provides two imputation methods: KNN and MICE.
Both follow the same procedure. Categorical columns are set aside first, and missing numerical values are imputed using either the KNN or the MICE approach.
The imputed numerical DataFrame is then used to fit a Nearest Neighbors (NN) model.
Finally, each missing categorical value is filled in from its single nearest neighbor (1-NN) in the imputed numerical data.


KNN Imputer
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Impute missing values in a DataFrame using KNN imputation.
Let's load a DataFrame with missing values and perform the imputation.

.. code-block:: python

    # Initialize libraries
    import bnlearn as bn
    import numpy as np
    import pandas as pd

    # Load the dataset (whitespace-separated, no header row)
    df = pd.read_csv(
        'http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data-original',
        sep=r'\s+',
        header=None,
        names=['mpg', 'cylinders', 'displacement', 'horsepower', 'weight',
               'acceleration', 'model year', 'origin', 'car name'],
    )

    # Create some identical rows as a test case: each row whose car name is
    # removed below keeps an exact numerical twin that still has its car name.
    df.loc[1] = df.loc[0]
    df.loc[11] = df.loc[10]
    df.loc[50] = df.loc[20]

    # Store the original car names, then set them to None in a few rows
    index_nan = [0, 10, 20]
    carnames = df.loc[index_nan, 'car name'].copy()
    df.loc[index_nan, 'car name'] = None

    # Count the missing values per column
    print(df.isna().sum())

    # KNN imputer
    dfnew = bn.knn_imputer(df, n_neighbors=3, weights='distance', string_columns=['car name'])

    # Results: check whether the imputed car names match the original ones
    print(np.all(dfnew.loc[index_nan, 'car name'].values == carnames.values))


MICE Imputer
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Impute missing values in a DataFrame using MICE imputation.
The ``mice_imputer`` function performs Multiple Imputation by Chained Equations (MICE) on numeric columns while also handling string/categorical columns.
Key features include:

* Supports MICE imputation for numeric columns.
* String/categorical columns are encoded before imputation and restored post-imputation.
* Includes options to specify the imputation estimator, the number of iterations (``max_iter``), and the verbosity level for logging.
* Numeric columns are auto-identified and converted for imputation where necessary.
* Mixed-type datasets, with both numeric and string/categorical columns, are supported.

Let's load a DataFrame with missing values and perform the imputation.

.. code-block:: python

    # Initialize libraries
    import bnlearn as bn
    import numpy as np
    import pandas as pd

    # Load the dataset (whitespace-separated, no header row)
    df = pd.read_csv(
        'http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data-original',
        sep=r'\s+',
        header=None,
        names=['mpg', 'cylinders', 'displacement', 'horsepower', 'weight',
               'acceleration', 'model year', 'origin', 'car name'],
    )

    # Create some identical rows as a test case: each row whose car name is
    # removed below keeps an exact numerical twin that still has its car name.
    df.loc[1] = df.loc[0]
    df.loc[11] = df.loc[10]
    df.loc[50] = df.loc[20]

    # Store the original car names, then set them to None in a few rows
    index_nan = [0, 10, 20]
    carnames = df.loc[index_nan, 'car name'].copy()
    df.loc[index_nan, 'car name'] = None

    # Count the missing values per column
    print(df.isna().sum())

    # MICE imputer
    dfnew = bn.mice_imputer(df, max_iter=5, string_columns='car name')

    # Results: check whether the imputed car names match the original ones
    print(np.all(dfnew.loc[index_nan, 'car name'].values == carnames.values))


.. include:: add_bottom.add
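
As an illustration of the two-stage procedure described at the top of this page, the sketch below imputes the numeric columns first and then fills the categorical column from the single nearest neighbor.
It is a minimal sketch built on scikit-learn's ``KNNImputer`` and ``NearestNeighbors``, not the ``bnlearn`` implementation itself; the helper name ``two_stage_impute`` and the toy data are assumptions made for this example.

.. code-block:: python

    import numpy as np
    import pandas as pd
    from sklearn.impute import KNNImputer
    from sklearn.neighbors import NearestNeighbors


    def two_stage_impute(df, string_columns, n_neighbors=3):
        """Hypothetical helper: KNN-impute numeric columns, then fill each
        categorical column from the nearest row with a known value."""
        out = df.copy()
        numeric_columns = [c for c in out.columns if c not in string_columns]

        # Stage 1: impute the numeric columns only.
        imputer = KNNImputer(n_neighbors=n_neighbors, weights='distance')
        out[numeric_columns] = imputer.fit_transform(out[numeric_columns])

        # Stage 2: for every categorical column, copy the value of the closest
        # row (in the imputed numeric space) whose value is known.
        for col in string_columns:
            missing = out[col].isna()
            if not missing.any() or missing.all():
                continue
            nn = NearestNeighbors(n_neighbors=1)
            nn.fit(out.loc[~missing, numeric_columns].to_numpy())
            _, idx = nn.kneighbors(out.loc[missing, numeric_columns].to_numpy())
            donors = out.loc[~missing, col].to_numpy()
            out.loc[missing, col] = donors[idx[:, 0]]
        return out


    # Small synthetic dataset (values are made up for illustration).
    toy = pd.DataFrame({
        'weight': [1000.0, 1010.0, 3000.0, 3020.0, np.nan],
        'power': [50.0, 52.0, 150.0, np.nan, 149.0],
        'label': ['small', None, 'large', None, 'large'],
    })
    result = two_stage_impute(toy, string_columns=['label'], n_neighbors=2)
    print(result)

Because the categorical values are copied from the closest complete row, the quality of the second stage depends directly on how well the numeric imputation in the first stage preserves the neighborhood structure.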