Classification Examples '''''''''''''''''''''''' Some of the described examples can also be found in the Colab notebooks: See `classification Colab notebook`_. .. _classification Colab notebook: https://colab.research.google.com/github/erdogant/hgboost/blob/master/notebooks/hgboost_classification_examples.ipynb xgboost two-class ------------------- Function documentation can be found here :func:`hgboost.hgboost.hgboost.xgboost` .. code:: python # Import library from hgboost import hgboost # Initialize hgb = hgboost(max_eval=250, threshold=0.5, cv=5, test_size=0.2, val_size=0.2, top_cv_evals=10, random_state=None, verbose=3) # Load example data set df = hgb.import_example() # Prepare data for classification y = df['Survived'].values del df['Survived'] X = hgb.preprocessing(df, verbose=0) # Fit best model with desired evaluation metric: results = hgb.xgboost(X, y, pos_label=1, eval_metric='f1') # [hgboost] >Start hgboost classification.. # [hgboost] >Collecting xgb_clf parameters. # [hgboost] >Number of variables in search space is [10], loss function: [f1]. # [hgboost] >method: xgb_clf # [hgboost] >eval_metric: f1 # [hgboost] >greater_is_better: True # [hgboost] >Total dataset: (891, 204) # [hgboost] >Hyperparameter optimization.. # Plot the parameter space hgb.plot_params() # Plot the summary results hgb.plot() # Plot the best performing tree hgb.treeplot() # Plot results on the validation set hgb.plot_validation() # Plot results on the cross-validation hgb.plot_cv() # Make new prdiction using the model (suppose that X is new and unseen data which is similarly prepared as for the learning process) y_pred, y_proba = hgb.predict(X) catboost ------------- Function documentation can be found here :func:`hgboost.hgboost.hgboost.catboost` .. code:: python # Import library from hgboost import hgboost # Initialize hgb = hgboost(max_eval=250, threshold=0.5, cv=5, test_size=0.2, val_size=0.2, top_cv_evals=10, random_state=None, verbose=3) # Load example data set df = hgb.import_example() # Prepare data for classification y = df['Survived'].values del df['Survived'] X = hgb.preprocessing(df, verbose=0) # Fit best model with desired evaluation metric: results = hgb.catboost(X, y, pos_label=1, eval_metric='auc') # [hgboost] >Start hgboost classification.. # [hgboost] >Collecting ctb_clf parameters. # [hgboost] >Number of variables in search space is [10], loss function: [auc]. # [hgboost] >method: ctb_clf # [hgboost] >eval_metric: auc # [hgboost] >greater_is_better: True # [hgboost] >Total dataset: (891, 204) # [hgboost] >Hyperparameter optimization.. # Plot the parameter space hgb.plot_params() # Plot the summary results hgb.plot() # Plot the best performing tree hgb.treeplot() # Plot results on the validation set hgb.plot_validation() # Plot results on the cross-validation hgb.plot_cv() # Make new prdiction using the model (suppose that X is new and unseen data which is similarly prepared as for the learning process) y_pred, y_proba = hgb.predict(X) lightboost ------------- Function documentation can be found here :func:`hgboost.hgboost.hgboost.lightboost` .. code:: python # Import library from hgboost import hgboost # Initialize hgb = hgboost(max_eval=250, threshold=0.5, cv=5, test_size=0.2, val_size=0.2, top_cv_evals=10, random_state=None, verbose=3) # Load example data set df = hgb.import_example() # Prepare data for classification y = df['Survived'].values del df['Survived'] X = hgb.preprocessing(df, verbose=0) # Fit best model with desired evaluation metric: results = hgb.lightboost(X, y, pos_label=1, eval_metric='auc') # [hgboost] >Start hgboost classification.. # [hgboost] >Collecting lgb_clf parameters. # [hgboost] >Number of variables in search space is [10], loss function: [auc]. # [hgboost] >method: lgb_clf # [hgboost] >eval_metric: auc # [hgboost] >greater_is_better: True # [hgboost] >Total dataset: (891, 204) # [hgboost] >Hyperparameter optimization.. # Plot the parameter space hgb.plot_params() # Plot the summary results hgb.plot() # Plot the best performing tree hgb.treeplot() # Plot results on the validation set hgb.plot_validation() # Plot results on the cross-validation hgb.plot_cv() # Make new prdiction using the model (suppose that X is new and unseen data which is similarly prepared as for the learning process) y_pred, y_proba = hgb.predict(X) Multi-classification Examples '''''''''''''''''''''''''''''' xgboost multi-class --------------------- Function documentation can be found here :func:`hgboost.hgboost.hgboost.xgboost` .. code:: python # Import library from hgboost import hgboost # Initialize hgb = hgboost(max_eval=250, threshold=0.5, cv=5, test_size=0.2, val_size=0.2, top_cv_evals=10, random_state=None, verbose=3) # Load example data set df = hgb.import_example() # Prepare data for classification y = df['Parch'].values y[y>=3]=3 del df['Parch'] X = hgb.preprocessing(df, verbose=0) # Fit best model with desired evaluation metric: results = hgb.xgboost(X, y, method='xgb_clf_multi', eval_metric='kappa') # [hgboost] >Start hgboost classification.. # [hgboost] >Collecting xgb_clf parameters # [hgboost] >Number of variables in search space is [10], loss function: [kappa] # [hgboost] >method: xgb_clf_multi # [hgboost] >eval_metric: kappa # [hgboost] >greater_is_better: True # [hgboost] >Total dataset: (891, 204) # [hgboost] >Hyperparameter optimization.. # Plot the parameter space hgb.plot_params() # Plot the summary results hgb.plot() # Plot the best performing tree hgb.treeplot() # Plot results on the validation set hgb.plot_validation() # Plot results on the cross-validation hgb.plot_cv() # Make new prdiction using the model (suppose that X is new and unseen data which is similarly prepared as for the learning process) y_pred, y_proba = hgb.predict(X) Regression Examples '''''''''''''''''''''''' Some of the described examples can also be found in the notebooks: See `regression Colab notebook`_. .. _regression Colab notebook: https://colab.research.google.com/github/erdogant/hgboost/blob/master/notebooks/hgboost_regression_examples.ipynb xgboost_reg ------------------- Function documentation can be found here :func:`hgboost.hgboost.hgboost.xgboost_reg` .. code:: python # Import library from hgboost import hgboost # Initialize hgb = hgboost(max_eval=250, threshold=0.5, cv=5, test_size=0.2, val_size=0.2, top_cv_evals=10, random_state=None) # Load example data set df = hgb.import_example() y = df['Age'].values del df['Age'] I = ~np.isnan(y) X = hgb.preprocessing(df, verbose=0) X = X.loc[I,:] y = y[I] # Fit best model with desired evaluation metric: results = hgb.xgboost_reg(X, y, eval_metric='rmse') # [hgboost] >Start hgboost regression.. # [hgboost] >Collecting xgb_reg parameters. # [hgboost] >Number of variables in search space is [10], loss function: [rmse]. # [hgboost] >method: xgb_reg # [hgboost] >eval_metric: rmse # [hgboost] >greater_is_better: True # [hgboost] >Total dataset: (891, 204) # [hgboost] >Hyperparameter optimization.. # Plot the parameter space hgb.plot_params() # Plot the summary results hgb.plot() # Plot the best performing tree hgb.treeplot() # Plot results on the validation set hgb.plot_validation() # Plot results on the cross-validation hgb.plot_cv() # Make new prdiction using the model (suppose that X is new and unseen data which is similarly prepared as for the learning process) y_pred, y_proba = hgb.predict(X) lightboost_reg ------------------- Function documentation can be found here :func:`hgboost.hgboost.hgboost.lightboost_reg` .. code:: python # Import library from hgboost import hgboost # Initialize hgb = hgboost(max_eval=250, threshold=0.5, cv=5, test_size=0.2, val_size=0.2, top_cv_evals=10, random_state=None) # Load example data set df = hgb.import_example() y = df['Age'].values del df['Age'] I = ~np.isnan(y) X = hgb.preprocessing(df, verbose=0) X = X.loc[I,:] y = y[I] # Fit best model with desired evaluation metric: results = hgb.lightboost_reg(X, y, eval_metric='rmse') # [hgboost] >Start hgboost regression.. # [hgboost] >Collecting lgb_reg parameters. # [hgboost] >Number of variables in search space is [10], loss function: [rmse]. # [hgboost] >method: lgb_reg # [hgboost] >eval_metric: rmse # [hgboost] >greater_is_better: True # [hgboost] >Total dataset: (891, 204) # [hgboost] >Hyperparameter optimization.. # Plot the parameter space hgb.plot_params() # Plot the summary results hgb.plot() # Plot the best performing tree hgb.treeplot() # Plot results on the validation set hgb.plot_validation() # Plot results on the cross-validation hgb.plot_cv() # Make new prdiction using the model (suppose that X is new and unseen data which is similarly prepared as for the learning process) y_pred, y_proba = hgb.predict(X) catboost_reg ------------------- Function documentation can be found here :func:`hgboost.hgboost.hgboost.catboost_reg` .. code:: python # Import library from hgboost import hgboost # Initialize hgb = hgboost(max_eval=250, threshold=0.5, cv=5, test_size=0.2, val_size=0.2, top_cv_evals=10, random_state=None) # Load example data set df = hgb.import_example() y = df['Age'].values del df['Age'] I = ~np.isnan(y) X = hgb.preprocessing(df, verbose=0) X = X.loc[I,:] y = y[I] # Fit best model with desired evaluation metric: results = hgb.catboost_reg(X, y, eval_metric='rmse') # [hgboost] >Start hgboost regression.. # [hgboost] >Collecting ctb_reg parameters. # [hgboost] >Number of variables in search space is [10], loss function: [rmse]. # [hgboost] >method: ctb_reg # [hgboost] >eval_metric: rmse # [hgboost] >greater_is_better: True # [hgboost] >Total dataset: (891, 204) # [hgboost] >Hyperparameter optimization.. # Plot the parameter space hgb.plot_params() # Plot the summary results hgb.plot() # Plot the best performing tree hgb.treeplot() # Plot results on the validation set hgb.plot_validation() # Plot results on the cross-validation hgb.plot_cv() # Make new prdiction using the model (suppose that X is new and unseen data which is similarly prepared as for the learning process) y_pred, y_proba = hgb.predict(X) Ensemble Examples '''''''''''''''''''''''' An ensemble is that each of the fitted models, such as xgboost, lightboost and catboost is even further combined into one function. The results are usually superior compared to single models. However, the model complexity increases and training time too. An ensemble can be created for both classification and the regression models. The function documentation can be found here :func:`hgboost.hgboost.hgboost.ensemble` Ensemble Classification ------------------------- It can be seen from the results that the ensemble classifier performs superior compared to all indiviudal models. .. code:: python # Import library from hgboost import hgboost # Initialize hgb = hgboost(max_eval=250, threshold=0.5, cv=5, test_size=0.2, val_size=0.2, top_cv_evals=10, random_state=None, verbose=3) # Import data df = hgb.import_example() y = df['Survived'].values del df['Survived'] X = hgb.preprocessing(df, verbose=0) # Fit ensemble model using the three boosting methods. By default these are readily set. results = hgb.ensemble(X, y, pos_label=1) # [hgboost] >Create ensemble regression model.. # [hgboost] >... # [hgboost] >Fit ensemble model with [soft] voting.. # [hgboost] >Evalute [ensemble] model on independent validation dataset (179 samples, 20%) # [hgboost] >[Ensemble] [auc]: -0.9788 on independent validation dataset # [hgboost] >[xgb_clf] [auc]: -0.8434 on independent validation dataset # [hgboost] >[ctb_clf] [auc]: -0.8875 on independent validation dataset # [hgboost] >[lgb_clf] [auc]: -0.8816 on independent validation dataset # use the predictor y_pred, y_proba = hgb.predict(X) # Plot hgb.plot_validation() Ensemble Regression ------------------------- It can be seen from the results that the ensemble classifier performs superior compared to all indiviudal models. .. code:: python # Import library from hgboost import hgboost # Initialize hgb = hgboost(max_eval=250, threshold=0.5, cv=5, test_size=0.2, val_size=0.2, top_cv_evals=10, random_state=None) # Load example data set df = hgb.import_example() y = df['Age'].values del df['Age'] I = ~np.isnan(y) X = hgb.preprocessing(df, verbose=0) X = X.loc[I,:] y = y[I] # Fit ensemble model using the three boosting methods: results = hgb.ensemble(X, y, methods=['xgb_reg','ctb_reg','lgb_reg']) # [hgboost] >Create ensemble regression model.. # [hgboost] >... # [hgboost] >Evalute [ensemble] model on independent validation dataset (143 samples, 20%). # [hgboost] >[Ensemble] [rmse]: 64.62 on independent validation dataset # [hgboost] >[xgb_reg] [rmse]: 172.2 on independent validation dataset # [hgboost] >[ctb_reg] [rmse]: 183 on independent validation dataset # [hgboost] >[lgb_reg] [rmse]: 205.9 on independent validation dataset # Make new prdiction using the model (suppose that X is new and unseen data which is similarly prepared as for the learning process) y_pred, y_proba = hgb.predict(X) # Plot hgb.plot_validation() Plots '''''''''''''''''''''''' For each model, the following 5 plots can be created: plot_params ------------------- Figure 1 depicts the density of the specific parameter values. As an example, the **gamma** parameter shows that most iterations converges towards value **0**. This may indicate that this parameter with this value has an important role in the in computing the optimal loss. Figure 2 depicts the iterations performed for hyper-optimization per parameter. In case of **colsample_bytree** we see a convergence towards the range [0.5-0.7]. In both figures, the parameters for all fitted models are plotted together with the **best** performing models with and without the **k-fold crossvalidation**. In addition, we also plot the top n performing models. The top performing models can be usefull to deeper examine the used parameter. Function documentation can be found here :func:`hgboost.hgboost.hgboost.plot_params` .. code:: python # Plot the parameter space hgb.plot_params() .. |figS1| image:: ../figs/plot_params_clf_1.png .. |figS2| image:: ../figs/plot_params_clf_2.png .. table:: Parameter plot :align: center +----------+ | |figS1| | +----------+ | |figS2| | +----------+ plot summary ------------------- This figure exists out of two subfigures. The top figure depicts all evaluated models with the loss score. The **best** performing models with and without the **k-fold crossvalidation** are depicted together with the top n performing models. The bottom figure depicts the train and test-error. Function documentation can be found here :func:`hgboost.hgboost.hgboost.plot` .. code:: python # Plot the summary results hgb.plot() .. |figS3| image:: ../figs/plot_clf.png .. table:: Summary plot of the results. :align: center +----------+ | |figS3| | +----------+ treeplot ------------------- Function documentation can be found here :func:`hgboost.hgboost.hgboost.treeplot` .. code:: python # Plot the best performing tree hgb.treeplot() .. |figS4| image:: ../figs/treeplot_clf_1.png .. |figS5| image:: ../figs/treeplot_clf_2.png .. table:: Best performing tree. :align: center +----------+ | |figS4| | +----------+ | |figS5| | +----------+ plot_validation ------------------- Function documentation can be found here :func:`hgboost.hgboost.hgboost.plot_validation` .. code:: python # Plot results on the validation set hgb.plot_validation() .. |figS6| image:: ../figs/plot_validation_clf_1.png .. table:: Results on the validation set. :align: center +----------+ | |figS6| | +----------+ .. |figS7| image:: ../figs/plot_validation_clf_2.png .. |figS8| image:: ../figs/plot_validation_clf_3.png .. table:: Results on the validation set. :align: center +----------+----------+ | |figS7| | |figS8| | +----------+----------+ plot_cv ------------------- Function documentation can be found here :func:`hgboost.hgboost.hgboost.plot_cv` .. code:: python # Plot results on the cross-validation hgb.plot_cv() .. |figS9| image:: ../figs/plot_cv_clf.png .. table:: results on the cross-validation. :align: center +----------+ | |figS9| | +----------+ .. include:: add_bottom.add