Classification

The hgboost method consists of three classification methods: xgboost, catboost, and lightboost. Each algorithm provides hyperparameters that very likely need to be tuned for the dataset at hand. Although there are many hyperparameters to tune, some are more important than others. The parameters used in hgboost are listed below:

Parameters
  • The number of trees or estimators.

  • The learning rate.

  • The row and column sampling rate for stochastic models.

  • The maximum tree depth.

  • The minimum child weight.

  • The regularization terms alpha and lambda.
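Each of the parameters above is searched with hyperopt rather than set by hand: hgboost evaluates candidate combinations with cross-validation and keeps the best one. A minimal usage sketch for a two-class problem is shown below; the dataset, the number of evaluations, and the initialization values are illustrative assumptions rather than prescribed settings.

# Minimal sketch: let hgboost search the xgboost hyperparameter space for a two-class problem.
# The dataset, max_eval and pos_label values are illustrative assumptions.
from hgboost import hgboost
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True, as_frame=True)

# 250 hyperopt evaluations, scored with 5-fold cross-validation on an independent 20% test set.
hgb = hgboost(max_eval=250, cv=5, test_size=0.2, random_state=42)

# Run the search; pos_label marks the positive class in y.
results = hgb.xgboost(X, y, pos_label=1)

# results holds, among others, the best-found parameters and the fitted model.
print(results)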

xgboost

The specific list of parameters used for xgboost is defined in hgboost.hgboost.hgboost.xgboost():

# Parameters:
'learning_rate'     : hp.choice('learning_rate', np.logspace(np.log10(0.005), np.log10(0.5), base = 10, num = 1000))
'max_depth'         : hp.choice('max_depth', range(5, 32, 1))
'min_child_weight'  : hp.quniform('min_child_weight', 1, 10, 1)
'gamma'             : hp.choice('gamma', [0.5, 1, 1.5, 2, 3, 4, 5])
'subsample'         : hp.quniform('subsample', 0.1, 1, 0.01)
'n_estimators'      : hp.choice('n_estimators', range(20, 205, 5))
'colsample_bytree'  : hp.quniform('colsample_bytree', 0.1, 1.0, 0.01)
'scale_pos_weight'  : np.arange(0, 0.5, 1)
'booster'           : 'gbtree'
'early_stopping_rounds' : 25

# In case of two-class classification
objective = 'binary:logistic'
# In case of multi-class classification
objective = 'multi:softprob'
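The listing above is a hyperopt search space: hp.choice draws from a discrete set of values, while hp.quniform draws from a quantized uniform range. The sketch below shows, outside of hgboost, how such a space is consumed by hyperopt's fmin, with each candidate scored by cross-validation. It is an illustration of the mechanics, not the hgboost internals; the reduced ranges, dataset, and number of evaluations are assumptions.

# Illustrative sketch (not the hgboost internals): optimize a reduced xgboost space with hyperopt.
import numpy as np
from hyperopt import hp, fmin, tpe, Trials, STATUS_OK
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

X, y = load_breast_cancer(return_X_y=True)

# Reduced search space in the same style as the listing above (ranges are assumptions).
space = {
    'learning_rate'    : hp.choice('learning_rate', np.logspace(np.log10(0.005), np.log10(0.5), base=10, num=100)),
    'max_depth'        : hp.choice('max_depth', list(range(5, 16))),
    'min_child_weight' : hp.quniform('min_child_weight', 1, 10, 1),
    'subsample'        : hp.quniform('subsample', 0.5, 1, 0.05),
    'n_estimators'     : hp.choice('n_estimators', list(range(20, 205, 20))),
}

def objective(params):
    # hyperopt minimizes the returned loss, so use the negative cross-validated accuracy.
    model = XGBClassifier(objective='binary:logistic', **params)
    score = cross_val_score(model, X, y, cv=5, scoring='accuracy').mean()
    return {'loss': -score, 'status': STATUS_OK}

best = fmin(fn=objective, space=space, algo=tpe.suggest, max_evals=25, trials=Trials())
print(best)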

catboost

The specific list of parameters used for catboost is defined in hgboost.hgboost.hgboost.catboost():

# Parameters:
'learning_rate'     : hp.choice('learning_rate', np.logspace(np.log10(0.005), np.log10(0.31), base = 10, num = 1000))
'depth'             : hp.choice('depth', np.arange(2, 16, 1, dtype=int))
'iterations'        : hp.choice('iterations', np.arange(100, 1000, 100))
'l2_leaf_reg'       : hp.choice('l2_leaf_reg', np.arange(1, 100, 2))
'border_count'      : hp.choice('border_count', np.arange(5, 200, 1))
'thread_count'      : 4
'early_stopping_rounds' : 25
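For catboost the search space is consumed in the same way. The sketch below draws one random candidate from this space with hyperopt's sampler and maps it onto a CatBoostClassifier, making explicit what a single evaluation looks like; it illustrates the mechanics and is not the hgboost routine itself.

# Illustrative sketch: draw one concrete candidate from the catboost space and build the model.
import numpy as np
from hyperopt import hp
from hyperopt.pyll.stochastic import sample
from catboost import CatBoostClassifier

space = {
    'learning_rate' : hp.choice('learning_rate', np.logspace(np.log10(0.005), np.log10(0.31), base=10, num=1000)),
    'depth'         : hp.choice('depth', np.arange(2, 16, 1, dtype=int)),
    'iterations'    : hp.choice('iterations', np.arange(100, 1000, 100)),
    'l2_leaf_reg'   : hp.choice('l2_leaf_reg', np.arange(1, 100, 2)),
    'border_count'  : hp.choice('border_count', np.arange(5, 200, 1)),
}

# One random draw from the space; convert numpy scalars to plain Python types.
candidate = {key: val.item() for key, val in sample(space).items()}
print(candidate)

model = CatBoostClassifier(**candidate, thread_count=4, verbose=0)
# model.fit(X_train, y_train, eval_set=(X_val, y_val), early_stopping_rounds=25)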

lightboost

The specific list of parameters used for lightboost is defined in hgboost.hgboost.hgboost.lightboost():

# Parameters:
'learning_rate'     : hp.choice('learning_rate', np.logspace(np.log10(0.005), np.log10(0.5), base = 10, num = 1000))
'max_depth'         : hp.choice('max_depth', np.arange(5, 75, 1))
'boosting_type'     : hp.choice('boosting_type', ['gbdt','goss','dart'])
'num_leaves'        : hp.choice('num_leaves', np.arange(100, 1000, 100))
'n_estimators'      : hp.choice('n_estimators', np.arange(20, 205, 5))
'subsample_for_bin' : hp.choice('subsample_for_bin', np.arange(20000, 300000, 20000))
'min_child_samples' : hp.choice('min_child_samples', np.arange(20, 500, 5))
'reg_alpha'         : hp.quniform('reg_alpha', 0, 1, 0.01)
'reg_lambda'        : hp.quniform('reg_lambda', 0, 1, 0.01)
'colsample_bytree'  : hp.quniform('colsample_bytree', 0.6, 1, 0.01)
'subsample'         : hp.quniform('subsample', 0.5, 1, 0.01)
'bagging_fraction'  : hp.choice('bagging_fraction', np.arange(0.2, 1, 0.2))
'is_unbalance'      : hp.choice('is_unbalance', [True, False])
'early_stopping_rounds' : 25
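As with the other two methods, each candidate from this space is translated into a LightGBM model. The sketch below maps one example parameter set onto an LGBMClassifier; the concrete values are arbitrary illustrations, not tuned results. Note that bagging_fraction is LightGBM's alias for subsample, so only one of the two takes effect per model.

# Illustrative sketch: one example candidate from the space above, mapped onto LightGBM.
# The concrete values are arbitrary and not tuned results.
import lightgbm as lgb
from lightgbm import LGBMClassifier

params = {
    'learning_rate'     : 0.05,
    'max_depth'         : 10,
    'boosting_type'     : 'gbdt',
    'num_leaves'        : 100,
    'n_estimators'      : 100,
    'subsample_for_bin' : 200000,
    'min_child_samples' : 20,
    'reg_alpha'         : 0.1,
    'reg_lambda'        : 0.1,
    'colsample_bytree'  : 0.8,
    'subsample'         : 0.8,      # bagging_fraction is the LightGBM alias of subsample
    'is_unbalance'      : False,
}

model = LGBMClassifier(**params)
# Early stopping against a validation set, analogous to early_stopping_rounds=25 above:
# model.fit(X_train, y_train, eval_set=[(X_val, y_val)], callbacks=[lgb.early_stopping(25)])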