Classification
The hgboost method consists of three classification methods: xgboost, catboost, and lightboost.
Each algorithm provides hyperparameters that very likely need to be tuned for a specific dataset. Although there are many hyperparameters to tune, some are more important than others. The parameters used in hgboost are listed below (a short hyperopt sketch follows the list):
- The number of trees or estimators.
- The learning rate.
- The row and column sampling rate for stochastic models.
- The maximum tree depth.
- The minimum child weight.
- The regularization terms alpha and lambda.
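The search spaces in the sections below are built from hyperopt primitives: hp.choice draws one element from a discrete set, and hp.quniform draws a uniform value over [low, high] quantized to multiples of q. The following is a minimal, self-contained sketch of how such a space behaves; it is illustrative only and not part of hgboost itself.

import numpy as np
from hyperopt import hp
from hyperopt.pyll.stochastic import sample

# Illustrative search space using the same primitives as the grids below
space = {
    # hp.choice: pick one element from a discrete set
    'n_estimators'  : hp.choice('n_estimators', range(20, 205, 5)),
    # hp.quniform: uniform over [0.1, 1], rounded to multiples of 0.01
    'subsample'     : hp.quniform('subsample', 0.1, 1, 0.01),
    # log-spaced grid between 0.005 and 0.5 for the learning rate
    'learning_rate' : hp.choice('learning_rate', np.logspace(np.log10(0.005), np.log10(0.5), base=10, num=1000)),
}

# Draw one random candidate from the space
print(sample(space))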
xgboost
The specific list of parameters used for xgboost: hgboost.hgboost.hgboost.xgboost()
# Parameters:
'learning_rate' : hp.choice('learning_rate', np.logspace(np.log10(0.005), np.log10(0.5), base = 10, num = 1000))
'max_depth' : hp.choice('max_depth', range(5, 32, 1))
'min_child_weight' : hp.quniform('min_child_weight', 1, 10, 1)
'gamma' : hp.choice('gamma', [0.5, 1, 1.5, 2, 3, 4, 5])
'subsample' : hp.quniform('subsample', 0.1, 1, 0.01)
'n_estimators' : hp.choice('n_estimators', range(20, 205, 5))
'colsample_bytree' : hp.quniform('colsample_bytree', 0.1, 1.0, 0.01)
'scale_pos_weight' : np.arange(0, 0.5, 1)
'booster' : 'gbtree'
'early_stopping_rounds' : 25
# In case of two-class classification
objective = 'binary:logistic'
# In case of multi-class classification
objective = 'multi:softprob'
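Which objective applies depends on whether the task is two-class or multi-class. Below is a minimal usage sketch for the two-class case; it assumes the public hgboost API (the hgboost class with max_eval and cv arguments, and the .xgboost() and .predict() methods), and all values are illustrative:

from sklearn.datasets import make_classification
from hgboost import hgboost

# Toy two-class dataset
X, y = make_classification(n_samples=500, n_features=10, random_state=42)

# max_eval: number of hyperopt evaluations; cv: k-fold cross-validation
hgb = hgboost(max_eval=100, cv=5, random_state=42)

# Two-class classification; pos_label marks the positive class,
# so objective='binary:logistic' applies here
results = hgb.xgboost(X, y, pos_label=1)

print(results['params'])          # best hyperparameters found
y_pred, y_proba = hgb.predict(X)  # predictions from the best model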
catboost
The specific list of parameters used for catboost: hgboost.hgboost.hgboost.catboost()
# Parameters:
'learning_rate' : hp.choice('learning_rate', np.logspace(np.log10(0.005), np.log10(0.31), base = 10, num = 1000))
'depth' : hp.choice('depth', np.arange(2, 16, 1, dtype=int))
'iterations' : hp.choice('iterations', np.arange(100, 1000, 100))
'l2_leaf_reg' : hp.choice('l2_leaf_reg', np.arange(1, 100, 2))
'border_count' : hp.choice('border_count', np.arange(5, 200, 1))
'thread_count' : 4
'early_stopping_rounds' : 25
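The catboost grid is searched through the same interface. A minimal sketch, assuming the .catboost() method mirrors .xgboost() and that the returned dict holds a 'summary' of all evaluated parameter sets:

from sklearn.datasets import make_classification
from hgboost import hgboost

# Toy two-class dataset
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

hgb = hgboost(max_eval=100, cv=5, random_state=0)
results = hgb.catboost(X, y, pos_label=1)

# All evaluated parameter sets with their scores
print(results['summary'])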
lightboost
The specific list of parameters used for lightboost: hgboost.hgboost.hgboost.lightboost()
# Parameters:
'learning_rate' : hp.choice('learning_rate', np.logspace(np.log10(0.005), np.log10(0.5), base = 10, num = 1000))
'max_depth' : hp.choice('max_depth', np.arange(5, 75, 1))
'boosting_type' : hp.choice('boosting_type', ['gbdt','goss','dart'])
'num_leaves' : hp.choice('num_leaves', np.arange(100, 1000, 100))
'n_estimators' : hp.choice('n_estimators', np.arange(20, 205, 5))
'subsample_for_bin' : hp.choice('subsample_for_bin', np.arange(20000, 300000, 20000))
'min_child_samples' : hp.choice('min_child_samples', np.arange(20, 500, 5))
'reg_alpha' : hp.quniform('reg_alpha', 0, 1, 0.01)
'reg_lambda' : hp.quniform('reg_lambda', 0, 1, 0.01)
'colsample_bytree' : hp.quniform('colsample_bytree', 0.6, 1, 0.01)
'subsample' : hp.quniform('subsample', 0.5, 1, 0.01)
'bagging_fraction' : hp.choice('bagging_fraction', np.arange(0.2, 1, 0.2))
'is_unbalance' : hp.choice('is_unbalance', [True, False])
'early_stopping_rounds' : 25
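A minimal sketch for lightboost, including two plotting helpers for inspecting the search; hgb.plot_params() and hgb.plot() are assumed here from the hgboost plotting API:

from sklearn.datasets import make_classification
from hgboost import hgboost

# Toy two-class dataset
X, y = make_classification(n_samples=500, n_features=10, random_state=1)

hgb = hgboost(max_eval=100, cv=5, random_state=1)
results = hgb.lightboost(X, y, pos_label=1)

# Distribution of evaluated hyperparameters and the optimization path
hgb.plot_params()
hgb.plot()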