Algorithms
The hgboost library incorporates three gradient boosting methods: xgboost, catboost, and lightboost.
xgboost
Gradient boosting is also known as gradient tree boosting, stochastic gradient boosting (an extension), and gradient boosting machines, or GBM for short. Ensembles are constructed from decision tree models. Trees are added one at a time to the ensemble and fit to correct the prediction errors made by prior models. This is a type of ensemble machine learning model referred to as boosting. Models are fit using any arbitrary differentiable loss function and gradient descent optimization algorithm. This gives the technique its name, “gradient boosting,” as the loss gradient is minimized as the model is fit, much like a neural network [1].
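As a minimal sketch of this idea (using the standalone xgboost package on a synthetic two-class dataset; all parameter values are illustrative, not hgboost defaults):

```python
# Minimal sketch: fitting a gradient boosted tree ensemble with xgboost.
# Dataset and parameter values are illustrative only.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Each new tree is fit to correct the errors (loss gradient) of the ensemble so far.
model = XGBClassifier(n_estimators=100, learning_rate=0.1, max_depth=3)
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```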
catboost
CatBoost is a high-performance open-source library for gradient boosting on decision trees. It is developed by Yandex researchers and engineers, and is used for search, recommendation systems, personal assistants, self-driving cars, weather prediction, and many other tasks. It is open source and can be used by anyone. CatBoost is described as providing great results with default parameters, thereby reducing the time spent on parameter tuning [2]. Another described advantage is that CatBoost allows you to use non-numeric factors, instead of having to pre-process your data or spend time and effort turning them into numbers [2].
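A small sketch of that categorical handling, on a made-up frame (column names and values are illustrative):

```python
# Minimal sketch: CatBoost consumes string-valued (categorical) columns directly.
# The toy data and column names are illustrative only.
import pandas as pd
from catboost import CatBoostClassifier

df = pd.DataFrame({
    "color": ["red", "blue", "blue", "green", "red", "green"],
    "size": [1.2, 3.4, 2.2, 0.9, 1.8, 2.5],
    "label": [0, 1, 1, 0, 0, 1],
})

model = CatBoostClassifier(iterations=50, verbose=0)
# cat_features marks the non-numeric column; no manual encoding is needed.
model.fit(df[["color", "size"]], df["label"], cat_features=["color"])
print(model.predict(pd.DataFrame({"color": ["blue"], "size": [2.0]})))
```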
Although the default settings are described as adequate, I am not so sure about that. In any case, finding the optimal parameters is no issue when using hgboost.
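A sketch of such a search with hgboost could look as follows; the exact call signature, parameter names, and the keys of the returned dictionary are assumptions here, so consult the hgboost API documentation for the definitive form:

```python
# Sketch: hyperparameter search for a catboost model via hgboost.
# Call signature and result keys are assumptions; see the hgboost docs.
from hgboost import hgboost

hgb = hgboost(max_eval=250, cv=5, test_size=0.2, val_size=0.2, random_state=42)
results = hgb.catboost(X, y, pos_label=1)   # X: feature matrix, y: binary labels
# The returned dictionary is assumed to hold the best model and its parameters.
print(results["params"])
```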
lightboost
LightGBM is a gradient boosting framework that uses tree-based learning algorithms. Many boosting tools use pre-sort-based algorithms for decision tree learning, which is a simple solution but not easy to optimize. LightGBM instead uses histogram-based algorithms, which bucket continuous feature (attribute) values into discrete bins. This speeds up training and reduces memory usage. It is designed to be distributed and efficient, and is described with many advantages, such as fast training speed, high efficiency, low memory usage, better accuracy, support for parallel and GPU learning, and the capability of handling large-scale data [3].
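A minimal sketch with the lightgbm package (parameter values, including the bin count, are illustrative):

```python
# Minimal sketch: LightGBM buckets continuous features into at most `max_bin`
# discrete bins before growing trees. Values below are illustrative.
from sklearn.datasets import make_classification
import lightgbm as lgb

X, y = make_classification(n_samples=5000, n_features=50, random_state=0)

model = lgb.LGBMClassifier(n_estimators=200, num_leaves=31, max_bin=255, learning_rate=0.05)
model.fit(X, y)
print(model.predict(X[:5]))
```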
hyperopt
The method hyperopt is incorporated in the hgboost library for automated hyperparameter optimization, with the goal of providing practical tools that replace hand-tuning with a reproducible and unbiased optimization process. The approach is to expose the underlying expression graph of how a performance metric (e.g. classification accuracy on validation examples) is computed from hyperparameters that govern not only how individual processing steps are applied, but even which processing steps are included. A hyperparameter optimization algorithm transforms this graph into a program for optimizing that performance metric [4].
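As a minimal, self-contained sketch of this optimization loop with hyperopt (the objective and search space below are toy stand-ins, not the ones hgboost defines internally):

```python
# Minimal sketch: hyperopt minimizes an objective over a declared search space.
# The space and objective below are illustrative, not the ones hgboost uses.
from hyperopt import fmin, tpe, hp, Trials

def objective(params):
    # In hgboost this would be e.g. 1 - validation accuracy of a boosted model
    # trained with `params`; here a toy quadratic stands in for that metric.
    return (params["learning_rate"] - 0.1) ** 2 + (params["max_depth"] - 6) ** 2 * 1e-3

space = {
    "learning_rate": hp.uniform("learning_rate", 0.01, 0.5),
    "max_depth": hp.quniform("max_depth", 2, 12, 1),
}

trials = Trials()
best = fmin(fn=objective, space=space, algo=tpe.suggest, max_evals=100, trials=trials)
print(best)
```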
- References