Examples: two-class model

In this example we train a two-class model and use its outputs y_true, y_proba and y_pred in classeval to evaluate the model.

# Import libraries
import classeval as clf
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier

# Load example dataset
X, y = clf.load_example('breast')
X_train, X_test, y_train, y_true = train_test_split(X, y, test_size=0.2)

# Train model (gb: a gradient boosting classifier; any scikit-learn classifier works)
gb = GradientBoostingClassifier()
model = gb.fit(X_train, y_train)
y_proba = model.predict_proba(X_test)[:,1]
y_pred = model.predict(X_test)

Now we can evaluate the model:

# Evaluate
out = clf.eval(y_true, y_proba, pos_label='malignant')

Print some results to screen:

# Print AUC score
print(out['auc'])

# Print f1-score
print(out['f1'])

# Show some results
print(out['report'])
#
#                 precision    recall  f1-score   support
#
#          False       0.96      0.96      0.96        70
#           True       0.93      0.93      0.93        44
#
#       accuracy                           0.95       114
#      macro avg       0.94      0.94      0.94       114
#   weighted avg       0.95      0.95      0.95       114
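
The output of clf.eval() behaves like a dictionary, so the full set of stored metrics can be listed. This is a minimal sketch that only assumes the dict-like access already used above:

# List all metrics stored in the evaluation output (assumes a dict-like result)
for key in out.keys():
    print(key)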

Plot the results using classeval.classeval.plot(). Four subplots are created:
  • top left: ROC curve

  • top right: CAP curve

  • bottom left: AP curve

  • bottom right: Probability curve

# Make plot
ax = clf.plot(out, figsize=(20,15), fontsize=14)

Class distribution in a bar graph


ROC in two-class

Plot ROC using:

# Compute ROC
out_ROC = clf.ROC.eval(y_true, y_proba, pos_label='malignant')

# Make plot
ax = clf.ROC.plot(out_ROC, title='Breast dataset')
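
As a sanity check, the same ROC/AUC can also be computed directly with scikit-learn. This is a minimal sketch that assumes y_true holds the string labels of the breast dataset and that 'malignant' is the positive class:

# Cross-check the ROC/AUC with scikit-learn ('malignant' as positive class)
from sklearn.metrics import roc_curve, auc
fpr, tpr, thresholds = roc_curve(y_true, y_proba, pos_label='malignant')
print('AUC: %.3f' % auc(fpr, tpr))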

Confmatrix in two-class

It is also possible to plot only the confusion matrix:

# Compute confmatrix
out_CONFMAT = clf.confmatrix.eval(y_true, y_pred, normalize=True)

# Make plot
clf.confmatrix.plot(out_CONFMAT, fontsize=18)
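
For comparison, the raw (unnormalized) counts can also be obtained straight from scikit-learn; a minimal sketch:

# Raw confusion matrix counts from scikit-learn for comparison
from sklearn.metrics import confusion_matrix
print(confusion_matrix(y_true, y_pred))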

Examples: multi-class model

In this example we train a multi-class model and use its outputs y_true, y_proba and y_pred in classeval to evaluate the model.

# Import libraries
import classeval as clf
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier

# Load example dataset
X, y = clf.load_example('iris')
X_train, X_test, y_train, y_true = train_test_split(X, y, test_size=0.5)

# Train model (gb: a gradient boosting classifier, as in the two-class example)
gb = GradientBoostingClassifier()
model = gb.fit(X_train, y_train)
y_pred = model.predict(X_test)
y_proba = model.predict_proba(X_test)
y_score = model.decision_function(X_test)

Let's evaluate the model results:

out = clf.eval(y_true, y_proba, y_score, y_pred)

Plot the results using classeval.classeval.plot():

# Make plot
ax = clf.plot(out)

Class distribution in a bar graph


ROC in multi-class

ROC uses the same function as for two-class.

# ROC evaluation
out_ROC = clf.ROC.eval(y_true, y_proba, y_score)
ax = clf.ROC.plot(out_ROC, title='Iris dataset')
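
For a numerical cross-check in the multi-class case, the macro-averaged one-vs-rest AUC can also be computed directly with scikit-learn. This is a minimal sketch that assumes the columns of y_proba follow the sorted class labels (which is how scikit-learn orders predict_proba output):

# Macro-averaged one-vs-rest AUC as a cross-check
# (columns of y_proba follow the sorted class labels, i.e. model.classes_)
from sklearn.metrics import roc_auc_score
print(roc_auc_score(y_true, y_proba, multi_class='ovr', average='macro'))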

Confmatrix in multi-class

Confmatrix uses the same function as for two-class.

# Confmatrix evaluation
out_CONFMAT = clf.confmatrix.eval(y_true, y_pred, normalize=False)
ax = clf.confmatrix.plot(out_CONFMAT)

Confusion matrix


Normalized confusion matrix

# Confusion matrix
out_CONFMAT = clf.confmatrix.eval(y_true, y_pred, normalize=True)
# Plot
ax = clf.confmatrix.plot(out_CONFMAT)

Model performance tweaking

It can be desirable to tweak the performance of the model, for example to adjust the number of false positives. With classeval it is easy to determine the most desirable model.

Let's start with a simple model.

# Load example dataset
X, y = clf.load_example('breast')
X_train, X_test, y_train, y_true = train_test_split(X, y, test_size=0.2)

# Fit model
model = gb.fit(X_train, y_train)
y_proba = model.predict_proba(X_test)[:,1]
y_pred = model.predict(X_test)

The default threshold value is 0.5 and gives these results:

# Set threshold at 0.5 (default)
out = clf.eval(y_true, y_proba, pos_label='malignant', threshold=0.5)

# [[73  0]
#  [ 1 40]]

# Make plot
_ = clf.TPFP(out['y_true'], out['y_proba'], threshold=0.5, showfig=True)
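
Under the hood, the threshold simply determines how the malignant-class probability is mapped to a class label. The sketch below illustrates that mapping manually; it assumes the two labels in the breast dataset are 'benign' and 'malignant' and uses scikit-learn's confusion_matrix for the counts:

# Map probabilities to labels with a custom threshold (illustrative sketch;
# assumes the two class labels are 'benign' and 'malignant')
import numpy as np
from sklearn.metrics import confusion_matrix
threshold = 0.2
y_pred_custom = np.where(y_proba >= threshold, 'malignant', 'benign')
print(confusion_matrix(y_true, y_pred_custom))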

Let's adjust the model by setting a different threshold:

# Set threshold at 0.2
out = clf.eval(y_true, y_proba, pos_label='malignant', threshold=0.2)

# [[72  1]
#  [ 0 41]]

# Make plot
_ = clf.TPFP(out['y_true'], out['y_proba'], threshold=0.2, showfig=True)

Cross-validation

Below is an example of plotting cross-validation results using classeval.

# Import libraries
import classeval as clf
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier

# Load example dataset
X, y = clf.load_example('breast')
# Create empty dict to store the results
out = {}

# Cross-validation with 10 random train/test splits
gb = GradientBoostingClassifier()
for i in range(0, 10):
    # Random train/test split
    X_train, X_test, y_train, y_true = train_test_split(X, y, test_size=0.2)
    # Train model and make predictions on the test set
    model = gb.fit(X_train, y_train)
    y_proba = model.predict_proba(X_test)[:,1]
    y_pred = model.predict(X_test)
    # Evaluate the model and store each evaluation
    name = 'cross ' + str(i)
    out[name] = clf.eval(y_true, y_proba, y_pred=y_pred, pos_label='malignant')

After running the cross-validation, the ROC/AUC of all runs can be plotted as follows:

# Plot the cross-validation results
clf.plot_cross(out, title='crossvalidation')
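
If a single summary number is needed, the per-run results stored in out can also be aggregated by hand. This minimal sketch only relies on the 'auc' key shown earlier:

# Average the AUC over the cross-validation runs (uses the 'auc' key shown earlier)
import numpy as np
aucs = [result['auc'] for result in out.values()]
print('Mean AUC: %.3f (std %.3f)' % (np.mean(aucs), np.std(aucs)))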