Model selection

MKL algorithms may have different hyper-parameters that need to be tuned. This section introduces a few tools to select hyper-parameters and to simplify the validation process.

Train/test split

If we just want to divide our kernels list KL (and our labels vector Y) into training and test lists we can use the MKLpy.model_selection.train_test_split method

from MKLpy.model_selection import train_test_split
KL = [...]  #kernels list
Y = ...     #labels 
KLtr, KLte, Ytr, Yte = train_test_split(KL, Y, random_state=42, shuffle=True, test_size=.3)

Warning

MKLpy.model_selection.train_test_split provides a simple way to split a kernels list into a training and validation/test lists. However, this approach is not efficient. If you need to reduce the time required for kernels computation, consider to directly build a training and a validation/test lists instead of using this method.

Cross validation

However, the previous approach is really limited, and it can be used only for simple experimentations.

If we need something more complex, MKLpy provides simple routines to perform a cross-validation. In the following example, we perform a 3-fold cross-validation with EasyMKL (with default hyper-parameters)

from MKLpy.model_selection import cross_val_score
from MKLpy.algorithms import EasyMKL
mkl = EasyMKL()
scores = cross_val_score(KL, Y, mkl, n_folds=3, scoring='accuracy')
print (scores)  #accuracy for each fold

Finally, we can leverage the cross-validation to find the best hyper-parameters configuration. In the following example, we use a grid-search to select the best $\lambda$ for EasyMKL and $C$ for the base SVM.

from MKLpy.model_selection import cross_val_score
from MKLpy.algorithms import EasyMKL
from sklearn.svm import SVC
from itertools import product

KL, Y = ..., ...

lam_values = [0, 0.1, 0.2, 1]
C_values   = [0.01, 1, 100]
for lam, C in product(lam_values, C_values):
    svm = SVC(C=C)
    mkl = EasyMKL(lam=lam, learner=svm)
    scores = cross_val_score(KL, Y, mkl, n_folds=3, scoring='roc_auc')
    print (lam, C, scores)

The scoring mechanisms currently available are accuracy, roc_auc, and f_score.

Playing with folds

If you need to run a simple cross-validation, you can just specify an integer value with the n_folds parameter. Otherwise, if you need more control on the validation process, you can pass a splitter

from sklearn.model_selection import StratifiedKFold, LeaveOneOut

loo = LeaveOneOut()
mkl = AverageMKL()
scores = cross_val_score(KL, Y, mkl, n_folds=loo, scoring='accuracy')

See

Scikit-learn provides several splitters here. Choose the most appropriate.