Efficient kernels generation

In the case of MKL algorithms, we may need to create list of kernels instead of a single one as the MKL algorithms require multiple kernels to perform the combination.

A list of multiple kernels can be easily computed by using list comprehensions. In the following example, we create a list of homogeneous polynomial kernels with degrees 1 up to 10

from MKLpy.metrics.pairwise import homogeneous_polynomial_kernel as hpk
KL_tr = [hpk(Xtr,     degree=d) for d in range(1,11)]
KL_te = [hpk(Xte,Xtr, degree=d) for d in range(1,11)]

This solution is really simple, but it leads to two big problems:

This specific computation is not efficient at all. Each homogeneous polynomial kernel $k(x,z) = \langle x,z\rangle^d$ requires two main operations, the dot product between examples and the explonentiation. It is easy to see that the exponentiation needs to be recomputed for each base kernels, but the result of the dot-product may be easily cached and reused for the whole list of kernels.
When dealing with large datasets, the available amount of memory may not be sufficient to handle the whole kernels list.

Note

Note that the caching is not doable for every possible kernels list

Generators

In order to alleviate these problems, we provide specific kernels generators providing an efficient computations and a lower memory consumption.

The same homogeneous polynomial kernels showed in the previous example may be computed with a generator:

from MKLpy.generators import HPK_generator
KLtr = HPK_generator(Xtr, degrees=range(1,11), cache=True)

Specifically, the generator performs a lazy computation, keeping a single kernel at a time in memory. The parameter cache allows a faster computation with the cost of a small amount of memory.

The generators currently available in MKLpy are:

Generator class	Scenario and use-case	Caching
Lambda_generator	generation of different heterogeneous kernels, for instance by mixing polynomial, rbf, and other custom functions	no
HPK_generator	generation of multiple homogeneous polynomial kernels with different degrees	yes
RBF_generator	generation of multiple rbf kernels with different gamma values	yes
Multiview_generator	generation of a specified kernel (e.g. linear) computed to each view of the input data (e.g. audio, video, or text). In this case, we have multiple feature vectors for the same example defining different views	no

Various examples are exposed in the following snippet

from MKLpy import generators
# 10 homogeneous poly. kernels 
KL_hpk = generators.HPK_generator(X, degrees=range(1,11))

# 3 rbf kernels
KL_rbf = generators.RBF_generator(X, gamma=[.001, .01, .1])

from MKLpy.metrics import pairwise
# 2 custom kernels (linear and polynomial)
ker_functions = [pairwise.linear, lambda X,Z : pairwise.polynomial(X,Z, degree=5)]
KL_mix = generators.Lambda_generator(X, ker_functions)

# 4 linear kernels from multiview
X1 = ...
X2 = ...    #each view consists of a specific group of features
X3 = ...
X4 = ...
KL_multi = Multiview_generator([X1, X2, X3, X4], kernel=pairwise.linear_kernel)

If you want to include an identity matrix when dealing with generators, you may simply set a parameter

KL_hpk = generators.HPK_generator(X, degrees=range(1,6), include_identity=True)

Benchmarking

In the Table below we show the time and memory consumption required by EasyMKL with 20 homogeneous polynomial kernels applied to two datasets, Madelon and Phishing.

Dataset	examples x features	list	generator w. cache	generator
Madelon	6000 x 5000	24.4[s], 7.3[GB]	28.0[s], 2.6[GB]	60.5[s], 2.3[GB]
Phishing	11055 x 68	115.7[s], 23.9[GB]	120.4[s], 6.2[GB]	126.8[s], 5.7[GB]

TL;DR Specifically, we show the resources usage when dealing with explicit lists of kernels and MKLpy generators. The values include the kernels computation and the algorithm training. What is striking from the table is the huge improvement in memory consumption (x3.2 and x4.2 reduction with Madelon and Phishing respectively), that is the main limitation of MKL algorithms.

Additionally, the benefits of the caching mechanism, that in the case of HPKs consists of a pre-computation of a linear kernel, are evident, especially in the case of Madelon dataset where the computation of the dot-product is particularly demanding.

Datasets

The datasets used in this experiment are freely available on
https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/

What if I do not care about memory usage?

explicit_kernels_list = my_generator.to_list()