Support vector machines rely on taking the dot product between data points, $x_i \cdot x_j$. We can increase the complexity by transforming with a nonlinear mapping $\phi$, as $\phi(x_i) \cdot \phi(x_j)$. The nonlinear mapping can introduce higher (or even infinite) dimensional terms.

Instead, we define a similarity function (kernel) which implicitly defines the nonlinear feature map, $k(x_i, x_j) = \phi(x_i) \cdot \phi(x_j)$.

This is just a function of the original coordinates rather than a dot product over the transformed coordinates, so the transformation itself never has to be computed explicitly.

As an example, take a dataset with two features and map it to a polynomial order-2 feature space.

from math import sqrt
from sklearn.svm import SVC

# the explicit degree-2 polynomial transformation gives more
# features than just (x[0], x[1]); this phi matches (1 + x.x')^2
phiX = [(1, sqrt(2) * x[0], sqrt(2) * x[1],
         x[0]**2, x[1]**2, sqrt(2) * x[0] * x[1]) for x in X]

# a linear SVM on the transformed features:
# slow, since the feature map must be computed and stored
SVC(kernel="linear").fit(phiX, y)

# instead, use a kernel function
# k(x, x') = (1 + x.x')^2
# which is a function of only the original (x[0], x[1])
SVC(kernel="poly", degree=2, coef0=1, gamma=1).fit(X, y)
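
As a quick numerical check (a minimal sketch; the two points below are arbitrary), the kernel evaluated on the original coordinates equals the dot product of the explicitly transformed features:

import numpy as np

def phi(x):
    # explicit feature map matching k(x, x') = (1 + x.x')^2
    return np.array([1, np.sqrt(2) * x[0], np.sqrt(2) * x[1],
                     x[0]**2, x[1]**2, np.sqrt(2) * x[0] * x[1]])

a = np.array([1.0, 2.0])
b = np.array([3.0, 0.5])
print((1 + a @ b) ** 2)   # kernel on the original coordinates: 25.0
print(phi(a) @ phi(b))    # dot product in the transformed space: 25.0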

Although the order-2 polynomial feature space is finite and computationally feasible, some kernels, such as the radial basis function (RBF) kernel, correspond to an infinite-dimensional feature space, so the explicit transformation could never be computed at all.
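
To make the contrast concrete, here is a minimal sketch of the RBF kernel, $k(x, x') = \exp(-\gamma \lVert x - x' \rVert^2)$: its Taylor expansion contains polynomial terms of every order, so the implied feature map is infinite-dimensional, yet the kernel itself is a cheap closed form (this reuses the X and y from the example above; gamma=1.0 is an arbitrary choice):

import numpy as np
from sklearn.svm import SVC

def rbf_kernel(a, b, gamma=1.0):
    # k(a, b) = exp(-gamma * ||a - b||^2); the implied feature map
    # is infinite-dimensional, but the kernel is a closed form
    return np.exp(-gamma * np.sum((a - b) ** 2))

# sklearn evaluates the same kernel internally, never forming phi(x)
SVC(kernel="rbf", gamma=1.0).fit(X, y)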