Gaussian processes (GPs) scale as $\mathcal{O}(n^3)$, where $n$ is the number of data points, which is problematic for large datasets.

By parametrising the GP as a stochastic differential equation, we can reformulate the GP regression problem as a linear Gaussian state-space model (SSM), which can be solved using Kalman filtering and smoothing with linear time complexity ($\mathcal{O}(n)$).

See the papers *Temporal Gaussian Process Regression in Logarithmic Time* and *Kalman filtering and smoothing solutions to temporal Gaussian process regression models*, as well as Adrien Corenflos’ implementation, the BayesNewton implementation and the GPy implementation. Perhaps this could be implemented in dynamax by wrapping the filter step within a larger log-likelihood to optimise the lengthscale and variance, but no success so far.
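As a sketch of what wrapping the filter step in a log-likelihood could look like (plain NumPy, not the dynamax API; the function name and signature are illustrative), the marginal log-likelihood is accumulated from the Gaussian predictive density of each innovation during the forward pass:

```python
import numpy as np

def kalman_loglik(ys, A, Q, H, R, m0, P0):
    """Marginal log-likelihood of 1-D observations under a linear Gaussian SSM.

    One predict/update sweep; each step costs O(d^3) in the state
    dimension d, so the whole pass is linear in the number of observations.
    """
    m, P, ll = m0, P0, 0.0
    for y in ys:
        # Predict.
        m = A @ m
        P = A @ P @ A.T + Q
        # Update, accumulating the log of the Gaussian predictive density.
        S = H @ P @ H.T + R          # innovation covariance (1x1 here)
        v = y - H @ m                # innovation
        ll += -0.5 * (np.log(2 * np.pi * S[0, 0]) + v[0] ** 2 / S[0, 0])
        K = P @ H.T / S[0, 0]        # Kalman gain
        m = m + K @ v
        P = P - K @ S @ K.T
    return ll
```

Passing this function to a gradient-free optimiser (or rewriting it in JAX and differentiating) over the kernel hyperparameters would give the training loop described above.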

Matérn-5/2 kernel

This is the case $\nu = 5/2$, so $p = 2$ (a three-dimensional state), a typical choice for spatial and temporal problems.

We need to train the lengthscale ($\ell$) and variance ($\sigma^2$) of the kernel during fitting, as well as the likelihood noise $\sigma_n^2$.

The state $\mathbf{f}(t) = \left(f(t),\ \frac{df(t)}{dt},\ \frac{d^2 f(t)}{dt^2}\right)^\top$ obeys the SDE

$$\frac{d\mathbf{f}(t)}{dt} = \mathbf{F}\,\mathbf{f}(t) + \mathbf{L}\,w(t),$$

where

$$\mathbf{F} = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ -\lambda^3 & -3\lambda^2 & -3\lambda \end{pmatrix}, \qquad \mathbf{L} = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}, \qquad \lambda = \frac{\sqrt{5}}{\ell},$$

and $w(t)$ is some white noise process with spectral density $q = \frac{16}{3}\sigma^2\lambda^5$.

The observation model is

$$y_k = \mathbf{H}\,\mathbf{f}(t_k) + \varepsilon_k, \qquad \mathbf{H} = \begin{pmatrix} 1 & 0 & 0 \end{pmatrix},$$

where $\varepsilon_k \sim \mathcal{N}(0, \sigma_n^2)$.

This is a linear Gaussian SSM.
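To run the Kalman filter, the continuous-time model is discretised over each time gap $\Delta t$: the transition matrix is the matrix exponential $A = e^{\mathbf{F}\,\Delta t}$, and the process noise follows from the stationary covariance $\mathbf{P}_\infty$, which solves the Lyapunov equation $\mathbf{F}\mathbf{P}_\infty + \mathbf{P}_\infty\mathbf{F}^\top + q\,\mathbf{L}\mathbf{L}^\top = 0$. A sketch with NumPy/SciPy (the function name is illustrative; $q = \frac{16}{3}\sigma^2\lambda^5$ is the standard Matérn-5/2 parametrisation):

```python
import numpy as np
from scipy.linalg import expm, solve_continuous_lyapunov

def matern52_discretise(ell, var, dt):
    """Transition matrix A and process noise Q for the Matern-5/2 SDE
    over a step of length dt, plus the stationary covariance Pinf."""
    lam = np.sqrt(5.0) / ell
    F = np.array([[0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0],
                  [-lam**3, -3 * lam**2, -3 * lam]])
    L = np.array([[0.0], [0.0], [1.0]])
    q = 16.0 / 3.0 * var * lam**5              # white-noise spectral density
    # Stationary covariance: F Pinf + Pinf F^T + q L L^T = 0.
    Pinf = solve_continuous_lyapunov(F, -q * L @ L.T)
    A = expm(F * dt)
    Q = Pinf - A @ Pinf @ A.T                  # exact discrete process noise
    return A, Q, Pinf

H = np.array([[1.0, 0.0, 0.0]])  # observe f(t) only
```

A useful sanity check is that $\mathbf{P}_\infty[0,0] = \sigma^2$: the stationary marginal variance of $f(t)$ must equal the kernel variance.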

Matérn-1/2 kernel

Matérn-1/2 (the exponential kernel, equivalently an Ornstein–Uhlenbeck process) is a less common covariance function, but it results in even simpler, scalar matrices:

$$\frac{df(t)}{dt} = -\frac{1}{\ell}\,f(t) + w(t), \qquad q = \frac{2\sigma^2}{\ell},$$

with the observation model

$$y_k = f(t_k) + \varepsilon_k,$$

where $\varepsilon_k \sim \mathcal{N}(0, \sigma_n^2)$.
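Because everything is scalar, the whole filter fits in a few lines. A compact worked example, assuming the standard Ornstein–Uhlenbeck discretisation $a = e^{-\Delta t/\ell}$ with process noise $\sigma^2(1 - a^2)$ (the function name is illustrative):

```python
import numpy as np

def matern12_filter(ts, ys, ell, var, noise):
    """Scalar Kalman filter for GP regression with a Matern-1/2 kernel.

    The OU transition over a gap dt is a = exp(-dt / ell) with process
    noise var * (1 - a^2), so irregular time grids are handled for free.
    """
    m, P = 0.0, var                  # stationary prior before the first datum
    means, t_prev = [], ts[0]
    for t, y in zip(ts, ys):
        a = np.exp(-(t - t_prev) / ell)
        m, P = a * m, a * a * P + var * (1 - a * a)   # predict
        S = P + noise                                  # innovation covariance
        K = P / S                                      # Kalman gain
        m, P = m + K * (y - m), (1 - K) * P            # update
        means.append(m)
        t_prev = t
    return np.array(means)
```

With near-noiseless constant observations the filtered mean should track the data almost exactly, which makes a quick correctness check.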