Gaussian processes (GPs) scale as $\mathcal{O}(N^3)$, where $N$ is the number of data points, which is problematic for large datasets.

By parametrising the GP as a stochastic differential equation (SDE), we can reformulate the GP regression problem as a linear Gaussian state-space model (SSM), where it can be solved using Kalman filtering and smoothing with linear time complexity ($\mathcal{O}(N)$).

The GP kernel defines a continuous-time linear SDE (a special case of the continuous-discrete state-space model). The SDE is then exactly discretised via matrix exponentials to obtain the discrete-time transition matrices $A_k$ and noise covariances $Q_k$. See also this paper on discretisation of continuous-time linear systems.
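The exact discretisation step can be sketched as follows (a minimal sketch using SciPy's `expm` and `solve_continuous_lyapunov`; the function name `discretise` and its signature are my own, not from any of the referenced libraries):

```python
import numpy as np
from scipy.linalg import expm, solve_continuous_lyapunov

def discretise(F, L, q, dt):
    """Exactly discretise the linear SDE  dx = F x dt + L dbeta,
    where beta has spectral density q.

    Returns the transition matrix A = expm(F * dt) and the process
    noise covariance Q = Pinf - A Pinf A^T, with Pinf the stationary
    covariance solving the Lyapunov equation
        F Pinf + Pinf F^T + L q L^T = 0.
    """
    A = expm(F * dt)
    Pinf = solve_continuous_lyapunov(F, -q * (L @ L.T))
    Q = Pinf - A @ Pinf @ A.T
    return A, Q
```

For the scalar Matérn-1/2 case this reproduces the closed-form expressions given below, which is a convenient sanity check.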

See the Temporal Gaussian Process Regression in Logarithmic Time and Kalman filtering and smoothing solutions to temporal Gaussian process regression models papers, as well as Adrien Corenflos’ implementation, the BayesNewton implementation, the GPy implementation, and the RxInfer.jl example. This could perhaps be implemented in dynamax by wrapping the filter step in a larger log-likelihood function to optimise the lengthscale and variance, but I have had no success so far.

Matérn kernels

Each Matérn kernel ($\nu = p + 1/2$) corresponds to a linear SDE (a continuous-discrete state space model) with state dimension $d = p + 1$. We need to train the lengthscale ($\ell$), the variance ($\sigma^2$), and the likelihood noise ($\sigma_n^2$).

Matérn-5/2 ($\nu = 5/2$, $d = 3$)

A typical choice for spatial and temporal problems. With state $\mathbf{x}(t) = (f(t), f'(t), f''(t))^\top$ and $\lambda = \sqrt{5}/\ell$, the continuous-time SDE is:

$$\mathrm{d}\mathbf{x}(t) = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ -\lambda^3 & -3\lambda^2 & -3\lambda \end{pmatrix}\mathbf{x}(t)\,\mathrm{d}t + \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}\mathrm{d}\beta(t),$$

where $\beta(t)$ has spectral density $q = \tfrac{16}{3}\sigma^2\lambda^5$.

The Brownian motion $\beta(t)$ is scalar ($L = (0, 0, 1)^\top$), entering only the third state dimension. The observation model picks out the first component of the state:

$$y_k = H\,\mathbf{x}(t_k) + \varepsilon_k, \qquad H = (1 \;\; 0 \;\; 0), \qquad \varepsilon_k \sim \mathcal{N}(0, \sigma_n^2).$$
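The Matérn-5/2 state-space quantities above can be assembled in a few lines (a sketch; the helper name `matern52_sde` is mine, and the spectral density $q = \tfrac{16}{3}\sigma^2\lambda^5$ is the standard value for this kernel):

```python
import numpy as np

def matern52_sde(lengthscale, variance):
    """SDE (state-space) parameters for the Matern-5/2 kernel.

    State x = (f, f', f''); a scalar Brownian motion enters only
    the last state dimension, and we observe the first component.
    """
    lam = np.sqrt(5.0) / lengthscale
    F = np.array([[0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0],
                  [-lam**3, -3.0 * lam**2, -3.0 * lam]])
    L = np.array([[0.0], [0.0], [1.0]])     # noise enters the third state
    H = np.array([[1.0, 0.0, 0.0]])         # observe the first component
    q = variance * 16.0 / 3.0 * lam**5      # spectral density of beta(t)
    return F, L, H, q
```

A useful sanity check is that the stationary covariance implied by $(F, L, q)$ has $\sigma^2$ in its top-left entry, so the marginal variance of $f(t)$ matches the kernel variance.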

Matérn-1/2 ($\nu = 1/2$, $d = 1$)

A scalar SDE (state dimension 1), i.e. an Ornstein–Uhlenbeck process:

$$\mathrm{d}x(t) = -\frac{1}{\ell}\,x(t)\,\mathrm{d}t + \mathrm{d}\beta(t),$$

where $\beta(t)$ has spectral density $q = 2\sigma^2/\ell$,

with observation $y_k = x(t_k) + \varepsilon_k$, $\varepsilon_k \sim \mathcal{N}(0, \sigma_n^2)$.
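Because the Matérn-1/2 case is scalar, its exact discretisation has a closed form, shown below (the helper name `matern12_discrete` is mine):

```python
import numpy as np

def matern12_discrete(lengthscale, variance, dt):
    """Closed-form exact discretisation of the Matern-1/2 (OU) SDE:
    transition A = exp(-dt / ell) and noise Q = sigma^2 (1 - A^2)."""
    A = np.exp(-dt / lengthscale)
    Q = variance * (1.0 - A**2)
    return A, Q
```

Note that $A^2 \sigma^2 + Q = \sigma^2$ for any step size, so the discrete process preserves the stationary variance of the kernel.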

Discretisation to LGSSM

Each SDE is exactly discretised to a linear Gaussian SSM with step size $\Delta t_k = t_{k+1} - t_k$:

$$\mathbf{x}_{k+1} = A_k \mathbf{x}_k + \mathbf{q}_k, \qquad \mathbf{q}_k \sim \mathcal{N}(0, Q_k),$$

where:

$$A_k = \exp(F\,\Delta t_k), \qquad Q_k = P_\infty - A_k P_\infty A_k^\top,$$

and $P_\infty$ is the stationary covariance solving $F P_\infty + P_\infty F^\top + L\,q\,L^\top = 0$.

For Matérn-1/2 this simplifies to $A_k = e^{-\Delta t_k/\ell}$ and $Q_k = \sigma^2(1 - e^{-2\Delta t_k/\ell})$. For higher-order Matérn kernels, $A_k$ and $Q_k$ are dense matrices computed numerically via the matrix exponential. Inference then uses Kalman filtering and smoothing.
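Putting the pieces together, the whole pipeline (SDE parameters, per-step discretisation, Kalman filter) can compute the GP log marginal likelihood in $\mathcal{O}(N)$, which is the quantity one would wrap in an optimiser to fit the lengthscale and variance. This is a self-contained sketch for Matérn-5/2, not any of the referenced implementations; it re-discretises for each (possibly irregular) step $\Delta t_k$:

```python
import numpy as np
from scipy.linalg import expm, solve_continuous_lyapunov

def gp_loglik_matern52(t, y, lengthscale, variance, noise):
    """Log marginal likelihood of a Matern-5/2 GP via Kalman filtering.

    Linear in the number of points; `noise` is the likelihood
    noise variance sigma_n^2.
    """
    lam = np.sqrt(5.0) / lengthscale
    F = np.array([[0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0],
                  [-lam**3, -3.0 * lam**2, -3.0 * lam]])
    L = np.array([[0.0], [0.0], [1.0]])
    H = np.array([[1.0, 0.0, 0.0]])
    q = variance * 16.0 / 3.0 * lam**5
    Pinf = solve_continuous_lyapunov(F, -q * (L @ L.T))

    m, P = np.zeros((3, 1)), Pinf.copy()   # start from the stationary state
    ll, t_prev = 0.0, t[0]
    for tk, yk in zip(t, y):
        dt = tk - t_prev
        if dt > 0:                          # predict forward by dt
            A = expm(F * dt)
            Q = Pinf - A @ Pinf @ A.T
            m, P = A @ m, A @ P @ A.T + Q
        v = yk - (H @ m)[0, 0]              # innovation
        s = (H @ P @ H.T)[0, 0] + noise     # innovation variance
        k = P @ H.T / s                     # Kalman gain
        m, P = m + k * v, P - k @ (H @ P)
        ll += -0.5 * (np.log(2.0 * np.pi * s) + v**2 / s)
        t_prev = tk
    return ll
```

Since the state-space representation is exact for half-integer Matérn kernels, this should agree (up to numerical error) with the $\mathcal{O}(N^3)$ likelihood computed directly from the dense kernel matrix, which makes a good unit test before handing the function to an optimiser.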