When the dynamics model and the observation model of a state space model are both linear and Gaussian, that is, when we have a linear Gaussian SSM, we can perform inference efficiently using Kalman filtering methods.

We perform filtering to predict one step ahead and obtain the filtered posterior $p(z_t \mid y_{1:t})$. Ignoring the optional exogenous inputs $u_t$, the algorithm consists of two steps:

  • Time update step, computing the one-step-ahead predictive $p(z_t \mid y_{1:t-1})$
  • Measurement update step, computing the expected observation $\hat{y}_t$ and conditioning on the actual observation $y_t$
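Concretely, for a linear Gaussian SSM with dynamics $z_t = F_t z_{t-1} + b_t + q_t$, $q_t \sim \mathcal{N}(0, Q_t)$, and observations $y_t = H_t z_t + d_t + r_t$, $r_t \sim \mathcal{N}(0, R_t)$ (notation assumed here, matching the affine-map section below), the two steps are the standard Kalman recursions:

```latex
% Time update (predict):
\mu_{t|t-1} = F_t \mu_{t-1} + b_t, \qquad
\Sigma_{t|t-1} = F_t \Sigma_{t-1} F_t^\top + Q_t

% Measurement update, with expected observation \hat{y}_t,
% innovation covariance S_t, and Kalman gain K_t:
\hat{y}_t = H_t \mu_{t|t-1} + d_t, \qquad
S_t = H_t \Sigma_{t|t-1} H_t^\top + R_t, \qquad
K_t = \Sigma_{t|t-1} H_t^\top S_t^{-1}

\mu_t = \mu_{t|t-1} + K_t (y_t - \hat{y}_t), \qquad
\Sigma_t = (I - K_t H_t)\,\Sigma_{t|t-1}
```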

The update for the covariance $\Sigma_t$ can also be written in Joseph form, which is more numerically stable because it guarantees that $\Sigma_t$ remains symmetric and positive semi-definite even in the presence of floating-point errors.

The algorithm scales as $O(T)$ in the number of time steps, with each step cubic in the state dimension due to the covariance updates. The full derivation relies on the conjugacy properties of Gaussians and can be found in both the Kevin Murphy and the Durbin and Koopman books.
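As an illustration of the sequential recursion, here is a minimal NumPy sketch (illustrative names, not cuthbert's implementation), showing the $O(T)$ loop:

```python
import numpy as np

def kalman_filter(y, F, Q, H, R, mu0, Sigma0):
    """Sequential Kalman filter: T steps, each cubic in the state dim D."""
    T = len(y)
    D = mu0.shape[0]
    mus = np.zeros((T, D))
    Sigmas = np.zeros((T, D, D))
    mu, Sigma = mu0, Sigma0
    for t in range(T):
        # time update (predict)
        mu_pred = F @ mu
        Sigma_pred = F @ Sigma @ F.T + Q
        # measurement update
        S = H @ Sigma_pred @ H.T + R             # innovation covariance
        K = Sigma_pred @ H.T @ np.linalg.inv(S)  # Kalman gain
        mu = mu_pred + K @ (y[t] - H @ mu_pred)
        Sigma = (np.eye(D) - K @ H) @ Sigma_pred
        mus[t], Sigmas[t] = mu, Sigma
    return mus, Sigmas
```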

When all the data have arrived, we can perform offline smoothing and obtain $p(z_t \mid y_{1:T})$. The algorithm requires two passes through the data, a forward filter and a backward smoother, and so also scales as $O(T)$.

parallel (associative) Kalman filter

The standard Kalman filter is sequential – each step depends on the previous filtered mean and covariance, giving $O(T)$ serial complexity. The parallel Kalman filter reformulates the predict + update step as an affine map on the filtered mean, enabling a parallel prefix scan in $O(\log T)$.

Reformulation as an affine map

Substituting the time update (predict) into the measurement update, the filtered mean at time $t$ is:

$$\mu_t = (I - K_t H_t)\,(F_t \mu_{t-1} + b_t) + K_t (y_t - d_t)$$

where $b_t$ is a dynamics shift and $d_t$ is an observation shift from the general linear Gaussian SSM formulation. This has the form of an affine map on the previous filtered mean:

$$\mu_t = J_t \mu_{t-1} + v_t$$

where:

$$J_t = (I - K_t H_t)\, F_t, \qquad v_t = (I - K_t H_t)\, b_t + K_t (y_t - d_t)$$

The covariance update does not depend on the mean – it only depends on the model parameters and can be computed independently.
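To make the reformulation concrete, the NumPy sketch below (illustrative names) folds the predict and update for one step into a single affine map `mu_t = J @ mu_prev + v`:

```python
import numpy as np

def affine_filter_step(F, b, Q, H, d, R, mu_prev, Sigma_prev, y):
    """One predict + update step written as an affine map on the mean:
    mu_t = J @ mu_prev + v.  J and v do not depend on mu_prev itself."""
    Sigma_pred = F @ Sigma_prev @ F.T + Q
    S = H @ Sigma_pred @ H.T + R             # innovation covariance
    K = Sigma_pred @ H.T @ np.linalg.inv(S)  # Kalman gain
    I = np.eye(len(mu_prev))
    J = (I - K @ H) @ F
    v = (I - K @ H) @ b + K @ (y - d)
    return J, v
```

Applying `J @ mu_prev + v` gives the same filtered mean as running the two-step predict and update separately.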

associative scan

The composition of two affine maps is itself an affine map:

$$(J_2, v_2) \circ (J_1, v_1) = (J_2 J_1,\; J_2 v_1 + v_2)$$

This composition is associative (it is function composition), meaning:

$$(f_3 \circ f_2) \circ f_1 = f_3 \circ (f_2 \circ f_1)$$

Both sides produce the same pair $(J, v)$.
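A few lines of NumPy (an illustrative sketch) make the composition rule and its associativity concrete:

```python
import numpy as np

def compose(f2, f1):
    """Compose affine maps represented as (J, v) pairs:
    (f2 o f1)(x) = J2 @ (J1 @ x + v1) + v2 = (J2 @ J1) @ x + (J2 @ v1 + v2)."""
    J2, v2 = f2
    J1, v1 = f1
    return J2 @ J1, J2 @ v1 + v2
```

Because the operation is plain function composition, any bracketing of a chain of `compose` calls yields the same `(J, v)` pair.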

To compute all filtered states, we need the prefix compositions:

$$f_1,\quad f_2 \circ f_1,\quad f_3 \circ f_2 \circ f_1,\quad \ldots,\quad f_T \circ \cdots \circ f_1$$

The associative scan (parallel prefix scan) computes all of these in $O(\log_2 T)$ parallel steps by composing pairs in a binary tree:

$$f_{i:j} = f_j \circ f_{j-1} \circ \cdots \circ f_i$$

where $f_{i:j}$ denotes the composed affine map from step $i$ to step $j$. At each level, independent compositions run in parallel. Associativity guarantees that any grouping produces the correct result, e.g. $(f_4 \circ f_3) \circ (f_2 \circ f_1) = f_4 \circ (f_3 \circ (f_2 \circ f_1))$.
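A toy NumPy version of the prefix scan (recursive doubling, a sketch rather than cuthbert's actual schedule) shows the idea: each round composes elements a fixed stride apart, and all compositions within a round are independent, so with enough processors the whole scan takes logarithmically many rounds:

```python
import numpy as np

def compose(f2, f1):
    """Affine-map composition: later map applied after earlier map."""
    J2, v2 = f2
    J1, v1 = f1
    return J2 @ J1, J2 @ v1 + v2

def prefix_scan(elems):
    """Inclusive prefix scan via recursive doubling.  After log2(T)
    rounds, out[i] is the composition f_i o ... o f_0.  Every composition
    inside one round is independent and could run in parallel."""
    out = list(elems)
    shift = 1
    while shift < len(out):
        out = [out[i] if i < shift else compose(out[i], out[i - shift])
               for i in range(len(out))]
        shift *= 2
    return out
```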

cuthbert implements this using jax.lax.associative_scan when the parallel=True flag is passed:

from cuthbert import filter
from cuthbert.gaussian import kalman
 
filter_obj = kalman.build_filter(get_init_params, get_dynamics_params, get_observation_params)
states = filter(filter_obj, model_inputs, parallel=True)

cuthbert has three functions that map to the maths above:

  • init_prepare: creates the initial scan element from the prior mean and covariance
  • filter_prepare: encodes one time step's parameters into the affine scan element via associative_params_single
  • filter_combine: composes two scan elements via the associative filtering_operator

In parallel mode, all elements are prepared independently via jax.vmap(filter_prepare), then composed via jax.lax.associative_scan(filter_combine, ...). In sequential mode, the same filter_prepare and filter_combine are called inside a jax.lax.scan loop.

square-root form

In practice, the scan element carries additional terms beyond $(J_t, v_t)$ to track the covariance in Cholesky (square-root) form and to accumulate the log marginal likelihood. Working with Cholesky factors rather than full covariance matrices provides better numerical stability, particularly for long time series. The composition operator is more complex but remains associative.
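One standard way to propagate a Cholesky factor without ever forming the full covariance is via a QR decomposition. The sketch below (illustrative, not cuthbert's internal representation) implements the time update in square-root form:

```python
import numpy as np

def sqrt_predict(L_prev, F, L_Q):
    """Time update in square-root form.  Given lower-triangular factors
    L_prev (of Sigma_prev) and L_Q (of Q), returns L_pred such that
    L_pred @ L_pred.T == F @ Sigma_prev @ F.T + Q, using only a QR
    decomposition -- the full covariance is never formed explicitly."""
    A = np.vstack([(F @ L_prev).T, L_Q.T])  # (2D, D); A.T @ A == Sigma_pred
    R = np.linalg.qr(A, mode="r")           # upper-triangular (D, D)
    return R.T                              # lower-triangular square root
```

Stacking the factors and taking a QR is numerically gentler than forming $F \Sigma F^\top + Q$ and re-factorising it, which is the usual motivation for square-root filters.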

Joseph form equivalence

The Joseph form is defined as:

$$\Sigma_t = (I - K_t H_t)\,\Sigma_{t|t-1}\,(I - K_t H_t)^\top + K_t R_t K_t^\top$$

Expanding the product:

$$\Sigma_t = \Sigma_{t|t-1} - K_t H_t \Sigma_{t|t-1} - \Sigma_{t|t-1} H_t^\top K_t^\top + K_t \left( H_t \Sigma_{t|t-1} H_t^\top + R_t \right) K_t^\top$$

Using the definition of the innovation covariance $S_t = H_t \Sigma_{t|t-1} H_t^\top + R_t$:

$$\Sigma_t = \Sigma_{t|t-1} - K_t H_t \Sigma_{t|t-1} - \Sigma_{t|t-1} H_t^\top K_t^\top + K_t S_t K_t^\top$$

Using the definition of the Kalman gain $K_t = \Sigma_{t|t-1} H_t^\top S_t^{-1}$, we have $K_t S_t = \Sigma_{t|t-1} H_t^\top$, and so $K_t S_t K_t^\top = \Sigma_{t|t-1} H_t^\top K_t^\top$. This cancels the third term, leaving the standard covariance update:

$$\Sigma_t = \Sigma_{t|t-1} - K_t H_t \Sigma_{t|t-1} = (I - K_t H_t)\,\Sigma_{t|t-1}$$
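The equivalence is also easy to verify numerically; the NumPy snippet below builds both forms from arbitrary SPD inputs:

```python
import numpy as np

rng = np.random.default_rng(5)
D, E = 3, 2
Sigma_pred = np.eye(D) + 0.1 * np.ones((D, D))  # any SPD predicted covariance
H = rng.normal(size=(E, D))
R = np.eye(E)
S = H @ Sigma_pred @ H.T + R                    # innovation covariance
K = Sigma_pred @ H.T @ np.linalg.inv(S)         # Kalman gain
I = np.eye(D)

# Joseph form vs. the standard covariance update
joseph = (I - K @ H) @ Sigma_pred @ (I - K @ H).T + K @ R @ K.T
standard = (I - K @ H) @ Sigma_pred
```

The two expressions agree in exact arithmetic; the Joseph form is preferred in floating point because it is a sum of symmetric positive semi-definite terms.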