Automatic differentiation (autodiff) is built on two transformations: Jacobian-vector products (JVPs) and vector-Jacobian products (VJPs).
We can see this at the last step of backpropagation: for a scalar-valued loss $L : \mathbb{R}^n \to \mathbb{R}$, the Jacobian $\partial L(x)$ is a $1 \times n$ row vector, so the gradient $\nabla L(x) = (1 \cdot \partial L(x))^\top$ is a vector-Jacobian product with cotangent $1$.
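A minimal sketch of that claim, assuming a made-up scalar loss `loss` and weights `w` (neither is from the notes above): pulling back the covector $1$ through `jax.vjp` should reproduce `jax.grad`.

```python
import jax
import jax.numpy as jnp

# Hypothetical scalar-valued loss, used only for illustration.
def loss(w):
    return jnp.sum(jnp.tanh(w) ** 2)

w = jnp.array([0.5, -1.0, 2.0])

# For a scalar output, pulling back the covector 1 gives the gradient.
y, pullback = jax.vjp(loss, w)
grad_from_vjp = pullback(jnp.ones_like(y))[0]

print(jnp.allclose(grad_from_vjp, jax.grad(loss)(w)))  # True
```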
These transformations also compose: the JVP/VJP of a composition of functions is the composition of the individual functions' JVPs/VJPs.
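A small sketch of the composition property, using two made-up functions `g` and `f` (chosen only for illustration): pushing a tangent through `g` and then through `f` matches the JVP of the composed function taken in one shot.

```python
import jax
import jax.numpy as jnp

# Hypothetical two-stage composition h = f . g, for illustration only.
def g(x):
    return jnp.sin(x) * x

def f(y):
    return jnp.sum(y ** 2)

x = jnp.arange(3.0)
v = jnp.ones(3)

# JVP of the composition in one shot.
_, jvp_direct = jax.jvp(lambda x: f(g(x)), (x,), (v,))

# Composing the JVPs: push v through g, then push the result through f.
y, dy = jax.jvp(g, (x,), (v,))
_, jvp_composed = jax.jvp(f, (y,), (dy,))

print(jnp.allclose(jvp_direct, jvp_composed))  # True
```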
Jacobian-vector product
- The right-hand side of $f(x + v) \approx f(x) + \partial f(x)\, v$ is the first two terms of the Taylor series of $f$ at $x$, which explains what happens to the output under a nudge by the vector $v$.
- Evaluating the JVP against each standard basis vector recovers the Jacobian one column at a time.
jax.jvp
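A minimal sketch of the column-at-a-time point with `jax.jvp`; the function `f` here is an arbitrary example, not from the notes above.

```python
import jax
import jax.numpy as jnp

# Hypothetical R^2 -> R^3 function, used only for illustration.
def f(x):
    return jnp.array([x[0] * x[1], jnp.sin(x[0]), x[1] ** 2])

x = jnp.array([1.0, 2.0])

# One jax.jvp call per input dimension: pushing the i-th basis vector
# forward yields the i-th column of the Jacobian.
cols = [jax.jvp(f, (x,), (jnp.eye(2)[i],))[1] for i in range(2)]
jac_from_jvps = jnp.stack(cols, axis=1)

print(jnp.allclose(jac_from_jvps, jax.jacfwd(f)(x)))  # True
```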
vector-Jacobian product
- Evaluating the VJP against each standard basis covector recovers the Jacobian one row at a time.
jax.vjp
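And the mirror-image sketch with `jax.vjp`, pulling back basis covectors to get Jacobian rows (same made-up `f` as above).

```python
import jax
import jax.numpy as jnp

# Same hypothetical R^2 -> R^3 function as in the JVP sketch.
def f(x):
    return jnp.array([x[0] * x[1], jnp.sin(x[0]), x[1] ** 2])

x = jnp.array([1.0, 2.0])

# jax.vjp returns the primal output and a pullback function.
y, pullback = jax.vjp(f, x)

# One pullback call per output dimension: the i-th basis covector
# yields the i-th row of the Jacobian.
rows = [pullback(jnp.eye(3)[i])[0] for i in range(3)]
jac_from_vjps = jnp.stack(rows, axis=0)

print(jnp.allclose(jac_from_vjps, jax.jacrev(f)(x)))  # True
```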