Algorithm described in the paper/blog post and implemented in the outlines library.
For a sequence of tokens, we can perform multinomial sampling from the probability distribution induced by the output logits.
The idea of guided generation is to restrict the support of this distribution over the vocabulary, forbidding tokens that would extend the sequence in a way that violates the desired structure (e.g. a regular expression).
naive approach
- apply a boolean mask that restricts the support of the original distribution
- renormalise the output logits
- perform multinomial sampling
(slow: every token in the vocabulary must be checked against the structure at every step)
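The masking/renormalising/sampling steps above can be sketched as follows (a minimal numpy sketch; `masked_sample` and the token-id interface are illustrative, not the outlines API):

```python
import numpy as np

def masked_sample(logits, valid_token_ids, rng=np.random.default_rng(0)):
    """Sample a token id from logits restricted to valid_token_ids."""
    mask = np.full(logits.shape, -np.inf)  # boolean mask encoded as -inf logits
    mask[valid_token_ids] = 0.0
    masked = logits + mask                 # forbidden tokens get probability 0
    probs = np.exp(masked - masked.max())  # softmax renormalisation
    probs /= probs.sum()
    return rng.choice(len(logits), p=probs)
```

Every sampled id is guaranteed to come from `valid_token_ids`, since all other tokens receive zero probability after renormalisation.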
efficient approach (finite state machine)
Use a finite state machine, defined as a 5-tuple (Q, Σ, δ, q₀, F):
- Q, a finite set of states
- Σ, a finite alphabet
- δ : Q × Σ → Q, the transition function
- q₀ ∈ Q, the start state
- F ⊆ Q, the set of accept states
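As a concrete (hypothetical) example, a hand-built DFA for the toy regular expression `[ab]*c`, using `-1` as a dead state:

```python
# Toy DFA for [ab]*c — hand-built for illustration only
Q = {0, 1, -1}       # -1 is the dead (non-viable) state
SIGMA = set("abc")
Q0 = 0               # start state
F = {1}              # accept states

def delta(state, ch):
    """Transition function δ : Q × Σ → Q."""
    if state == 0 and ch in "ab":
        return 0
    if state == 0 and ch == "c":
        return 1
    return -1        # anything else is rejected

def accepts(s):
    state = Q0
    for ch in s:
        state = delta(state, ch)
    return state in F
```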
Need to pre-process the vocabulary using the regular expression's FSM and build an index (a hash map from FSM state to the set of allowed next tokens) used to construct the mask:
- for each token in the vocabulary, consider starting from every viable FSM state
- a token is a valid next token from a state if reading its characters through the transition function never produces a non-viable (dead) state
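These two pre-processing steps might be sketched like this (assuming a character-level transition function `delta` and a dead state `-1`; the names are illustrative, not the actual outlines implementation):

```python
def walk(delta, state, token, dead_state=-1):
    """Follow the transition function over a token's characters."""
    for ch in token:
        state = delta(state, ch)
        if state == dead_state:
            return dead_state
    return state

def build_index(states, delta, vocab, dead_state=-1):
    """Map each viable FSM state to the set of tokens allowed from it."""
    index = {}
    for q in states:
        allowed = {tok for tok in vocab
                   if walk(delta, q, tok, dead_state) != dead_state}
        if allowed:
            index[q] = allowed
    return index
```

At generation time, `index[current_state]` then yields the valid-token mask in O(1), replacing the per-step scan over the whole vocabulary.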