Algorithm described in paper/blog and implemented in outlines.
For a sequence of tokens $s_{1:t} = (s_1, \dots, s_t)$, predict the next token $s_{t+1}$
by multinomial sampling from the categorical distribution obtained by applying softmax to the output logits.
The idea of guided generation is to restrict the support of this distribution to a subset of the vocabulary, forbidding any token that would make the sequence stop respecting the required structure (e.g. a regular expression).
naive approach
- apply a boolean mask that restricts the support of the original distribution
- renormalise output logits
- perform multinomial sampling
Cost: deciding whether each possible token keeps the sequence valid has to be done over the whole vocabulary at every step.
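A minimal sketch of the naive approach in NumPy (the boolean mask `allowed` is assumed to come from whatever structure check is run over the vocabulary each step; the function name and signature are illustrative, not the outlines API):

```python
import numpy as np

def sample_with_mask(logits: np.ndarray, allowed: np.ndarray, rng: np.random.Generator) -> int:
    """Naive guided sampling: mask disallowed tokens, renormalise, sample.

    logits  -- raw output logits over the vocabulary, shape (vocab_size,)
    allowed -- boolean mask, True where the token keeps the sequence valid
    """
    # Apply the boolean mask: disallowed tokens get zero probability.
    masked = np.where(allowed, logits.astype(float), -np.inf)
    # Renormalise via softmax over the remaining support.
    masked -= masked.max()          # numerical stability
    probs = np.exp(masked)
    probs /= probs.sum()
    # Multinomial sampling from the restricted distribution.
    return int(rng.choice(len(probs), p=probs))
```

The expensive part is not the sampling itself but producing `allowed`, which in the naive approach means checking every token in the vocabulary against the structure at every step.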
efficient approach (finite state machine)
Use a finite state machine, the tuple $(Q, \Sigma, \delta, q_0, F)$, to efficiently continue matching from the current state instead of re-reading the growing sample sequence from the beginning each time:
- $Q$, finite set of states
- $\Sigma$, finite alphabet
- $\delta: Q \times \Sigma \to Q$, transition function
- $q_0 \in Q$, start state
- $F \subseteq Q$, set of accept states
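As a concrete (hand-built, purely illustrative) example, the tuple for a small DFA recognising `-?[0-9]+` could be written out directly in Python; outlines derives such a machine from the regular expression automatically:

```python
# Toy DFA for the regex -?[0-9]+ written out as the tuple (Q, Sigma, delta, q0, F).
DIGITS = set("0123456789")

Q = {"start", "minus", "int"}                  # finite set of states
SIGMA = DIGITS | {"-"}                         # finite alphabet
DELTA = {                                      # transition function delta(q, a) -> q'
    ("start", "-"): "minus",
    **{("start", d): "int" for d in DIGITS},
    **{("minus", d): "int" for d in DIGITS},
    **{("int", d): "int" for d in DIGITS},
}
Q0 = "start"                                   # start state
F = {"int"}                                    # accept states

def step(state: str, char: str) -> str | None:
    """Advance the DFA by one character; None means the string is rejected."""
    return DELTA.get((state, char))
```

Continuing generation then only requires keeping the current state and calling `step` on newly sampled characters, rather than re-matching the regex against the whole sequence.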
Need to pre-process the vocabulary using the regular expression's FSM and build an index (a hash map from FSM state to the set of allowed tokens, so constructing the mask is roughly constant-time per step) recording the FSM sub-sequences that accept each string in the vocabulary:
- consider starting in every viable FSM state
- determine the valid next tokens by checking that the transition function produces a viable state for every character of the token (see the sketch after this list).
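A sketch of the index construction, using the same toy dict-based FSM representation as above (the function name and the `(token_id, token_string)` vocabulary format are assumptions, not the outlines data structures):

```python
from collections import defaultdict

def build_token_index(vocabulary, delta, states):
    """Pre-compute, for every FSM state, the set of tokens that keep the FSM viable.

    vocabulary -- iterable of (token_id, token_string) pairs
    delta      -- transition function as a dict {(state, char): next_state}
    states     -- all FSM states to consider as starting points
    """
    index = defaultdict(set)          # state -> set of allowed token ids
    for token_id, token_str in vocabulary:
        # Consider starting in every viable FSM state.
        for start in states:
            state = start
            for char in token_str:
                state = delta.get((state, char))
                if state is None:     # transition fails -> token not allowed from here
                    break
            else:
                # Every character produced a viable state: token is valid from `start`.
                index[start].add(token_id)
    return index
```

At generation time the mask for the current FSM state is then a single dictionary lookup, `index[state]`, instead of a scan over the whole vocabulary; the per-token work has been moved into a one-off pre-processing pass.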