Once LLM pre-training has given the model a general understanding of language, we use transfer learning to fine-tune it for a specific task.

During fine-tuning, the loss is typically calculated over the entire output phrase, i.e. across all of the completion's tokens rather than only the final one (see the sketch below).

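To make this concrete, here is a minimal PyTorch-style sketch. The tensor shapes, prompt lengths, and the -100 ignore index are illustrative assumptions; it follows the common convention of masking prompt tokens so the loss averages over every completion token.

```python
import torch
import torch.nn.functional as F

# Stand-in logits and labels for a batch of tokenised prompt+completion
# sequences (shapes and values are illustrative only).
vocab_size = 32_000
batch, seq_len = 2, 16
logits = torch.randn(batch, seq_len, vocab_size)      # model output (stand-in)
labels = torch.randint(0, vocab_size, (batch, seq_len))  # would be the input ids

# Mask out the prompt tokens so the loss covers the whole completion
# (every generated token), not just the final one. -100 is the ignore index.
prompt_lens = torch.tensor([6, 9])                     # assumed prompt lengths
for i, p in enumerate(prompt_lens):
    labels[i, :p] = -100

# Shift so each position predicts the *next* token, then average the
# cross-entropy over all unmasked (completion) positions.
shift_logits = logits[:, :-1, :].reshape(-1, vocab_size)
shift_labels = labels[:, 1:].reshape(-1)
loss = F.cross_entropy(shift_logits, shift_labels, ignore_index=-100)
print(loss.item())
```
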
We can make LLMs easier to interact with using human input:

  • Instruction (supervised) fine-tuning: the model is fine-tuned on thousands of human-labeled instruction prompt-completion pairs (see the training-step sketch after this list).
  • Reinforcement Learning from Human Feedback (RLHF): humans indicate which outputs they prefer, and the model is optimized toward those preferences.
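
A minimal sketch of one supervised instruction fine-tuning step, assuming Hugging Face transformers and using "gpt2" plus a tiny hand-written pair of examples purely as placeholders. Here the loss covers both prompt and completion tokens; the completion-only masking shown earlier could be applied instead.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "gpt2"                      # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Tiny illustrative instruction prompt-completion pairs.
pairs = [
    ("Summarize: The cat sat on the mat.", "A cat sat on a mat."),
    ("Translate to French: Good morning.", "Bonjour."),
]

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
model.train()
for prompt, completion in pairs:
    # Concatenate prompt and completion; for causal LMs the labels are the
    # input ids themselves (the model shifts them internally).
    enc = tokenizer(prompt + " " + completion, return_tensors="pt")
    out = model(**enc, labels=enc["input_ids"])
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```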

Updating the entire set of weights is computationally expensive. Parameter-efficient fine-tuning (PEFT) instead updates only a small fraction of the weights (or a small set of added weights):

  • Prompt tuning: prepend a small set of trainable embedding vectors (a soft prompt) to the input, leaving the model's original weights frozen.
  • Low-rank adaptation (LoRA): reduce the number of parameters that need to be trained by representing each weight update as the product of two much smaller low-rank matrices.
    • The new LoRA matrices are much smaller than the original weights, but their product has the same shape as the weight matrix they adapt, so they can be added back in without changing the input and output sizes of those parts of the model (see the sketch after this list).
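
A minimal LoRA sketch in PyTorch: the base linear layer is frozen and only the two small matrices A and B are trained. The rank r, alpha, and layer sizes below are illustrative assumptions.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update (sketch).

    Output is W x + (alpha / r) * B A x, where A and B are the small
    LoRA matrices; only A and B receive gradients.
    """
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)                      # freeze pre-trained weights
        in_f, out_f = base.in_features, base.out_features
        self.A = nn.Parameter(torch.randn(r, in_f) * 0.01)  # down-projection
        self.B = nn.Parameter(torch.zeros(out_f, r))         # up-projection
        self.scale = alpha / r

    def forward(self, x):
        # Same input/output shape as the base layer, so it can drop in
        # wherever the original layer was used.
        return self.base(x) + self.scale * (x @ self.A.T) @ self.B.T

layer = LoRALinear(nn.Linear(768, 768))
y = layer(torch.randn(4, 768))                            # shape (4, 768), unchanged
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)                                          # far fewer than 768 * 768
```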