Training LLMs in a self-supervised manner (i.e. using only the raw text itself, with no human-provided labels) is known as pre-training. This makes it easy to scale the training data: you just need more text, not more labelled examples.
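
Concretely, the usual self-supervised objective is next-token prediction: the raw text supplies both the inputs and the targets, since each token is predicted from the tokens before it. A minimal PyTorch sketch of that loss (the `model` here is a hypothetical causal LM that returns next-token logits; all names are illustrative):

```python
import torch
import torch.nn.functional as F

def next_token_loss(model, token_ids: torch.Tensor) -> torch.Tensor:
    # Inputs are every token except the last; targets are the same sequence
    # shifted one position left -- the "labels" come for free from raw text.
    inputs = token_ids[:, :-1]
    targets = token_ids[:, 1:]
    logits = model(inputs)  # (batch, seq_len - 1, vocab_size)
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),  # flatten batch and time dims
        targets.reshape(-1),
    )
```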

Pre-trained models are also called foundation models.

Pre-trained models:

  • do not follow instructions well
  • quite readily emit toxic content

Pre-trained models can be tailored to specific tasks using LLM fine-tuning.
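
As a hedged sketch of what fine-tuning looks like in practice, the snippet below continues training a small pre-trained model (here `gpt2`, chosen only for illustration) on a tiny set of task-specific examples using the Hugging Face `transformers` API. The data and hyperparameters are placeholders, not a recipe:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load a pre-trained (foundation) model and its tokenizer.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Illustrative task-specific training examples.
examples = [
    "Instruction: Give an antonym of 'hot'.\nResponse: cold",
    "Instruction: Give an antonym of 'fast'.\nResponse: slow",
]

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
model.train()
for text in examples:
    batch = tokenizer(text, return_tensors="pt")
    # With labels == input_ids the model computes the usual causal LM loss,
    # but now on curated task data rather than raw web text.
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```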