Training LLMs in a self-supervised manner (i.e. using only the raw text itself) is known as pre-training. This makes it easy to scale training data: you simply gather more text rather than label data.
Pre-trained models are also called foundation models.
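The core self-supervised objective is next-token prediction: the targets are just the input text shifted by one position, so no human labels are needed. Below is a minimal, hedged sketch of that objective using a deliberately tiny toy model and a character-level "tokenizer" (all names here -- `TinyLM`, `corpus`, `stoi` -- are illustrative placeholders, not any real LLM or library API).

```python
# Minimal sketch of the self-supervised pre-training objective:
# next-token prediction over raw text. Everything here is a toy placeholder.
import torch
import torch.nn as nn

# Toy character-level "tokenizer": map characters to integer ids.
corpus = "the quick brown fox jumps over the lazy dog"
vocab = sorted(set(corpus))
stoi = {ch: i for i, ch in enumerate(vocab)}
ids = torch.tensor([stoi[ch] for ch in corpus])

class TinyLM(nn.Module):
    """A deliberately tiny causal language model: embedding -> GRU -> logits."""
    def __init__(self, vocab_size, dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, vocab_size)

    def forward(self, x):
        h, _ = self.rnn(self.embed(x))
        return self.head(h)

model = TinyLM(len(vocab))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

# Self-supervised setup: inputs are tokens [0..n-1], targets are the same
# text shifted by one position [1..n] -- no labelling required.
x = ids[:-1].unsqueeze(0)
y = ids[1:].unsqueeze(0)

for step in range(200):
    logits = model(x)                                    # (1, T, vocab)
    loss = loss_fn(logits.reshape(-1, len(vocab)), y.reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Real pre-training uses the same shifted-target idea, just with transformer models, subword tokenizers, and web-scale corpora.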
Pre-trained models:
- do not follow instructions well
- quite readily emit toxic content
Pre-trained models can be tailored to specific tasks using LLM fine-tuning.
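A common form of this is supervised fine-tuning on (prompt, response) pairs. The sketch below continues from the toy pre-training example above (reusing the hypothetical `model`, `stoi`, and `vocab`); the key difference from pre-training is that the data is labelled and the loss is computed only on the response tokens.

```python
# Hedged sketch of supervised fine-tuning on instruction/response pairs,
# reusing the toy TinyLM, stoi, and vocab from the pre-training sketch above.
import torch
import torch.nn as nn

pairs = [("say hi", "hello"), ("say bye", "goodbye")]  # toy labelled data

def encode(prompt, response, stoi):
    """Concatenate prompt and response; mask prompt positions in the targets."""
    p = [stoi[c] for c in prompt if c in stoi]
    r = [stoi[c] for c in response if c in stoi]
    x = torch.tensor(p + r)[:-1].unsqueeze(0)
    y = torch.tensor(p + r)[1:].unsqueeze(0)
    y[0, : len(p) - 1] = -100          # ignore prompt tokens in the loss
    return x, y

loss_fn = nn.CrossEntropyLoss(ignore_index=-100)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)  # start from pre-trained weights

for epoch in range(50):
    for prompt, response in pairs:
        x, y = encode(prompt, response, stoi)
        logits = model(x)
        loss = loss_fn(logits.reshape(-1, len(vocab)), y.reshape(-1))
        opt.zero_grad()
        loss.backward()
        opt.step()
```

The same pattern (continue training the pre-trained weights on curated, labelled examples) is what instruction tuning does at scale, and it is one way to address the limitations listed above.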