When giving a prompt to an LLM, providing the entire context from the database is very costly. Instead, we can use RAG (retrieval-augmented generation) to provide only the most relevant context.
Set up the embeddings in a vector database. Then embed the query (the question) and compute the vector similarity (usually some form of dot product) between the query embedding and the stored embeddings to find the most relevant context, as sketched below.
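A minimal sketch of that retrieval step using a plain dot product: the snippet scores a query against a matrix of document embeddings and keeps the top-k. The random vectors and the 384-dimensional size are placeholders, not anything from the setup above; in practice both come from an embedding model.

```python
import numpy as np

# Placeholder embeddings: one row per document. In practice these come from
# an embedding model and are stored in the vector database.
doc_embeddings = np.random.rand(100, 384)
query_embedding = np.random.rand(384)

# Normalize so that the dot product equals cosine similarity.
docs = doc_embeddings / np.linalg.norm(doc_embeddings, axis=1, keepdims=True)
query = query_embedding / np.linalg.norm(query_embedding)

# Score every document against the query and keep the k best.
scores = docs @ query
k = 5
top_k = np.argsort(scores)[::-1][:k]
print(top_k, scores[top_k])
```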
Unlike scalar indexing, which returns exact matches, a vector search index finds approximate matches, trading a little recall for much faster search. There are several different index strategies; a sketch comparing two of them follows.
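As one illustration of that trade-off, here is a sketch using FAISS (an assumption; the notes do not name a library) that builds an exact flat index and an approximate IVF index over the same placeholder vectors:

```python
import faiss
import numpy as np

dim = 384
doc_embeddings = np.random.rand(10_000, dim).astype("float32")

# Exact (flat) index: brute-force inner-product search, always correct.
flat = faiss.IndexFlatIP(dim)
flat.add(doc_embeddings)

# Approximate (IVF) index: clusters the vectors and only searches the
# clusters nearest to the query, trading some recall for speed.
nlist = 64
quantizer = faiss.IndexFlatIP(dim)
ivf = faiss.IndexIVFFlat(quantizer, dim, nlist, faiss.METRIC_INNER_PRODUCT)
ivf.train(doc_embeddings)
ivf.add(doc_embeddings)
ivf.nprobe = 8  # how many clusters to probe per query

query = np.random.rand(1, dim).astype("float32")
scores, ids = ivf.search(query, 5)
```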
If we have manually set up the embeddings table, we can use SQLModel to perform the retrieval step and fetch the k most relevant documents, as sketched below.
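A minimal sketch of that query, assuming a Postgres database with the pgvector extension; the Document table, column names, embedding dimension, and connection string are all illustrative:

```python
from pgvector.sqlalchemy import Vector
from sqlalchemy import Column
from sqlmodel import Field, Session, SQLModel, create_engine, select


class Document(SQLModel, table=True):
    id: int | None = Field(default=None, primary_key=True)
    content: str
    # pgvector column; 1536 is just an example embedding dimension.
    embedding: list[float] = Field(sa_column=Column(Vector(1536)))


engine = create_engine("postgresql+psycopg://user:password@localhost/ragdb")


def retrieve(query_embedding: list[float], k: int = 5) -> list[Document]:
    """Return the k documents whose embeddings are closest to the query."""
    with Session(engine) as session:
        statement = (
            select(Document)
            .order_by(Document.embedding.cosine_distance(query_embedding))
            .limit(k)
        )
        return list(session.exec(statement))
```

The query embedding is passed in rather than computed here, so any embedding model can be used as long as its dimension matches the column.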
We can also use LangChain to perform all of the RAG steps, taking advantage of the vector_db.as_retriever functionality.
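A sketch of the same pipeline in LangChain, assuming an OpenAI embedding model and a FAISS store (both are assumptions; any vector store exposing as_retriever works the same way):

```python
from langchain.chains import RetrievalQA
from langchain_community.vectorstores import FAISS
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

texts = ["doc one ...", "doc two ...", "doc three ..."]  # pre-chunked documents

# Embed the documents and build the vector store.
vector_db = FAISS.from_texts(texts, OpenAIEmbeddings())

# as_retriever wraps the similarity search so a chain can call it;
# k controls how many documents are passed to the LLM as context.
retriever = vector_db.as_retriever(search_kwargs={"k": 3})

qa = RetrievalQA.from_chain_type(llm=ChatOpenAI(), retriever=retriever)
answer = qa.invoke({"query": "What is the most relevant context?"})
print(answer["result"])
```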