
How to use ChatGPT with your own data (Pt 1): Retrieval Augmented Generation

07/19/2023 · 2 min read

This is part 1 of a 3-part series on how to use ChatGPT with your own data.

Want ChatGPT to answer questions about your website and proprietary data?

This article will give you an overview of the architecture that allows ChatGPT to do so.

ChatGPT limitations

Most people using ChatGPT or other LLMs like Claude 2 do not understand their limitations.

  • LLMs have a short-term memory problem.
    The longest context window of the GPT-4 API is roughly 3,000 words. The recently released Claude 2 triples that to about 10,000 words, but performance suffers toward the end of the window (a short sketch after this list shows how to measure how much of that window a piece of text uses).
  • LLMs are stuck in the past.
    Here is a simplified version of how current-generation LLMs are trained:
    • Companies prepare a large corpus of data
    • They spend weeks or months training the LLM on that data
    This is why ChatGPT has a knowledge cutoff of September 2021.
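
Context windows are actually measured in tokens rather than words, so it helps to count tokens before sending text to the model. Below is a minimal sketch using OpenAI's tiktoken library; the model name and sample text are illustrative assumptions.

```python
# Minimal sketch: counting tokens to see how much of a model's context window
# a piece of text would consume. Model name and sample text are assumptions.
import tiktoken

def count_tokens(text: str, model: str = "gpt-4") -> int:
    """Return how many tokens `text` uses under the given model's tokenizer."""
    encoding = tiktoken.encoding_for_model(model)
    return len(encoding.encode(text))

document = "Paste your proprietary document text here..."
print(count_tokens(document), "tokens")  # compare against the model's context limit
```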

Retrieval Augmented Generation


Retrieval Augmented Generation, or RAG for short, is the answer.

RAG allows the LLM to retrieve relevant information based on the user's query. The extra information augments the LLM's context.

RAG solves both of the problems discussed above. Let's see how it works at a high level.



Stage 1: Indexing pipeline for ingesting documents


This stage involves the following steps (a code sketch follows this list):

  • Prepare the data by splitting it into chunks, for example splitting a PDF document into individual pages
  • Turn each chunk into a format the LLM can work with: embeddings. We can use the OpenAI embedding API for this
  • Store the embeddings in a vector database for future lookup. There are multiple options for vector databases
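
Here is a minimal sketch of the indexing pipeline, assuming the OpenAI Python SDK (v0.x style) for embeddings and Chroma as the vector database; any other vector store would work just as well, and the page texts and collection name are placeholders.

```python
# Minimal sketch of Stage 1: chunk documents, embed each chunk, and store the
# vectors. Assumes the OpenAI embedding API and Chroma as the vector database.
import openai
import chromadb

openai.api_key = "YOUR_API_KEY"  # in practice, load this from an environment variable

def embed(texts: list[str]) -> list[list[float]]:
    """Turn text chunks into embedding vectors with OpenAI's embedding model."""
    response = openai.Embedding.create(model="text-embedding-ada-002", input=texts)
    return [item["embedding"] for item in response["data"]]

# 1. Prepare the data by splitting it into chunks (here: one chunk per page).
pages = ["Text of page 1...", "Text of page 2...", "Text of page 3..."]

# 2. Turn each chunk into an embedding vector.
vectors = embed(pages)

# 3. Store the embeddings (and the original text) in a vector database for lookup.
db = chromadb.Client()
collection = db.create_collection("my_documents")
collection.add(
    ids=[f"page-{i}" for i in range(len(pages))],
    embeddings=vectors,
    documents=pages,
)
```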

Stage 2: Search for relevant documents


When a user enters a query, we follow this process (sketched in code below):

  • Use the same embedding model to turn the query into a vector
  • Query the database for similar documents.
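
Here is a minimal sketch of the search step. It reuses the embed helper and the Chroma collection from the indexing sketch above, and the example question is a placeholder.

```python
# Minimal sketch of Stage 2: embed the user's query with the same model and
# look up the most similar chunks. Reuses `embed` and `collection` from above.
query = "What is your refund policy?"  # placeholder user question

# 1. Turn the query into a vector using the same embedding model.
query_vector = embed([query])[0]

# 2. Ask the vector database for the most similar stored chunks.
results = collection.query(query_embeddings=[query_vector], n_results=3)
relevant_chunks = results["documents"][0]  # top-3 most similar chunks of text
```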

Stage 3: Generate response with new context

The results found in the previous stage become the context from which the LLM generates its response.
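
Here is a minimal sketch of the generation step, continuing from the search sketch above; the prompt wording and model choice are illustrative assumptions.

```python
# Minimal sketch of Stage 3: pass the retrieved chunks to the chat model as
# extra context. Prompt wording and model choice are assumptions.
context = "\n\n".join(relevant_chunks)

completion = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "Answer the question using only the provided context."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"},
    ],
)
print(completion["choices"][0]["message"]["content"])
```

Instructing the model to answer only from the provided context is what keeps the response grounded in your own data rather than the model's training data.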

There you have it: now you understand how RAG works at a high level. In the next part, I'll show a few easy ways to get started with RAG.

To learn more about RAG, check out these resources


About Trung Vu

Trung Vu, a former software engineer, founded Hoss in 2019 to enhance developer experiences, swiftly attracting Silicon Valley backers and a $1.6 million seed round. In 2021, his venture was acquired by Niantic Labs, of Pokemon Go fame, to bolster their Lightship platform.

Post-acquisition, Trung leads engineering teams at Niantic and invests in promising AI startups. An AI enthusiast even before ChatGPT's rise, he equates its potential to electricity. Through AI Growth Pad, his education platform, Trung teaches entrepreneurs to leverage AI for growth, embodying his commitment to ethical, transformative technology.
