Hitchhiker’s Guide to RAG with ChatGPT API and LangChain
Build a simple Python RAG pipeline using your local files as context
Nowadays, LLMs can easily generate tons of words and responses based on general knowledge, but what happens when we need answers requiring accurate and specific knowledge? Purely generative models frequently struggle with domain-specific questions for a number of reasons: maybe the data they were trained on is now outdated, maybe what we are asking for is really specific and specialized, maybe we want responses that take into account personal or corporate data that just isn’t public… 🤷♀️ the list goes on.
So, how can we leverage generative AI while keeping our responses accurate, relevant, and down-to-earth? A good answer to this question is the Retrieval-Augmented Generation (RAG) framework. RAG consists of two key components: retrieval and generation (duh!). Unlike purely generative models, which can only draw on the data they were pre-trained on, RAG adds an extra retrieval step that lets us push additional information into the model from an external source, such as a database or a document. To put it differently, a RAG pipeline produces responses that are coherent and natural (thanks to the generation step) and at the same time factually accurate and grounded in a knowledge base of our choice (thanks to the retrieval step).
In this way, RAG can be an extremely valuable tool for applications where highly specialized data is needed, such as customer support, legal advice, or technical documentation. A typical example is a customer support chatbot that answers customer issues based on a company’s database of support documents and FAQs. Another is complex software or technical products with extensive troubleshooting guides. Yet another is legal advice, where a RAG model would retrieve custom data from law libraries, previous cases, or firm guidelines. The examples are really endless; in all these cases, though, access to external data that is specific and relevant to the context enables the model to offer more precise and accurate responses.
So, in this post, I walk you through building a simple RAG pipeline in Python, using the ChatGPT API, LangChain, and FAISS.
From a more technical perspective, RAG is a technique for enhancing an LLM’s responses by injecting additional, domain-specific information into them. In essence, RAG allows the model to also take into account external information — like a recipe book, a technical manual, or a company’s internal knowledge base — while forming its responses.
This is very important because it allows us to mitigate a number of problems inherent to LLMs, such as outdated training data, a lack of specialized domain knowledge, and no access to private or corporate information.
To make this work, the external documents are first processed into vector embeddings and stored in a vector database. Then, when we submit a prompt to the LLM, any relevant data is retrieved from the vector database and passed to the LLM along with our prompt. As a result, the response of the LLM is formed by considering both our prompt and any relevant information existing in the vector database in the background. Such a vector database can be hosted locally or in the cloud, using a service like Pinecone or Weaviate.
The first component for building a RAG pipeline is the LLM model that will generate the responses. This can be any LLM, like Gemini or Claude, but in this post, I will be using OpenAI’s ChatGPT models via their API platform. In order to use their API, we need to sign in and obtain an API key. We also need to make sure the respective Python libraries are installed.
The other major component of building a RAG pipeline is processing the external data — generating embeddings from documents and storing them in a vector database. The most popular framework for performing such a task is LangChain. In particular, LangChain allows us to load documents, split them into chunks, generate embeddings for those chunks, and store them in a vector database.
We can easily install the required LangChain libraries by:
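The exact package names depend on your LangChain version; with recent releases, something along these lines should do the trick (the split-out integration packages below are my assumption):

```bash
pip install langchain langchain-community langchain-openai
```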
In this post, I’ll be using LangChain together with FAISS, a local vector database developed by Facebook AI Research. FAISS is a very lightweight package, and is thus appropriate for building a simple/small RAG pipeline. It can be easily installed with:
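For a small local index like the one in this post, the CPU-only build is typically enough:

```bash
pip install faiss-cpu
```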
So, in summary, I will use OpenAI’s ChatGPT API as the LLM that generates the responses, LangChain to process the external data, and FAISS as the local vector database.
The file that I will be feeding into the RAG pipeline for this post is a text file with some facts about me. This text file is located in the folder ‘RAG files’.
Now we are all set up, and we can start by specifying our API key and initializing our model:
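Here is a minimal sketch of this step, assuming the langchain-openai integration package; the model name and temperature are just example choices:

```python
import os
from langchain_openai import ChatOpenAI

# Set the OpenAI API key (ideally, export OPENAI_API_KEY in your shell
# instead of hard-coding it in the script)
os.environ["OPENAI_API_KEY"] = "YOUR_API_KEY_HERE"

# Initialize the chat model; the model name here is just an example choice
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
```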
Then we can load the files we want to use for the RAG, generate the embeddings, and store them in a vector database as follows:
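One way to do this with LangChain’s document loaders and the FAISS integration (the folder name ‘RAG files’ comes from above; the chunk size and overlap are arbitrary choices):

```python
from langchain_community.document_loaders import DirectoryLoader, TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS

# Load every .txt file from the 'RAG files' folder
loader = DirectoryLoader("RAG files", glob="*.txt", loader_cls=TextLoader)
documents = loader.load()

# Split the documents into overlapping chunks so each embedding stays focused
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(documents)

# Embed the chunks and store them in a local FAISS index
embeddings = OpenAIEmbeddings()
vector_store = FAISS.from_documents(chunks, embeddings)

# Optionally persist the index to disk for reuse
vector_store.save_local("faiss_index")
```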
Finally, we can wrap everything in a simple executable Python file:
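Putting it all together, a minimal executable script could look like the sketch below. The system prompt wording, the retrieval setting (k=3), and the helper function name are my own illustrative choices:

```python
import os
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.document_loaders import DirectoryLoader, TextLoader
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_core.messages import SystemMessage, HumanMessage

os.environ["OPENAI_API_KEY"] = "YOUR_API_KEY_HERE"

# 1. Build the vector store from the local text file(s)
documents = DirectoryLoader("RAG files", glob="*.txt", loader_cls=TextLoader).load()
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
vector_store = FAISS.from_documents(splitter.split_documents(documents), OpenAIEmbeddings())

# 2. Initialize the chat model
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# 3. System prompt: restrict answers to the retrieved context only
SYSTEM_PROMPT = (
    "You are a helpful assistant. Answer ONLY using the context provided. "
    "If the answer is not in the context, say you don't know."
)

def answer(question: str) -> str:
    # Retrieve the most relevant chunks for the question
    docs = vector_store.similarity_search(question, k=3)
    context = "\n\n".join(doc.page_content for doc in docs)
    messages = [
        SystemMessage(content=SYSTEM_PROMPT),
        HumanMessage(content=f"Context:\n{context}\n\nQuestion: {question}"),
    ]
    return llm.invoke(messages).content

if __name__ == "__main__":
    while True:
        question = input("Ask me something (or 'quit'): ")
        if question.lower() == "quit":
            break
        print(answer(question))
```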
Notice how the system prompt is defined. Essentially, a system prompt is an instruction given to the LLM before the user interacts with it, setting the assistant’s behavior, tone, or constraints. For example, we could set the system prompt so that the LLM responds as if talking to a 4-year-old or to a rocket scientist — here, we ask it to provide responses based only on the external data we provided, the ‘Maria facts’.
So, let’s see what we’ve cooked! 🍳
First, I ask a question that is irrelevant to the provided external data source, to make sure that the model forms its responses using only that source and not general knowledge.
… and then I ask some questions specifically about the file I provided…
✨✨✨✨
Of course, this is a very simplistic example of a RAG setup — there’s much more to consider when implementing it in a real business environment, such as security concerns around how data is handled, or performance issues when dealing with a larger, more realistic knowledge corpus and increased token usage. Nonetheless, I believe OpenAI’s API is truly impressive and offers immense, untapped potential for building custom, context-specific AI applications.
Loved this post? Let’s be friends! Join me on
📰Substack 💌 Medium 💼LinkedIn ☕Buy me a coffee!