
Implement RAG with Spring AI and Qdrant DB

Designed by Freepik

Earlier, we discussed Spring AI's integration with Qdrant DB. Continuing along the same lines, we'll explore and implement the Retrieval Augmented Generation (RAG) technique using Spring AI and Qdrant DB.

We'll develop a chatbot that helps users query PDF documents in natural language.

RAG Technique

Several LLMs exist, including OpenAI's GPT and Meta's Llama series, all pre-trained on publicly available internet data. However, they can't be used directly in a private enterprise's context because they have no access to its restricted knowledge base.

Moreover, fine-tuning an LLM is a time-consuming and resource-intensive process. Hence, augmenting the query or prompt with information from the private knowledge base is the quickest and easiest alternative.

The application converts the user query into a vector and sends it to the Vector DB. The Vector DB performs a semantic search and returns the nearest matching records to the application. These results provide the context for the prompt sent to the LLM.

The application decodes the results back into human-readable form. It then creates the prompt by combining the user query with the DB response and sends it to the LLM service. The LLM generates an answer based strictly on the context retrieved from the Vector DB.

[Figure: RAG query flow]

Before all this, however, the application must persist the knowledge base documents in the Vector DB. An ETL process fetches the documents in batches from the knowledge base and splits them into smaller chunks.

Afterward, an embedding model transforms the chunks into embeddings. Finally, we save these embeddings into a Vector DB such as Milvus, Qdrant DB, etc.

[Figure: ETL pipeline]

RAG Support in Spring AI

Spring AI is an emerging framework in the world of AI. LangChain still dominates, however, and Spring AI doesn't appear likely to replace it anytime soon.

Nevertheless, Spring AI provides seamless, configurable integration with the underlying LLM services and Vector DBs. With it, we can choose and swap the underlying LLM service and Vector DB with minimal modification to the source code.

Let's explore this topic further in the upcoming sections.

ETL Components

Now that we've seen what it takes to build an application using RAG, we can explore a few important Spring AI components along the same lines. Let's begin with the components that help create an ETL application:

[Figure: Spring AI ETL components]

Spring AI models its ETL components on Java's functional interfaces: DocumentReader extends Supplier, and DocumentWriter extends Consumer. The VectorStore interface, itself a DocumentWriter, performs operations on the underlying Vector DB. Besides QdrantVectorStore, several implementations of the VectorStore interface support other Vector DBs, such as Chroma, Milvus, Redis, etc. We can configure a VectorStore bean either programmatically or with configuration properties.
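
For instance, here's a minimal sketch of a programmatic configuration. It assumes the Spring AI 1.x QdrantVectorStore builder API and the io.qdrant.client gRPC client; the host, API key, and collection name are placeholders:

@Configuration
public class QdrantVectorStoreConfig {
    // Low-level Qdrant gRPC client; the host and API key are placeholders
    @Bean
    public QdrantClient qdrantClient() {
        return new QdrantClient(QdrantGrpcClient
          .newBuilder("3c4b4304-xxxx-cloud.qdrant.io", 6334, true)
          .withApiKey("xxxxx")
          .build());
    }

    // Wraps the client in Spring AI's VectorStore abstraction
    @Bean
    public VectorStore vectorStore(QdrantClient qdrantClient, EmbeddingModel embeddingModel) {
        return QdrantVectorStore.builder(qdrantClient, embeddingModel)
          .collectionName("books")
          .initializeSchema(true)
          .build();
    }
}

In this article, though, we'll stick to the configuration-properties route.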

Similarly, implementations of the DocumentReader interface read PDF, text, HTML, JSON, and other documents; ParagraphPdfDocumentReader is one such implementation. A TextSplitter implementation, such as TokenTextSplitter, then splits the documents into smaller chunks.

Finally, the VectorStore#add() method inserts the document embeddings into the underlying Vector DB. Later, the application uses VectorStore to perform a similarity search for the user query.

Query Components

Once the data is ready in the Vector DB, applications like chatbots can query it. Luckily, Spring AI has components that help build such applications:

[Figure: Spring AI RAG query components]

The application first performs the semantic search using the VectorStore interface. The PromptTemplate class helps define a blueprint with placeholders for the contextual information, and it can create a Prompt object from the Vector DB response. In our case, however, we'll use the PromptTemplate#render() method, which generates the prompt as a plain string.

Finally, the appropriate subclass of ChatModel, such as OpenAiChatModel, sends the prompt to the underlying LLM service.

We'll see all these classes in action in the next section.

Chatbot Implementation with Spring AI

In the previous sections, we discussed the important Spring AI components for developing a chatbot using the RAG technique. Now, let's use them to implement a chatbot that answers user queries by referring to the data in a PDF document.

Prerequisites

We'll use a Qdrant Cloud instance and OpenAI's LLM services, so the application must include the necessary Spring AI libraries. We've covered this in detail in our previous article, Qdrant DB - Spring AI Integration.

We'll need a few of Spring AI's configuration properties related to the Qdrant DB and OpenAI's LLM service:

spring.ai.vectorstore.qdrant.host=3c4b4304-xxxx-cloud.qdrant.io
spring.ai.vectorstore.qdrant.port=6334
spring.ai.vectorstore.qdrant.api-key=xxxxx
spring.ai.vectorstore.qdrant.collection-name=books
spring.ai.vectorstore.qdrant.use-tls=true
spring.ai.openai.api-key=xxxxx
spring.ai.openai.chat.options.model=gpt-3.5-turbo
spring.ai.openai.chat.options.temperature=0.3
spring.ai.openai.embedding.options.model=text-embedding-ada-002

In our sample source code, we've defined all these configurations in the application-rag.properties file.

These properties auto-configure the VectorStore and ChatModel beans, which we can then autowire into Spring component classes.
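
Here's a minimal sketch of how these beans land in our test class; the @SpringBootTest and @ActiveProfiles annotations are assumptions about the test setup, with the rag profile matching the application-rag.properties file above:

@SpringBootTest
@ActiveProfiles("rag")
public class RagTechniqueLiveTest {
    // Auto-configured from the spring.ai.vectorstore.qdrant.* properties
    @Autowired
    private VectorStore vectorStore;

    // Auto-configured from the spring.ai.openai.* properties
    @Autowired
    private ChatModel chatModel;

    // ...the tests below use these beans directly
}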

ETL Implementation

Let's first go through the program that reads a PDF file on meditation and converts it into multiple documents:

public class RagTechniqueLiveTest {
    @Value("classpath:/rag/meditation_and_its_methods.pdf")
    private String filePath;

    private List<Document> documentChunks;

    @Test
    void whenReadPdf_thenGenerateDocumentChunks() {
        // Reads one Document per paragraph; works on PDFs that contain a catalog/TOC
        ParagraphPdfDocumentReader paragraphPdfDocumentReader
          = new ParagraphPdfDocumentReader(filePath);
        TokenTextSplitter tokenTextSplitter = new TokenTextSplitter();
        List<Document> documents = paragraphPdfDocumentReader.get();
        // Split the paragraph-level documents into token-bounded chunks
        documentChunks = tokenTextSplitter.apply(documents);
        assertTrue(documentChunks.size() > 1);
    }
}

We've used the ParagraphPdfDocumentReader class to read the documents from the PDF file. This class works with PDFs containing a catalog (table of contents) and returns a Document object for each paragraph. Then, the TokenTextSplitter#apply() method splits the documents into smaller chunks. Additionally, we can configure the TokenTextSplitter class to control the chunk size, as sketched below.
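
Here's a minimal sketch of a tuned splitter. It assumes the five-argument TokenTextSplitter constructor (target chunk size in tokens, minimum chunk size in characters, minimum chunk length to embed, maximum number of chunks, and whether to keep separators); the values are purely illustrative:

// Illustrative values: ~500-token chunks of at least 100 characters,
// skipping fragments under 5 characters, capped at 10000 chunks, keeping separators
TokenTextSplitter tokenTextSplitter = new TokenTextSplitter(500, 100, 5, 10000, true);
List<Document> documentChunks = tokenTextSplitter.apply(documents);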

Finally, we'll load the document chunks into the Qdrant DB by calling the VectorStore#add() method:

@Test
void whenDocumentChunksGenerated_thenSaveInVectorDB() {
    // add() embeds the chunks and upserts them into the Qdrant collection
    assertInstanceOf(QdrantVectorStore.class, vectorStore);
    assertDoesNotThrow(() -> vectorStore.add(documentChunks));
}

The VectorStore bean is autowired into the test class, as we sketched earlier.

Let's define the prompt template for the LLM in the user_query_prompt.st file:

Answer the user query strictly from the following context:
{context}
Query:
{query}
If the context is not providing any answers to the query then simply respond:
Sorry, I am unable to answer your query.  

The prompt template has placeholders for the context and the query.

Now, let's take a look at a program that replaces these placeholders:

void whenQueryDocument_thenRespondWithInContext(String query) {
    // systemResource is the user_query_prompt.st template injected via @Value
    PromptTemplate promptTemplate = new PromptTemplate(systemResource);
    // Fill the placeholders: nearest document chunks as context, plus the raw query
    promptTemplate.add("context", vectorStore.similaritySearch(query));
    promptTemplate.add("query", query);
    String prompt = promptTemplate.render();
    assertInstanceOf(OpenAiChatModel.class, chatModel);
    String chatResponse = chatModel.call(prompt);
    logger.info("Chat bot response: {}", chatResponse);
}

The VectorStore#similaritySearch() method fetches the matching document chunks for the user query from the Qdrant DB. We replace the context placeholder in the prompt template with this result, and the query placeholder with the query itself. Finally, we pass the rendered prompt to the ChatModel#call() method, which calls the OpenAI LLM service.
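
Here, similaritySearch(query) runs with the library defaults. If we need tighter control, the overload that accepts a SearchRequest lets us tune the number of matches and the similarity threshold. A minimal sketch, assuming the Spring AI 1.x SearchRequest builder, with illustrative values:

// Return at most 5 chunks, and only those scoring above 0.7 similarity
List<Document> context = vectorStore.similaritySearch(SearchRequest.builder()
  .query(query)
  .topK(5)
  .similarityThreshold(0.7)
  .build());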

Let's see the result when the query is aligned with the subject covered in the book. For instance, when the query is What are the benefits of meditation?:

Chat bot response: Meditation helps in controlling anger and hatred, and with continuous practice, it can become a habit that will help in controlling and checking negative emotions.

Further, let's check out the result when we ask a query not covered in the book. For example, What are the benefits of Tibetan meditation?:

Chat bot response: Sorry, I am unable to answer your query.

As expected, the LLM followed the instructions in the prompt: it answered the in-context query, and it declined to answer the out-of-context one.

Conclusion

In this article, we've learned about the RAG technique and the supporting components in the Spring AI framework.

However, many additional aspects must be explored to build a full-fledged, scalable, and reliable chatbot, such as the right LLM configuration to get appropriate responses for the requirements and existing constraints. Moreover, understanding the underlying Vector DB is equally important.

Visit our GitHub repository to access the article's source code.
