This tutorial covers Spring AI's integration with Qdrant DB, an open-source, efficient, and scalable vector database.
We'll insert some unstructured data into the vector DB. Then, we'll perform query and delete operations on the DB using the Spring AI framework.
Brief Introduction to Qdrant DB
Qdrant is a highly scalable database for storing and searching multi-dimensional vectors, with several flexible deployment options:
Qdrant Cloud offers a 100% managed SaaS on AWS, Azure, and GCP, as well as a hybrid cloud variant that runs on our own Kubernetes cluster. It provides a unified console to create, manage, and monitor multi-node Qdrant DB clusters. Qdrant also supports on-premises private cloud deployments for customers who want more control over management and data.
Moreover, Infrastructure as Code (IaC) tools like Terraform and Pulumi enable automated deployment and management of Qdrant DB in the cloud.
We must understand a few concepts before working with data in Qdrant DB. A collection stores points, and a point is a single record in a collection: an ID, a vector, and an optional payload. The payload is additional information stored alongside the vector. An example of a collection could be a collection of books.
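To make these terms concrete, here's a simplified in-memory model in plain Java. The QdrantDataModel, Point, and PointCollection names are hypothetical, and this isn't the Qdrant client API. It only mirrors two behaviors of Qdrant: all vectors in a collection share one dimension, and upserting a point with an existing ID replaces it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified in-memory model of Qdrant's core concepts.
// It is NOT the Qdrant client API; it only shows how collections, points,
// vectors, and payloads relate to each other.
public class QdrantDataModel {

    // A point: one record with an ID, a vector, and an optional payload
    // (an empty map stands for "no payload")
    record Point(long id, float[] vector, Map<String, Object> payload) {}

    // A collection: a named set of points whose vectors share one dimension
    static final class PointCollection {
        final String name;
        final int dimension;
        final List<Point> points = new ArrayList<>();

        PointCollection(String name, int dimension) {
            this.name = name;
            this.dimension = dimension;
        }

        // Inserts a point, replacing any existing point with the same ID
        void upsert(Point point) {
            if (point.vector().length != dimension) {
                throw new IllegalArgumentException(
                    "Expected a vector of size " + dimension + " but got " + point.vector().length);
            }
            points.removeIf(p -> p.id() == point.id());
            points.add(point);
        }
    }

    public static void main(String[] args) {
        PointCollection books = new PointCollection("books", 3);
        books.upsert(new Point(1, new float[] {0.1f, 0.2f, 0.3f}, Map.of("Author", "Shri Shri")));
        books.upsert(new Point(2, new float[] {0.4f, 0.5f, 0.6f}, Map.of())); // no payload
        System.out.println(books.name + " holds " + books.points.size() + " points");
    }
}
```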
Generally, an ETL pipeline splits a library of books into chunks and converts each chunk into a vector. Finally, it stores these vectors in the collection. Applications such as chatbots, recommendation engines, and anomaly detection systems then perform semantic searches on the collection using the Qdrant APIs.
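The splitting step can be sketched with a fixed-size character window that overlaps consecutive chunks. This is an assumption for illustration only: the TextChunker class is hypothetical, and production splitters, such as Spring AI's TokenTextSplitter, work on tokens rather than characters.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the "split" step of an ETL pipeline using a fixed-size
// character window. Consecutive chunks share `overlap` characters so that
// context isn't lost at chunk boundaries.
public class TextChunker {

    static List<String> chunk(String text, int chunkSize, int overlap) {
        if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
            throw new IllegalArgumentException("Require chunkSize > 0 and 0 <= overlap < chunkSize");
        }
        List<String> chunks = new ArrayList<>();
        int step = chunkSize - overlap; // how far the window moves each time
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + chunkSize, text.length());
            chunks.add(text.substring(start, end));
            if (end == text.length()) {
                break; // this chunk already reached the end of the text
            }
        }
        return chunks;
    }

    public static void main(String[] args) {
        String book = "Mindfulness meditation helps you focus on the present moment.";
        // In a real pipeline, each chunk would next go to an embedding model
        chunk(book, 20, 5).forEach(System.out::println);
    }
}
```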
In addition, vectors and payloads have indexes like their counterparts in relational databases to speed up the search or filter operations.
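As a rough intuition for the payload side, a payload index can be pictured as an inverted map from a field value to the IDs of the points carrying it, so a filter like Author == 'Shri Shri' becomes a lookup instead of a scan. The PayloadIndex class below is a hypothetical simplification, not Qdrant's implementation; Qdrant's vector index (HNSW) isn't modeled here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical, simplified payload index: field name -> (value -> point IDs).
// It illustrates why an index speeds up filtering; it is not Qdrant's code.
public class PayloadIndex {

    private final Map<String, Map<Object, Set<Long>>> index = new HashMap<>();

    // Registers every payload field of a point under its value
    void add(long pointId, Map<String, Object> payload) {
        payload.forEach((field, value) ->
            index.computeIfAbsent(field, f -> new HashMap<>())
                 .computeIfAbsent(value, v -> new TreeSet<>())
                 .add(pointId));
    }

    // Returns the IDs of points where field == value, via map lookups only
    Set<Long> match(String field, Object value) {
        return index.getOrDefault(field, Map.of()).getOrDefault(value, Set.of());
    }

    public static void main(String[] args) {
        PayloadIndex authors = new PayloadIndex();
        authors.add(1, Map.of("Author", "Shri Shri"));
        authors.add(2, Map.of("Author", "Maharishi Yogi"));
        authors.add(5, Map.of("Author", "Shri Shri"));
        System.out.println(authors.match("Author", "Shri Shri")); // prints [1, 5]
    }
}
```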
Prerequisites
For this tutorial, we'll need a Qdrant DB instance. We can run a local instance by deploying a Docker container or by using the Java Testcontainers library.
However, to get started quickly, we'll use a free single-node cluster on Qdrant Cloud and generate an API key to access it from our Spring AI application.
Next, we'll add the Qdrant starter Maven dependency to our Spring Boot application:
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-qdrant-store-spring-boot-starter</artifactId>
    <version>1.0.0-SNAPSHOT</version>
</dependency>
The starter library auto-configures a VectorStore bean for Qdrant DB by reading the connection properties from the Spring application.properties file. This bean performs operations like insert, delete, and similarity search on the Qdrant DB.
In scenarios where an application must set the properties programmatically, we depend on a different library instead:
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-qdrant-store</artifactId>
    <version>1.0.0-SNAPSHOT</version>
</dependency>
The application must convert documents and queries into embeddings (vectors) before storing or searching them. For this purpose, we'll use OpenAI's embedding model. Therefore, we'll include the Spring AI OpenAI starter library:
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-openai-spring-boot-starter</artifactId>
    <version>1.0.0-SNAPSHOT</version>
</dependency>
Moreover, we must sign up on the OpenAI platform and subscribe to a plan to obtain an API key for its services.
Spring AI Key Concepts
Spring AI provides out-of-the-box integration with several vector DBs, such as Qdrant, PgVector, Milvus, and Neo4j, through the common VectorStore interface. For Qdrant, the framework auto-configures the QdrantVectorStore class, an implementation of the VectorStore interface.
The framework picks up the Qdrant DB connection properties defined in the application properties file and uses them to auto-configure the QdrantVectorStore bean. It also supports manual configuration with the help of QdrantClient and QdrantGrpcClient, which are part of the Qdrant Java client library. In that case, we pass a QdrantClient to the QdrantVectorStore constructor to create the VectorStore bean.
We'll discuss both approaches with sample code in the next section.
Instantiate VectorStore Bean
In this section, we'll explore the various ways to instantiate the QdrantVectorStore bean.
We've used two different Spring profiles, auto and manual. The auto profile has the properties required to auto-configure the VectorStore bean. Meanwhile, in the manual profile, we've removed the auto-configuring properties.
Auto Configuration
Once the Spring AI framework discovers the Qdrant DB properties and the OpenAI-related properties, it auto-configures the QdrantVectorStore bean:
spring.ai.vectorstore.qdrant.host=xxxxxx.europe-west3-0.gcp.cloud.qdrant.io
spring.ai.vectorstore.qdrant.port=6334
spring.ai.vectorstore.qdrant.api-key=xxxxxx
spring.ai.vectorstore.qdrant.collection-name=books
spring.ai.openai.api-key=xxxxxx
spring.ai.openai.embedding.options.model=text-embedding-3-small
In the application_auto.properties file, we've defined the Qdrant DB's host, port (6334 is Qdrant's gRPC port), API key, and collection name under the spring.ai.vectorstore.qdrant namespace. Additionally, the file contains the OpenAI API key and embedding model name that Spring AI uses to auto-configure the EmbeddingModel bean, which converts the document chunks into vectors, or embeddings.
Now, let's verify that Spring AI auto-configures the VectorStore bean:
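@SpringBootTest
@ActiveProfiles("auto") // loads the properties of the auto profile
class SpringQdrantLiveTest {
    @Autowired
    private VectorStore qdrantVectorStore;
    @Test
    void givenQdrantDB_whenPropertiesDefined_thenAutoConfigureVectorStore() {
        // the injected bean is the auto-configured Qdrant implementation
        assertNotNull(qdrantVectorStore);
        assertInstanceOf(QdrantVectorStore.class, qdrantVectorStore);
    }
}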
As expected, when the program runs, the framework auto-wires the VectorStore bean into the class.
Manual Configuration
With the auto-configuration approach, the API keys sit in plain text in the properties file, and it's difficult to load the connection properties from other sources. Let's address this by creating the VectorStore bean programmatically in the ApplicationConfiguration class:
@Bean
public QdrantClient qdrantClient() {
    // the third argument enables TLS, which Qdrant Cloud connections use
    QdrantGrpcClient.Builder builder = QdrantGrpcClient.newBuilder(getQdrantHost(), getQdrantPort(), true);
    builder.withApiKey(getQdrantAPIKey());
    QdrantGrpcClient qdrantGrpcClient = builder.build();
    return new QdrantClient(qdrantGrpcClient);
}
@Bean
public QdrantVectorStore qdrantVectorStore(EmbeddingModel embeddingModel, QdrantClient qdrantClient) {
    // the last argument (initializeSchema = false) keeps the store from creating
    // the books collection, since it already exists
    return new QdrantVectorStore(qdrantClient, "books", embeddingModel, false);
}
Unlike the auto-configuration approach, we retrieve the host, port, and API key through our own methods, such as getQdrantHost() and getQdrantPort(). We no longer depend on the spring.ai.vectorstore.qdrant properties, so we're free to change where these values come from.
Finally, we instantiate the QdrantVectorStore bean in the qdrantVectorStore() method. Spring AI auto-configures its EmbeddingModel argument using the OpenAI key declared in the application_manual.properties file:
qdrant.host=xxxxxx.europe-west3-0.gcp.cloud.qdrant.io
qdrant.port=6334
qdrant.api-key=xxxxxx
qdrant.collection-name=books
spring.ai.openai.api-key=xxxxxx
We've removed the spring.ai.vectorstore namespace from the property file to prevent Spring AI from auto-configuring the VectorStore bean. For simplicity, the ApplicationConfiguration class reads the custom qdrant.* properties instead. However, real-world applications often fetch connection properties from other sources, such as databases or secret vaults.
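The article doesn't show getQdrantHost(), getQdrantPort(), or getQdrantAPIKey(). As a hedged plain-Java sketch of one way they could work, the hypothetical QdrantSettings class below prefers environment variables (which a deployment could populate from a vault) and falls back to the qdrant.* keys from the properties file. The QDRANT_HOST, QDRANT_PORT, and QDRANT_API_KEY variable names are assumptions, not a Spring or Qdrant convention.

```java
import java.util.Map;
import java.util.Properties;

// Hypothetical helper: resolves Qdrant connection settings from environment
// variables first, then from the qdrant.* keys of a properties file.
public class QdrantSettings {

    private final Map<String, String> env;
    private final Properties fileProps;

    QdrantSettings(Map<String, String> env, Properties fileProps) {
        this.env = env;
        this.fileProps = fileProps;
    }

    // Environment variable wins; otherwise use the properties file; otherwise fail fast
    private String resolve(String envName, String propertyKey) {
        String value = env.get(envName);
        if (value == null || value.isBlank()) {
            value = fileProps.getProperty(propertyKey);
        }
        if (value == null) {
            throw new IllegalStateException("Missing " + envName + " or " + propertyKey);
        }
        return value;
    }

    String getQdrantHost() { return resolve("QDRANT_HOST", "qdrant.host"); }

    int getQdrantPort() { return Integer.parseInt(resolve("QDRANT_PORT", "qdrant.port")); }

    String getQdrantAPIKey() { return resolve("QDRANT_API_KEY", "qdrant.api-key"); }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("qdrant.host", "localhost");
        props.setProperty("qdrant.port", "6334");
        QdrantSettings settings = new QdrantSettings(System.getenv(), props);
        System.out.println(settings.getQdrantHost() + ":" + settings.getQdrantPort());
    }
}
```

Failing fast on a missing value surfaces misconfiguration at startup rather than on the first Qdrant call.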
Further, let's see if we can successfully inject the QdrantVectorStore bean:
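@SpringBootTest
@ActiveProfiles("manual") // loads the properties of the manual profile
public class SpringQdrantManualConfigLiveTest {
    @Autowired
    private VectorStore qdrantVectorStore;
    @Test
    void givenQdrantDB_whenDisabledAutoConfig_thenManuallyConfigureVectorStore() {
        // the injected bean comes from ApplicationConfiguration
        assertNotNull(qdrantVectorStore);
        assertInstanceOf(QdrantVectorStore.class, qdrantVectorStore);
    }
}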
As expected, the framework injects the QdrantVectorStore bean that we defined in the ApplicationConfiguration class.
Qdrant DB Operations
Now that we've learned how to instantiate the VectorStore bean, let's perform some DB operations.
Add Vectors
Storing the vectors of documents in a vector DB is crucial when developing applications that rely on similarity search.
Let's assume there's a books collection in the Qdrant DB and we'll insert a few documents into it using the VectorStore interface:
@Test
void givenQdrantDB_whenCallAddDocumentOfVectorStore_thenInsertSuccessfully()
    throws ExecutionException, InterruptedException {
    // each document carries its author as metadata, stored as the point's payload
    Document doc1 = new Document("Mindfulness meditation helps you focus on the present moment.",
        Map.of("Author", "Shri Shri"));
    Document doc2 = new Document("Transcendental meditation involves repeating a mantra.",
        Map.of("Author", "Maharishi Yogi"));
    Document doc3 = new Document("Loving-kindness meditation fosters compassion and kindness.",
        Map.of("Author", "Sadguru"));
    Document doc4 = new Document("Zen meditation emphasizes posture and breathing.",
        Map.of("Author", "Katsuki Sekida"));
    Document doc5 = new Document("If the mind goes for a spin, then life goes for a toss",
        Map.of("Author", "Shri Shri"));
    List<Document> documents = Arrays.asList(doc1, doc2, doc3, doc4, doc5);
    // qdrantClient and collectionName are fields of the test class; the collection starts empty
    assertEquals(0, qdrantClient.countAsync(collectionName).get().intValue());
    // embeds the documents with the EmbeddingModel and stores them in the collection
    qdrantVectorStore.add(documents);
    assertTrue(qdrantClient.countAsync(collectionName).get().intValue() > 0);
}
The program defines a list of documents with hardcoded texts about meditation and stores each book's author name as metadata. For real-world applications, however, Spring AI supports extracting and transforming various document types from different sources. Then, we call the qdrantVectorStore#add() method to persist the documents into the books collection in the Qdrant DB.
Before the QdrantVectorStore class persists the documents, OpenAI's embedding model transforms them into vectors. By default, the OpenAI embedding model is text-embedding-ada-002. However, we can change it with the spring.ai.openai.embedding.options.model property key in the application properties file, as we did earlier with text-embedding-3-small.
Finally, the program calls the QdrantClient#countAsync() method to verify the number of vector points in the collection. We can also verify this directly on the Qdrant Cloud web console.
Similarity Search
Generally, the query is plain text, which the embedding model converts into a vector. Qdrant then returns the stored vectors closest to this query vector. Essentially, the smaller the distance between two vectors, the more closely related their source texts are.
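One common way to measure this closeness is cosine similarity, one of the distance metrics Qdrant supports (alongside dot product, Euclidean, and Manhattan); the metric is configured per collection. The following plain-Java sketch computes it; the CosineSimilarity class is hypothetical and independent of Spring AI.

```java
// Hypothetical sketch: cosine similarity between two embeddings.
// 1.0 means the vectors point the same way (closely related texts);
// 0.0 means they're orthogonal (unrelated texts).
public class CosineSimilarity {

    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same dimension");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        float[] query = {1f, 0f};
        float[] close = {0.9f, 0.1f};
        float[] far = {0f, 1f};
        System.out.println(cosine(query, close) > cosine(query, far)); // prints true
    }
}
```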
Now, let's see how the Spring AI framework helps perform a similarity search:
@Test
void givenQdrantDB_whenCallSimilaritySearchOfVectorStore_thenSearchSuccessfully() {
    // at most 2 of Shri Shri's documents whose similarity to the query is at least 0.5
    List<Document> resultDocuments = qdrantVectorStore.similaritySearch(SearchRequest.defaults()
        .withQuery("Mindfulness meditation")
        .withTopK(2)
        .withFilterExpression("Author == 'Shri Shri'")
        .withSimilarityThreshold(0.5));
    assertFalse(resultDocuments.isEmpty());
    resultDocuments.forEach(e -> assertEquals("Shri Shri", e.getMetadata().get("Author")));
}
The program uses the VectorStore#similaritySearch() method to perform a semantic search for document chunks related to the text Mindfulness meditation. The withTopK(2) call limits the result to the two best matches, and SearchRequest#withFilterExpression() restricts the search to vectors whose metadata matches Author == 'Shri Shri'. Finally, withSimilarityThreshold(0.5) discards matches with a similarity score below 0.5, so a higher threshold returns fewer but more relevant matches.
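Conceptually, the request combines four steps: filter by metadata, score against the query vector, drop low scores, and keep the top K. The brute-force sketch below spells these out, assuming cosine similarity and a single equality filter. FilteredSearch, Doc, and Hit are hypothetical names, and Qdrant typically relies on its vector and payload indexes rather than scanning every point.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical brute-force illustration of a filtered top-K similarity search
// with a score threshold. Not Spring AI or Qdrant code.
public class FilteredSearch {

    record Doc(String id, float[] embedding, Map<String, Object> metadata) {}

    record Hit(Doc doc, double score) {}

    static double cosine(float[] a, float[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    static List<Hit> search(List<Doc> docs, float[] query, String key, Object value,
                            double threshold, int topK) {
        return docs.stream()
            // 1. metadata filter, like Author == 'Shri Shri'
            .filter(d -> Objects.equals(d.metadata().get(key), value))
            // 2. score each remaining document against the query vector
            .map(d -> new Hit(d, cosine(d.embedding(), query)))
            // 3. similarity threshold: drop weak matches
            .filter(h -> h.score() >= threshold)
            // 4. best matches first, keep at most topK
            .sorted(Comparator.comparingDouble(Hit::score).reversed())
            .limit(topK)
            .toList();
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(
            new Doc("1", new float[] {1f, 0f}, Map.of("Author", "Shri Shri")),
            new Doc("2", new float[] {0.7f, 0.7f}, Map.of("Author", "Shri Shri")),
            new Doc("3", new float[] {1f, 0.1f}, Map.of("Author", "Maharishi Yogi")));
        search(docs, new float[] {1f, 0f}, "Author", "Shri Shri", 0.5, 2)
            .forEach(h -> System.out.printf("%s %.2f%n", h.doc().id(), h.score()));
    }
}
```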
Delete Vectors
Lastly, we'll see how to delete the vectors from the DB:
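@Test
void givenQdrantDB_whenCallDeleteOfVectorStore_thenDeleteSuccessfully() {
    // getIds() is our custom helper that runs a similarity search and returns the IDs of the hits
    List<String> ids = getIds("Mindfulness meditation", "Author == 'Shri Shri'");
    qdrantVectorStore.delete(ids);
    // repeating the same search no longer finds any matching vectors
    assertEquals(0, getIds("Mindfulness meditation", "Author == 'Shri Shri'").size());
}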
First, we fetch the IDs of the matching vectors by calling our custom getIds() method, which performs a similarity search with the given query and filter expression. Then, we pass the fetched IDs to the VectorStore#delete() method. As expected, the vectors are deleted from the Qdrant DB, so repeating the search returns no IDs.
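The same two-step pattern, look up IDs and then delete by ID, can be sketched with a hypothetical in-memory store. Here a text predicate stands in for the similarity search that the article's getIds() performs; only the shape of delete(), which takes a list of IDs, mirrors VectorStore#delete(). DeleteByIds is not Spring AI code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical in-memory illustration of "find IDs, then delete by ID".
public class DeleteByIds {

    // id -> document text, kept in insertion order
    private final Map<String, String> store = new LinkedHashMap<>();

    void add(String id, String text) { store.put(id, text); }

    // Step 1: collect the IDs of the matching records
    // (a stand-in for the similarity search done by getIds())
    List<String> getIds(Predicate<String> matches) {
        List<String> ids = new ArrayList<>();
        store.forEach((id, text) -> {
            if (matches.test(text)) {
                ids.add(id);
            }
        });
        return ids;
    }

    // Step 2: delete the records by ID; returns true if anything was removed
    boolean delete(List<String> ids) {
        boolean changed = false;
        for (String id : ids) {
            changed |= store.remove(id) != null;
        }
        return changed;
    }

    int size() { return store.size(); }

    public static void main(String[] args) {
        DeleteByIds store = new DeleteByIds();
        store.add("1", "Mindfulness meditation helps you focus on the present moment.");
        store.add("5", "If the mind goes for a spin, then life goes for a toss");
        List<String> ids = store.getIds(text -> text.contains("Mindfulness"));
        store.delete(ids);
        System.out.println("Remaining documents: " + store.size());
    }
}
```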
Conclusion
In this article, we discussed Spring AI integration with Qdrant DB. Spring AI offers a high-level abstraction over the various vector DBs and LLM services. We used interfaces such as VectorStore and EmbeddingModel as the main tools for integration with Qdrant DB.
However, it's equally important to understand the basic principles of how a vector DB operates. This would help in designing reliable and performant systems.
Visit our GitHub repository to access the article's source code.