Spring AI Image Model API

Android Aghori, generated with ChatGPT

Large Language Models (LLMs) such as GPT-4, Claude 2, and Llama 2 generate text outputs when invoked with text prompts.

In contrast, image models such as DALL-E, Midjourney, and Stable Diffusion generate images from user prompts, which can be either text or images. Each model provider exposes its own API, so switching between providers is difficult without a common interface.

Luckily, the Spring AI library offers a framework that can help seamlessly integrate with the underlying models.

In this article, we'll learn some important components of Spring AI's Image Model API and implement a demo program to generate an image with a prompt.

Important Components of Image Model API

First, let's look at the important components of the Image Model API that help integrate with the underlying image model providers, such as OpenAI and Stability AI:

Key Classes of Spring AI Image Model API

Interfaces such as ImageOptions and ImageModel, as well as classes such as ImageResponse and ImageGeneration, abstract away the underlying AI models. ImageOptions defines image attributes such as height and width. An ImagePrompt, which wraps the user instructions and the ImageOptions, is then passed as an argument to the ImageModel#call() method.

Finally, the call() method of the implementation classes of ImageModel such as OpenAiImageModel and StabilityAiImageModel invokes the underlying Image Model service of providers such as OpenAI and Stability AI.
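
To make this flow concrete, here is a minimal, self-contained sketch of the same pattern in plain Java. All the types below (SketchImageModel, SketchImagePrompt, and so on) are simplified, hypothetical stand-ins written for this illustration; they are not Spring AI's actual classes, and the fake provider only builds a placeholder URL instead of calling a remote service:

```java
// Hypothetical, simplified stand-ins for illustration only; NOT Spring AI's real types.
record SketchImageOptions(String model, int width, int height) {}

record SketchImagePrompt(String instructions, SketchImageOptions options) {}

record SketchImageResponse(String url) {}

// Plays the role of ImageModel: a single call() method that every provider implements.
interface SketchImageModel {
    SketchImageResponse call(SketchImagePrompt prompt);
}

// A fake "provider" that builds a placeholder URL instead of invoking a remote service.
final class FakeProviderImageModel implements SketchImageModel {
    private final String providerName;

    FakeProviderImageModel(String providerName) {
        this.providerName = providerName;
    }

    @Override
    public SketchImageResponse call(SketchImagePrompt prompt) {
        SketchImageOptions options = prompt.options();
        String url = "https://example.invalid/" + providerName + "/" + options.model()
            + "/" + options.width() + "x" + options.height() + ".png";
        return new SketchImageResponse(url);
    }
}

final class ImageModelSketchDemo {
    // The caller depends only on the interface, so the provider can be swapped freely.
    static String generate(SketchImageModel model, String instructions) {
        SketchImagePrompt prompt = new SketchImagePrompt(
            instructions, new SketchImageOptions("demo-model", 1024, 1024));
        return model.call(prompt).url();
    }

    public static void main(String[] args) {
        System.out.println(generate(new FakeProviderImageModel("provider-a"), "A red kite"));
        System.out.println(generate(new FakeProviderImageModel("provider-b"), "A red kite"));
    }
}
```

Because generate() only knows about the interface, replacing one provider with another requires no change to the calling code, which is the main benefit the real API aims for.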

Prerequisites

To showcase the use of this library, we'll use OpenAI's DALL-E model. First, we'll sign up for an OpenAI API key. Next, we'll add the Maven dependency needed to integrate with the OpenAI service in our Spring Boot application, which we can also bootstrap with the handy online Spring Initializr tool:

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-openai-spring-boot-starter</artifactId>
    <version>1.0.0-SNAPSHOT</version>
</dependency>

In the next section, we'll implement a program using the Image Model API and generate an image with a prompt.

Program Using Image Model API

First, we'll define the Spring AI configurations in the application-im.properties file. The framework uses them to autoconfigure the ImageModel bean:

spring.ai.openai.api-key=sk-proj-xxxx
spring.ai.openai.image.api-key=sk-proj-xxxx
spring.ai.openai.image.options.model=dall-e-3
spring.ai.openai.image.options.response_format=url
spring.ai.openai.image.options.style=vivid

The Image Model API configuration properties for OpenAI are defined under the spring.ai.openai.image namespace. They control image attributes such as size, style, and quality. DALL-E-3 is the default model. Additionally, the response_format property determines whether we receive the image as a URL (url) or as base64-encoded JSON (b64_json).
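
If we choose the base64 format instead of a URL, the image arrives as an encoded string that we have to decode ourselves. The following is a rough sketch using only the JDK, assuming we already hold that string (in Spring AI it is presumably exposed on the response's output object next to the URL). The Base64ImageSaver class, its method names, and the stand-in payload are made up for this illustration:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

// Hypothetical helper for illustration; not part of Spring AI.
final class Base64ImageSaver {
    // Decodes a base64-encoded image payload into raw bytes.
    static byte[] decode(String base64Image) {
        return Base64.getDecoder().decode(base64Image);
    }

    // Decodes the payload and writes the bytes to the target file.
    static Path save(String base64Image, Path target) throws IOException {
        return Files.write(target, decode(base64Image));
    }

    public static void main(String[] args) throws IOException {
        // Stand-in payload: a real response would carry the full encoded image here.
        String payload = Base64.getEncoder()
            .encodeToString(new byte[] {(byte) 0x89, 'P', 'N', 'G'});
        Path file = save(payload, Files.createTempFile("generated-image", ".png"));
        System.out.println("Saved " + Files.size(file) + " bytes to " + file);
    }
}
```

The base64 format avoids a second HTTP round trip to fetch the image, at the cost of a larger API response.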

Next, let's define a service class for invoking the OpenAI Image Model API:

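import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.ai.image.ImageModel;
import org.springframework.ai.image.ImagePrompt;
import org.springframework.ai.image.ImageResponse;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

@Service
public class OpenAiImageGenService {
    private final Logger logger = LoggerFactory.getLogger(OpenAiImageGenService.class);

    // Autoconfigured from the spring.ai.openai.image.* properties
    @Autowired
    private ImageModel openAIImageModel;

    public String generateImage(String instructions) {
        ImagePrompt prompt = new ImagePrompt(instructions);
        ImageResponse imageResponse = openAIImageModel.call(prompt);
        // With response_format=url, the output carries the URL of the generated image
        return imageResponse.getResult().getOutput().getUrl();
    }
}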

The Spring framework injects the autoconfigured ImageModel bean into the OpenAiImageGenService class. The generateImage() method takes the instructions argument and uses it to instantiate the ImagePrompt object. Next, we pass the prompt to ImageModel#call() and finally read the image URL from the response.

However, what if we want to set the image attributes at runtime? This is possible. Let's look at an overloaded version of the generateImage() method in the same class:

public String generateImage(String instruction, int height, int width, String model) {
    ImageOptions imageOptions = ImageOptionsBuilder.builder()
      .height(height)
      .width(width)
      .model(model)
      .build();
    ImagePrompt prompt = new ImagePrompt(instruction, imageOptions);
    ImageResponse imageResponse = openAIImageModel.call(prompt);
    return imageResponse.getResult().getOutput().getUrl();
}

We've added three extra arguments: height, width, and model. Using them, we create the ImageOptions object with the help of the ImageOptionsBuilder class. Then, we instantiate the ImagePrompt with the instruction and the ImageOptions object. Finally, we invoke ImageModel#call() to retrieve the image URL from OpenAI's Image Model service.

Execute Program and Verify

Now, let's run the first method in a @SpringBootTest class, activating the im profile so that application-im.properties is loaded:

@SpringBootTest
@ActiveProfiles("im")
public class OpenAiImageGenLiveTest {
    private final Logger logger = LoggerFactory.getLogger(OpenAiImageGenLiveTest.class);

    @Autowired
    private OpenAiImageGenService openAIImageGenService;

    @Test
    void whenInvokeOpenAIImageModel_thenReturnImageUrl() {
        String instruction = "A humming bird fluttering above a small girl's head";
        String imageUrl = openAIImageGenService.generateImage(instruction);

        logger.info("imageUrl: {}", imageUrl);
    }
}

Output:

[2025-02-15 20:05:29] [INFO] [c.k.a.i.OpenAiImageGenLiveTest] - imageUrl: https://oaidalleapiprodscus.blob.core.windows.net/private/org-1Z1AExPMO9Z3Xrqf1DnG4hQX/user-HyqKwzvJjhGDhffrYuorr5G7/img-P9cSyx6ZqooE1lwlZEKILIdr.png?st=2025-02-15T13%3A35%3A29Z&se=2025-02-15T15%3A35%3A29Z&sp=r&sv=2024-08-04&sr=b&rscd=inline&rsct=image/png&skoid=d505667d-d6c1-4a0a-bac7-5c84a87759f8&sktid=a48cca56-e6da-484e-a814-9c849652bcb3&skt=2025-02-15T12%3A44%3A50Z&ske=2025-02-16T12%3A44%3A50Z&sks=b&skv=2024-08-04&sig=lsO/oZWwAXKAWyOF/keZIVON7QkBvuq9SxJztx35kzI%3D

Now, we can verify the image by opening the URL in a browser:

AI generated image
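
The returned URL is a signed, time-limited link, so it stops working after a while. Its se query parameter appears to follow the Azure shared access signature convention, in which se holds the expiry timestamp. As a small sketch under that assumption, we could extract the expiry before deciding whether to download and store the image. The SignedUrlExpiry class and its method are hypothetical names used only here:

```java
import java.net.URI;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.time.Instant;
import java.util.Optional;

// Hypothetical helper for illustration; assumes "se" carries an ISO-8601 expiry timestamp.
final class SignedUrlExpiry {
    // Returns the instant encoded in the "se" query parameter, if present.
    static Optional<Instant> expiryOf(String imageUrl) {
        String query = URI.create(imageUrl).getRawQuery();
        if (query == null) {
            return Optional.empty();
        }
        for (String pair : query.split("&")) {
            if (pair.startsWith("se=")) {
                // The value is percent-encoded, e.g. %3A stands for ':'
                String value = URLDecoder.decode(pair.substring(3), StandardCharsets.UTF_8);
                return Optional.of(Instant.parse(value));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String url = "https://example.invalid/img.png"
            + "?st=2025-02-15T13%3A35%3A29Z&se=2025-02-15T15%3A35%3A29Z&sp=r";
        System.out.println(expiryOf(url).orElseThrow());
    }
}
```

If the image needs to outlive the link, downloading it and storing it ourselves is the safer option.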

Let's run the overloaded generateImage() method:

@Test
void whenInvokeOpenAIImageModelWithImageAttributes_thenReturnImageUrl() {
    String instruction = "A humming bird fluttering above a small girl's head";
    // 256x256 is a size that DALL-E-2 accepts
    String imageUrl = openAIImageGenService.generateImage(instruction, 256, 256, "dall-e-2");
    logger.info("imageUrl: {}", imageUrl);
}

Unlike the default configuration, this call uses the DALL-E-2 model and reduces the height and width to 256px. When setting image options, however, we must ensure that the underlying image model supports them; otherwise, the provider is likely to reject the request.
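
One way to guard against unsupported options is to validate them before building the ImageOptions. The sketch below hard-codes the sizes that OpenAI has documented for DALL-E 2 and DALL-E 3; that table is an assumption that should be checked against the provider's current documentation, and the ImageSizeValidator class is a hypothetical helper, not part of Spring AI:

```java
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration; not part of Spring AI.
final class ImageSizeValidator {
    // Assumed from OpenAI's published size limits; verify before relying on them.
    private static final Map<String, Set<String>> SUPPORTED_SIZES = Map.of(
        "dall-e-2", Set.of("256x256", "512x512", "1024x1024"),
        "dall-e-3", Set.of("1024x1024", "1792x1024", "1024x1792"));

    // Returns true only for a known model and a size listed for that model.
    static boolean isSupported(String model, int width, int height) {
        if (model == null) {
            return false;
        }
        Set<String> sizes = SUPPORTED_SIZES.get(model);
        return sizes != null && sizes.contains(width + "x" + height);
    }

    // Fails fast, before any request is sent to the provider.
    static void requireSupported(String model, int width, int height) {
        if (!isSupported(model, width, height)) {
            throw new IllegalArgumentException(
                "Model " + model + " does not support size " + width + "x" + height);
        }
    }

    public static void main(String[] args) {
        System.out.println(isSupported("dall-e-2", 256, 256));
        System.out.println(isSupported("dall-e-3", 256, 256));
    }
}
```

Calling requireSupported() at the start of the overloaded generateImage() method would surface a clear error locally instead of a failed remote call.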

Output:

[2025-02-15 22:20:52] [INFO] [c.k.a.i.OpenAiImageGenLiveTest] - imageUrl: https://oaidalleapiprodscus.blob.core.windows.net/private/org-1Z1AExPMO9Z3Xrqf1DnG4hQX/user-HyqKwzvJjhGDhffrYuorr5G7/img-LrEpeHLtXDY29otxg7VTnEjh.png?st=2025-02-15T15%3A50%3A52Z&se=2025-02-15T17%3A50%3A52Z&sp=r&sv=2024-08-04&sr=b&rscd=inline&rsct=image/png&skoid=d505667d-d6c1-4a0a-bac7-5c84a87759f8&sktid=a48cca56-e6da-484e-a814-9c849652bcb3&skt=2025-02-15T06%3A39%3A02Z&ske=2025-02-16T06%3A39%3A02Z&sks=b&skv=2024-08-04&sig=pA8p9FKY3LH8Dza0ydBJI6CljFGkXmD6Z/PYtAO6Pcs%3D

Once again, we'll verify the generated image by opening it in a browser:

AI generated image

Compared with the default case, the image style and size are noticeably different.

Conclusion

Spring AI's Image Model API is easy to use and abstracts the complexity of the underlying provider-specific services. Its standard interfaces let users switch between image model providers with minimal code changes.

Nonetheless, the library is fairly new, and some features, such as image editing APIs from the foundational model providers, have not yet been integrated.

Visit our GitHub repository to access the article's source code.
