Spring AI Image Model API


Large Language Models (LLMs) such as GPT-4, Claude 2, and Llama 2 generate text output when invoked with text prompts.

Image models such as DALL-E, Midjourney, and Stable Diffusion, on the other hand, generate images from user prompts, which can be text or images. Each model provider exposes its own API, so switching between providers is challenging without a common interface.

Fortunately, the Spring AI library provides a common abstraction that lets applications integrate with these underlying models through a single API.

In this article, we'll explore the important components of Spring AI's Image Model API and implement a demo program that generates an image from a prompt.

Important Components of Image Model API

First, let's look at the important components of the Image Model API that help integrate with underlying image model providers such as OpenAI and Stability AI:

Key Classes of Spring AI Image Model API

Interfaces such as ImageOptions and ImageModel, along with classes such as ImagePrompt, ImageResponse, and ImageGeneration, abstract away the underlying AI models. ImageOptions defines image attributes such as height and width. An ImagePrompt, which combines the user instructions with the ImageOptions, is then passed as an argument to the ImageModel#call() method, which returns an ImageResponse containing one or more ImageGeneration results.

Finally, provider-specific implementations of ImageModel, such as OpenAiImageModel and StabilityAiImageModel, implement call() by invoking the image generation service of the respective provider, OpenAI or Stability AI.
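To see why this design makes providers interchangeable, consider a deliberately simplified, self-contained model in plain Java. It does not use Spring AI: SimpleImageModel, SimplePrompt, SimpleOptions, SimpleResponse, GeneratedImage, and the two fake providers are hypothetical stand-ins that only mimic the roles of ImageModel, ImagePrompt, ImageOptions, ImageResponse, and the image returned by ImageGeneration#getOutput().

```java
import java.util.List;

// Hypothetical, simplified stand-ins for Spring AI's types (not the real API).
record SimpleOptions(String model, int width, int height) {}

record SimplePrompt(String instructions, SimpleOptions options) {}

// Plays the role of the generated image: either a URL or a base64 payload.
record GeneratedImage(String url, String b64Json) {}

// Plays the role of ImageResponse: holds one or more generated images.
record SimpleResponse(List<GeneratedImage> results) {
    // Like ImageResponse#getResult(): return the first generation, if any.
    GeneratedImage getResult() {
        return results.isEmpty() ? null : results.get(0);
    }
}

// Plays the role of the ImageModel interface.
interface SimpleImageModel {
    SimpleResponse call(SimplePrompt prompt);
}

// Fake providers; real implementations would call a remote HTTP API.
class FakeOpenAiModel implements SimpleImageModel {
    public SimpleResponse call(SimplePrompt prompt) {
        String url = "https://openai.example/" + prompt.options().model() + ".png";
        return new SimpleResponse(List.of(new GeneratedImage(url, null)));
    }
}

class FakeStabilityModel implements SimpleImageModel {
    public SimpleResponse call(SimplePrompt prompt) {
        // Placeholder base64 payload (just the PNG signature bytes).
        return new SimpleResponse(List.of(new GeneratedImage(null, "iVBORw0KGgo=")));
    }
}

class ImageClient {
    private final SimpleImageModel model;

    ImageClient(SimpleImageModel model) {
        this.model = model;
    }

    // Depends only on the interface, so swapping providers needs no change here.
    GeneratedImage generate(String instructions, SimpleOptions options) {
        return model.call(new SimplePrompt(instructions, options)).getResult();
    }
}
```

Because ImageClient depends only on the SimpleImageModel interface, switching from the fake OpenAI provider to the fake Stability provider is a change at construction time only, which is the same idea as swapping OpenAiImageModel for StabilityAiImageModel behind the ImageModel interface.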

Prerequisites

To showcase the use of this library, we'll use OpenAI's DALL-E model, so first we need an OpenAI API key. Next, we'll add the Maven dependency needed to integrate with the OpenAI service in our Spring Boot application. The online Spring Initializr tool is a handy way to set this up:

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-openai-spring-boot-starter</artifactId>
    <version>1.0.0-SNAPSHOT</version>
</dependency>

In the next section, we'll implement a program using the Image Model API and generate an image with a prompt.

Program Using Image Model API

First, we'll define the Spring AI configurations in the application-im.properties file. The framework uses them to autoconfigure the ImageModel bean:

spring.ai.openai.api-key=sk-proj-xxxx
spring.ai.openai.image.api-key=sk-proj-xxxx
spring.ai.openai.image.options.model=dall-e-3
spring.ai.openai.image.options.response_format=url
spring.ai.openai.image.options.style=vivid

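The OpenAI-specific Image Model API properties live under the spring.ai.openai.image namespace. The spring.ai.openai.image.api-key property optionally overrides the shared spring.ai.openai.api-key, and the options.* properties control image attributes such as size, style, and quality. Here, we configure dall-e-3 as the default model. The response_format option determines whether we receive the image as a URL (url) or as base64-encoded JSON (b64_json).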
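With response_format=b64_json, the image data comes back inline instead of as a link, and Spring AI exposes it through Image#getB64Json() rather than getUrl(). The following plain-Java sketch shows one way to decode and save such a payload using only java.util.Base64 and java.nio.file. B64ImageWriter is a hypothetical helper, and the PNG check assumes the provider returns PNG data, as the .png URLs in this article suggest.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

class B64ImageWriter {
    // The first eight bytes of every PNG file.
    private static final byte[] PNG_SIGNATURE = {
        (byte) 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A
    };

    // Decode a base64 payload (as found in a b64_json response) into raw image bytes.
    static byte[] decode(String b64Json) {
        return Base64.getDecoder().decode(b64Json);
    }

    // Cheap sanity check before writing: does the payload start like a PNG?
    static boolean looksLikePng(byte[] bytes) {
        if (bytes.length < PNG_SIGNATURE.length) {
            return false;
        }
        for (int i = 0; i < PNG_SIGNATURE.length; i++) {
            if (bytes[i] != PNG_SIGNATURE[i]) {
                return false;
            }
        }
        return true;
    }

    // Write the decoded bytes to disk and return the path written.
    static Path save(String b64Json, Path target) throws IOException {
        return Files.write(target, decode(b64Json));
    }
}
```

In the service, the argument would be imageResponse.getResult().getOutput().getB64Json().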

Next, let's define a service class for invoking the OpenAI Image Model API:

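import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.ai.image.ImageModel;
import org.springframework.ai.image.ImagePrompt;
import org.springframework.ai.image.ImageResponse;
import org.springframework.stereotype.Service;

@Service
public class OpenAiImageGenService {
    private final Logger logger = LoggerFactory.getLogger(OpenAiImageGenService.class);

    // Autoconfigured by the OpenAI starter (an OpenAiImageModel instance)
    private final ImageModel openAIImageModel;

    public OpenAiImageGenService(ImageModel openAIImageModel) {
        this.openAIImageModel = openAIImageModel;
    }

    public String generateImage(String instructions) {
        // Model, style, etc. come from the defaults in application-im.properties
        ImagePrompt prompt = new ImagePrompt(instructions);
        ImageResponse imageResponse = openAIImageModel.call(prompt);
        // First generated image; a URL because response_format=url
        return imageResponse.getResult().getOutput().getUrl();
    }
}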

Spring injects the autoconfigured ImageModel bean (an OpenAiImageModel instance) into the OpenAiImageGenService class. The generateImage() method wraps the instructions argument in an ImagePrompt object, passes the prompt to ImageModel#call(), and finally reads the image URL from the first ImageGeneration in the returned ImageResponse.

What if we want to set the image attributes at runtime instead? We can do that by passing ImageOptions along with the prompt. Let's look at an overloaded version of the generateImage() method in the same class:

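// Overload in OpenAiImageGenService; also needs imports of
// org.springframework.ai.image.ImageOptions and ImageOptionsBuilder
public String generateImage(String instruction, int height, int width, String model) {
    // Portable options; values set here override the configured defaults
    ImageOptions imageOptions = ImageOptionsBuilder.builder()
      .height(height)
      .width(width)
      .model(model)
      .build();
    // Combine the instruction text with the runtime options
    ImagePrompt prompt = new ImagePrompt(instruction, imageOptions);
    ImageResponse imageResponse = openAIImageModel.call(prompt);
    return imageResponse.getResult().getOutput().getUrl();
}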

We've added three arguments: height, width, and model. With them, we build an ImageOptions object using the ImageOptionsBuilder class. Then, we create the ImagePrompt from the instruction and the ImageOptions object. Finally, we invoke ImageModel#call() to retrieve the image URL from OpenAI's image generation service.
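The Spring AI documentation describes runtime options passed with the ImagePrompt as overriding the default options configured through properties: fields set at runtime win, and unset fields fall back to the defaults. The following plain-Java sketch models that precedence rule. ImageSettings and merge() are hypothetical names, not Spring AI classes, and the exact merge logic inside OpenAiImageModel may differ between versions.

```java
// Hypothetical stand-in for a set of image options; a null field means "not set".
record ImageSettings(String model, Integer width, Integer height, String responseFormat) {

    // Runtime values win; fields left null fall back to the configured defaults.
    static ImageSettings merge(ImageSettings runtime, ImageSettings defaults) {
        if (runtime == null) {
            return defaults;
        }
        return new ImageSettings(
            pick(runtime.model(), defaults.model()),
            pick(runtime.width(), defaults.width()),
            pick(runtime.height(), defaults.height()),
            pick(runtime.responseFormat(), defaults.responseFormat()));
    }

    // Prefer the runtime value when present.
    private static <T> T pick(T runtimeValue, T defaultValue) {
        return runtimeValue != null ? runtimeValue : defaultValue;
    }
}
```

Applied to this article, defaults of ("dall-e-3", null, null, "url") merged with runtime options ("dall-e-2", 256, 256, null) would yield a dall-e-2, 256x256 request that still uses the url response format.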

Execute Program and Verify

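Now, let's call the first generateImage() method from a live @SpringBootTest. The @ActiveProfiles("im") annotation activates the im profile so that the properties from application-im.properties are applied: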

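import static org.junit.jupiter.api.Assertions.assertNotNull;

import org.junit.jupiter.api.Test;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.test.context.ActiveProfiles;

@SpringBootTest
@ActiveProfiles("im") // loads application-im.properties
public class OpenAiImageGenLiveTest {
    private final Logger logger = LoggerFactory.getLogger(OpenAiImageGenLiveTest.class);

    @Autowired
    private OpenAiImageGenService openAIImageGenService;

    @Test
    void whenInvokeOpenAIImageModel_thenReturnImageUrl() {
        String instruction = "A humming bird fluttering above a small girl's head";
        // Live test: calls the real OpenAI API and needs a valid API key
        String imageUrl = openAIImageGenService.generateImage(instruction);

        assertNotNull(imageUrl);
        logger.info("imageUrl: {}", imageUrl);
    }
}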

Output:

[2025-02-15 20:05:29] [INFO] [c.k.a.i.OpenAiImageGenLiveTest] - imageUrl: https://oaidalleapiprodscus.blob.core.windows.net/private/org-1Z1AExPMO9Z3Xrqf1DnG4hQX/user-HyqKwzvJjhGDhffrYuorr5G7/img-P9cSyx6ZqooE1lwlZEKILIdr.png?st=2025-02-15T13%3A35%3A29Z&se=2025-02-15T15%3A35%3A29Z&sp=r&sv=2024-08-04&sr=b&rscd=inline&rsct=image/png&skoid=d505667d-d6c1-4a0a-bac7-5c84a87759f8&sktid=a48cca56-e6da-484e-a814-9c849652bcb3&skt=2025-02-15T12%3A44%3A50Z&ske=2025-02-16T12%3A44%3A50Z&sks=b&skv=2024-08-04&sig=lsO/oZWwAXKAWyOF/keZIVON7QkBvuq9SxJztx35kzI%3D

Now, we can verify the image by opening the URL in a browser.
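The returned URL points to Azure Blob Storage and carries a shared access signature, so it is a time-limited link: its se (signed expiry) query parameter shows when it stops working, and the image should be viewed or saved promptly. The following plain-Java sketch reads that parameter and downloads the file with java.net.http.HttpClient. GeneratedImageUrl, expiry(), and download() are hypothetical names, and relying on se assumes the URL format shown in the log above.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLDecoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.time.Instant;
import java.util.Optional;

class GeneratedImageUrl {

    // Extract the "se" (signed expiry) query parameter, if present, as an Instant.
    static Optional<Instant> expiry(String imageUrl) {
        String query = URI.create(imageUrl).getRawQuery();
        if (query == null) {
            return Optional.empty();
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0 && pair.substring(0, eq).equals("se")) {
                String value = URLDecoder.decode(pair.substring(eq + 1), StandardCharsets.UTF_8);
                return Optional.of(Instant.parse(value));
            }
        }
        return Optional.empty();
    }

    // Download the image to a local file before the link expires.
    static Path download(String imageUrl, Path target) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(imageUrl)).GET().build();
        HttpResponse<Path> response = client.send(request, HttpResponse.BodyHandlers.ofFile(target));
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected HTTP status: " + response.statusCode());
        }
        return response.body();
    }
}
```

For the first run above, expiry(imageUrl) would return 2025-02-15T15:35:29Z, the decoded value of the se parameter in the logged URL.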

Let's run the overloaded generateImage() method:

@Test
void whenInvokeOpenAIImageModelWithImageAttributes_thenReturnImageUrl() {
    String instruction = "A humming bird fluttering above a small girl's head";
    // height = 256, width = 256, model = dall-e-2
    String imageUrl = openAIImageGenService.generateImage(instruction, 256,
      256, "dall-e-2");
    logger.info("imageUrl: {}", imageUrl);
}

Unlike the configured default, we've used the dall-e-2 model and reduced the height and width to 256 pixels. When setting image options, however, we must make sure the underlying model supports them; for example, 256x256 is a valid size for DALL-E 2 but not for DALL-E 3.

Output:

[2025-02-15 22:20:52] [INFO] [c.k.a.i.OpenAiImageGenLiveTest] - imageUrl: https://oaidalleapiprodscus.blob.core.windows.net/private/org-1Z1AExPMO9Z3Xrqf1DnG4hQX/user-HyqKwzvJjhGDhffrYuorr5G7/img-LrEpeHLtXDY29otxg7VTnEjh.png?st=2025-02-15T15%3A50%3A52Z&se=2025-02-15T17%3A50%3A52Z&sp=r&sv=2024-08-04&sr=b&rscd=inline&rsct=image/png&skoid=d505667d-d6c1-4a0a-bac7-5c84a87759f8&sktid=a48cca56-e6da-484e-a814-9c849652bcb3&skt=2025-02-15T06%3A39%3A02Z&ske=2025-02-16T06%3A39%3A02Z&sks=b&skv=2024-08-04&sig=pA8p9FKY3LH8Dza0ydBJI6CljFGkXmD6Z/PYtAO6Pcs%3D

Once again, we'll verify the generated image by opening it in a browser.

Compared with the first image, this one differs noticeably in style and is much smaller.
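Because every image option has to be supported by the chosen model, it can help to validate the requested size before calling the API. The following plain-Java sketch hard-codes the sizes that OpenAI's documentation lists for DALL-E 2 and DALL-E 3. ImageSizeGuard is a hypothetical helper rather than part of Spring AI, and the table should be re-checked against the current OpenAI documentation, since supported sizes can change.

```java
import java.util.List;
import java.util.Map;

class ImageSizeGuard {
    // Sizes (width x height) accepted per model, as listed in OpenAI's docs at the time of writing.
    private static final Map<String, List<String>> SUPPORTED_SIZES = Map.of(
        "dall-e-2", List.of("256x256", "512x512", "1024x1024"),
        "dall-e-3", List.of("1024x1024", "1792x1024", "1024x1792"));

    // True if the model is known and accepts this width/height combination.
    static boolean isSupported(String model, int width, int height) {
        List<String> sizes = SUPPORTED_SIZES.get(model);
        return sizes != null && sizes.contains(width + "x" + height);
    }

    // Fail fast with a clear message instead of waiting for the API to reject the request.
    static void check(String model, int width, int height) {
        if (!isSupported(model, width, height)) {
            throw new IllegalArgumentException(
                "Size " + width + "x" + height + " is not supported by " + model);
        }
    }
}
```

It could be called, for example, as ImageSizeGuard.check(model, width, height) at the top of the overloaded generateImage() method, so an unsupported combination fails early with a readable message.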

Conclusion

Spring AI's Image Model API is easy to use and hides the complexity of the underlying provider-specific services. Its standard interfaces make it easy to switch between image model providers.

Nonetheless, the library is fairly new, and some features, such as the image editing APIs offered by foundation model providers, have not yet been integrated.

Visit our GitHub repository to access the article's source code.


Kode Sastra
