Large Language Models (LLMs) such as GPT-4, Claude 2, and Llama 2 generate text output when invoked with text prompts.
Image models such as DALL-E, Midjourney, and Stable Diffusion, in contrast, generate images from user prompts, which can be either text or images. Each model provider has its own API, so switching between them becomes challenging without a common interface.
Luckily, the Spring AI library offers a framework that can help seamlessly integrate with the underlying models.
In this article, we'll learn some important components of Spring AI's Image Model API and implement a demo program to generate an image with a prompt.
Important Components of Image Model API
First, let's look at the important components of the Image Model API that help integrate with the underlying model providers such as OpenAI and Stability AI:
Interfaces such as ImageOptions and ImageModel, as well as classes such as ImageResponse and ImageGeneration, abstract away the underlying AI models. ImageOptions defines image attributes such as height and width. An ImagePrompt, which carries the user instructions and the ImageOptions, is then passed as an argument to the ImageModel#call() method.
Finally, the call() method of the implementation classes of ImageModel such as OpenAiImageModel and StabilityAiImageModel invokes the underlying Image Model service of providers such as OpenAI and Stability AI.
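To make the abstraction concrete, here is a minimal plain-Java sketch of the same idea. The types below (SketchImageModel, SketchPrompt, and so on) are hypothetical stand-ins written for this illustration, not Spring AI classes, and the fake adapters return placeholder URLs instead of calling a real provider:

```java
import java.util.List;

// Simplified stand-ins for illustration only; these are NOT Spring AI's types.
record SketchOptions(String model, int width, int height) {}

record SketchPrompt(String instructions, SketchOptions options) {}

record SketchResponse(String provider, String url) {}

// The common contract that every provider adapter implements.
interface SketchImageModel {
    SketchResponse call(SketchPrompt prompt);
}

// Hypothetical adapter; a real one would call the provider's HTTP API here.
class FakeOpenAiModel implements SketchImageModel {
    @Override
    public SketchResponse call(SketchPrompt prompt) {
        String url = "https://example.com/openai/" + prompt.options().model() + ".png";
        return new SketchResponse("openai", url);
    }
}

// A second hypothetical adapter behind the same interface.
class FakeStabilityModel implements SketchImageModel {
    @Override
    public SketchResponse call(SketchPrompt prompt) {
        String url = "https://example.com/stability/" + prompt.options().model() + ".png";
        return new SketchResponse("stability", url);
    }
}

class ProviderSwitchSketch {
    // The caller depends only on the interface, so swapping providers
    // requires no change in this method.
    static String generate(SketchImageModel model, String instructions) {
        SketchOptions options = new SketchOptions("demo-model", 1024, 1024);
        return model.call(new SketchPrompt(instructions, options)).url();
    }

    public static void main(String[] args) {
        List<SketchImageModel> models = List.of(new FakeOpenAiModel(), new FakeStabilityModel());
        for (SketchImageModel model : models) {
            System.out.println(generate(model, "A humming bird"));
        }
    }
}
```

The generate() method never mentions a concrete provider, which is the property that ImageModel gives us in Spring AI.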
Prerequisites
To showcase the use of this library, we'll use OpenAI's DALL-E model. First, we'll sign up for an OpenAI API key. Next, we'll import the necessary Maven dependency to integrate with the OpenAI service in our Spring Boot application. For this purpose, we can always use the handy online Spring Initializr tool:
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-openai-spring-boot-starter</artifactId>
    <version>1.0.0-SNAPSHOT</version>
</dependency>
In the next section, we'll implement a program using the Image Model API and generate an image with a prompt.
Program Using Image Model API
First, we'll define the Spring AI configurations in the application-im.properties file. The framework uses them to autoconfigure the ImageModel bean:
spring.ai.openai.api-key=sk-proj-xxxx
spring.ai.openai.image.api-key=sk-proj-xxxx
spring.ai.openai.image.options.model=dall-e-3
spring.ai.openai.image.options.response_format=url
spring.ai.openai.image.options.style=vivid
The Image Model API configuration properties for OpenAI are defined under the spring.ai.openai.image namespace. These configurations help control image attributes such as size, style, and quality. DALL-E 3 (dall-e-3) is the default model. Additionally, we can receive the image either as base64-encoded JSON or as a URL.
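If we choose the base64 format instead of the URL, the response carries the image as an encoded string that we have to decode ourselves. The following is a minimal sketch using only the JDK. The Base64ImageSaver class and its method names are hypothetical, and it assumes the base64 string has already been read from the response (in Spring AI this should be available from the image output, alongside the URL getter used later in this article):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

// Hypothetical helper, not part of Spring AI.
class Base64ImageSaver {

    // Decodes a standard base64 payload (no line breaks assumed) into raw image bytes.
    static byte[] decode(String b64Json) {
        return Base64.getDecoder().decode(b64Json);
    }

    // Writes the decoded bytes to the target file and returns its path.
    static Path save(String b64Json, Path target) throws IOException {
        return Files.write(target, decode(b64Json));
    }
}
```

For example, Base64ImageSaver.save(payload, Path.of("bird.png")) would store the image locally, which avoids depending on a hosted URL.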
Next, let's define a service class for invoking the OpenAI Image Model API:
@Service
public class OpenAiImageGenService {
    private final Logger logger = LoggerFactory.getLogger(OpenAiImageGenService.class);
    @Autowired
    private ImageModel openAIImageModel;
    public String generateImage(String instructions) {
        ImagePrompt prompt = new ImagePrompt(instructions);
        ImageResponse imageResponse = openAIImageModel.call(prompt);
        return imageResponse.getResult().getOutput().getUrl();
    }
}
The Spring framework injects the autoconfigured ImageModel bean into the OpenAiImageGenService class. The generateImage() method takes the instructions argument and instantiates the ImagePrompt object. Next, we pass the prompt to ImageModel#call() and finally get the image URL from the response.
However, what if we want to set the image attributes at runtime? This is possible. Let's look at an overloaded version of the generateImage() method in the same class:
public String generateImage(String instruction, int height, int width, String model) {
    ImageOptions imageOptions = ImageOptionsBuilder.builder()
        .height(height)
        .width(width)
        .model(model)
        .build();
    ImagePrompt prompt = new ImagePrompt(instruction, imageOptions);
    ImageResponse imageResponse = openAIImageModel.call(prompt);
    return imageResponse.getResult().getOutput().getUrl();
}
We've added three extra arguments: height, width, and model. Using these additional arguments, we create the ImageOptions object with the help of the ImageOptionsBuilder class. Then, we instantiate the ImagePrompt using the ImageOptions object and the instruction. Finally, we invoke ImageModel#call() to retrieve the image URL from OpenAI's Image Model service.
Execute Program and Verify
Now, let's run the first method with the help of @SpringBootTest:
@SpringBootTest
@ActiveProfiles("im")
public class OpenAiImageGenLiveTest {
    private final Logger logger = LoggerFactory.getLogger(OpenAiImageGenLiveTest.class);
    @Autowired
    private OpenAiImageGenService openAIImageGenService;
    @Test
    void whenInvokeOpenAIImageModel_thenReturnImageUrl() {
        String instruction = "A humming bird fluttering above a small girl's head";
        String imageUrl = openAIImageGenService.generateImage(instruction);
        logger.info("imageUrl: {}", imageUrl);
    }
}
Output:
[2025-02-15 20:05:29] [INFO] [c.k.a.i.OpenAiImageGenLiveTest] - imageUrl: https://oaidalleapiprodscus.blob.core.windows.net/private/org-1Z1AExPMO9Z3Xrqf1DnG4hQX/user-HyqKwzvJjhGDhffrYuorr5G7/img-P9cSyx6ZqooE1lwlZEKILIdr.png?st=2025-02-15T13%3A35%3A29Z&se=2025-02-15T15%3A35%3A29Z&sp=r&sv=2024-08-04&sr=b&rscd=inline&rsct=image/png&skoid=d505667d-d6c1-4a0a-bac7-5c84a87759f8&sktid=a48cca56-e6da-484e-a814-9c849652bcb3&skt=2025-02-15T12%3A44%3A50Z&ske=2025-02-16T12%3A44%3A50Z&sks=b&skv=2024-08-04&sig=lsO/oZWwAXKAWyOF/keZIVON7QkBvuq9SxJztx35kzI%3D
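The logged URL is a signed link, and its st and se query parameters look like the start and expiry timestamps of an Azure shared access signature (about two hours apart in this run). Assuming that reading is correct, the sketch below extracts the expiry so that an application can tell when a stored URL is no longer usable. The SignedUrlExpiry class is hypothetical and relies only on the JDK:

```java
import java.net.URI;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.time.Instant;
import java.util.Optional;

// Hypothetical helper; assumes "se" holds an ISO-8601 expiry timestamp.
class SignedUrlExpiry {

    // Returns the decoded value of the "se" query parameter, if present.
    static Optional<Instant> expiry(String url) {
        String query = URI.create(url).getRawQuery();
        if (query == null) {
            return Optional.empty();
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0 && pair.substring(0, eq).equals("se")) {
                String value = URLDecoder.decode(pair.substring(eq + 1), StandardCharsets.UTF_8);
                return Optional.of(Instant.parse(value));
            }
        }
        return Optional.empty();
    }

    // A URL without an "se" parameter is treated as not expired.
    static boolean isExpired(String url, Instant now) {
        return expiry(url).map(now::isAfter).orElse(false);
    }
}
```

If the image needs to outlive the link, downloading it soon after generation is the safer choice.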
Now, we can verify the image by opening the URL in a browser.
Let's run the overloaded generateImage() method:
@Test
void whenInvokeOpenAIImageModelWithImageAttributes_thenReturnImageUrl() {
    String instruction = "A humming bird fluttering above a small girl's head";
    String imageUrl = openAIImageGenService.generateImage(instruction, 256, 256, "dall-e-2");
    logger.info("imageUrl: {}", imageUrl);
}
Unlike the default configuration, we've used the DALL-E 2 model and reduced the height and width to 256px. However, when setting the image options, we should ensure that the underlying image model supports them.
Output:
[2025-02-15 22:20:52] [INFO] [c.k.a.i.OpenAiImageGenLiveTest] - imageUrl: https://oaidalleapiprodscus.blob.core.windows.net/private/org-1Z1AExPMO9Z3Xrqf1DnG4hQX/user-HyqKwzvJjhGDhffrYuorr5G7/img-LrEpeHLtXDY29otxg7VTnEjh.png?st=2025-02-15T15%3A50%3A52Z&se=2025-02-15T17%3A50%3A52Z&sp=r&sv=2024-08-04&sr=b&rscd=inline&rsct=image/png&skoid=d505667d-d6c1-4a0a-bac7-5c84a87759f8&sktid=a48cca56-e6da-484e-a814-9c849652bcb3&skt=2025-02-15T06%3A39%3A02Z&ske=2025-02-16T06%3A39%3A02Z&sks=b&skv=2024-08-04&sig=pA8p9FKY3LH8Dza0ydBJI6CljFGkXmD6Z/PYtAO6Pcs%3D
Once again, we'll verify the generated image by opening it in a browser.
Unlike the default case, the image style and size are completely different.
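Because the image options must match what the chosen model supports, a small pre-check before calling the service can catch invalid combinations early. The sketch below is a hypothetical helper, not part of Spring AI. The size lists reflect the sizes OpenAI documented for these two models at the time of writing and should be treated as an assumption to verify against the current API reference:

```java
import java.util.Map;
import java.util.Set;

// Hypothetical helper; the supported sizes are assumptions to be verified.
class ImageSizeCheck {

    private static final Map<String, Set<String>> SUPPORTED_SIZES = Map.of(
        "dall-e-2", Set.of("256x256", "512x512", "1024x1024"),
        "dall-e-3", Set.of("1024x1024", "1792x1024", "1024x1792"));

    // Returns true only for a known model and a width x height it accepts.
    static boolean isSupported(String model, int width, int height) {
        Set<String> sizes = SUPPORTED_SIZES.get(model);
        return sizes != null && sizes.contains(width + "x" + height);
    }

    // Fails fast with a descriptive message instead of a remote API error.
    static void requireSupported(String model, int width, int height) {
        if (!isSupported(model, width, height)) {
            throw new IllegalArgumentException(
                "Size " + width + "x" + height + " is not supported by model " + model);
        }
    }
}
```

Calling ImageSizeCheck.requireSupported(model, width, height) at the start of the overloaded generateImage() method would reject, for example, a 256x256 request for dall-e-3 before any network call is made.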
Conclusion
Spring AI's Image Model API is easy to use and abstracts the complexity of the underlying provider-specific service. Its standard interfaces help users easily switch between the various image model providers.
Nonetheless, the library is fairly new, and some features, such as image editing APIs from the foundational model providers, have not yet been integrated.
Visit our GitHub repository to access the article's source code.