Most enterprise applications have to deal with cross-cutting concerns such as governance, compliance, auditing, and security. Most development frameworks address these challenges with the interceptor pattern.
Spring AI Advisors specifically benefit AI applications that interact with Large Language Models (LLMs). These advisors act as interceptors for LLM requests and responses. They can intercept outgoing requests to LLM services, modify them, and then forward them to the LLM. Similarly, they can intercept incoming LLM responses, process them, and then deliver them to the application.
The advisors can be useful for numerous use cases:
- Sanitize and validate data before sending it to LLMs
- Monitor and control LLM usage costs
- Audit and log LLM requests and responses
- Adhere to governance and compliance requirements as part of responsible AI usage
Let's learn more about the topic.
Prerequisites
To demonstrate the capabilities of the Advisors API, we'll need OpenAI's LLM service. Therefore, we must sign up for an OpenAI subscription to acquire an API key and invoke its LLM services.
Additionally, we must import the necessary Maven dependencies into our Spring Boot application, which we can generate from the Spring Initializr page. The details are available in our source code repository.
Further, some familiarity with the Spring AI configuration for integrating with OpenAI would be helpful. The configuration file is also available in our source code repository.
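For reference, a minimal configuration could look like the snippet below. The property keys are the standard Spring AI OpenAI properties, while the API key placeholder and model name are only examples:
# application.properties (illustrative values)
spring.ai.openai.api-key=${OPENAI_API_KEY}
spring.ai.openai.chat.options.model=gpt-4o-mini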
Key Concepts and Components
Spring AI's Advisors API helps build reusable components that can intercept the request and response messages exchanged with the LLM service. In addition, we can register multiple advisors in a pre-defined order, forming a chain of multi-purpose advisors.
Each advisor in the chain:
- Can probe the request and the response
- Can modify the request and the response
- Can choose to block or allow the request and response
Moving on, let's explore the important components of the framework:

All advisors implement the CallAroundAdvisor or StreamAroundAdvisor interface for synchronous and streaming scenarios, respectively. Further, SimpleLoggerAdvisor, SafeGuardAdvisor, and MessageChatMemoryAdvisor are some of the out-of-the-box advisors available in the Spring AI framework.
The ChatClient.ChatClientRequestSpec#advisors() method configures the advisor chain. We call it while building the chat request on the ChatClient object.
The CallAroundAdvisor#aroundCall() and StreamAroundAdvisor#aroundStream() methods intercept the LLM requests and responses. They forward the request by calling either the CallAroundAdvisorChain#nextAroundCall() or the StreamAroundAdvisorChain#nextAroundStream() method. Nevertheless, in this article, we'll focus on the synchronous advisors.
To share state between advisors, we can use the context of type Map<String, Object> carried by the AdvisedRequest object. Moreover, we can throw an exception in CallAroundAdvisor#aroundCall() to prevent the request from propagating further in the advisor chain.
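Putting these pieces together, a minimal synchronous advisor typically has the shape shown below. This is only a sketch: the class name is illustrative, and the imports are omitted for brevity, as in the article's own snippets:
public class AuditingAdvisor implements CallAroundAdvisor {

    @Override
    public AdvisedResponse aroundCall(AdvisedRequest advisedRequest, CallAroundAdvisorChain chain) {
        // probe or modify the outgoing request, e.g. read shared state from the context
        Object user = advisedRequest.adviseContext().get("user");

        // forward the (possibly modified) request to the next advisor in the chain
        AdvisedResponse advisedResponse = chain.nextAroundCall(advisedRequest);

        // probe or modify the incoming response before returning it up the chain
        return advisedResponse;
    }

    @Override
    public String getName() {
        return "AuditingAdvisor";
    }

    @Override
    public int getOrder() {
        return 0; // lower values run earlier in the chain
    }
}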
The next few sections will demonstrate building advisors for real-world problems.
Use Case
Before we start implementing the advisors, let's briefly understand a few pertinent requirements of a chatbot.
A company wants to build a chatbot that answers user queries on company policies. While the idea is great, the company also wants to limit the number of queries users can ask daily to reduce the cost associated with LLM service usage.
Additionally, the company wants to cache the response from the LLM for the user queries. The cached user queries and their responses should be available in the user profile for their reference.
Since our focus is on the Spring AI Advisor framework, we'll implement only an outline of the two advisors. The full implementation is beyond the scope of this article.
Implementation
To implement the use case, we'll develop two advisors, LlmUsageLimitAdvisor and CacheLlmResponseAdvisor.
Now, let's examine each advisor's implementation. Later, we'll plug them in and demonstrate their invocation while we call the LLM.
Limit LLM Usage Advisor
We'll develop the class LlmUsageLimitAdvisor by implementing the CallAroundAdvisor interface:
public AdvisedResponse aroundCall(AdvisedRequest advisedRequest, CallAroundAdvisorChain chain) {
    LOGGER.info("LlmUsageLimitAdvisor: aroundCall");
    Map<String, Object> context = advisedRequest.adviseContext();
    User user = (User) context.get("user");
    if (isUserUsageWithInLimit(user)) {
        AdvisedResponse response = chain.nextAroundCall(advisedRequest);
        updateUsage(user);
        LOGGER.info("Usage updated for {}", user.getUserName());
        return response;
    } else {
        LOGGER.warn("User usage limit exceeded for user: {}", user.getUserName());
        throw new RuntimeException("User usage limit exceeded");
    }
}
The method retrieves the User object from the context map returned by the AdvisedRequest#adviseContext() method. Next, we call the isUserUsageWithInLimit() method to check the user's usage limit. If the usage is within the limit, the request is forwarded down the advisor chain. Once the response is retrieved, we update the user's usage by calling updateUsage(). However, if the check fails, we raise a runtime exception to stop the advisor chain.
In a later section, we'll see how to pass the User object in the context.
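Since the full implementation is out of scope, a minimal in-memory sketch of the helper methods might look like the following. The daily limit, the usage map, and the User#getUserId() accessor are assumptions made purely for illustration:
private static final int DAILY_LIMIT = 10; // assumed per-user daily quota
private final Map<String, Integer> usageByUser = new ConcurrentHashMap<>();

private boolean isUserUsageWithInLimit(User user) {
    LOGGER.info("Checking user usage limit for user: {}", user.getUserName());
    // compare the user's recorded usage against the assumed daily limit
    return usageByUser.getOrDefault(user.getUserId(), 0) < DAILY_LIMIT;
}

private void updateUsage(User user) {
    // increment the user's usage counter after a successful LLM call
    usageByUser.merge(user.getUserId(), 1, Integer::sum);
}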
Cache LLM Response Advisor
Similarly, let's implement the CacheLlmResponseAdvisor class to cache the user query and LLM responses:
public AdvisedResponse aroundCall(AdvisedRequest advisedRequest, CallAroundAdvisorChain chain) {
    LOGGER.info("CacheLlmResponseAdvisor: aroundCall");
    Map<String, Object> context = advisedRequest.adviseContext();
    User user = (User) context.get("user");
    String userQuery = advisedRequest.userText();
    AdvisedResponse response = chain.nextAroundCall(advisedRequest);
    cacheResponse(user, userQuery, response);
    return response;
}
After retrieving the User object, the user query, and finally the response from the LLM, we call the cacheResponse() method. This method should make the downstream calls to save the objects into the application's cache.
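Again, the article only outlines this method; a minimal in-memory version could look like the sketch below. The cache structure, the composite key, and the way we extract the response text are assumptions for illustration:
private final Map<String, String> responseCache = new ConcurrentHashMap<>();

private void cacheResponse(User user, String userQuery, AdvisedResponse response) {
    LOGGER.info("Caching the response for user: {}", user.getUserName());
    String answer = response.response()
      .getResult()
      .getOutput()
      .getContent();
    // key by user and query so the cached answer can later be shown on the user's profile
    responseCache.put(user.getUserName() + "::" + userQuery, answer);
}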
Set the Advisors
Let's now configure the ChatClient to set the advisors with the help of the ChatClient.AdvisorSpec interface:
void whenConfigureChatClientWithAdvisors_thenInterceptLLMCallsWithAdvisors() {
    ChatClient chatClient = ChatClient.create(chatModel);
    User currentUser = new User("7012", "Parthiv Pradhan");
    ChatResponse chatResponse = chatClient.prompt()
      .user("What is a LLM")
      .advisors(advisorSpec -> advisorSpec.advisors(
          List.of(new LlmUsageLimitAdvisor(0), new CacheLlmResponseAdvisor(1)))
        .param("user", currentUser))
      .call()
      .chatResponse();

    LOGGER.info("chatResponse: {}", chatResponse.getResult()
      .getOutput()
      .getContent());
}
The Spring AI framework injects a ChatModel instance into the class, and we use it to create the ChatClient instance.
The ChatClient.AdvisorSpec object in the ChatClientRequestSpec#advisors() method helps specify the advisors defined in the previous sections, along with their order. Further, the ChatClient.AdvisorSpec#param() method sets the User object in the context map, which we extracted in the advisors implemented earlier.
Finally, the ChatClient.ChatClientRequestSpec#call() method invokes the LLM with the user query and returns the response.
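As a side note, the integer passed to each advisor's constructor, for example new LlmUsageLimitAdvisor(0), presumably backs the getOrder() method that every advisor must provide. A minimal sketch of that wiring, under the same assumption, could look like:
private final int order;

public LlmUsageLimitAdvisor(int order) {
    this.order = order;
}

@Override
public int getOrder() {
    return order; // lower values execute earlier in the advisor chain
}

@Override
public String getName() {
    return this.getClass().getSimpleName();
}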
Execute
Let's run the method and examine the invocation of the advisors:
[LlmUsageLimitAdvisor] - LlmUsageLimitAdvisor: aroundCall
[LlmUsageLimitAdvisor] - Checking user usage limit for user: James
[CacheLlmResponseAdvisor] - CacheLlmResponseAdvisor: aroundCall
[CacheLlmResponseAdvisor] - Caching the response for user: James
[LlmUsageLimitAdvisor] - Usage updated for James
[AdvisorUnitTest] - chatResponse: LLM typically stands for "Large Language Model."...
First, LlmUsageLimitAdvisor intercepts the request and forwards it to CacheLlmResponseAdvisor, which in turn forwards it to the LLM. Subsequently, the response propagates back through the advisors in reverse order.
Conclusion
In this article, we learned about Spring AI's Advisors API.
We demonstrated the steps to create and configure reusable advisors that intercept LLM requests and responses. Notably, the fluent-style APIs in the ChatClient class provide a highly intuitive development experience. Furthermore, we discussed how the advisor context object helps manage and share state between advisors.
Spring AI also provides several built-in advisor classes, such as PromptChatMemoryAdvisor, QuestionAnswerAdvisor, RetrievalAugmentationAdvisor, and SafeGuardAdvisor, which offer powerful functionalities worth exploring further.
Visit our GitHub repository to access the article's source code.