Building an AI Chat Agent with MCP and Spring AI

Model Context Protocol (MCP) is an open standard for connecting AI apps to tools and data sources. A useful way to think about it is as a USB-C port for AI: one standard interface that lets different models plug into different capabilities without custom glue code for every integration.

In this project, we combine MCP, Spring AI, and Google Gemini to build a chat app that can answer weather questions using real tools instead of hallucinating. The system has three parts:

MCP tool server - a Spring Boot service that exposes weather and geocoding tools
AI chat agent - a Spring Boot service that uses Spring AI + Gemini and calls MCP tools when needed
React chat UI - a lightweight frontend for sending messages and rendering replies

The result is a small but realistic architecture you can extend into a production assistant.

Architecture

User (Browser:3000)
    | POST /api/chat
    v
AI Agent (Spring:7171) -- MCP / Streamable HTTP --> MCP Server (Spring:7170)
    |                                               |
    | Google Gemini                                 | Bright Sky API (weather)
    |                                               | OpenStreetMap Nominatim (geocoding)
    v                                               v
Chat response                                    Tool execution

The full source code is available on GitHub.

1. The MCP Tool Server

The tool server is a Spring Boot application that exposes MCP tools through Spring AI's annotation scanner. It runs on port 7170 and uses Streamable HTTP for transport.

Dependencies

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-mcp-server-webmvc</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
</dependency>

Defining tools

With Spring AI, a tool is just a Spring bean method annotated with @McpTool:

@Component
public class WeatherTool {

    private final WeatherToolService weatherToolService;

    public WeatherTool(WeatherToolService weatherToolService) {
        this.weatherToolService = weatherToolService;
    }

    @McpTool(name = "get_current_weather",
             description = "Get current weather by dwd_station_id or by lat/lon")
    public Map<String, Object> getCurrentWeather(
            @McpToolParam(description = "DWD station ID", required = false)
            String dwd_station_id,
            @McpToolParam(description = "Latitude", required = false) Double lat,
            @McpToolParam(description = "Longitude", required = false) Double lon
    ) {
        return weatherToolService.getWeather(dwd_station_id, lat, lon);
    }
}

Spring turns that method into an MCP tool definition and publishes the parameter metadata as part of the schema. That means the model can discover the tool, understand its inputs, and decide when to call it.

The project also includes a geocoding tool that resolves city names to coordinates:

@McpTool(name = "geocode_city",
         description = "Convert a city name to latitude and longitude using OpenStreetMap Nominatim")
public Map<String, Object> geocodeCity(
        @McpToolParam(description = "City name (e.g., 'Berlin', 'New York')", required = true)
        String cityName
) { ... }

The service layer

The tools delegate the real work to services that handle validation, caching, and external API calls:

@Service
public class WeatherToolService {

    public Map<String, Object> getWeather(String dwdStationId, Double lat, Double lon) {
        // Validate the request
        // Check the cache
        // Call Bright Sky if needed
        // Return a structured response
    }
}

The key design choices are straightforward:

Separate TTL caches for station-id and coordinate lookups
Structured responses with success, error_code, and error_message
Cache metadata in each response so you can see whether the result came from cache or upstream
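
The caching and response conventions listed above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the project's actual service code: TtlCache, CachedLookup, the data and cache fields, and the HIT, MISS, and UPSTREAM_ERROR values are hypothetical names. Only success, error_code, and error_message come from the description above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Minimal TTL cache (hypothetical): entries expire ttlMillis after they are stored. */
class TtlCache<K, V> {
    private record Entry<T>(T value, long expiresAt) {}

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // injectable so tests can control time

    TtlCache(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    /** Returns the cached value, or null if it is absent or expired. */
    V get(K key) {
        Entry<V> entry = entries.get(key);
        if (entry == null) {
            return null;
        }
        if (clock.getAsLong() >= entry.expiresAt()) {
            entries.remove(key, entry); // drop the stale entry
            return null;
        }
        return entry.value();
    }

    void put(K key, V value) {
        entries.put(key, new Entry<>(value, clock.getAsLong() + ttlMillis));
    }
}

/** Hypothetical helper that wraps an upstream lookup in a structured response. */
class CachedLookup {
    static Map<String, Object> lookup(TtlCache<String, Map<String, Object>> cache,
                                      String key,
                                      Function<String, Map<String, Object>> upstream) {
        Map<String, Object> response = new LinkedHashMap<>();
        Map<String, Object> cached = cache.get(key);
        try {
            Map<String, Object> data = (cached != null) ? cached : upstream.apply(key);
            if (cached == null) {
                cache.put(key, data);
            }
            response.put("success", true);
            response.put("data", data);
            response.put("cache", cached != null ? "HIT" : "MISS"); // cache metadata
        } catch (RuntimeException ex) {
            response.put("success", false);
            response.put("error_code", "UPSTREAM_ERROR");
            response.put("error_message", String.valueOf(ex.getMessage()));
        }
        return response;
    }
}
```

Following the first design choice, a service built this way would hold one TtlCache instance for station-id lookups and a second one for coordinate lookups, so the two key spaces never collide.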

Server configuration

server:
  port: 7170

spring:
  ai:
    mcp:
      server:
        name: spring-sample-mcp-server
        version: 1.0.0
        protocol: STREAMABLE
        type: SYNC
        annotation-scanner:
          enabled: true

mcp:
  security:
    api-key: ${MCP_API_KEY:}

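Setting protocol to STREAMABLE selects MCP's Streamable HTTP transport, so the agent reaches the tools over ordinary HTTP requests. The mcp.security.api-key entry is an application-level property that holds the shared API key, which keeps the demo simple without adding full auth infrastructure.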

2. Security for the Demo

The MCP server and agent share an MCP_API_KEY. The agent adds it automatically as an X-API-Key header, and the server validates it on inbound MCP requests.

That is enough for local development and a sample project. For anything public-facing, move to Spring Security, OAuth2 or JWT, rate limiting, and a gateway in front of the MCP endpoint.
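
The server-side check described above could look roughly like the following. This is a sketch under assumptions rather than the project's actual filter: ApiKeyValidator is a hypothetical class, and treating an empty configured key as "no check" is an assumption based on MCP_API_KEY defaulting to an empty value in the configuration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

/** Hypothetical validator for the X-API-Key header value. */
class ApiKeyValidator {
    private final byte[] expected;

    ApiKeyValidator(String configuredKey) {
        String key = (configuredKey == null) ? "" : configuredKey;
        this.expected = key.getBytes(StandardCharsets.UTF_8);
    }

    /** Returns true when the header matches, or when no key is configured (assumed demo behavior). */
    boolean isAuthorized(String headerValue) {
        if (expected.length == 0) {
            return true; // assumption: an empty key disables the check
        }
        if (headerValue == null) {
            return false;
        }
        // MessageDigest.isEqual compares in constant time, which avoids leaking
        // how many leading bytes of the key were correct.
        return MessageDigest.isEqual(expected, headerValue.getBytes(StandardCharsets.UTF_8));
    }
}
```

In a Spring application this logic would typically sit in a servlet filter or interceptor that reads the X-API-Key header and rejects the request with 401 when isAuthorized returns false.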

3. The AI Chat Agent

The agent is responsible for deciding when to use tools, calling Gemini, and keeping the conversation stateful.

Dependencies

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-google-genai</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-mcp-client</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
</dependency>

MCP client configuration

The agent injects the shared API key through a custom HTTP request customizer:

@Configuration
public class AgentConfiguration {

    @Bean
    McpClientCustomizer<HttpClientStreamableHttpTransport.Builder>
    streamableHttpTransportCustomizer(AgentProperties properties) {
        McpSyncHttpClientRequestCustomizer requestCustomizer =
                (builder, method, uri, body, context) -> {
                    if (StringUtils.hasText(properties.getMcpApiKey())) {
                        builder.header("X-API-Key", properties.getMcpApiKey());
                    }
                };
        return (name, builder) -> builder.httpRequestCustomizer(requestCustomizer);
    }
}

Core chat flow

The agent keeps a small in-memory conversation history, checks whether the user message looks like a tool request, and then routes the prompt through either a plain Gemini client or a tool-enabled client.

public String reply(String sessionId, String userMessage) {
    List<ConversationTurn> history = memoryStore.history(sessionId);
    String prompt = buildPrompt(history, userMessage);
    boolean toolRequest = shouldUseTools(userMessage);
    ChatClient client = toolRequest ? toolEnabledClient() : plainChatClient;
    String answer = invokeModel(client, prompt);
    memoryStore.appendTurn(sessionId, userMessage, answer);
    return answer;
}

The tool-enabled client is built lazily inside toolEnabledClient(), and that is deliberate: the agent can start even if the MCP server is down, and it only initializes MCP clients when a tool request actually arrives.
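
One way to get that lazy behavior is a small memoizing holder. The sketch below is an assumption about how it could be done, not the project's code: Lazy is a hypothetical class, and the factory stands in for whatever builds the tool-enabled client. A failed attempt is not cached, so the next tool request retries.

```java
import java.util.function.Supplier;

/** Hypothetical thread-safe lazy holder: runs the factory on first use only. */
class Lazy<T> implements Supplier<T> {
    private final Supplier<T> factory;
    private volatile T value;

    Lazy(Supplier<T> factory) {
        this.factory = factory;
    }

    @Override
    public T get() {
        T result = value;
        if (result == null) {
            synchronized (this) {
                result = value;
                if (result == null) {
                    // If the factory throws (for example, the MCP server is down),
                    // nothing is stored and a later call tries again.
                    result = factory.get();
                    value = result;
                }
            }
        }
        return result;
    }
}
```

With a holder like this, toolEnabledClient() could simply return lazyClient.get(), so application startup never touches the MCP server.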

The tool trigger is intentionally simple:

private static boolean shouldUseTools(String userMessage) {
    String normalized = userMessage.toLowerCase(Locale.ROOT);
    for (String keyword : TOOL_KEYWORDS) {
        if (normalized.contains(keyword)) {
            return true;
        }
    }
    return false;
}

That heuristic is enough for a demo and easy to explain. In a larger system, you could replace it with a router model or intent classifier.
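
A cheap refinement that stays within the keyword approach is matching whole words, because a plain contains check would also fire on "training" if "rain" were a keyword. The sketch below is illustrative only: ToolTrigger and its keyword list are hypothetical, since the original TOOL_KEYWORDS values are not shown.

```java
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Hypothetical whole-word variant of the keyword trigger. */
class ToolTrigger {
    // Assumed keywords for illustration; not the project's actual list.
    private static final List<String> KEYWORDS =
            List.of("weather", "temperature", "forecast", "rain");

    // \b anchors each keyword at word boundaries; Pattern.quote escapes regex characters.
    private static final Pattern PATTERN = Pattern.compile(
            "\\b(" + KEYWORDS.stream().map(Pattern::quote).collect(Collectors.joining("|")) + ")\\b");

    static boolean shouldUseTools(String userMessage) {
        if (userMessage == null) {
            return false;
        }
        return PATTERN.matcher(userMessage.toLowerCase(Locale.ROOT)).find();
    }
}
```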

Virtual threads and timeout handling

The model call runs on a virtual thread with a configurable timeout so the request does not hang forever if Gemini is slow or unreachable:

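private String invokeModel(ChatClient client, String prompt) {
    // One short-lived executor per call; virtual threads are cheap to create.
    var executor = Executors.newVirtualThreadPerTaskExecutor();
    try {
        var future = executor.submit(() ->
                client.prompt().user(prompt).call().content());
        return future.get(timeoutSeconds, TimeUnit.SECONDS);
    } catch (TimeoutException ex) {
        throw new ResponseStatusException(HttpStatus.GATEWAY_TIMEOUT, ...);
    } catch (InterruptedException ex) {
        // Restore the interrupt flag before reporting the failure.
        Thread.currentThread().interrupt();
        throw new ResponseStatusException(HttpStatus.SERVICE_UNAVAILABLE, "Model call interrupted", ex);
    } catch (ExecutionException ex) {
        // The model call itself failed; surface the underlying cause.
        throw new ResponseStatusException(HttpStatus.BAD_GATEWAY, "Model call failed", ex.getCause());
    } finally {
        // Interrupts the task if it is still running after a timeout.
        executor.shutdownNow();
    }
}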

Session memory

Conversation history lives in an in-memory LRU store with a small per-session turn window. That keeps follow-up questions like "What about tomorrow?" grounded in the earlier exchange without introducing a database too early.

The agent configuration sets the model to gemini-3.5-flash, the memory limit to 20 turns per session, and the session cap to 500.
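
An LRU session store with a per-session turn window can be built from LinkedHashMap and ArrayDeque alone. The following is a minimal sketch, not the project's implementation: InMemoryConversationStore and the fields of ConversationTurn are assumptions, while the history and appendTurn method names mirror the calls in the chat flow shown earlier.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Assumed shape of a turn: one user message and the model's answer. */
record ConversationTurn(String userMessage, String answer) {}

/** Hypothetical in-memory store: LRU across sessions, sliding window within a session. */
class InMemoryConversationStore {
    private final int maxTurns;
    private final Map<String, Deque<ConversationTurn>> sessions;

    InMemoryConversationStore(int maxSessions, int maxTurns) {
        this.maxTurns = maxTurns;
        // accessOrder = true keeps the least recently used session first,
        // and removeEldestEntry evicts it once the cap is exceeded.
        this.sessions = new LinkedHashMap<>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Deque<ConversationTurn>> eldest) {
                return size() > maxSessions;
            }
        };
    }

    /** Returns a copy of the session's turns, oldest first (empty if unknown). */
    synchronized List<ConversationTurn> history(String sessionId) {
        Deque<ConversationTurn> turns = sessions.get(sessionId);
        return turns == null ? List.of() : new ArrayList<>(turns);
    }

    /** Appends a turn and drops the oldest ones beyond the window. */
    synchronized void appendTurn(String sessionId, String userMessage, String answer) {
        Deque<ConversationTurn> turns =
                sessions.computeIfAbsent(sessionId, id -> new ArrayDeque<>());
        turns.addLast(new ConversationTurn(userMessage, answer));
        while (turns.size() > maxTurns) {
            turns.removeFirst();
        }
    }

    synchronized int sessionCount() {
        return sessions.size();
    }
}
```

With the limits quoted above, the store would be created as new InMemoryConversationStore(500, 20). The methods are synchronized because an access-ordered LinkedHashMap reorders entries even on reads.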

4. The React Chat UI

The frontend is a Vite app with a simple chat window, minimal state, and no component library.

const [messages, setMessages] = useState([]);
const [loading, setLoading] = useState(false);

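const sendMessage = async (text) => {
    setMessages(prev => [...prev, { role: 'user', content: text }]);
    setLoading(true);
    try {
        // sessionId is assumed to be defined elsewhere in the component
        const response = await fetch('/api/chat', {
            method: 'POST',
            headers: { 'Content-Type': 'application/json' },
            body: JSON.stringify({ sessionId, message: text })
        });
        if (!response.ok) {
            throw new Error(`Request failed with status ${response.status}`);
        }
        const data = await response.json();
        setMessages(prev => [...prev, {
            role: 'assistant',
            content: data.reply || 'No response'
        }]);
    } catch (err) {
        // Show the failure in the chat instead of leaving the UI stuck
        setMessages(prev => [...prev, { role: 'assistant', content: `Error: ${err.message}` }]);
    } finally {
        // Always clear the loading state, even when the request fails
        setLoading(false);
    }
};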

The Vite dev server proxies /api/* to the agent and strips the /api prefix, so a request to /api/chat reaches the agent as /chat:

proxy: {
  '/api': {
    target: 'http://localhost:7171',
    changeOrigin: true,
    rewrite: (path) => path.replace(/^\/api/, '')
  }
}

The UI is intentionally plain: a purple gradient, responsive layout, and a smooth message list are enough to make the app feel complete without distracting from the architecture.

5. Putting It All Together

Running the application

Set the environment variables:

export GEMINI_API_KEY=your_gemini_api_key
export MCP_API_KEY=a_shared_secret

Start the MCP server:

cd mcp-server-spring
mvn spring-boot:run

Start the agent:

cd mcp-spring-agent
mvn spring-boot:run

Start the UI:

cd mcp-ui
npm install
npm run dev

What happens when you ask a question

If the user asks, "What's the weather in Berlin?" the flow looks like this:

The agent sees the word "weather" and switches to tool-enabled mode
Gemini requests geocode_city("Berlin"), and the agent runs it through MCP to get coordinates
Gemini then requests get_current_weather(lat=52.52, lon=13.41), which the agent also runs through MCP
Gemini turns the raw data into a readable response
The UI renders the answer

6. Why This Architecture Works

MCP separates the model from the tools. The agent knows what tools exist and how to call them, but not how those tools are implemented. That makes the system easier to evolve.

The same server can serve different models. Gemini is just the model in this demo. The MCP server itself can work with any compatible client.

Lazy initialization keeps the app resilient. The agent can boot even if the MCP server is temporarily unavailable, and tool support only activates when it is actually needed.

7. What's Next

This sample is a solid starting point. Natural next steps include:

Docker Compose - run all services together
PostgreSQL persistence - durable chat history and richer memory
OAuth2 - authenticated multi-user access
WebSocket streaming - token-by-token responses
Kubernetes - scale the agent and tool server independently

Have you built anything with MCP and Spring AI? I'd love to hear how you approached it.
