Build AI agents on Anyscale

This page provides an overview of building AI agents on Anyscale, including the recommended decoupled architecture and how to choose between single-agent and multi-agent patterns.

What is an AI agent?

An AI agent is an application that uses an LLM as a reasoning engine to plan and execute multi-step tasks by calling tools and interacting with external systems. Unlike a standalone chatbot, an agent decides which tool to call, when to call it, and how to combine results into a final answer.

A typical agent loop does the following (see the sketch after this list):

  1. Receives a user request.
  2. Calls an LLM to decide whether to use a tool, ask a follow-up, or respond directly.
  3. Executes any tool calls such as a database query, web search, or API request, then feeds results back to the LLM.
  4. Repeats until the LLM produces a final answer.
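
The following minimal sketch illustrates this loop with an OpenAI-compatible chat completions client and a single hypothetical `get_weather` tool. The endpoint URL, model name, and tool are placeholders for your own LLM service and tools, not part of any Anyscale template.

```python
import json
from openai import OpenAI  # works against any OpenAI-compatible endpoint

# Assumption: an OpenAI-compatible LLM service is reachable at this URL.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="FAKE_KEY")

def get_weather(city: str) -> str:
    """Example tool; replace with a real API call."""
    return f"Sunny and 22 C in {city}"

TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def run_agent(user_request: str, model: str = "my-llm") -> str:
    messages = [{"role": "user", "content": user_request}]
    while True:
        # Step 2: ask the LLM whether to call a tool or answer directly.
        response = client.chat.completions.create(
            model=model, messages=messages, tools=TOOLS
        )
        message = response.choices[0].message
        if not message.tool_calls:
            return message.content  # Step 4: final answer
        # Step 3: execute tool calls and feed the results back to the LLM.
        messages.append(message)
        for call in message.tool_calls:
            args = json.loads(call.function.arguments)
            if call.function.name == "get_weather":
                result = get_weather(**args)
            else:
                result = f"Unknown tool: {call.function.name}"
            messages.append(
                {"role": "tool", "tool_call_id": call.id, "content": result}
            )

print(run_agent("What's the weather in Paris?"))
```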

Agents extend this pattern in two important directions. Single agents call tools that you expose through the Model Context Protocol (MCP). Multi-agent systems split work across multiple specialized agents, where each agent owns its own tools, prompts, and scaling profile. The reference multi-agent architecture on Anyscale uses the Agent-to-Agent (A2A) protocol for coordination (see A2A protocol documentation), but A2A is one option among many.
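
As a rough illustration of the single-agent side, the sketch below exposes one tool through the MCP Python SDK's FastMCP server. The server name and the tool body are illustrative stubs, not part of the reference architecture.

```python
from mcp.server.fastmcp import FastMCP

# Hypothetical MCP tool service exposing one tool.
mcp = FastMCP("weather-tools")

@mcp.tool()
def get_forecast(city: str) -> str:
    """Return a short weather forecast for a city."""
    return f"Forecast for {city}: sunny, high of 22 C"

if __name__ == "__main__":
    # Serves the tool over stdio by default; an agent connects via an MCP client.
    mcp.run()
```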

Quickstart

You can run templates for both patterns directly from the Anyscale console.

How does Anyscale support agents?

Anyscale runs each component of an agent as an independent Ray Serve application backed by an Anyscale service. The recommended decoupled microservices architecture splits responsibilities across the following layers:

  * Agent services that run the reasoning loop, including an orchestrator agent in the multi-agent pattern.
  * MCP tool services that expose databases, APIs, and other external systems as tools.
  * An LLM service that serves the model for inference.

Together, these components keep model inference, tool execution, and orchestration independently deployable.
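
A minimal sketch of that decoupling, assuming Ray Serve's multi-application API and stubbed deployment bodies: each layer runs as its own Serve application with its own name and route, so it can be deployed, updated, and scaled on its own.

```python
from ray import serve

@serve.deployment
class ToolService:
    """CPU-bound tool layer (stubbed for illustration)."""
    async def __call__(self, request):
        return {"result": "tool output"}

@serve.deployment
class AgentService:
    """Runs the reasoning loop; calls the tool and LLM services over HTTP."""
    async def __call__(self, request):
        return {"answer": "final answer"}

# Each application gets its own name and route prefix, so it can be
# updated and scaled independently of the others.
serve.run(ToolService.bind(), name="tools", route_prefix="/tools")
serve.run(AgentService.bind(), name="agent", route_prefix="/agent")
```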

Single-agent versus multi-agent patterns

The right architecture depends on how much your workload benefits from specialization. The following table compares the two patterns:

| Pattern | When to use | Example |
| --- | --- | --- |
| Single tool-using agent | Your agent solves a focused problem and the tools it needs share a single domain. One LLM, one set of tools, one prompt. | A weather assistant that calls weather APIs, or a support agent that queries a single ticket database. |
| Multi-agent system | Your workload spans multiple domains, each with its own tools, prompt strategy, and quality bar. You want to compose specialized agents instead of overloading one monolithic agent. | A travel planner that delegates research to a web-search agent and forecasts to a weather agent, then synthesizes a final itinerary. |

Start with a single agent. Move to a multi-agent system when prompts grow unwieldy, when one agent's tools start crowding out another's, or when you need to scale or version pieces of the workflow independently. The following diagram shows the reference multi-agent pattern.

Multi-agent architecture showing a client calling an orchestrator agent that coordinates specialized agents, MCP tool services, and an LLM service
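
As a rough sketch of the orchestrator's role in this diagram, the code below fans a request out to two specialized agents and combines the results. The agent URLs and JSON payload shape are assumptions for illustration, not the A2A protocol wire format.

```python
import asyncio
import httpx

# Hypothetical endpoints for two specialized agents deployed as separate services.
SEARCH_AGENT_URL = "http://search-agent.internal/agent"
WEATHER_AGENT_URL = "http://weather-agent.internal/agent"

async def plan_trip(destination: str) -> str:
    async with httpx.AsyncClient() as client:
        # Delegate to both specialized agents concurrently.
        search_resp, weather_resp = await asyncio.gather(
            client.post(SEARCH_AGENT_URL, json={"query": f"things to do in {destination}"}),
            client.post(WEATHER_AGENT_URL, json={"query": f"forecast for {destination}"}),
        )
    # In a real system, a synthesis prompt to the LLM service combines these results.
    return f"Itinerary notes: {search_resp.text}\nWeather: {weather_resp.text}"

print(asyncio.run(plan_trip("Lisbon")))
```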

Why decouple agent services?

Running agents on Anyscale means each layer scales and fails independently. The LLM service is GPU-bound and expensive, so it scales with inference demand. The lightweight agent and tool services are CPU-bound, so they scale with request volume. If a tool service fails, the agent can handle the partial failure without taking down the whole system.
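
For example, a GPU-bound LLM deployment and a CPU-bound tool deployment can each declare their own Ray Serve autoscaling settings. The replica counts and resource requests below are illustrative values, not recommendations.

```python
from ray import serve

@serve.deployment(
    ray_actor_options={"num_gpus": 1},
    autoscaling_config={"min_replicas": 1, "max_replicas": 4},  # scales with inference demand
)
class LLMService:
    ...

@serve.deployment(
    ray_actor_options={"num_cpus": 1},
    autoscaling_config={"min_replicas": 2, "max_replicas": 20},  # scales with request volume
)
class ToolService:
    ...
```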

Anyscale services also harden the deployment for real traffic. They support zero-downtime rolling updates, multi-version deployments for A/B testing and canaries, distributed tracing across agent, tool, and LLM calls, and per-replica logs that make agent reasoning loops debuggable. See Update an Anyscale service, Deploy multiple versions of an Anyscale service, and Monitor a service.

The following pages cover the components that agents on Anyscale build on: