Configure tool and function calling for LLMs

Learn how to deploy models with tool and function calling patterns to enable your LLM to interact with external tools, databases, and APIs.

note

To use full tool and function calling support, install ray>=2.49.0.

Understand tool and function calling

Tool calling allows your LLM to interact with external tools or functions, creating more modular applications.

Use tool and function calling when your application needs integrations with downstream tools, databases, APIs, or any other backend logic.

Deploy a model with tool calling enabled

To use tool calling, update your Ray Serve LLM deployment configuration with a few extra settings.

note

Ray Serve LLM uses vLLM as its backend engine. You can forward tool-related settings through the engine_kwargs field in your Serve config.

To enable tool calling:

  • Set enable_auto_tool_choice: true
  • Specify the correct tool_call_parser for your model

See the vLLM documentation for the full list of supported tool call parsers and for how to register a custom parser.

Create a file serve_my_llama.yaml with the following configuration:

applications:
- name: my-tool-calling-app
  ...
  args:
    llm_configs:
    - ...
      engine_kwargs:
        ...
        enable_auto_tool_choice: true
        tool_call_parser: <YOUR-TOOL-PARSER-ID>
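
If you prefer to configure the deployment in Python instead of YAML, the following is a minimal sketch using the ray.serve.llm builder API. The model names and the parser ID are placeholders; adjust them to your model and hardware.

from ray import serve
from ray.serve.llm import LLMConfig, build_openai_app

# Sketch: programmatic equivalent of the YAML config above.
# Model names and the tool call parser are placeholders; replace them with your own.
llm_config = LLMConfig(
    model_loading_config={
        "model_id": "my-llama",  # The name clients use in requests
        "model_source": "meta-llama/Llama-3.1-8B-Instruct",
    },
    engine_kwargs={
        "enable_auto_tool_choice": True,
        "tool_call_parser": "llama3_json",  # Pick the parser that matches your model
    },
)

app = build_openai_app({"llm_configs": [llm_config]})
serve.run(app)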

Make a tool call request

To make a tool call request, you must first define your tool schemas, specify the strategy the LLM should follow, and include the tools in the request payload.

Define tools

Define your tools using standard JSON schema format. Each tool includes a name, description, and a list of expected parameters with their types and constraints.

This example defines two tools with signatures:

  • get_current_temperature(location: string, unit: string = "celsius")
  • get_temperature_date(location: string, date: string, unit: string = "celsius")

# Tools schema definitions
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_temperature",
            "description": "Get current temperature at a location.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The location to get the temperature for, in the format \"City, State, Country\"."
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "The unit to return the temperature in. Defaults to \"celsius\"."
                    }
                },
                "required": ["location"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "get_temperature_date",
            "description": "Get temperature at a location and date.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The location to get the temperature for, in the format \"City, State, Country\"."
                    },
                    "date": {
                        "type": "string",
                        "description": "The date to get the temperature for, in the format \"Year-Month-Day\"."
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "The unit to return the temperature in. Defaults to \"celsius\"."
                    }
                },
                "required": ["location", "date"]
            }
        }
    }
]

Configure tool calling strategy

The tool calling strategy controls how the model decides to use tools. Set the tool_choice parameter in your client call to the model.

You can let the model decide automatically, force it to call one or multiple tools, disable tool usage entirely, or specify a single tool to use.

| tool_choice value | Description |
|---|---|
| "auto" (default) | The model decides whether to use a tool based on the prompt and available tools. Returns an empty tool call list if no tool is used. |
| "none" | Tool calling is disabled. The model responds with plain text, even if tools are defined. However, tool descriptions already present in the system or user prompt may still influence the model's behavior. |
| "required" | Forces the model to call at least one tool, even if it may not be appropriate. Use only when the tool definitions cover all relevant use cases; otherwise, prefer "auto". |
| {"type": "function", "function": {"name": "your_tool"}} | Forces the model to call a specific tool by name. Useful for single-tool workflows or when chaining calls deterministically. |

Query the model

Once you decide on your tools and tool calling strategy, send a request to the model by including both the tools and tool_choice parameters.

from openai import OpenAI

client = OpenAI(...)

messages = [
    {
        "role": "system",
        "content": "You are a weather assistant. Use the given functions to get weather data and provide the results."
    },
    {
        "role": "user",
        "content": "What's the temperature in San Francisco now? How about tomorrow? Current Date: 2025-07-29."
    }
]

response = client.chat.completions.create(
    ...,
    messages=messages,
    tools=tools,
    tool_choice="auto"  # Tool calling strategy: let the model decide
)

print(response.choices[0].message)

You should get a response similar to:

ChatCompletionMessage(
    content=None,
    refusal=None,
    role='assistant',
    annotations=None,
    audio=None,
    function_call=None,
    tool_calls=[
        ChatCompletionMessageToolCall(
            id='chatcmpl-tool-3c89da30948f4760b54d457691f92208',
            function=Function(
                arguments='{"location": "San Francisco, California, USA", "unit": "celsius"}',
                name='get_current_temperature'
            ),
            type='function'
        ),
        ChatCompletionMessageToolCall(
            id='chatcmpl-tool-6ee9721ff2a04426881493c31a36d9f1',
            function=Function(
                arguments='{"location": "San Francisco, California, USA", "date": "2025-07-30", "unit": "celsius"}',
                name='get_temperature_date'
            ),
            type='function'
        )
    ],
    reasoning_content=None
)

You can access the tool calls in the tool_calls field of your model's response.

Handle tool responses

When the model wants to use a tool, it returns a tool_calls field containing the tool name and its arguments.

Route the call to your own function or APIs. For example:

import json

for tool_call in response.choices[0].message.tool_calls:
    if tool_call.function.name == "get_current_temperature":
        curr_temperature = get_current_temperature(**json.loads(tool_call.function.arguments))
        # Handle the result
        ...
    elif tool_call.function.name == "get_temperature_date":
        temperature = get_temperature_date(**json.loads(tool_call.function.arguments))
        # Handle the result
        ...
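
The example above assumes get_current_temperature and get_temperature_date are implemented in your application. Purely for illustration, stubs that return mock data consistent with the sample conversation below could look like:

import json

# Hypothetical stubs for illustration; replace them with calls to a real weather service.
def get_current_temperature(location: str, unit: str = "celsius") -> str:
    return json.dumps({"temperature": 16, "location": location, "unit": unit})

def get_temperature_date(location: str, date: str, unit: str = "celsius") -> str:
    return json.dumps({"temperature": 26, "location": location, "date": date, "unit": unit})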

Send tool results back to the model

If you choose to return tool results to the model for further generation, you must include both the assistant message that triggered the tool calls and the corresponding tool messages containing the results. The model uses the tool_call_id field in each tool message to match the output to its corresponding tool call.

Add your tool responses as new messages with the role "tool":

def call_tool(tool_call):
    # Handle your tool_call here
    ...

# `response` is your model's last response containing the tool calls it requests.
# Add the previous response containing the tool calls
messages.append(response.choices[0].message.model_dump())

# Loop through the tool calls and create `tool` messages
for tool_call in response.choices[0].message.tool_calls:
    # 1. Call your tool
    output = call_tool(tool_call)

    # 2. Create a new message of role `"tool"` containing the output of your tool
    messages.append({
        "role": "tool",
        "content": output,
        "tool_call_id": tool_call.id  # Refer to the `id` of the tool call in the model's response
    })

print(messages)

Your messages list should now look like:

[
    # Your initial request to the model
    {
        'role': 'system', 'content': 'You are a weather assistant. Use the given functions to get weather data and provide the results.'
    },
    {
        'role': 'user', 'content': "What's the temperature in San Francisco now? How about tomorrow? Current Date: 2025-07-29."
    },
    # Your model response with the requested tool calls
    {'content': None, 'role': 'assistant', 'tool_calls': [
        {'id': 'chatcmpl-tool-be5e993a5bb64701992f13c51f9e7e9a', 'function': {'arguments': '{"location": "San Francisco, California, USA", "unit": "celsius"}', 'name': 'get_current_temperature'}, 'type': 'function'},
        {'id': 'chatcmpl-tool-d20a696b01e14994b036e0292cee56de', 'function': {'arguments': '{"location": "San Francisco, California, USA", "date": "2025-07-30", "unit": "celsius"}', 'name': 'get_temperature_date'}, 'type': 'function'}
    ]},
    # You send back the outputs of your tools to the model.
    {
        'role': 'tool', 'content': '{"temperature": 16, "location": "San Francisco, California, USA", "unit": "celsius"}', 'tool_call_id': 'chatcmpl-tool-be5e993a5bb64701992f13c51f9e7e9a'
    },
    {
        'role': 'tool', 'content': '{"temperature": 26, "location": "San Francisco, California, USA", "date": "2025-07-30", "unit": "celsius"}', 'tool_call_id': 'chatcmpl-tool-d20a696b01e14994b036e0292cee56de'
    }
]

Now you can request a new response from the model:

response = client.chat.completions.create(
    ...,
    messages=messages
)

print(response.choices[0].message.content)

Your model should respond with something similar to:

The current temperature in San Francisco is 16 degrees Celsius. 
Tomorrow, on 2025-07-30, the temperature in San Francisco will be 26 degrees Celsius.

Use parallel tool calling

Parallel tool calling lets the model request multiple tool calls in a single response. Iterate over each tool call returned by the model, execute it, and append a corresponding tool message with its output.

Make sure to include the correct tool_call_id for each tool message to ensure the model can match responses to the original tool calls.

For implementation details, see Send tool results back to the model.

note

Not all models support parallel tool calls. See the documentation for your model. For example, Llama-3 models don't support parallel tool calls, while Llama-4 models do.

Configure tool calling with reasoning models

When a model decides to call a tool in its response, the content field is usually empty, and tool details appear in the tool_calls field. However, reasoning models still include their thought process in the response.

Reasoning content is usually returned in the reasoning_content field. To ensure the engine separates the reasoning from the rest of the response correctly, set the appropriate reasoning_parser in your Serve config during deployment. See the vLLM docs for a list of supported reasoning parsers.

applications:
- ...
  args:
    llm_configs:
    - ...
      engine_kwargs:
        ...
        reasoning_parser: <REASONING-PARSER>  # See vLLM supported reasoning parsers

The response includes an empty content, the model's reasoning in reasoning_content, and the tool calls in tool_calls:

# Response from the model "What is the weather in ..."
ChatCompletionMessage(
    # Empty content
    content='\n\n',
    ...
    tool_calls=[
        # List of tool calls here
    ],
    # Reasoning contained here
    reasoning_content='\nOkay, the user is asking for ...'
)

If a reasoning parser isn't configured, the reasoning may fall back into the regular content field.
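
Because reasoning_content is an engine-specific extension rather than part of the standard OpenAI response schema, it's safest to read it defensively on the client side. A minimal sketch:

message = response.choices[0].message

# reasoning_content may be absent depending on the model and parser configuration.
reasoning = getattr(message, "reasoning_content", None)
if reasoning:
    print("Reasoning:", reasoning)

print("Tool calls:", message.tool_calls)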

Configure tool calling with hybrid models

Hybrid models such as Qwen-3 can toggle reasoning on or off per request. To control reasoning for a specific hybrid model, see its documentation.

For example, to control reasoning in Qwen-3, you can:

  • Add "/think" or "/no_think" in the prompt
  • Or set enable_thinking in the request: extra_body={"chat_template_kwargs": {"enable_thinking": ...}}

Call a hybrid model such as Qwen-3 with tools and thinking mode enabled:

response = client.chat.completions.create(
    model="my-qwen3",
    messages=messages,
    tools=tools,
    tool_choice=tool_choice,
    extra_body={"chat_template_kwargs": {"enable_thinking": True}}  # <-- specific to Qwen-3
)

In thinking mode, follow the same recommendations outlined in Configure tool calling with reasoning models.

Apply best practices

Follow these suggestions for more reliable and predictable behavior when using tool calling:

Validate tool inputs

Validate and sanitize tool arguments before execution. Models can occasionally produce missing or invalid fields. Use frameworks such as Pydantic for reliable validation.
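
For example, the following is a minimal sketch that validates arguments for the get_current_temperature tool with Pydantic before executing it; the schema mirrors the tool definition from earlier.

import json
from typing import Literal

from pydantic import BaseModel, ValidationError

# Mirrors the get_current_temperature tool schema defined earlier.
class GetCurrentTemperatureArgs(BaseModel):
    location: str
    unit: Literal["celsius", "fahrenheit"] = "celsius"

def safe_get_current_temperature(tool_call):
    try:
        args = GetCurrentTemperatureArgs.model_validate_json(tool_call.function.arguments)
    except ValidationError as exc:
        # Surface the validation error to the model instead of executing with bad arguments.
        return json.dumps({"error": str(exc)})
    return get_current_temperature(**args.model_dump())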

Control tool choice

Use "tool_choice": "auto" to let the model decide when to call a tool, or specify a particular tool by name if you want to force a specific tool.

Use descriptive tool names

Use clear and specific names, descriptions, and arguments to help the model understand their purpose and improve tool selection accuracy.

Set a low temperature

Set a lower temperature to reduce randomness and ensure the model consistently outputs valid tool calls when appropriate.

Choose a model optimized for tool calling

Some models, especially reasoning models, are better at deciding when and how to call tools. Check the documentation of your model family to see if a specialized or fine-tuned version exists.

Summary

In this guide, you learned how to enable tool calling with Ray Serve LLM, define tools, handle model responses, and work with reasoning and hybrid models. You also learned best practices to improve your tool calling workflow.