Tool calling

RayLLM offers tool calling support. Tool calling lets a user provide an LLM with a prompt together with a set of functions, their parameters, and descriptions of their behavior. The LLM can then ask the user to call one of these functions and incorporate the output into its response.

How does tool calling work?

Tool calling typically follows three steps:

  1. The user submits a query and includes a list of functions along with their parameters and descriptions.
  2. The LLM evaluates whether to activate a function:
    • If it decides not to, it responds in natural language by either providing an answer based on its internal knowledge or seeking clarifications about the query and tool usage.
    • If it decides to use a function, it suggests the appropriate function and the arguments to call it with, all formatted in JSON (see the sketch after this list).
  3. The user executes the call in their app and submits the result back to the LLM. The LLM analyzes the results and continues with any next steps.
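For illustration, here is a minimal sketch of the messages exchanged in steps 2 and 3, using the OpenAI-style message format that RayLLM accepts. The id, function name, and argument values below are hypothetical.

# Step 2: the model responds with a tool call instead of a natural-language answer.
# The id, function name, and arguments are illustrative only.
assistant_tool_call_message = {
    "role": "assistant",
    "tool_calls": [
        {
            "id": "call_abc123",
            "type": "function",
            "function": {
                "name": "get_current_weather",
                # Arguments come back as a JSON-encoded string.
                "arguments": '{"location": "San Francisco, CA", "format": "celsius"}',
            },
        }
    ],
}

# Step 3: the app runs the function itself and sends the result back as a "tool" message.
tool_result_message = {
    "role": "tool",
    "tool_call_id": "call_abc123",
    "content": "21 degrees celsius and sunny",
}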

Enable tool calling with one of the following two approaches:

Fine-tuning: You can fine-tune an LLM to use tools when prompted in a specific way. Many recently released open-weight models have gone through some stages of post-training and have out-of-the-box capability to use tools. For example, mistralai/Mixtral-8x22B-Instruct-v0.1 and meta-llama/Meta-Llama-3.1-8B-Instruct natively support tool calling. They have been fine-tuned to do so when given a special prompt format. To use these models, you must specify the tool-compatible prompt format in your LLM config YAML.

For example, for mistralai/Mixtral-8x22B-Instruct-v0.1, here is the prompt format to enable tool calling:

prompt_format:
  system: "{instruction}\n\n "
  assistant: "{tool_calls}{instruction} </s> "
  # Special part of the assistant message that shows a previous assistant message that was a tool call.
  tool_calls: " [TOOL_CALLS] {instruction}"
  # Special new role that triggers the model to ingest the results of tool calls.
  tool: "[TOOL_RESULTS] {instruction} [/TOOL_RESULTS]"
  # The format of the available tools list, which for Mixtral goes into the last user message.
  tools_list: "[AVAILABLE_TOOLS] {instruction} [/AVAILABLE_TOOLS] "
  trailing_assistant: ""
  user: "{tools_list}[INST] {system}{instruction} [/INST]"
  # Add only one BOS at the beginning of the entire conversation.
  bos: "<s> "
  system_in_user: true
  # Similar to system_in_user; if true, prepends the available tools to the user message.
  tools_list_in_user: true
  # If true, prepends the system message only to the last user message.
  system_in_last_user: true
  # If true, prepends the tools list only to the last user message.
  tools_list_in_last_user: true
  add_system_tags_even_if_message_is_empty: false
  strip_whitespace: true
  default_system_message: "Always assist with care, respect, and truth. Respond with utmost utility yet securely. Avoid harmful, unethical, prejudiced, or negative content. Ensure replies promote fairness and positivity."

With this prompt format, RayLLM can provide OpenAI API-compatible tool support for this model. Generate the full config with the RayLLM CLI (rayllm gen-config).

JSON mode: Not all models come with tool calling capabilities out of the box. For these models, use JSON mode to enable tool calling. JSON mode constrains the LLM's output so it's structured and predictable.
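
To illustrate what JSON mode provides on its own, the following request sketch asks for structured output through the OpenAI-style response_format parameter. Whether and how your deployment exposes this parameter depends on the model config; the field names below follow the OpenAI API and are shown as an assumption, not RayLLM-specific guidance.

import requests

# Minimal sketch, assuming the endpoint honors the OpenAI-style response_format parameter.
body = {
    "model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
    "messages": [
        {"role": "system", "content": "Answer in JSON."},
        {"role": "user", "content": "List two US cities and their states."},
    ],
    # Asks the server to constrain the output to valid JSON.
    "response_format": {"type": "json_object"},
    "temperature": 0,
}

resp = requests.post("http://localhost:8000/v1/chat/completions", json=body)
print(resp.json())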

Server-side tool calling

When you enable JSON mode, RayLLM can process tool-calling requests out of the box. RayLLM accepts tool-calling requests in OpenAI format. Below is an example query:

import os

import requests

s = requests.Session()

# Replace with the token for your deployment; the environment variable name here is a placeholder.
token = os.environ.get("RAYLLM_API_TOKEN", "")

TOOL_1 = {
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The city and state, e.g., San Francisco, CA",
                },
                "format": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                    "description": "The temperature unit to use. Infer this from the user's location.",
                },
            },
            "required": ["location", "format"],
        },
    },
    # Referenced later as tool_call_id when the tool result is sent back.
    "id": "abc",
}

api_base = "http://localhost:8000/v1"
url = f"{api_base}/chat/completions"
body = {
    "model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the weather like today?"},
    ],
    "temperature": 0,
    # Set to True to receive the response as a stream of chunks instead of a single JSON object.
    "stream": False,
    "tools": [TOOL_1],
}

with s.post(url, headers={"Authorization": f"Bearer {token}"}, json=body) as resp:
    print(resp.json())

The model can choose to directly respond to the question "What is the weather like today?", or it can return a tool call with a set of arguments. Your app can then call the tool and resubmit a request with the tool results:

...
"messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the weather like today?"},
    {"role": "assistant", "tool_calls": [TOOL_1]},
    {"role": "tool", "content": "It's quite sunny in Pleasanton.", "tool_call_id": TOOL_1["id"]},
],
...

The follow-up prompt contains two new messages: an assistant message and a tool message. The assistant message contains the tool call requested by the model. The tool message contains the output of the tool call. The model can now incorporate the tool's response into the final output.
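
To make the round trip concrete, here is a minimal sketch that continues the server-side example above (reusing s, url, token, body, and resp). It assumes the OpenAI chat-completions response shape, and the local get_current_weather function is a hypothetical stand-in for your own implementation.

import json


def get_current_weather(location, format):
    # Hypothetical stand-in; replace with a real weather lookup.
    return f"It's quite sunny in {location}."


# `resp` is the non-streaming response from the first request above.
message = resp.json()["choices"][0]["message"]

if message.get("tool_calls"):
    tool_call = message["tool_calls"][0]
    # Arguments arrive as a JSON-encoded string.
    args = json.loads(tool_call["function"]["arguments"])
    result = get_current_weather(**args)

    # Append the assistant tool call and the tool result, then ask the model to finish.
    body["messages"].append(message)
    body["messages"].append(
        {"role": "tool", "content": result, "tool_call_id": tool_call["id"]}
    )
    final = s.post(url, headers={"Authorization": f"Bearer {token}"}, json=body)
    print(final.json()["choices"][0]["message"]["content"])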

Client-side tool calling

See the end-to-end example in the Deploy LLM template to build your own tool-calling client.
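
For reference, here is a minimal client-side sketch that uses the official openai Python package pointed at a RayLLM endpoint. The base URL, API key handling, and model name mirror the earlier examples and are assumptions about your deployment; TOOL_1 is the tool definition from the server-side example above.

from openai import OpenAI

# Point the OpenAI client at the RayLLM endpoint; the API key is a placeholder.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="placeholder")

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the weather like today?"},
    ],
    tools=[TOOL_1],  # Same tool definition as in the server-side example above.
    temperature=0,
)

message = response.choices[0].message
if message.tool_calls:
    print("Model requested a tool call:", message.tool_calls[0].function)
else:
    print(message.content)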