Function calling API

Changes to Anyscale Endpoints API

Effective August 1, 2024, the Anyscale Endpoints API will be available exclusively through the fully Hosted Anyscale Platform. Multi-tenant access to LLM models will be removed.

With the Hosted Anyscale Platform, you can access the latest GPUs billed by the second, and deploy models on your own dedicated instances. Enjoy full customization to build your end-to-end applications with Anyscale. Get started today.

With Anyscale Endpoints, you can use the function calling API (introduced by OpenAI) to have your model use external tools. You define the functions and their parameters, and the model dynamically chooses which function to call and which arguments to pass to it.

Here’s how function calling typically works:

  1. You input a query, specifying tools alongside their parameters and descriptions.
  2. The LLM evaluates whether to use a tool. If it opts not to, it responds in natural language, either answering from its internal knowledge or asking for clarification about the query and tool usage. If it decides to use a tool, it suggests the appropriate function and the arguments to call it with, formatted as JSON.
  3. You execute the function call in your app and return the result to the LLM, which analyzes it and continues with the next steps.
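Step 3 above, executing the suggested call in your app, can be sketched as a small dispatcher. This is a minimal illustration, not part of the Anyscale API: `TOOL_REGISTRY`, `run_tool_call`, and the stubbed `get_current_weather` function are hypothetical names, and the error payload shape is an assumption.

```python
import json


# Hypothetical local tool; a real app would query a weather service here.
def get_current_weather(location, unit="celsius"):
    return {"location": location, "temperature": 20, "unit": unit}


# Maps the tool names advertised to the model onto Python callables.
TOOL_REGISTRY = {"get_current_weather": get_current_weather}


def run_tool_call(name, arguments_json):
    """Execute one model-suggested tool call and return the result as a JSON string."""
    if name not in TOOL_REGISTRY:
        # The model may suggest a tool that was never defined.
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        arguments = json.loads(arguments_json)
        result = TOOL_REGISTRY[name](**arguments)
    except (json.JSONDecodeError, TypeError) as exc:
        # Malformed JSON, or argument names the function doesn't accept.
        return json.dumps({"error": str(exc)})
    return json.dumps(result)


print(run_tool_call("get_current_weather", '{"location": "San Francisco, CA"}'))
```

Returning errors as JSON rather than raising lets you pass the failure back to the LLM as the tool result, so it can correct its arguments on the next turn.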

Anyscale Endpoints supports two variations of the function calling API:

  • Generic JSON-based function calling API, enabled by Anyscale on a select subset of models. This solution is built entirely on JSON mode, which makes it flexible enough to work with any model. However, it comes at the cost of a higher token count and extra latency, because it involves multiple internal API calls to JSON mode models. On the bright side, it allows full API compatibility with OpenAI's function calling, since Anyscale can intercept the intermediate results and cast the LLM output into OpenAI-compatible data structures such as tool calls.
note

Anyscale's JSON-based function calling API only supports function calling in single calls, not parallel or nested calls.

  • Models that natively support function calling through special prompting (for example mistralai/Mixtral-8x22B-Instruct-v0.1). These models have been fine-tuned to support tool calls using special prompts. This solution is more efficient because function calling support is baked into the LLM itself through a dedicated fine-tuning step. However, it doesn't guarantee that the output is valid JSON, so the output can't always be parsed into OpenAI's tool calls format. Different open source models may also follow different conventions for generating tool calls, so check the documentation for each model to understand how to parse its output. For this variation of function calling, Anyscale provides OpenAI compatibility on the Chat API call, but the output remains in the raw format produced by the LLM and must be parsed further by the user.

Supported models

Native Function Calling Models

JSON-based Function Calling Models

warning

top_p and top_k sampling parameters are incompatible with function calling.
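One way to guard against this is to strip the incompatible parameters before sending a request that includes tools. The sketch below assumes the request is assembled as a plain dict of keyword arguments; the helper name `drop_incompatible_sampling` is hypothetical, and the handling of a nested `extra_body` dict is an assumption about how non-standard parameters such as `top_k` might be passed in your setup.

```python
INCOMPATIBLE_WITH_TOOLS = {"top_p", "top_k"}


def drop_incompatible_sampling(request_kwargs):
    """Return a copy of request_kwargs without top_p/top_k when tools are set."""
    cleaned = dict(request_kwargs)
    if not cleaned.get("tools"):
        return cleaned  # no tools, so the sampling parameters can stay

    for key in INCOMPATIBLE_WITH_TOOLS:
        cleaned.pop(key, None)

    # Assumption: non-standard parameters may travel in a nested "extra_body" dict.
    extra = cleaned.get("extra_body")
    if isinstance(extra, dict):
        cleaned["extra_body"] = {
            k: v for k, v in extra.items() if k not in INCOMPATIBLE_WITH_TOOLS
        }
    return cleaned


request = {
    "model": "mistralai/Mixtral-8x7B-Instruct-v0.1",
    "tools": [{"type": "function"}],
    "temperature": 0.2,
    "top_p": 0.9,
}
print(sorted(drop_incompatible_sampling(request)))
```

The helper returns a copy, so the original request dict can still be reused for calls that don't involve tools.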

Automatic tool selection

By setting tool_choice to auto, the LLM automatically selects the tool to use.

Example:

import openai

client = openai.OpenAI(
    base_url="https://api.endpoints.anyscale.com/v1",
    api_key="esecret_YOUR_API_KEY",
)

# Define the messages
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What's the weather like in San Francisco?"},
]

# Define the tools
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA",
                    },
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["location"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",
    messages=messages,
    tools=tools,
    tool_choice="auto",  # auto is default, but we'll be explicit
)
print(response.choices[0].message)

Example output:

Native function calling

This is an example from the mistralai/Mixtral-8x22B-Instruct-v0.1 model:

{
    "role": "assistant",
    "content": "[TOOL_CALLS] [{\"name\": \"get_current_weather\", \"arguments\": {\"location\": \"San Francisco, USA\", \"format\": \"celsius\"}}]"
}
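Because native output arrives as raw text, you need to parse it yourself. The following is a minimal sketch that handles only the `[TOOL_CALLS] [...]` shape shown in this example; other models may use different conventions, and the function name `parse_native_tool_calls` is hypothetical.

```python
import json

TOOL_CALLS_PREFIX = "[TOOL_CALLS]"


def parse_native_tool_calls(content):
    """Return a list of {"name", "arguments"} dicts, or [] if nothing parses."""
    if not content:
        return []
    text = content.strip()
    if not text.startswith(TOOL_CALLS_PREFIX):
        return []  # plain natural-language answer
    payload = text[len(TOOL_CALLS_PREFIX):].strip()
    try:
        calls = json.loads(payload)
    except json.JSONDecodeError:
        return []  # the model isn't guaranteed to emit valid JSON
    if not isinstance(calls, list):
        return []
    return [
        call
        for call in calls
        if isinstance(call, dict) and "name" in call and "arguments" in call
    ]


content = (
    '[TOOL_CALLS] [{"name": "get_current_weather", '
    '"arguments": {"location": "San Francisco, USA", "format": "celsius"}}]'
)
for call in parse_native_tool_calls(content):
    print(call["name"], call["arguments"])
```

Returning an empty list on any parse failure keeps the caller's logic simple: an empty result means you treat the content as a normal text reply.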

JSON-based function calling

{
    "role": "assistant",
    "content": null,
    "tool_calls": [
        {
            "id": "call_...",
            "type": "function",
            "function": {
                "name": "get_current_weather",
                "arguments": "{\n \"location\": \"San Francisco, USA\",\n \"format\": \"celsius\"\n}"
            }
        }
    ]
}
note

If you see values different from the ones in the example, try running the code again. The output is non-deterministic.

Forcing a specific tool call or no tool call

By setting tool_choice to none, you can prevent the LLM from using any tools.

Example:

response = client.chat.completions.create(
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",
    messages=messages,
    tools=tools,
    tool_choice="none",  # prevent the model from calling any tool
)
print(response.choices[0].message)

Example output:

{
    "role": "assistant",
    "content": "I don't know how to do that. Can you help me?"
}

You can also force the LLM to respond with a specific tool call by setting tool_choice = {"type": "function", "function": {"name":"get_current_weather"}}. With JSON-based function calling, the LLM always uses the get_current_weather function. With native function calling, however, the LLM can still hallucinate the tool call and may not always use the tool you specified.

Example:

response = client.chat.completions.create(
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",
    messages=messages,
    tools=tools,
    tool_choice={"type": "function", "function": {"name": "get_current_weather"}},
)
print(response.choices[0].message)

Example output:

{
    "role": "assistant",
    "content": null,
    "tool_calls": [
        {
            "id": "call_...",
            "type": "function",
            "function": {
                "name": "get_current_weather",
                "arguments": "{\n \"location\": \"San Francisco, USA\",\n \"format\": \"celsius\"\n}"
            }
        }
    ]
}
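Since a forced tool call isn't guaranteed with native function calling, it can help to build the tool_choice value from the defined tools and then check the response. This is a sketch under stated assumptions: both helper names are hypothetical, and `used_expected_tool` assumes the response message has been converted to a plain dict in the JSON-based format shown above.

```python
def forced_tool_choice(name, tools):
    """Build a tool_choice value, refusing names that aren't defined in tools."""
    available = [t["function"]["name"] for t in tools if t.get("type") == "function"]
    if name not in available:
        raise ValueError(f"{name!r} is not among the defined tools: {available}")
    return {"type": "function", "function": {"name": name}}


def used_expected_tool(message, name):
    """Check whether a response message (as a dict) calls the expected tool."""
    calls = message.get("tool_calls") or []
    return any(call.get("function", {}).get("name") == name for call in calls)


tools = [{"type": "function", "function": {"name": "get_current_weather"}}]
print(forced_tool_choice("get_current_weather", tools))
```

For native models, the raw content has to be parsed into a name and arguments first, so a check like `used_expected_tool` would apply to the parsed result rather than to the message itself.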

Finish reason

Native function calling

In this case, the finish reason is the same as the default behavior for chat-completion API.

JSON-based function calling

Finish reason is tool_calls if the LLM uses a tool call. Otherwise it's the default finish reason.
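For JSON-based function calling, the finish reason can drive the control flow of your app. The sketch below assumes the choice has been converted to a plain dict; the function name `next_action` and its return labels are hypothetical. With native function calling, the finish reason doesn't signal a tool call, so the message content must be inspected instead.

```python
def next_action(choice):
    """Classify a chat completion choice (as a dict) for JSON-based function calling."""
    finish_reason = choice.get("finish_reason")
    message = choice.get("message") or {}
    if finish_reason == "tool_calls" and message.get("tool_calls"):
        return "run_tools"  # execute the suggested calls, then call the LLM again
    if finish_reason == "length":
        return "truncated"  # output hit the token limit before completing
    return "final_answer"  # default finish reason: a normal text reply


choice = {
    "finish_reason": "tool_calls",
    "message": {
        "role": "assistant",
        "content": None,
        "tool_calls": [{"id": "call_...", "type": "function"}],
    },
}
print(next_action(choice))
```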

Calling LLM with tool responses

You can feed the results of the tool call back to the LLM and have it continue the conversation based on the results.

Example:

This example is for the JSON-based function calling API. The native function calling API may have a different output format that needs to be parsed into tool name and arguments.

import json

# This example shows the expected flow for JSON-based function calling.
# With native function calling, first parse the previous response
# to extract the tool name and arguments.
# In the previous weather example, the LLM returned this response:
assistant_message = response.choices[0].message
tool_call = assistant_message.tool_calls[0]
arguments = tool_call.function.arguments
tool_id = tool_call.id
tool_name = tool_call.function.name

# In a real app, call your own function with the parsed arguments:
# content = get_weather(**json.loads(arguments))
# Here the result is hard-coded and serialized as a JSON string.
content = json.dumps({"temperature": 20, "unit": "celsius"})

# Append the assistant message and the tool message
messages.append(assistant_message)
messages.append(
    {"role": "tool", "content": content, "tool_call_id": tool_id, "name": tool_name}
)

# Send the messages back to the LLM
response = client.chat.completions.create(
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",
    messages=messages,
    tools=tools,
    tool_choice="auto",
)

print(response.choices[0].message)

Example output:

{
    "role": "assistant",
    "content": "It's 20 degrees celsius in San Francisco."
}
tip

For JSON-based function calling, it's important to pass the exact id of the tool call to the LLM. Otherwise, the LLM isn't able to find the tool call and returns an error.
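To avoid id mismatches, you can derive the tool message directly from the tool call it answers. A minimal sketch, assuming the tool call is available as a plain dict in the JSON-based format; the helper name `build_tool_message` is hypothetical.

```python
import json


def build_tool_message(tool_call, result):
    """Build the "tool" role message that answers a given tool call."""
    return {
        "role": "tool",
        # Copy the id verbatim so the LLM can match the result to its call.
        "tool_call_id": tool_call["id"],
        "name": tool_call["function"]["name"],
        "content": json.dumps(result),
    }


tool_call = {
    "id": "call_...",
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "arguments": '{"location": "San Francisco, USA"}',
    },
}
print(build_tool_message(tool_call, {"temperature": 20, "unit": "celsius"}))
```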

note

Function calling on Anyscale Endpoints requires additional token processing. This leads to a generated token count higher than what the LLM finally outputs. For JSON-based function calling in particular, the count can be as much as twice that of the equivalent native function calling model.

warning

The LLM sometimes doesn't respond intelligently to the tool call response, either calling the tool again or citing the tool_id in its output. To work around this, try forcing the model not to use any tools with tool_choice="none", or improve the prompt.
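One possible way to apply this workaround is to detect when the model repeats a call it has already received an answer for, and retry that turn with tool_choice="none". The helper names below are hypothetical, the tool calls are assumed to be plain dicts in the JSON-based format, and comparing name plus parsed arguments is only one heuristic for "the same call".

```python
import json


def _call_signature(tool_call):
    """Return (name, parsed arguments) so formatting differences don't matter."""
    function = tool_call["function"]
    try:
        arguments = json.loads(function["arguments"])
    except (json.JSONDecodeError, TypeError):
        arguments = function["arguments"]  # fall back to the raw value
    return function["name"], arguments


def next_tool_choice(answered_call, new_call):
    """Pick the tool_choice for a retry, given the call already answered."""
    if new_call is None:
        return "auto"  # the model replied in text; nothing to correct
    if _call_signature(new_call) == _call_signature(answered_call):
        return "none"  # the model is looping on the same call
    return "auto"


answered = {"function": {"name": "get_current_weather",
                         "arguments": '{"location": "San Francisco, USA"}'}}
repeated = {"function": {"name": "get_current_weather",
                         "arguments": '{ "location": "San Francisco, USA" }'}}
print(next_tool_choice(answered, repeated))
```

A different call (another tool, or new arguments) keeps tool_choice at "auto", since the model may legitimately need a second lookup.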