Portkey integration

Changes to Anyscale Endpoints API

Effective August 1, 2024, the Anyscale Endpoints API will be available exclusively through the fully hosted Anyscale Platform. Multi-tenant access to LLMs will be removed.

With the Hosted Anyscale Platform, you can access the latest GPUs billed by the second, and deploy models on your own dedicated instances. Enjoy full customization to build your end-to-end applications with Anyscale. Get started today.

Portkey helps bring Anyscale APIs to production with its abstractions for observability, fallbacks, caching, and more. Use the Anyscale API through Portkey for:

  1. Enhanced Logging: Track API usage with detailed insights.
  2. Production Reliability: Automated fallbacks, load balancing, and caching.
  3. Continuous Improvement: Collect and apply user feedback.
  4. Enhanced Fine-Tuning: Combine logs and user feedback for targeted fine-tuning.

1.1 Setup and logging

  1. Set your Anyscale API key: $ export ANYSCALE_API_KEY=<YOUR_ANYSCALE_ENDPOINT_API_KEY>
  2. Obtain your Portkey API key.
  3. Switch to the Portkey Gateway URL: https://api.portkey.ai/v1/proxy

See full logs of requests, including latency, cost, and tokens, and dig deeper into the data with Portkey's analytics suite.

""" OPENAI PYTHON SDK """
import os

import openai

PORTKEY_GATEWAY_URL = "https://api.portkey.ai/v1/proxy"
ANYSCALE_API_KEY = os.environ["ANYSCALE_API_KEY"]  # Exported in step 1.

PORTKEY_HEADERS = {
    'Authorization': f'Bearer {ANYSCALE_API_KEY}',
    'Content-Type': 'application/json',
    # **************************************
    'x-portkey-api-key': 'PORTKEY_API_KEY',  # Get from https://app.portkey.ai/.
    'x-portkey-mode': 'proxy anyscale'  # Tell Portkey that the request is for Anyscale.
    # **************************************
}

# openai.OpenAI() raises an error when it finds no API key, so pass the Anyscale key explicitly.
client = openai.OpenAI(
    api_key=ANYSCALE_API_KEY,
    base_url=PORTKEY_GATEWAY_URL,
    default_headers=PORTKEY_HEADERS,
)

response = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.1",
    messages=[{"role": "user", "content": "Say this is a test"}]
)

print(response.choices[0].message.content)

1.2 Enhanced observability

  • Trace requests with a single ID.
  • Append custom tags for request segmenting and in-depth analysis.

Add the relevant headers to your request:

""" OPENAI PYTHON SDK """
import json
import os

import openai

PORTKEY_GATEWAY_URL = "https://api.portkey.ai/v1/proxy"
ANYSCALE_API_KEY = os.environ["ANYSCALE_API_KEY"]

TRACE_ID = 'anyscale_portkey_test'

METADATA = {
    "_environment": "production",
    "_user": "userid123",
    "_organisation": "orgid123",
    "_prompt": "summarisationPrompt"
}

PORTKEY_HEADERS = {
    'Authorization': f'Bearer {ANYSCALE_API_KEY}',
    'Content-Type': 'application/json',
    'x-portkey-api-key': 'PORTKEY_API_KEY',
    'x-portkey-mode': 'proxy anyscale',
    # **************************************
    'x-portkey-trace-id': TRACE_ID,  # Send the trace ID.
    'x-portkey-metadata': json.dumps(METADATA)  # Send the metadata as a JSON string.
    # **************************************
}

# openai.OpenAI() raises an error when it finds no API key, so pass the Anyscale key explicitly.
client = openai.OpenAI(
    api_key=ANYSCALE_API_KEY,
    base_url=PORTKEY_GATEWAY_URL,
    default_headers=PORTKEY_HEADERS,
)

response = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.1",
    messages=[{"role": "user", "content": "Say this is a test"}]
)

print(response.choices[0].message.content)
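
When each request needs its own trace ID, the headers can be built per call instead of being fixed on the client. The following is a minimal sketch under a few assumptions: build_trace_headers is a hypothetical helper (not part of Portkey or the OpenAI SDK), the random ID format is arbitrary, and the header names are the ones used above. The OpenAI Python SDK (v1) accepts per-request headers through the extra_headers argument of create.

```python
import json
import uuid


def build_trace_headers(metadata, trace_id=None):
    """Return Portkey trace and metadata headers for a single request.

    Hypothetical helper: generates a random trace ID when none is given.
    """
    if trace_id is None:
        trace_id = f"trace-{uuid.uuid4().hex}"
    return {
        "x-portkey-trace-id": trace_id,
        "x-portkey-metadata": json.dumps(metadata),
    }


headers = build_trace_headers({"_user": "userid123", "_environment": "production"})
print(headers["x-portkey-trace-id"])

# Usage with a client configured as above (not executed here):
# response = client.chat.completions.create(
#     model="mistralai/Mistral-7B-Instruct-v0.1",
#     messages=[{"role": "user", "content": "Say this is a test"}],
#     extra_headers=headers,
# )
```

Keeping the trace ID returned by the helper lets you attach feedback to the same trace later.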

These requests then appear in the logs on your Portkey dashboard.

2. Caching, fallbacks, and load balancing

  • Fallbacks: Ensure your app remains functional even if a primary service fails.
  • Load Balancing: Efficiently distribute incoming requests among multiple models.
  • Semantic Caching: Reduce costs and latency by intelligently caching results.
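
To make the fallback idea concrete, here is a small client-side sketch of the pattern. It only illustrates the concept and is not how Portkey implements it; the gateway handles this for you. call_with_fallback and the two stand-in model functions are hypothetical.

```python
def call_with_fallback(targets, prompt):
    """Try each (name, callable) target in order; return the first success."""
    errors = []
    for name, call in targets:
        try:
            return name, call(prompt)
        except Exception as exc:  # Any failure moves on to the next target.
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all targets failed: " + "; ".join(errors))


def primary_model(prompt):
    # Stand-in for a primary service that is currently failing.
    raise TimeoutError("primary timed out")


def backup_model(prompt):
    # Stand-in for a healthy backup service.
    return f"echo: {prompt}"


used, answer = call_with_fallback(
    [("primary", primary_model), ("backup", backup_model)],
    "Say this is a test",
)
print(used, answer)
```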

Toggle these features by saving Configs from the Portkey dashboard > Configs tab.

To enable semantic caching and a fallback from Llama 2 to Mistral, your Portkey Config would look like the following:

{
  "cache": "semantic",
  "mode": "fallback",
  "options": [
    {
      "provider": "anyscale",
      "api_key": "...",
      "override_params": { "model": "meta-llama/Llama-2-7b-chat-hf" }
    },
    {
      "provider": "anyscale",
      "api_key": "...",
      "override_params": { "model": "mistralai/Mistral-7B-Instruct-v0.1" }
    }
  ]
}
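
If you maintain several fallback chains, it can help to generate this JSON rather than edit it by hand. The sketch below rebuilds the Config shown above from a list of model names. make_fallback_config is a hypothetical helper, and the field names are copied from the example rather than from a schema reference, so check them against the Portkey docs.

```python
import json


def make_fallback_config(models, api_key, cache="semantic"):
    """Build a fallback Config dict with one Anyscale option per model, in order."""
    if not models:
        raise ValueError("at least one model is required")
    return {
        "cache": cache,
        "mode": "fallback",
        "options": [
            {
                "provider": "anyscale",
                "api_key": api_key,
                "override_params": {"model": model},
            }
            for model in models
        ],
    }


config = make_fallback_config(
    ["meta-llama/Llama-2-7b-chat-hf", "mistralai/Mistral-7B-Instruct-v0.1"],
    api_key="...",  # Placeholder, as in the example above.
)
print(json.dumps(config, indent=2))
```

The order of the list sets the fallback order: the first model is tried first.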

Next, send the Config key with the x-portkey-config header:

""" OPENAI PYTHON SDK """
import os

import openai

PORTKEY_GATEWAY_URL = "https://api.portkey.ai/v1/proxy"

PORTKEY_HEADERS = {
    'Content-Type': 'application/json',
    'x-portkey-api-key': 'PORTKEY_API_KEY',
    'x-portkey-mode': 'proxy anyscale',
    # **************************************
    'x-portkey-config': 'CONFIG_KEY'  # Key of the Config saved in the dashboard.
    # **************************************
}

# openai.OpenAI() raises an error when it finds no API key, so pass the Anyscale key explicitly.
client = openai.OpenAI(
    api_key=os.environ["ANYSCALE_API_KEY"],
    base_url=PORTKEY_GATEWAY_URL,
    default_headers=PORTKEY_HEADERS,
)

response = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.1",
    messages=[{"role": "user", "content": "Say this is a test"}]
)

print(response.choices[0].message.content)

For more on Configs and other gateway features like load balancing, see the Portkey docs.

3. Collect feedback

Gather weighted feedback from users and improve your app:

""" REQUESTS LIBRARY """
import requests
import json

PORTKEY_FEEDBACK_URL = "https://api.portkey.ai/v1/feedback"  # Portkey Feedback Endpoint.

PORTKEY_HEADERS = {
    "x-portkey-api-key": "PORTKEY_API_KEY",
    "Content-Type": "application/json",
}

DATA = {
    "trace_id": "anyscale_portkey_test",  # On Portkey, you can append feedback to a particular Trace ID.
    "value": 1,
    "weight": 0.5
}

response = requests.post(PORTKEY_FEEDBACK_URL, headers=PORTKEY_HEADERS, data=json.dumps(DATA))

print(response.text)
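
The value and weight fields lend themselves to a weighted average when you summarize feedback for a trace. This page does not say how Portkey aggregates feedback, so the sketch below shows just one common approach, computed locally on hypothetical feedback records.

```python
def weighted_score(feedback):
    """Return sum(value * weight) / sum(weight) over feedback records."""
    total_weight = sum(item["weight"] for item in feedback)
    if total_weight == 0:
        raise ValueError("total weight must be non-zero")
    return sum(item["value"] * item["weight"] for item in feedback) / total_weight


# Hypothetical feedback for one trace: two positive ratings, one negative.
records = [
    {"trace_id": "anyscale_portkey_test", "value": 1, "weight": 0.5},
    {"trace_id": "anyscale_portkey_test", "value": 1, "weight": 1.0},
    {"trace_id": "anyscale_portkey_test", "value": 0, "weight": 0.5},
]

print(weighted_score(records))
```

A higher weight makes a rating count for more, for example to favor feedback from expert reviewers over casual users.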

4. Continuous fine-tuning

Once you start logging your requests and their feedback with Portkey, it becomes much easier to 1) curate and create data for fine-tuning, 2) schedule fine-tuning jobs, and 3) use the fine-tuned models.

Fine-tuning is enabled for select organizations. Request access on Portkey Discord.
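
As an illustration of the data-curation step mentioned above, the sketch below filters logged requests by feedback and writes them as JSON Lines in the chat-message layout used in the requests on this page. Everything about the input is an assumption: the logs records, their field names, and the 0.5 threshold are hypothetical, and both the real export format of Portkey logs and the training format your fine-tuning job expects may differ.

```python
import json


def curate_examples(logs, min_score=0.5):
    """Keep logged exchanges whose feedback score meets the threshold."""
    examples = []
    for log in logs:
        if log.get("feedback_score", 0) >= min_score:
            examples.append({
                "messages": [
                    {"role": "user", "content": log["prompt"]},
                    {"role": "assistant", "content": log["completion"]},
                ]
            })
    return examples


# Hypothetical log records; the field names are assumptions.
logs = [
    {"prompt": "Say this is a test", "completion": "This is a test.", "feedback_score": 1.0},
    {"prompt": "Summarise this", "completion": "(poor answer)", "feedback_score": 0.0},
]

jsonl = "\n".join(json.dumps(example) for example in curate_examples(logs))
print(jsonl)
```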

Conclusion

Integrating Portkey with Anyscale helps you build resilient LLM apps. With features like semantic caching, observability, load balancing, feedback, and fallbacks, you can ensure optimal performance and continuous improvement.

Read full Portkey docs here. | Reach out to the Portkey team.