Version: Latest

Chat: google/gemma-7b-it

Check your docs version

These docs are for the new Anyscale design. If you started using Anyscale before April 2024, use Version 1.0.0 of the docs. If you're transitioning to Anyscale Preview, see the guide for how to migrate.


See the Hugging Face model page for more model details.

About this model

Model name to use in API calls: `google/gemma-7b-it`


Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. They're text-to-text, decoder-only large language models, available in English, with open weights, pre-trained variants, and instruction-tuned variants. Gemma models are well-suited for a variety of text generation tasks, including question answering, summarization, and reasoning. Their relatively small size makes it possible to deploy them in environments with limited resources such as a laptop, desktop, or your own cloud infrastructure, democratizing access to state of the art AI models and helping foster innovation for everyone.

Model Developers: Google

Variations: This model is the 7B instruction-tuned version of Gemma. Other variations include the 2B base, 7B base, and 2B instruction-tuned models.

Input: text only.

Output: generated text only.

Context Length: 8,192 tokens

License: a custom commercial license is available at:

Get started

  1. Register an account. (If you are viewing this from Anyscale Endpoints, you can skip this step.)
  2. Generate an API key.
  3. Run your first query:

See Query a model for more model query details.

import openai

query = "Write a program to load data from S3 with Ray and train using PyTorch."

client = openai.OpenAI(
    base_url="",  # set this to your Anyscale Endpoints base URL
    api_key="esecret_ANYSCALE_API_KEY",
)
# Note: not all arguments are currently supported and will be ignored by the backend.
chat_completion = client.chat.completions.create(
    model="google/gemma-7b-it",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": query},
    ],
    stream=True,
)
for message in chat_completion:
    print(message.choices[0].delta.content, end="", flush=True)
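If you want the full response as a single string rather than printing chunks as they arrive, you can collect the deltas from the stream. The sketch below uses stand-in chunk objects that mimic the shape of the OpenAI streaming response (`choices[0].delta.content`), since it doesn't make a live API call; in practice the chunks would come from `client.chat.completions.create(..., stream=True)` as above.

```python
# Sketch: assembling streamed delta chunks into one response string.
# The stand-in objects below mimic OpenAI streaming chunks; a live call
# would yield real chunks from client.chat.completions.create(..., stream=True).
from types import SimpleNamespace


def make_chunk(text):
    # Build an object shaped like a streaming chunk: chunk.choices[0].delta.content
    delta = SimpleNamespace(content=text)
    choice = SimpleNamespace(delta=delta)
    return SimpleNamespace(choices=[choice])


stream = [make_chunk("Hello"), make_chunk(", "), make_chunk("world!")]

parts = []
for message in stream:
    content = message.choices[0].delta.content
    if content is not None:  # some chunks (e.g. the final one) may carry no content
        parts.append(content)

full_response = "".join(parts)
print(full_response)  # Hello, world!
```

The `None` check matters: streaming responses can include chunks whose delta has no `content`, and joining without filtering them would raise a `TypeError`.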