Generate videos with FastVideo

This example demonstrates how to deploy a state-of-the-art video generation model as an Anyscale service using FastVideo. View the full code for this example here.

Install the Anyscale CLI

pip install -U anyscale
anyscale login

Deploy the service

Clone the example from GitHub.

git clone https://github.com/anyscale/examples.git
cd examples/video_generation_with_fastvideo

Deploy the service.

anyscale service deploy -f service.yaml

Query the service

The anyscale service deploy command outputs a line that looks like

curl -H "Authorization: Bearer <SERVICE_TOKEN>" <BASE_URL>

Navigate to the service in the services tab of the Anyscale console to watch the progress of the service deployment.

Once the service is deployed, you can open the Gradio UI by pasting the <BASE_URL> value from that command into your browser.

From there, you can generate videos by tweaking the prompt and the number of inference steps.

By default, this example uses an L4 GPU, so generation is quite slow (3 inference steps can take around 90 seconds). On an H100, a 5-second video can be generated in around 5 seconds.

Understanding the example

The first Ray Serve deployment is GenerateVideo, which instantiates the video generation model using FastVideo and runs inference.

  • The @serve.deployment decorator specifies the accelerator type and the amount of CPU memory required. Without the memory requirement, Anyscale may provision an instance that is too small, and FastVideo will run out of memory.
  • Switch to an H100 to generate a high-quality video in a reasonable amount of time.
import base64
import io

import imageio
from fastvideo import VideoGenerator
from ray import serve
from starlette.requests import Request


@serve.deployment(
    num_replicas=1,
    ray_actor_options={
        "num_gpus": 1,
        "memory": 50 * 10**9,
        "accelerator_type": "L4",
    },
)
class GenerateVideo:
    def __init__(self):
        # Create a video generator with a pre-trained model.
        self.generator = VideoGenerator.from_pretrained(
            "Wan-AI/Wan2.1-T2V-1.3B-Diffusers",
            num_gpus=1,
        )

    def generate(self, prompt: str, num_inference_steps: int = 3) -> str:
        # Generate the video frames.
        video = self.generator.generate_video(
            prompt,
            num_inference_steps=num_inference_steps,
            return_frames=True,
        )

        # Encode the frames as an MP4 and return them as a base64 string.
        buffer = io.BytesIO()
        imageio.mimsave(buffer, video, fps=16, format="mp4")
        buffer.seek(0)
        video_base64 = base64.b64encode(buffer.getvalue()).decode("utf-8")

        return video_base64

    async def __call__(self, http_request: Request) -> str:
        # Parse the JSON body and run generation.
        data = await http_request.json()
        return self.generate(data["prompt"], data["num_inference_steps"])
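
Note that generate returns the MP4 encoded as a base64 string rather than raw bytes, so a caller has to decode it before writing it to disk or playing it. The helper below is only an illustrative sketch of that decode step (the name save_video is hypothetical, not part of the example):

import base64

def save_video(video_base64: str, path: str = "output.mp4") -> None:
    # Decode the base64 payload returned by GenerateVideo.generate and
    # write it out as a playable MP4 file.
    with open(path, "wb") as f:
        f.write(base64.b64decode(video_base64))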

Next, GradioServer wraps a Gradio UI in a Ray Serve deployment. It takes in generator, which is a handle to the GenerateVideo deployment. The logic for actually building the UI lives in the gradio_builder helper (a sketch of what such a builder might look like follows the code below).

@serve.deployment
class GradioServer(ASGIAppReplicaWrapper):
    """User-facing class that wraps a Gradio App in a Serve Deployment."""

    def __init__(self, generator: serve.handle.DeploymentHandle):
        self.generator = generator
        ui = gradio_builder(generator)
        super().__init__(gr.routes.App.create_app(ui))
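
The repository keeps the UI code in gradio_builder; the following is only a hypothetical sketch of what such a builder could look like, assuming a simple gr.Interface that forwards the prompt and step count to the GenerateVideo handle, blocks on the result, and decodes the returned base64 string into an MP4 file for display. The function name run, the temporary-file handling, and the UI layout are illustrative, not the repository's actual code.

import base64
import tempfile

import gradio as gr


def gradio_builder(generator):
    # Hypothetical sketch only; the real gradio_builder in the example
    # repository may be organized differently.
    def run(prompt: str, num_inference_steps: int) -> str:
        # Call the GenerateVideo deployment through its handle and block
        # until the base64-encoded MP4 comes back.
        video_base64 = generator.generate.remote(
            prompt, int(num_inference_steps)
        ).result()

        # Decode the payload into a temporary file that Gradio can display.
        path = tempfile.NamedTemporaryFile(suffix=".mp4", delete=False).name
        with open(path, "wb") as f:
            f.write(base64.b64decode(video_base64))
        return path

    return gr.Interface(
        fn=run,
        inputs=[
            gr.Textbox(label="Prompt"),
            gr.Slider(1, 50, value=3, step=1, label="Inference steps"),
        ],
        outputs=gr.Video(label="Generated video"),
    )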

The two deployments are combined to produce the overall application in the line

app = GradioServer.bind(GenerateVideo.bind())

which passes a handle to the GenerateVideo deployment into the GradioServer deployment.
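
If you want to test the same composition locally before deploying it to Anyscale, a minimal sketch like the following works, assuming GenerateVideo and GradioServer are defined in (or importable from) the same module and the machine satisfies the deployment's GPU and memory requirements. serve.run starts the application on a local Ray cluster, listening on port 8000 by default.

from ray import serve

# Compose the application exactly as the example does: GenerateVideo is
# passed as a handle into GradioServer, which serves as the HTTP ingress.
app = GradioServer.bind(GenerateVideo.bind())

if __name__ == "__main__":
    # Start the app on a local Ray cluster for testing; keep the process
    # alive so the Gradio UI stays reachable.
    serve.run(app)
    input("Serving at http://localhost:8000 -- press Enter to shut down.\n")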