What are Anyscale jobs?
Anyscale jobs run offline workloads in production with automatic retries, resource management, and comprehensive monitoring. Use jobs for batch processing tasks such as model training, batch inference, and data processing pipelines.
For continuously running applications such as model serving endpoints, use Anyscale services instead. See What are Anyscale services?
Common use cases
The following are examples of workloads suited for Anyscale jobs:
| Use case | Description |
|---|---|
| Batch inference | Process large datasets through ML models for predictions at scale. |
| Model training | Distribute training workloads across multiple GPUs or nodes. |
| Model fine-tuning | Fine-tune large language models on custom datasets. |
| Data processing | Transform, clean, and prepare large datasets using Ray Data. |
| Hyperparameter tuning | Run parallel experiments to find optimal model configurations. |
| ETL pipelines | Extract, transform, and load data between systems. |
| Recurring workloads | Schedule periodic data updates, model retraining, or report generation. |
Key features
Anyscale jobs provide the following features for production batch workloads:
| Feature | Description |
|---|---|
| Automatic retries | Set `max_retries` to restart failed jobs automatically. Each retry restarts the job from the beginning with the same configuration. |
| Job queues | Run multiple jobs on a shared cluster to avoid per-job cluster startup time. Supports FIFO, LIFO, and priority-based scheduling. See Use job queues to share clusters. |
| Job schedules | Run jobs automatically on a recurring cadence defined by a cron expression, with timezone support. See Job schedules. |
| Cluster management | Anyscale provisions clusters automatically when jobs start and terminates them when jobs complete. Configure custom compute resources or use existing clusters. |
| Comprehensive monitoring | Access job metrics, logs, Ray Dashboard, and custom Grafana dashboards. Set up alerts for job failures or performance issues. See Monitor a job. |
| Multi-cloud support | Run jobs with consistent APIs and configurations on AWS, Azure, Google Cloud, neoclouds, or Anyscale-hosted infrastructure. |
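To make the three queue orderings concrete, here is a toy, in-process model in Python. `ToyJobQueue`, its mode strings, and `drain` are hypothetical names, not part of the Anyscale SDK. The model runs one job at a time and assumes that in priority mode a lower number means higher priority, with ties broken by submission order. Anyscale's actual semantics, including concurrency and priority direction, are described in Use job queues to share clusters.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import List


@dataclass(order=True)
class _Entry:
    # Only sort_key takes part in comparisons; the name rides along.
    sort_key: tuple
    name: str = field(compare=False)


class ToyJobQueue:
    """Hypothetical model of a shared-cluster job queue (not Anyscale's implementation)."""

    def __init__(self, mode: str) -> None:
        if mode not in ("FIFO", "LIFO", "PRIORITY"):
            raise ValueError(f"unknown mode: {mode}")
        self.mode = mode
        self._heap: List[_Entry] = []
        self._seq = itertools.count()  # monotonically increasing submission counter

    def submit(self, name: str, priority: int = 0) -> None:
        seq = next(self._seq)
        if self.mode == "FIFO":
            key = (seq,)  # earliest submission runs first
        elif self.mode == "LIFO":
            key = (-seq,)  # latest submission runs first
        else:
            key = (priority, seq)  # assumption: lower number wins, ties by arrival
        heapq.heappush(self._heap, _Entry(key, name))

    def next_job(self) -> str:
        return heapq.heappop(self._heap).name

    def __len__(self) -> int:
        return len(self._heap)


def drain(queue: ToyJobQueue) -> List[str]:
    """Return job names in the order the toy queue would run them."""
    return [queue.next_job() for _ in range(len(queue))]


if __name__ == "__main__":
    for mode in ("FIFO", "LIFO", "PRIORITY"):
        q = ToyJobQueue(mode)
        q.submit("etl", priority=2)
        q.submit("train", priority=0)
        q.submit("report", priority=1)
        print(mode, drain(q))
```

Running the script prints `['etl', 'train', 'report']` for FIFO, `['report', 'train', 'etl']` for LIFO, and `['train', 'report', 'etl']` for the priority mode under the lower-number-wins assumption.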
Getting started
To run your first job, see Get started with jobs.
For detailed information on creating and managing jobs, see Create and manage jobs.
Best practices
- Configure appropriate retry policies for fault tolerance. See Create and manage jobs.
- Avoid scheduling Ray tasks on the head node for compute-intensive workloads. See Control head node scheduling.
- Use job metrics and logs to monitor execution and debug failures. See Monitor a job.
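Because each retry restarts the job from the beginning, a retry policy pays off most when the entrypoint can skip work that an earlier attempt already finished. The following is a local, stdlib-only sketch of that pattern, not Anyscale code: `run_with_retries` only mimics the restart behavior that `max_retries` provides (assuming one initial attempt plus up to `max_retries` retries), and `make_sharded_entrypoint`, the JSON checkpoint file, and the shard names are hypothetical. In a real job, progress would need to be written to storage that outlives a failed attempt rather than to a local temporary directory.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, List, Optional, Tuple


class ToyJobFailed(Exception):
    """Raised when every attempt of the toy job has failed."""


def run_with_retries(entrypoint: Callable[[int], None], max_retries: int) -> int:
    """Call entrypoint up to 1 + max_retries times; return the successful attempt index.

    Each call starts the entrypoint from the top, modeling a job that restarts
    from the beginning. Any state that must survive has to be persisted.
    """
    last_error: Optional[Exception] = None
    for attempt in range(1 + max_retries):
        try:
            entrypoint(attempt)
            return attempt
        except Exception as err:  # any failure triggers a fresh attempt
            last_error = err
    raise ToyJobFailed(f"failed after {1 + max_retries} attempt(s)") from last_error


def make_sharded_entrypoint(
    shards: List[str],
    checkpoint: Path,
    fail_at: Optional[str],
    log: List[Tuple[int, str]],
) -> Callable[[int], None]:
    """Build a hypothetical entrypoint that skips shards recorded in `checkpoint`."""

    def entrypoint(attempt: int) -> None:
        # Reload progress on every attempt; in-memory state from earlier attempts is gone.
        done = set(json.loads(checkpoint.read_text())) if checkpoint.exists() else set()
        for shard in shards:
            if shard in done:
                continue  # finished by an earlier attempt
            if attempt == 0 and shard == fail_at:
                raise RuntimeError(f"simulated failure while processing {shard}")
            log.append((attempt, shard))  # stand-in for the real per-shard work
            done.add(shard)
            checkpoint.write_text(json.dumps(sorted(done)))  # persist after each shard

    return entrypoint


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log: List[Tuple[int, str]] = []
        entrypoint = make_sharded_entrypoint(
            ["s0", "s1", "s2", "s3"], Path(tmp) / "done.json", fail_at="s2", log=log
        )
        print("succeeded on attempt", run_with_retries(entrypoint, max_retries=1))
        print(log)
```

With `max_retries=1`, the first attempt finishes `s0` and `s1` and then fails on `s2`; the retry reloads the checkpoint and processes only `s2` and `s3`. With `max_retries=0`, the same failure marks the toy job as failed.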
Pricing
Jobs are billed at standard Anyscale pricing, based on the machine types they use. See the Anyscale pricing page.