Workspace files

Check your docs version

These docs are for the new Anyscale design. If you started using Anyscale before April 2024, use Version 1.0.0 of the docs. If you're transitioning to Anyscale Preview, see the guide for how to migrate.

Anyscale Workspaces persist files and folders within your project directory, /home/ray/default, across restarts, so you can resume a workspace session where you left off.

For performance reasons, Anyscale limits snapshots to 10 GB per workspace. You can define exclusion rules to ignore data that doesn't need to persist across sessions.

Concepts

  • Periodic Snapshots: Workspaces automatically snapshot files and folders every 5 minutes to preserve their state.
  • Persistence Rules: Files within the project directory persist across workspace restarts, excluding those specified in .gitignore or .anyscaleignore.

Excluding files with .anyscaleignore

To exclude specific files or folders from workspace snapshots, create a file named .anyscaleignore in the project directory, ~/default (that is, /home/ray/default), and list the items you want to exclude. The .anyscaleignore file supports the following patterns to match files and folders:

# .anyscaleignore example
*.txt # Ignore files with a .txt extension in the working directory.
**/*.txt # Ignore files with a .txt extension in ANY directory.
folder/ # Ignore all files under "folder/". The slash at the end is optional.
folder/*.txt # Ignore files with a .txt extension under "folder/".
path/to/filename.py # Ignore a specific file by providing its relative path.
file_[1,2].txt # Ignore file_1.txt and file_2.txt.

Note: The .anyscaleignore file supports a subset of the patterns from .gitignore. Some patterns, such as negation (!) and backslash (\) escaping, aren't supported. For further details, see the gitignore documentation.

Snapshot limits

  • Location: Files on the head node file system outside the /home/ray/default directory aren't tracked and are lost after the workspace terminates.

  • Git repository: Workspaces support the persistence of Git repositories and Git submodules, but not nested Git repositories. If you have nested Git repositories, the inner repositories aren't persisted across workspace restarts.

  • Timeout: Each snapshot has a 4-minute timeout so that it doesn't block the workspace. Anyscale calculates the backup capacity based on this timeout period. If a snapshot takes longer than 4 minutes, Anyscale aborts it.

  • Capacity: Snapshots can back up approximately 10 GB of data. If the data exceeds the 10 GB limit, you risk losing data across workspace restarts. Anyscale displays the error banner below as a warning.

Example Banner
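
To see how close a project directory is to the capacity limit before the banner appears, you can measure it yourself. The following is a minimal sketch, not an Anyscale tool: the function names `entry_size` and `size_report` are hypothetical, the limit is taken as 10 decimal gigabytes (the docs only say "approximately 10 GB"), and the script counts every file, so it doesn't account for anything excluded by .gitignore or .anyscaleignore.

```python
import os

# Assumption: "approximately 10 GB" is treated as 10 * 10^9 bytes.
SNAPSHOT_LIMIT_BYTES = 10 * 1000**3


def entry_size(path):
    """Return the size in bytes of a file, or of all regular files under a directory."""
    if os.path.islink(path):
        return 0  # Skip symlinks so their targets aren't counted.
    if os.path.isfile(path):
        return os.path.getsize(path)
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.islink(full):
                continue
            try:
                total += os.path.getsize(full)
            except OSError:
                pass  # The file disappeared or is unreadable; ignore it.
    return total


def size_report(project_dir):
    """Return (total_bytes, [(name, bytes), ...]) for top-level entries, largest first."""
    sizes = [
        (name, entry_size(os.path.join(project_dir, name)))
        for name in os.listdir(project_dir)
    ]
    sizes.sort(key=lambda item: item[1], reverse=True)
    return sum(size for _name, size in sizes), sizes


if __name__ == "__main__":
    project_dir = "/home/ray/default"
    if os.path.isdir(project_dir):
        total, sizes = size_report(project_dir)
        print(f"{total / 1e9:.2f} GB used of about {SNAPSHOT_LIMIT_BYTES / 1e9:.0f} GB")
        for name, size in sizes[:10]:
            print(f"{size / 1e6:10.1f} MB  {name}")
```

The largest entries in the listing are natural candidates for an .anyscaleignore rule or for external storage.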

Storage suggestions: For data that exceeds the 10 GB limit, use alternative storage such as Amazon S3. See Storage and file management for details on the available storage options.
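
Returning to the Git repository limit listed under Snapshot limits: because inner repositories of nested Git repositories aren't persisted, it can help to find them before a restart. The following is a rough sketch under stated assumptions, not Anyscale's own detection logic. The function name `find_nested_git_repos` is hypothetical. The heuristic treats a directory as a repository when it contains a `.git` directory, and relies on submodules using a `.git` file instead, as current Git versions do, so submodules aren't reported.

```python
import os


def find_nested_git_repos(project_dir):
    """Return relative paths of Git repositories located inside another Git repository.

    Heuristic (an assumption, not Anyscale's logic): a directory is a repository
    if it contains a `.git` directory. Submodules normally have a `.git` file,
    so they aren't reported.
    """
    project_dir = os.path.abspath(project_dir)
    repo_roots = []
    nested = []
    # os.walk is top-down by default, so parents are visited before children.
    for root, dirs, _files in os.walk(project_dir):
        if ".git" not in dirs:
            continue
        dirs.remove(".git")  # Don't descend into Git metadata.
        inside_other_repo = any(
            os.path.commonpath([root, parent]) == parent for parent in repo_roots
        )
        if inside_other_repo:
            nested.append(os.path.relpath(root, project_dir))
        repo_roots.append(root)
    return sorted(nested)


if __name__ == "__main__":
    project_dir = "/home/ray/default"
    if os.path.isdir(project_dir):
        for path in find_nested_git_repos(project_dir):
            print(f"Nested repository (may not persist): {path}")
```

Any path this reports is worth pushing to a remote or converting to a submodule before the workspace restarts.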