Workspace Snapshots
This version of the Anyscale docs is deprecated. Go to the latest version for up-to-date information.
The Workspace Snapshot feature persists the files and folders in your project directory (/home/ray/<your_project_name>) across workspace restarts, so that you can pick up where you left off between workspace sessions.
Overview
- Periodic Snapshots: Workspaces automatically take snapshots of files and folders at regular intervals to preserve their state. Snapshots occur every 5 minutes.
- Persistence Rules: Files within the project directory are persisted across workspace restarts, excluding those specified in .gitignore or .anyscaleignore.
Each snapshot captures the full contents of the files rather than only the differences between versions.
Excluding files with .anyscaleignore
To exclude specific files or folders from workspace snapshots, create a file named .anyscaleignore in the project directory (~/<project-name>, or ~/default by default) and list the items to exclude. The .anyscaleignore file supports the following patterns to match files and folders:
# .anyscaleignore example
*.txt # Ignore files with a .txt extension at the same level as `.anyscaleignore`.
**/*.txt # Ignore files with a .txt extension in ANY directory.
folder/ # Ignore all files under "folder/". The slash at the end is optional.
folder/*.txt # Ignore files with a .txt extension under "folder/".
path/to/filename.py # Ignore a specific file by providing its relative path.
file_[1,2].txt # Ignore file_1.txt and file_2.txt
The .anyscaleignore file supports a subset of patterns from .gitignore. However, some patterns, like negation and \ escaping, aren't supported. Additionally, *.txt only matches files at the same level as .anyscaleignore, which differs from .gitignore syntax. For further details, see the gitignore documentation.
Snapshot limits
- Location: Files on the head node file system outside the /home/ray/default directory aren't tracked and are lost after the workspace terminates.
- Git repository: Workspaces support the persistence of Git repositories and Git submodules, but not nested Git repositories. If you have nested Git repositories, the inner repositories aren't persisted across workspace restarts.
- Timeout: Snapshots are subject to a timeout of 4 minutes so that they don't block the workspace. The backup capacity is calculated based on this 4-minute timeout. If the snapshotting process exceeds the timeout, the snapshot is likely to be aborted.
- Capacity: The snapshot functionality supports backing up approximately 10 GB of data. If the data exceeds the 10 GB limit, there's a risk of data loss across workspace restarts. An error banner is displayed to warn you when this happens.
Storage Suggestions: For data exceeding the 10 GB limit, we strongly recommend using alternative storage such as NFS or an object storage service like S3. Refer to the storage documentation for a comparison of the storage options suited to different workspace needs.
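To see how close a project directory is to the capacity limit, you can total the file sizes yourself. The following is a minimal sketch rather than an Anyscale tool: the names directory_size_bytes, exceeds_snapshot_limit, and LIMIT_BYTES are hypothetical, the limit is assumed to mean 10 * 1024^3 bytes (the documentation only says "approximately 10 GB"), and files excluded by .gitignore or .anyscaleignore are still counted, so the result is an upper bound on what a snapshot would store.

```python
import os

# Assumption: "approximately 10 GB" is treated as 10 * 1024**3 bytes.
LIMIT_BYTES = 10 * 1024**3


def directory_size_bytes(root: str) -> int:
    """Sum the sizes of all regular files under root, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # don't count the target of a symbolic link
            try:
                total += os.path.getsize(path)
            except OSError:
                continue  # the file disappeared or is unreadable; skip it
    return total


def exceeds_snapshot_limit(root: str, limit_bytes: int = LIMIT_BYTES) -> bool:
    """Return True if the files under root add up to more than limit_bytes."""
    return directory_size_bytes(root) > limit_bytes
```

For example, calling exceeds_snapshot_limit(os.path.expanduser("~/default")) from a workspace terminal gives a rough indication of whether large files should be moved to NFS or object storage.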