Files
Anyscale Workspaces persist files and folders within your project directory, /home/ray/default, across restarts. This capability maintains project continuity and facilitates seamless transitions between workspace sessions.
For performance reasons, Anyscale limits snapshots to 10 GB per workspace. You can define exclusion rules to ignore data that doesn't need to persist across sessions.
Concepts
- Periodic Snapshots: Workspaces automatically snapshot files and folders at regular intervals to preserve their state. Snapshots occur every 5 minutes.
- Persistence Rules: Files within the project directory persist across workspace restarts, excluding those specified in .gitignore or .anyscaleignore.
Excluding files with .gitignore
When creating workspace snapshots, Anyscale ignores files in .gitignore by default. If you want to persist these files across workspace restarts, set the environment variable ANYSCALE_DISABLE_GITIGNORE_EXCLUSION=1. Anyscale then relies only on .anyscaleignore for file exclusion.
Excluding files with .anyscaleignore
To exclude specific files or folders from workspace snapshots, create a file named .anyscaleignore in the project directory ~/default and specify the items you want to exclude. The .anyscaleignore file supports the following patterns to match files and folders:
# .anyscaleignore example
*.txt # Ignore files with a .txt extension at the same level as `.anyscaleignore`.
**/*.txt # Ignore files with a .txt extension in ANY directory.
folder/ # Ignore all files under "folder/". The slash at the end is optional.
folder/*.txt # Ignore files with a .txt extension under "folder/".
path/to/filename.py # Ignore a specific file by providing its relative path.
file_[1,2].txt # Ignore file_1.txt and file_2.txt.
The .anyscaleignore file supports a subset of patterns from .gitignore. However, some patterns, like negation (!) and backslash (\) escaping, aren't supported. Additionally, *.txt only matches files at the same level as .anyscaleignore, which differs from .gitignore syntax. For further details, see the gitignore documentation.
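The matching rules described above can be approximated in a few lines of Python. This is only an illustrative sketch, not Anyscale's implementation: the function names pattern_to_regex and is_excluded are hypothetical, and the sketch assumes that **/ also matches zero directories (as in .gitignore) and that a pattern naming a folder excludes everything beneath it.

```python
import re
from typing import Iterable


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate one ignore pattern into a regex (illustrative approximation only)."""
    pattern = pattern.rstrip("/")  # the trailing slash on a folder is optional
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            # Assumption: "**/" spans any number of directories, including none.
            parts.append("(?:.*/)?")
            i += 3
        elif pattern[i] == "*":
            # "*" stays within a single directory level.
            parts.append("[^/]*")
            i += 1
        elif pattern[i] == "[" and pattern.find("]", i + 2) != -1:
            # Character class such as [1,2]; backslashes are treated literally.
            end = pattern.index("]", i + 2)
            parts.append("[" + pattern[i + 1:end].replace("\\", "\\\\") + "]")
            i = end + 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    # A match on a folder also covers everything beneath it.
    return re.compile("^" + "".join(parts) + "(?:/.*)?$")


def is_excluded(relative_path: str, patterns: Iterable[str]) -> bool:
    """Return True if a path (relative to the project directory) matches any pattern."""
    return any(pattern_to_regex(p).match(relative_path) for p in patterns)


print(is_excluded("notes.txt", ["*.txt"]))          # True: same level as .anyscaleignore
print(is_excluded("docs/notes.txt", ["*.txt"]))     # False: "*" doesn't descend into docs/
print(is_excluded("docs/notes.txt", ["**/*.txt"]))  # True: "**/" matches any directory
```

The second call is where this behavior departs from .gitignore, in which a bare *.txt would also match docs/notes.txt.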
Snapshot limits
- Location: Files on the head node file system outside the /home/ray/default directory aren't tracked and are lost after the workspace terminates.
- Git repository: Workspaces support the persistence of Git repositories and Git submodules, but not nested Git repositories. If you have nested Git repositories, the inner repositories aren't persisted across workspace restarts.
- Timeout: Snapshots are subject to a timeout period of 4 minutes to ensure that workspaces aren't blocked. Anyscale calculates the backup capacity based on the 4-minute timeout period. If the snapshotting process exceeds the timeout period, Anyscale aborts the snapshot.
- Capacity: Snapshots can back up approximately 10 GB of data. If the data exceeds the 10 GB limit, you risk losing data across workspace restarts. Anyscale displays an error banner to alert you when this happens.
Storage Suggestions: For data exceeding the 10 GB limit, use alternative storage solutions such as Amazon S3. See Storage and file management for an in-depth exploration of different storage options.
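Because snapshots cover roughly 10 GB, it can help to check how much data the project directory currently holds. The sketch below is an illustrative helper, not an Anyscale API: the names directory_size_bytes and SNAPSHOT_LIMIT_BYTES are hypothetical, it treats 10 GB as 10 × 1024³ bytes (an assumption), and it doesn't apply .gitignore or .anyscaleignore exclusions, so it overestimates what a snapshot would actually contain.

```python
import os

# Assumption: "10 GB" is interpreted as 10 * 1024**3 bytes.
SNAPSHOT_LIMIT_BYTES = 10 * 1024**3


def directory_size_bytes(root: str) -> int:
    """Sum the sizes of all regular files under root, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue
            try:
                total += os.path.getsize(path)
            except OSError:
                pass  # the file vanished or is unreadable; skip it
    return total


if __name__ == "__main__":
    project_dir = "/home/ray/default"
    used = directory_size_bytes(project_dir)
    print(f"{used / 1024**3:.2f} GiB used in {project_dir}")
    if used > SNAPSHOT_LIMIT_BYTES:
        print("Over the snapshot limit: exclude files or move data to external storage.")
```

If the reported size approaches the limit, adding large data or cache folders to .anyscaleignore is usually the first thing to try.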