@djeebus
Contributor

@djeebus djeebus commented Aug 4, 2025

Early tests show that this can reduce the startup time for sandboxes by ~60% if the sandbox is starting on a fresh node.

Notes:

  • This stores only template slabs in NFS, to limit disk usage and blast radius.
  • The LaunchDarkly flags use-nfs-for-snapshots and use-nfs-for-templates control whether slabs are written to NFS for snapshots and templates, respectively.
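
As a rough illustration of the idea in the notes above, here is a minimal read-through cache sketch. It is not the code in this PR: CachedStore, Fetcher, WriteCache and writeAtomic are hypothetical names, a temp directory stands in for the NFS mount, and a plain function stands in for the flag lookup.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// Fetcher is a hypothetical stand-in for a read from remote object storage.
type Fetcher func(key string) ([]byte, error)

// CachedStore reads through a shared directory (standing in for the NFS mount).
type CachedStore struct {
	Dir        string
	Remote     Fetcher
	WriteCache func() bool // hypothetical feature-flag lookup
}

// Get returns the cached copy if present; otherwise it fetches from Remote
// and, when the flag allows it, stores a copy for the next reader.
func (c *CachedStore) Get(key string) ([]byte, error) {
	path := filepath.Join(c.Dir, key)
	data, err := os.ReadFile(path)
	if err == nil {
		return data, nil
	}
	if !errors.Is(err, os.ErrNotExist) {
		return nil, fmt.Errorf("read cache %q: %w", path, err)
	}
	data, err = c.Remote(key)
	if err != nil {
		return nil, fmt.Errorf("fetch %q: %w", key, err)
	}
	if c.WriteCache() {
		// Cache writes are best-effort: a failure must not fail the read.
		if werr := writeAtomic(path, data); werr != nil {
			fmt.Fprintln(os.Stderr, "cache write failed:", werr)
		}
	}
	return data, nil
}

// writeAtomic writes to a temp file in the target directory and renames it
// into place, so readers should not observe a partially written file.
func writeAtomic(path string, data []byte) error {
	dir := filepath.Dir(path)
	if err := os.MkdirAll(dir, 0o755); err != nil {
		return err
	}
	tmp, err := os.CreateTemp(dir, ".tmp-*")
	if err != nil {
		return err
	}
	defer os.Remove(tmp.Name()) // fails harmlessly after a successful rename
	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), path)
}

// demo fetches the same key twice and reports how often the remote was hit.
func demo(flagOn bool) (int, error) {
	dir, err := os.MkdirTemp("", "slab-cache-")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(dir)

	calls := 0
	store := &CachedStore{
		Dir: dir,
		Remote: func(string) ([]byte, error) {
			calls++
			return []byte("slab"), nil
		},
		WriteCache: func() bool { return flagOn },
	}
	for i := 0; i < 2; i++ {
		if _, err := store.Get("template/slab-0"); err != nil {
			return calls, err
		}
	}
	return calls, nil
}

func main() {
	on, _ := demo(true)
	off, _ := demo(false)
	fmt.Println("remote calls with flag on:", on, "off:", off)
}
```

With the flag on, the second read is served from the cache directory; with it off, both reads go to the remote.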

Remaining tasks:

  • Bring back terraform, make sure it is production ready
  • Remove unused files from the NFS mount when appropriate (in progress)
  • Monitor usage
  • Protect against short writes to GCP Object Storage during batched writes
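
For the short-write item, one possible shape of the check is sketched below. This is an assumption about the approach, not what the PR does: writeBatches and truncatingWriter are hypothetical, and a plain io.Writer stands in for whatever writer the object storage client exposes.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// writeBatches writes each batch to w and fails as soon as the writer
// reports fewer bytes than it was given, even if it returned no error.
func writeBatches(w io.Writer, batches [][]byte) (int64, error) {
	var total int64
	for i, b := range batches {
		n, err := w.Write(b)
		total += int64(n)
		if err != nil {
			return total, fmt.Errorf("batch %d: %w", i, err)
		}
		if n != len(b) {
			return total, fmt.Errorf("batch %d: wrote %d of %d bytes: %w",
				i, n, len(b), io.ErrShortWrite)
		}
	}
	return total, nil
}

// truncatingWriter is a test double that accepts at most limit bytes per
// call and reports no error, which is how a silent short write looks.
type truncatingWriter struct {
	limit   int
	written int
}

func (t *truncatingWriter) Write(p []byte) (int, error) {
	n := len(p)
	if n > t.limit {
		n = t.limit
	}
	t.written += n
	return n, nil
}

// shortWriteDetected reports whether writeBatches flags the truncation.
func shortWriteDetected() bool {
	w := &truncatingWriter{limit: 4}
	_, err := writeBatches(w, [][]byte{[]byte("abcd"), []byte("efghij")})
	return errors.Is(err, io.ErrShortWrite)
}

func main() {
	fmt.Println("short write detected:", shortWriteDetected())
}
```

Returning the running total alongside the error lets a caller decide whether to retry from an offset or discard the object.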

Other items worth noting:

  1. The "build-packages" GitHub Action now calculates its cache key from go.work and go.work.sum, since together they cover all dependencies and this is simpler than recursively searching the whole repository.
  2. There is a new tfvar called "use_filestore_cache" that enables the creation of the filestore (defaults to false).
  3. make plan-only-jobs no longer errors if a change needs to be made, which allows make plan-only-jobs && make apply.
  4. make test now tests every package in the go.work file, rather than requiring them to be listed out one at a time.
  5. make lint can now be run to lint all the packages.
  6. I fixed a race condition in lifecycle_cache_test.go, revealed by running the tests with -race.
  7. I refactored the boolean flags to combine key and default into a single struct. If we like it, we can refactor the int flags similarly.
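
To illustrate item 7, a flag struct that carries its key and default together might look roughly like this. The type and function names (BoolFlag, Source, Bool, mapSource) are hypothetical, the false fallbacks are an assumption, and Source is a stand-in interface rather than the LaunchDarkly SDK; only the two flag keys come from the description above.

```go
package main

import "fmt"

// BoolFlag keeps a flag's key and its fallback value together, so call
// sites cannot pair a key with the wrong default.
type BoolFlag struct {
	Key      string
	Fallback bool
}

// The keys come from the PR description; the false fallbacks are an assumption.
var (
	UseNFSForSnapshots = BoolFlag{Key: "use-nfs-for-snapshots", Fallback: false}
	UseNFSForTemplates = BoolFlag{Key: "use-nfs-for-templates", Fallback: false}
)

// Source is a hypothetical flag lookup; ok is false when the flag is
// unknown or the flag service cannot be reached.
type Source interface {
	Lookup(key string) (value bool, ok bool)
}

// Bool resolves a flag, falling back to the flag's own default.
func Bool(src Source, f BoolFlag) bool {
	if v, ok := src.Lookup(f.Key); ok {
		return v
	}
	return f.Fallback
}

// mapSource is an in-memory Source for tests and local runs.
type mapSource map[string]bool

func (m mapSource) Lookup(key string) (bool, bool) {
	v, ok := m[key]
	return v, ok
}

func main() {
	src := mapSource{"use-nfs-for-templates": true}
	fmt.Println(Bool(src, UseNFSForTemplates)) // set in the source
	fmt.Println(Bool(src, UseNFSForSnapshots)) // missing, so the fallback applies
}
```

An int variant would follow the same pattern with an int Fallback field.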

@ValentaTomas ValentaTomas mentioned this pull request Aug 4, 2025
4 tasks
@ValentaTomas ValentaTomas added the improvement Improvement for current functionality label Aug 7, 2025
@sitole sitole self-requested a review August 8, 2025 15:02
djeebus and others added 7 commits August 18, 2025 12:44
# Conflicts:
#	packages/nomad/main.tf
#	packages/nomad/orchestrator.hcl
#	packages/nomad/template-manager.hcl
#	packages/orchestrator/internal/template/build/hash_index.go
@ValentaTomas
Member

Also, I'm not sure about https://github.com/e2b-dev/infra/blob/nfs-file-cache/packages/orchestrator/internal/server/sandboxes.go#L423: will the snapshot upload use the cached persistence, and therefore perform the mentioned writes?

@djeebus
Contributor Author

djeebus commented Aug 22, 2025

> Also, I'm not sure about https://github.com/e2b-dev/infra/blob/nfs-file-cache/packages/orchestrator/internal/server/sandboxes.go#L423: will the snapshot upload use the cached persistence, and therefore perform the mentioned writes?

No, it's only used in the GetTemplate function, which is called on Create (line 52 in the same file). The wrapped persistence value only gets used inside GetTemplate and in the Template instance, which isn't used to upload the snapshot.

djeebus and others added 2 commits August 22, 2025 13:29
…-cache

# Conflicts:
#	packages/orchestrator/internal/sandbox/template/cache.go
#	packages/orchestrator/internal/server/sandboxes.go
#	packages/orchestrator/internal/template/build/layer/layer_executor.go
@ValentaTomas
Member

ValentaTomas commented Aug 22, 2025

> > Also, I'm not sure about https://github.com/e2b-dev/infra/blob/nfs-file-cache/packages/orchestrator/internal/server/sandboxes.go#L423: will the snapshot upload use the cached persistence, and therefore perform the mentioned writes?
>
> No, it's only used in the GetTemplate function, which is called on Create (line 52 in the same file). The wrapped persistence value only gets used inside GetTemplate and in the Template instance, which isn't used to upload the snapshot.

https://github.com/e2b-dev/infra/pull/955/files/f159ab38864c36203da88527d38221bca8f07c36#diff-39e76d47dde3cef11269255fc539b2b574991b5e59265204deb19d94235ca571L75

Isn't this also using the GetTemplate?

@djeebus
Contributor Author

djeebus commented Aug 22, 2025

Yup, it does. The surrounding code seems to use it to pull a layer from GCS, which may cause it to write some cache files to NFS. It might also read from NFS, which could speed up read operations, though I'm not sure how likely that scenario is. It doesn't look like it ever uses the persistence field of the storageTemplate struct for anything other than reads from GCS. I also ran e2b template build locally and verified via metrics that it doesn't cause any writes to NFS.
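
For anyone who wants to repeat that kind of check in a unit test rather than through metrics, a counting wrapper is one option. The sketch below is hypothetical throughout: Cache, memCache, countingCache, readOnlyPath and measure are illustrative names, and plain atomic counters stand in for the real metrics pipeline.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Cache is a hypothetical minimal interface for the NFS-backed cache.
type Cache interface {
	Read(key string) ([]byte, bool)
	Write(key string, data []byte)
}

// memCache is an in-memory Cache for the demo; it is not safe for
// concurrent use.
type memCache map[string][]byte

func (m memCache) Read(key string) ([]byte, bool) {
	v, ok := m[key]
	return v, ok
}

func (m memCache) Write(key string, data []byte) {
	m[key] = data
}

// countingCache wraps a Cache and counts every read and write it forwards.
type countingCache struct {
	inner  Cache
	reads  atomic.Int64
	writes atomic.Int64
}

func (c *countingCache) Read(key string) ([]byte, bool) {
	c.reads.Add(1)
	return c.inner.Read(key)
}

func (c *countingCache) Write(key string, data []byte) {
	c.writes.Add(1)
	c.inner.Write(key, data)
}

// readOnlyPath stands in for a code path (such as a build) that is
// expected to read through the cache without ever writing to it.
func readOnlyPath(c Cache) {
	c.Read("layer-0")
	c.Read("layer-1")
}

// measure runs fn against a fresh counting cache and returns the counters.
func measure(fn func(Cache)) (reads, writes int64) {
	c := &countingCache{inner: memCache{}}
	fn(c)
	return c.reads.Load(), c.writes.Load()
}

func main() {
	r, w := measure(readOnlyPath)
	fmt.Printf("reads=%d writes=%d\n", r, w)
}
```

A test can then assert that the write counter stays at zero for paths that should never populate the cache.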

@ValentaTomas ValentaTomas merged commit 2c7f69c into main Aug 24, 2025
25 checks passed
@ValentaTomas ValentaTomas deleted the nfs-file-cache branch August 24, 2025 23:26