
Convert app import/export to be fully streamed end to end. #16653


Merged: 25 commits, merged Aug 7, 2025


samwho (Contributor) commented Jul 30, 2025

Description

We see very erratic memory and CPU usage in production. I spent time combing through the data and found a correlation between the larger spikes and requests to export or import an app. Looking at the code, I think exports can end up being built entirely in memory before they are sent.

This PR rewrites the import/export subsystems to be streamed end to end, in the hope that this reduces the memory spikes. I suspect the CPU spikes may persist, because compression is inherently CPU-intensive. That said, streaming directly to the response means the stream is throttled by the client's receiving bandwidth, so the CPU load could be lower.
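A minimal sketch of the bandwidth-throttling argument, assuming only Node's built-in `stream` module: the producer is a lazy async generator, a slow `Writable` stands in for a bandwidth-limited client, and `pipeline` from `stream/promises` stops pulling from the producer whenever the consumer's buffer is full. The function name `streamToSlowClient` and the chunk sizes are illustrative, not taken from the PR.

```typescript
import { Readable, Writable } from "node:stream";
import { pipeline } from "node:stream/promises";

// Illustrative only: pushes `total` chunks through a pipeline into a slow consumer
// and reports the largest number of chunks produced but not yet consumed.
async function streamToSlowClient(
  total: number,
  chunkSize = 1024,
): Promise<{ received: number; maxAhead: number }> {
  let produced = 0;
  let consumed = 0;
  let maxAhead = 0;

  // Lazy producer: a chunk is only allocated when the pipeline asks for more data.
  async function* produce(): AsyncGenerator<Buffer> {
    for (let i = 0; i < total; i++) {
      produced++;
      maxAhead = Math.max(maxAhead, produced - consumed);
      yield Buffer.alloc(chunkSize, i % 256);
    }
  }

  // Stand-in for a bandwidth-limited HTTP response: each write is acknowledged
  // on a later turn of the event loop, and only ~4 chunks are buffered.
  const slowClient = new Writable({
    highWaterMark: 4 * chunkSize,
    write(
      _chunk: Buffer,
      _encoding: BufferEncoding,
      callback: (error?: Error | null) => void,
    ): void {
      consumed++;
      setImmediate(callback);
    },
  });

  await pipeline(Readable.from(produce()), slowClient);
  return { received: consumed, maxAhead };
}
```

With `total = 1000`, `maxAhead` stays bounded by the two streams' buffer sizes rather than growing with the export size, because the readable side stops pulling from the generator while the consumer is backed up.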


samwho and others added 3 commits July 29, 2025 17:38
- Replace tar package (6.2.1) with tar-stream (3.1.7) for memory-efficient processing
- Upgrade tar-fs from 2.1.2 to 2.1.3 to fix security vulnerability
- Implement streaming tar creation/extraction to prevent memory buffering
- Update backup exports, imports, and CLI to use streaming TAR operations
- Resolves 100% CPU usage and linear memory growth during backup operations

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
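The commit above relies on the `tar-stream` package. To keep the example dependency-free, the sketch below swaps in a hand-rolled POSIX ustar writer built on an async generator; it is not `tar-stream`'s API. It illustrates why a tar archive can be streamed: each entry is a 512-byte header (which needs the file size up front), the body bytes, zero padding to a 512-byte boundary, and two zero blocks at the very end. `TarEntry`, `tarHeader` and `tarStream` are hypothetical names, and long names (over 100 bytes), directories and pax extended headers are not handled.

```typescript
const BLOCK = 512;

// Hypothetical entry shape: the size must be known before the body is read,
// because it is written into the header that precedes the body.
interface TarEntry {
  name: string;
  size: number;
  body: AsyncIterable<Buffer> | Iterable<Buffer>;
}

// Zero-padded octal digits followed by a NUL, filling exactly `width` bytes.
function octal(value: number, width: number): string {
  return value.toString(8).padStart(width - 1, "0") + "\0";
}

function tarHeader(name: string, size: number, mtime: number): Buffer {
  if (Buffer.byteLength(name) > 100) throw new Error(`name too long: ${name}`);
  if (size > 0o77777777777) throw new Error(`entry too large: ${name}`);
  const header = Buffer.alloc(BLOCK); // zero-filled
  header.write(name, 0, 100, "utf8");
  header.write(octal(0o644, 8), 100, 8, "ascii"); // mode
  header.write(octal(0, 8), 108, 8, "ascii"); // uid
  header.write(octal(0, 8), 116, 8, "ascii"); // gid
  header.write(octal(size, 12), 124, 12, "ascii"); // size in bytes
  header.write(octal(mtime, 12), 136, 12, "ascii"); // mtime in seconds
  header.write(" ".repeat(8), 148, 8, "ascii"); // checksum field counts as spaces
  header.write("0", 156, 1, "ascii"); // typeflag '0' = regular file
  header.write("ustar\0", 257, 6, "ascii"); // ustar magic
  header.write("00", 263, 2, "ascii"); // ustar version
  let sum = 0;
  for (let i = 0; i < BLOCK; i++) sum += header.readUInt8(i);
  header.write(octal(sum, 7) + " ", 148, 8, "ascii"); // 6 digits, NUL, space
  return header;
}

// Yields the archive piece by piece; no entry body is ever buffered in full.
async function* tarStream(
  entries: AsyncIterable<TarEntry> | Iterable<TarEntry>,
  mtime = Math.floor(Date.now() / 1000),
): AsyncGenerator<Buffer> {
  for await (const entry of entries) {
    yield tarHeader(entry.name, entry.size, mtime);
    let written = 0;
    for await (const chunk of entry.body) {
      written += chunk.length;
      yield chunk;
    }
    // Too late to undo the emitted bytes, but this fails the stream loudly.
    if (written !== entry.size) {
      throw new Error(`${entry.name}: expected ${entry.size} bytes, got ${written}`);
    }
    const padding = (BLOCK - (entry.size % BLOCK)) % BLOCK;
    if (padding > 0) yield Buffer.alloc(padding);
  }
  yield Buffer.alloc(2 * BLOCK); // end-of-archive marker
}
```

The cost of streaming is visible in `TarEntry`: the header needs the size before the body, so each file's length has to come from metadata (for example an object-store listing) rather than from reading the file first.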
- Implement streamExportAppDirect() that streams tar directly to HTTP response
- Eliminates intermediate tar file creation reducing disk I/O and memory usage
- Uses tar-stream → gzip → PassThrough pipeline for efficient streaming
- Update backup controller to use direct streaming method
- Fix TypeScript types: ReadStream → NodeJS.ReadableStream for broader compatibility

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
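The commit above describes a tar-stream → gzip → `PassThrough` pipeline written straight to the HTTP response. A minimal sketch of the last two stages, using only Node built-ins and assuming the archive is already available as an async iterable of tar bytes; `gzipToPassThrough` is a hypothetical name, not the PR's `streamExportAppDirect()`.

```typescript
import { PassThrough, Readable, pipeline } from "node:stream";
import * as zlib from "node:zlib";

// Hypothetical helper: returns a stream that can be handed to the HTTP response
// immediately, while the archive bytes are still being produced and compressed.
function gzipToPassThrough(archive: AsyncIterable<Buffer>): PassThrough {
  const out = new PassThrough();
  // pipeline() wires up backpressure between the stages and, on failure,
  // destroys every stream in the chain, so the error surfaces on `out`
  // instead of leaving the response hanging half-written.
  pipeline(Readable.from(archive), zlib.createGzip(), out, (err) => {
    if (err) console.error("export stream failed:", err.message);
  });
  return out;
}
```

Returning the `PassThrough` right away lets the controller start sending bytes before the export has finished. The caller should still attach an `error` listener (or rely on the web framework's stream handling) so that a failed export aborts the response rather than crashing the process.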

qa-wolf bot commented Jul 30, 2025

QA Wolf here! As you write new code it's important that your test coverage is keeping up.

github-actions bot added the firestorm (Data/Infra/Revenue Team) and size/m labels Jul 30, 2025
samwho and others added 3 commits July 30, 2025 16:01
- Add streaming encryption transform using AES-256-CTR cipher
- Replicate existing encryption logic but as stream transforms
- Support both encrypted and unencrypted streaming exports
- Stream files directly from object store to tar without temp files
- Stream database export directly without intermediate files
- Maintain full backward compatibility with existing encryption format
- Eliminate all temporary file usage during export process

This achieves the goal of fully streaming exports with no disk I/O for temp files while maintaining encryption compatibility.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
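A minimal sketch of encryption as a streaming stage, using Node's `crypto` module with AES-256-CTR as the commit describes. Because CTR is a stream-cipher mode, `cipher.update()` returns as many bytes as it is given, so chunks can be encrypted as they arrive without buffering the whole archive. The header layout (16-byte salt, then 16-byte IV), the use of `scryptSync` for key derivation, and the names `encryptChunks`/`decryptChunks` are assumptions for illustration; they are not the app's existing encryption format, which the commit says was preserved.

```typescript
import { createCipheriv, createDecipheriv, randomBytes, scryptSync } from "node:crypto";

const ALGORITHM = "aes-256-ctr";
const SALT_LEN = 16;
const IV_LEN = 16;
const HEADER_LEN = SALT_LEN + IV_LEN;

// Assumed format: [salt (16 bytes)][IV (16 bytes)][ciphertext]. Illustrative only.
async function* encryptChunks(source: AsyncIterable<Buffer>, secret: string): AsyncGenerator<Buffer> {
  const salt = randomBytes(SALT_LEN);
  const iv = randomBytes(IV_LEN);
  const cipher = createCipheriv(ALGORITHM, scryptSync(secret, salt, 32), iv);
  yield Buffer.concat([salt, iv]);
  for await (const chunk of source) {
    const encrypted = cipher.update(chunk);
    if (encrypted.length > 0) yield encrypted;
  }
  const tail = cipher.final();
  if (tail.length > 0) yield tail;
}

async function* decryptChunks(source: AsyncIterable<Buffer>, secret: string): AsyncGenerator<Buffer> {
  const iterator = source[Symbol.asyncIterator]();
  // The header may be split across several chunks, so collect it first.
  let pending = Buffer.alloc(0);
  while (pending.length < HEADER_LEN) {
    const result = await iterator.next();
    if (result.done) throw new Error("input ended before the header was complete");
    pending = Buffer.concat([pending, result.value]);
  }
  const salt = pending.subarray(0, SALT_LEN);
  const iv = pending.subarray(SALT_LEN, HEADER_LEN);
  const decipher = createDecipheriv(ALGORITHM, scryptSync(secret, salt, 32), iv);
  // Any ciphertext that arrived in the same chunk as the header.
  const first = decipher.update(pending.subarray(HEADER_LEN));
  if (first.length > 0) yield first;
  // Continue with the remaining chunks from the same iterator.
  while (true) {
    const result = await iterator.next();
    if (result.done) break;
    const decrypted = decipher.update(result.value);
    if (decrypted.length > 0) yield decrypted;
  }
  const tail = decipher.final();
  if (tail.length > 0) yield tail;
}
```

Both functions are async generators, so they can be dropped into `stream.pipeline()` as transform stages by wrapping them, e.g. `(source) => encryptChunks(source, secret)`. Note that CTR mode on its own provides no integrity check.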
samwho changed the title from "Tar cpu" to "Convert app import/export to be fully streamed end to end." Aug 6, 2025
samwho added the feature-branch label (Release this PR code into a feature branch) Aug 6, 2025
samwho marked this pull request as ready for review August 7, 2025 09:15
samwho requested a review from a team as a code owner August 7, 2025 09:15
samwho requested review from mike12345567 and removed request for a team August 7, 2025 09:15
}

try {
await objectStore.streamUpload({
Collaborator


Previously, the imports were handled in parallel via `promise.push`. Realistically that could kick off too many uploads at once anyway, but should we do something similar to the export and add a bounded level of parallelism, so that large imports aren't very slow?
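One way to get the bounded parallelism suggested here is a small worker-pool helper: a fixed number of runners pull the next item from a shared index, so at most `limit` calls are in flight at once. This is a hypothetical sketch; `mapWithConcurrency` is not from the PR or from a library.

```typescript
// Hypothetical helper: maps `worker` over `items` with at most `limit` calls
// in flight at once, returning results in input order.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  const results = new Array<R>(items.length);
  let nextIndex = 0;

  // Each runner claims the next unclaimed index; this needs no locking because
  // the claim step is synchronous and JavaScript runs it on a single thread.
  async function runner(): Promise<void> {
    while (nextIndex < items.length) {
      const index = nextIndex++;
      results[index] = await worker(items[index] as T, index);
    }
  }

  const runnerCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results;
}
```

The import loop could then wrap its existing `objectStore.streamUpload(...)` call (whose arguments aren't shown in this excerpt) in the `worker` callback with a small limit such as 5. If one upload rejects, `Promise.all` rejects immediately, but the other runners keep working through the remaining items, so a fail-fast variant would need an extra "stop" flag checked in the loop.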

samwho enabled auto-merge August 7, 2025 14:11
samwho merged commit e8d9a59 into master Aug 7, 2025
44 of 49 checks passed
samwho deleted the tar-cpu branch August 7, 2025 14:29
github-actions bot locked and limited conversation to collaborators Aug 7, 2025
mike12345567 restored the tar-cpu branch August 7, 2025 15:35
Labels
feature-branch (Release this PR code into a feature branch), firestorm (Data/Infra/Revenue Team), size/l, size/m, size/xl

2 participants