fix(cli): #1585 retry file copy on transient Windows file locks (EBUSY/EPERM) #1590
Conversation
Thanks @SiddarthaKarri, appreciate the PR, and apologies for the delay in reviewing this. Aside from the feedback provided by @KaiPrince, the thing I was most curious about from the original issue was the "why", so to speak. Is the issue that Windows can't handle so many file copies at once? For the project being built here in our CI, I don't think it's that many files:
copying file... .greenwood/manifest.json
copying file... /home/runner/work/greenwood/greenwood/node_modules/@webcomponents/webcomponentsjs/webcomponents-loader.js
copying directory... /home/runner/work/greenwood/greenwood/node_modules/@webcomponents/webcomponentsjs/bundles/
copying file... /home/runner/work/greenwood/greenwood/node_modules/lit/polyfill-support.js
copying directory... src/assets/
copying file... src/favicon.ico
copying file... /home/runner/work/greenwood/greenwood/node_modules/@webcomponents/webcomponentsjs/bundles/webcomponents-ce.js
copying file... /home/runner/work/greenwood/greenwood/node_modules/@webcomponents/webcomponentsjs/bundles/webcomponents-ce.js.map
copying file... /home/runner/work/greenwood/greenwood/node_modules/@webcomponents/webcomponentsjs/bundles/webcomponents-pf_dom.js
copying file... /home/runner/work/greenwood/greenwood/node_modules/@webcomponents/webcomponentsjs/bundles/webcomponents-pf_dom.js.map
copying file... /home/runner/work/greenwood/greenwood/node_modules/@webcomponents/webcomponentsjs/bundles/webcomponents-pf_js.js
copying file... /home/runner/work/greenwood/greenwood/node_modules/@webcomponents/webcomponentsjs/bundles/webcomponents-pf_js.js.map
copying file... /home/runner/work/greenwood/greenwood/node_modules/@webcomponents/webcomponentsjs/bundles/webcomponents-sd-ce-pf.js
copying file... /home/runner/work/greenwood/greenwood/node_modules/@webcomponents/webcomponentsjs/bundles/webcomponents-sd-ce-pf.js.map
copying file... /home/runner/work/greenwood/greenwood/node_modules/@webcomponents/webcomponentsjs/bundles/webcomponents-sd-ce.js
copying file... /home/runner/work/greenwood/greenwood/node_modules/@webcomponents/webcomponentsjs/bundles/webcomponents-sd-ce.js.map
copying file... /home/runner/work/greenwood/greenwood/node_modules/@webcomponents/webcomponentsjs/bundles/webcomponents-sd.js
copying file... /home/runner/work/greenwood/greenwood/node_modules/@webcomponents/webcomponentsjs/bundles/webcomponents-sd.js.map
copying file... src/assets/evergreen.svg
copying file... src/assets/getting-started-netlify-config.png
copying file... src/assets/getting-started-repo-styled.png
copying file... src/assets/getting-started-repo-unstyled-partial.png
copying file... src/assets/gh-pages-branch-commits.png
copying file... src/assets/gh-pages-branch.png
copying file... src/assets/graphql-playground.png
copying file... src/assets/greenwood-getting-started-repo-optimized.webp
copying file... src/assets/greenwood-logo-1000w.webp
copying file... src/assets/greenwood-logo-1500w.webp
copying file... src/assets/greenwood-logo-300w.webp
copying file... src/assets/greenwood-logo-500w.webp
copying file... src/assets/greenwood-logo-750w.webp
copying file... src/assets/greenwood-logo-og.png
copying file... src/assets/greenwood-starter-presentation.png
copying file... src/assets/link.png
copying file... src/assets/netlify-admin.png
copying file... src/assets/netlify-cms.jpg
copying file... src/assets/netlify-create-new.png
copying file... src/assets/netlify-deploy.png
copying file... src/assets/netlify-git-gateway.png
copying file... src/assets/netlify-invite.png
copying file... src/assets/netlify-registration.png
copying file... src/assets/netlify-workflow.png
copying file... src/assets/nodejs.png
copying file... src/assets/repo-github-pages-config.png
copying file... src/assets/serverless.webp
copying file... src/assets/simple.png
copying file... src/assets/web-components-browser-support.png
copying file... src/assets/webcomponents.jpg
copying file... src/assets/fonts/source-sans-pro-v13-latin-regular.ttf
copying file... src/assets/fonts/source-sans-pro-v13-latin-regular.woff
copying file... src/assets/fonts/source-sans-pro-v13-latin-regular.woff2
copying file... src/assets/fonts/source-sans-pro.css
copying file... src/assets/blog-images/dev-cache-step1.png
copying file... src/assets/blog-images/dev-cache-step2.png
copying file... src/assets/blog-images/dev-cache-step3.png
copying file... src/assets/blog-images/dev-cache-step4.png
copying file... src/assets/blog-images/full-stack-web-components.webp
copying file... src/assets/blog-images/hud.png
copying file... src/assets/blog-images/init-scaffolding.png
copying file... src/assets/blog-images/wcc-logo.png
copying file... src/assets/blog-images/not-found.png
copying file... src/assets/blog-images/ssr.webp
So what limit / threshold are we hitting, and is it a Node issue or a Windows issue? (Docs / links of others having a similar issue would be great here!) I ask because it will help give me context on whether the solution here is an appropriate one for the given problem. I understand the PR is implementing some retry logic, but is there a way to avoid the original issue altogether, maybe by copying fewer files at once or something else? Maybe just making copying synchronous (again)? 🤔 Thanks!
Thanks for the feedback. I pushed an update: I removed the busy-wait, restored init’s sync copy, and added bounded concurrency to the copy lifecycle (files=8, locations=4, plugins=2) while keeping an async retry wrapper for transient EBUSY/EPERM errors. Could someone re-run the Windows CI for this branch? If flakes persist I’ll add verbose retry logging or convert the CSS bundler to async next.
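For readers following along, here is a minimal sketch of what "bounded concurrency" can mean for a batch of copy operations. It is not the code from this PR: the helper name `mapWithLimit` and its signature are hypothetical, and only the limit value of 8 is taken from the comment above.

```javascript
// Hypothetical helper (not from the PR): run `worker` over `items`
// with at most `limit` calls in flight at any time.
async function mapWithLimit(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;

  // Each runner claims the next unprocessed index until none are left.
  async function runner() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  // Start at most `limit` runners; they share the `next` cursor.
  const runnerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: runnerCount }, runner));

  // Results keep the input order regardless of completion order.
  return results;
}
```

With files=8, a call might look like `await mapWithLimit(files, 8, copyOne)`, where `copyOne` stands for whatever per-file copy function is in use. If any worker rejects, the returned promise rejects with that error.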
For debugging CI, I recommend opening a dummy PR in your fork so you can push commits and run CI as much as you need.
Just to gently reiterate, I would still like to get more context / information on the underlying root cause before getting too caught up in the implementation / running CI.
… to reduce intermittent EBUSY on Windows CI
…BUSY/EPERM flakes (issue ProjectEvergreen#1585)
Unless I have missed it somewhere in the comments, I still don't see any insight into why the issue may be happening in the first place, which I would still like to get a better understanding of first before going back and forth on solutions here, since without a firm understanding of why, I think any solution would just be in "spirit" only.
To add some food for thought, one thing I would like to know is why we are getting this on a copy operation. Is it because we copy files async, so they all copy "out of order", and when each invocation tries to make its output directory, that's what causes the issue?
Would making it sync (like it was before) be a solution? That said, I think this issue pre-dates that refactor though.
So yeah, until we can establish a better understanding of the root cause of this issue, I'm going to put this PR into draft.
Thanks again for getting things started, let us know if you have any questions and always happy to help guide the conversation. (feel free to join the Discord!)
after(async function () {
  runner.stopCommand();
  runner.teardown([initOutputPath]);
  // give the process a moment to release file handles (helps on Windows)
I don't think we want this; it should be taken care of by #1573, which includes a fix to the test runner around teardown exit conditions.
Related Issue
Resolves #1585
Summary of Changes
This PR reduces intermittent Windows CI failures caused by transient file-lock/resource-locked errors during file copy operations (EBUSY / EPERM / EACCES) by adding retry logic to the primary copy code paths.
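As an illustration of the retry approach described above, here is a minimal sketch of an async copy wrapper. It is not the PR's implementation. It assumes `attempts` means the total number of tries and `baseDelay` is a delay in milliseconds that doubles after each failed try; the default values and the CommonJS `require` are illustrative choices.

```javascript
const { copyFile } = require("node:fs/promises");

// Error codes the summary above treats as transient lock failures.
const TRANSIENT_CODES = new Set(["EBUSY", "EPERM", "EACCES"]);

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Assumed semantics (not confirmed by the PR): `attempts` is the total
// number of tries; `baseDelay` (ms) doubles after each failed try.
async function copyFileWithRetry(source, target, { attempts = 5, baseDelay = 50 } = {}) {
  for (let attempt = 1; ; attempt++) {
    try {
      return await copyFile(source, target);
    } catch (error) {
      const transient = TRANSIENT_CODES.has(error.code);
      // Non-transient errors (e.g. ENOENT) and exhausted retries propagate as-is.
      if (!transient || attempt >= attempts) {
        throw error;
      }
      await sleep(baseDelay * 2 ** (attempt - 1));
    }
  }
}
```

A wrapper like this only masks the symptom: if something holds the lock for longer than the total backoff window, the copy still fails with the original error.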
What I changed
- copyFileWithRetry(source, target, { attempts, baseDelay })
- copyFileWithRetry for file copies in the recursive copy lifecycle; improved copy failure logging.
- copyFileWithRetry when copying resolved assets into the output (call is non-blocking from the synchronous bundler; errors are logged).
- copyFileSyncWithRetry to retry synchronous template copy operations (keeps copyTemplate sync).
Checklist
Files changed
copyFileSyncWithRetry and used it
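For the synchronous variant, one way to wait between tries without a busy-wait loop is `Atomics.wait` on a throwaway `SharedArrayBuffer`, which blocks the thread without spinning the CPU. The sketch below is an assumption about how such a helper could be written, not the PR's code; the `sleepSync` helper, the linear backoff, and the defaults are all hypothetical.

```javascript
const { copyFileSync } = require("node:fs");

const TRANSIENT = new Set(["EBUSY", "EPERM", "EACCES"]);

// Hypothetical helper: block the current thread for roughly `ms`
// milliseconds. Atomics.wait times out because nothing notifies the buffer.
function sleepSync(ms) {
  Atomics.wait(new Int32Array(new SharedArrayBuffer(4)), 0, 0, ms);
}

// Assumed semantics (not confirmed by the PR): `attempts` is the total
// number of tries; the delay grows linearly with the attempt number.
function copyFileSyncWithRetry(source, target, { attempts = 5, baseDelay = 50 } = {}) {
  for (let attempt = 1; ; attempt++) {
    try {
      copyFileSync(source, target);
      return;
    } catch (error) {
      // Only retry lock-style errors, and only while tries remain.
      if (!TRANSIENT.has(error.code) || attempt >= attempts) {
        throw error;
      }
      sleepSync(baseDelay * attempt);
    }
  }
}
```

Blocking the thread is acceptable here only because the surrounding template copy is already synchronous; in an async code path the promise-based wrapper is the better fit.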