Skip to content

feat: parallel loop running based on asyncio #932

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 48 commits into from
Jun 12, 2025
Merged

feat: parallel loop running based on asyncio #932

merged 48 commits into from
Jun 12, 2025

Conversation

you-n-g
Copy link
Contributor

@you-n-g you-n-g commented Jun 4, 2025

Description

Motivation and Context

How Has This Been Tested?

  • If you are adding a new feature, test on your own test scripts.

Screenshots of Test Results (if appropriate):

  1. Your own tests:

Types of changes

  • Fix bugs
  • Add new feature
  • Update documentation

📚 Documentation preview 📚: https://RDAgent--932.org.readthedocs.build/en/932/

@you-n-g you-n-g marked this pull request as ready for review June 6, 2025 15:09
@you-n-g you-n-g merged commit c63e207 into main Jun 12, 2025
9 checks passed
@you-n-g you-n-g deleted the multi-proc branch June 12, 2025 03:44
WinstonLiyt pushed a commit that referenced this pull request Jun 16, 2025
* refactor: split workflow into pkg, add WorkflowTracker & wait_retry

* feat: add async LoopBase with parallel workers and step semaphores

* fix: replace pickle with dill and run blocking tasks via joblib wrapper

* feat: add log format settings, dynamic parallelism & pickle-based snapshot

* fix: default step semaphore to 1 and avoid subprocess when single worker

* merge bowen's changes

* merge tim's changes

* refactor: extract component task mapping, add conditional logger setup

* lint

* refactor: add type hints and safer remain_time metric logging in workflow

* lint

* fix: allow BadRequestError to be pickled via custom copyreg reducer

* fix: stop loop when LoopTerminationError is raised in LoopBase

* lint

* refactor: make log tag context-local using ContextVar for thread safety

* feat: add subproc_step flag and helper to decide subprocess execution

* fix: use ./cache path and normalize relative volume bind paths

* fix: reset loop_idx to 0 on loop restart/resume to ensure correct flow

* fix: avoid chmod on cache and input dirs in Env timeout wrapper

* fix: skip chmod on 'cache' and 'input' dirs using find -prune

* fix: restrict chmod to immediate mount dirs excluding cache/input

* fix: chmod cache and input dirs alongside their contents after entry run

* fix: guard chmod with directory checks for cache and input

* fix: prefix mount_path in chmod command for cache/input dirs

* fix: drop quotes from find exclude patterns to ensure chmod executes

* fix: skip chmod on cache/input directories to avoid warning spam

* feat: support string volume mappings and poll subprocess stdout/stderr

* support remove symbolic link

* test: use dynamic home path and code volume in LocalEnv local_simple

* fix: skip trace and progress update when loop step is withdrawn

* refactor: add clean_workspace util and non-destructive workspace backup

* fix: preserve symlinks when backing up workspace with copytree

* fix: prevent AttributeError when _pbar not yet initialized in LoopBase

* perf: replace shutil.copytree with rsync for faster workspace backup

* fix: cast log directory Path to str in tar command of data science loop

* fix: use portable 'cp -r -P' instead of rsync for workspace backup

* fix: add retry and logging to workspace backup for robustness

* refactor: extract backup_folder helper and reuse in DataScienceRDLoop

* fix: propagate backup errors & default _pbar getattr to avoid error

* fix the division by zero bug

* refactor: execute RD loops via asyncio.run and add necessary imports

* lint

* lint

* lint

---------

Co-authored-by: Xu <[email protected]>
qew21 pushed a commit that referenced this pull request Jun 16, 2025
* refactor: split workflow into pkg, add WorkflowTracker & wait_retry

* feat: add async LoopBase with parallel workers and step semaphores

* fix: replace pickle with dill and run blocking tasks via joblib wrapper

* feat: add log format settings, dynamic parallelism & pickle-based snapshot

* fix: default step semaphore to 1 and avoid subprocess when single worker

* merge bowen's changes

* merge tim's changes

* refactor: extract component task mapping, add conditional logger setup

* lint

* refactor: add type hints and safer remain_time metric logging in workflow

* lint

* fix: allow BadRequestError to be pickled via custom copyreg reducer

* fix: stop loop when LoopTerminationError is raised in LoopBase

* lint

* refactor: make log tag context-local using ContextVar for thread safety

* feat: add subproc_step flag and helper to decide subprocess execution

* fix: use ./cache path and normalize relative volume bind paths

* fix: reset loop_idx to 0 on loop restart/resume to ensure correct flow

* fix: avoid chmod on cache and input dirs in Env timeout wrapper

* fix: skip chmod on 'cache' and 'input' dirs using find -prune

* fix: restrict chmod to immediate mount dirs excluding cache/input

* fix: chmod cache and input dirs alongside their contents after entry run

* fix: guard chmod with directory checks for cache and input

* fix: prefix mount_path in chmod command for cache/input dirs

* fix: drop quotes from find exclude patterns to ensure chmod executes

* fix: skip chmod on cache/input directories to avoid warning spam

* feat: support string volume mappings and poll subprocess stdout/stderr

* support remove symbolic link

* test: use dynamic home path and code volume in LocalEnv local_simple

* fix: skip trace and progress update when loop step is withdrawn

* refactor: add clean_workspace util and non-destructive workspace backup

* fix: preserve symlinks when backing up workspace with copytree

* fix: prevent AttributeError when _pbar not yet initialized in LoopBase

* perf: replace shutil.copytree with rsync for faster workspace backup

* fix: cast log directory Path to str in tar command of data science loop

* fix: use portable 'cp -r -P' instead of rsync for workspace backup

* fix: add retry and logging to workspace backup for robustness

* refactor: extract backup_folder helper and reuse in DataScienceRDLoop

* fix: propagate backup errors & default _pbar getattr to avoid error

* fix the division by zero bug

* refactor: execute RD loops via asyncio.run and add necessary imports

* lint

* lint

* lint

---------

Co-authored-by: Xu <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant