Skip to content

Conversation

jianjunzhong
Copy link

What does this PR do?

Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review.

Checklist Before Starting

  • Search for similar PRs. Paste at least one query link here: ...
  • Format the PR title as [{modules}] {type}: {description} (This will be checked by the CI)
    • {modules} include fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data
    • If this PR involves multiple modules, separate them with , like [megatron, fsdp, doc]
    • {type} is in feat, fix, refactor, chore, test
    • If this PR breaks any API (CLI arguments, config, function signature, etc.), add [BREAKING] to the beginning of the title.
    • Example: [BREAKING][fsdp, megatron] feat: dynamic batching

Test

For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc.

API and Usage Example

Demonstrate how the API changes if any, and provide usage example(s) if possible.

# Add code snippet or script demonstrating how to use this

Design & Code Changes

Demonstrate the high-level design if this PR is complex, and list the specific changes.

Checklist Before Submitting

Important

Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

@0oshowero0 0oshowero0 requested a review from Copilot September 26, 2025 01:42
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds core metadata structures for handling data batches and fields in the transfer queue system, along with terminology consistency improvements and comprehensive test coverage for storage functionality.

  • Introduces metadata.py with data structures for managing field, sample, and batch metadata
  • Updates variable naming from "per_tensor" to "per_field" for clarity and consistency across storage and controller components
  • Adds extensive test coverage for the simple storage unit with various data types and client scenarios

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
verl/experimental/transfer_queue/metadata.py New file containing metadata classes (FieldMeta, SampleMeta, BatchMeta, StorageMetaGroup) for structured data handling
verl/experimental/transfer_queue/storage.py Updates variable names and comments from "per_tensor" to "per_field" terminology
verl/experimental/transfer_queue/controller.py Updates method names and variable types to use "per_field" terminology consistently
verl/experimental/transfer_queue/client.py Updates property access from deprecated fields to field_names
tests/experimental/transfer_queue/test_simple_storage_unit.py New comprehensive test suite covering storage operations with various data types and client scenarios

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

object.__setattr__(self, "_local_indexes", [sample.local_index for sample in self.samples])
object.__setattr__(self, "_storage_ids", [sample.storage_id for sample in self.samples])

# assume all samples have the same fields.
Copy link
Preview

Copilot AI Sep 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assumption should be documented more prominently, possibly as a class docstring note or method parameter documentation, as it's a critical constraint for the BatchMeta functionality.

Copilot uses AI. Check for mistakes.

Comment on lines +330 to +331
log_probs_tensor = torch.randn(32768)
rewards_tensor = torch.randn(32768)
Copy link
Preview

Copilot AI Sep 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The tensor size 32768 appears as a magic number. Consider defining it as a constant (e.g., LARGE_TENSOR_SIZE = 32768) for better maintainability.

Copilot uses AI. Check for mistakes.

@0oshowero0 0oshowero0 merged commit 01bef2a into TransferQueue:main Sep 26, 2025
0oshowero0 added a commit that referenced this pull request Sep 28, 2025
* Add metadata.py and test_simple_storage_unit.py

* Add copyright and license information to test_simple_storage_unit.py

* Apply suggestion from @Copilot

Co-authored-by: Copilot <[email protected]>

---------

Co-authored-by: Han Zhenyu 韩振宇 <[email protected]>
Co-authored-by: Copilot <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants