-
Notifications
You must be signed in to change notification settings - Fork 63
Fix: dictionary changed size during iteration in GCSObjectMetadataClient #468
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR fixes a crash in _adjust_gcs_metadata_limit_size
by avoiding in-place dict modification during iteration and adds a test to verify metadata trimming.
- Accumulates keys to remove before deleting them to prevent “dictionary changed size during iteration” errors
- Adds a unit test to confirm total metadata size remains within GCS limits
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
File | Description |
---|---|
gokart/gcs_obj_metadata_client.py | Change loop to collect and then remove keys |
test/test_gcs_obj_metadata_client.py | New test ensuring large metadata is truncated |
Comments suppressed due to low confidence (3)
test/test_gcs_obj_metadata_client.py:143
- [nitpick] The test name suggests an expected runtime error but actually checks size truncation. Rename it to something like
test_adjust_gcs_metadata_limit_size_truncates_labels
for clarity.
def test_adjust_gcs_metadata_limit_size_runtime_error(self):
test/test_gcs_obj_metadata_client.py:151
- Consider adding assertions to verify that specific keys beyond the size limit were removed, not just the total size, for a more robust test.
self.assertLessEqual(total_size, 8 * 1024)
gokart/gcs_obj_metadata_client.py:159
- This method mutates the
labels
dict in-place and returns it. To avoid unexpected side effects, consider returning a new dict or documenting the in-place behavior in the docstring.
def _get_label_size(label_name: str, label_value: str) -> int:
test/test_gcs_obj_metadata_client.py
Outdated
result = GCSObjectMetadataClient._adjust_gcs_metadata_limit_size(large_labels) | ||
|
||
total_size = sum(len(k.encode('utf-8')) + len(v.encode('utf-8')) for k, v in result.items()) | ||
self.assertLessEqual(total_size, 8 * 1024) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you define a constant value in gokart/gcs_obj_metadata_client.py
MAX_GCS_METADATA_SIZE: Final[int] = 8 * 1024
and refer here and L157 of gokart/gcs_obj_metadata_client.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your review. I've fixed it!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks
test/test_gcs_obj_metadata_client.py
Outdated
|
||
result = GCSObjectMetadataClient._adjust_gcs_metadata_limit_size(large_labels) | ||
|
||
total_size = sum(len(k.encode('utf-8')) + len(v.encode('utf-8')) for k, v in result.items()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is encode('utf-8')
needs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When working with non-ASCII characters (such as Japanese etc.), it is necessary because the number of characters and the number of bytes do not match.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so. But, in this test case, is it need?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I apologize. I misunderstood it as production code. Since this test only needs to verify that no error occurs, it doesn't seem necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Thank you! |
The issue was in the _adjust_gcs_metadata_limit_size method where labels
dictionary was being modified while iterating through it. Fixed by:
This prevents the "dictionary changed size during iteration" error that
occurred when large metadata needed to be adjusted to fit GCS size limits.