Skip to content

Fix: dictionary changed size during iteration in GCSObjectMetadataClient #468

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Jun 18, 2025

Conversation

ikyasam18
Copy link
Contributor

The issue was in the _adjust_gcs_metadata_limit_size method where labels
dictionary was being modified while iterating through it. Fixed by:

  1. Creating a list of keys to remove first
  2. Removing the keys after completing the iteration

This prevents the "dictionary changed size during iteration" error that
occurred when large metadata needed to be adjusted to fit GCS size limits.

@ikyasam18 ikyasam18 requested a review from hiro-o918 as a code owner June 16, 2025 12:06
@hiro-o918 hiro-o918 requested a review from Copilot June 16, 2025 12:15
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes a crash in _adjust_gcs_metadata_limit_size by avoiding in-place dict modification during iteration and adds a test to verify metadata trimming.

  • Accumulates keys to remove before deleting them to prevent “dictionary changed size during iteration” errors
  • Adds a unit test to confirm total metadata size remains within GCS limits

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
gokart/gcs_obj_metadata_client.py Change loop to collect and then remove keys
test/test_gcs_obj_metadata_client.py New test ensuring large metadata is truncated
Comments suppressed due to low confidence (3)

test/test_gcs_obj_metadata_client.py:143

  • [nitpick] The test name suggests an expected runtime error but actually checks size truncation. Rename it to something like test_adjust_gcs_metadata_limit_size_truncates_labels for clarity.
def test_adjust_gcs_metadata_limit_size_runtime_error(self):

test/test_gcs_obj_metadata_client.py:151

  • Consider adding assertions to verify that specific keys beyond the size limit were removed, not just the total size, for a more robust test.
self.assertLessEqual(total_size, 8 * 1024)

gokart/gcs_obj_metadata_client.py:159

  • This method mutates the labels dict in-place and returns it. To avoid unexpected side effects, consider returning a new dict or documenting the in-place behavior in the docstring.
def _get_label_size(label_name: str, label_value: str) -> int:

result = GCSObjectMetadataClient._adjust_gcs_metadata_limit_size(large_labels)

total_size = sum(len(k.encode('utf-8')) + len(v.encode('utf-8')) for k, v in result.items())
self.assertLessEqual(total_size, 8 * 1024)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you define a constant value in gokart/gcs_obj_metadata_client.py

MAX_GCS_METADATA_SIZE: Final[int] = 8 * 1024

and refer here and L157 of gokart/gcs_obj_metadata_client.py

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your review. I've fixed it!

Copy link
Member

@kitagry kitagry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks


result = GCSObjectMetadataClient._adjust_gcs_metadata_limit_size(large_labels)

total_size = sum(len(k.encode('utf-8')) + len(v.encode('utf-8')) for k, v in result.items())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is encode('utf-8') needs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When working with non-ASCII characters (such as Japanese etc.), it is necessary because the number of characters and the number of bytes do not match.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so. But, in this test case, is it need?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I apologize. I misunderstood it as production code. Since this test only needs to verify that no error occurs, it doesn't seem necessary.

Copy link
Contributor

@hiro-o918 hiro-o918 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@kitagry kitagry merged commit 8cf6e6e into m3dev:master Jun 18, 2025
8 checks passed
@kitagry
Copy link
Member

kitagry commented Jun 18, 2025

Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants