RFC: explicitly set strict flag in `zip` calls (enforce `B905`), and ignore `UP038` #4940

neutrinoceros · 2024-07-07T08:08:58Z

This is a mostly automated style change, but I'm using the test suite to detect places where strict=False is either the intended behavior or a bug.

I'll also carefully review this myself to improve coverage.

neutrinoceros · 2024-07-07T08:14:23Z

yt/data_objects/profiles.py

        kwargs["deposition"] = deposition
    if override_bins is not None:
-        for o_bin, ax in zip(o_bins, ["x", "y", "z"]):
+        for o_bin, ax in zip(o_bins, ["x", "y", "z"], strict=False):


This is one very rare case where it's apparent that strict=False is the intended behavior (found thanks to a test).

Great catch. Yes, I think this is assuming that it'll run out of axes based on the dimensionality.

neutrinoceros · 2024-07-07T08:23:20Z

yt/data_objects/analyzer_objects.py

        if len(args) + len(kwargs) != len(self._params):
            raise RuntimeError
-        self.__dict__.update(zip(self._params, args))
+        self.__dict__.update(zip(self._params, args, strict=False))


from the context, it seems that strict=False is clearly intended here

neutrinoceros · 2024-07-07T09:49:45Z

yt/data_objects/derived_quantities.py

        return [
            self.data_source.ds.arr([mis.min(), mas.max()])
-            for mis, mas in zip(values[::2], values[1::2])
+            for mis, mas in zip(values[::2], values[1::2], strict=True)


This is only safe if len(values)%2==0 is guaranteed. If not, strict=False matches the existing behavior but is possibly buggy (one value is silently dropped)

I think using strict here is the right move, as a check on argument inputs.

neutrinoceros · 2024-07-07T09:52:22Z

On second though this isn't a pure stylistic choice, as it may (however unlikely) have undesirable effects. Re-triaging as a refactor.

neutrinoceros · 2024-07-07T10:24:46Z

yt/frontends/stream/data_structures.py

        )
        cgs_units = ("cm", "g", "s", "cm/s", "gauss")
-        for unit, attr, cgs_unit in zip(base_units, attrs, cgs_units):
+        for unit, attr, cgs_unit in zip(base_units, attrs, cgs_units, strict=True):


this is a good example of why enforcing strict=True helps finding bugs

neutrinoceros · 2024-07-07T12:54:49Z

yt/data_objects/profiles.py

+            if wf is not None:
+                is_local.append(wf.sampling_type == "local")


highlighting this bit as a bugfix

Thank you. I have an inkling somewhere in my head that there's an extant issue this could be related to but searching does not prove me correct.

neutrinoceros · 2024-07-07T13:07:04Z

yt/frontends/athena/io.py

        keys = fhandle["field_types"].keys()
-        val = fhandle["field_types"].keys()
-        return dict(zip(keys, val))
+        val = fhandle["field_types"].values()


this is a case where the original code just looks wrong, but it's been there for 12 years so either it's not really a bug (but it should have a comment), or it's actually not used by anyone...

I don't see any references to it -- this may be vestigial.

you're right. And it's private anyway so let me just remove it now.

yt/utilities/fortran_utils.py

neutrinoceros

Finishing my self-review with a couple notes for reviewers:
First of, if you're wondering why I write

if sys.version_info >= (3, 10):
    pass
else:
    from yt._maintenance.backports import zip

instead of the simpler

if sys.version_info < (3, 10):
    from yt._maintenance.backports import zip

~~It is because ruff (rule UP036) can automatically cleanup the former but not the latter when we switch python-requires to >=3.10. So it's more verbose now but easier to cleanup in the future.~~
edit July 16th 2024: this is not true anymore, ruff 0.5.2 (and supposedly some older versions too) know how to cleanup both versions (but require --unsafe-fixes in both cases), so I'll simplify this PR.

in a couple places, I actively changed the code around a zip to fix apparent bugs. These should be highlighted by my own comments.
Please pay special attention in cases where I set strict=False: in most cases, I saw no actual test failing with strict=True but the context around zip made me think strict=False was the intended behavior. I may be wrong.
Some zip calls were zipping some small literal lists, and I refactored them to as lists of tuples to avoid zip entirely, which I found was more readable. Maybe I made a mistake in a case or two, so please pay extra attention there too.

neutrinoceros · 2024-07-16T12:49:43Z

I know this isn't exactly an exciting read, but can I ask @matthewturk, @chrishavlin or @cphyc for a review ?

chrishavlin · 2024-07-16T13:46:34Z

Yes! I actually started it a few days ago but didn't finish up. Can definitely get to it in the next few days.

matthewturk

I left some comments where I think there might be issues, but I want to note that I think there are no problems with any of this, and any bugs it finds would be an improvement. Thanks for taking the time and leaving such helpful comments on it.

matthewturk · 2024-07-16T17:01:56Z

yt/data_objects/data_containers.py

            unambiguous_fields_to_include = []
            unambiguous_fields_units = []
-            for field, field_unit in zip(fields_to_include, fields_units):
+            for field, field_unit in zip(fields_to_include, fields_units, strict=True):


If an error were to occur, I think this is a candidate location.

~30 lines prior to this, an error is raised `if len(fields_units) != len(fields_to_include):, so I think this should be OK

matthewturk · 2024-07-16T17:02:01Z

yt/data_objects/data_containers.py


            ## explicitly go after the fields we want
-            for field, units in zip(fields_to_include, fields_units):
+            for field, units in zip(fields_to_include, fields_units, strict=True):


If an error were to occur, I think this is a candidate location.

matthewturk · 2024-07-16T17:02:47Z

yt/data_objects/derived_quantities.py

        return [
            self.data_source.ds.arr([mis.min(), mas.max()])
-            for mis, mas in zip(values[::2], values[1::2])
+            for mis, mas in zip(values[::2], values[1::2], strict=True)


I think using strict here is the right move, as a check on argument inputs.

matthewturk · 2024-07-16T17:03:51Z

yt/data_objects/profiles.py

+            if wf is not None:
+                is_local.append(wf.sampling_type == "local")


Thank you. I have an inkling somewhere in my head that there's an extant issue this could be related to but searching does not prove me correct.

matthewturk · 2024-07-16T17:04:27Z

yt/data_objects/profiles.py

        kwargs["deposition"] = deposition
    if override_bins is not None:
-        for o_bin, ax in zip(o_bins, ["x", "y", "z"]):
+        for o_bin, ax in zip(o_bins, ["x", "y", "z"], strict=False):


Great catch. Yes, I think this is assuming that it'll run out of axes based on the dimensionality.

matthewturk · 2024-07-16T17:09:55Z

yt/frontends/enzo_e/data_structures.py

        else:
            p = self.parameters
-            for d, u in zip(("length", "time"), ("cm", "s")):
+            for d, u in [("length", "cm"), ("time", "s")]:


I can't help but feel like I probably initiated this idiom that you're undoing in several places ... and I think undoing it is the right thing!

matthewturk · 2024-07-16T17:10:15Z

yt/frontends/enzo_e/misc.py

 def get_child_index(anc_id, desc_id):
    cid = ""
-    for aind, dind in zip(anc_id.split("_"), desc_id.split("_")):
+    for aind, dind in zip(anc_id.split("_"), desc_id.split("_"), strict=True):


if this errors it will be a very valuable error check

matthewturk · 2024-07-16T17:12:08Z

yt/frontends/tipsy/data_structures.py

            a: c
            for (a, b), c in zip(
-                self._header_spec, struct.unpack(hh, f.read(struct.calcsize(hh)))
+                self._header_spec,


This one I'm less certain of, in that it may be the case that we don't want to force the specification of all fields and just let some go. But, that's not necessarily what we should build in to doing this check, so 👍 .

matthewturk · 2024-07-16T17:12:49Z

yt/utilities/answer_testing/framework.py


    def compare(self, new_result, old_result):
-        for newp, oldp in zip(new_result["parents"], old_result["parents"]):
+        for newp, oldp in zip(


this is a good additional check

although, it may result in slightly less informative errors if the lengths are different

not sure what you mean here: if lengths are different, there might be no error at all (without strict=True)

Oh, I just mean, that the old way wouldn't pick that up as-is but the new way will show a weird error. It's NBD.

matthewturk · 2024-07-16T17:13:42Z

yt/utilities/fortran_utils.py

-                )
-
-            for k, val in zip(a, v):
+            for k, val in zip(a, v, strict=True):


I do think the other version was somewhat better, as otherwise we're left deducing what caused the iterators to be of different lengths.

FTR I'm waiting on @chrishavlin to weigh in but happy to change it if he agrees

i think i prefer keeping the OSError, but don't feel too strongly about it.

chrishavlin

Generally looks good! some minor questions/remarks so i'll hit approve now (assume approval if it gets dismissed!)

I'll also say I had a hard time evaluating the changes in the code for some of the frontends (particularly where the existing zips related to file structures....), but I think we have to trust (hope?) that the tests would catch errors there.

yt/data_objects/construction_data_containers.py

chrishavlin · 2024-07-22T15:29:34Z

yt/data_objects/data_containers.py

            unambiguous_fields_to_include = []
            unambiguous_fields_units = []
-            for field, field_unit in zip(fields_to_include, fields_units):
+            for field, field_unit in zip(fields_to_include, fields_units, strict=True):


~30 lines prior to this, an error is raised `if len(fields_units) != len(fields_to_include):, so I think this should be OK

yt/data_objects/region_expression.py

yt/frontends/athena/io.py

neutrinoceros · 2024-07-23T08:32:14Z

Can I get another approval ? Seems that I dismissed the ones I got when I pushed another clean up commit

neutrinoceros · 2024-07-27T14:20:17Z

Let's get this in before it starts rotting