Add basic enum support. #82

tokarenko · 2025-01-27T19:27:19Z

Please review this pull request, which closes issue #81, "Support constants in JSON schema".

wolverdude

Thanks for doing this!

We'll need to add docs for this behavior. I can polish them, but if you can at least get them started, that would be appreciated.

wolverdude · 2025-01-27T20:56:53Z

genson/schema/node.py

-            active_strategy = self._active_strategies[0]
-            return active_strategy
+        if kind == "schema":
+            for special_strategy in [Enum, Typeless]:


SchemaNode shouldn't need to be modified as Enum doesn't need to (in fact shouldn't) be a special strategy. Just add it to BASIC_SCHEMA_STRATEGIES instead at the head of the list.

This will mean that enum and type will never exist in the same schema even though they technically could, but we can add a disclaimer for that in the docs. You will only end up with an enum schema if you start with (or add) one, so presumably most users who try this will know what they're doing.

I rolled back all the changes to node.py and added Enum into the BASIC_SCHEMA_STRATEGIES. Adding the Enum at the head of the list breaks the tests. All the tests pass if the Enum is added at the tail of the list.

I see. I realize now that I misunderstood the complexity of this. Getting enum to behave the way that I said in my earlier comments would actually break backwards compatibility. This is because there's no distinction between matching when you do vs. don't have an existing schema, and this actually would require different behavior in each case.

So your original implementation is correct, including throwing exceptions, and possibly even designating this as a "special" strategy, though I think it's still probably cleaner just to put it at the end of the list.

Actually, I think there is a way, but it's a bit hacky:

Basically, you create a @classmethod match_object() that always returns False. But there's another instance method _instance_match_object() that always returns True. In __init__(), you reassign self.match_object = self._instance_match_object.

That way you get different behavior depending on whether the SchemaStrategy already exists, so there's no way an Enum will get created unless a schema explicitly asked for it. This behavior is not order- or type-dependent, so you can give enum preference over other strategies and raise an error if it's misused.

Great idea. Done.
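For readers following along, here is a minimal sketch of the trick described above. `EnumSketch` is a hypothetical stand-in and not the class from this PR; only the names `match_object`, `match_schema`, and `_instance_match_object` come from the discussion. It relies on the fact that an instance attribute shadows a classmethod of the same name.

```python
class EnumSketch:
    """Hypothetical stand-in for the Enum strategy discussed in this thread."""

    def __init__(self):
        self._enum = []
        # Shadow the class-level matcher on this instance only.
        # Note: assign the bound method itself, do not call it.
        self.match_object = self._instance_match_object

    @staticmethod
    def match_schema(schema):
        # An enum strategy is only selected when a schema asks for it.
        return "enum" in schema

    @classmethod
    def match_object(cls, obj):
        # Looked up on the class: never create an Enum from a bare object.
        return False

    def _instance_match_object(self, obj):
        # Looked up on an existing instance: accept any object, so that
        # unsupported ones can be rejected later with an explicit error.
        return True


print(EnumSketch.match_object("a"))    # False
print(EnumSketch().match_object("a"))  # True
```

The class-level lookup is what a node would consult before any strategy exists; the instance-level lookup applies once a seed schema has created the strategy.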

genson/schema/strategies/enum.py

wolverdude · 2025-01-27T21:05:09Z

test/test_seed_schema.py

                           'required': ['a']})
+
+    def test_enum(self):
+        self.add_schema({'type': 'object',


You could simplify this test by making enum the top-level schema.

wolverdude · 2025-01-27T21:05:49Z

test/test_seed_schema.py

                           'patternProperties': {r'^\d$': {'type': 'integer'}},
                           'required': ['a']})
+
+    def test_enum(self):


You'll want to add another test that checks what happens when complex objects are added.

genson/schema/strategies/enum.py

wolverdude · 2025-01-28T23:16:51Z

Okay, extra credit idea. You don't have to do this, but I might add it after you're done.

A problem with this implementation is that if given a schema that contains enum and a type, one or the other will be discarded.

This could be fixed by having the SchemaStrategy hold another SchemaNode if it's given a typed schema. Then it would remove enum from the schema and pass it into that subnode and pass on any objects it receives as well. Then when it assembles its output schema, it grabs the output of that subnode and adds its enum key to it.

This might not be a good idea if we expect there to be several ways to combine different schemas like this, but as far as I can tell, this is the only such keyword. Any other combining keywords are specific to one or two schema types and don't cut across all of them like enum.
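A rough sketch of the subnode idea, under stated assumptions: `StubTypeNode` and `EnumWithSubnode` are hypothetical names invented for illustration, and the stub only remembers the keywords it was given instead of doing real type inference the way a SchemaNode would.

```python
class StubTypeNode:
    """Hypothetical stand-in for a SchemaNode; it only stores keywords."""

    def __init__(self):
        self._schema = {}

    def add_schema(self, schema):
        self._schema.update(schema)

    def add_object(self, obj):
        pass  # a real node would merge the object's type information here

    def to_schema(self):
        return dict(self._schema)


class EnumWithSubnode:
    """Sketch: keep enum values here, delegate everything else to a subnode."""

    def __init__(self, node_factory=StubTypeNode):
        self._enum = []
        self._subnode = None
        self._node_factory = node_factory

    def _remember(self, value):
        # Caveat: Python treats 1 == True, so a real implementation
        # would need a type-aware comparison here.
        if value not in self._enum:
            self._enum.append(value)

    def add_schema(self, schema):
        for value in schema.get("enum", []):
            self._remember(value)
        # Strip enum and pass the remaining keywords to the subnode.
        rest = {k: v for k, v in schema.items() if k != "enum"}
        if rest:
            if self._subnode is None:
                self._subnode = self._node_factory()
            self._subnode.add_schema(rest)

    def add_object(self, obj):
        self._remember(obj)
        if self._subnode is not None:
            self._subnode.add_object(obj)

    def to_schema(self):
        # Take the subnode's output and add the enum key to it.
        schema = self._subnode.to_schema() if self._subnode else {}
        schema["enum"] = list(self._enum)
        return schema
```

With this shape, seeding `{"type": "string", "enum": ["a"]}` and then adding `"b"` keeps both keywords in the output.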

tokarenko · 2025-01-29T09:37:52Z

It seems that I am done with all of the comments received so far. I would prefer it if you could implement the "enum and type" case.

wolverdude

Thanks again! I've left a few comments. They're all small and hopefully don't seem too nitpicky, but there are a couple of things that look like bugs that I wanted to point out.

wolverdude · 2025-01-29T15:47:35Z

test/sort.py

@@ -0,0 +1,39 @@
+from sys import intern
+
+class Py2Key:


Where does "Py2" come from in the name? I think it's best to use something more descriptive such as "MultiTypeSortKey"

Changed the name as suggested.

wolverdude · 2025-01-29T15:55:47Z

genson/schema/node.py

        raise SchemaGenerationError(
            'Could not find matching schema type for {0}: {1!r}'.format(
-                kind, schema_or_obj))
+                kind, schema_or_obj))


This codebase uses the default flake8 styles, which include a newline at EOF. Please run flake8 to find and fix formatting issues.

Fixed linter errors.

wolverdude · 2025-01-29T16:01:37Z

genson/schema/strategies/enum.py

+        # and this behavior is not order- or type-dependent.
+        self.match_object = self._instance_match_object
+
+    @classmethod


Can this also be @staticmethod (or match_schema be @classmethod) for consistency?

Changed to @staticmethod

genson/schema/strategies/enum.py

wolverdude · 2025-01-29T16:09:03Z

test/sort.py

+    def __lt__(self, other):
+        try:
+            return self.value < other.value
+        except TypeError:


What about the case that self and other are of the same type, but that type is not sortable? Maybe we don't expect to encounter that (at least not in tests)? If so, please document this behavior.

Added comment as suggested.
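To make the question above concrete, here is a sketch of a mixed-type sort key reconstructed only from the fragments visible in the diff (`value`, `typestr`, and the `__lt__` fallback). The constructor is an assumption, since the original one is not shown in this thread.

```python
class MultiTypeSortKey:
    """Sort key that tolerates mixed types (sketch, not the PR's exact code)."""

    def __init__(self, value):
        self.value = value
        # Assumed: the type name is used as the fallback ordering.
        self.typestr = type(value).__name__

    def __lt__(self, other):
        try:
            return self.value < other.value
        except TypeError:
            # Different types, or the same unorderable type (e.g. two
            # dicts). In the latter case both type names are equal, so
            # neither item is "less" and sorted() keeps input order.
            return self.typestr < other.typestr


print(sorted(["b", 2, "a", 1], key=MultiTypeSortKey))  # [1, 2, 'a', 'b']
```

Under this sketch, two values of the same unsortable type compare as equal, so the sort does not fail but their relative order depends on the input order.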

wolverdude · 2025-01-29T16:10:12Z

test/sort.py

+            return self.typestr < other.typestr
+
+
+def sort_lists_in_schema(schema, sorted_key):


Since this is only used in one place and always uses the same sorted_key, why take it as a parameter? Why not just hardcode it and make the interface simpler?

Removed the parameter as suggested.

wolverdude · 2025-01-29T16:14:15Z

genson/schema/strategies/enum.py

+            if item_type in [bool, str, int, float]:
+                self._enum.add(item)
+            elif item is None:
+                self._enum.add("null")


Technically, this adds the string "null", which is not the same as the original object. json.dumps will handle the conversion from None to null for you. Just add type(None) to the scalars list and remove this elif case.

Added to scalars as suggested and refactored this code.
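A quick check with the standard library shows the difference between storing `None` and storing the string `"null"`:

```python
import json

# None is serialized as the JSON literal null, while the string "null"
# stays a quoted string, so the two are distinct enum members.
encoded = json.dumps({"enum": [None, "null"]})
print(encoded)  # {"enum": [null, "null"]}
```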

wolverdude · 2025-01-29T16:15:58Z

test/test_seed_schema.py

+
+    def test_enum_scalar_list(self):
+        self.add_schema({"enum": []})
+        self.add_object(["123", 1, 1.2, True, None])


Shouldn't this raise an error since it's a list?

Indeed. Refactored this code.

wolverdude · 2025-01-29T16:17:33Z

test/test_seed_schema.py

+        self.add_schema({"enum": []})
+        self.add_object(["123", 1, 1.2, True, None])
+        self.assertResult(
+            {"enum": ["123", 1, 1.2, "null"]},


To get this result, you should have to either start with this schema or add each object individually. To properly test, you should probably do a combination of the two: start with 2 or 3 of the items and then add the rest, making sure you add at least one of each type.

Changed the test as suggested.

wolverdude

There's a couple suggestions left, but lgtm at this point.

Once this is merged, I'll update the readme and add a couple other fixes before releasing the next minor version.

wolverdude · 2025-01-30T08:43:11Z

test/test_seed_schema.py

                           'properties': {'a': {'type': 'boolean'}},
                           'patternProperties': {r'^\d$': {'type': 'integer'}},
                           'required': ['a']})
+


I don't know why this didn't occur to me before, but there should be some test that adds the same item multiple times to make sure it gets deduped.
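One thing a dedup test may surface, assuming the values are stored in a plain Python set: `True` and `1` hash and compare equal, so only the first of the two survives. The helper `dedupe_type_aware` below is a hypothetical illustration of one way around that, not code from this PR.

```python
items = ["123", 1, 1.2, True, None, 1, "123"]

# A plain set collapses True into 1 because True == 1 and
# hash(True) == hash(1).
plain = set(items)
print(len(plain))  # 4


def dedupe_type_aware(values):
    """Drop repeats while keeping 1 and True apart (hypothetical helper)."""
    seen = set()
    result = []
    for value in values:
        key = (type(value).__name__, value)
        if key not in seen:
            seen.add(key)
            result.append(value)
    return result


print(dedupe_type_aware(items))  # ['123', 1, 1.2, True, None]
```

JSON Schema treats `1` and `true` as different values, so whether they should be merged is worth deciding explicitly in the test.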

wolverdude · 2025-01-30T08:45:56Z

genson/schema/strategies/enum.py

+        # Add only scalar types. Technically, the JSON-Schema spec allows
+        # any type in an enum list, but using objects and lists is a very
+        # rare use-case.
+        if obj is not None and type(obj) not in [bool, str, int, float]:


Simplified:

Suggested change

if obj is not None and type(obj) not in [bool, str, int, float]:

if not isinstance(obj, (bool, str, int, float, type(None))):
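The two conditions agree for plain values but are not strictly equivalent: `isinstance` also accepts subclasses of the listed types. A small sketch of the difference, where `Label` is a made-up subclass used only for illustration:

```python
SCALAR_TYPES = (bool, str, int, float, type(None))


def is_scalar_exact(obj):
    # Negation of the original condition: exact type match only.
    return obj is None or type(obj) in [bool, str, int, float]


def is_scalar_isinstance(obj):
    # Negation of the suggested condition: subclasses are accepted too.
    return isinstance(obj, SCALAR_TYPES)


class Label(str):
    """Hypothetical str subclass."""


for sample in ["a", 1, 1.5, True, None, [1], {"k": 1}]:
    assert is_scalar_exact(sample) == is_scalar_isinstance(sample)

print(is_scalar_exact(Label("x")))       # False
print(is_scalar_isinstance(Label("x")))  # True
```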

tokarenko · 2025-02-13T15:15:46Z

@wolverdude, how about supporting lists of scalars in Enum?
I realized that I need them for my use case:

class Enum(SchemaStrategy):
...
    def add_object(self, obj):
        super().add_object(obj)
        # Add only scalars and lists of scalars. Technically, the JSON-Schema
        # spec allows any type in an enum list, but using objects is a very
        # rare use-case.
        scalar_types = (bool, str, int, float, type(None))
        if isinstance(obj, scalar_types):
            # Scalar type.
            # Convert to list to unify processing of string and other types.
            self._enum.update([obj])
        elif isinstance(obj, list) \
                and all(isinstance(item, scalar_types) for item in obj):
            # List of scalar types.
            self._enum.update(obj)
        else:
            raise TypeError(f"Unsupported enum type of {type(obj)}. "
                            "Scalar or list of scalars expected.")

wolverdude · 2025-02-13T21:54:18Z

Okay. Glad I didn't get around to merging this yet then. 🙂

To be clear, you need the enum to hold the entire list, not that you can have an array node whose items are an enum, correct?

Nesting a layer deep to check for enum is starting to feel like too much complexity to me. If we support array, why not object too? Or why not look 2 or 3 layers deep in the list? It really depends on the user's specific needs at that point.

Could you implement a custom strategy extending the base Enum instead? All it would need to do is catch the TypeError and then run the extra code you have there. If it isn't a list or they aren't all scalars, you can just re-raise the error.
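A sketch of what such a subclass might look like. `BaseEnum` is a hypothetical stand-in for the scalar-only Enum strategy from this PR (the real one extends SchemaStrategy), and `ListAwareEnum` is an invented name.

```python
SCALAR_TYPES = (bool, str, int, float, type(None))


class BaseEnum:
    """Hypothetical stand-in for the PR's scalar-only Enum strategy."""

    def __init__(self):
        self._enum = set()

    def add_object(self, obj):
        if not isinstance(obj, SCALAR_TYPES):
            raise TypeError(f"Unsupported enum type of {type(obj)}.")
        self._enum.add(obj)


class ListAwareEnum(BaseEnum):
    """Custom strategy: also accepts a flat list of scalars."""

    def add_object(self, obj):
        try:
            super().add_object(obj)
        except TypeError:
            is_scalar_list = isinstance(obj, list) and all(
                isinstance(item, SCALAR_TYPES) for item in obj)
            if not is_scalar_list:
                raise  # not a list of scalars: keep the original error
            # Flatten: each list item becomes its own enum member.
            self._enum.update(obj)
```

Note that flattening means the list itself is not stored as an enum member, only its items.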

tokarenko · 2025-02-14T06:28:37Z

I mean merging a list of literals into a schema:

self.add_schema({"enum": []})
self.add_object(["a", "b", "c"])

I agree that we should not handle complex types here because of the structural complexity.
At the same time, a list of scalars seems a reasonable case to support. It allows adding all the scalars at once.
I have no problem implementing a custom Enum strategy if you still find this out of scope.

wolverdude · 2025-02-14T06:50:07Z

I'm still not fully clear here. Let's take your example:

self.add_schema({"enum": []})
self.add_object(["a", "b", "c"])

In my mind, this should output {"enum": [["a", "b", "c"]]}, but based on the code, I think you'd like {"enum": ["a", "b", "c"]}. Is that correct? If so, I don't think we can do that. I don't think the input object would validate under the latter schema since ["a", "b", "c"] isn't among the enum values.
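The validation argument can be checked with a simplified stand-in for a real validator. `validates_against_enum` is a hypothetical helper that implements only the rule that an instance must equal one of the enum members.

```python
def validates_against_enum(instance, schema):
    """Simplified enum check (hypothetical helper, not a full validator).

    Uses Python equality, which differs from JSON equality in edge
    cases such as 1 == True.
    """
    return any(instance == allowed for allowed in schema["enum"])


obj = ["a", "b", "c"]

# The list is not one of the members "a", "b", "c".
print(validates_against_enum(obj, {"enum": ["a", "b", "c"]}))    # False

# The list is the single member of this enum.
print(validates_against_enum(obj, {"enum": [["a", "b", "c"]]}))  # True
```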

tokarenko · 2025-02-14T07:22:51Z

Oh, I see. Yes, indeed I expected the result to be {"enum": ["a", "b", "c"]}. So, there is no theoretically valid way of populating the schema the way I want?

wolverdude · 2025-02-14T08:11:06Z

I don't think so. If you want to be sure, run it against a schema validator.

Even if not, you can always create a custom SchemaStrategy to behave any way you like.

tokarenko · 2025-02-17T11:55:45Z

@wolverdude, while waiting for the merge I tried to install from my local fork on GitLab. I found that the schema subdirectory is not installed:

.venv/lib/python3.11/site-packages/genson/__init__.py:1: in <module>
    from .schema.builder import SchemaBuilder, Schema
E   ModuleNotFoundError: No module named 'genson.schema'

To fix this I added the following line to MANIFEST.in:
recursive-include genson *

wolverdude · 2025-02-18T01:06:34Z

That's probably the problem behind issue #80. Thanks for letting me know!

tokarenko · 2025-08-31T13:57:51Z

Hi Jon (@wolverdude)! I would be grateful if you could merge this PR along with the changes to MANIFEST.in. I am repackaging my projects and I would like to retire my own fork of genson.

wolverdude · 2025-09-03T06:47:10Z

Sorry, I'll try to get that done this week.

wolverdude · 2025-09-07T04:19:57Z

@tokarenko the code here has some errors that prevent it from running. I can fix these myself, but if you've been using this feature already, I would prefer to use battle-tested code here. Are there some commits you haven't pushed?

tokarenko · 2025-10-05T11:15:21Z

Hi Jon ( @wolverdude )! I was on vacation. I merged code from my fork. Please review it and merge.

tokarenko added 2 commits January 27, 2025 22:16

Add basic enum support.

63c2214

Integrate PR of "Add Enum schema strategy" wolverdude#57

58dd599

wolverdude requested changes Jan 27, 2025

tokarenko added 2 commits January 29, 2025 00:51

Fix Enum strategy, add tests, and other PR actions

b47a267

Add Enum to BASIC_SCHEMA_STRATEGIES

402d816

Add Enum strategy matching only if schema exists.

19c43f4

wolverdude requested changes Jan 29, 2025

Fix linting errors and PR review.

5b84086

wolverdude approved these changes Jan 30, 2025

wolverdude mentioned this pull request Feb 18, 2025

Install genson/schema/ sub-directory #80

Closed

Add test, fix install, simplify enum add_object

0000b7d
