Skip to content

Conversation

@tokarenko
Copy link

Please review this pull request to close the issue of Support constants in JSON schema #81

Copy link
Owner

@wolverdude wolverdude left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing this!

We'll need to add docs for this behavior. I can polish them, but if you can at least get them started, that would be appreciated.

active_strategy = self._active_strategies[0]
return active_strategy
if kind == "schema":
for special_strategy in [Enum, Typeless]:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SchemaNode shouldn't need to be modified as Enum doesn't need to (in fact shouldn't) be a special strategy. Just add it to BASIC_SCHEMA_STRATEGIES instead at the head of the list.

This will mean that enum and type will never exist in the same schema even though they technically could, but can add a disclaimer for that in the docs. You will only end up with an enum schema if you start with (or add) one, so presumably most users who try this will know what they're doing.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I rolled back all the changes to node.py and added Enum into the BASIC_SCHEMA_STRATEGIES. Adding the Enum at the head of the list breaks the tests. All the tests pass if the Enum is added at the tail of the list.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. I realize now that I misunderstood the complexity of this. Getting enum to behave the way that I said in my earlier comments would actually break backwards compatibility. This is because there's no distinction between matching when you do vs. don't have an existing schema, and this actually would require different behavior in each case.

So your original implementation is correct, including the throwing exceptions, and possibly even designating this as a "special" strategy, though I think it's still probably cleaner just to put it at the end of the list.

Copy link
Owner

@wolverdude wolverdude Jan 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I think there is a way, but it's a bit hacky:

Basically, you create a @classmethod match_object() that always returns False. But there's another instance method _instance_match_object() that always returns True. In __init__(), you reassign self.match_object = self._instance_match_object().

That way you can get different behavior if the SchemaStrategy already exists vs. not and so there's no way an Enum will get created unless it was explicitly asked for by a schema, and this behavior is not order- or type-dependent, so you can give enum preference over other strategies and raise an error if its misused.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great idea. Done.

'required': ['a']})

def test_enum(self):
self.add_schema({'type': 'object',
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could simplify this test by making enum the top-level schema.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

'patternProperties': {r'^\d$': {'type': 'integer'}},
'required': ['a']})

def test_enum(self):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'll want to add another test that checks what happens when complex objects are added.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@wolverdude
Copy link
Owner

Okay, extra credit idea. You don't have to do this, but I might add it after you're done.

A problem with this implementation is that if given a schema that contains enum and a type, one or the other will be discarded.

This could be fixed by having the SchemaStrategy hold another SchemaNode if it's given a typed schema. Then it would remove enum from the schema and pass it into that subnode and pass on any objects it receives as well. Then when it assembles its output schema, it grabs the output of that subnode and adds its enum key to it.

This might not be a good idea if we expect there to be several ways to combine different schemas like this, but as far as I can tell, this is the only such keyword. Any other combining keywords are specific to one or two schema types and don't cut across all of them like enum.

@tokarenko
Copy link
Author

It seems that I am done with all of the comments received so far. I would prefer if you could implement "enum and type" case.

Copy link
Owner

@wolverdude wolverdude left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again! I've left a few comments. They're all small and hopefully don't seem too nitpicky, but there are a couple of things that look like bugs that I wanted to point out.

test/sort.py Outdated
@@ -0,0 +1,39 @@
from sys import intern

class Py2Key:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where does "Py2" come from in the name? I think it's best to use something more descriptive such as "MultiTypeSortKey"

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed the name as suggested.

raise SchemaGenerationError(
'Could not find matching schema type for {0}: {1!r}'.format(
kind, schema_or_obj))
kind, schema_or_obj))
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This codebase uses the default flake8 styles, which includes a newline at EOF. Please run flake8 to find and fix formatting issues.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed linter errors.

# and this behavior is not order- or type-dependent.
self.match_object = self._instance_match_object

@classmethod
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this also be @staticmethod (or match_schema be @classmethod) for consistency?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to @staticmethod

def __lt__(self, other):
try:
return self.value < other.value
except TypeError:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about the case that self and other are of the same type, but that type is not sortable? Maybe we don't expect to encounter that (at least not in tests)? If so, please document this behavior.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added comment as suggested.

return self.typestr < other.typestr


def sort_lists_in_schema(schema, sorted_key):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is only used in one place and always uses the same sorted_key, why take it as a parameter? Why not just hardcode it and make the interface simpler?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the parameter as suggested.

if item_type in [bool, str, int, float]:
self._enum.add(item)
elif item is None:
self._enum.add("null")
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically, this adds a string "null", which is not the same as the original object. JSON.dumps will handle the conversion from "None" to "null" for you. Just add this to the scalars list and remove this if case.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added to scalars as suggested and refactored this code.


def test_enum_scalar_list(self):
self.add_schema({"enum": []})
self.add_object(["123", 1, 1.2, True, None])
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this raise an error since it's a list?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed. Refactored this code.

self.add_schema({"enum": []})
self.add_object(["123", 1, 1.2, True, None])
self.assertResult(
{"enum": ["123", 1, 1.2, "null"]},
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To get this result, you should have to either start with this schema or add each object individually. To properly test, you should probably do a combination of the two: start with 2 or 3 of the items and then add the rest, making sure you add at least one of each type.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed the test as suggested.

Copy link
Owner

@wolverdude wolverdude left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a couple suggestions left, but lgtm at this point.

Once this is merged, I'll update the readme and add a couple other fixes before releasing the next minor version.

'properties': {'a': {'type': 'boolean'}},
'patternProperties': {r'^\d$': {'type': 'integer'}},
'required': ['a']})

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know why this didn't occur to me before, but there should be some test that adds the same item multiple times to make sure it gets deduped.

# Add only scalar types. Technically, the JSON-Schema spec allows
# any type in an enum list, but using objects and lists is a very
# rare use-case.
if obj is not None and type(obj) not in [bool, str, int, float]:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simplified:

Suggested change
if obj is not None and type(obj) not in [bool, str, int, float]:
if not isinstance(obj, (bool, str, int, float, type(None)):

@tokarenko
Copy link
Author

@wolverdude , how about supporting list of scalars in Enum?
I realized that I need them for my use case:

class Enum(SchemaStrategy):
...
    def add_object(self, obj):
        super().add_object(obj)
        # Add only scalars and lists of scalars. Technically, the JSON-Schema
        # spec allows any type in an enum list, but using objects is a very
        # rare use-case.
        scalar_types = (bool, str, int, float, type(None))
        if isinstance(obj, scalar_types):
            # Scalar type.
            # Convert to list to unify processing of string and other types.
            self._enum.update([obj])
        elif isinstance(obj, list) \
                and not [item for item in obj
                         if not isinstance(item, scalar_types)]:
            # List of scalar types.
            self._enum.update(obj)
        else:
            raise TypeError(f"Unsupported enum type of {type(obj)}."
                            "Scalar or list of scalars expected.")

@wolverdude
Copy link
Owner

Okay. Glad I didn't get around to merging this yet then. 🙂

To be clear, you need the enum to hold the entire list, not that you can have an array node whose items are an enum, correct?

Nesting a layer deep to check for enum is starting to feel like too much complexity to me. If we support array, why not object too? Or why not look 2 or 3 layers deep in the list? It really depends on the user's specific needs at that point.

Could you implement a custom strategy extending the base Enum instead? All it would need to do is catch the TypeError and then run the extra code you have there. If it isn't a list or they aren't all scalars, you can just re-raise the error.

@tokarenko
Copy link
Author

I mean merging list of literals into a schema:

self.add_schema({"enum": []})
self.add_object(["a", "b", "c"])

I agree that we should not handle complex types here because of structural complexity.
At the same time, list of scalars seem a reasonable case to support. It allows to add all the scalars at once.
I have no problem implementing custom Enum strategy, if you still find this out of scope.

@wolverdude
Copy link
Owner

I'm still not fully clear here. Let's take your example:

self.add_schema({"enum": []})
self.add_object(["a", "b", "c"])

In my mind, this should output {"enum": [["a", "b", "c"]]}, but based on the code, I think you'd like {"enum": ["a", "b", "c"]}. Is that correct? If so, I don't think we can do that. I don't think the input object would validate under the latter schema since ["a", "b", "c"] isn't among the enum values.

@tokarenko
Copy link
Author

Oh, I see. Yes, indeed I expected the result to be {"enum": ["a", "b", "c"]}. So, there is no theoretically valid way of populating the schema the way I want?

@wolverdude
Copy link
Owner

wolverdude commented Feb 14, 2025

I don't think so. If you want to be sure, run it against a schema validator.

Even if not, you can always create a custom SchemaStrategy to behave any way you like.

@tokarenko
Copy link
Author

@wolverdude , while waiting for the merge I tried to install from local fork at GitLab. I found that schema subdirectory is not installed:

.venv/lib/python3.11/site-packages/genson/__init__.py:1: in <module>
    from .schema.builder import SchemaBuilder, Schema
E   ModuleNotFoundError: No module named 'genson.schema'

To fix this I added the following line to MANIFEST.in:
recursive-include genson *

@wolverdude
Copy link
Owner

That's probably the problem behind issue #80. Thanks for letting me know!

@tokarenko
Copy link
Author

Hi Jon (@wolverdude)! I would be grateful if you could merge this PR along with changes to MANIFEST.in. I am repackaging my projects and I would like to dismiss my own fork of genson.

@wolverdude
Copy link
Owner

Sorry, I'll try to get that done this week.

@wolverdude
Copy link
Owner

@tokarenko the code here has some errors that prevent it from running. I can fix these myself, but if you've been using this feature already, I would prefer to use battle-tested code here. Are there some commits you haven't pushed?

@tokarenko
Copy link
Author

Hi Jon ( @wolverdude )! I was on vacation. I merged code from my fork. Please review it and merge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants