-
Notifications
You must be signed in to change notification settings - Fork 449
Re-order extraction metadata union for better parsing #865
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
0f33e62
to
1dd0eba
Compare
@@ -203,7 +203,7 @@ class ExtractedFieldMetadata(BaseModel): | |||
|
|||
|
|||
ExtractedFieldMetaDataDict = Dict[ | |||
str, Union[ExtractedFieldMetadata, Dict[str, Any], list[Any]] | |||
str, Union[Dict[str, Any], ExtractedFieldMetadata, list[Any]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is the fix 🤦
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
interesting, why? I'm just curious.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure. I think it iterates through the union parsing the first type that matches, and since ExtractedFieldMetadata is all optional, it will match any dict. However this can't be the full explanation, otherwise the ExtractedFieldMetadata
values would be parsed to Dict
, which isn't happening
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose after this change, ExtractedFieldMetadata
will never be hit, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose after this change,
ExtractedFieldMetadata
will never be hit, no?
@zhaotai you make a good point. Looks like parsing from json has the behavior where ExtractedFieldMetadata
wouldn't ever parse (whereas whatever the normalization that happens in the ExtractedData constructor kept the classes)
Modified this so that ExtractedFieldMetadata
instead only parses if there are no extra fields, which seems more robust.
data = json.load(f) | ||
result = ExtractedData.from_extraction_result(ExtractRun.parse_obj(data), Capacitor) | ||
assert result.field_metadata == { | ||
"dimensions": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pydantic was converting this to "dimensions": ExtractedFieldMetadata(None, None, None, None)
@@ -203,7 +203,7 @@ class ExtractedFieldMetadata(BaseModel): | |||
|
|||
|
|||
ExtractedFieldMetaDataDict = Dict[ | |||
str, Union[ExtractedFieldMetadata, Dict[str, Any], list[Any]] | |||
str, Union[Dict[str, Any], ExtractedFieldMetadata, list[Any]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
interesting, why? I'm just curious.
08b6315
to
d63d610
Compare
0912614
to
625cc82
Compare
pydantic was parsing nested dicts to an empty
ExtractedFieldMetadata
, because that was the first value in the union