Re-order extraction metadata union for better parsing #865

adrianlyjak · 2025-08-13T17:26:27Z

pydantic was parsing nested dicts to an empty ExtractedFieldMetadata, because that was the first value in the union

adrianlyjak · 2025-08-13T17:27:47Z

py/llama_cloud_services/beta/agent_data/schema.py

@@ -203,7 +203,7 @@ class ExtractedFieldMetadata(BaseModel):


 ExtractedFieldMetaDataDict = Dict[
-    str, Union[ExtractedFieldMetadata, Dict[str, Any], list[Any]]
+    str, Union[Dict[str, Any], ExtractedFieldMetadata, list[Any]]


this is the fix 🤦

interesting, why? I'm just curious.

I am not sure. I think pydantic iterates through the union and uses the first type that parses successfully, and since every field on ExtractedFieldMetadata is optional, it matches any dict. However, this can't be the full explanation: otherwise, with Dict first, the ExtractedFieldMetadata values would be parsed to Dict, which isn't happening.

I suppose after this change, ExtractedFieldMetadata will never be hit, no?
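
To make the two hypotheses above concrete, here is a toy, standard-library model of "first member that parses wins" union resolution. It is only a sketch of the reasoning in this thread, not pydantic's actual algorithm; `FieldMeta`, its field names, and the helper functions are hypothetical stand-ins for illustration.

```python
from dataclasses import dataclass, fields
from typing import Any, Callable, Dict, List, Optional


@dataclass
class FieldMeta:
    """Hypothetical stand-in for ExtractedFieldMetadata: every field is optional."""

    confidence: Optional[float] = None
    page_number: Optional[int] = None


def as_field_meta(value: Any) -> FieldMeta:
    """Lenient model parsing: accept any dict and silently drop unknown keys."""
    if not isinstance(value, dict):
        raise TypeError("expected a dict")
    known = {f.name for f in fields(FieldMeta)}
    return FieldMeta(**{k: v for k, v in value.items() if k in known})


def as_plain_dict(value: Any) -> Dict[str, Any]:
    """Dict[str, Any] member: any dict passes through unchanged."""
    if not isinstance(value, dict):
        raise TypeError("expected a dict")
    return value


def first_match(value: Any, members: List[Callable[[Any], Any]]) -> Any:
    """Try each union member in order and return the first that parses."""
    for parse in members:
        try:
            return parse(value)
        except (TypeError, ValueError):
            continue
    raise ValueError("no union member matched")


nested = {"width": {"confidence": 0.9}, "height": {"confidence": 0.8}}

# Model first: the nested dict has no known keys, so it collapses to an empty model.
model_first = first_match(nested, [as_field_meta, as_plain_dict])

# Dict first: the nested dict survives, but real metadata is never turned into a model.
dict_first = first_match(nested, [as_plain_dict, as_field_meta])
leaf = first_match({"confidence": 0.9}, [as_plain_dict, as_field_meta])
```

Under this simplified model, either ordering loses something: model-first swallows nested dicts, while dict-first means the model member is unreachable for dict input.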

@zhaotai you make a good point. It looks like when parsing from JSON, ExtractedFieldMetadata would never be parsed (whereas whatever normalization happens in the ExtractedData constructor kept the classes).

Modified this so that ExtractedFieldMetadata instead only parses if there are no extra fields, which seems more robust.
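
The thread does not show how the "no extra fields" check was implemented. One common way to get that behavior in pydantic is to forbid extra fields on the model; the sketch below assumes that approach, and `LooseMeta`, `StrictMeta`, and their field names are hypothetical, not the actual schema.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError


class LooseMeta(BaseModel):
    """Hypothetical all-optional model; unknown keys are ignored by default."""

    confidence: Optional[float] = None
    page_number: Optional[int] = None


class StrictMeta(BaseModel, extra="forbid"):
    """Same fields, but any unknown key is a validation error (assumed approach)."""

    confidence: Optional[float] = None
    page_number: Optional[int] = None


nested = {"width": {"confidence": 0.9}}

# Default behavior: "width" is dropped and every field stays None.
loose = LooseMeta(**nested)

# With extras forbidden, the nested dict no longer validates as metadata.
try:
    StrictMeta(**nested)
    rejected = False
except ValidationError:
    rejected = True

# A dict that only uses known keys still parses into the model.
leaf = StrictMeta(**{"confidence": 0.9})
```

With a model that rejects unknown keys, a nested dict fails that union member and can fall through to the plain dict member, regardless of the order in which the members are listed.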

adrianlyjak · 2025-08-13T17:28:36Z

py/unit_tests/beta/agent/test_agent_data_schema.py

+        data = json.load(f)
+    result = ExtractedData.from_extraction_result(ExtractRun.parse_obj(data), Capacitor)
+    assert result.field_metadata == {
+        "dimensions": {


pydantic was converting this to "dimensions": ExtractedFieldMetadata(None, None, None, None)

zhaotai · 2025-08-13T17:40:04Z

py/llama_cloud_services/beta/agent_data/schema.py

@@ -203,7 +203,7 @@ class ExtractedFieldMetadata(BaseModel):


 ExtractedFieldMetaDataDict = Dict[
-    str, Union[ExtractedFieldMetadata, Dict[str, Any], list[Any]]
+    str, Union[Dict[str, Any], ExtractedFieldMetadata, list[Any]]


interesting, why? I'm just curious.

Re-order args so that pydantic doesn't parse nested dict to a empty e…

1dd0eba

…xtraction result

adrianlyjak force-pushed the adrian/fix-lost-dimension branch from 0f33e62 to 1dd0eba Compare August 13, 2025 17:27

adrianlyjak commented Aug 13, 2025

adrianlyjak marked this pull request as ready for review August 13, 2025 17:28

bump version

9ba6afb

zhaotai approved these changes Aug 13, 2025

Solve this a better way

d63d610

adrianlyjak force-pushed the adrian/fix-lost-dimension branch from 08b6315 to d63d610 Compare August 13, 2025 18:30

Use a citations array instead

625cc82

adrianlyjak force-pushed the adrian/fix-lost-dimension branch from 0912614 to 625cc82 Compare August 13, 2025 19:22

version bump ts to 0.3.2

40fa19d

adrianlyjak merged commit 79fe193 into main Aug 13, 2025
12 of 14 checks passed
