
Conversation

GeigerJ2
Contributor

@GeigerJ2 GeigerJ2 commented Aug 28, 2025

Problem

Archive creation started failing for large datasets with an OperationalError, because a limit of 65535 parameters per query is now enforced. The limit comes from a downstream upgrade of psycopg (the database driver used by SQLAlchemy for PostgreSQL) and surfaced when going from aiida v2.6.4 to v2.7.0 (further discussion in issue #6545).

Solution

This PR fixes the issue by batching the UUIDs passed to QueryBuilder query filters using the batch_iter function1. This is the same approach taken in #6907, where a similar OperationalError occurred for large archive imports with the sqlite backend. The batch size is exposed via the filter_size argument (also made available on the verdi archive create CLI endpoint). Its default value is set to 999, which seems a bit low to me, but follows the approach for archive imports (#6907).
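
To illustrate the pattern, here is a minimal sketch (not the code in this PR): the import location of batch_iter and the assumption that it yields (count, chunk) pairs follow its use in the archive import code, and the helper name is made up for illustration.

from aiida.orm import Node, QueryBuilder
from aiida.tools.archive.common import batch_iter  # assumed import location


def query_pks_for_uuids(uuids: list[str], filter_size: int = 999) -> set[int]:
    """Collect node PKs for an arbitrarily long list of UUIDs without placing
    more than `filter_size` parameters into a single query filter."""
    pks: set[int] = set()
    # batch_iter is assumed to yield (chunk_length, chunk) pairs
    for _, chunk in batch_iter(uuids, filter_size):
        qb = QueryBuilder()
        qb.append(Node, project=['id'], filters={'uuid': {'in': chunk}})
        pks.update(qb.all(flat=True))
    return pks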

The helper functions of create_archive to which batching is applied include _collect_required_entities, _stream_repo_files, _check_unsealed_nodes, and _check_node_licenses. In addition, batching also had to be applied to some graph-traversal functions in src/aiida/tools/graph/age_rules.py and src/aiida/tools/graph/graph_traversers.py. I'm not sure whether these changes have any further (performance) implications. Maybe archive creation could be benchmarked with a large real-world archive (@mbercx?), comparing v2.6.4 against this PR?

An integration test is added that imports an archive with 100k Int nodes and then exports those 100k nodes again. On-the-fly creation of 100k Int nodes takes a considerable amount of time2, so I added an archive file with 100k nodes to the new tests/data folder (which will also be used in the related PR #6991). Importing and exporting such a large archive still takes some time, so I had to increase the pytest timeout for that test... happy to learn about any better approaches (e.g. creating the DB table in the storage backend of the test profile using raw SQL 👀).

Finally, I don't think this PR is an ideal fix, as there is no guarantee that batching avoids the parameter limit if queries are combined or nested. It is rather a band-aid on a larger underlying problem: the construction of excessively large SQL statements with many IN clauses. I also tried to solve it at that level, in the internal QB implementation, in PR #6998 (see the discussion of the different approaches there). However, due to the way SQL expressions are constructed in the QB implementation, this would require a larger refactor. Given how fundamental the QB is and the implications of changes to that part of the code, I think it is (hopefully) acceptable for now to apply the fix at the level of archive creation and release it with a v2.7.2 patch, since having this feature already broken for medium-sized archives on our performance backend is critical.

Footnotes

  1. To avoid large lists in these kinds of queries: filters={'id': {'in': <large-list>}}.

  2. Maybe I'm doing something wrong here?

    _ = [orm.Int(i).store() for i in range(100_000)]
    


codecov bot commented Aug 28, 2025

Codecov Report

❌ Patch coverage is 1.06383% with 93 lines in your changes missing coverage. Please review.
✅ Project coverage is 24.88%. Comparing base (669f249) to head (10c17aa).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/aiida/tools/archive/create.py | 0.00% | 66 Missing ⚠️ |
| src/aiida/tools/graph/age_rules.py | 0.00% | 13 Missing ⚠️ |
| src/aiida/tools/graph/graph_traversers.py | 0.00% | 7 Missing ⚠️ |
| src/aiida/tools/archive/common.py | 0.00% | 6 Missing ⚠️ |
| src/aiida/tools/archive/imports.py | 0.00% | 1 Missing ⚠️ |

❗ There is a different number of reports uploaded between BASE (669f249) and HEAD (10c17aa). Click for more details.

HEAD has 1 upload less than BASE

| Flag | BASE (669f249) | HEAD (10c17aa) |
|---|---|---|
|  | 2 | 1 |
Additional details and impacted files
@@             Coverage Diff             @@
##             main    #6993       +/-   ##
===========================================
- Coverage   79.03%   24.88%   -54.14%     
===========================================
  Files         566      566               
  Lines       43675    43718       +43     
===========================================
- Hits        34514    10875    -23639     
- Misses       9161    32843    +23682     

☔ View full report in Codecov by Sentry.

Comment on lines 57 to 58
filter_size: int = 10_000,
batch_size: int = 1000,
Contributor Author


One could also use the QueryParams dataclass here that combines filter_size and batch_size, which is used for archive imports:

@dataclass
class QueryParams:
    """Parameters for executing backend queries."""

    batch_size: int
    """Batch size for streaming database rows."""

    filter_size: int
    """Maximum number of parameters allowed in a single query filter."""

However, I don't see how that really adds value here, as it just makes argument passing slightly more convoluted.

Still, I have now moved the definition to ./src/aiida/repository/common.py and, for now, construct the class in the top-level create_archive function, while leaving the private helper functions _collect_[all|required]_entities unchanged, so that they still accept batch_size and the additional filter_size:

def _collect_required_entities(
    querybuilder: QbType,
    entity_ids: dict[EntityTypes, set[int]],
    traversal_rules: dict[str, bool],
    include_authinfos: bool,
    include_comments: bool,
    include_logs: bool,
    backend: StorageBackend,
    batch_size: int,
    filter_size: int,
) -> tuple[list[tuple[int, int]], set[LinkQuadruple]]:

import_archive and create_archive function signatures:

def import_archive(
    path: Union[str, Path],
    *,
    archive_format: Optional[ArchiveFormatAbstract] = None,
    filter_size: int = 999,
    batch_size: int = 1000,
    import_new_extras: bool = True,
    merge_extras: MergeExtrasType = ('k', 'n', 'l'),
    merge_comments: MergeCommentsType = 'leave',
    include_authinfos: bool = False,
    create_group: bool = True,
    group: Optional[orm.Group] = None,
    test_run: bool = False,
    backend: Optional[StorageBackend] = None,
) -> Optional[int]:
def create_archive(
    entities: Optional[Iterable[Union[orm.Computer, orm.Node, orm.Group, orm.User]]],
    filename: Union[None, str, Path] = None,
    *,
    archive_format: Optional[ArchiveFormatAbstract] = None,
    overwrite: bool = False,
    include_comments: bool = True,
    include_logs: bool = True,
    include_authinfos: bool = False,
    allowed_licenses: Optional[Union[list, Callable]] = None,
    forbidden_licenses: Optional[Union[list, Callable]] = None,
    strip_checkpoints: bool = True,
    filter_size: int = 10_000,
    batch_size: int = 1000,
    compression: int = 6,
    test_run: bool = False,
    backend: Optional[StorageBackend] = None,
    **traversal_rules: bool,
) -> Path:
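
For context, a minimal self-contained sketch of the plumbing described above; all names other than QueryParams, batch_size, and filter_size are illustrative stand-ins, not the actual implementation.

from dataclasses import dataclass


@dataclass
class QueryParams:
    """Parameters for executing backend queries."""

    batch_size: int
    filter_size: int


def _collect_entities_sketch(batch_size: int, filter_size: int) -> None:
    """Stand-in for the private helpers, which keep receiving plain ints."""
    print(f'batch_size={batch_size}, filter_size={filter_size}')


def create_archive_sketch(*, filter_size: int = 10_000, batch_size: int = 1000) -> None:
    """Stand-in for the public entry point, which bundles the two values."""
    query_params = QueryParams(batch_size=batch_size, filter_size=filter_size)
    _collect_entities_sketch(query_params.batch_size, query_params.filter_size)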

@click.option(
'-f',
'--filter-size',
default=999,
Contributor Author


I'd have expected that one could set this higher (I went for 10k in my first implementation), but I am now using the same default as for archive imports (the original PR #6889 already used this argument to avoid parameter limits for the sqlite backend).
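
For reference, once this option is in, a user hitting the limit could raise or lower it from the CLI, e.g. `verdi archive create --all --filter-size 5000 my_export.aiida` (hypothetical invocation: --all and the positional output file are standard verdi archive create usage, and the value 5000 is arbitrary).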

@@ -27,6 +28,16 @@
EntityTypes.COMMENT: Comment,
}

@dataclass
class QueryParams:
Contributor Author


Moved here so it can also be used in ./src/aiida/repository/create.py. This can be reverted, but I don't think it's crucial.


# extract ids/uuid from initial entities
type_check(entities, Iterable, allow_none=True)
if entities is None:
group_nodes, link_data = _collect_all_entities(
querybuilder, entity_ids, include_authinfos, include_comments, include_logs, batch_size
querybuilder, entity_ids, include_authinfos, include_comments, include_logs, query_params.batch_size
Contributor Author


No filters are used in these QB calls, hence only batch_size is required.

Comment on lines +255 to +256
query_params.batch_size,
query_params.filter_size,
Contributor Author


I'm directly accessing the elements of query_params here, as I don't see any benefit in passing the whole query_params object into the helper functions over just passing batch_size and filter_size directly.

)
qres = self._querybuilder.dict()
else:
# PRCOMMENT: Maybe move `batch_iter` elsewhere? Import feels weird here?
Contributor Author


!

# Batch the query to avoid parameter limits
existing_pks = set()
operational_list = list(operational_set)
batch_size = 10000 # Stay well under 65535 parameter limit
Contributor Author


Relevant call stack here:

  File "/home/geiger_j/aiida_projects/aiida-dev/git-repos/aiida-core/src/aiida/tools/archive/create.py", line 247, in create_archive
    group_nodes, link_data = _collect_required_entities(
  File "/home/geiger_j/aiida_projects/aiida-dev/git-repos/aiida-core/src/aiida/tools/archive/create.py", line 544, in _collect_required_entities
    traverse_output = get_nodes_export(
  File "/home/geiger_j/aiida_projects/aiida-dev/git-repos/aiida-core/src/aiida/tools/graph/graph_traversers.py", line 105, in get_nodes_export
    traverse_output = traverse_graph(
  File "/home/geiger_j/aiida_projects/aiida-dev/git-repos/aiida-core/src/aiida/tools/graph/graph_traversers.py", line 245, in traverse_graph
    traceback.print_stack()
> /home/geiger_j/aiida_projects/aiida-dev/git-repos/aiida-core/src/aiida/tools/graph/graph_traversers.py(248)traverse_graph()
-> existing_pks = set()

query_nodes.append(orm.Node, project=['id'], filters={'id': {'in': operational_set}})
existing_pks = set(query_nodes.all(flat=True))
existing_pks = set()
filter_size = 10_000 # Stay well under 65535 parameter limit
Contributor Author


Relevant call stack here:

  File "/home/geiger_j/aiida_projects/aiida-dev/git-repos/aiida-core/src/aiida/tools/archive/create.py", line 247, in create_archive
    group_nodes, link_data = _collect_required_entities(
  File "/home/geiger_j/aiida_projects/aiida-dev/git-repos/aiida-core/src/aiida/tools/archive/create.py", line 544, in _collect_required_entities
    traverse_output = get_nodes_export(
  File "/home/geiger_j/aiida_projects/aiida-dev/git-repos/aiida-core/src/aiida/tools/graph/graph_traversers.py", line 105, in get_nodes_export
    traverse_output = traverse_graph(
  File "/home/geiger_j/aiida_projects/aiida-dev/git-repos/aiida-core/src/aiida/tools/graph/graph_traversers.py", line 245, in traverse_graph
    traceback.print_stack()
> /home/geiger_j/aiida_projects/aiida-dev/git-repos/aiida-core/src/aiida/tools/graph/graph_traversers.py(248)traverse_graph()
-> existing_pks = set()

Again, probably OK to have it hard-coded.
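
For clarity, a self-contained sketch of the batched existence check that these hard-coded values feed into (variable names follow the diff above; the surrounding loop is my reconstruction, not necessarily the exact code in this PR):

from aiida import orm


def existing_node_pks(operational_set: set[int], filter_size: int = 10_000) -> set[int]:
    """Return the subset of the given PKs that exist in the database, never
    putting more than `filter_size` ids into a single 'in' filter."""
    operational_list = list(operational_set)
    existing_pks: set[int] = set()
    for start in range(0, len(operational_list), filter_size):
        chunk = operational_list[start : start + filter_size]
        query_nodes = orm.QueryBuilder()
        query_nodes.append(orm.Node, project=['id'], filters={'id': {'in': chunk}})
        existing_pks.update(query_nodes.all(flat=True))
    return existing_pks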

Contributor Author


Creating and storing 50k / 100k nodes on the fly takes a considerable amount of time (I guess because for every node that is stored, a DB connection is opened, the data committed, and the connection closed again). Hence, I added the tests/data directory with two pre-created archives for testing large data import / export, as we were running into operational errors with both backends (the PR that fixed it for sqlite was #6889, and now it happened with psql), so this is something we should test regularly.

Does anybody have a better approach, or is this fine?

@GeigerJ2 GeigerJ2 marked this pull request as ready for review September 3, 2025 10:46

@pytest.mark.usefixtures('aiida_profile_clean')
@pytest.mark.timeout(600) # 10 minutes for the full test
def test_large_archive_import_export(tmp_path):
Contributor Author


Make nightly test?!

Collaborator

@danielhollas danielhollas Sep 3, 2025


Oh yeah, we definitely don't want that during the normal test suite :D

EDIT: How long does it run normally? How long does it run on current main?

@GeigerJ2
Contributor Author

GeigerJ2 commented Sep 4, 2025

Note to self:

❯ pip install aiida-core==2.6.4 -q
❯ pip freeze | grep -e aiida-core -ie sqlalchemy -e psycopg
aiida-core==2.6.4
psycopg2-binary==2.9.10
sphinx_sqlalchemy==0.2.0
SQLAlchemy==2.0.43
❯ pip uninstall SQLAlchemy aiida-core psycopg2-binary psycopg-binary psycopg -y
...
❯ pip install aiida-core==2.7.1 -q
❯ pip freeze | grep -e aiida-core -ie sqlalchemy -e psycopg
aiida-core==2.7.1
psycopg==3.2.9
psycopg-binary==3.2.9
sphinx_sqlalchemy==0.2.0
SQLAlchemy==2.0.43
