Optimizing Acsint #26

lukas-weber · 2025-08-28T16:25:32Z

I had some time to optimize Acsint. It is much faster now, but still roughly 7x slower than libcint in the benchmark below (1.32 s vs. 0.19 s on the second call). Maybe it is a starting point for someone who wants to use nonstandard types.

The biggest issue was excessive compilation time caused by needless specialization on the shell sizes. I fixed it by storing the shell sizes at runtime instead of encoding them in the shell types. This should also benefit type stability in other parts of GaussianBasis.

Simple benchmark

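using GaussianBasis

# Acsint backend with Cartesian basis functions
bset = BasisSet("cc-pvtz", "N 0 0 1\nN 0 0 -1", spherical=false, lib=:acsint)
@time ERI_2e4c(bset);  # first call, includes compilation
@time ERI_2e4c(bset);  # second call, runtime only

# Default arguments (reported under "Libcint backend" below)
bset2 = BasisSet("cc-pvtz", "N 0 0 1\nN 0 0 -1")
@time ERI_2e4c(bset2);  # first call, includes compilation
@time ERI_2e4c(bset2);  # second call, runtime only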

Before

135.987606 seconds (66.29 M allocations: 3.037 GiB, 0.38% gc time, 95.69% compilation time)
  5.489963 seconds (5.90 M allocations: 493.756 MiB, 1.76% gc time)

This PR

  2.742417 seconds (8.55 M allocations: 1.060 GiB, 4.38% gc time, 55.30% compilation time)
  1.324503 seconds (366.76 k allocations: 681.408 MiB, 6.15% gc time)

Libcint backend

  0.517454 seconds (964.76 k allocations: 146.260 MiB, 0.74% gc time, 63.17% compilation time)
  0.187572 seconds (6.70 k allocations: 99.609 MiB, 1.67% gc time)

Compatibility

The results of ERI_2e4c(bset) are the same as before up to a relative tolerance of 1e-15. The remaining deviation can probably be reduced further using the new cutoff parameter of generate_ERI_quartet.
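A check of this kind could be sketched as follows. `max_rel_deviation` is a hypothetical helper, not part of GaussianBasis, and the toy arrays stand in for the integral arrays computed before and after the change. Whether the 1e-15 figure above was measured element-wise like this or with a norm-based comparison is not stated, so treat the definition as an assumption.

```julia
# Hypothetical helper (not part of GaussianBasis): largest element-wise
# deviation between two arrays, relative to the largest reference magnitude.
function max_rel_deviation(new, ref)
    size(new) == size(ref) || throw(DimensionMismatch("arrays differ in shape"))
    scale = maximum(abs, ref)
    return maximum(abs.(new .- ref)) / scale
end

# Toy data standing in for the old and new ERI arrays.
ref = [1.0, 0.5, 0.25]
new = ref .+ [0.0, 1e-12, 0.0]

println(max_rel_deviation(new, ref))
println(max_rel_deviation(new, ref) < 1e-11)
```

The shape check matters here: two arrays of different size cannot be compared at all, which is a different failure from a numerical deviation.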

One thing I found, though, is that ERI_2e4c(bset) does not match ERI_2e4c(bset2), neither now nor before the PR. Possibly that has something to do with spherical vs. Cartesian basis functions, since bset is built with spherical=false and bset2 is not.
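For context, a Cartesian shell of angular momentum l has (l+1)(l+2)/2 functions, while a spherical shell has 2l+1. The counts differ from l = 2 onwards, and cc-pVTZ for nitrogen contains d and f shells. If bset2 defaults to spherical functions (an assumption, not verified here), the two ERI arrays would not even have the same dimensions. A minimal sketch of the counts, with illustrative function names:

```julia
# Number of basis functions in a shell of angular momentum l.
ncartesian(l) = (l + 1) * (l + 2) ÷ 2
nspherical(l) = 2l + 1

for (name, l) in zip("spdf", 0:3)
    println(name, ": cartesian = ", ncartesian(l), ", spherical = ", nspherical(l))
end
```

Comparing `size(ERI_2e4c(bset))` with `size(ERI_2e4c(bset2))` would show directly whether this is the cause.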

Previously, CartesianShell and SphericalShell used statically sized SVectors. This led to an awkward combination of two problems. The first is type instability, because the basis stores the shells in a Vector{CartesianShell} whose element type is not concrete. The second is massive compilation cost whenever something decides to specialize on the shell size. Acsint is such a case: it effectively has to recompile from scratch for each call to `generate_ERI_quartet!` with a new combination of shell sizes, and there are many such combinations. This commit replaces the SVectors with Vectors, which makes Acsint, and probably other parts of GaussianBasis as well, faster to compile and more type stable.
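The effect can be illustrated with a minimal sketch. `StaticShell` and `DynamicShell` are hypothetical stand-ins, not the GaussianBasis definitions, and an `NTuple` plays the role of the `SVector` so that the example needs no packages.

```julia
# Hypothetical stand-ins for the shell types; not the GaussianBasis definitions.

# Size encoded in the type: each N gives a distinct concrete type.
struct StaticShell{N}
    l::Int
    coef::NTuple{N,Float64}
end

# Size stored at runtime: one concrete type covers all shells.
struct DynamicShell
    l::Int
    coef::Vector{Float64}
end

static_shells = StaticShell[StaticShell(0, (1.0,)), StaticShell(1, (0.5, 0.5))]
dynamic_shells = [DynamicShell(0, [1.0]), DynamicShell(1, [0.5, 0.5])]

# The container of static shells has a non-concrete element type,
# so code that reads from it cannot infer what it gets back.
println(isconcretetype(eltype(static_shells)))   # false
println(isconcretetype(eltype(dynamic_shells)))  # true

# Shells of different size are different types, so a function taking
# four of them is compiled once per combination of sizes.
println(typeof(static_shells[1]) == typeof(static_shells[2]))  # false
```

The trade-off is that the runtime-sized version gives up compile-time knowledge of the size, which the compiler could otherwise use for unrolling. The timings above suggest this was a good exchange for this workload.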

These are various optimizations for the Acsint backend, improving cache locality and skipping redundant calculations.

gustavojra · 2025-09-02T19:18:47Z

Thank you very much @lukas-weber. I am in the middle of changing jobs, so I am very busy. But this weekend I will take some time to review this PR! At first glance it looks like great work, I just want to understand it better!

lukas-weber added 3 commits August 28, 2025 12:12

optimizations for Acsint

daae10d

increase ERI accuracy

9e683ed
