lukas-weber (Contributor) commented Aug 28, 2025

I had some time to optimize Acsint. It is much faster now, though still considerably slower than libcint. Maybe it is a useful starting point for someone who wants to use nonstandard types.

The biggest issue was excessive compilation time caused by needless specialization on the shell sizes. I fixed it by giving the shells runtime sizes instead of encoding the sizes in their types. This should also improve type stability in other parts of GaussianBasis.
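
To illustrate the problem, here is a minimal sketch with hypothetical stand-in types (`StaticShell` and `DynamicShell` are made up for this example and are not the actual GaussianBasis definitions; an `NTuple` plays the role of the `SVector`). When the size is a type parameter, every shell size is a distinct type, so a container of mixed shells has a non-concrete element type and any function taking several shells can be compiled once per combination of sizes.

```julia
# Hypothetical stand-in: the number of coefficients N is part of the type,
# as it would be with a statically sized SVector field.
struct StaticShell{N}
    l::Int
    coef::NTuple{N,Float64}
end

# Hypothetical stand-in: the number of coefficients is only known at run time.
struct DynamicShell
    l::Int
    coef::Vector{Float64}
end

static_shells = StaticShell[StaticShell(0, (1.0,)), StaticShell(1, (0.5, 0.5))]
dynamic_shells = [DynamicShell(0, [1.0]), DynamicShell(1, [0.5, 0.5])]

# StaticShell without its parameter is not a concrete type, so code that
# iterates over this vector cannot be inferred precisely.
println(isconcretetype(eltype(static_shells)))

# DynamicShell is concrete, so the same loop is type stable.
println(isconcretetype(eltype(dynamic_shells)))

# Shells of different sizes are different types in the static version,
# which is what triggers a fresh specialization per size combination.
println(typeof(static_shells[1]) == typeof(static_shells[2]))
println(typeof(dynamic_shells[1]) == typeof(dynamic_shells[2]))
```

The trade-off is that the dynamic version gives up compile-time knowledge of the sizes, which is one possible reason a runtime-sized implementation may not reach the speed of a fully specialized one.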

Simple benchmark

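using GaussianBasis

# Acsint backend with Cartesian basis functions
bset = BasisSet("cc-pvtz", "N 0 0 1\nN 0 0 -1", spherical=false, lib=:acsint)
@time ERI_2e4c(bset);  # first call, includes compilation
@time ERI_2e4c(bset);  # second call, run time only

# Default backend (libcint) for comparison
bset2 = BasisSet("cc-pvtz", "N 0 0 1\nN 0 0 -1")
@time ERI_2e4c(bset2);  # first call, includes compilation
@time ERI_2e4c(bset2);  # second call, run time only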

Before

135.987606 seconds (66.29 M allocations: 3.037 GiB, 0.38% gc time, 95.69% compilation time)
  5.489963 seconds (5.90 M allocations: 493.756 MiB, 1.76% gc time)

This PR

  2.742417 seconds (8.55 M allocations: 1.060 GiB, 4.38% gc time, 55.30% compilation time)
  1.324503 seconds (366.76 k allocations: 681.408 MiB, 6.15% gc time)

Libcint backend

  0.517454 seconds (964.76 k allocations: 146.260 MiB, 0.74% gc time, 63.17% compilation time)
  0.187572 seconds (6.70 k allocations: 99.609 MiB, 1.67% gc time)

Compatibility

The results of ERI_2e4c(bset) are the same as before up to a relative tolerance of 1e-15. This can probably be further reduced using the new cutoff parameter of generate_ERI_quartet.

One thing I found, though, is that ERI_2e4c(bset) does not match ERI_2e4c(bset2), either now or before this PR. Possibly that has something to do with spherical vs. Cartesian basis functions.
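
For reference, a comparison of this kind could be sketched as follows. This is only an assumption about how such a check might be written, not the code used for this PR; `max_rel_diff` is a hypothetical helper, and the small arrays stand in for the actual integral tensors.

```julia
# Hypothetical helper: largest elementwise relative difference between two
# arrays of the same size. Pairs where both entries are exactly zero are
# skipped to avoid dividing by zero.
function max_rel_diff(a, b)
    size(a) == size(b) || throw(DimensionMismatch("arrays must have the same size"))
    worst = 0.0
    for (x, y) in zip(a, b)
        scale = max(abs(x), abs(y))
        scale == 0 && continue
        worst = max(worst, abs(x - y) / scale)
    end
    return worst
end

# Stand-in data: the last entry differs by one unit in the last place.
old_result = [1.0, 2.0, 0.0, 4.0]
new_result = [1.0, 2.0, 0.0, nextfloat(4.0)]

println(max_rel_diff(old_result, new_result) < 1e-15)
```

Applied to `ERI_2e4c(bset)` and `ERI_2e4c(bset2)`, the same helper would require the two arrays to have equal size, which may not hold if one basis is Cartesian and the other spherical.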

Previously, CartesianShell and SphericalShell used statically
sized SVectors. This leads to an unfortunate mixture of type instability
(because in the basis they are stored in a non-concrete Vector{CartesianShell})
and massive compilation costs whenever something decides to specialize on the shell size.

The latter is the case with Acsint. It basically needs to recompile entirely for
every invocation of `generate_ERI_quartet!` because of the many combinations of shell sizes.

This commit replaces the SVectors with Vectors, making Acsint and probably also
various other parts of GaussianBasis faster to compile and more type stable.

These are various optimizations for the Acsint backend, improving
cache locality and skipping redundant calculations.

gustavojra (Member) commented

Thank you very much, @lukas-weber. I am in the middle of changing jobs, so I am very busy, but this weekend I will take some time to review this PR! At first glance it looks like great work; I just want to understand it better!
