Description
Hi! I've been doing some experiments with some rather large circuits, trying to see how far we can push contraction-path optimisation. We are using the sampler_sample API, essentially reproducing this example. We are keeping track of the memory required by each contraction path by setting the environment value CUTENSORNET_LOG_LEVEL=6 and having a look at the logs (particularly, the lines with worksizeNeeded).
At first, we left CONFIG_NUM_HYPER_SAMPLES unset and saw that worksizeNeeded decreases monotonically until the optimiser decides to stop. We wanted to give the optimiser more time to find better contraction paths, so we set CONFIG_NUM_HYPER_SAMPLES=100. With that setting, the reported worksizeNeeded no longer decreased monotonically but fluctuated across the 100 samples. In the end, the CONFIG_NUM_HYPER_SAMPLES=100 run took much longer, but it did find a worksizeNeeded lower than the default run's (a bit less than half of it).
I'm attaching the two logs, filtered to the lines containing "worksizeNeeded" via grep "worksizeNeeded" log.txt. The "_100" log corresponds to CONFIG_NUM_HYPER_SAMPLES=100, and "_0" is the default run. We're talking about petabytes of worksize needed here; as I said, we are limit testing.
worksizeNeeded_0.log
worksizeNeeded_100.log
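For anyone who wants to repeat the comparison between the two logs, here is a minimal Python sketch of the check described above. It assumes each relevant line contains the text worksizeNeeded followed by = (or :) and an integer byte count; that format is an assumption about the log output, and the helper names parse_worksizes and summarize are made up for illustration, not part of cuTensorNet.

```python
import re
import sys

# Assumed line shape: "... worksizeNeeded=<integer> ..." (":" or a space also accepted).
# This pattern is a guess at the log format, not something documented by cuTensorNet.
WORKSIZE_RE = re.compile(r"worksizeNeeded\s*[=:]?\s*(\d+)")


def parse_worksizes(lines):
    """Return the worksizeNeeded values, in the order they appear in the log."""
    sizes = []
    for line in lines:
        match = WORKSIZE_RE.search(line)
        if match:
            sizes.append(int(match.group(1)))
    return sizes


def summarize(sizes):
    """Summarise a sequence of worksizeNeeded values.

    Zero entries are counted separately and excluded from the monotonicity
    check, since it is unclear whether they are real samples.
    """
    nonzero = [s for s in sizes if s > 0]
    non_increasing = all(b <= a for a, b in zip(nonzero, nonzero[1:]))
    return {
        "samples": len(sizes),
        "zeros": len(sizes) - len(nonzero),
        "min_nonzero": min(nonzero) if nonzero else None,
        "non_increasing": non_increasing,
    }


if __name__ == "__main__":
    # Usage: python worksize_summary.py worksizeNeeded_0.log worksizeNeeded_100.log
    for path in sys.argv[1:]:
        with open(path) as handle:
            print(path, summarize(parse_worksizes(handle)))
```

Treating the zero entries separately is a deliberate choice here: until it is clear what worksizeNeeded=0 means, counting them as the best sample would make every run look optimal.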
I would like to know a couple of things:
- What is the optimiser doing when CONFIG_NUM_HYPER_SAMPLES is left to its default value?
  - In particular, how do you decide to stop?
  - Is the monotonic decrease shown in the logs just because you do not report samples that increase the worksizeNeeded, or is it using an optimisation algorithm that guarantees no sample with larger worksizeNeeded is explored?
  - Can I extend the time I leave the optimiser running for, while still using the same policy as when leaving CONFIG_NUM_HYPER_SAMPLES at its default (assuming it's actually different)?
- What is the deal with the worksizeNeeded=0 lines in the log? Are these samples that somehow failed, so that I should read that 0 as NaN?
Cheers!
EDIT: I forgot to mention, we were using cuQuantum 24.03 here.