Add ParallelTempering #4

mswwilson · 2021-10-28T17:50:29Z

This needs to be tuned and set up for simulations, but the basic implementation is there.

kbarros · 2021-11-01T17:45:42Z

This looks unobtrusive to me, but the only concern I have is about adding more using statements to the main Sunny.jl package if they are only applicable to certain use cases. In this case, I'm not sure all users want or need MPI.jl. We had a similar problem with GLMakie. Matt, what about a loader for the ParallelTempering module that does @eval using MPI at run time? Let's discuss.
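For concreteness, here is a minimal sketch of what a run-time loader could look like. The helper name `load_optional` is hypothetical and not part of Sunny, and `Dates` (a standard library) stands in for MPI so that the snippet runs without an MPI installation.

```julia
# Hypothetical helper, not part of Sunny: load an optional package on demand.
function load_optional(pkg::Symbol; into::Module=Main)
    # Build the expression `using <pkg>` and evaluate it in the target module.
    Core.eval(into, Expr(:using, Expr(:., pkg)))
    return nothing
end

# Dates stands in for MPI here so the sketch runs anywhere.
load_optional(:Dates)
println(isdefined(Main, :Dates))
```

One caveat with this approach: code that is already running when the package is loaded may need `Base.invokelatest` to call the newly loaded functions, because of Julia's world-age rules.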

ColeMiles · 2021-11-01T18:28:50Z

Could you add an example which uses the parallelization / multiple processes? I can't quite figure out how this is intended to be run in the parallel case.

We should also think about how to unify measurements across the single-CPU "regular" Sunny code and structs like WangLandau and ParallelTempering, which construct ensembles of systems. It seems like we shouldn't have every struct re-implement the internal code that does measurements, and we also need some way for users to make custom measurements without changing the internal code of these types' methods.

This seems hard enough that we should probably have a meeting about it.

kbarros · 2021-11-01T18:46:04Z

There are several design decisions to be made -- I agree we should have a meeting.

mswwilson · 2021-11-01T18:53:48Z

I'm fine with changing where / how MPI is loaded. The file "PT_afm_heisenberg.jl" should be a working example, although the code is more of a skeleton: it doesn't make any measurements yet, but simply collects a histogram on each process, performs replica exchanges, and prints the number of accepted exchanges. I run it from the command line like this: "mpiexec -n <nprocs> julia --project PT_afm_heisenberg.jl".

ColeMiles · 2021-11-01T20:51:05Z

Oh I see -- I should have read the comment you provided at the top of the script 😆. Thanks.

kbarros · 2021-11-03T17:40:26Z

What are the next steps? I'd like to get these pull requests merged quickly, and Matt can refine later. I guess the main thing is to check whether the using MPI statement will be a problem for ordinary users, and whether we can instead @eval using MPI at run time.

kbarros · 2021-11-09T05:16:13Z

I think this should be good to merge now.

ColeMiles

Seems good! There are two lines that have to be changed pre-merge (related to the recent addition of Requires), and a few comments I have that could be implemented if you agree with me.

It might make sense to postpone the arbitrary-measurements to another pull request though. Then, you and other users of this can experiment with what works best for you.

src/ParallelTempering.jl

examples/PT_afm_heisenberg.jl

ColeMiles · 2021-11-09T06:02:08Z

src/ParallelTempering.jl

+    # write replica exchange rates to file for each process
+    f = open(@sprintf("P%03d_rex.dat", replica.rank), "w")
+    println(f, "REX accepts (down, up) = (", rex_accepts[1], " ", rex_accepts[2],"), total = ", sum(rex_accepts), " / ", N_rex)
+    close(f)
+
+    # write histogram to file for each process
+    f = open(@sprintf("P%03d_hist.dat", replica.rank), "w")
+    println(f, A)
+    close(f)


Would it be preferable for all the processes to communicate their measurements back to a master rank, which then writes them all out into a single file?

Also: if the analysis of the output data is going to also be done in Julia, it seems like it might be better to use Serialization to serialize out the BinnedArray to make it easy to re-load (and probably uses less storage).

Though, maybe users want to do the analysis with something else, in which case plain-text makes sense. Serialized files also don't transfer between different versions of Julia / probably different versions of packages. Maybe this is a flag to set? Not sure.
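A minimal sketch of the Serialization round trip, for reference. The `Dict` below is only a stand-in for the per-rank histogram (the real code would pass its BinnedArray), and the file name is illustrative.

```julia
using Serialization

# Stand-in for a per-rank histogram; keys and values are made up for illustration.
hist = Dict("bin_edges" => [-2.0, -1.0, 0.0], "counts" => [3, 7])

# Write to a temporary directory so the sketch leaves no files behind.
path = joinpath(mktempdir(), "P000_hist.jls")
serialize(path, hist)          # binary, Julia-specific format
restored = deserialize(path)   # reload in a later analysis session

println(restored["counts"] == hist["counts"])
```

The portability caveat above still applies: the format is not guaranteed to be readable across Julia versions, so a plain-text path is worth keeping if the data needs to outlive the session that wrote it.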

Maybe we should have a longer discussion on Zoom with Matt about various design choices? I guess it can get improved in a follow-on pull request.

The serialization sounds like a good idea. I'm open to changing the IO to whatever makes the most sense; I had each process write its own files at first just for my convenience. I'm open to discussing this or other ideas too.

ColeMiles · 2021-11-09T06:08:03Z

src/ParallelTempering.jl

+            # add measurements here 
+            #...


This could be solved in a future pull request. Ideally, a typical user wouldn't have to edit any Sunny code. One idea to get arbitrary measurements in from the user:

Replica could take a Function called measure, where calling measure(system) returns a Dict of measurements mapping a string (the measurement name) to whatever form the measurement takes. Replica would then call this function right where you currently have # add measurements here, and then somehow keep track of all of the measurements internally in a big Dict. You could use Serialization to serialize it out into a big file at the end.

Users can then get whatever measurements they want by passing a custom measurement function.

Not sure what the pitfalls of this approach would be though.
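A rough sketch of the idea, to make it concrete. All names here (`record!`, `store`, the toy spin vector) are hypothetical and do not reflect Replica's actual fields or API.

```julia
# Hypothetical accumulator: call the user's measure function and append each
# named result to a per-measurement history.
function record!(store::Dict{String,Vector{Any}}, measure::Function, system)
    for (name, value) in measure(system)
        push!(get!(store, name, Any[]), value)
    end
    return store
end

# Toy "system": a vector of Ising-like spins standing in for a real system.
spins = [1, -1, 1, 1]

# User-supplied function returning a Dict of named measurements.
measure(sys) = Dict("magnetization" => sum(sys) / length(sys),
                    "n_up" => count(==(1), sys))

store = Dict{String,Vector{Any}}()
record!(store, measure, spins)    # would be called at "# add measurements here"
record!(store, measure, -spins)
println(store)
```

One likely pitfall is visible already: storing values as `Vector{Any}` is flexible but type-unstable, so heavy measurements inside a hot loop may want a typed container instead.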

Yeah, it would be nice to have some way to store arbitrary observables in each replica, like the Dict you mention.

ColeMiles · 2021-11-09T06:32:19Z

src/ParallelTempering.jl

+    # even rank exch. down when rex_dir==0, up when rex_dir==1
+    nn_ranks = [
+        rank + 2*(rank%2) - 1,
+        rank + 1 - 2*(rank%2)
+    ]


This may just be me, but I think it's a bit more interpretable if nn_ranks is your left/right neighbor for all ranks, and then you start off even ranks with rex_dir = 1 and odd ranks with rex_dir=2.

Then, in run! you flip with rex_dir = 3 - rex_dir and you no longer need rex_ID in replica_exchange!. Might as well stay in the 1-indexing.
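To illustrate the pairing, here is a small stand-alone sketch that simulates the ranks in a single process (no MPI). The helper names are made up for the example; ranks are 0-based as in MPI, and `rex_dir` is 1 for the left neighbor and 2 for the right.

```julia
# Left/right neighbors are the same formula for every rank.
neighbors(rank) = (rank - 1, rank + 1)

# Even ranks start by looking left (1), odd ranks by looking right (2).
initial_dir(rank) = iseven(rank) ? 1 : 2

# Flipping between 1 and 2 keeps everything in 1-based indexing.
flip(rex_dir) = 3 - rex_dir

# Exchange partner for this round, or `nothing` at the ends of the chain.
function partner(rank, rex_dir, nprocs)
    p = neighbors(rank)[rex_dir]
    return 0 <= p < nprocs ? p : nothing
end

nprocs = 4
dirs = [initial_dir(r) for r in 0:nprocs-1]
round1 = [partner(r, dirs[r+1], nprocs) for r in 0:nprocs-1]
dirs = flip.(dirs)
round2 = [partner(r, dirs[r+1], nprocs) for r in 0:nprocs-1]
println(round1)
println(round2)
```

In each round every rank that has a partner is also that partner's partner, so the send and receive sides always agree on who exchanges with whom.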

This seems like a good idea.

Thanks for the nice suggestion - this makes a lot more sense! I've changed it to this now.

ColeMiles · 2021-11-09T06:33:48Z

examples/PT_afm_heisenberg.jl

+T_sched(i, N) = (10 .^(range(log10(T_min), log10(T_max), length=N)) )[i]
+
+# make replica for PT
+replica = Replica(MetropolisSampler(sys, 1.0, 1))


Out of curiosity -- what happens if the user script makes two Replicas? Maybe it should error, if that's detectable.

It still seems to work fine if this happens. When I tried it, though, I realized that the spin configurations become de-synchronized from the local running E, M in the Metropolis sampler if run! is called for each replica. I've added a line to fix this by resetting these quantities both at the start of each run and after replica exchanges.


mswwilson requested review from kbarros and ColeMiles October 28, 2021 17:50

kbarros force-pushed the main branch from a5232d7 to b0ec4dd Compare November 4, 2021 01:16

kbarros force-pushed the parallel-tempering branch 2 times, most recently from b070ff3 to d83741d Compare November 9, 2021 04:56

Add ParallelTempering

a2ab49e

kbarros force-pushed the parallel-tempering branch from d83741d to a2ab49e Compare November 9, 2021 04:58

kbarros self-assigned this Nov 9, 2021

kbarros removed their request for review November 9, 2021 05:15

kbarros removed their assignment Nov 9, 2021

ColeMiles reviewed Nov 9, 2021


kbarros and others added 2 commits November 9, 2021 10:45

Resolve some issues Cole found.

7d491a0

Switched exchanges to 1-based indexing, fixed running E,M values for …

6843d2e

…sampler

kbarros merged commit 5160f86 into main Nov 13, 2021

kbarros deleted the parallel-tempering branch November 13, 2021 22:18
