-
Notifications
You must be signed in to change notification settings - Fork 23
Add ParallelTempering #4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This looks unobtrusive to me, but the only concern I have is about adding more |
Could you add an example which uses the parallelization / multiple processes? I can't quite figure out how this is intended to be run in the parallel case. We should also think about how to unify measurements across the single CPU "regular" Sunny code and structs like This seems hard enough that we should probably have a meeting about it. |
There are several design decision to be made -- I agree we should have a meeting. |
I'm fine with changing where / how MPI is loaded. The file "PT_afm_heiseberg.jl" should be a working example, although the code is more of a skeleton that doesn't make any measurements yet, but simply collects a histogram on each process, performs replica exchanges, and prints the number of accepted exchanges. I run it from the command line like this: "mpiexec -n <nprocs> julia --project PT_afm_heisenberg.jl". |
Oh I see -- I should have read the comment you provided at the top of the script 😆. Thanks. |
What are the next steps? I'd like to get these pull requests merged quickly, and Matt can refine later. I guess the main thing is to check whether the |
b070ff3
to
d83741d
Compare
d83741d
to
a2ab49e
Compare
I think this should be good to merge now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems good! There are two lines that have to be changed pre-merge (related to the recent addition of Requires
), and a few comments I have that could be implemented if you agree with me.
It might make sense to postpone the arbitrary-measurements to another pull request though. Then, you and other users of this can experiment with what works best for you.
src/ParallelTempering.jl
Outdated
# write replica exchange rates to file for each process | ||
f = open(@sprintf("P%03d_rex.dat", replica.rank), "w") | ||
println(f, "REX accepts (down, up) = (", rex_accepts[1], " ", rex_accepts[2],"), total = ", sum(rex_accepts), " / ", N_rex) | ||
close(f) | ||
|
||
# write histogram to file for each process | ||
f = open(@sprintf("P%03d_hist.dat", replica.rank), "w") | ||
println(f, A) | ||
close(f) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be preferable for all the processes to communicate their measurements back to a master rank, which then writes them all out into a single file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also: if the analysis of the output data is going to also be done in Julia, it seems like it might be better to use Serialization
to serialize
out the BinnedArray
to make it easy to re-load (and probably uses less storage).
Though, maybe users want to do the analysis with something else, in which case plain-text makes sense. Serialized files also don't transfer between different versions of Julia / probably different versions of packages. Maybe this is a flag to set? Not sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should have a longer discussion on Zoom with Matt about various design choices? I guess it can get improved in a follow-on pull request.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The serialization sounds like a good idea. I'm open to changing the IO however makes the most sense, but had them all writing their own files at first just for my convenience. I'm open to discuss this or other ideas too
# add measurements here | ||
#... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be solved in a future pull. Ideally, a typical user wouldn't have to edit any Sunny. One idea to get arbitrary measurements in from the user:
Replica
could take a Function
called measure
, where calling measure(system)
returns a Dict
of measurements mapping a string (the measurement name) to whatever form the measurement takes. Replica
would then call this function right where you currently have # add measurements here
, and then somehow keep track of all of the measurements internally in a big Dict
. You could use Serialization
to serialize
it out into a big file at the end.
Users can then get whatever measurements they want by passing a custom measurement function.
Not sure what the pitfalls of this approach would be though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this would be nice to have some way to store arbitrary observables in each replica, like the dict you mention.
src/ParallelTempering.jl
Outdated
# even rank exch. down when rex_dir==0, up when rex_dir==1 | ||
nn_ranks = [ | ||
rank + 2*(rank%2) - 1, | ||
rank + 1 - 2*(rank%2) | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may just be me, but I think it's a bit more interpretable if nn_ranks
is your left/right neighbor for all ranks, and then you start off even ranks with rex_dir = 1
and odd ranks with rex_dir=2
.
Then, in run!
you flip with rex_dir = 3 - rex_dir
and you no longer need rex_ID
in replica_exchange!
. Might as well stay in the 1-indexing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a good idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the nice suggestion - this makes a lot more sense! I've changed it to this now
T_sched(i, N) = (10 .^(range(log10(T_min), log10(T_max), length=N)) )[i] | ||
|
||
# make replica for PT | ||
replica = Replica(MetropolisSampler(sys, 1.0, 1)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of curiosity -- what happens if the user script makes two Replica
's? Maybe it should error, if that's detect-able.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It still seems to work fine if this happens. I tried it and realized that the spin configurations become de-synchronized with the local running E, M in the Metropolis sampler if run! is called for each replica. I've added a line to fix this by resetting these quantities both at the start of each run, and after replica exchanges.
This needs to be tuned and set up for simulations, but the basic implementation is there.