Skip to content

Add ParallelTempering #4

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Nov 13, 2021
Merged

Add ParallelTempering #4

merged 3 commits into from
Nov 13, 2021

Conversation

mswwilson
Copy link
Contributor

This needs to be tuned and set up for simulations, but the basic implementation is there.

@kbarros
Copy link
Member

kbarros commented Nov 1, 2021

This looks unobtrusive to me, but the only concern I have is about adding more using statements to the main Sunny.jl package if they are only applicable to certain use cases. In this case, I'm not sure all users want or need MPI.jl. We had a similar problem with GLMakie. Matt, what about a loader for the ParallelTempering module that does @eval using MPI at run time? Let's discuss.

@ColeMiles
Copy link
Contributor

Could you add an example which uses the parallelization / multiple processes? I can't quite figure out how this is intended to be run in the parallel case.

We should also think about how to unify measurements across the single CPU "regular" Sunny code and structs like WangLandau and the Parallel Tempering which constructs ensembles of systems. It seems like we shouldn't have every struct re-implement internal code which does measurements, and we also need some way for users to be able to make custom measurements without changing the internal code of these types' methods.

This seems hard enough that we should probably have a meeting about it.

@kbarros
Copy link
Member

kbarros commented Nov 1, 2021

There are several design decision to be made -- I agree we should have a meeting.

@mswwilson
Copy link
Contributor Author

mswwilson commented Nov 1, 2021

I'm fine with changing where / how MPI is loaded. The file "PT_afm_heiseberg.jl" should be a working example, although the code is more of a skeleton that doesn't make any measurements yet, but simply collects a histogram on each process, performs replica exchanges, and prints the number of accepted exchanges. I run it from the command line like this: "mpiexec -n <nprocs> julia --project PT_afm_heisenberg.jl".

@ColeMiles
Copy link
Contributor

Oh I see -- I should have read the comment you provided at the top of the script 😆. Thanks.

@kbarros
Copy link
Member

kbarros commented Nov 3, 2021

What are the next steps? I'd like to get these pull requests merged quickly, and Matt can refine later. I guess the main thing is to check whether the using MPI statement will be a problem for ordinary users, and if we can instead @eval using MPI at run time ?

@kbarros kbarros force-pushed the parallel-tempering branch 2 times, most recently from b070ff3 to d83741d Compare November 9, 2021 04:56
@kbarros kbarros force-pushed the parallel-tempering branch from d83741d to a2ab49e Compare November 9, 2021 04:58
@kbarros kbarros self-assigned this Nov 9, 2021
@kbarros kbarros removed their request for review November 9, 2021 05:15
@kbarros kbarros removed their assignment Nov 9, 2021
@kbarros
Copy link
Member

kbarros commented Nov 9, 2021

I think this should be good to merge now.

Copy link
Contributor

@ColeMiles ColeMiles left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems good! There are two lines that have to be changed pre-merge (related to the recent addition of Requires), and a few comments I have that could be implemented if you agree with me.

It might make sense to postpone the arbitrary-measurements to another pull request though. Then, you and other users of this can experiment with what works best for you.

Comment on lines 156 to 164
# write replica exchange rates to file for each process
f = open(@sprintf("P%03d_rex.dat", replica.rank), "w")
println(f, "REX accepts (down, up) = (", rex_accepts[1], " ", rex_accepts[2],"), total = ", sum(rex_accepts), " / ", N_rex)
close(f)

# write histogram to file for each process
f = open(@sprintf("P%03d_hist.dat", replica.rank), "w")
println(f, A)
close(f)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be preferable for all the processes to communicate their measurements back to a master rank, which then writes them all out into a single file?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also: if the analysis of the output data is going to also be done in Julia, it seems like it might be better to use Serialization to serialize out the BinnedArray to make it easy to re-load (and probably uses less storage).

Though, maybe users want to do the analysis with something else, in which case plain-text makes sense. Serialized files also don't transfer between different versions of Julia / probably different versions of packages. Maybe this is a flag to set? Not sure.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should have a longer discussion on Zoom with Matt about various design choices? I guess it can get improved in a follow-on pull request.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The serialization sounds like a good idea. I'm open to changing the IO however makes the most sense, but had them all writing their own files at first just for my convenience. I'm open to discuss this or other ideas too

Comment on lines +140 to +141
# add measurements here
#...
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be solved in a future pull. Ideally, a typical user wouldn't have to edit any Sunny. One idea to get arbitrary measurements in from the user:

Replica could take a Function called measure, where calling measure(system) returns a Dict of measurements mapping a string (the measurement name) to whatever form the measurement takes. Replica would then call this function right where you currently have # add measurements here, and then somehow keep track of all of the measurements internally in a big Dict. You could use Serialization to serialize it out into a big file at the end.

Users can then get whatever measurements they want by passing a custom measurement function.

Not sure what the pitfalls of this approach would be though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this would be nice to have some way to store arbitrary observables in each replica, like the dict you mention.

Comment on lines 31 to 35
# even rank exch. down when rex_dir==0, up when rex_dir==1
nn_ranks = [
rank + 2*(rank%2) - 1,
rank + 1 - 2*(rank%2)
]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may just be me, but I think it's a bit more interpretable if nn_ranks is your left/right neighbor for all ranks, and then you start off even ranks with rex_dir = 1 and odd ranks with rex_dir=2.

Then, in run! you flip with rex_dir = 3 - rex_dir and you no longer need rex_ID in replica_exchange!. Might as well stay in the 1-indexing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a good idea.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the nice suggestion - this makes a lot more sense! I've changed it to this now

T_sched(i, N) = (10 .^(range(log10(T_min), log10(T_max), length=N)) )[i]

# make replica for PT
replica = Replica(MetropolisSampler(sys, 1.0, 1))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity -- what happens if the user script makes two Replica's? Maybe it should error, if that's detect-able.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It still seems to work fine if this happens. I tried it and realized that the spin configurations become de-synchronized with the local running E, M in the Metropolis sampler if run! is called for each replica. I've added a line to fix this by resetting these quantities both at the start of each run, and after replica exchanges.

@kbarros kbarros merged commit 5160f86 into main Nov 13, 2021
@kbarros kbarros deleted the parallel-tempering branch November 13, 2021 22:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants