The code in `main` parallelises efficiently using `MPI.jl`, but on `dev` it currently works only on a single (multi-threaded) process. This needs to be fixed.