This is the hipBone repository. hipBone is a GPU port of the original proxy
application called Nekbone.
It solves a screened Poisson equation in a box using a conjugate gradient method.
hipBone has three build prerequisites:
- A compiler that supports C++17;
- An MPI stack. Any will work;
- OpenBLAS.
MPI and OpenBLAS can be installed with whatever package manager your
operating system provides.
To build and run hipBone in one step, use the included run.sh script. It
builds the third-party OCCA library, then builds hipBone, runs
several problem sizes, and prints figures of merit.
To build hipBone manually:
$ git clone --recursive <hipBone repo>
$ cd /path/to/hipBone
$ export OPENBLAS_DIR=/path/to/openblas
$ make -j `nproc`
Here is an example CORAL-2 problem size that you can run on one GPU:
$ mpirun -np 1 ./hipBone -m HIP -nx 24 -ny 24 -nz 24 -p 14
Here is the meaning of each of the command line options:
- nx: the number of spectral elements in the x-direction per MPI rank
- ny: the number of spectral elements in the y-direction per MPI rank
- nz: the number of spectral elements in the z-direction per MPI rank
- p: the order of the polynomial used to approximate the solution
- m: the mode to run OCCA in; HIP is for AMD GPUs, but CUDA and Serial are also supported
Running on multiple GPUs can be done by passing a larger argument to np and
specifying the number of MPI ranks in each coordinate direction:
$ mpirun -np 2 ./hipBone -m HIP -nx 24 -ny 24 -nz 24 -px 2 -py 1 -pz 1 -p 14
You must specify either:
- All of px, py, pz, or
- None of px, py, or pz.
If all of px, py and pz are specified, then the product px*py*pz must
equal the argument passed to np. If none of px, py or pz is
specified, then np must be a perfect cube and hipBone will use an equal number
of MPI ranks in each coordinate direction.
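The rank-layout rule above can be sketched as a small check to run before launching a job. This is an illustrative sketch only: rank_grid is a hypothetical helper, not part of hipBone, and it simply restates the rule in Python.

```python
from typing import Optional, Tuple


def rank_grid(np_ranks: int,
              px: Optional[int] = None,
              py: Optional[int] = None,
              pz: Optional[int] = None) -> Tuple[int, int, int]:
    """Return the (px, py, pz) grid implied by the rule above.

    Hypothetical helper for illustration; hipBone performs its own checks.
    """
    if np_ranks < 1:
        raise ValueError("np must be a positive integer")

    given = [p is not None for p in (px, py, pz)]
    if all(given):
        # All three specified: their product must match np.
        if px * py * pz != np_ranks:
            raise ValueError("px*py*pz must equal np")
        return (px, py, pz)
    if any(given):
        # Only some specified: not allowed.
        raise ValueError("specify all of px, py, pz or none of them")

    # None specified: np must be a perfect cube.
    side = round(np_ranks ** (1.0 / 3.0))
    if side ** 3 != np_ranks:
        raise ValueError("np must be a perfect cube when px, py, pz are omitted")
    return (side, side, side)
```

For example, rank_grid(2, 2, 1, 1) corresponds to the two-rank command shown earlier, while rank_grid(2) raises an error because 2 is not a perfect cube.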
To verify that the computation is correct, add the -v option to the command
line. Example output towards the end of the run may look like this:
CG: it 96, r norm 1.328996666475e-19, alpha = 5.291357e-01
CG: it 97, r norm 2.552900554560e-19, alpha = 1.990951e+00
CG: it 98, r norm 3.836827649728e-19, alpha = 3.269689e+00
CG: it 99, r norm 2.629545869383e-19, alpha = 1.509263e+00
CG: it 100, r norm 2.045530932453e-19, alpha = 8.445030e-01
hipBone: 3, 2744, 0.0249, 100, 9.08e-06, 3.7, 2.3, 1.10e+07; N, DOFs, elapsed, iterations, time per DOF, avg BW (GB/s), avg GFLOPs, DOFs*iterations/ranks*time
hipBone: NekBone FOM = 2.6 GFLOPs.
The printed value of r norm at the end of 100 CG iterations should be small.
As per the Nekbone CORAL-2 Benchmark summary:
Benchmark results are considered correct if the reported r norm is small, generally less than 1e-8, after 100 conjugate gradient iterations.
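The correctness criterion above can be checked automatically on captured output. The following is a minimal sketch, assuming the CG lines have exactly the format shown in the example output; final_r_norm and run_is_correct are hypothetical helper names, not part of hipBone.

```python
import re

# Assumed line format, taken from the example output above:
#   "CG: it 100, r norm 2.045530932453e-19, alpha = 8.445030e-01"
CG_LINE = re.compile(r"CG: it\s+(\d+), r norm\s+([0-9.eE+-]+)")


def final_r_norm(output: str):
    """Return (iteration, r_norm) from the last CG line, or None if absent."""
    last = None
    for line in output.splitlines():
        match = CG_LINE.search(line)
        if match:
            last = (int(match.group(1)), float(match.group(2)))
    return last


def run_is_correct(output: str, tol: float = 1e-8, iterations: int = 100) -> bool:
    """Apply the criterion: r norm below tol after the given iteration count."""
    result = final_r_norm(output)
    if result is None:
        return False
    it, r_norm = result
    return it == iterations and r_norm < tol
```

The output of a run with the -v option can be captured to a file or a string and passed to run_is_correct; the default tolerance of 1e-8 follows the benchmark summary quoted above.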
To clean the hipBone build objects:
$ cd /path/to/hipBone/repo
$ make realclean
Please invoke make help for more supported options.
To cite the hipBone paper (an arXiv version is available): Chalmers N., Mishra A., McDougall D., Warburton T., 2022. HipBone: A performance-portable GPU-accelerated C++ version of the NekBone benchmark.
To cite this repo directly:
@MISC{ChalmersMishraMcDougallWarburtonHipBone2022,
author = "Chalmers, N. and Mishra, A. and McDougall, D. and Warburton, T.",
title = "{HipBone}: a performance-portable GPU-accelerated C++ version of the NekBone benchmark",
year = "2022",
url = "https://github.com/paranumal/hipBone",
doi = "10.5281/zenodo.6362839",
note = "Release 1.1.0"
}