A 2D heat-stencil simulation with serial and parallel (MPI+OpenMP) implementations for HPC systems.
- Serial and parallel heat diffusion simulation
- Configurable grid sizes, heat sources, and iterations
- Periodic boundary support
- Energy logging and performance timing
- MPI/OpenMP hybrid parallelism
- Python reference implementation for validation
- Source location logging with testing option (-t)
- Shared utilities for grid assembly and visualization
- Interactive visualization of simulation results
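A 2D heat stencil repeatedly updates each cell from its neighbours while the configured sources inject energy every iteration. The NumPy sketch below assumes a simple explicit 5-point update and is illustrative only; the project's actual coefficients and boundary handling live in src/ and python_src/testing/stencil_reference.py, and the helper name `stencil_step` is made up for this example.

```python
import numpy as np

def stencil_step(grid: np.ndarray, alpha: float = 0.25) -> np.ndarray:
    """One explicit 5-point update on the interior of a halo-padded grid.

    Illustrative sketch only -- the real kernels are in src/ and in
    python_src/testing/stencil_reference.py.
    """
    new = grid.copy()
    new[1:-1, 1:-1] = grid[1:-1, 1:-1] + alpha * (
        grid[:-2, 1:-1] + grid[2:, 1:-1] +
        grid[1:-1, :-2] + grid[1:-1, 2:] - 4.0 * grid[1:-1, 1:-1]
    )
    return new

# Tiny demo: one source injecting energy into a halo-padded 100x100 plane.
plane = np.zeros((102, 102))      # 100x100 interior plus a 1-cell halo
for _ in range(50):
    plane[51, 51] += 1.0          # the source injects energy each iteration
    plane = stencil_step(plane)
print("total energy:", plane.sum())
```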
.
├── include/                  # Header files
├── src/                      # Source code (stencil_serial.c, stencil_parallel.c)
├── python_src/               # Python utilities
│   ├── testing/              # Test suite and reference implementation
│   │   ├── test_stencil.py           # Comprehensive test suite
│   │   └── stencil_reference.py      # Python reference implementation
│   └── plotting/             # Performance analysis and visualization scripts
│       ├── generate_visualizations.py  # Visualization generation script
│       ├── plot_scaling.py             # Main plotting script
│       └── stencil_utils.py            # Shared utilities for grid assembly & viz
├── slurm_files/              # HPC job scripts for Cineca and Orfeo
├── Makefile                  # Build system
└── README.md
Core Dependencies:
- GCC with OpenMP support
- OpenMPI/MPICH 4.1+
- Python 3.7+ (for testing and plotting)
Python Setup (Recommended):
# Create virtual environment
python3 -m venv .env
source .env/bin/activate
# Install required packages
pip install numpy pytest matplotlib pandas
Python Packages (for plotting and testing):
- numpy: Array operations and numerical computing
- pytest: Testing framework
- matplotlib: Plotting and visualization
- pandas: Data manipulation and CSV handling
- contourpy, cycler, fonttools, kiwisolver, packaging, pillow, pluggy, Pygments, pyparsing, python-dateutil, pytz, six, tzdata: Additional dependencies
Optional (for development):
- clangd (for LSP support)
- valgrind (for memory debugging)
# Parallel version (default)
make
# Serial version
make MODE=serial
# Clean
make clean
# Run simulation and generate visualizations
make visualize
Note: The serial version uses fixed parameters and does not support command line options.
# Serial (fixed: 100x100 grid, 4 sources, 50 iterations)
./stencil_serial
# Parallel (4 MPI tasks, default parameters: 10000x10000 grid, 1000 iterations)
mpirun -np 4 ./stencil_parallel
# Parallel with source location logging
mpirun -np 4 ./stencil_parallel -t 1
# Custom parameters
mpirun -np 4 ./stencil_parallel -x 200 -y 200 -n 100 -e 8 -p 1
The parallel version supports extensive customization:
./stencil_parallel [options]
Options:
-x <size> Grid width (default: 10000)
-y <size> Grid height (default: 10000)
-n <iter> Number of iterations (default: 1000)
-e <num> Number of heat sources (default: 4)
-E <energy> Energy per source (default: 1.0)
-p <0|1> Periodic boundaries (0=no, 1=yes)
-o <0|1> Output energy stats (0=no, 1=yes)
-t <0|1> Testing mode - save source locations
-v <level> Verbosity level
-h Show help
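Because every run parameter is exposed on the command line, simple parameter sweeps can be scripted. A hypothetical Python sketch using the flags above (the launcher, process count, and grid sizes are placeholders to adapt):

```python
import subprocess

# Hypothetical sweep over grid sizes using the documented flags.
# Adjust the launcher ("mpirun", "-np", "4") to your system or scheduler.
for size in (200, 400, 800):
    cmd = ["mpirun", "-np", "4", "./stencil_parallel",
           "-x", str(size), "-y", str(size), "-n", "100", "-o", "1"]
    print("running:", " ".join(cmd))
    subprocess.run(cmd, check=True)
```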
The serial version uses fixed parameters:
- Grid size: 100×100
- Heat sources: 4 sources at positions [(25,25), (75,25), (25,75), (75,75)]
- Iterations: 50
- Energy per source: 1.0
- Boundary conditions: Non-periodic
- Output: Energy statistics and binary dumps at each step
The parallel version has configurable parameters with the following defaults:
- Grid size: 10000×10000
- Heat sources: 4 sources (positions depend on the grid decomposition; see the sketch after this list)
- Iterations: 1000
- Energy per source: 1.0
- Boundary conditions: Non-periodic
- Output: Energy statistics (binary dumps when -o 1)
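Source positions depend on the decomposition because each MPI rank owns a rectangular patch of the global grid, and sources are distributed among the patches they fall into. A minimal sketch of an even 2D block decomposition (the helper block_decompose and the 2x2 process grid are illustrative; the actual MPI decomposition is implemented in stencil_parallel.c):

```python
def block_decompose(n_cells: int, n_blocks: int, block_index: int):
    """Return (start, size) when n_cells are split as evenly as possible
    into n_blocks contiguous chunks (illustrative helper only)."""
    base, rem = divmod(n_cells, n_blocks)
    size = base + (1 if block_index < rem else 0)
    start = block_index * base + min(block_index, rem)
    return start, size

# Example: a 10000x10000 grid on 4 ranks arranged as a 2x2 process grid.
px, py = 2, 2
for rank in range(px * py):
    rx, ry = rank % px, rank // px
    x0, nx = block_decompose(10000, px, rx)
    y0, ny = block_decompose(10000, py, ry)
    print(f"rank {rank}: x=[{x0}, {x0 + nx}), y=[{y0}, {y0 + ny})")
```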
Use the provided SLURM scripts in slurm_files/cineca/:
# Available scripts: mpi_strong_scaling, openmp_scaling, perf_test
sbatch mpi_strong_scaling # MPI strong scaling test
sbatch openmp_scaling # OpenMP scaling test
sbatch perf_test # Performance test
Key settings:
- Load appropriate GCC and OpenMPI modules
- Set OMP_PLACES=cores and OMP_PROC_BIND=spread
- Use --exclusive for full node access
Use the scripts in slurm_files/orfeo/:
sbatch mpi_strong_scaling # MPI strong scaling test with NUMA awareness
sbatch openmp_scaling # OpenMP scaling test
sbatch single_testing # Single node testing
The Orfeo scripts feature NUMA-aware rankfile generation for optimal performance.
- Pin threads to cores: export OMP_PLACES=cores
- Use thread binding: export OMP_PROC_BIND=close or spread
- Request exclusive nodes for consistent performance
- Monitor affinity: export OMP_DISPLAY_AFFINITY=TRUE
# Run all tests (compares serial vs parallel with fixed parameters)
make test
# Python tests only (activate virtual environment first)
source .env/bin/activate && pytest -v python_src/testing/
# Manual testing
./stencil_serial # Serial version
mpirun -np 4 ./stencil_parallel # Parallel version (4 MPI processes)
# Testing with source location logging
mpirun -np 4 ./stencil_parallel -t 1
The test generates several types of output files:
Standard Output:
- plane_XXXXX.bin: Serial version output per iteration
- plane_global_XXXXX.bin: Parallel version assembled output
Testing Mode Output:
- data_logging/sources_rank*.txt: Source locations for each MPI rank
- data_logging/X_plane_XXXXX.bin: Per-rank binary data for parallel validation
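To inspect these dumps outside the provided tooling, a minimal loader could look like the sketch below. The headerless, row-major, double-precision layout and the example file name are assumptions; check the dump routine in src/ or reuse python_src/plotting/stencil_utils.py, which handles assembly for the test suite.

```python
import numpy as np

def load_plane(path: str, ny: int, nx: int) -> np.ndarray:
    """Load one binary dump, assuming raw C doubles in row-major order
    with no header (verify against the dump code in src/)."""
    data = np.fromfile(path, dtype=np.float64)
    assert data.size == ny * nx, f"unexpected size {data.size} for {path}"
    return data.reshape(ny, nx)

# Hypothetical example file name and shape.
plane = load_plane("plane_00010.bin", 100, 100)
print("total energy:", plane.sum())
```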
The testing framework includes:
- Source validation: Compares source locations between C and Python implementations
- Grid assembly: Reconstructs full simulation grid from distributed MPI patches
- Energy conservation: Validates that total energy is conserved across iterations
- Parallel correctness: Ensures MPI decomposition produces correct results
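Conceptually, the energy-conservation check reduces to comparing the summed grid energy after each step with the energy injected so far. The sketch below captures that idea under the assumption of non-dissipative boundaries; the project's actual assertions and tolerances are in python_src/testing/test_stencil.py.

```python
import numpy as np

def check_energy_conservation(planes, n_sources, energy_per_source, rtol=1e-9):
    """Sketch: after step k, the total grid energy should equal the energy
    injected so far (assumes boundaries that do not dissipate energy)."""
    for k, plane in enumerate(planes, start=1):
        injected = k * n_sources * energy_per_source
        total = float(np.sum(plane))
        assert np.isclose(total, injected, rtol=rtol), (
            f"step {k}: total {total} != injected {injected}"
        )
```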
Use the comprehensive plotting script for performance analysis:
cd python_src/plotting/
# Plot with data files (provide your own CSV files)
python plot_scaling.py data_file.csv
# Save plots without displaying
python plot_scaling.py data_file.csv --no-show --save-dir results/
# Plot only MPI scaling
python plot_scaling.py --mpi-only data.csv
# Plot only OpenMP scaling
python plot_scaling.py --openmp-only data.csv
# Custom save directory
python plot_scaling.py data.csv --save-dir my_plots/
# Headless mode (no display, just save)
python plot_scaling.py data.csv --no-show --save-dir plots/
The script generates multiple performance analysis plots:
- Total Time Scaling - Overall execution time vs. number of tasks/threads
- Computation Time Scaling - Pure computation time (excludes communication)
- Communication Time Scaling - MPI communication overhead (linear scale)
- Energy Computation Time - Time spent computing energy statistics
- Speedup Comparison - Actual speedup vs. ideal linear speedup
- Efficiency Comparison - Parallel efficiency percentage
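Speedup and efficiency follow the usual definitions: speedup = T_baseline / T_p and efficiency = speedup / p. If you want to reproduce those two plots outside plot_scaling.py, a standalone sketch is shown below; the column names "tasks" and "total_time" are assumptions, so adapt them to the CSV layout your data files actually use.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Assumed CSV columns: "tasks", "total_time" -- adjust to your data files.
df = pd.read_csv("data_file.csv").sort_values("tasks")
p0, t0 = df["tasks"].iloc[0], df["total_time"].iloc[0]   # baseline run
df["speedup"] = t0 / df["total_time"]
df["efficiency"] = 100.0 * df["speedup"] / (df["tasks"] / p0)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(df["tasks"], df["speedup"], "o-", label="measured")
ax1.plot(df["tasks"], df["tasks"] / p0, "--", label="ideal")
ax1.set(xlabel="tasks", ylabel="speedup")
ax1.legend()
ax2.plot(df["tasks"], df["efficiency"], "o-")
ax2.set(xlabel="tasks", ylabel="efficiency [%]")
fig.tight_layout()
fig.savefig("speedup_efficiency.png")
```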
The plotting utilities include visualization capabilities for analyzing simulation results.
make visualize
- Heat Map Display: Color-coded energy distribution across the grid
- Source Location Markers: Visual indicators for heat source positions
- Grid Assembly: Automatically reconstructs full simulation from MPI patches
- Interactive Plots: Zoom, pan, and save capabilities
- Multiple Formats: PNG, PDF, SVG export options
This creates a time series of the heat diffusion process with source locations clearly marked.
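For a quick one-off plot outside make visualize, a single frame can be rendered directly with matplotlib. In the sketch below the file name, the 100x100 reshape, and the raw-double layout are assumptions (the marked positions are the serial defaults listed above); generate_visualizations.py and stencil_utils.py implement the project's actual pipeline.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical example: render one serial dump as a heat map.
plane = np.fromfile("plane_00010.bin", dtype=np.float64).reshape(100, 100)
sources = [(25, 25), (75, 25), (25, 75), (75, 75)]  # serial default sources

fig, ax = plt.subplots()
im = ax.imshow(plane, origin="lower", cmap="inferno")
ax.scatter([x for x, _ in sources], [y for _, y in sources],
           marker="x", c="cyan", label="sources")
fig.colorbar(im, ax=ax, label="energy")
ax.legend(loc="upper right")
fig.savefig("heatmap.png", dpi=150)
```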
Author: Jacopo Zacchigna - University HPC Final Project. Last updated: September 2025.