124 changes: 124 additions & 0 deletions docs/user_guide/source/advanced/performance.rst
@@ -0,0 +1,124 @@
######################
Performance profiling
######################


ADIOS2 provides built-in performance profiling to help users understand the runtime behavior of their I/O operations and identify potential bottlenecks.
This page describes how to interpret the built-in profiling output and how to enable profiling with external libraries.

JSON Performance File
------------------------------

For file-based transfers, ADIOS2 enables performance profiling by default. When an ADIOS2 application runs, it creates a ``bp`` folder that contains the data and metadata generated by the application. A ``profiling.json`` file is also written to the same folder; it holds detailed performance information about various internal operations of the ADIOS2 I/O library.

The structure of the ``profiling.json`` file is a JSON array, where each element typically corresponds to the profiling information from a single MPI rank. The following is an example of the content of a ``profiling.json`` file when an ADIOS2 application is run with two MPI ranks:

.. code-block:: json

   [
   { "rank":0, "start":"Wed_Dec_06_10:53:10_2023","ES_meta1_gather_mus": 1198, "ES_meta1_gather":{"mus":1198, "nCalls":100},"ES_mus": 357129, "ES":{"mus":357129, "nCalls":100},"Marshal_mus": 189057, "Marshal":{"mus":189057, "nCalls":300},"ES_meta1_mus": 1824, "ES_meta1":{"mus":1824, "nCalls":100},"ES_meta2_mus": 3190, "ES_meta2":{"mus":3190, "nCalls":100},"ES_close_mus": 1126, "ES_close":{"mus":1126, "nCalls":100},"ES_AWD_mus": 350717, "ES_AWD":{"mus":350717, "nCalls":100}, "databytes":0, "metadatabytes":0, "metametadatabytes":0, "transport_0":{"type":"File_POSIX", "wbytes":419430400, "close":{"mus":444, "nCalls":1}, "write":{"mus":233151, "nCalls":400}, "open":{"mus":1654, "nCalls":1}}, "transport_1":{"type":"File_POSIX", "wbytes":178720, "close":{"mus":364, "nCalls":1}, "write":{"mus":1807, "nCalls":704}, "open":{"mus":831, "nCalls":1}} },
   { "rank":1, "start":"Wed_Dec_06_10:53:10_2023","ES_meta1_gather_mus": 248, "ES_meta1_gather":{"mus":248, "nCalls":100},"ES_mus": 355382, "ES":{"mus":355382, "nCalls":100},"Marshal_mus": 190353, "Marshal":{"mus":190353, "nCalls":300},"ES_meta1_mus": 431, "ES_meta1":{"mus":431, "nCalls":100},"ES_meta2_mus": 0, "ES_meta2":{"mus":0, "nCalls":100},"ES_close_mus": 739, "ES_close":{"mus":739, "nCalls":100},"ES_AWD_mus": 353988, "ES_AWD":{"mus":353988, "nCalls":100}, "databytes":0, "metadatabytes":0, "metametadatabytes":0 }
   ]


Each JSON object within the array provides profiling information for a specific rank and includes details such as:

* **rank:** The MPI rank of the process.
* **start:** The timestamp when profiling began for this rank.
* **<Operation>_mus:** The total time, in microseconds, spent in a specific ADIOS2 operation (e.g., ``ES_mus`` for EndStep).
* **<Operation>:** A JSON object containing the total time in microseconds (``mus``) and the number of calls (``nCalls``) for that operation.
* **databytes:** The total number of data bytes processed.
* **metadatabytes:** The total number of metadata bytes processed.
* **metametadatabytes:** The total number of meta-metadata bytes processed.
* **transport_<id>:** Details about each transport used, including its type (e.g., ``File_POSIX``), the number of bytes written (``wbytes``), and the time and call count for operations such as open, close, read, and write.


**Note:** The specific ADIOS2 library code regions and operations tracked within the ``profiling.json`` file can vary between different versions of ADIOS2. The keys and the level of detail provided in the JSON output might be subject to change as the library evolves.
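
As a rough illustration of how these fields can be read, the following Python sketch reports the time and call count of one operation per rank. The helper name ``summarize_operation`` and the inline sample (trimmed from the two-rank example above) are assumptions made for this sketch; they are not part of ADIOS2.

```python
import json


def summarize_operation(records, operation="ES"):
    """Map each rank to (seconds, calls) for one profiled operation."""
    summary = {}
    for record in records:
        entry = record.get(operation)
        if entry is None:  # keys can differ between ADIOS2 versions
            continue
        summary[record["rank"]] = (entry["mus"] / 1e6, entry["nCalls"])
    return summary


# Inline sample trimmed from a two-rank profile; a real run would instead
# call json.load() on the profiling.json file in the output folder.
sample_text = """
[
  {"rank": 0, "ES_mus": 357129, "ES": {"mus": 357129, "nCalls": 100}},
  {"rank": 1, "ES_mus": 355382, "ES": {"mus": 355382, "nCalls": 100}}
]
"""
records = json.loads(sample_text)
es_summary = summarize_operation(records, "ES")
for rank, (seconds, calls) in sorted(es_summary.items()):
    print(f"rank {rank}: ES took {seconds:.6f} s over {calls} calls")
```

Operations that a given ADIOS2 version does not report are simply skipped, so the same function can be pointed at other keys such as ``Marshal`` or ``ES_AWD`` when they are present.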

To aid in the visual analysis of I/O performance, ADIOS2 provides utility scripts for plotting the data contained in these JSON profile files. These scripts, located in the ``source/utils/profiler/scripts`` directory of the source tree, offer simple command-line interfaces to generate visualizations of common output metrics for each rank at a given step.

The common metrics covered by the plotting scripts include **PP** (PerformPut), **PDW** (PerformDataWrite), and the EndStep (**ES**) components: **ES\_AWD**, **ES\_aggregate\_info**, **FixedMetaInfoGather**, and **MetaInfoBcast**. Volume metrics, representing the total bytes written to storage by the primary transport layer, are reported under ``transport_0.wbytes``.
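
The volume metric can be converted into more familiar units with a few lines of Python. The sketch below converts ``wbytes`` to MiB (the same division by 1048576 that ``extract.sh`` applies to byte attributes) and derives an approximate write rate from the transport's ``write`` timer. The function name ``transport_volume`` is hypothetical, and treating ``wbytes`` divided by the ``write`` time as a bandwidth is an assumption of this sketch rather than a metric defined by ADIOS2.

```python
BYTES_PER_MIB = 1048576


def transport_volume(record, name="transport_0"):
    """Return (MiB written, MiB per second) for one transport, or None."""
    transport = record.get(name)
    if transport is None:  # ranks that did not write through this transport
        return None
    mib = transport["wbytes"] / BYTES_PER_MIB
    write_seconds = transport["write"]["mus"] / 1e6
    rate = mib / write_seconds if write_seconds > 0 else float("nan")
    return mib, rate


# Values copied from the rank 0 entry of the two-rank example profile.
rank0 = {
    "rank": 0,
    "transport_0": {
        "type": "File_POSIX",
        "wbytes": 419430400,
        "write": {"mus": 233151, "nCalls": 400},
    },
}
rank1 = {"rank": 1}  # no transport entries in the example profile

volume = transport_volume(rank0)
print(f"rank 0 wrote {volume[0]:.1f} MiB at about {volume[1]:.0f} MiB/s")
print("rank 1:", transport_volume(rank1))
```

Returning ``None`` for ranks without a transport entry keeps the caller in charge of how to treat ranks that did not write data themselves.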

Examples of how to run the scripts and the resulting output files are available in the ADIOS source directory under ``source/utils/profiler/tests``. A typical execution example plotting the first step for a profile file generated by a run of 512 ranks is shown below, demonstrating how the scripts process the attributes and generate individual rank plots (via ``plotRanks.py``) and aggregated stack plots (via ``plotStack.py``):

.. code-block:: sh

   $ source 1.sh ../scripts zero ../sample_data/t0/t0.json
   Attributes: PP PDW ES ES_AWD ES_aggregate_info MetaInfoBcast FixedMetaInfoGather transport_0.wbytes
   Processing ../sample_data/t0/t0.json, PP key= t0
   ... (processing details truncated) ...
   outs/t0_secs_PP -> outs/zero/t0_secs_PP
   Data extracted, now plotting..
   ... (plotting details truncated) ...
   ==> plot all the times spent on rank 0: python3 ../scripts/plotStack.py t0 --set dataDir=outs/zero whichRank=0 plotPrefix=plots/single/ews/zero/t0
   Script name: ../scripts/plotStack.py
   async counter = 0, false false false
   Finished. plots are in: plots/single/ews/zero


External Profiling Libraries
----------------------------

ADIOS2 utilizes ``PERFSTUBS_SCOPED_TIMER`` hooks at various points within its codebase. These hooks provide a standardized mechanism for external performance analysis tools to instrument and measure the execution time of different ADIOS2 code regions.

One such external library that can leverage these hooks is the **Tuning and Analysis Utilities (TAU)**. TAU is a comprehensive parallel performance analysis toolkit capable of profiling and tracing parallel programs written in various languages, including C, C++, Fortran, and Python.
TAU can automatically detect and instrument the ``PERFSTUBS_SCOPED_TIMER`` regions within ADIOS2 for all backends.

**Example TAU Output:**

When TAU is used to profile an ADIOS2 application, the output might look similar to the following:

.. code-block:: text

   %Time  Exclusive  Inclusive  Ncalls  #threads  visits     bytes  Function Name
   -----  ---------  ---------  ------  --------  ------  --------  -------------
   100.0      0.174   1:04.251       1         1       1  64251713  .TAU application
   100.0   1:00.333   1:04.251       1     12490       0  64251539  int taupreload_main(int, char **, char **)
     2.5      1,599      1,600     101      2230   <...>     15850  BP5Writer::EndStep
     1.6      1,004      1,004   12000         0   <...>        84  MPI_Sendrecv()
     1.4          1        902     303       202   <...>      2977  void adios2::format::BP5Serializer::Marshal(void*, const char*, adios2::DataType, std::size_t, std::size_t, const size_t*, const size_t*, const size_t*, const void*, bool, adios2::format::BufferV::BufferPos*)
     1.4        901        901     202         0   <...>      4460  void adios2::format::GetMinMax(const void*, std::size_t, adios2::DataType, adios2::MinMaxStruct&, adios2::MemorySpace)

In this example output:

* **%Time:** The percentage of the total execution time spent in the function.
* **Exclusive:** The time spent solely within the function (excluding calls to other functions).
* **Inclusive:** The total time spent within the function, including calls to other functions.
* **Ncalls:** The number of times the function was called.
* **Function Name:** The name of the ADIOS2 function or code region that was instrumented.
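
The relationship between the exclusive and inclusive columns can be illustrated with a toy call tree. The sketch below is not TAU code and uses made-up region names and timings; it only assumes that the exclusive time of a region is its inclusive time minus the inclusive time of its direct children.

```python
def exclusive_times(inclusive, children):
    """Derive exclusive time per region from inclusive times and a call tree."""
    result = {}
    for region, total in inclusive.items():
        child_total = sum(inclusive[child] for child in children.get(region, []))
        result[region] = total - child_total
    return result


# Hypothetical regions: "outer" calls "middle", which in turn calls "inner".
inclusive = {"outer": 100, "middle": 60, "inner": 25}
children = {"outer": ["middle"], "middle": ["inner"]}

exclusive = exclusive_times(inclusive, children)
print(exclusive)
```

In this toy case the exclusive times add up to the inclusive time of the outermost region, which is why the exclusive column is usually the more useful one for locating where time is actually spent.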

TAU files generated from ADIOS2 applications can then be analyzed using a variety of performance analysis tools, such as the ParaProf Profile Browser or Vampir, to visualize and understand the application's behavior.

More information about TAU can be found at `https://github.com/UO-OACISS/tau2 <https://github.com/UO-OACISS/tau2>`_.

**Note:** The specific ADIOS2 code regions surrounded by hooks can vary between different versions of ADIOS2.

Real-time Performance Monitoring
--------------------------------

The TAU performance system offers a dedicated plugin for ADIOS2 that stores performance metrics directly in ADIOS files.

When the TAU ADIOS plugin is active, performance metrics from instrumented code regions are recorded as a series of attributes and variables. These data follow a specific naming convention, providing detailed information about the measured performance events. An example of the output generated by the TAU ADIOS plugin might look like this:

.. code-block:: text

   string TAU:0:0:MetaData:CPU Cores attr = "64"
   string TAU:0:0:MetaData:CWD attr = "kokkos-simulation"
   double BP5Writer::EndStep / Calls
   double BP5Writer::EndStep / Exclusive TIME
   double BP5Writer::EndStep / Inclusive TIME
   double Kokkos::parallel_reduce / Calls
   double Kokkos::parallel_reduce / Exclusive TIME
   double Kokkos::parallel_reduce / Inclusive TIME
   double MPI_Sendrecv() / Calls
   double MPI_Sendrecv() / Exclusive TIME
   double MPI_Sendrecv() / Inclusive TIME

Here, the attributes prefixed with ``TAU:<rank>:<thread>:MetaData:`` provide contextual information about the profiling run, such as the number of CPU cores or the current working directory.
The variables that follow capture performance metrics for specific code regions (e.g., ``BP5Writer::EndStep``, ``Kokkos::parallel_reduce``, ``MPI_Sendrecv()``): the number of calls, the exclusive execution time (time spent solely within the function), and the inclusive execution time (total time spent within the function, including calls to other functions).
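
This naming convention lends itself to simple parsing. The sketch below splits names of the form shown above into a code region and a metric. It works on plain strings only, and the helper ``split_metric_name`` is a hypothetical name introduced here; how the names are obtained from an ADIOS file is deliberately left out.

```python
def split_metric_name(name):
    """Split 'Region / Metric' into (region, metric); None if no separator."""
    region, separator, metric = name.rpartition(" / ")
    if not separator:  # e.g. metadata attributes, which carry no metric part
        return None
    return region.strip(), metric.strip()


# Names taken from the example listing above.
names = [
    "BP5Writer::EndStep / Calls",
    "BP5Writer::EndStep / Inclusive TIME",
    "MPI_Sendrecv() / Exclusive TIME",
    "TAU:0:0:MetaData:CPU Cores",
]

metrics_by_region = {}
for name in names:
    parsed = split_metric_name(name)
    if parsed is None:
        continue
    region, metric = parsed
    metrics_by_region.setdefault(region, []).append(metric)

print(metrics_by_region)
```

Splitting on the last ``" / "`` rather than the first keeps region names intact even if they happen to contain a slash themselves.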

Storing TAU performance metrics as ADIOS files offers two advantages for managing and analyzing performance data:

* **Campaign Integration:** Performance files can be integrated seamlessly into campaigns alongside simulation output data.
* **Near Real-time Streaming:** The performance metrics can be streamed in near real time using ADIOS's streaming capabilities. This enables live performance monitoring and analysis of long-running simulations, providing immediate insights into the application's behavior as it executes.

A tutorial on how to use TAU with the ADIOS2 plugin can be found here (page 206): `https://users.nccs.gov/~pnorbert/ADIOS_tutorial_SC23.pdf <https://users.nccs.gov/~pnorbert/ADIOS_tutorial_SC23.pdf>`_.

1 change: 1 addition & 0 deletions docs/user_guide/source/index.rst
@@ -43,6 +43,7 @@ Funded by the `Exascale Computing Project (ECP) <https://www.exascaleproject.org
advanced/campaign_management
advanced/ecp_hardware
advanced/derived_variables
advanced/performance

.. toctree::
:caption: Tutorials
12 changes: 6 additions & 6 deletions source/utils/profiler/scripts/extract.sh
@@ -27,12 +27,12 @@ processFile() {
if [[ "$filePath" == *"async"* ]]; then
asyncKey="async"
fi

echo "Processing $filePath, $attrName key= ${asyncKey}${key}"

if [[ $attrName == *bytes* ]]; then
jq -r ".[] | .$attrName" "$filePath" | awk '{print $1/1048576}' > "${outDir}/${asyncKey}${key}_MB_${attrName}"
else
else
local attrMus="${attrName}_mus"
local attrNCalls="${attrName}.nCalls"

@@ -58,13 +58,13 @@ else
fi
echo "Attributes: ${knownAttrs[*]}"

args=("$@")
for ((i = 2; i <= $#; i++ )); do
args=("$@")
for ((i = 1; i < $#; i++ )); do
#currFile=$argv[i]
currFile="${args[$((i))]}"
for currAttr in "${knownAttrs[@]}"; do
processFile "$currFile" "$currAttr"
done
done
if [[ $currFile == *async* ]]; then
for tmp in "${asyncAttrs[@]}"; do
echo "async file: currFile = $currFile, $tmp"