ornladios · anagainaru · Nov 3, 2025 · Nov 3, 2025 · Nov 3, 2025
diff --git a/docs/user_guide/source/advanced/performance.rst b/docs/user_guide/source/advanced/performance.rst
@@ -0,0 +1,124 @@
+######################
+ Performance profiling
+######################
+
+
+ADIOS2 provides built-in performance profiling capabilities to help users understand the runtime behavior of their I/O operations and identify potential bottlenecks.
+This documentation outlines how to interpret the performance profiling features in ADIOS2 and how to enable profiling with external libraries.
+
+JSON Performance File
+------------------------------
+
+For file-based transfers, ADIOS2 enables performance profiling by default. During the execution of an ADIOS2 application, a ``bp`` folder is created that contains the data and metadata generated by the application. A ``profiling.json`` file is also written to the same folder; it holds detailed performance information about various internal operations of the ADIOS2 I/O library.
+
+The structure of the ``profiling.json`` file is a JSON array, where each element typically corresponds to the profiling information from a single MPI rank. The following is an example of the content of a ``profiling.json`` file when an ADIOS2 application is run with two MPI ranks:
+
+.. code-block:: json
+
+   [
+   { "rank":0, "start":"Wed_Dec_06_10:53:10_2023","ES_meta1_gather_mus": 1198, "ES_meta1_gather":{"mus":1198, "nCalls":100},"ES_mus": 357129, "ES":{"mus":357129, "nCalls":100},"Marshal_mus": 189057, "Marshal":{"mus":189057, "nCalls":300},"ES_meta1_mus": 1824, "ES_meta1":{"mus":1824, "nCalls":100},"ES_meta2_mus": 3190, "ES_meta2":{"mus":3190, "nCalls":100},"ES_close_mus": 1126, "ES_close":{"mus":1126, "nCalls":100},"ES_AWD_mus": 350717, "ES_AWD":{"mus":350717, "nCalls":100}, "databytes":0, "metadatabytes":0, "metametadatabytes":0, "transport_0":{"type":"File_POSIX", "wbytes":419430400, "close":{"mus":444, "nCalls":1}, "write":{"mus":233151, "nCalls":400}, "open":{"mus":1654, "nCalls":1}}, "transport_1":{"type":"File_POSIX", "wbytes":178720, "close":{"mus":364, "nCalls":1}, "write":{"mus":1807, "nCalls":704}, "open":{"mus":831, "nCalls":1}} },
+   { "rank":1, "start":"Wed_Dec_06_10:53:10_2023","ES_meta1_gather_mus": 248, "ES_meta1_gather":{"mus":248, "nCalls":100},"ES_mus": 355382, "ES":{"mus":355382, "nCalls":100},"Marshal_mus": 190353, "Marshal":{"mus":190353, "nCalls":300},"ES_meta1_mus": 431, "ES_meta1":{"mus":431, "nCalls":100},"ES_meta2_mus": 0, "ES_meta2":{"mus":0, "nCalls":100},"ES_close_mus": 739, "ES_close":{"mus":739, "nCalls":100},"ES_AWD_mus": 353988, "ES_AWD":{"mus":353988, "nCalls":100}, "databytes":0, "metadatabytes":0, "metametadatabytes":0 }
+   ]
+
+
+Each JSON object within the array provides profiling information for a specific rank and includes details such as:
+
+* **rank:** The MPI rank of the process.
+* **start:** The timestamp when profiling began for this rank.
+* **<Operation>_mus:** The total time, in microseconds, spent in a specific ADIOS2 operation (e.g., ``ES_mus`` for EndStep).
+* **<Operation>:** A dictionary containing the total time (``mus``) and the number of calls (``nCalls``) for that operation.
+* **databytes:** The total number of data bytes processed.
+* **metadatabytes:** The total number of metadata bytes processed.
+* **metametadatabytes:** The total number of meta-metadata bytes processed.
+* **transport_<id>:** Details about each transport used, including its type, the number of bytes written (``wbytes``), and the time (``mus``) and number of calls (``nCalls``) for operations such as open, close, read, and write.
+
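+The fields above can be read with any JSON parser. The following is a minimal Python sketch, not part of ADIOS2: the helper name ``summarize_operation`` is hypothetical, and the embedded sample is a trimmed copy of the two-rank example above that assumes the file parses as a JSON array. A real file would be loaded with ``json.load()`` from the ``profiling.json`` path inside the output folder.
+

```python
import json

# Two-rank sample shaped like the profiling.json example above (values copied
# from it, most keys omitted). A real file would be read with json.load().
SAMPLE = """
[
  {"rank": 0, "ES_mus": 357129, "ES": {"mus": 357129, "nCalls": 100}},
  {"rank": 1, "ES_mus": 355382, "ES": {"mus": 355382, "nCalls": 100}}
]
"""


def summarize_operation(records, operation):
    """Return {rank: (total_seconds, mean_mus_per_call)} for one operation."""
    summary = {}
    for record in records:
        entry = record.get(operation)
        if not isinstance(entry, dict) or not entry.get("nCalls"):
            continue  # operation not recorded (or never called) on this rank
        total_mus = entry["mus"]
        summary[record["rank"]] = (total_mus / 1e6, total_mus / entry["nCalls"])
    return summary


records = json.loads(SAMPLE)
for rank, (seconds, mean_mus) in sorted(summarize_operation(records, "ES").items()):
    print(f"rank {rank}: ES total {seconds:.3f} s, {mean_mus:.1f} us per call")
```

+Because the tracked keys can differ between ADIOS2 versions, the sketch skips ranks where the requested operation is missing instead of raising an error.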
+
+**Note:** The specific ADIOS2 library code regions and operations tracked within the ``profiling.json`` file can vary between different versions of ADIOS2. The keys and the level of detail provided in the JSON output might be subject to change as the library evolves.
+
+To aid in the visual analysis of I/O performance, ADIOS provides utility scripts for plotting the data contained in these JSON profile files. These scripts, located in the ``source/utils/profiler/scripts`` directory of the source tree, offer simple command-line interfaces to generate visualizations of common output metrics for each rank for a given step.
+
+The common metrics covered by the plotting scripts include **PP** (PerformPut), **PDW** (PerformDataWrite), and the EndStep (**ES**) components: **ES\_AWD**, **ES\_aggregate\_info**, **FixedMetaInfoGather**, and **MetaInfoBcast**. Volume metrics, representing the total bytes written to storage by the primary transport layer, are reported under ``transport_0.wbytes``.
+
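+As a rough illustration of the volume metric, the Python sketch below resolves a dotted key such as ``transport_0.wbytes`` in a per-rank record and converts it using the same 1048576 divisor that the profiler shell script applies to byte attributes. The helper ``lookup`` and the trimmed records are hypothetical; only the numeric values come from the example above.
+

```python
BYTES_PER_MIB = 1048576  # same divisor the profiler shell script uses for *bytes* attributes


def lookup(record, dotted_key):
    """Follow a dotted key such as 'transport_0.wbytes'; return None if absent."""
    value = record
    for part in dotted_key.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


# Trimmed per-rank records with values taken from the example above.
records = [
    {"rank": 0, "transport_0": {"type": "File_POSIX", "wbytes": 419430400}},
    {"rank": 1},  # the example output has no transport entry for this rank
]

for record in records:
    wbytes = lookup(record, "transport_0.wbytes")
    if wbytes is None:
        print(f"rank {record['rank']}: no transport_0 entry")
    else:
        print(f"rank {record['rank']}: {wbytes / BYTES_PER_MIB:.1f} MiB written")
```

+Returning ``None`` for a missing key keeps the loop usable on ranks that report no transport, as rank 1 does in the example.
+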
+Examples of how to run the scripts and the resulting output files are available in the ADIOS source directory under ``source/utils/profiler/tests``. A typical execution example plotting the first step for a profile file generated by a run of 512 ranks is shown below, demonstrating how the scripts process the attributes and generate individual rank plots (via ``plotRanks.py``) and aggregated stack plots (via ``plotStack.py``):
+
+.. code-block:: sh
+
+   $ source 1.sh ../scripts zero ../sample_data/t0/t0.json
+   Attributes: PP PDW ES ES_AWD ES_aggregate_info MetaInfoBcast FixedMetaInfoGather transport_0.wbytes
+   Processing ../sample_data/t0/t0.json, PP key= t0
+   ... (processing details truncated) ...
+   outs/t0_secs_PP -> outs/zero/t0_secs_PP
+   Data extracted, now plotting..
+   ... (plotting details truncated) ...
+   ==> plot all the times spent on rank 0: python3 ../scripts/plotStack.py  t0  --set dataDir=outs/zero  whichRank=0 plotPrefix=plots/single/ews/zero/t0
+   Script name: ../scripts/plotStack.py
+   async counter = 0, false false false
+   Finished. plots are in: plots/single/ews/zero
+
+
+External Profiling Libraries
+----------------------------
+
+ADIOS2 utilizes ``PERFSTUBS_SCOPED_TIMER`` hooks at various points within its codebase. These hooks provide a standardized mechanism for external performance analysis tools to instrument and measure the execution time of different ADIOS2 code regions.
+
+One such external library that can leverage these hooks is the **Tuning and Analysis Utilities (TAU)**. TAU is a comprehensive parallel performance analysis toolkit capable of profiling and tracing parallel programs written in various languages, including C, C++, Fortran, and Python.
+TAU can automatically detect and instrument the ``PERFSTUBS_SCOPED_TIMER`` regions within ADIOS2 for all backends.
+
+**Example TAU Output:**
+
+When TAU is used to profile an ADIOS2 application, the output might look similar to the following:
+
+.. code-block:: text
+
+   %Time   Exclusive   Inclusive      Ncalls  #threads     visits      bytes  Function Name
+   ----- ----------- ----------- ----------- --------- ---------- ---------- --------------
+   100.0       0.174   1:04.251           1         1          1   64251713  .TAU application
+   100.0   1:00.333   1:04.251           1      12490          0   64251539  int taupreload_main(int, char **, char **)
+     2.5       1,599       1,600         101       2230       <...>      15850  BP5Writer::EndStep
+     1.6       1,004       1,004       12000          0       <...>         84  MPI_Sendrecv()
+     1.4         1         902         303        202       <...>       2977  void adios2::format::BP5Serializer::Marshal(void*, const char*, adios2::DataType, std::size_t, std::size_t, const size_t*, const size_t*, const size_t*, const void*, bool, adios2::format::BufferV::BufferPos*)
+     1.4       901         901         202          0       <...>       4460  void adios2::format::GetMinMax(const void*, std::size_t, adios2::DataType, adios2::MinMaxStruct&, adios2::MemorySpace)
+
+In this example output:
+
+* **%Time:** The percentage of the total execution time spent in the function.
+* **Exclusive:** The time spent solely within the function (excluding calls to other functions).
+* **Inclusive:** The total time spent within the function, including calls to other functions.
+* **Ncalls:** The number of times the function was called.
+* **Function Name:** The name of the ADIOS2 function or code region that was instrumented.
+
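+The relationship between exclusive and inclusive time can be sketched in a few lines of Python. The exclusive values are copied from the ``Marshal`` and ``GetMinMax`` rows of the table above; the caller/callee link between the two is an assumption made only for this illustration, and the code is not a TAU API.
+

```python
# Exclusive times copied from the example table; the caller/callee link is an
# assumption made only for this illustration.
exclusive = {"Marshal": 1, "GetMinMax": 901}
children = {"Marshal": ["GetMinMax"], "GetMinMax": []}


def inclusive(region):
    """Inclusive time = own exclusive time + inclusive time of every callee."""
    return exclusive[region] + sum(inclusive(child) for child in children[region])


for region in exclusive:
    print(f"{region}: exclusive={exclusive[region]} inclusive={inclusive(region)}")
```

+Under that assumption, the computed inclusive time for ``Marshal`` matches the value reported in the table, which shows how a function with little work of its own can still have a large inclusive time.
+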
+TAU files generated from ADIOS2 applications can then be analyzed using a variety of performance analysis tools, such as the ParaProf Profile Browser or Vampir, to visualize and understand the application's behavior.
+
+More information about TAU can be found at `https://github.com/UO-OACISS/tau2 <https://github.com/UO-OACISS/tau2>`_.
+
+**Note:** The specific ADIOS2 code regions surrounded by hooks can vary between different versions of ADIOS2.
+
+Real-time Performance Monitoring
+--------------------------------
+
+The TAU performance system now offers a dedicated plugin for ADIOS2, enabling the storage of performance metrics directly within ADIOS files.
+
+When the TAU ADIOS plugin is active, performance metrics from instrumented code regions are recorded as a series of attributes and variables. These data follow a specific naming convention, providing detailed information about the measured performance events. An example of the output generated by the TAU ADIOS plugin might look like this:
+
+.. code-block:: text
+
+   string    TAU:0:0:MetaData:CPU Cores             attr = "64"
+   string    TAU:0:0:MetaData:CWD                 attr = "kokkos-simulation"
+   double    BP5Writer::EndStep / Calls
+   double    BP5Writer::EndStep / Exclusive TIME
+   double    BP5Writer::EndStep / Inclusive TIME
+   double    Kokkos::parallel_reduce / Calls
+   double    Kokkos::parallel_reduce / Exclusive TIME
+   double    Kokkos::parallel_reduce / Inclusive TIME
+   double    MPI_Sendrecv() / Calls
+   double    MPI_Sendrecv() / Exclusive TIME
+   double    MPI_Sendrecv() / Inclusive TIME
+
+Here, the attributes prefixed with ``TAU:rank:thread:MetaData:`` provide contextual information about the profiling run, such as the number of CPU cores or the current working directory.
+Subsequent variables capture performance metrics for specific code regions (e.g., ``BP5Writer::EndStep``, ``Kokkos::parallel_reduce``, ``MPI_Sendrecv()``), including the number of calls, exclusive execution time (time spent solely within the function), and inclusive execution time (total time spent within the function including calls to other functions).
+
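+The naming convention can be taken apart with plain string handling. The Python sketch below is an illustration only: the two helper functions are hypothetical, and the name layout is inferred solely from the listing above, so other TAU versions may differ.
+

```python
def parse_metadata_attribute(name):
    """Split 'TAU:<rank>:<thread>:MetaData:<key>' into (rank, thread, key)."""
    # maxsplit=4 keeps any further ':' characters inside the key itself.
    prefix, rank, thread, kind, key = name.split(":", 4)
    if prefix != "TAU" or kind != "MetaData":
        raise ValueError(f"not a TAU metadata attribute: {name!r}")
    return int(rank), int(thread), key


def parse_metric_variable(name):
    """Split '<region> / <metric>' into (region, metric)."""
    # Split on the last ' / ' so '::' inside the region name is left untouched.
    region, metric = name.rsplit(" / ", 1)
    return region, metric


print(parse_metadata_attribute("TAU:0:0:MetaData:CPU Cores"))
print(parse_metric_variable("BP5Writer::EndStep / Exclusive TIME"))
```

+Grouping variables by the region part gives one record per instrumented code region with its ``Calls``, ``Exclusive TIME``, and ``Inclusive TIME`` values.
+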
+Having TAU performance metrics stored as ADIOS files offers a couple of advantages for managing and analyzing performance data:
+
+* **Campaign Integration:** Performance files can be seamlessly integrated into campaigns alongside simulation output data.
+* **Near Real-time Streaming:** The performance metrics can be streamed in near real time using ADIOS's streaming capabilities. This enables live performance monitoring and analysis of long-running simulations, providing immediate insights into the application's behavior as it executes.
+
+A tutorial on how to use TAU with the ADIOS2 plugin can be found here (page 206): `https://users.nccs.gov/~pnorbert/ADIOS_tutorial_SC23.pdf <https://users.nccs.gov/~pnorbert/ADIOS_tutorial_SC23.pdf>`_.
+
diff --git a/docs/user_guide/source/index.rst b/docs/user_guide/source/index.rst
@@ -43,6 +43,7 @@ Funded by the `Exascale Computing Project (ECP) <https://www.exascaleproject.org
    advanced/campaign_management
    advanced/ecp_hardware
    advanced/derived_variables
+   advanced/performance
 
 .. toctree::
    :caption: Tutorials

@@ -27,12 +27,12 @@ processFile() {
     if [[ "$filePath" == *"async"* ]]; then
 	asyncKey="async"
     fi
-	
+
     echo "Processing $filePath, $attrName key= ${asyncKey}${key}"
-    
+
     if [[ $attrName == *bytes* ]]; then
 	jq -r ".[] | .$attrName"  "$filePath" | awk '{print $1/1048576}' > "${outDir}/${asyncKey}${key}_MB_${attrName}"
-    else    
+    else
 	local attrMus="${attrName}_mus"
 	local attrNCalls="${attrName}.nCalls"
 
@@ -58,13 +58,13 @@ else
     fi
     echo "Attributes: ${knownAttrs[*]}"
 
-    args=("$@")  
-    for ((i = 2; i <= $#; i++ )); do
+    args=("$@")
+    for ((i = 1; i < $#; i++ )); do
 	#currFile=$argv[i]
 	currFile="${args[$((i))]}"
 	for currAttr in "${knownAttrs[@]}"; do
 	    processFile "$currFile" "$currAttr"
-	done	
+    done
 	if [[ $currFile == *async* ]]; then
 	    for tmp in "${asyncAttrs[@]}"; do
 		echo "async file: currFile = $currFile, $tmp"