Commit aa7fdb3

Documentation on performance with ADIOS
1 parent 4582522 commit aa7fdb3

1 file changed

+105
-0
lines changed

@@ -0,0 +1,105 @@
1+
#####################
2+
Performance profiling
3+
#####################
4+
5+
6+
ADIOS2 provides built-in performance profiling capabilities to help users understand the runtime behavior of their I/O operations and identify potential bottlenecks.
7+
This documentation outlines how to interpret the performance profiling features in ADIOS2 and how to enable profiling with external libraries.
8+
9+
JSON Performance File
10+
------------------------------
11+
12+
For file-based transfers, ADIOS2 enables performance profiling by default. When an ADIOS2 application runs, it creates a ``bp`` folder that contains the data and metadata generated by the application. A ``profiling.json`` file is also generated in the same folder; it holds detailed performance information about various internal operations of the ADIOS2 I/O library.
13+
14+
The ``profiling.json`` file is a JSON array in which each element typically holds the profiling information of a single MPI rank. The following example shows the content of a ``profiling.json`` file from an ADIOS2 application run with two MPI ranks:
15+
16+
.. code-block:: json
17+
18+
{ "rank":0, "start":"Wed_Dec_06_10:53:10_2023","ES_meta1_gather_mus": 1198, "ES_meta1_gather":{"mus":1198, "nCalls":100},"ES_mus": 357129, "ES":{"mus":357129, "nCalls":100},"Marshal_mus": 189057, "Marshal":{"mus":189057, "nCalls":300},"ES_meta1_mus": 1824, "ES_meta1":{"mus":1824, "nCalls":100},"ES_meta2_mus": 3190, "ES_meta2":{"mus":3190, "nCalls":100},"ES_close_mus": 1126, "ES_close":{"mus":1126, "nCalls":100},"ES_AWD_mus": 350717, "ES_AWD":{"mus":350717, "nCalls":100}, "databytes":0, "metadatabytes":0, "metametadatabytes":0, "transport_0":{"type":"File_POSIX", "wbytes":419430400, "close":{"mus":444, "nCalls":1}, "write":{"mus":233151, "nCalls":400}, "open":{"mus":1654, "nCalls":1}}, "transport_1":{"type":"File_POSIX", "wbytes":178720, "close":{"mus":364, "nCalls":1}, "write":{"mus":1807, "nCalls":704}, "open":{"mus":831, "nCalls":1}} },
19+
{ "rank":1, "start":"Wed_Dec_06_10:53:10_2023","ES_meta1_gather_mus": 248, "ES_meta1_gather":{"mus":248, "nCalls":100},"ES_mus": 355382, "ES":{"mus":355382, "nCalls":100},"Marshal_mus": 190353, "Marshal":{"mus":190353, "nCalls":300},"ES_meta1_mus": 431, "ES_meta1":{"mus":431, "nCalls":100},"ES_meta2_mus": 0, "ES_meta2":{"mus":0, "nCalls":100},"ES_close_mus": 739, "ES_close":{"mus":739, "nCalls":100},"ES_AWD_mus": 353988, "ES_AWD":{"mus":353988, "nCalls":100}, "databytes":0, "metadatabytes":0, "metametadatabytes":0 },
20+
21+
22+
Each JSON object within the array provides profiling information for a specific rank and includes details such as:
23+
24+
* ``rank``: The MPI rank of the process.
25+
* ``start``: The timestamp when profiling began for this rank.
26+
* ``<Operation>_mus``: The total time, in microseconds, spent in a specific ADIOS2 operation (e.g., ``ES_mus`` for ``EndStep``).
27+
* ``<Operation>``: A dictionary containing the total time in microseconds (``mus``) and the number of calls (``nCalls``) for that operation.
28+
* ``databytes``: The total number of data bytes processed.
29+
* ``metadatabytes``: The total number of metadata bytes processed.
30+
* ``metametadatabytes``: The total number of meta-metadata bytes processed.
31+
* ``transport_<id>``: Details about a specific transport, including its type, the number of bytes transferred (e.g., ``wbytes``), and the time and number of calls for operations like open, close, read, and write.
32+
33+
34+
**Note:** The ADIOS2 code regions and operations tracked in the ``profiling.json`` file can vary between ADIOS2 versions, so the keys and the level of detail in the JSON output may change as the library evolves.
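
As a rough illustration of how these fields can be consumed, the following Python sketch extracts the per-operation timers and a write rate per transport from one record. The ``SAMPLE`` string is a trimmed copy of the rank-0 record shown above, wrapped in array brackets; the helper names ``operation_timers`` and ``write_rates`` are hypothetical and not part of ADIOS2, and the sketch assumes the key layout described in the list above.

```python
import json

# Trimmed copy of the rank-0 record from the example above, wrapped in the
# array brackets that the text describes. Real files contain more keys.
SAMPLE = """
[
  {"rank": 0,
   "ES_mus": 357129, "ES": {"mus": 357129, "nCalls": 100},
   "Marshal_mus": 189057, "Marshal": {"mus": 189057, "nCalls": 300},
   "transport_0": {"type": "File_POSIX", "wbytes": 419430400,
                   "write": {"mus": 233151, "nCalls": 400}}}
]
"""


def operation_timers(record):
    """Return {operation: (total_microseconds, calls)} for one rank."""
    timers = {}
    for key, value in record.items():
        # Operation entries are dictionaries holding "mus" and "nCalls".
        # Transport entries are dictionaries too, but lack those two keys
        # at their top level, so they are skipped here.
        if isinstance(value, dict) and "mus" in value and "nCalls" in value:
            timers[key] = (value["mus"], value["nCalls"])
    return timers


def write_rates(record):
    """Return {transport: bytes written per microsecond of write time}."""
    rates = {}
    for key, value in record.items():
        if not key.startswith("transport_"):
            continue
        write_mus = value.get("write", {}).get("mus", 0)
        if write_mus:  # avoid dividing by zero when no write time is recorded
            rates[key] = value.get("wbytes", 0) / write_mus
    return rates


if __name__ == "__main__":
    for record in json.loads(SAMPLE):
        print("rank", record["rank"])
        for name, (mus, calls) in operation_timers(record).items():
            print(f"  {name}: {mus} us over {calls} calls")
        for name, rate in write_rates(record).items():
            print(f"  {name}: {rate:.1f} bytes/us")
```

Because the keys may differ between ADIOS2 versions, the sketch detects operation entries by their shape (a dictionary with ``mus`` and ``nCalls``) instead of relying on a fixed list of names.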
35+
36+
37+
38+
External Profiling Libraries
39+
---------------------------------
40+
41+
ADIOS2 utilizes ``PERFSTUBS_SCOPED_TIMER`` hooks at various points within its codebase. These hooks provide a standardized mechanism for external performance analysis tools to instrument and measure the execution time of different ADIOS2 code regions.
42+
43+
One such external library that can leverage these hooks is the **Tuning and Analysis Utilities (TAU)**. TAU is a comprehensive parallel performance analysis toolkit capable of profiling and tracing parallel programs written in various languages, including C, C++, Fortran, and Python.
44+
TAU can automatically detect and instrument the ``PERFSTUBS_SCOPED_TIMER`` regions within ADIOS2 for all backends.
45+
46+
**Example TAU Output:**
47+
48+
When TAU is used to profile an ADIOS2 application, the output might look similar to the following:
49+
50+
.. code-block:: text
51+
52+
%Time Exclusive Inclusive Ncalls #threads visits bytes Function Name
53+
----- ----------- ----------- ----------- --------- ---------- ---------- --------------
54+
100.0 0.174 1:04.251 1 1 1 64251713 .TAU application
55+
100.0 1:00.333 1:04.251 1 12490 0 64251539 int taupreload_main(int, char **, char **)
56+
2.5 1,599 1,600 101 2230 <...> 15850 BP5Writer::EndStep
57+
1.6 1,004 1,004 12000 0 <...> 84 MPI_Sendrecv()
58+
1.4 1 902 303 202 <...> 2977 void adios2::format::BP5Serializer::Marshal(void*, const char*, adios2::DataType, std::size_t, std::size_t, const size_t*, const size_t*, const size_t*, const void*, bool, adios2::format::BufferV::BufferPos*)
59+
1.4 901 901 202 0 <...> 4460 void adios2::format::GetMinMax(const void*, std::size_t, adios2::DataType, adios2::MinMaxStruct&, adios2::MemorySpace)
60+
61+
In this example output:
62+
63+
* ``%Time``: The percentage of the total execution time spent in the function.
64+
* ``Exclusive``: The time spent solely within the function (excluding calls to other functions).
65+
* ``Inclusive``: The total time spent within the function, including calls to other functions.
66+
* ``Ncalls``: The number of times the function was called.
67+
* ``Function Name``: The name of the ADIOS2 function or code region that was instrumented.
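
The relationship between the two time columns can be sketched in a few lines of Python. The numbers are taken from the ``Marshal`` and ``GetMinMax`` rows of the example output; that ``GetMinMax`` is the only instrumented callee of ``Marshal`` in that run is an assumption made for illustration, and ``exclusive_time`` is a hypothetical helper, not a TAU function.

```python
def exclusive_time(inclusive, callee_inclusive_times):
    """Exclusive time is the inclusive time minus the inclusive time
    spent in the function's direct, instrumented callees."""
    return inclusive - sum(callee_inclusive_times)


# Values copied from the example output above (units as reported by TAU).
marshal_inclusive = 902
get_min_max_inclusive = 901  # assumed to be Marshal's only instrumented callee

if __name__ == "__main__":
    # Under that assumption, this reproduces the Exclusive column for Marshal.
    print(exclusive_time(marshal_inclusive, [get_min_max_inclusive]))
```

A function with a small exclusive time and a large inclusive time, as here, spends almost all of its time in the functions it calls.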
68+
69+
The profiles and traces that TAU generates from ADIOS2 applications can then be analyzed with a variety of performance analysis tools, such as the ParaProf Profile Browser or Vampir, to visualize and understand the application's behavior.
70+
71+
More information about TAU can be found at `https://github.com/UO-OACISS/tau2 <https://github.com/UO-OACISS/tau2>`_.
72+
73+
**Note:** The specific ADIOS2 code regions surrounded by hooks can vary between different versions of ADIOS2.
74+
75+
Real-time Performance Monitoring
76+
--------------------------------
77+
78+
The TAU performance system offers a dedicated plugin for ADIOS2 that stores performance metrics directly in ADIOS files.
79+
80+
When the TAU ADIOS plugin is active, performance metrics from instrumented code regions are recorded as a series of attributes and variables. These data follow a specific naming convention, providing detailed information about the measured performance events. An example of the output generated by the TAU ADIOS plugin might look like this:
81+
82+
.. code-block:: text
83+
84+
string TAU:0:0:MetaData:CPU Cores attr = "64"
85+
string TAU:0:0:MetaData:CWD attr = "kokkos-simulation"
86+
double BP5Writer::EndStep / Calls
87+
double BP5Writer::EndStep / Exclusive TIME
88+
double BP5Writer::EndStep / Inclusive TIME
89+
double Kokkos::parallel_reduce / Calls
90+
double Kokkos::parallel_reduce / Exclusive TIME
91+
double Kokkos::parallel_reduce / Inclusive TIME
92+
double MPI_Sendrecv() / Calls
93+
double MPI_Sendrecv() / Exclusive TIME
94+
double MPI_Sendrecv() / Inclusive TIME
95+
96+
Here, the attributes prefixed with ``TAU:<rank>:<thread>:MetaData:`` provide contextual information about the profiling run, such as the number of CPU cores or the current working directory.
97+
Subsequent variables capture performance metrics for specific code regions (e.g., ``BP5Writer::EndStep``, ``Kokkos::parallel_reduce``, ``MPI_Sendrecv()``), including the number of calls, exclusive execution time (time spent solely within the function), and inclusive execution time (total time spent within the function including calls to other functions).
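
To make the naming convention concrete, the following Python sketch splits the names shown above into their parts. It works on plain strings only and does not read an ADIOS file; the helpers ``parse_metric_name`` and ``parse_metadata_name`` are hypothetical, and the sketch assumes that every name follows exactly the two patterns in the example.

```python
def parse_metric_name(name):
    """Split '<region> / <metric>' into (region, metric)."""
    # rpartition splits on the last ' / ', so a region name that itself
    # contained ' / ' would still yield the trailing metric correctly.
    region, separator, metric = name.rpartition(" / ")
    if not separator:
        raise ValueError(f"not a metric name: {name!r}")
    return region, metric


def parse_metadata_name(name):
    """Split 'TAU:<rank>:<thread>:MetaData:<key>' into (rank, thread, key)."""
    parts = name.split(":", 4)  # at most 5 fields; the key may contain ':'
    if len(parts) != 5 or parts[0] != "TAU" or parts[3] != "MetaData":
        raise ValueError(f"not a metadata name: {name!r}")
    return int(parts[1]), int(parts[2]), parts[4]


if __name__ == "__main__":
    print(parse_metric_name("BP5Writer::EndStep / Inclusive TIME"))
    print(parse_metadata_name("TAU:0:0:MetaData:CPU Cores"))
```

Grouping variables by the region part of the name gives one record per code region with its calls, exclusive time, and inclusive time.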
98+
99+
Having TAU performance metrics stored as ADIOS files offers a couple of advantages for managing and analyzing performance data:
100+
101+
* **Campaign Integration:** Performance files can be seamlessly integrated into campaigns alongside simulation output data.
102+
* **Near Real-time Streaming:** The performance metrics can be streamed in near real time using ADIOS's streaming capabilities. This enables live performance monitoring and analysis of long-running simulations, providing immediate insights into the application's behavior as it executes.
103+
104+
A tutorial on how to use TAU with the ADIOS2 plugin can be found here (page 206): `https://users.nccs.gov/~pnorbert/ADIOS_tutorial_SC23.pdf <https://users.nccs.gov/~pnorbert/ADIOS_tutorial_SC23.pdf>`_.
105+