|
| 1 | +######################## |
| 2 | + Performance Profiling |
| 3 | +######################## |
| 4 | + |
| 5 | + |
| 6 | +ADIOS2 provides built-in performance profiling capabilities to help users understand the runtime behavior of their I/O operations and identify potential bottlenecks. |
| 7 | +This documentation outlines how to interpret the performance profiling features in ADIOS2 and how to enable profiling with external libraries. |
| 8 | + |
| 9 | +JSON Performance File |
| 10 | +------------------------------ |
| 11 | + |
| 12 | +For file-based output, ADIOS2 enables performance profiling by default. When an ADIOS2 application runs, it creates a ``bp`` folder that contains the data and metadata generated by the application. A ``profiling.json`` file is written to the same folder. This file holds timing information for various internal operations of the ADIOS2 I/O library. |
| 13 | + |
| 14 | +The structure of the ``profiling.json`` file is a JSON array, where each element typically corresponds to the profiling information from a single MPI rank. The following is an example of the content of a ``profiling.json`` file when an ADIOS2 application is run with two MPI ranks: |
| 15 | + |
| 16 | +.. code-block:: json |
| 17 | +
|
| 18 | +   [ { "rank":0, "start":"Wed_Dec_06_10:53:10_2023","ES_meta1_gather_mus": 1198, "ES_meta1_gather":{"mus":1198, "nCalls":100},"ES_mus": 357129, "ES":{"mus":357129, "nCalls":100},"Marshal_mus": 189057, "Marshal":{"mus":189057, "nCalls":300},"ES_meta1_mus": 1824, "ES_meta1":{"mus":1824, "nCalls":100},"ES_meta2_mus": 3190, "ES_meta2":{"mus":3190, "nCalls":100},"ES_close_mus": 1126, "ES_close":{"mus":1126, "nCalls":100},"ES_AWD_mus": 350717, "ES_AWD":{"mus":350717, "nCalls":100}, "databytes":0, "metadatabytes":0, "metametadatabytes":0, "transport_0":{"type":"File_POSIX", "wbytes":419430400, "close":{"mus":444, "nCalls":1}, "write":{"mus":233151, "nCalls":400}, "open":{"mus":1654, "nCalls":1}}, "transport_1":{"type":"File_POSIX", "wbytes":178720, "close":{"mus":364, "nCalls":1}, "write":{"mus":1807, "nCalls":704}, "open":{"mus":831, "nCalls":1}} }, |
| 19 | +     { "rank":1, "start":"Wed_Dec_06_10:53:10_2023","ES_meta1_gather_mus": 248, "ES_meta1_gather":{"mus":248, "nCalls":100},"ES_mus": 355382, "ES":{"mus":355382, "nCalls":100},"Marshal_mus": 190353, "Marshal":{"mus":190353, "nCalls":300},"ES_meta1_mus": 431, "ES_meta1":{"mus":431, "nCalls":100},"ES_meta2_mus": 0, "ES_meta2":{"mus":0, "nCalls":100},"ES_close_mus": 739, "ES_close":{"mus":739, "nCalls":100},"ES_AWD_mus": 353988, "ES_AWD":{"mus":353988, "nCalls":100}, "databytes":0, "metadatabytes":0, "metametadatabytes":0 } ] |
| 20 | + |
| 21 | + |
| 22 | +Each JSON object within the array provides profiling information for a specific rank and includes details such as: |
| 23 | + |
| 24 | +* ``rank``: The MPI rank of the process. |
| 25 | +* ``start``: The timestamp when profiling began for this rank. |
| 26 | +* ``<Operation>_mus``: The total time, in microseconds, spent in a specific ADIOS2 operation (e.g., ``ES_mus`` for ``EndStep``). |
| 27 | +* ``<Operation>``: A dictionary containing the total time in microseconds (``mus``) and the number of calls (``nCalls``) for that operation. |
| 28 | +* ``databytes``: The total number of data bytes processed. |
| 29 | +* ``metadatabytes``: The total number of metadata bytes processed. |
| 30 | +* ``metametadatabytes``: The total number of meta-metadata bytes processed. |
| 31 | +* ``transport_<id>``: Details about a specific transport, including its type (``type``), the number of bytes written (``wbytes``), and the time and call count for operations such as open, close, read, and write. |
| 32 | + |
| 33 | + |
| 34 | +**Note:** The specific ADIOS2 library code regions and operations tracked within the ``profiling.json`` file can vary between different versions of ADIOS2. The keys and the level of detail provided in the JSON output might be subject to change as the library evolves. |
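
As a rough illustration of how these fields might be consumed, the following Python sketch collects the per-operation timers for each rank and prints the average time per call. The function name ``summarize_profile`` and the inline sample (trimmed from the two-rank example above) are hypothetical. The sketch assumes the file content parses as a JSON array of per-rank objects, which may not hold for every ADIOS2 version.

```python
import json


def summarize_profile(ranks):
    """Return {rank: {operation: (total_mus, n_calls)}} for timer entries."""
    summary = {}
    for entry in ranks:
        timers = {}
        for key, value in entry.items():
            # Timer entries are dictionaries holding both "mus" and "nCalls";
            # transport dictionaries and scalar fields are skipped.
            if isinstance(value, dict) and "mus" in value and "nCalls" in value:
                timers[key] = (value["mus"], value["nCalls"])
        summary[entry["rank"]] = timers
    return summary


# Sample trimmed from the two-rank example; a real run would load the
# profiling.json file from the output folder instead (path is an assumption).
sample = json.loads("""
[
 {"rank":0, "ES_mus":357129, "ES":{"mus":357129, "nCalls":100},
  "transport_0":{"type":"File_POSIX", "wbytes":419430400,
                 "write":{"mus":233151, "nCalls":400}}},
 {"rank":1, "ES_mus":355382, "ES":{"mus":355382, "nCalls":100}}
]
""")

summary = summarize_profile(sample)
for rank, timers in sorted(summary.items()):
    for name, (mus, calls) in timers.items():
        print(f"rank {rank}: {name} {mus / calls:.1f} us/call over {calls} calls")
```

Comparing the same operation across ranks in this way can hint at load imbalance, although the available keys depend on the engine and the ADIOS2 version.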
| 35 | + |
| 36 | + |
| 37 | + |
| 38 | +External Profiling Libraries |
| 39 | +--------------------------------- |
| 40 | + |
| 41 | +ADIOS2 utilizes ``PERFSTUBS_SCOPED_TIMER`` hooks at various points within its codebase. These hooks provide a standardized mechanism for external performance analysis tools to instrument and measure the execution time of different ADIOS2 code regions. |
| 42 | + |
| 43 | +One such external library that can leverage these hooks is the **Tuning and Analysis Utilities (TAU)**. TAU is a comprehensive parallel performance analysis toolkit capable of profiling and tracing parallel programs written in various languages, including C, C++, Fortran, and Python. |
| 44 | +TAU can automatically detect and instrument the ``PERFSTUBS_SCOPED_TIMER`` regions within ADIOS2 for all backends. |
| 45 | + |
| 46 | +**Example TAU Output:** |
| 47 | + |
| 48 | +When TAU is used to profile an ADIOS2 application, the output might look similar to the following: |
| 49 | + |
| 50 | +.. code-block:: text |
| 51 | +
|
| 52 | + %Time Exclusive Inclusive Ncalls #threads visits bytes Function Name |
| 53 | + ----- ----------- ----------- ----------- --------- ---------- ---------- -------------- |
| 54 | + 100.0 0.174 1:04.251 1 1 1 64251713 .TAU application |
| 55 | + 100.0 1:00.333 1:04.251 1 12490 0 64251539 int taupreload_main(int, char **, char **) |
| 56 | + 2.5 1,599 1,600 101 2230 <...> 15850 BP5Writer::EndStep |
| 57 | + 1.6 1,004 1,004 12000 0 <...> 84 MPI_Sendrecv() |
| 58 | + 1.4 1 902 303 202 <...> 2977 void adios2::format::BP5Serializer::Marshal(void*, const char*, adios2::DataType, std::size_t, std::size_t, const size_t*, const size_t*, const size_t*, const void*, bool, adios2::format::BufferV::BufferPos*) |
| 59 | + 1.4 901 901 202 0 <...> 4460 void adios2::format::GetMinMax(const void*, std::size_t, adios2::DataType, adios2::MinMaxStruct&, adios2::MemorySpace) |
| 60 | +
|
| 61 | +In this example output: |
| 62 | + |
| 63 | +* ``%Time``: The percentage of the total execution time spent in the function. |
| 64 | +* ``Exclusive``: The time spent solely within the function (excluding calls to other functions). |
| 65 | +* ``Inclusive``: The total time spent within the function, including calls to other functions. |
| 66 | +* ``Ncalls``: The number of times the function was called. |
| 67 | +* ``Function Name``: The name of the ADIOS2 function or code region that was instrumented. |
| 68 | + |
| 69 | +TAU files generated from ADIOS2 applications can then be analyzed using a variety of performance analysis tools, such as the ParaProf Profile Browser or Vampir, to visualize and understand the application's behavior. |
| 70 | + |
| 71 | +More information about TAU can be found at `https://github.com/UO-OACISS/tau2 <https://github.com/UO-OACISS/tau2>`_. |
| 72 | + |
| 73 | +**Note:** The specific ADIOS2 code regions surrounded by hooks can vary between different versions of ADIOS2. |
| 74 | + |
| 75 | +Real-time Performance Monitoring |
| 76 | +--------------------------------- |
| 77 | + |
| 78 | +The TAU performance system now offers a dedicated plugin for ADIOS2, enabling the storage of performance metrics directly within ADIOS files. |
| 79 | + |
| 80 | +When the TAU ADIOS plugin is active, performance metrics from instrumented code regions are recorded as a series of attributes and variables. These data follow a specific naming convention, providing detailed information about the measured performance events. An example of the output generated by the TAU ADIOS plugin might look like this: |
| 81 | + |
| 82 | +.. code-block:: text |
| 83 | +
|
| 84 | + string TAU:0:0:MetaData:CPU Cores attr = "64" |
| 85 | + string TAU:0:0:MetaData:CWD attr = "kokkos-simulation" |
| 86 | + double BP5Writer::EndStep / Calls |
| 87 | + double BP5Writer::EndStep / Exclusive TIME |
| 88 | + double BP5Writer::EndStep / Inclusive TIME |
| 89 | + double Kokkos::parallel_reduce / Calls |
| 90 | + double Kokkos::parallel_reduce / Exclusive TIME |
| 91 | + double Kokkos::parallel_reduce / Inclusive TIME |
| 92 | + double MPI_Sendrecv() / Calls |
| 93 | + double MPI_Sendrecv() / Exclusive TIME |
| 94 | + double MPI_Sendrecv() / Inclusive TIME |
| 95 | +
|
| 96 | +Here, the attributes named ``TAU:<rank>:<thread>:MetaData:<key>`` provide contextual information about the profiling run, such as the number of CPU cores or the current working directory. |
| 97 | +Subsequent variables capture performance metrics for specific code regions (e.g., ``BP5Writer::EndStep``, ``Kokkos::parallel_reduce``, ``MPI_Sendrecv()``), including the number of calls, exclusive execution time (time spent solely within the function), and inclusive execution time (total time spent within the function including calls to other functions). |
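
The naming convention above can be unpacked with a few lines of Python. This is a minimal sketch that works on a plain list of names copied from the example output. The helper ``group_metrics`` is hypothetical and is not part of TAU or ADIOS2, and obtaining the names from an actual ADIOS file is left out.

```python
from collections import defaultdict


def group_metrics(variable_names):
    """Group '<region> / <metric>' variable names by code region."""
    regions = defaultdict(list)
    for name in variable_names:
        # Split on the last " / " so the region name is kept intact.
        region, sep, metric = name.rpartition(" / ")
        if sep:  # names without the separator (e.g. metadata) are skipped
            regions[region].append(metric)
    return dict(regions)


names = [
    "BP5Writer::EndStep / Calls",
    "BP5Writer::EndStep / Exclusive TIME",
    "BP5Writer::EndStep / Inclusive TIME",
    "MPI_Sendrecv() / Calls",
    "TAU:0:0:MetaData:CPU Cores",
]
grouped = group_metrics(names)
for region, metrics in grouped.items():
    print(region, "->", ", ".join(metrics))
```

Grouping by region first makes it straightforward to look up the call count and the exclusive and inclusive times of one region together.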
| 98 | + |
| 99 | +Having TAU performance metrics stored as ADIOS files offers a couple of advantages for managing and analyzing performance data: |
| 100 | + |
| 101 | +* **Campaign Integration:** Performance files can be seamlessly integrated into campaigns alongside simulation output data. |
| 102 | +* **Near Real-time Streaming:** The performance metrics can be streamed in near real time using ADIOS's streaming capabilities. This enables live performance monitoring and analysis of long-running simulations, providing immediate insights into the application's behavior as it executes. |
| 103 | + |
| 104 | +A tutorial on how to use TAU with the ADIOS2 plugin can be found here (page 206): `https://users.nccs.gov/~pnorbert/ADIOS_tutorial_SC23.pdf <https://users.nccs.gov/~pnorbert/ADIOS_tutorial_SC23.pdf>`_. |
| 105 | + |