

@lipovsek-aws commented Apr 12, 2023

Adding CSV format support as discussed in #120.

cc: @sjeaugey @jbachan (I don't have permission to add reviewers.)

Example CSV

,,,,,out-of-place,out-of-place,out-of-place,out-of-place,in-place,in-place,in-place,in-place,,,,,,,,,,,,,,,,,,,
size, count, type, redop, root, time, algbw, busbw, #wrong, time, algbw, busbw, #wrong, nthreads, ngpus, minbytes, maxbytes, stepbytes, stepfactor, check, warmup_iters, iters, agg_iters, op, datatype, root, parallel_init, blocking, stream_null, timeout, cudagraph, report_cputime
(B), (elements),,,,(us),(GB/s),(GB/s),,(us),(GB/s),(GB/s),,,,,,,,,,,,,,,,,,,,
8, 2, float, sum,     -1,   71.41, 0.000112, 0.000196, 0,   71.96, 0.000111, 0.000195, 0,1,1,8,9663676416,1048576,2,1,5,100,1,sum,float,0,0,0,0,0,0,0
16, 4, float, sum,     -1,   71.23, 0.000225, 0.000393, 0,   71.36, 0.000224, 0.000392, 0,1,1,8,9663676416,1048576,2,1,5,100,1,sum,float,0,0,0,0,0,0,0
32, 8, float, sum,     -1,   71.35, 0.000448, 0.000785, 0,   71.37, 0.000448, 0.000785, 0,1,1,8,9663676416,1048576,2,1,5,100,1,sum,float,0,0,0,0,0,0,0
64, 16, float, sum,     -1,   71.44, 0.000896, 0.001568, 0,   71.42, 0.000896, 0.001568, 0,1,1,8,9663676416,1048576,2,1,5,100,1,sum,float,0,0,0,0,0,0,0
128, 32, float, sum,     -1,   71.48, 0.001791, 0.003134, 0,   71.62, 0.001787, 0.003128, 0,1,1,8,9663676416,1048576,2,1,5,100,1,sum,float,0,0,0,0,0,0,0
256, 64, float, sum,     -1,   71.51, 0.003580, 0.006265, 0,   71.63, 0.003574, 0.006254, 0,1,1,8,9663676416,1048576,2,1,5,100,1,sum,float,0,0,0,0,0,0,0
512, 128, float, sum,     -1,   70.57, 0.007255, 0.012696, 0,   70.53, 0.007259, 0.012703, 0,1,1,8,9663676416,1048576,2,1,5,100,1,sum,float,0,0,0,0,0,0,0
1024, 256, float, sum,     -1,   73.93, 0.013851, 0.024239, 0,   73.74, 0.013887, 0.024302, 0,1,1,8,9663676416,1048576,2,1,5,100,1,sum,float,0,0,0,0,0,0,0
2048, 512, float, sum,     -1,   81.55, 0.025114, 0.043950, 0,   81.20, 0.025223, 0.044140, 0,1,1,8,9663676416,1048576,2,1,5,100,1,sum,float,0,0,0,0,0,0,0
4096, 1024, float, sum,     -1,   87.83, 0.046635, 0.081612, 0,   87.63, 0.046743, 0.081799, 0,1,1,8,9663676416,1048576,2,1,5,100,1,sum,float,0,0,0,0,0,0,0
8192, 2048, float, sum,     -1,   90.92, 0.090106, 0.157685, 0,   89.59, 0.091438, 0.160016, 0,1,1,8,9663676416,1048576,2,1,5,100,1,sum,float,0,0,0,0,0,0,0
16384, 4096, float, sum,     -1,   97.53, 0.167992, 0.293987, 0,   96.76, 0.169335, 0.296336, 0,1,1,8,9663676416,1048576,2,1,5,100,1,sum,float,0,0,0,0,0,0,0
32768, 8192, float, sum,     -1,   113.9, 0.287798, 0.503647, 0,   112.4, 0.291478, 0.510087, 0,1,1,8,9663676416,1048576,2,1,5,100,1,sum,float,0,0,0,0,0,0,0
65536, 16384, float, sum,     -1,   117.9, 0.555787, 0.972628, 0,   116.8, 0.560930, 0.981628, 0,1,1,8,9663676416,1048576,2,1,5,100,1,sum,float,0,0,0,0,0,0,0
131072, 32768, float, sum,     -1,   123.8, 1.059004, 1.853257, 0,   121.8, 1.075704, 1.882482, 0,1,1,8,9663676416,1048576,2,1,5,100,1,sum,float,0,0,0,0,0,0,0
262144, 65536, float, sum,     -1,   123.2, 2.127501, 3.723126, 0,   123.0, 2.131539, 3.730194, 0,1,1,8,9663676416,1048576,2,1,5,100,1,sum,float,0,0,0,0,0,0,0
524288, 131072, float, sum,     -1,   125.9, 4.165867, 7.290268, 0,   125.0, 4.192759, 7.337327, 0,1,1,8,9663676416,1048576,2,1,5,100,1,sum,float,0,0,0,0,0,0,0
1048576, 262144, float, sum,     -1,   132.6, 7.906989, 13.837230, 0,   131.5, 7.974148, 13.954759, 0,1,1,8,9663676416,1048576,2,1,5,100,1,sum,float,0,0,0,0,0,0,0
2097152, 524288, float, sum,     -1,   138.7, 15.116383, 26.453671, 0,   139.7, 15.006742, 26.261799, 0,1,1,8,9663676416,1048576,2,1,5,100,1,sum,float,0,0,0,0,0,0,0
4194304, 1048576, float, sum,     -1,   152.6, 27.490021, 48.107536, 0,   152.0, 27.585272, 48.274223, 0,1,1,8,9663676416,1048576,2,1,5,100,1,sum,float,0,0,0,0,0,0,0
8388608, 2097152, float, sum,     -1,   178.2, 47.072647, 82.377136, 0,   176.1, 47.628494, 83.349861, 0,1,1,8,9663676416,1048576,2,1,5,100,1,sum,float,0,0,0,0,0,0,0
16777216, 4194304, float, sum,     -1,   210.7, 79.638687, 139.367706, 0,   207.9, 80.707413, 141.237961, 0,1,1,8,9663676416,1048576,2,1,5,100,1,sum,float,0,0,0,0,0,0,0
33554432, 8388608, float, sum,     -1,   348.3, 96.344292, 168.602509, 0,   348.2, 96.366364, 168.641129, 0,1,1,8,9663676416,1048576,2,1,5,100,1,sum,float,0,0,0,0,0,0,0
67108864, 16777216, float, sum,     -1,   579.1, 115.894623, 202.815598, 0,   578.7, 115.967583, 202.943268, 0,1,1,8,9663676416,1048576,2,1,5,100,1,sum,float,0,0,0,0,0,0,0
134217728, 33554432, float, sum,     -1,  1155.2, 116.185478, 203.324585, 0,  1154.5, 116.260414, 203.455734, 0,1,1,8,9663676416,1048576,2,1,5,100,1,sum,float,0,0,0,0,0,0,0
268435456, 67108864, float, sum,     -1,  2144.0, 125.200325, 219.100571, 0,  2146.4, 125.061523, 218.857666, 0,1,1,8,9663676416,1048576,2,1,5,100,1,sum,float,0,0,0,0,0,0,0
536870912, 134217728, float, sum,     -1,  4173.8, 128.629837, 225.102203, 0,  4171.8, 128.689590, 225.206787, 0,1,1,8,9663676416,1048576,2,1,5,100,1,sum,float,0,0,0,0,0,0,0
1073741824, 268435456, float, sum,     -1,  8141.3, 131.887619, 230.803329, 0,  8140.9, 131.894684, 230.815704, 0,1,1,8,9663676416,1048576,2,1,5,100,1,sum,float,0,0,0,0,0,0,0
2147483648, 536870912, float, sum,     -1,   16141, 133.042206, 232.823853, 0,   16139, 133.059082, 232.853394, 0,1,1,8,9663676416,1048576,2,1,5,100,1,sum,float,0,0,0,0,0,0,0
4294967296, 1073741824, float, sum,     -1,   31982, 134.292343, 235.011612, 0,   31984, 134.283905, 234.996841, 0,1,1,8,9663676416,1048576,2,1,5,100,1,sum,float,0,0,0,0,0,0,0
8589934592, 2147483648, float, sum,     -1,   63772, 134.697052, 235.719849, 0,   63782, 134.675430, 235.682007, 0,1,1,8,9663676416,1048576,2,1,5,100,1,sum,float,0,0,0,0,0,0,0
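
The example uses a three-row header: a group row (`out-of-place` / `in-place`), a column-name row, and a units row. A consumer that wants flat column names can merge the three rows column by column. The C++ sketch below does that, assuming unquoted, comma-separated fields as in the example above; `trim`, `splitCsv`, and `flattenHeader` are illustrative helpers, not part of nccl-tests.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Strip leading/trailing spaces; the example CSV uses ", " separators.
std::string trim(const std::string& s) {
  const std::size_t b = s.find_first_not_of(' ');
  if (b == std::string::npos) return "";
  const std::size_t e = s.find_last_not_of(' ');
  return s.substr(b, e - b + 1);
}

// Split one CSV line on commas. No quoting support: the example has none.
std::vector<std::string> splitCsv(const std::string& line) {
  std::vector<std::string> fields;
  std::string cur;
  for (char c : line) {
    if (c == ',') {
      fields.push_back(trim(cur));
      cur.clear();
    } else {
      cur += c;
    }
  }
  fields.push_back(trim(cur));
  return fields;
}

// Merge the group row (e.g. "out-of-place"), the name row (e.g. "time") and
// the unit row (e.g. "(us)") into one flat name per column. Missing or empty
// group/unit cells are simply omitted.
std::vector<std::string> flattenHeader(const std::string& groupRow,
                                       const std::string& nameRow,
                                       const std::string& unitRow) {
  const auto groups = splitCsv(groupRow);
  const auto names = splitCsv(nameRow);
  const auto units = splitCsv(unitRow);
  std::vector<std::string> out;
  for (std::size_t i = 0; i < names.size(); ++i) {
    std::string col = names[i];
    if (i < groups.size() && !groups[i].empty()) col = groups[i] + " " + col;
    if (i < units.size() && !units[i].empty()) col += " " + units[i];
    out.push_back(col);
  }
  return out;
}
```

Applied to the example header, the sixth column becomes `out-of-place time (us)`. Note that `root` appears twice in the name row (once among the results and once among the run parameters), so flattened names are not guaranteed to be unique.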

@lipovsek-aws marked this pull request as ready for review April 13, 2023 14:36
src/common.cu (outdated)
if (!isMainThread()) {
  return;
}
persistanceMode = csvName != "";

Member

I'd rather call everything outputMode instead of persistanceMode. I'm not sure "persistance" would be well understood.
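
One way to follow this suggestion is to replace the boolean with an explicit mode. The C++ sketch below is hypothetical (`OutputMode` and `selectOutputMode` are illustrative names) and assumes `csvName` is a `std::string`; if it were a `const char*`, the original `csvName != ""` would compare pointers rather than string contents.

```cpp
#include <string>

// Hypothetical replacement for the boolean persistanceMode flag.
enum class OutputMode { Stdout, Csv };

// An empty CSV file name keeps the plain stdout table; otherwise write CSV.
// Assumes csvName is a std::string, so empty() checks the contents.
OutputMode selectOutputMode(const std::string& csvName) {
  return csvName.empty() ? OutputMode::Stdout : OutputMode::Csv;
}
```

An enum also leaves room for further formats later without adding more boolean flags.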

@jbachan (Contributor) commented Apr 24, 2023

Is there a strong reason to use hierarchical columns? That seems like it might be unique to pandas; it certainly isn't universal among relational data models, and I'd prefer to keep the CSV as simple as possible.

My proposal is to have only single busbw and algbw columns, without in-place/out-of-place qualifiers, and to add an additional two-valued column indicating in-place or out-of-place. This would produce separate rows for the in-place and out-of-place runs. Example:

collective, size, datatype, redop,  placement, algbw, busbw, #wrongs
 allreduce, 1024,    float,   sum,         in,   2.0,   1.0,       0
 allreduce, 1024,    float,   sum,        out,   2.0,   1.0,       0

I think this fits the conceptual mapping of independent/dependent variables to columns better, since "in-place busbw" isn't a different measurement from "out-of-place busbw": the measurement is just "busbw", run under two different setups that are described by the independent variables.
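
A minimal C++ sketch of emitting rows in the long format proposed above, one row per placement. `ResultRow`, `csvHeader`, and `toCsvLine` are hypothetical names, and the sketch writes plain `,` separators without the alignment padding used in the example.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// One measurement in the proposed long schema: in-place and out-of-place
// runs become separate rows distinguished by the placement column.
struct ResultRow {
  std::string collective;
  std::size_t size;  // bytes
  std::string datatype;
  std::string redop;
  bool inPlace;
  double algbw;      // GB/s
  double busbw;      // GB/s
  long long wrong;   // number of wrong elements
};

std::string csvHeader() {
  return "collective,size,datatype,redop,placement,algbw,busbw,#wrongs\n";
}

// Format one row; placement is written as "in" or "out".
std::string toCsvLine(const ResultRow& r) {
  std::ostringstream os;
  os << r.collective << ',' << r.size << ',' << r.datatype << ',' << r.redop
     << ',' << (r.inPlace ? "in" : "out") << ',' << r.algbw << ','
     << r.busbw << ',' << r.wrong << '\n';
  return os.str();
}
```

For example, `toCsvLine(ResultRow{"allreduce", 1024, "float", "sum", true, 2.0, 1.0, 0})` yields `allreduce,1024,float,sum,in,2,1,0`.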

@lipovsek-aws (Author)


Good point. I wanted to keep the output as close to the stdout table as possible, and I remember working with hierarchical columns in R (I think hierarchical columns are a general table concept rather than something tied to a Python library); the idea is that users can then reshape the dataframe however they want. If no one else objects, I will move to the schema you propose and remove the units header row. Note that there are many columns you didn't include in your example (the nccl-tests run parameters), and I'll keep them. They were also part of why I didn't go further: a user has to normalize the table anyway to load it into a relational DB, so it might as well stay close to the stdout table. Still, I agree your proposal is a good middle ground.
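
The normalization step mentioned here amounts to a wide-to-long reshape: each row with side-by-side out-of-place and in-place groups becomes two rows carrying a placement value. A C++ sketch under that assumption; the struct and function names are illustrative only.

```cpp
#include <array>
#include <string>

// Results for one placement, mirroring one column group of the wide layout.
struct PlacementResult {
  double timeUs;
  double algbw;     // GB/s
  double busbw;     // GB/s
  long long wrong;  // number of wrong elements
};

// One row of the original wide layout: both placements side by side.
struct WideRow {
  long long sizeBytes;
  std::string type;
  std::string redop;
  PlacementResult outOfPlace;
  PlacementResult inPlace;
};

// One row of the long layout: placement becomes a column value.
struct LongRow {
  long long sizeBytes;
  std::string type;
  std::string redop;
  std::string placement;  // "out" or "in"
  PlacementResult result;
};

// Reshape one wide row into two long rows (a wide-to-long "melt").
std::array<LongRow, 2> toLong(const WideRow& w) {
  return {{
      {w.sizeBytes, w.type, w.redop, "out", w.outOfPlace},
      {w.sizeBytes, w.type, w.redop, "in", w.inPlace},
  }};
}
```

For the first row of the example CSV (size 8, float, sum), `toLong` yields an `out` row with time 71.41 us and an `in` row with time 71.96 us.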

@samos123

Any chance of this getting merged? Parsing logs to generate CSV is a bit cumbersome. This would be a very welcome change, IMO.
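
Until native CSV output is available, the log-parsing workaround can be kept small. This C++ sketch assumes result rows in the stdout table are whitespace-separated and begin with the message size in bytes; any line whose first token is not an unsigned integer (for example the `#`-prefixed header lines) is skipped. `stdoutLineToCsv` is a hypothetical helper, not part of nccl-tests.

```cpp
#include <cctype>
#include <sstream>
#include <string>

// True if tok is non-empty and consists only of decimal digits.
bool isUnsignedInt(const std::string& tok) {
  if (tok.empty()) return false;
  for (unsigned char c : tok) {
    if (!std::isdigit(c)) return false;
  }
  return true;
}

// Convert one stdout line into a CSV line, or return "" if the line does not
// look like a result row. Assumption: result rows are whitespace-separated
// and start with the size in bytes.
std::string stdoutLineToCsv(const std::string& line) {
  std::istringstream in(line);
  std::string tok;
  if (!(in >> tok) || !isUnsignedInt(tok)) return "";
  std::string csv = tok;
  while (in >> tok) {
    csv += ',';
    csv += tok;
  }
  return csv;
}
```

Feeding each line of a log through `stdoutLineToCsv` and keeping the non-empty results yields one comma-separated row per message size, in the same column order as the stdout table.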

@nravic commented May 22, 2025

+1 for having CSV output; it would be very useful.

@talbdev commented Jul 28, 2025

+1 for CSV format. Is anything blocking this from being merged? It would be really useful for some automated infrastructure checks.

@OguzPastirmaci

+1 for CSV format, so it's easier to consume the results in automated health checks.


7 participants