[WIP] Add Counted Attributes #546
Draft
Summary
Add more kernel attributes by automatically counting operations using wrapper types.
This could either replace our manual counts or serve as a check on their accuracy. It can also produce accurate counts in some cases where we have been estimating. In other cases it may not be usable, for example in Algorithm_SORT, where we call std::sort. It's important to note that getting accurate counts after compiler optimization is still difficult, but in most cases manual optimization can still get us good counts.
Note that this requires C++20 at the moment, but it could be back-ported to C++17 with SFINAE instead of concepts.
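As a rough idea of what such a back-port might look like, the sketch below expresses a type constraint with `std::enable_if_t` instead of a concept. `flops_per_add` is a hypothetical helper invented for this example, not something in the PR.

```cpp
#include <type_traits>

// C++20 form, shown only for comparison:
//   template <std::floating_point T> constexpr int flops_per_add(T) { return 1; }

// C++17 form: the overload participates only when T is a floating-point type.
template <typename T, std::enable_if_t<std::is_floating_point_v<T>, int> = 0>
constexpr int flops_per_add(T) { return 1; }

// Integer additions are not counted as flops in this sketch.
template <typename T, std::enable_if_t<std::is_integral_v<T>, int> = 0>
constexpr int flops_per_add(T) { return 0; }

static_assert(flops_per_add(1.0) == 1, "double addition counts as one flop");
static_assert(flops_per_add(1) == 0, "int addition counts as zero flops");
```

The trade-off is mostly readability and error messages; the overload selection itself behaves the same way.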
At the moment I'm interested in whether people think this is a reasonable direction to take.
If so, is there anything I'm missing that I could be capturing with wrapper types and instrumentation?
As an example, I used the counters in Apps_PRESSURE, Apps_VOL3D, and Polybench_JACOBI_2D. While examining these counters and comparing them to the manual "Estimate" counters, I discovered opportunities to optimize redundant loads in the PRESSURE and VOL3D kernels and found a copy-paste error in VOL3D.
Below are the normal attributes followed by the counted attributes. After that is a breakdown of each kernel, with counters for each section of the kernel. With enough macros it was possible to capture the code of the entire kernel and print it out.