The counter table in hwcpipe/src/hwcpipe/all_gpu_counters.cpp is very large and takes a long time to compile (up to 10 minutes on my build machine). We should break it up to improve build efficiency:
- Split into multiple files, e.g. one per GPU architecture?
- Add indirection to remove duplicates where multiple GPU IDs share the same counter definitions.
This should more than halve the size of the table, and splitting it into multiple files adds a further benefit: the pieces can compile in parallel.
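A minimal sketch of the indirection from the second bullet, assuming a flat per-GPU table. Each distinct counter layout is defined once, and several GPU IDs point at the same shared array instead of each carrying its own copy. The type names (`counter_def`, `counter_table`, `gpu_entry`), the helper `find_table`, the counter names, and the GPU ID values are all hypothetical placeholders, not hwcpipe's actual definitions.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical counter definition; hwcpipe's real types differ.
struct counter_def {
    std::string_view name;
    uint32_t block;
    uint32_t offset;
};

// Each distinct layout is emitted exactly once (placeholder contents).
constexpr counter_def layout_a[] = {
    {"example_cycles", 0, 6},
    {"example_frag_cycles", 1, 4},
};
constexpr counter_def layout_b[] = {
    {"example_cycles", 0, 6},
    {"example_frag_cycles", 1, 4},
    {"example_core_cycles", 2, 10},
};

// Non-owning view of a shared layout.
struct counter_table {
    const counter_def *defs;
    std::size_t count;
};

template <std::size_t N>
constexpr counter_table make_table(const counter_def (&arr)[N]) {
    return {arr, N};
}

// Indirection: GPU IDs map to a shared layout rather than a private copy.
struct gpu_entry {
    uint32_t gpu_id;
    counter_table table;
};

constexpr gpu_entry gpu_tables[] = {
    {0x1000, make_table(layout_a)},
    {0x1001, make_table(layout_a)},  // same counters as 0x1000, no duplicate
    {0x2000, make_table(layout_b)},
};

// Linear lookup; returns nullptr for an unknown GPU ID.
constexpr const counter_table *find_table(uint32_t gpu_id) {
    for (const auto &e : gpu_tables) {
        if (e.gpu_id == gpu_id) {
            return &e.table;
        }
    }
    return nullptr;
}
```

With this shape, the per-architecture split from the first bullet could put each `layout_*` array in its own translation unit (declared `extern const` in a shared header), so only the small ID-to-layout table remains in a common file.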