Skip to content

Conversation

@evetsso
Copy link
Contributor

@evetsso evetsso commented Oct 29, 2025

Motivation

Fix load/store callbacks on single-process multi-GPU transforms.

Technical Details

Callbacks were not being attached to the correct TreeNodes during multi-GPU plan execution. Even though callbacks are only known at execute time, we can piggyback on the load/store ops infrastructure for extra operations we want to do that are known at plan creation time.

Modified the load/store ops to be a std::optional on plans. If the optional load/store op is present, then that means this node in the plan is doing the first load from global memory or last store to global memory for the FFT. The load/store op will std::nullopt for internal nodes of a multi-kernel FFT, or for communication steps.

At execution time, we can then look at presence/absence of the load/store op to know if load/store callbacks should be applied to that node of the transform.

We also need to build up an internal device->callback map, so that we can know exactly which callback to invoke for the device that this node executes on. This can only be constructed at a time when we have both the plan description (which knows the field/brick layout, and specifically the ordering of bricks and the devices they reside on) and an execution info (which has the callbacks supplied by the user).

Test Plan

Enabled callback test cases in rocFFT and hipFFT for single-proc multi-GPU transforms.

Test Result

Tests pass.

Submission Checklist

The std::optional is only present on TreeNodes for the first load from
and last store to global memory of the FFT, and is std::nullopt otherwise.

This information lets us know which nodes need to run callback functions,
even if no other load/store ops need to be run.
@evetsso
Copy link
Contributor Author

evetsso commented Oct 29, 2025

These commits can be reviewed independently. Multi-GPU CI on this PR is not expected to pass until #2328 is merged, as that PR modifies the tests to supply the multi-device callbacks correctly.

MPI callbacks are probably also fixed by this, but we need some additional work in the test infrastructure to be able to confirm this.

@evetsso evetsso changed the title fix callbacks for single-proc multi-GPU transforms rocfft: fix callbacks for single-proc multi-GPU transforms Oct 29, 2025
@evetsso evetsso removed the NoCI Don't run CI label Oct 30, 2025
@codecov-commenter
Copy link

codecov-commenter commented Oct 30, 2025

Codecov Report

❌ Patch coverage is 22.48062% with 100 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
projects/rocfft/library/src/callback_map.cpp 22.73% 45 Missing and 6 partials ⚠️
projects/rocfft/library/src/plan.cpp 8.33% 20 Missing and 2 partials ⚠️
projects/rocfft/library/src/powX.cpp 0.00% 14 Missing and 1 partial ⚠️
projects/rocfft/library/src/transform.cpp 36.36% 6 Missing and 1 partial ⚠️
projects/rocfft/library/src/load_store_ops_gen.cpp 50.00% 2 Missing and 2 partials ⚠️
projects/rocfft/library/src/tree_node.cpp 0.00% 1 Missing ⚠️

❌ Your project status has failed because the head coverage (48.85%) is below the target coverage (80.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #2350      +/-   ##
===========================================
+ Coverage    49.04%   50.16%   +1.12%     
===========================================
  Files          124      125       +1     
  Lines        32149    32224      +75     
  Branches      4230     4250      +20     
===========================================
+ Hits         15766    16163     +397     
+ Misses       15237    14897     -340     
- Partials      1146     1164      +18     
Flag Coverage Δ
hipFFT 62.91% <ø> (+0.30%) ⬆️
rocFFT 48.85% <22.48%> (+1.20%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
...ojects/rocfft/library/src/include/load_store_ops.h 65.22% <ø> (ø)
projects/rocfft/library/src/include/plan.h 11.54% <ø> (ø)
...s/rocfft/library/src/include/rtc_realcomplex_gen.h 100.00% <ø> (ø)
projects/rocfft/library/src/include/transform.h 100.00% <ø> (ø)
projects/rocfft/library/src/include/tree_node.h 30.62% <ø> (ø)
...jects/rocfft/library/src/load_store_ops_kernel.cpp 37.50% <100.00%> (+5.68%) ⬆️
projects/rocfft/library/src/rtc_stockham_gen.cpp 73.05% <ø> (+0.47%) ⬆️
projects/rocfft/library/src/tree_node.cpp 36.26% <0.00%> (-0.49%) ⬇️
projects/rocfft/library/src/load_store_ops_gen.cpp 18.18% <50.00%> (+3.09%) ⬆️
projects/rocfft/library/src/transform.cpp 33.51% <36.36%> (+0.88%) ⬆️
... and 3 more

... and 43 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@malcolmroberts
Copy link
Contributor

projects/rocfft/clients/tests/#accuracy_test.cpp# seems like an accidental commit.

Copy link
Contributor

@malcolmroberts malcolmroberts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just need to remove the temp file, and should be good.

@regan-amd regan-amd self-requested a review October 31, 2025 15:48
auto params = param_generator_complex(test_prob,
multi_gpu_sizes,
precision_range_sp_dp,
{4, 1},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we keep using multi_gpu_batch_range, ioffset_range_zero and ooffset_range_zero (declared above)? I find that more readable. If not let's remove them as they'd become unused.

{{0, 0}},
{fft_placement_inplace, fft_placement_notinplace},
false,
run_callbacks);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add auto_alloc_setting (otherwise unused) to make sure we don't have misleading results the day these instances are re-enabled.

Comment on lines +48 to +61
if(load)
{
if(src_fn)
callbacks[b.location.device].load_fn = src_fn[src_idx];
if(src_data)
callbacks[b.location.device].load_data = src_data[src_idx];
}
else
{
if(src_fn)
callbacks[b.location.device].store_fn = src_fn[src_idx];
if(src_data)
callbacks[b.location.device].store_data = src_data[src_idx];
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that bricks of a given field must be assigned different devices, but I'm not clear regarding two bricks of two (different) fields: can those be assigned the same device, conceptually? If yes, aren't we risking to overwrite the content of a callbacks[b.location.device] at some point above?

// we have at most one load callback
if(exec_info.load_cb_fns)
{
callbacks.resize(local_device + 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: defining callbacks as an actual std::map<int, device_callback_t> would avoid having local_device irrelevant entries therein.
It might also help identify possible conflicts if the other comment above is relevant.

Comment on lines +391 to +392
const std::optional<LoadOps> loadOps,
const std::optional<StoreOps> storeOps,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
const std::optional<LoadOps> loadOps,
const std::optional<StoreOps> storeOps,
const std::optional<LoadOps>& loadOps,
const std::optional<StoreOps>& storeOps,

if it can be done.

Comment on lines +1897 to +1898
const std::optional<LoadOps> loadOps,
const std::optional<StoreOps> storeOps,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
const std::optional<LoadOps> loadOps,
const std::optional<StoreOps> storeOps,
const std::optional<LoadOps>& loadOps,
const std::optional<StoreOps>& storeOps,

(cf above).

@af-ayala af-ayala self-requested a review November 4, 2025 03:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants