
Conversation

Contributor

@viclafargue viclafargue commented Apr 28, 2025

Answers #6539
Requires rapidsai/raft#2739

This PR:

  • Trims the graph before embedding initialization
  • Stores the graph on host when using the UMAP estimator

@viclafargue viclafargue requested a review from a team as a code owner April 28, 2025 13:28
@csadorf csadorf linked an issue Apr 28, 2025 that may be closed by this pull request
@viclafargue viclafargue requested review from jcrist and jinsolp April 28, 2025 16:52
Contributor

@jinsolp jinsolp left a comment

Thank you for this PR, Victor! I have a question before I move on with further reviews. : )

Comment on lines 242 to 247

value_t threshold = get_threshold(handle, fss_graph, inputs.n, params->n_epochs);
perform_thresholding(handle, fss_graph, threshold);

raft::sparse::op::coo_remove_zeros<value_t>(&fss_graph, graph, stream);
}
Contributor

@jinsolp jinsolp Apr 28, 2025

graph is the final output exposed at the Python level. It looks like umap-learn runs the fuzzy simplicial set operation (link), corresponding to our FuzzySimplSetImpl::symmetrize, and then eliminates zeros, which becomes the final output.

I think if we do perform_thresholding here, we end up storing the graph after thresholding as the final graph, which seems different from what umap-learn has?
I understand that we need to do the thresholding before the embedding init, but I think by doing this we end up with a different output graph (which should correspond to umap-learn's self.graph_). Please correct me if I'm wrong!
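
For reference, a minimal sketch of the umap-learn behavior described above, assuming the standard scipy.sparse API (illustrative only, not cuml code):

import scipy.sparse

def finalize_graph(fuzzy_set):
    # umap-learn's graph_ is the symmetrized fuzzy simplicial set with explicit
    # zeros removed; no epoch-based thresholding is applied at this point.
    graph = fuzzy_set.tocsr()
    graph.eliminate_zeros()
    return graph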

Contributor Author

@viclafargue viclafargue Apr 29, 2025

My thinking was that once the graph reaches the Python layer, we can no longer make assumptions about how the user intends to use it, so we should avoid modifying or potentially corrupting it. Since our goal is to trim the graph, that would require creating a copy, which could significantly increase VRAM usage. I acknowledge this approach doesn't exactly align with how umap-learn handles it, but I assumed that most users would primarily be interested in a fuzzy simplicial set graph that retains only the most important connections. I had even considered performing the trimming before the symmetrization step, since the thrust operations there appear to cause a memory spike, if I've understood correctly.

However, I just realized that because the trimming threshold depends on the number of training epochs, trimming before storing the graph might work well with the UMAP estimator, but not as well with fuzzy_simplicial_set, which lacks access to the number of epochs. In that case, creating a copy might indeed be the only viable option, unless the graph is stored as a SciPy array on the Cython side?
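
For context, here is a hedged sketch of the epoch-dependent trimming being discussed, modeled on umap-learn's simplicial_set_embedding (the function name is illustrative):

import scipy.sparse

def trim_for_epochs(graph, n_epochs):
    # Entries below max / n_epochs would never be sampled during optimization,
    # so they can be zeroed out and dropped before the embedding initialization.
    graph = graph.tocoo(copy=True)
    threshold = graph.data.max() / float(n_epochs)
    graph.data[graph.data < threshold] = 0.0
    graph.eliminate_zeros()
    return graph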

Member

In that case, creating a copy might indeed be the only viable option—unless the graph is stored as a SciPy array on the Cython side?

When Corey and I discussed this a month or two ago, IIRC we came to an agreement that the graph_ attribute could (and probably should) be returned to the user on host. It's mostly there for introspection or for umap-learn compatibility. We do want to make sure the graph is treated roughly the same way umap-learn treats it, so that zero-code-change methods that reference it (e.g. merging of estimators) work roughly the same. Since we never reference it ourselves, having it on host would be fine and would decrease the device memory pressure at this point.
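
As a rough illustration of returning graph_ on host, assuming the device graph were available as a cupyx.scipy.sparse COO matrix (illustrative only, not the code in this PR):

def graph_to_host(device_graph):
    # .get() copies the sparse matrix to host as a scipy.sparse matrix; the
    # device-side copy can then be released to reduce device memory pressure.
    return device_graph.get()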

@viclafargue viclafargue changed the title Fix graph thresholding Fix UMAP graph thresholding May 1, 2025
@viclafargue viclafargue requested a review from a team as a code owner May 5, 2025 12:02
@github-actions github-actions bot added the Cython / Python label May 5, 2025
@csadorf
Contributor

csadorf commented May 14, 2025

This PR depends on rapidsai/raft#2650. Is it realistic that it will be merged before burndown?

@divyegala
Member

@csadorf no, RAFT burndown is tomorrow. Let's move this to 25.08.

@viclafargue viclafargue requested review from a team as code owners July 1, 2025 12:27
@github-actions github-actions bot added the conda and ci labels Jul 1, 2025
@viclafargue viclafargue changed the base branch from branch-25.06 to branch-25.08 July 9, 2025 15:42
@github-actions github-actions bot removed the conda, CMake, and ci labels Jul 9, 2025

copy-pr-bot bot commented Aug 27, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@viclafargue viclafargue changed the base branch from branch-25.08 to branch-25.10 August 27, 2025 15:17
@viclafargue viclafargue added the bug and breaking labels Sep 5, 2025
Contributor

@jinsolp jinsolp left a comment

Thanks Victor! 👍 I left some suggestions and questions!

Comment on lines 842 to 843
if num_clusters == 5:
    pytest.skip("Skipping test for 5 clusters")
Contributor

Sorry if I missed anything, but why would we want to skip the test in this case?

Contributor Author

It looks like there is an issue with do_snmg = True and num_clusters == 5 for some reason. It may be because my workstation has 2 GPUs, while CI only has one. Basically there are a lot of -1 values in the indices. The issue happens on the main branch too, independently of this PR.

Contributor Author

More details here.

bool apply_thresholding = should_perform_thresholding(handle, in, threshold);

if (apply_thresholding) {
  trim_graph(handle, in, out, threshold);
Contributor

I think we should also be calling perform_thresholding here before we trim the graph?
Also, is my understanding correct that trim_graph is doing what raft::sparse::op::coo_remove_zeros is doing?

Contributor Author

The trimming was already performing a thresholding with a thrust::copy_if.

After thinking about it again, I decided to revert to a simpler systematic element-wise thresholding (setting values to 0) followed by a coo_remove_zeros operation. The reasons are the following: 1) the performance of thrust::copy_if is probably not optimal; 2) since the graph is now stored on host, this no longer alters the graph in the graph_ attribute; 3) the graph will in most cases require a thresholding operation anyway, so making it conditional through extra checks is probably slower overall.
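
A rough Python analogue of the two strategies being compared, purely for illustration (the actual implementation is on the C++ side):

import numpy as np

def trim_copy_if(values, threshold):
    # copy_if style: compact in a single pass by keeping only the retained entries
    return values[values >= threshold]

def trim_zero_then_remove(values, threshold):
    # chosen approach: zero out small entries, then drop the explicit zeros
    out = values.copy()
    out[out < threshold] = 0.0
    return out[out != 0.0]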

@github-actions github-actions bot removed the conda, CMake, and ci labels Sep 8, 2025
Member

@jcrist jcrist left a comment

Gave a brief read-through and logically this all makes sense to me. I left a few small notes on things I'm less than happy about (but don't view as blockers), but overall this is an improvement.

copy_device_graph_to_host(handle, graph, host_graph);

trim_graph(handle, graph, trimmed_graph, inputs.n, params->n_epochs);
}
Member

I assume the scope here is to drop graph once the trimming is done to reduce device memory pressure? If so, can you add a small comment noting that?

Also, is there any way the trimming could be done in place? Last time I checked, this was one of the memory high points, so reducing duplications of the graph here would be beneficial if easy.

Contributor Author

I assume the scope here is to drop graph once the trimming is done to reduce device memory pressure?

Exactly.

Also, is there any way the trimming could be done in place?

I might be wrong, but I don't think we can do this efficiently in place with CUDA. Performing the operation on host might be relatively fast and would avoid doubling VRAM use here, but it would probably be much slower than doing it on device with a copy.

cols = create_nonowning_numpy_array(self.cols(), np.int32)

graph = scipy.sparse.coo_matrix((vals.copy(), (rows.copy(), cols.copy())))
return graph
Member

This creates a full copy of the COO with memory allocated by numpy instead of within libcuml. This doubles the host memory (original host COO + new host COO), but the original should be freed shortly afterwards, once the HostGraphHolder is dropped.

I don't love this, but I'm fine with this for now. Note that if we wanted, we could instead expose the COO from libcuml directly through the buffer protocol, avoiding the brief memory doubling.

Can you add a small comment/docstring on this method just noting that it currently returns a copy (to help future readers)?
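
For reference, a minimal sketch of the buffer-protocol alternative mentioned above, assuming the libcuml-side holder exposed its host arrays through Python's buffer protocol (the holder attribute names are hypothetical):

import numpy as np
import scipy.sparse

def view_host_coo(holder, shape):
    # np.frombuffer wraps the existing memory without copying; the holder must
    # stay alive for as long as the resulting matrix is used.
    vals = np.frombuffer(holder.vals_buffer, dtype=np.float32)
    rows = np.frombuffer(holder.rows_buffer, dtype=np.int32)
    cols = np.frombuffer(holder.cols_buffer, dtype=np.int32)
    return scipy.sparse.coo_matrix((vals, (rows, cols)), shape=shape)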

Contributor Author

the original should be freed momentarily after once the HostGraphHolder is dropped.

Exactly, the function that instantiates the HostGraphHolder should release it immediately (modulo the garbage collector).

we could instead expose the COO from libcuml directly through the buffer protocol, avoiding the brief memory doubling

The COO matrix should have a predictable nnz before thresholding, so it would indeed be theoretically possible to pre-allocate the matrix on host and expose it from libcuml. However, we would like to use raft::host_coo_matrix for the libcuml API, and this utility cannot be used in a non-owning way for now.

Member

We don't have to preallocate it; we can expose the memory directly. I don't think this is worth doing though, unless we find that host memory becomes a bottleneck. What you have now is fine, just please add the comment/docstring I requested.

@@ -839,6 +839,8 @@ def test_umap_distance_metrics_fit_transform_trust_on_sparse_input(
def test_umap_trustworthiness_on_batch_nnd(
    num_clusters, fit_then_transform, metric, do_snmg
):
    if do_snmg and num_clusters == 5:
        pytest.xfail("xfailing snmg test with 5 clusters for now")
Member

Why is this being xfailed? Can this be removed? If it can't, can you update the reason to be more descriptive of why it's xfailed, so future readers know when it can be removed (and if it's a TODO, open a follow-up issue)?

Also: xfail markers like this don't give the test a chance to run and xpass, so there's no indicator when running the tests of when this can be un-xfailed.

Ideally, if a test is flaky or failing, we instead handle this by refactoring the parameters to mark the bad case as an xfail, as sketched below. That way the test still runs, but failures will be ignored.
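
A small sketch of that pattern: mark only the problematic parameter combination with pytest.param so the test still runs and can xpass (parameter values and the reason string here are illustrative):

import pytest

@pytest.mark.parametrize(
    "num_clusters, do_snmg",
    [
        (3, False),
        (3, True),
        (5, False),
        pytest.param(
            5,
            True,
            marks=pytest.mark.xfail(reason="multi-GPU KNN returns -1 indices"),
        ),
    ],
)
def test_umap_trustworthiness_on_batch_nnd(num_clusters, do_snmg):
    ...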

Contributor Author

After some investigation, it looks like there is an issue in this test when running it on multiple GPUs with the parameters do_snmg = True and num_clusters == 5: the KNN step returns -1 for the indices. The issue is present in the main branch too. Setting CUDA_VISIBLE_DEVICES=0 removes the issue, which explains why this does not seem to pop up in CI.

Since it is not related to this PR, I removed the xfail. I will open an issue instead.

Labels
breaking (Breaking change), bug (Something isn't working), CUDA/C++, Cython / Python (Cython or Python issue)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Differences between umap.UMAP and cuml.UMAP in embeddings logic
5 participants