@cmhhelgeson
Contributor

Description

Creates an add-on that encapsulates the bitonic sort functionality present in the webgpu_compute_sort_bitonic example. It currently handles only scalar inputs because I'm uncertain whether TSL emulates the boolean vector functionality of GLSL. I've also removed the timestamps from the bitonic sort example, since they aren't informative when they update at such high speed, and one can already see that the new encapsulated bitonic sort takes fewer dispatches than both the previous local sort and the global-sort-only example.

@cmhhelgeson cmhhelgeson marked this pull request as draft September 8, 2025 05:32
@cmhhelgeson
Contributor Author

cmhhelgeson commented Sep 8, 2025

The sort itself is done and works as expected for multiple data types and counts, but I'm leaving this in draft until it's more thoroughly reviewed. As it's the first class of its type (an encapsulated GPGPU sort/operation), I would like to ensure that the maintainers agree on the class structure and documentation before it goes in, as it could inform how contributors implement and structure future encapsulated GPGPU operations. Obviously this is WebGPURenderer only, and the documentation should likely change to make this more transparent to users (class BitonicSort should also be renamed class BitonicSortGPU).

There are also improvements that could be made to the class, which may or may not be considered blocking:

  • Ping-Pong Data Buffers: We can ping-pong the dataBuffer and the tempBuffer (input and output buffers) between global sort steps. This could improve performance by cutting back on compute dispatches. A sort would at most need only one alignment dispatch for an in-place sort if the final global op moved data from the dataBuffer to the tempBuffer. Implemented in (Addons: GPGPU - Fix Bitonic Sort JSDoc and add Ping/Pong Buffers #31949)
  • Multiple Comparison Options: The user should be able to specify a reverse sort (i.e., a swap executes on a greaterThan rather than a lessThan).
  • Side Effects: Though the sort may primarily operate on a single buffer, the result of the sort might drive how multiple other buffers are reassigned. For instance, a sort that takes a series of linearized indices of a particle's location within a 3-dimensional spatial hash grid may need to both sort those indices and reassign particles within the particle buffer based on the sorted indices. The user could manage this themselves, or the bitonicSortModule could internally manage this as a sideEffect of the sort, possibly by swapping the particles alongside the index swap or by implementing a different strategy.
  • Subgroup Optimizations: I'm investigating potential optimizations using the recently implemented subgroup functionality. This article describes some potential optimizations using subgroups: https://winwang.blog/posts/bitonic-sort/ but I am still investigating other valid optimizations.
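To make the "Multiple Comparison Options" and "Side Effects" items concrete, here is a minimal CPU-side sketch of a bitonic sorting network that accepts a reverse flag and swaps a second array alongside the keys. It is plain JavaScript, not the add-on's TSL code, and the names bitonicSortCPU, descending, and payload are hypothetical and chosen only for illustration.

```javascript
// CPU reference sketch (assumption: not the add-on's implementation).
// `descending` and `payload` are hypothetical option names.
function bitonicSortCPU( keys, { descending = false, payload = null } = {} ) {

	const n = keys.length;

	if ( n === 0 || ( n & ( n - 1 ) ) !== 0 ) {

		throw new Error( 'Length must be a power of two.' );

	}

	const swap = ( array, a, b ) => {

		const tmp = array[ a ];
		array[ a ] = array[ b ];
		array[ b ] = tmp;

	};

	// k is the size of the bitonic sequences being merged, j the compare distance.
	for ( let k = 2; k <= n; k *= 2 ) {

		for ( let j = k / 2; j >= 1; j /= 2 ) {

			for ( let i = 0; i < n; i ++ ) {

				const partner = i ^ j;

				if ( partner <= i ) continue; // visit each pair once

				// Blocks of size k alternate direction; the reverse flag mirrors the whole network.
				const blockAscending = ( i & k ) === 0;
				const wantAscending = blockAscending !== descending;
				const outOfOrder = wantAscending ? keys[ i ] > keys[ partner ] : keys[ i ] < keys[ partner ];

				if ( outOfOrder ) {

					swap( keys, i, partner );

					// Side effect: keep the payload aligned with its key.
					if ( payload !== null ) swap( payload, i, partner );

				}

			}

		}

	}

	return keys;

}

// Usage: sort cell indices and carry particle ids along with them.
const cellIndices = [ 5, 1, 7, 3, 8, 2, 6, 4 ];
const particleIds = [ 'e', 'a', 'g', 'c', 'h', 'b', 'f', 'd' ];
bitonicSortCPU( cellIndices, { payload: particleIds } );
```

Swapping the payload inside the compare-exchange is only one of the strategies mentioned above; a GPU version could equally sort key/index pairs and gather the payload in a separate pass.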

@cmhhelgeson cmhhelgeson marked this pull request as ready for review September 9, 2025 20:34
@cmhhelgeson
Contributor Author

Is there anything blocking this PR? If possible, I would like to use it for performance optimizations within the compute bird sample, but I would like the class structure to be reviewed first for the reasons stated above.

@RenaudRohlinger
Collaborator

RenaudRohlinger commented Sep 22, 2025

Wouldn't it be more straightforward for a Sort Module to simply perform the sorting operation, with perhaps an option, only in very specific and advanced cases, to manually control its update?

Something such as:

const bitonicSort = new BitonicSort({
  dataBuffer: arrayToSort,
});

renderer.setAnimationLoop(async () => {
  await bitonicSort.compute();

  renderer.compute(computeProgramUsingSortedArray);
});

And for manual sorting step control:

const bitonicSort = new BitonicSort({
  dataBuffer: arrayToSort,
});

renderer.setAnimationLoop(async () => {
  while (!bitonicSort.isSorted) {
    await bitonicSort.step();
  }

  renderer.compute(computeProgramUsingSortedArray);
});

@sunag sunag added this to the r181 milestone Sep 22, 2025
@cmhhelgeson
Contributor Author

cmhhelgeson commented Sep 22, 2025

Wouldn't it be more straightforward for a Sort Module to simply perform the sorting operation, with perhaps an option, only in very specific and advanced cases, to manually control its update?

There is already a function for this at the bottom of the class:

async compute( renderer ) {

	this.globalOpsRemaining = 0;
	this.globalOpsInSpan = 0;
	this.currentDispatch = 0;

	for ( let i = 0; i < this.stepCount; i ++ ) {

		await this.computeStep( renderer );

	}

}

The Bitonic Sort example calls computeStep rather than a full compute so that it can visually represent the swaps. The current implementation uses multiple dispatches to perform the sort, so the full sort also leverages the computeStep functionality. More efficient algorithms could likely do a sort in a single dispatch; for those sorts, the step function would execute different code than the main sort algorithm. In bitonic sort's case, both the complete sort and the step-by-step sort go through this.computeStep.
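The relationship between a full sort and a step-by-step sort can be sketched on the CPU as follows. This is a hypothetical stand-in, not the add-on's code: the class name BitonicStepperCPU, the step method, and the isSorted getter are illustrative assumptions (the latter two borrowed from the suggestion earlier in this thread), and each step here is one synchronous compare-exchange pass rather than an awaited GPU dispatch. The add-on's real dispatch count may differ from the step count computed here.

```javascript
// Hypothetical CPU stand-in for a multi-dispatch sort: one pass per step().
class BitonicStepperCPU {

	constructor( data ) {

		const n = data.length;

		if ( n < 2 || ( n & ( n - 1 ) ) !== 0 ) {

			throw new Error( 'Length must be a power of two and at least 2.' );

		}

		this.data = data;

		// A bitonic network over 2^m elements has m * ( m + 1 ) / 2 passes.
		const m = 31 - Math.clz32( n );
		this.stepCount = ( m * ( m + 1 ) ) / 2;

		this.reset();

	}

	reset() {

		this.k = 2; // size of the sequences currently being merged
		this.j = 1; // compare distance within the current merge
		this.currentStep = 0;

	}

	get isSorted() {

		return this.currentStep >= this.stepCount;

	}

	// Runs a single compare-exchange pass, then advances ( k, j ).
	step() {

		if ( this.isSorted ) return;

		const { data, k, j } = this;

		for ( let i = 0; i < data.length; i ++ ) {

			const partner = i ^ j;

			if ( partner <= i ) continue;

			const ascending = ( i & k ) === 0;
			const outOfOrder = ascending ? data[ i ] > data[ partner ] : data[ i ] < data[ partner ];

			if ( outOfOrder ) {

				const tmp = data[ i ];
				data[ i ] = data[ partner ];
				data[ partner ] = tmp;

			}

		}

		if ( j > 1 ) {

			this.j = j / 2;

		} else {

			this.k = k * 2;
			this.j = k; // half of the new k

		}

		this.currentStep ++;

	}

	// The full sort is just every step in sequence, mirroring compute() above.
	compute() {

		this.reset();

		for ( let i = 0; i < this.stepCount; i ++ ) this.step();

	}

}

// Usage: drive the sort one pass at a time, as a visualizer would.
const stepper = new BitonicStepperCPU( Uint32Array.from( [ 7, 3, 5, 0, 6, 2, 4, 1 ] ) );
let passes = 0;

while ( ! stepper.isSorted ) {

	stepper.step();
	passes ++;

}
```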

@cmhhelgeson
Contributor Author

const bitonicSort = new BitonicSort({
  dataBuffer: arrayToSort,
});

renderer.setAnimationLoop(async () => {
  await bitonicSort.compute();

  renderer.compute(computeProgramUsingSortedArray);
});

I would prefer something like this object syntax for the arguments.

@sunag sunag merged commit 48ecb78 into mrdoob:dev Sep 23, 2025
8 checks passed

const scene = new THREE.Scene();

const infoArray = new Uint32Array( 3, 2, 2 );
Contributor

Should this be new Uint32Array( [ 3, 2, 2 ] )? Not sure if it matters.

Collaborator

Good catch! This line should definitely be updated. When the first argument is a number, a typed array constructor treats it as the array's length and ignores the remaining arguments, so the current code produces a zero-filled array of length 3.
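The difference between the two constructor calls can be checked directly in any JavaScript runtime. The variable names below are illustrative only.

```javascript
// A numeric first argument is read as a length; extra arguments are ignored.
const fromNumber = new Uint32Array( 3, 2, 2 );

// An array argument supplies the element values.
const fromArray = new Uint32Array( [ 3, 2, 2 ] );

console.log( Array.from( fromNumber ) ); // [ 0, 0, 0 ]
console.log( Array.from( fromArray ) ); // [ 3, 2, 2 ]
```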

@arcman7

arcman7 commented Nov 7, 2025

Does this sort faster than the pure js + worker sort used by playcanvas for 3DGS scenes?

@cmhhelgeson
Contributor Author

cmhhelgeson commented Nov 7, 2025

Does this sort faster than the pure js + worker sort used by playcanvas for 3DGS scenes?

It would depend on your use case. If you're only sorting 32 or 256 elements, it would probably be faster to sort on the CPU, as the dispatch overhead between the CPU and the GPU may be larger than the time to sort. Additionally, bitonic sort is only really performant for 2^n elements and is not as generalizable or as performant as a GPU radix sort.
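One common workaround for the 2^n restriction is to pad the input up to the next power of two with a sentinel that sorts last, then ignore the padding afterwards. The sketch below assumes unsigned 32-bit keys sorted in ascending order; padToPowerOfTwo is a hypothetical helper, not part of the add-on, and the built-in typed array sort stands in for the GPU sort.

```javascript
// Hypothetical helper: pads with the largest Uint32 value so padding sorts to the end.
function padToPowerOfTwo( values, sentinel = 0xFFFFFFFF ) {

	let paddedLength = 1;

	while ( paddedLength < values.length ) paddedLength *= 2;

	const padded = new Uint32Array( paddedLength ).fill( sentinel );
	padded.set( values );

	return padded;

}

// Usage: 5 keys are padded to 8, sorted, and the first 5 results are kept.
const keys = Uint32Array.of( 9, 4, 7, 1, 8 );
const paddedKeys = padToPowerOfTwo( keys );

paddedKeys.sort(); // CPU stand-in for the GPU sort; typed arrays sort numerically

const sortedKeys = paddedKeys.subarray( 0, keys.length );
```

The padding costs extra memory and compare-exchange work, which is part of why the sort is described above as less generalizable than a radix sort.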

You'd also, as far as I'm aware, be the first person to ever use the TSL Bitonic Sort outside of the three.js bitonic sort example.

EDIT: @arcman7 This would not be faster than your company's existing WebGPU/WGSL radix sort IMO. However, there should no longer be any barrier to implementing a radix sort within TSL now that Three.js has workgroup barriers, subgroup functionality, dispatchWorkgroupsIndirect, and most of the compute functionality you'd need.

6 participants