This might be a super easy performance win for workloads that shuffle a lot of data. (@mariusae noted this internally; putting it here to share.)