@cvijdea-bd (Contributor) commented Aug 25, 2024

With target_feature = "avx512vbmi" enabled, swizzle_dyn with N = 32 did not set output lanes to 0 when the input index was out of range. It used _mm256_permutexvar_epi8 (vpermb), which, unlike _mm256_shuffle_epi8 (vpshufb), does not provide that behaviour.

This PR fixes the problem and adds the avx512vbmi implementation for N = 64.
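
To illustrate the mismatch, here is a minimal scalar sketch in plain Rust. It is not the code from this PR: the function names are hypothetical, and it assumes that swizzle_dyn must return 0 for any index >= N while a vpermb-style permute only looks at the low log2(N) bits of each index. The last function shows one possible shape of a fix (permute, then zero the out-of-range lanes); the merged commit may do this differently.

```rust
/// Scalar model of the expected swizzle_dyn contract (hypothetical helper):
/// an index >= N selects 0 instead of a byte from `bytes`.
fn swizzle_dyn_reference<const N: usize>(bytes: [u8; N], idxs: [u8; N]) -> [u8; N] {
    let mut out = [0u8; N];
    for (o, &i) in out.iter_mut().zip(idxs.iter()) {
        if (i as usize) < N {
            *o = bytes[i as usize];
        }
    }
    out
}

/// Scalar model of a bare vpermb-style permute (hypothetical helper):
/// only the low log2(N) bits of each index are used, so out-of-range
/// indices wrap around instead of producing 0.
fn permute_wrapping<const N: usize>(bytes: [u8; N], idxs: [u8; N]) -> [u8; N] {
    assert!(N.is_power_of_two(), "this model assumes N is a power of two");
    let mut out = [0u8; N];
    for (o, &i) in out.iter_mut().zip(idxs.iter()) {
        *o = bytes[(i as usize) & (N - 1)];
    }
    out
}

/// One possible fix, sketched in scalar form: permute first, then force
/// every lane whose index was out of range to 0.
fn permute_then_zero<const N: usize>(bytes: [u8; N], idxs: [u8; N]) -> [u8; N] {
    let mut out = permute_wrapping(bytes, idxs);
    for (o, &i) in out.iter_mut().zip(idxs.iter()) {
        if (i as usize) >= N {
            *o = 0;
        }
    }
    out
}
```

With N = 32, an index of 32 makes permute_wrapping return bytes[0], whereas swizzle_dyn_reference and permute_then_zero both return 0 for that lane.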

@calebzulawski (Member) commented

Looks good to me! Thanks! FYI @workingjubilee

@calebzulawski merged commit f6519c5 into rust-lang:master Aug 27, 2024

@workingjubilee (Member) commented

Thanks!
