I have some existing AVX/SSE masks that I got the old way:
auto mask_sse = _mm_cmplt_ps(a, b);
auto mask_avx = _mm_cmp_ps(a, b, 17);
In some circumstances, when mixing old avx code with new avx512 code, I want to convert these old style masks into the new avx512 __mmask4 or __mmask8 types.
I tried this:
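auto mask_avx512 = _mm_cmp_ps_mask(mask_sse, _mm_setzero_ps(), 25 /* _CMP_NGE_UQ: not-greater-or-equal, unordered, quiet */);
and it seems to work for plain old outputs of comparisons, but I don’t think it handles every mask that could have been used with an SSE4.1 _mm_blendv_ps. _mm_blendv_ps looks only at the sign bit of each lane, whereas this comparison treats any NaN as true, so a positive NaN (sign bit clear) would end up set in the k mask even though _mm_blendv_ps would not select that lane.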
There is also good old _mm_movemask_ps, but that looks like it puts the mask all the way out in a general purpose register, and I would need to chain it with a _cvtu32_mask8 to pull it back into one of the dedicated mask registers.
Is there a cleaner way to just directly pull the sign bit out of an old style mask into one of the k registers?
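To make the concern concrete, here is a minimal sketch of where the two approaches disagree. It is only an illustration under assumptions: the helper names (sign_bits, nge_uq_zero_bits, hand_built_mask, to_kmask_via_gpr, check_examples) are hypothetical names of mine, and the predicate-25 comparison is emulated in scalar code so the sketch runs without AVX-512 hardware. The _cvtu32_mask8 round trip is only compiled when AVX512DQ is enabled.

```cpp
#include <immintrin.h>
#include <cassert>

// Bit i is set when lane i has its sign bit set.
// This is the bit that _mm_blendv_ps looks at.
inline int sign_bits(__m128 m) { return _mm_movemask_ps(m); }

// Scalar emulation (an assumption for illustration) of comparing each lane
// against zero with predicate 25 (NGE_UQ): true when x < 0 or x is NaN.
inline int nge_uq_zero_bits(__m128 m) {
    alignas(16) float lanes[4];
    _mm_store_ps(lanes, m);
    int bits = 0;
    for (int i = 0; i < 4; ++i)
        if (!(lanes[i] >= 0.0f)) bits |= 1 << i;
    return bits;
}

// Hypothetical hand-built blend mask, lanes 3..0: -0.0f, +NaN, 1.0f, -2.0f.
inline __m128 hand_built_mask() {
    return _mm_castsi128_ps(_mm_set_epi32(static_cast<int>(0x80000000u),
                                          0x7fc00000, 0x3f800000,
                                          static_cast<int>(0xc0000000u)));
}

#if defined(__AVX512DQ__)
// The round trip through a general purpose register described above.
inline __mmask8 to_kmask_via_gpr(__m128 m) {
    return _cvtu32_mask8(static_cast<unsigned>(_mm_movemask_ps(m)));
}
#endif

inline void check_examples() {
    // A genuine comparison result: both approaches agree.
    __m128 cmp = _mm_cmplt_ps(_mm_set_ps(-1, 0, 1, 2), _mm_setzero_ps());
    assert(sign_bits(cmp) == 0x8);
    assert(nge_uq_zero_bits(cmp) == 0x8);

    // A hand-built mask: they disagree on the +NaN lane and the -0.0f lane.
    __m128 m = hand_built_mask();
    assert(sign_bits(m) == 0x9);        // lanes 0 and 3 have the sign bit set
    assert(nge_uq_zero_bits(m) == 0x5); // lanes 0 (negative) and 2 (NaN)
}
```

So for masks that really came out of a comparison the two agree, but for anything else that was valid input to _mm_blendv_ps the comparison trick can set or clear the wrong lanes, which is why I would prefer something that reads the sign bit directly.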
Example Code:
Here’s an example program doing the sort of mask conversion the first way I mentioned above:
#include <immintrin.h>
#include <cassert>
#include <cstdio>
int main()
{
    auto a = _mm_set_ps(-1, 0, 1, 2);
    auto c = _mm_set_ps(3, 4, 5, 6);
    // Old-style mask: all-ones in lanes where a < 0, all-zeros elsewhere.
    auto sse_mask = _mm_cmplt_ps(a, _mm_setzero_ps());
    // Convert to a k mask by comparing against zero with predicate 25 (_CMP_NGE_UQ).
    auto avx512_mask = _mm_cmp_ps_mask(sse_mask, _mm_setzero_ps(), 25);
    // Blend the same inputs with the old-style mask and with the k mask.
    auto blended = _mm_blendv_ps(a, c, sse_mask);
    auto blended512 = _mm_mask_blend_ps(avx512_mask, a, c);
    alignas(16) float v1[4];
    alignas(16) float v2[4];
    _mm_store_ps(v1, blended);
    _mm_store_ps(v2, blended512);
    // Both blends should select the same lanes.
    assert(v1[0] == v2[0]);
    assert(v1[1] == v2[1]);
    assert(v1[2] == v2[2]);
    assert(v1[3] == v2[3]);
    return 0;
}