- 
                Notifications
    You must be signed in to change notification settings 
- Fork 5.2k
#78303 Add transformation ~v1 & v2 to VectorXxx.AndNot(v1, v2) #81993
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak Issue DetailsI created a draft for the issue. 
 | 
        
          
                src/coreclr/jit/morph.cpp
              
                Outdated
          
        
      | case NI_SSE_And: | ||
| case NI_SSE2_And: | ||
| case NI_AVX_And: | ||
| case NI_AVX2_And: | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about Vector128/256_And and AdvSimd ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Vector64/128/256_And don't exist outside of import at the moment so they don't need to be handled.
AdvSimd should be since we want parity between xarch and arm.
        
          
                src/coreclr/jit/morph.cpp
              
                Outdated
          
        
      | case NI_AVX_And: | ||
| case NI_AVX2_And: | ||
| { | ||
| if (node->GetOperandCount() != 2) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when exactly it might be not 2 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't ever not be 2. If it was, we'd have a buggy node.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use an assert then instead?
        
          
                src/coreclr/jit/morph.cpp
              
                Outdated
          
        
      | if (op1->OperIs(GT_HWINTRINSIC)) | ||
| { | ||
| rhs = op2; | ||
| inner_hw = op1->AsHWIntrinsic(); | ||
| } | ||
| // Transforms v2 & (~v1) to VectorXxx.AndNot(v1, v2) | ||
| else if (op2->OperIs(GT_HWINTRINSIC)) | ||
| { | ||
| rhs = op1; | ||
| inner_hw = op2->AsHWIntrinsic(); | ||
| } | ||
| else | ||
| { | ||
| return node; | ||
| } | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is going to miss the optimization for cases like: ((x & y) & ~z)
You're going to need to check that it is a hwintrinsic and that it is the relevant xor (xarch and arm) or not (arm only) node.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is also potentially a concern around side effects and ensuring that ~x & y, which must be represented as gtNewSimdBinOpNode(AND_NOT, y, x, ...) preserves side effects with regards to x being evaluted before y.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have resolved some comments and pushed them to make sure I got you right.
Could you please give me a hint how to treat not for arm and how to test it and how to handle the sideeffect case?
        
          
                src/coreclr/jit/morph.cpp
              
                Outdated
          
        
      | if ((inner_hw->GetOperandCount() != 2) || (!inner_hw->Op(2)->IsVectorAllBitsSet())) | ||
| { | ||
| return node; | ||
| } | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be better to check this as part of handling _Xor below, that way you don't need to check the operand count and its easier for the general logic to support AdvSimd_Not on Arm64.
        
          
                src/coreclr/jit/morph.cpp
              
                Outdated
          
        
      | var_types hw_type = node->TypeGet(); | ||
| CorInfoType hw_coretype = node->GetSimdBaseJitType(); | ||
| unsigned int hw_simdsize = node->GetSimdSize(); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We refer to these as just simdType, simdBaseJitType, and simdSize almost everywhere else in the JIT.
        
          
                src/coreclr/jit/morph.cpp
              
                Outdated
          
        
      | GenTreeHWIntrinsic* xor_hw = op1->AsHWIntrinsic(); | ||
| switch (xor_hw->GetHWIntrinsicId()) | ||
| { | ||
| #if defined(TARGET_XARCH) || defined(TARGET_ARM64) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is unnecessary, you're already in a larger identical ifdef from L10873.
That being said, the larger identical ifdef on L10873 should also be unnecessary given we're in a greater #ifdef FEATURE_HW_INTRINSICS
        
          
                src/coreclr/jit/morph.cpp
              
                Outdated
          
        
      | } | ||
|  | ||
| // Transforms v2 & (~v1) to VectorXxx.AndNot(v2, v1) | ||
| if (op2->OperIs(GT_HWINTRINSIC)) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This check is going to miss the opt if we have something like ((x ^ AllBitsSet) & (y ^ z). Such a tree could have been transformed into AndNot((y ^ z), x)
In general you're going to need to match (op1 ^ AllBitsSet) up front before determining if its a match and then if that fails do the same check for (op2 ^ AllBitsSet).
For Arm64, you'll also need to directly check for ~op1 or ~op2 (since NI_AdvSimd_Not exists).
There are some things we could do to make this overall simpler, but they are slightly more involved changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd, in general, recommend extracting some of this to a helper.
For example, you could define something like:
genTreeOps GenTreeHWIntrinsic::HWOperGet()
{
    switch (GetHWIntrinsicId())
    {
#if defined(TARGET_XARCH)
        case NI_SSE_And:
        case NI_SSE2_And:
        case NI_AVX_And:
        case NI_AVX2_And:
#elif defined(TARGET_ARM64)
        case NI_AdvSimd_And:
#endif
        {
            return GT_AND;
        }
#if defined(TARGET_ARM64)
        case NI_AdvSimd_Not:
        {
            return GT_NOT;
        }
#endif
#if defined(TARGET_XARCH)
        case NI_SSE_Xor:
        case NI_SSE2_Xor:
        case NI_AVX_Xor:
        case NI_AVX2_Xor:
#elif defined(TARGET_ARM64)
        case NI_AdvSimd_Xor:
#endif
        {
            return GT_XOR;
        }
        // TODO: Handle other cases
        default:
        {
            return GT_NONE;
        }
    }
}Such a helper allows you to instead switch over the genTreeOps equivalent. So you could have something like:
switch (node->HWOperGet())
{
    case GT_AND:
    {
        GenTree* op1 = node->Op(1);
        GenTree* op2 = node->Op(2);
        GenTree* lhs = nullptr;
        GenTree* rhs = nullptr;
        if (op1->OperIsHWIntrinsic())
        {
            // Try handle: ~op1 & op2
            GenTreeHWIntrinsic* hw     = op1->AsHWIntrinsic();
            genTreeOps          hwOper = hw->HWOperGet();
            if (hwOper == GT_NOT)
            {
                lhs = op2;
                rhs = op1;
            }
            else if (op1Oper == GT_XOR)
            {
                GenTree* hwOp1 = hw->Op(1);
                GenTree* hwOp2 = hw->Op(2);
                if (hwOp1->IsVectorAllBitsSet())
                {
                    lhs = op2;
                    rhs = hwOp2;
                }
                else if (hwOp2->IsVectorAllBitsSet())
                {
                    lhs = op2;
                    rhs = hwOp1;
                }
            }
        }
        if ((lhs == nullptr) && op2->OperIsHWIntrinsic())
        {
            // Try handle: op1 & ~op2
            GenTreeHWIntrinsic* hw     = op2->AsHWIntrinsic();
            genTreeOps          hwOper = hw->HWOperGet();
            if (hwOper == GT_NOT)
            {
                lhs = op1;
                rhs = op2;
            }
            else if (op1Oper == GT_XOR)
            {
                GenTree* hwOp1 = hw->Op(1);
                GenTree* hwOp2 = hw->Op(2);
                if (hwOp1->IsVectorAllBitsSet())
                {
                    lhs = op1;
                    rhs = hwOp2;
                }
                else if (hwOp2->IsVectorAllBitsSet())
                {
                    lhs = op1;
                    rhs = hwOp1;
                }
            }
        }
        if (lhs == nullptr)
        {
            break;
        }
        GenTree* andnNode = gtNewSimdBinOpNode(GT_AND_NOT, simdType, lhs, rhs, simdBaseJitType, simdSize, true);
        DEBUG_DESTROY_NODE(node);
        INDEBUG(andnNode->gtDebugFlags |= GTF_DEBUG_NODE_MORPHED);
        return andnNode;
    }
    default:
    {
        break;
    }
}You could of course also extract the NOT op vs op XOR AllBitsSet matching logic to reduce duplication as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Longer term, I think we may want to introduce a "fake" Isa_Not hwintrinsic id for xarch. That would allow morph to transform x ^ AllBitsSet into Isa_Not and then would in turn allow this case to be simplified in its pattern checks.
We may also want to normalize cases like Sse_Xor, Sse2_Xor, and AdvSimd_Xor into Vector128_Xor, so we don't need to consider xplat differences. But that will also involve significant refactorings, far more so than introducing a HWOperGet helper for the time being.
|  | ||
| genTreeOps GenTreeHWIntrinsic::HWOperGet() | ||
| { | ||
| switch (GetHWIntrinsicId()) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tannergooding do you think we can then (not necessarily in this PR) add this to the table where intrinsics are defined?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should be able to do so, yes.
However, I rather think we'd want to represent it just a little bit differently to avoid bloating the metadata tables given most intrinsics end up as none.
        
          
                src/coreclr/jit/morph.cpp
              
                Outdated
          
        
      |  | ||
| // Transforms: | ||
| // 1.(~v1 & v2) to VectorXxx.AndNot(v1, v2) | ||
| // 2.(v1 & (~v2)) to VectorXxx.AndNot(v1, v2) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did you mean (v2, v1) here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
| @SkiFoD, please resolve the merge conflict. .NET 8 rc1 snap is 8/14. | 
| 
 OK, I will look at it as soon as possible. | 
c68ba6f    to
    bec1de3      
    Compare
  
    | @SkiFoD thanks! sorry for the delay, there was also a small issue in the codegen that I fixed | 
The new logic introduced in dotnet#81993 would swap the LHS and RHS of the operands without any additional checks for side effects. Fix dotnet#91855
I created a draft for the issue #78303
The code converts ~v1 & v2 to VectorXxx.AndNot(v1, v2)