Skip to content

Conversation

@SkiFoD
Copy link
Contributor

@SkiFoD SkiFoD commented Feb 11, 2023

I created a draft for the issue #78303
The code converts ~v1 & v2 to VectorXxx.AndNot(v1, v2)

@ghost ghost added community-contribution Indicates that the PR has been added by a community member area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI labels Feb 11, 2023
@ghost
Copy link

ghost commented Feb 11, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak
See info in area-owners.md if you want to be subscribed.

Issue Details

I created a draft for the issue.
The code converts ~v1 & v2 to VectorXxx.AndNot(v1, v2)

Author: SkiFoD
Assignees: -
Labels:

area-CodeGen-coreclr, community-contribution

Milestone: -

case NI_SSE_And:
case NI_SSE2_And:
case NI_AVX_And:
case NI_AVX2_And:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about Vector128/256_And and AdvSimd ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Vector64/128/256_And don't exist outside of import at the moment so they don't need to be handled.

AdvSimd should be since we want parity between xarch and arm.

case NI_AVX_And:
case NI_AVX2_And:
{
if (node->GetOperandCount() != 2)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when exactly it might be not 2 ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't ever not be 2. If it was, we'd have a buggy node.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use an assert then instead?

Comment on lines 10890 to 10904
if (op1->OperIs(GT_HWINTRINSIC))
{
rhs = op2;
inner_hw = op1->AsHWIntrinsic();
}
// Transforms v2 & (~v1) to VectorXxx.AndNot(v1, v2)
else if (op2->OperIs(GT_HWINTRINSIC))
{
rhs = op1;
inner_hw = op2->AsHWIntrinsic();
}
else
{
return node;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to miss the optimization for cases like: ((x & y) & ~z)

You're going to need to check that it is a hwintrinsic and that it is the relevant xor (xarch and arm) or not (arm only) node.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is also potentially a concern around side effects and ensuring that ~x & y, which must be represented as gtNewSimdBinOpNode(AND_NOT, y, x, ...) preserves side effects with regards to x being evaluted before y.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have resolved some comments and pushed them to make sure I got you right.
Could you please give me a hint how to treat not for arm and how to test it and how to handle the sideeffect case?

Comment on lines 10906 to 10909
if ((inner_hw->GetOperandCount() != 2) || (!inner_hw->Op(2)->IsVectorAllBitsSet()))
{
return node;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be better to check this as part of handling _Xor below, that way you don't need to check the operand count and its easier for the general logic to support AdvSimd_Not on Arm64.

Comment on lines 10922 to 10924
var_types hw_type = node->TypeGet();
CorInfoType hw_coretype = node->GetSimdBaseJitType();
unsigned int hw_simdsize = node->GetSimdSize();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We refer to these as just simdType, simdBaseJitType, and simdSize almost everywhere else in the JIT.

GenTreeHWIntrinsic* xor_hw = op1->AsHWIntrinsic();
switch (xor_hw->GetHWIntrinsicId())
{
#if defined(TARGET_XARCH) || defined(TARGET_ARM64)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unnecessary, you're already in a larger identical ifdef from L10873.

That being said, the larger identical ifdef on L10873 should also be unnecessary given we're in a greater #ifdef FEATURE_HW_INTRINSICS

}

// Transforms v2 & (~v1) to VectorXxx.AndNot(v2, v1)
if (op2->OperIs(GT_HWINTRINSIC))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check is going to miss the opt if we have something like ((x ^ AllBitsSet) & (y ^ z). Such a tree could have been transformed into AndNot((y ^ z), x)

In general you're going to need to match (op1 ^ AllBitsSet) up front before determining if its a match and then if that fails do the same check for (op2 ^ AllBitsSet).

For Arm64, you'll also need to directly check for ~op1 or ~op2 (since NI_AdvSimd_Not exists).

There are some things we could do to make this overall simpler, but they are slightly more involved changes.

Copy link
Member

@tannergooding tannergooding Mar 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd, in general, recommend extracting some of this to a helper.

For example, you could define something like:

genTreeOps GenTreeHWIntrinsic::HWOperGet()
{
    switch (GetHWIntrinsicId())
    {
#if defined(TARGET_XARCH)
        case NI_SSE_And:
        case NI_SSE2_And:
        case NI_AVX_And:
        case NI_AVX2_And:
#elif defined(TARGET_ARM64)
        case NI_AdvSimd_And:
#endif
        {
            return GT_AND;
        }

#if defined(TARGET_ARM64)
        case NI_AdvSimd_Not:
        {
            return GT_NOT;
        }
#endif

#if defined(TARGET_XARCH)
        case NI_SSE_Xor:
        case NI_SSE2_Xor:
        case NI_AVX_Xor:
        case NI_AVX2_Xor:
#elif defined(TARGET_ARM64)
        case NI_AdvSimd_Xor:
#endif
        {
            return GT_XOR;
        }

        // TODO: Handle other cases

        default:
        {
            return GT_NONE;
        }
    }
}

Such a helper allows you to instead switch over the genTreeOps equivalent. So you could have something like:

switch (node->HWOperGet())
{
    case GT_AND:
    {
        GenTree* op1 = node->Op(1);
        GenTree* op2 = node->Op(2);
        GenTree* lhs = nullptr;
        GenTree* rhs = nullptr;

        if (op1->OperIsHWIntrinsic())
        {
            // Try handle: ~op1 & op2
            GenTreeHWIntrinsic* hw     = op1->AsHWIntrinsic();
            genTreeOps          hwOper = hw->HWOperGet();

            if (hwOper == GT_NOT)
            {
                lhs = op2;
                rhs = op1;
            }
            else if (op1Oper == GT_XOR)
            {
                GenTree* hwOp1 = hw->Op(1);
                GenTree* hwOp2 = hw->Op(2);

                if (hwOp1->IsVectorAllBitsSet())
                {
                    lhs = op2;
                    rhs = hwOp2;
                }
                else if (hwOp2->IsVectorAllBitsSet())
                {
                    lhs = op2;
                    rhs = hwOp1;
                }
            }
        }

        if ((lhs == nullptr) && op2->OperIsHWIntrinsic())
        {
            // Try handle: op1 & ~op2
            GenTreeHWIntrinsic* hw     = op2->AsHWIntrinsic();
            genTreeOps          hwOper = hw->HWOperGet();

            if (hwOper == GT_NOT)
            {
                lhs = op1;
                rhs = op2;
            }
            else if (op1Oper == GT_XOR)
            {
                GenTree* hwOp1 = hw->Op(1);
                GenTree* hwOp2 = hw->Op(2);

                if (hwOp1->IsVectorAllBitsSet())
                {
                    lhs = op1;
                    rhs = hwOp2;
                }
                else if (hwOp2->IsVectorAllBitsSet())
                {
                    lhs = op1;
                    rhs = hwOp1;
                }
            }
        }

        if (lhs == nullptr)
        {
            break;
        }

        GenTree* andnNode = gtNewSimdBinOpNode(GT_AND_NOT, simdType, lhs, rhs, simdBaseJitType, simdSize, true);

        DEBUG_DESTROY_NODE(node);
        INDEBUG(andnNode->gtDebugFlags |= GTF_DEBUG_NODE_MORPHED);

        return andnNode;
    }

    default:
    {
        break;
    }
}

You could of course also extract the NOT op vs op XOR AllBitsSet matching logic to reduce duplication as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Longer term, I think we may want to introduce a "fake" Isa_Not hwintrinsic id for xarch. That would allow morph to transform x ^ AllBitsSet into Isa_Not and then would in turn allow this case to be simplified in its pattern checks.

We may also want to normalize cases like Sse_Xor, Sse2_Xor, and AdvSimd_Xor into Vector128_Xor, so we don't need to consider xplat differences. But that will also involve significant refactorings, far more so than introducing a HWOperGet helper for the time being.

@SkiFoD SkiFoD marked this pull request as ready for review March 8, 2023 06:08
@SkiFoD SkiFoD changed the title #78303 Draft: Add transformation ~v1 & v2 to VectorXxx.AndNot(v1, v2) #78303 Add transformation ~v1 & v2 to VectorXxx.AndNot(v1, v2) Mar 8, 2023

genTreeOps GenTreeHWIntrinsic::HWOperGet()
{
switch (GetHWIntrinsicId())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tannergooding do you think we can then (not necessarily in this PR) add this to the table where intrinsics are defined?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be able to do so, yes.

However, I rather think we'd want to represent it just a little bit differently to avoid bloating the metadata tables given most intrinsics end up as none.


// Transforms:
// 1.(~v1 & v2) to VectorXxx.AndNot(v1, v2)
// 2.(v1 & (~v2)) to VectorXxx.AndNot(v1, v2)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did you mean (v2, v1) here?

Copy link
Member

@EgorBo EgorBo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@JulieLeeMSFT
Copy link
Member

@SkiFoD, please resolve the merge conflict. .NET 8 rc1 snap is 8/14.

@JulieLeeMSFT JulieLeeMSFT added this to the 9.0.0 milestone Aug 14, 2023
@SkiFoD
Copy link
Contributor Author

SkiFoD commented Aug 14, 2023

@SkiFoD, please resolve the merge conflict. .NET 8 rc1 snap is 8/14.

OK, I will look at it as soon as possible.

@EgorBo EgorBo merged commit 13a225f into dotnet:main Sep 4, 2023
@EgorBo
Copy link
Member

EgorBo commented Sep 4, 2023

@SkiFoD thanks! sorry for the delay, there was also a small issue in the codegen that I fixed

jakobbotsch added a commit to jakobbotsch/runtime that referenced this pull request Sep 11, 2023
The new logic introduced in dotnet#81993 would swap the LHS and RHS of the
operands without any additional checks for side effects.

Fix dotnet#91855
jakobbotsch added a commit that referenced this pull request Sep 11, 2023
…91882)

The new logic introduced in #81993 would swap the LHS and RHS of the
operands without any additional checks for side effects.

Fix #91855
@ghost ghost locked as resolved and limited conversation to collaborators Oct 4, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI community-contribution Indicates that the PR has been added by a community member

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants