Commit 4077459

khushal1996 authored and jkotas committed
Scalar/Packed conversions for floating point to integer (dotnet#97529)
* merging with main Initial changes for scalar conversion double -> ulong
* Basic working version of double -> ulong saturation
* Moving the code in a do-while with proper checks to make sure we are adding the fixup node at all cases
* adjusting comments
* Merging with main Saturating NaN to 0 and also adding Dbl2Ulng implementation in MathHelpers. Adding vector conversion support for double/float -> ulong conversion
* removing conflicts from gentree.h flags merging with main double to uint conversion
* float to uint conversion verified. removing commented code
* merging with main. Making changes to simdashwintrinsic.cpp and listxarch.h float -> uint packed conversion
* progress on double to long morphing
* another attempt at double to long conversion
* Merge with main Merge with main adding a new helper function for float to uint scalar conversion for SSE2.
* adding handling for scalar conversion cases for SSE2. Remaining float/double -> long/int for AVX512.
* partial changes for float to int conversion using double to int for avx512. vfixup not working. next step is to fix the vfixup instruction and get it working
* adding float to int working scalar conversion case. Working on vector case here on.
* partial work on float to int packed conversion
* partial version of float to int conversion
* working version of float to int scalar/packed for avx512
* complete conversions code for floating point to integral conversions for scalar/packed for SSE / avx512
* Merging with main. fixing out of range test case and adding conversion changes to simdashwintrinsic
* fixing debug checks hitting asserts for TYP_ULONG and TYP_UINT at IR level
* adding JIT_Dbl2Int for target_x86 and other architectures.
* Supporting x86 for saturating conversions as well
* fixing errors in packed conversion
* accommodate unsigned in IR
* adding evex support for cvttss2si
* Merge with main defining nativeaot helpers for x86
* Catch divide by zero exception
* Handle overflow cases
* Fix tests to check saturating behavior
* Correct mapping of instructions
* Convert float -> ulong / long as float -> double -> ulong / long
* Merging with main Initial changes for scalar conversion double -> ulong
* Merging with main adjusting comments
* removing conflicts from gentree.h flags merging with main double to uint conversion
* merging with main. Making changes to simdashwintrinsic.cpp and listxarch.h float -> uint packed conversion
* adding a new helper function for float to uint scalar conversion for SSE2.
* Merging with main adding handling for scalar conversion cases for SSE2. Remaining float/double -> long/int for AVX512.
* partial changes for float to int conversion using double to int for avx512. vfixup not working. next step is to fix the vfixup instruction and get it working
* partial version of float to int conversion
* working version of float to int scalar/packed for avx512
* Merging with main. fixing out of range test case and adding conversion changes to simdashwintrinsic
* Changing the way helper functions are handled in morph fixing debug checks hitting asserts for TYP_ULONG and TYP_UINT at IR level
* adding JIT_Dbl2Int for target_x86 and other architectures.
* Supporting x86 for saturating conversions as well
* fixing errors in packed conversion
* Correct mapping of instructions
* delete extra files
* Merging main review changes
* Merge with main and adding new helpers in nativeaot Rebasing with main
* changing type of cast node as signed when making cast nodes
* Avoiding removing extra element from the stack
* Fix formatting, Change comp->IsaSupportedDebugOnly to IsBaselineVector512SupportedDebugOnly
* Reverting some changes to maintain uniformity in code
* Handling cases where AVX512 is not supported in simdashwintrinsic.cpp
* fixing exit conditions for ConvertVectorT_ToDouble
* Check for AVX512 support for TARGET_XARCH
* Avoid avx512 path for x86
* Enable AVX512F codepath for conversions in x86 arch. Move x86 to using c++ helpers
* Add SSE41 path for scalar conversions and 128 bit float to int packed conversions
* Adding SSE41 path for floating point to UINT scalar conversions
* Add AVX path for ConvertToInt32
* Adding comments and cleaning the code
* Fix errors in double to ulong
* Addressing review comments
* Fix tests
* Reverse val < 0 check in dbltoUint and dbltoUlng helpers
* Add overflow conversions for x86/x64, remove FastDbl2Lng and inline it
* Apply suggestions from code review
  Co-authored-by: Jan Kotas <[email protected]>
* Correct Dbl2UlngOvf
* Apply suggestions from code review
* Apply suggestions from code review
* Update src/coreclr/vm/jithelpers.cpp
* Disable failing mono tests
* Working version of saturating logic moved to lowering for x86/x64
* Making changes for pre SSE41
* Apply suggestions from code review
  Co-authored-by: Jan Kotas <[email protected]>
* Removing dead code
* Fix formatting
* Address review comments, add proper docstrings

---------

Co-authored-by: Jan Kotas <[email protected]>
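The commit message refers to saturating conversions and to "Saturating NaN to 0". A minimal C++ sketch of what such semantics could look like follows. The function names are hypothetical and this is not the runtime's `JIT_Dbl2ULng` or `JIT_Dbl2Int` code; it assumes NaN maps to 0 and out-of-range inputs clamp to the nearest representable integer.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical helper (not the runtime's JIT_Dbl2ULng): saturating double -> uint64.
uint64_t SaturatingDoubleToUInt64(double val)
{
    const double two64 = 18446744073709551616.0; // 2^64, exactly representable as a double

    // NaN fails every ordered comparison, so !(val > 0.0) catches NaN,
    // negative values, and both zeros in a single test.
    if (!(val > 0.0))
    {
        return 0;
    }
    if (val >= two64)
    {
        return UINT64_MAX; // too large: clamp to the maximum
    }
    return static_cast<uint64_t>(val); // in range: truncate toward zero
}

// Hypothetical helper (not the runtime's JIT_Dbl2Int): saturating double -> int32.
int32_t SaturatingDoubleToInt32(double val)
{
    if (std::isnan(val))
    {
        return 0; // assumed NaN policy, per "Saturating NaN to 0"
    }
    if (val <= -2147483648.0)
    {
        return INT32_MIN;
    }
    if (val >= 2147483647.0)
    {
        return INT32_MAX;
    }
    return static_cast<int32_t>(val); // in range: truncate toward zero
}
```

Under these assumptions, `SaturatingDoubleToUInt64(-5.0)` yields 0 and `SaturatingDoubleToInt32(3e9)` yields `INT32_MAX`, rather than an undefined or platform-specific value.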
1 parent 63d62fc commit 4077459

30 files changed, with 987 additions and 597 deletions

src/coreclr/inc/jithelpers.h

Lines changed: 3 additions & 3 deletions
@@ -55,11 +55,11 @@
     JITHELPER(CORINFO_HELP_ULMOD, JIT_ULMod, CORINFO_HELP_SIG_16_STACK)
     JITHELPER(CORINFO_HELP_LNG2DBL, JIT_Lng2Dbl, CORINFO_HELP_SIG_8_STACK)
     JITHELPER(CORINFO_HELP_ULNG2DBL, JIT_ULng2Dbl, CORINFO_HELP_SIG_8_STACK)
-    DYNAMICJITHELPER(CORINFO_HELP_DBL2INT, JIT_Dbl2Lng, CORINFO_HELP_SIG_8_STACK)
+    JITHELPER(CORINFO_HELP_DBL2INT, JIT_Dbl2Int, CORINFO_HELP_SIG_8_STACK)
     JITHELPER(CORINFO_HELP_DBL2INT_OVF, JIT_Dbl2IntOvf, CORINFO_HELP_SIG_8_STACK)
-    DYNAMICJITHELPER(CORINFO_HELP_DBL2LNG, JIT_Dbl2Lng, CORINFO_HELP_SIG_8_STACK)
+    JITHELPER(CORINFO_HELP_DBL2LNG, JIT_Dbl2Lng, CORINFO_HELP_SIG_8_STACK)
     JITHELPER(CORINFO_HELP_DBL2LNG_OVF, JIT_Dbl2LngOvf, CORINFO_HELP_SIG_8_STACK)
-    DYNAMICJITHELPER(CORINFO_HELP_DBL2UINT, JIT_Dbl2Lng, CORINFO_HELP_SIG_8_STACK)
+    JITHELPER(CORINFO_HELP_DBL2UINT, JIT_Dbl2UInt, CORINFO_HELP_SIG_8_STACK)
     JITHELPER(CORINFO_HELP_DBL2UINT_OVF, JIT_Dbl2UIntOvf, CORINFO_HELP_SIG_8_STACK)
     JITHELPER(CORINFO_HELP_DBL2ULNG, JIT_Dbl2ULng, CORINFO_HELP_SIG_8_STACK)
     JITHELPER(CORINFO_HELP_DBL2ULNG_OVF, JIT_Dbl2ULngOvf, CORINFO_HELP_SIG_8_STACK)

src/coreclr/jit/codegenxarch.cpp

Lines changed: 7 additions & 4 deletions
@@ -7602,21 +7602,24 @@ void CodeGen::genFloatToIntCast(GenTree* treeNode)
     noway_assert((dstSize == EA_ATTR(genTypeSize(TYP_INT))) || (dstSize == EA_ATTR(genTypeSize(TYP_LONG))));

     // We shouldn't be seeing uint64 here as it should have been converted
-    // into a helper call by either front-end or lowering phase.
-    assert(!varTypeIsUnsigned(dstType) || (dstSize != EA_ATTR(genTypeSize(TYP_LONG))));
+    // into a helper call by either front-end or lowering phase, unless we have AVX512F
+    // accelerated conversions.
+    assert(!varTypeIsUnsigned(dstType) || (dstSize != EA_ATTR(genTypeSize(TYP_LONG))) ||
+           compiler->compIsaSupportedDebugOnly(InstructionSet_AVX512F));

     // If the dstType is TYP_UINT, we have 32-bits to encode the
     // float number. Any of 33rd or above bits can be the sign bit.
     // To achieve it we pretend as if we are converting it to a long.
-    if (varTypeIsUnsigned(dstType) && (dstSize == EA_ATTR(genTypeSize(TYP_INT))))
+    if (varTypeIsUnsigned(dstType) && (dstSize == EA_ATTR(genTypeSize(TYP_INT))) &&
+        !compiler->compOpportunisticallyDependsOn(InstructionSet_AVX512F))
     {
         dstType = TYP_LONG;
     }

     // Note that we need to specify dstType here so that it will determine
     // the size of destination integer register and also the rex.w prefix.
     genConsumeOperands(treeNode->AsOp());
-    instruction ins = ins_FloatConv(TYP_INT, srcType, emitTypeSize(srcType));
+    instruction ins = ins_FloatConv(dstType, srcType, emitTypeSize(srcType));
     GetEmitter()->emitInsBinary(ins, emitTypeSize(dstType), treeNode, op1);
     genProduceReg(treeNode);
 }
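
The comment in this hunk says that, when the destination is TYP_UINT and AVX512F is not used, the code pretends it is converting to a long. A rough C++ model of that idea, for illustration only: the function name is hypothetical, the input is assumed to lie in [0, 2^32), and the cast stands in for a truncating convert with a 64-bit destination register.

```cpp
#include <cstdint>

// Hypothetical illustration of the "convert as long, keep the low 32 bits" idea.
// Assumes 0 <= val < 2^32; values outside the int64 range are undefined behavior
// in C++ and are not modeled here.
uint32_t DoubleToUInt32ViaInt64(double val)
{
    // Truncate toward zero into a signed 64-bit integer. Every uint32 value
    // fits in the positive half of int64, so the sign bit is never involved.
    int64_t wide = static_cast<int64_t>(val);

    // The low 32 bits of the wide result are the unsigned 32-bit answer.
    return static_cast<uint32_t>(wide);
}
```

A direct signed 32-bit conversion could not represent inputs at or above 2^31, which is why the wider intermediate is useful in this sketch.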

src/coreclr/jit/compiler.h

Lines changed: 8 additions & 0 deletions
@@ -3204,6 +3204,14 @@ class Compiler
                                   CorInfoType simdBaseJitType,
                                   unsigned    simdSize);

+#if defined(TARGET_XARCH)
+    GenTree* gtNewSimdCvtNode(var_types   type,
+                              GenTree*    op1,
+                              CorInfoType simdTargetBaseJitType,
+                              CorInfoType simdSourceBaseJitType,
+                              unsigned    simdSize);
+#endif //TARGET_XARCH
+
     GenTree* gtNewSimdCreateBroadcastNode(
         var_types type, GenTree* op1, CorInfoType simdBaseJitType, unsigned simdSize);


src/coreclr/jit/emit.h

Lines changed: 4 additions & 2 deletions
@@ -4012,7 +4012,8 @@ emitAttr emitter::emitGetBaseMemOpSize(instrDesc* id) const
         case INS_comiss:
         case INS_cvtss2sd:
         case INS_cvtss2si:
-        case INS_cvttss2si:
+        case INS_cvttss2si32:
+        case INS_cvttss2si64:
         case INS_divss:
         case INS_extractps:
         case INS_insertps:
@@ -4055,7 +4056,8 @@ emitAttr emitter::emitGetBaseMemOpSize(instrDesc* id) const
         case INS_comisd:
         case INS_cvtsd2si:
         case INS_cvtsd2ss:
-        case INS_cvttsd2si:
+        case INS_cvttsd2si32:
+        case INS_cvttsd2si64:
         case INS_divsd:
         case INS_maxsd:
         case INS_minsd:

src/coreclr/jit/emitxarch.cpp

Lines changed: 23 additions & 18 deletions
@@ -1522,9 +1522,11 @@ bool emitter::TakesRexWPrefix(const instrDesc* id) const
         switch (ins)
         {
             case INS_cvtss2si:
-            case INS_cvttss2si:
+            case INS_cvttss2si32:
+            case INS_cvttss2si64:
             case INS_cvtsd2si:
-            case INS_cvttsd2si:
+            case INS_cvttsd2si32:
+            case INS_cvttsd2si64:
             case INS_movd:
             case INS_movnti:
             case INS_andn:
@@ -1544,7 +1546,6 @@ bool emitter::TakesRexWPrefix(const instrDesc* id) const
 #endif // TARGET_AMD64
         case INS_vcvtsd2usi:
         case INS_vcvtss2usi:
-        case INS_vcvttsd2usi:
         {
             if (attr == EA_8BYTE)
             {
@@ -2723,8 +2724,10 @@ bool emitter::emitInsCanOnlyWriteSSE2OrAVXReg(instrDesc* id)
         case INS_blsmsk:
         case INS_blsr:
         case INS_bzhi:
-        case INS_cvttsd2si:
-        case INS_cvttss2si:
+        case INS_cvttsd2si32:
+        case INS_cvttsd2si64:
+        case INS_cvttss2si32:
+        case INS_cvttss2si64:
         case INS_cvtsd2si:
         case INS_cvtss2si:
         case INS_extractps:
@@ -2748,7 +2751,8 @@ bool emitter::emitInsCanOnlyWriteSSE2OrAVXReg(instrDesc* id)
 #endif
         case INS_vcvtsd2usi:
         case INS_vcvtss2usi:
-        case INS_vcvttsd2usi:
+        case INS_vcvttsd2usi32:
+        case INS_vcvttsd2usi64:
         case INS_vcvttss2usi32:
         case INS_vcvttss2usi64:
         {
@@ -11605,22 +11609,20 @@ void emitter::emitDispIns(
                     break;
                 }

-                case INS_cvttsd2si:
+                case INS_cvttsd2si32:
+                case INS_cvttsd2si64:
                 case INS_cvtss2si:
                 case INS_cvtsd2si:
-                case INS_cvttss2si:
+                case INS_cvttss2si32:
+                case INS_cvttss2si64:
                 case INS_vcvtsd2usi:
                 case INS_vcvtss2usi:
-                case INS_vcvttsd2usi:
-                {
-                    printf(" %s, %s", emitRegName(id->idReg1(), attr), emitRegName(id->idReg2(), EA_16BYTE));
-                    break;
-                }
-
+                case INS_vcvttsd2usi32:
+                case INS_vcvttsd2usi64:
                 case INS_vcvttss2usi32:
                 case INS_vcvttss2usi64:
                 {
-                    printf(" %s, %s", emitRegName(id->idReg1(), attr), emitRegName(id->idReg2(), EA_4BYTE));
+                    printf(" %s, %s", emitRegName(id->idReg1(), attr), emitRegName(id->idReg2(), EA_16BYTE));
                     break;
                 }

@@ -19048,7 +19050,8 @@ emitter::insExecutionCharacteristics emitter::getInsExecutionCharacteristics(ins
             break;
         }

-        case INS_cvttsd2si:
+        case INS_cvttsd2si32:
+        case INS_cvttsd2si64:
         case INS_cvtsd2si:
         case INS_cvtsi2sd32:
         case INS_cvtsi2ss32:
@@ -19057,7 +19060,8 @@ emitter::insExecutionCharacteristics emitter::getInsExecutionCharacteristics(ins
         case INS_vcvtsd2usi:
         case INS_vcvtusi2ss32:
         case INS_vcvtusi2ss64:
-        case INS_vcvttsd2usi:
+        case INS_vcvttsd2usi32:
+        case INS_vcvttsd2usi64:
         case INS_vcvttss2usi32:
             result.insThroughput = PERFSCORE_THROUGHPUT_1C;
             result.insLatency += PERFSCORE_LATENCY_7C;
@@ -19069,7 +19073,8 @@ emitter::insExecutionCharacteristics emitter::getInsExecutionCharacteristics(ins
             result.insLatency += PERFSCORE_LATENCY_5C;
             break;

-        case INS_cvttss2si:
+        case INS_cvttss2si32:
+        case INS_cvttss2si64:
         case INS_cvtss2si:
         case INS_vcvtss2usi:
             result.insThroughput = PERFSCORE_THROUGHPUT_1C;
