-
Notifications
You must be signed in to change notification settings - Fork 5.1k
Shared generics opcodes #116990
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Shared generics opcodes #116990
Conversation
- boxint, unbox, unbox.any, cast, isinst, ldtoken, newarr, newobj Added additional diagnostic printing capabilities, to print strings, and method/class names in interesting scenarios Known continuing work - MDArray new - static variables and class constructor triggering
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Implements shared generic support in the interpreter by adding new opcodes, wiring up generic lookups, and enhancing disassembly printing for generic contexts.
Key changes:
- Added shared‐generics helper opcodes (
call.helper.pg
,newobj.generic
, etc.) and their handling ininterpexec.cpp
andintops.def/h
. - Extended the compiler to emit and print generic lookup data (
InterpGenericLookup
) and pointer tagging for improved pretty‐printing. - Expanded the JIT/interpreter tests in
Interpreter.cs
to cover shared generics scenarios (box/unbox, castclass, newarr, new MD array,isinst
,ldtoken
,newobj
).
Reviewed Changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.
Show a summary per file
File | Description |
---|---|
src/tests/JIT/interpreter/Interpreter.cs | Added tests for most shared‐generics opcodes |
src/coreclr/vm/interpexec.cpp | Implemented runtime generic lookups via DO_GENERIC_LOOKUP macro |
src/coreclr/interpreter/intops.h | Declared new opcode argument types for generic helpers |
src/coreclr/interpreter/intops.def | Defined new helper‐ and lookup‐based opcodes and replaced old ones |
src/coreclr/interpreter/datastructs.h | Marked GetSize() and GetUnderlyingArray() as const |
src/coreclr/interpreter/compiler.h | Introduced InterpGenericLookup metadata and new emit helpers |
src/coreclr/interpreter/compiler.cpp | Wired up emission of generic helper calls and enhanced printing |
Comments suppressed due to low confidence (1)
src/tests/JIT/interpreter/Interpreter.cs:2094
- The new shared-generics tests cover most opcodes, but there’s no dedicated test for
ldtoken
on a generic type. Consider adding aTestSharedGenerics_LdToken<T>()
to verify interpreter lookup and printing for token handles.
return true;
src/coreclr/interpreter/compiler.h
Outdated
#ifdef DEBUG | ||
void *ptr = (void*)clsHnd; | ||
if (!PointerInNameMap(ptr)) | ||
{ | ||
AddPointerToNameMap(ptr, PointerIsClassHandle); | ||
} | ||
#endif DEBUG | ||
} | ||
|
||
void DeclarePointerIsMethod(CORINFO_METHOD_HANDLE methodHnd) | ||
{ | ||
#ifdef DEBUG | ||
void *ptr = (void*)methodHnd; | ||
if (!PointerInNameMap(ptr)) | ||
{ | ||
AddPointerToNameMap(ptr, PointerIsMethodHandle); | ||
} | ||
#endif DEBUG | ||
} | ||
|
||
void DeclarePointerIsString(void* stringLiteral) | ||
{ | ||
#ifdef DEBUG |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The debug guards use #ifdef DEBUG
but the project typically defines _DEBUG
. Consider changing #ifdef DEBUG
to #ifdef _DEBUG
so these debug blocks actually compile in debug builds.
#ifdef DEBUG | |
void *ptr = (void*)clsHnd; | |
if (!PointerInNameMap(ptr)) | |
{ | |
AddPointerToNameMap(ptr, PointerIsClassHandle); | |
} | |
#endif DEBUG | |
} | |
void DeclarePointerIsMethod(CORINFO_METHOD_HANDLE methodHnd) | |
{ | |
#ifdef DEBUG | |
void *ptr = (void*)methodHnd; | |
if (!PointerInNameMap(ptr)) | |
{ | |
AddPointerToNameMap(ptr, PointerIsMethodHandle); | |
} | |
#endif DEBUG | |
} | |
void DeclarePointerIsString(void* stringLiteral) | |
{ | |
#ifdef DEBUG | |
#ifdef _DEBUG | |
void *ptr = (void*)clsHnd; | |
if (!PointerInNameMap(ptr)) | |
{ | |
AddPointerToNameMap(ptr, PointerIsClassHandle); | |
} | |
#endif // _DEBUG | |
} | |
void DeclarePointerIsMethod(CORINFO_METHOD_HANDLE methodHnd) | |
{ | |
#ifdef _DEBUG | |
void *ptr = (void*)methodHnd; | |
if (!PointerInNameMap(ptr)) | |
{ | |
AddPointerToNameMap(ptr, PointerIsMethodHandle); | |
} | |
#endif // _DEBUG | |
} | |
void DeclarePointerIsString(void* stringLiteral) | |
{ | |
#ifdef _DEBUG |
Copilot uses AI. Check for mistakes.
src/coreclr/interpreter/compiler.cpp
Outdated
m_pStackPointer--; | ||
EmitLdLocA(m_pStackPointer[0].var); | ||
} | ||
m_ip += 5; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing break;
at the end of the case CEE_UNBOX:
block causes fall-through into case CEE_UNBOX_ANY:
and will execute unintended logic. Add a break;
after this line.
m_ip += 5; | |
m_ip += 5; | |
break; |
Copilot uses AI. Check for mistakes.
src/coreclr/vm/interpexec.cpp
Outdated
@@ -291,6 +293,67 @@ template <typename TResult, typename TSource> void ConvOvfHelper(int8_t *stack, | |||
} | |||
} | |||
|
|||
#define DO_GENERIC_LOOKUP(genericVar, lookupIndex) \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The DO_GENERIC_LOOKUP
macro is very large and complex, leading to duplicated logic and reduced readability. Consider refactoring this into a dedicated function or a set of smaller helpers.
Copilot uses AI. Check for mistakes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with copilot that I'd love to see this outlined into function(s). As-is the line number information when debugging it is gonna be kinda miserable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, I was debugging an issue with it in the past and I had to expand the macro manually to be able to figure out what's wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
K, I've cleaned that up.
src/coreclr/interpreter/compiler.h
Outdated
TArray<PointerToName> m_pointerToNameMap; | ||
bool PointerInNameMap(void* ptr) | ||
{ | ||
for (int32_t i = 0; i < m_pointerToNameMap.GetSize(); i++) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this linear search going to become a problem for us in large method bodies? Do we need to be using a hashtable yet? We have simdhash available since it's used in stackmap.cpp, though it doesn't have a predefined variant that would hold PointerToName (it's easy to add one).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've started to use dn_simd_hash_ptr_ptr which is close enough. I added a C++ wrapper to handle lifetime.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fyi if you look at dn-simdhash-ght-compatible.c you can see how to define custom callbacks and whatnot. but there's nothing wrong with using ptr_ptr if you're okay with an extra allocation per item.
src/coreclr/interpreter/compiler.h
Outdated
GenericHandleData() = default; | ||
|
||
HelperArgType argType = HelperArgType::Value; | ||
int genericVar = -1; // Set to a meaningful value if HelperArgType is GenericResolution |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something more specific than 'a meaningful value' would be helpful here since by browsing around the vicinity I can't tell what makes the value meaningful or how I would distinguish between a valid genericVar and an invalid one. It may become obvious to me as I continue reading the PR, though!
My assumption upon first glance is it's the zero-based index of the generic parameter in the method's set of reachable generic parameters (where i use 'reachable generic parameters' to mean the set of all the enclosing type parameters + all the method parameters)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I'm at this point again and it's 'the interp var containing the generic dictionary' I think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The interp var containing the generic context argument. (The dictionary is a set of indirections away typically)
src/coreclr/interpreter/compiler.cpp
Outdated
m_pLastNewIns->SetSVar(arg2); | ||
m_pLastNewIns->SetDVar(resultVar); | ||
|
||
m_pLastNewIns->data[1] = handleData.dataItemIndex; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] There's an asymmetry here where in the GenericResolution arm of these ifs, we set data[0] and data[1] right next to each other, but in the other arm we set them separately. It seems like it could hide mistakes. It would be cool if both sides were the same shape so the differences were clearer to the reader.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've made this cleaner.
src/coreclr/interpreter/compiler.cpp
Outdated
if (embedInfo.lookup.lookupKind.needsRuntimeLookup) | ||
{ | ||
CORINFO_RUNTIME_LOOKUP_KIND runtimeLookupKind = embedInfo.lookup.lookupKind.runtimeLookupKind; | ||
InterpGenericLookup *pRuntimeLookup = (InterpGenericLookup*)AllocMethodData(sizeof(InterpGenericLookup)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it the case that GenericHandleToGenericHandleData could get called for the same embedInfo multiple times, and we'd bloat the dataitems table with lots of duplicate lookups? i.e. if a method has two generic parameters T and U, are we going to end up with a unique dataitem for every use of T and every use of U? Or just two data items, one for T and one for U, that get reused through some other mechanism I haven't noticed yet?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this is a problem. I've added a general purpose hashing mechanism for embedding data into the dataitems array, and swapped over to using that hash even for regular pointers. This should address any reasonable scalability issues here.
src/coreclr/vm/interpexec.cpp
Outdated
size_t lastOffset = pLookup->offsets[pLookup->indirections - 1]; \ | ||
if (pLookup->sizeOffset != CORINFO_NO_SIZE_CHECK) \ | ||
{ \ | ||
/* Last indirection is the size*/ \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is news to me. I don't see corresponding information in interpretershared.h.
src/coreclr/vm/interpexec.cpp
Outdated
if (pLookup->indirections != 0) \ | ||
{ \ | ||
lookup = *(uint8_t**)(lookup + pLookup->offsets[0]); \ | ||
if (pLookup->indirections >= 3) \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This and the 4 look like they're off by 1, but I assume that's on purpose and goes with the 'last indirection is the size' below
src/coreclr/vm/interpexec.cpp
Outdated
pMD = LOCAL_VAR(genericVar, MethodDesc*); \ | ||
lookup = (uint8_t*)pMD; \ | ||
} \ | ||
void* result = 0; \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] It'd be nice if this name made it a little clearer that it's from the macro, or if it was defined at the top of the macro instead of hiding in the middle
src/coreclr/vm/interpexec.cpp
Outdated
@@ -291,6 +293,67 @@ template <typename TResult, typename TSource> void ConvOvfHelper(int8_t *stack, | |||
} | |||
} | |||
|
|||
#define DO_GENERIC_LOOKUP(genericVar, lookupIndex) \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] It's weird that genericVar is usually just ip[x]
and then lookupIndex is y
. It might make sense for genericVar to also just be x
so there's consistency.
LGTM other than the missing |
Tagging subscribers to this area: @BrzVlad, @janvorli, @kg |
…kups, and embed the InterpGenericLookup in the dataitems array directly Fix generic lookups to handle the usehelper case Shrink the InterpGenericLookup to be the actual size needed, using smaller data fields when possible Improve comments on the interpreter around what the sizeOffset means
Co-authored-by: Adeel Mujahid <[email protected]>
dst->sizeOffset = 0; | ||
dst->offsets[0] = 0; | ||
dst->offsets[1] = 0; | ||
dst->offsets[2] = 0; | ||
dst->offsets[3] = 0; | ||
} | ||
else | ||
{ | ||
dst->sizeOffset = src->sizeOffset; | ||
dst->offsets[0] = (uint16_t)GetGenericLookupOffset(src, 0); | ||
dst->offsets[1] = (uint16_t)GetGenericLookupOffset(src, 1); | ||
dst->offsets[2] = (uint16_t)GetGenericLookupOffset(src, 2); | ||
dst->offsets[3] = (uint16_t)GetGenericLookupOffset(src, 3); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A loop would be nice but not mandatory
|
||
size_t sizeOfStruct = sizeof(InterpGenericLookup); | ||
|
||
assert(sizeOfFieldsConcatenated == sizeOfStruct); // Assert that there is no padding in the struct, so a fixed size hash unaware of padding is safe to use |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be a static_assert right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but its easier to see what's wrong when its a runtime error, and since we have a functional PR system, it will be caught pretty quickly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But there is a good point that that doesn't really happen in release builds.
bool PointerInNameMap(void* ptr) | ||
{ | ||
void* result; | ||
if (dn_simdhash_ptr_ptr_try_get_value(m_pointerToNameMap.GetValue(), ptr, &result)) | ||
{ | ||
return true; | ||
} | ||
return false; | ||
} | ||
void AddPointerToNameMap(void* ptr, const char* name) | ||
{ | ||
if (!PointerInNameMap(ptr)) | ||
{ | ||
dn_simdhash_ptr_ptr_try_add(m_pointerToNameMap.GetValue(), ptr, (void*)name); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As things are written right now, we don't need PointerInNameMap
at all, and the try_add
operation will harmlessly return 0 if the key ptr
already exists.
bool PointerInNameMap(void* ptr) | ||
{ | ||
void* result; | ||
if (dn_simdhash_ptr_ptr_try_get_value(m_pointerToNameMap.GetValue(), ptr, &result)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fyi, you can pass NULL instead of &result and simdhash try_get_value will do what you want (not copy, just return whether a value was found or not)
assert(sizeof(VarSizedData) == sizeof(void*)); | ||
assert(sizeof(T) % sizeof(void*) == 0); | ||
assert(sizeof(VarSizedDataWithPayload<T>) == sizeof(T) + sizeof(void*)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These should probably also be static_assert instead of assert, so that they will be checked in release builds
int base = 4; | ||
DO_GENERIC_LOOKUP(ip[2], base + 1); | ||
|
||
HELPER_FTN_PP helperFtn = GetPossiblyIndirectHelper<HELPER_FTN_PP>(pMethod->pDataItems[ip[base]]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is confusing casting between HELPER_FTN_PP
and HELPER_FTN_PP_2
. Also it seems we are going with the convention _{retType}_{argType}{argType}..
. So HELPER_FTN_PP_2
should actually be renamed to HELPER_FTN_P_PP
. HELPER_FTN_PP
should be HELPER_FTN_P_P
. Otherwise it is just a mess.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mmm... fair. I was just copying/pasting here, and mostly spending my time thinking about the compiler side. I'm going to eliminate the code sharing between opcodes here, as I think its obfuscating the logic, and simplify this out.
|
||
HELPER_FTN_PP helperFtn = GetPossiblyIndirectHelper<HELPER_FTN_PP>(pMethod->pDataItems[ip[base]]); | ||
void* helperArg = pMethod->pDataItems[ip[base + 1]]; | ||
LOCAL_VAR(ip[1], void*) = ((HELPER_FTN_PP_2)helperFtn)(helperArg, LOCAL_VAR_ADDR(ip[2], void*)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here as well
case INTOP_CALL_HELPER_V_AGP: | ||
{ | ||
int base = 4; | ||
DO_GENERIC_LOOKUP(ip[2], base + 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems cleaner for DO_GENERIC_LOOKUP
to receive the data item index directly rather than an offset into the currently executing instruction.
return result; | ||
} | ||
|
||
#define DO_GENERIC_LOOKUP(genericVar, lookupIndex) \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we even need this define anymore ? Doesn't seem to add more clarity. We could just replace it with a method returning result explicitly
Implement the vast majority of the opcodes that are affected by shared generics
Add logic to enhance pretty printing of the Interp opcodes for these things. We can now see strings, and various magic pointer values get displayed in the disasm outputs.
Handle the 0 indirections generic lookup scenario
Many opcodes were able to use a set of generic helpercall opcodes. These opcodes have a naming convention of
a
stands for addr of dvar or svarg
stands for do a generic lookupp
stands for pointer (acquired from either an svar or a dataitem)v
stands for a void return.Its not fully descriptive, but naming is hard, and this works.