Skip to content

Conversation

davidwrighton
Copy link
Member

Implement the vast majority of the opcodes that are affected by shared generics

  • box, unbox, unbox.any, castclass, isinst, ldtoken, newarr, newobj of an mdarray

Add logic to enhance pretty printing of the Interp opcodes for these things. We can now see strings, and various magic pointer values get displayed in the disasm outputs.

Handle the 0 indirections generic lookup scenario

Many opcodes were able to use a set of generic helpercall opcodes. These opcodes have a naming convention of

  • a stands for addr of dvar or svar
  • g stands for do a generic lookup
  • p stands for pointer (acquired from either an svar or a dataitem)
  • v stands for a void return.

Its not fully descriptive, but naming is hard, and this works.

- boxint, unbox, unbox.any, cast, isinst, ldtoken, newarr, newobj

Added additional diagnostic printing capabilities, to print strings, and method/class names in interesting scenarios

Known continuing work
- MDArray new
- static variables and class constructor triggering
@Copilot Copilot AI review requested due to automatic review settings June 24, 2025 22:38
@github-actions github-actions bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Jun 24, 2025
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Implements shared generic support in the interpreter by adding new opcodes, wiring up generic lookups, and enhancing disassembly printing for generic contexts.
Key changes:

  • Added shared‐generics helper opcodes (call.helper.pg, newobj.generic, etc.) and their handling in interpexec.cpp and intops.def/h.
  • Extended the compiler to emit and print generic lookup data (InterpGenericLookup) and pointer tagging for improved pretty‐printing.
  • Expanded the JIT/interpreter tests in Interpreter.cs to cover shared generics scenarios (box/unbox, castclass, newarr, new MD array, isinst, ldtoken, newobj).

Reviewed Changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
src/tests/JIT/interpreter/Interpreter.cs Added tests for most shared‐generics opcodes
src/coreclr/vm/interpexec.cpp Implemented runtime generic lookups via DO_GENERIC_LOOKUP macro
src/coreclr/interpreter/intops.h Declared new opcode argument types for generic helpers
src/coreclr/interpreter/intops.def Defined new helper‐ and lookup‐based opcodes and replaced old ones
src/coreclr/interpreter/datastructs.h Marked GetSize() and GetUnderlyingArray() as const
src/coreclr/interpreter/compiler.h Introduced InterpGenericLookup metadata and new emit helpers
src/coreclr/interpreter/compiler.cpp Wired up emission of generic helper calls and enhanced printing
Comments suppressed due to low confidence (1)

src/tests/JIT/interpreter/Interpreter.cs:2094

  • The new shared-generics tests cover most opcodes, but there’s no dedicated test for ldtoken on a generic type. Consider adding a TestSharedGenerics_LdToken<T>() to verify interpreter lookup and printing for token handles.
        return true;

Comment on lines 354 to 376
#ifdef DEBUG
void *ptr = (void*)clsHnd;
if (!PointerInNameMap(ptr))
{
AddPointerToNameMap(ptr, PointerIsClassHandle);
}
#endif DEBUG
}

void DeclarePointerIsMethod(CORINFO_METHOD_HANDLE methodHnd)
{
#ifdef DEBUG
void *ptr = (void*)methodHnd;
if (!PointerInNameMap(ptr))
{
AddPointerToNameMap(ptr, PointerIsMethodHandle);
}
#endif DEBUG
}

void DeclarePointerIsString(void* stringLiteral)
{
#ifdef DEBUG
Copy link
Preview

Copilot AI Jun 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The debug guards use #ifdef DEBUG but the project typically defines _DEBUG. Consider changing #ifdef DEBUG to #ifdef _DEBUG so these debug blocks actually compile in debug builds.

Suggested change
#ifdef DEBUG
void *ptr = (void*)clsHnd;
if (!PointerInNameMap(ptr))
{
AddPointerToNameMap(ptr, PointerIsClassHandle);
}
#endif DEBUG
}
void DeclarePointerIsMethod(CORINFO_METHOD_HANDLE methodHnd)
{
#ifdef DEBUG
void *ptr = (void*)methodHnd;
if (!PointerInNameMap(ptr))
{
AddPointerToNameMap(ptr, PointerIsMethodHandle);
}
#endif DEBUG
}
void DeclarePointerIsString(void* stringLiteral)
{
#ifdef DEBUG
#ifdef _DEBUG
void *ptr = (void*)clsHnd;
if (!PointerInNameMap(ptr))
{
AddPointerToNameMap(ptr, PointerIsClassHandle);
}
#endif // _DEBUG
}
void DeclarePointerIsMethod(CORINFO_METHOD_HANDLE methodHnd)
{
#ifdef _DEBUG
void *ptr = (void*)methodHnd;
if (!PointerInNameMap(ptr))
{
AddPointerToNameMap(ptr, PointerIsMethodHandle);
}
#endif // _DEBUG
}
void DeclarePointerIsString(void* stringLiteral)
{
#ifdef _DEBUG

Copilot uses AI. Check for mistakes.

m_pStackPointer--;
EmitLdLocA(m_pStackPointer[0].var);
}
m_ip += 5;
Copy link
Preview

Copilot AI Jun 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing break; at the end of the case CEE_UNBOX: block causes fall-through into case CEE_UNBOX_ANY: and will execute unintended logic. Add a break; after this line.

Suggested change
m_ip += 5;
m_ip += 5;
break;

Copilot uses AI. Check for mistakes.

@@ -291,6 +293,67 @@ template <typename TResult, typename TSource> void ConvOvfHelper(int8_t *stack,
}
}

#define DO_GENERIC_LOOKUP(genericVar, lookupIndex) \
Copy link
Preview

Copilot AI Jun 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The DO_GENERIC_LOOKUP macro is very large and complex, leading to duplicated logic and reduced readability. Consider refactoring this into a dedicated function or a set of smaller helpers.

Copilot uses AI. Check for mistakes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with copilot that I'd love to see this outlined into function(s). As-is the line number information when debugging it is gonna be kinda miserable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, I was debugging an issue with it in the past and I had to expand the macro manually to be able to figure out what's wrong.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

K, I've cleaned that up.

TArray<PointerToName> m_pointerToNameMap;
bool PointerInNameMap(void* ptr)
{
for (int32_t i = 0; i < m_pointerToNameMap.GetSize(); i++)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this linear search going to become a problem for us in large method bodies? Do we need to be using a hashtable yet? We have simdhash available since it's used in stackmap.cpp, though it doesn't have a predefined variant that would hold PointerToName (it's easy to add one).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've started to use dn_simd_hash_ptr_ptr which is close enough. I added a C++ wrapper to handle lifetime.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fyi if you look at dn-simdhash-ght-compatible.c you can see how to define custom callbacks and whatnot. but there's nothing wrong with using ptr_ptr if you're okay with an extra allocation per item.

GenericHandleData() = default;

HelperArgType argType = HelperArgType::Value;
int genericVar = -1; // Set to a meaningful value if HelperArgType is GenericResolution
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something more specific than 'a meaningful value' would be helpful here since by browsing around the vicinity I can't tell what makes the value meaningful or how I would distinguish between a valid genericVar and an invalid one. It may become obvious to me as I continue reading the PR, though!

My assumption upon first glance is it's the zero-based index of the generic parameter in the method's set of reachable generic parameters (where i use 'reachable generic parameters' to mean the set of all the enclosing type parameters + all the method parameters)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I'm at this point again and it's 'the interp var containing the generic dictionary' I think?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The interp var containing the generic context argument. (The dictionary is a set of indirections away typically)

m_pLastNewIns->SetSVar(arg2);
m_pLastNewIns->SetDVar(resultVar);

m_pLastNewIns->data[1] = handleData.dataItemIndex;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] There's an asymmetry here where in the GenericResolution arm of these ifs, we set data[0] and data[1] right next to each other, but in the other arm we set them separately. It seems like it could hide mistakes. It would be cool if both sides were the same shape so the differences were clearer to the reader.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've made this cleaner.

if (embedInfo.lookup.lookupKind.needsRuntimeLookup)
{
CORINFO_RUNTIME_LOOKUP_KIND runtimeLookupKind = embedInfo.lookup.lookupKind.runtimeLookupKind;
InterpGenericLookup *pRuntimeLookup = (InterpGenericLookup*)AllocMethodData(sizeof(InterpGenericLookup));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it the case that GenericHandleToGenericHandleData could get called for the same embedInfo multiple times, and we'd bloat the dataitems table with lots of duplicate lookups? i.e. if a method has two generic parameters T and U, are we going to end up with a unique dataitem for every use of T and every use of U? Or just two data items, one for T and one for U, that get reused through some other mechanism I haven't noticed yet?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is a problem. I've added a general purpose hashing mechanism for embedding data into the dataitems array, and swapped over to using that hash even for regular pointers. This should address any reasonable scalability issues here.

size_t lastOffset = pLookup->offsets[pLookup->indirections - 1]; \
if (pLookup->sizeOffset != CORINFO_NO_SIZE_CHECK) \
{ \
/* Last indirection is the size*/ \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is news to me. I don't see corresponding information in interpretershared.h.

if (pLookup->indirections != 0) \
{ \
lookup = *(uint8_t**)(lookup + pLookup->offsets[0]); \
if (pLookup->indirections >= 3) \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This and the 4 look like they're off by 1, but I assume that's on purpose and goes with the 'last indirection is the size' below

pMD = LOCAL_VAR(genericVar, MethodDesc*); \
lookup = (uint8_t*)pMD; \
} \
void* result = 0; \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] It'd be nice if this name made it a little clearer that it's from the macro, or if it was defined at the top of the macro instead of hiding in the middle

@@ -291,6 +293,67 @@ template <typename TResult, typename TSource> void ConvOvfHelper(int8_t *stack,
}
}

#define DO_GENERIC_LOOKUP(genericVar, lookupIndex) \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] It's weird that genericVar is usually just ip[x] and then lookupIndex is y. It might make sense for genericVar to also just be x so there's consistency.

@kg
Copy link
Member

kg commented Jun 25, 2025

LGTM other than the missing break copilot noticed.

@filipnavara filipnavara added area-CodeGen-Interpreter-coreclr and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels Jun 25, 2025
Copy link
Contributor

Tagging subscribers to this area: @BrzVlad, @janvorli, @kg
See info in area-owners.md if you want to be subscribed.

…kups, and embed the InterpGenericLookup in the dataitems array directly

Fix generic lookups to handle the usehelper case
Shrink the InterpGenericLookup to be the actual size needed, using smaller data fields when possible
Improve comments on the interpreter around what the sizeOffset means
Comment on lines +141 to +154
dst->sizeOffset = 0;
dst->offsets[0] = 0;
dst->offsets[1] = 0;
dst->offsets[2] = 0;
dst->offsets[3] = 0;
}
else
{
dst->sizeOffset = src->sizeOffset;
dst->offsets[0] = (uint16_t)GetGenericLookupOffset(src, 0);
dst->offsets[1] = (uint16_t)GetGenericLookupOffset(src, 1);
dst->offsets[2] = (uint16_t)GetGenericLookupOffset(src, 2);
dst->offsets[3] = (uint16_t)GetGenericLookupOffset(src, 3);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A loop would be nice but not mandatory

@davidwrighton davidwrighton enabled auto-merge (squash) June 27, 2025 00:15

size_t sizeOfStruct = sizeof(InterpGenericLookup);

assert(sizeOfFieldsConcatenated == sizeOfStruct); // Assert that there is no padding in the struct, so a fixed size hash unaware of padding is safe to use
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be a static_assert right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but its easier to see what's wrong when its a runtime error, and since we have a functional PR system, it will be caught pretty quickly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But there is a good point that that doesn't really happen in release builds.

Comment on lines +540 to +555
bool PointerInNameMap(void* ptr)
{
void* result;
if (dn_simdhash_ptr_ptr_try_get_value(m_pointerToNameMap.GetValue(), ptr, &result))
{
return true;
}
return false;
}
void AddPointerToNameMap(void* ptr, const char* name)
{
if (!PointerInNameMap(ptr))
{
dn_simdhash_ptr_ptr_try_add(m_pointerToNameMap.GetValue(), ptr, (void*)name);
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As things are written right now, we don't need PointerInNameMap at all, and the try_add operation will harmlessly return 0 if the key ptr already exists.

bool PointerInNameMap(void* ptr)
{
void* result;
if (dn_simdhash_ptr_ptr_try_get_value(m_pointerToNameMap.GetValue(), ptr, &result))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fyi, you can pass NULL instead of &result and simdhash try_get_value will do what you want (not copy, just return whether a value was found or not)

Comment on lines +853 to +855
assert(sizeof(VarSizedData) == sizeof(void*));
assert(sizeof(T) % sizeof(void*) == 0);
assert(sizeof(VarSizedDataWithPayload<T>) == sizeof(T) + sizeof(void*));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should probably also be static_assert instead of assert, so that they will be checked in release builds

@davidwrighton davidwrighton merged commit 50be35a into dotnet:main Jun 27, 2025
145 of 156 checks passed
int base = 4;
DO_GENERIC_LOOKUP(ip[2], base + 1);

HELPER_FTN_PP helperFtn = GetPossiblyIndirectHelper<HELPER_FTN_PP>(pMethod->pDataItems[ip[base]]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is confusing casting between HELPER_FTN_PP and HELPER_FTN_PP_2. Also it seems we are going with the convention _{retType}_{argType}{argType}... So HELPER_FTN_PP_2 should actually be renamed to HELPER_FTN_P_PP. HELPER_FTN_PP should be HELPER_FTN_P_P. Otherwise it is just a mess.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mmm... fair. I was just copying/pasting here, and mostly spending my time thinking about the compiler side. I'm going to eliminate the code sharing between opcodes here, as I think its obfuscating the logic, and simplify this out.


HELPER_FTN_PP helperFtn = GetPossiblyIndirectHelper<HELPER_FTN_PP>(pMethod->pDataItems[ip[base]]);
void* helperArg = pMethod->pDataItems[ip[base + 1]];
LOCAL_VAR(ip[1], void*) = ((HELPER_FTN_PP_2)helperFtn)(helperArg, LOCAL_VAR_ADDR(ip[2], void*));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here as well

case INTOP_CALL_HELPER_V_AGP:
{
int base = 4;
DO_GENERIC_LOOKUP(ip[2], base + 1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems cleaner for DO_GENERIC_LOOKUP to receive the data item index directly rather than an offset into the currently executing instruction.

return result;
}

#define DO_GENERIC_LOOKUP(genericVar, lookupIndex) \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we even need this define anymore ? Doesn't seem to add more clarity. We could just replace it with a method returning result explicitly

davidwrighton added a commit that referenced this pull request Jun 27, 2025
- use static_asserts where possible
- remove the mess of casts and temp variables around helper function opcodes
- Remove the DO_GENERIC_LOOKUP macro since it wasn't doing much of use anymore
- Use the simdhash apis a little better
@github-actions github-actions bot locked and limited conversation to collaborators Jul 28, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants