Skip to content

Conversation

alesj
Copy link
Contributor

@alesj alesj commented Apr 3, 2025

Lets see how far we get with CI -- hence this is not a draft ...
... and @brunobat needs this asap ...

Tests pass locally ...

  • all grpc
  • opentelemetry-grpc*
  • opentelemetry-vertx*

@gsmet apology in advance for using your resources :-)

@alesj alesj requested review from brunobat and cescoffier April 3, 2025 16:20
Copy link

quarkus-bot bot commented Apr 3, 2025

/cc @brunobat (opentelemetry), @radcortez (opentelemetry)

This comment has been minimized.

This comment has been minimized.

Copy link

github-actions bot commented Apr 3, 2025

🎊 PR Preview 745f5cf has been successfully built and deployed to https://quarkus-pr-main-47157-preview.surge.sh/version/main/guides/

  • Images of blog posts older than 3 months are not available.
  • Newsletters older than 3 months are not available.

This comment has been minimized.

@alesj
Copy link
Contributor Author

alesj commented Apr 3, 2025

Hmmm, nothing looks related to gRPC4 ootb ...
LGTM tests pass for me locally.

This comment has been minimized.

@alesj alesj force-pushed the grpc4 branch 2 times, most recently from 1033edf to 897080e Compare April 4, 2025 14:19
@brunobat
Copy link
Contributor

brunobat commented Apr 4, 2025

Will prepare a PR to upgrade OTel and the OTLP protocol.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

Copy link
Member

@cescoffier cescoffier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes look good, but the pulsar failures looked related.

@alesj alesj force-pushed the grpc4 branch 2 times, most recently from e2163da to a09251e Compare April 14, 2025 07:47

This comment has been minimized.

@quarkus-bot quarkus-bot bot added the area/dependencies Pull requests that update a dependency file label Apr 14, 2025
@brunobat
Copy link
Contributor

@alesj I've updated otel in the branch

This comment has been minimized.

@alesj
Copy link
Contributor Author

alesj commented Aug 17, 2025 via email

@alesj
Copy link
Contributor Author

alesj commented Aug 17, 2025 via email

@gastaldi
Copy link
Contributor

Did you add this commit / change to this branch?

It's in the PR, but I closed it as the upgrade was reverted in #49560

@cescoffier
Copy link
Member

Double generation means generating for both proto 3 and proto 4.
And do detect, we would need to index all the proto files (or maybe the stubs) to find which version they need.

@ozangunalp
Copy link
Contributor

Bumping Pulsar doesn't make a difference. It still uses proto3.

But I think grpc 4 is somewhat compatible with protobuf3, what is why jvm tests and native with incomplete classpath passes.

@alesj
Copy link
Contributor Author

alesj commented Aug 18, 2025

But I think grpc 4 is somewhat compatible with protobuf3, what is why jvm tests and native with incomplete classpath passes.

Exactly ... but since native is more eager checking the classpath, it fails since we're missing that v4 RuntimeVersion class.
If we ignore it, it works -- as the health check stuff is not used in that case.

@alesj
Copy link
Contributor Author

alesj commented Aug 18, 2025

And do detect, we would need to index all the proto files (or maybe the stubs) to find which version they need.

The problem is that the version in the proto file doesn't mean much, what matters most is what is on the classpath.
e.g. even if you have syntax = "proto2"; in the file -- as it's the case of this Pulsar test(s) -- we still get v4 generated file

@ozangunalp
Copy link
Contributor

The problem is that the version in the proto file doesn't mean much, what matters most is what is on the classpath.

Which makes sense.

Either we maintain two versions of this extension for v3 and v4, or we have guidelines for applications to continue using the grpc v3 correctly.

@alesj
Copy link
Contributor Author

alesj commented Aug 18, 2025

or we have guidelines for applications to continue using the grpc v3 correctly.

What's correctly in this case? :-)

We need v4 for the OpenTelemetry upgrade -- otherwise we're stuck on some old version forever.

@cescoffier
Copy link
Member

The major issue with the incomplete classpath option is that it's for everything. Basically, you are breaking the Quarkus close-world assumption. So we need to see what can be done, but an incomplete classpath should not be the preferred solution.

@alesj
Copy link
Contributor Author

alesj commented Aug 18, 2025

I just doubt it's worth spending so much time for this Pulsar gRPC v3 native issue.
We can just say it doesn't work in native?
Or is there an even such GraalVM flag, that does that "incomplete" path ignore -- which users could use if really needed?

For the tests purpose to work, we could just add that class ourselves ...

@cescoffier
Copy link
Member

That would be quite a regression and this is just one example - there are other dependencies out there that will have the same issue.

@alesj
Copy link
Contributor Author

alesj commented Aug 19, 2025

there are other dependencies out there that will have the same issue

Exactly.
And how are we gonna catch and fix them all?
Since this will just be a bunch of workarounds for v4 vs v3 ...

@brunobat
Copy link
Contributor

brunobat commented Sep 11, 2025

@alesj @cescoffier @ozangunalp We really need to use gRPC v4.
This is blocking OTel upgrades, The upgrade to MP Telemetry 2.1 and The SR Reactive Messaging PR smallrye/smallrye-reactive-messaging#3074 needed by @jasondlee.
CC @radcortez

PS: Please note this PR was later reverted.

@cescoffier
Copy link
Member

Until we have ITs showing that we can have dependencies including gRPC v3 and gRPC v4 stubs side by side, we cannot really move forward.

@cescoffier
Copy link
Member

I mean ITs with:

  • gRPC v4 stubs in a dependency
  • gRPC v3 stubs in a dependency
  • gRPC v3 in a dependency and gRPC v4 stubs in another dependency

@alesj
Copy link
Contributor Author

alesj commented Sep 23, 2025

Uh, we might still be screw... (I thought this worked for me ... but was probably some weird config that made it work when I tested it "by hand").

This comes from my grpc4_v2 branch

This test requires 3 modules:
a) proto v3 - for Pulsar
b) proto v4 - for gRPC services
c) tests - which depends on (a) and (b)

And why the failure (afais)?

The Pulsar ReflectData code does this

      Type[] params = ptype.getActualTypeArguments();
      if (Map.class.isAssignableFrom(raw)) { // Map
        Class key = (Class) params[0];

since old GeneratedMessageV3 looked like this

public abstract class GeneratedMessageV3 extends AbstractMessage implements Serializable {

hence it was a plain class (aka Class), but "new" GeneratedMessageV3 (from proto4) looks like

public abstract class GeneratedMessageV3
    extends GeneratedMessage.ExtendableMessage<GeneratedMessageV3> {

so it's a parameterized class (not a Class, but TypeVariableImpl).

@cescoffier @ozangunalp any other idea ... ?

@alesj
Copy link
Contributor Author

alesj commented Sep 23, 2025

Hmmm, let me try substituting this ReflectData::createSchema with something that would understand both GeneratedMessageV3 versions ...

@ozangunalp
Copy link
Contributor

ozangunalp commented Sep 24, 2025

@alesj I will look at this asap.

@cescoffier
Copy link
Member

@alesj Is it Pulsar code (hand written) or in the code generated by protoc embedding in the Pulsar jar?

@alesj
Copy link
Contributor Author

alesj commented Sep 24, 2025

The problem is actually in Avro code (doh!) -- see ReflectData class, better ReflectData::createSchema.

This part of code assumes that the Map's type arguments are plain classes, not parameterized ... which is a bold assumption, so wondering how this ever worked ...

    } else if (type instanceof ParameterizedType) {
      ParameterizedType ptype = (ParameterizedType) type;
      Class raw = (Class) ptype.getRawType();
      Type[] params = ptype.getActualTypeArguments();
      if (Map.class.isAssignableFrom(raw)) { // Map
        Class key = (Class) params[0]; // HERE !!!

I'm now playing with ASM to try and fix this ...

@cescoffier
Copy link
Member

Do you mean Pulsar is using Avro classes that do not support gRPC 4?
I think it's an acceptable limitation in this case.

(what is problematic is not supporting gRPC 3 and 4 together, but implementation details outside of that are out of scope)

@alesj
Copy link
Contributor Author

alesj commented Sep 24, 2025

Hmm, it's actually not the new GeneratedMessageV3 fault, but something inside it:
... recursively going back into createSchema ...

	  at org.apache.avro.reflect.ReflectData.createSchema(ReflectData.java:632)
	  at org.apache.avro.reflect.ReflectData.createFieldSchema(ReflectData.java:882)
	  at org.apache.avro.reflect.ReflectData.createSchema(ReflectData.java:733)
	  at org.apache.avro.specific.SpecificData.createSchema(SpecificData.java:517)
	  at org.apache.avro.reflect.ReflectData.createSchema(ReflectData.java:777)
	  at org.apache.avro.reflect.ReflectData.createFieldSchema(ReflectData.java:882)
	  at org.apache.avro.reflect.ReflectData.createSchema(ReflectData.java:733)
	  at org.apache.avro.reflect.ReflectData.createNonStringMapSchema(ReflectData.java:543)
	  at org.apache.avro.reflect.ReflectData.createSchema(ReflectData.java:638)

@alesj
Copy link
Contributor Author

alesj commented Sep 24, 2025

For some reason this extensions type is not recognized as FieldDescriptor ...

  public abstract static class ExtendableMessage<MessageT extends ExtendableMessage<MessageT>>
      extends GeneratedMessage implements ExtendableMessageOrBuilder<MessageT> {

    private static final long serialVersionUID = 1L;

    private final FieldSet<FieldDescriptor> extensions;

----

final class FieldSet<T extends FieldSet.FieldDescriptorLite<T>> {

  private final SmallSortedMap<T, Object> fields; // CAUSE !!

... it goes recursively in, but at some point this becomes just a T ...

@alesj
Copy link
Contributor Author

alesj commented Sep 24, 2025

Well, w/o the type this will never work ... so the question is why does that FieldDescriptor get erased?
Since this is when you get when there is no type ...

at io.quarkus.grpc.deployment.ReflectDataClassVisitor.main(ReflectDataClassVisitor.java:199)
Caused by: org.apache.avro.AvroTypeException: Unknown type: T
	at org.apache.avro.specific.SpecificData.createSchema(SpecificData.java:539)
	at org.apache.avro.reflect.ReflectData.createSchema(ReflectData.java:777)
	at org.apache.avro.reflect.ReflectData.createNonStringMapSchema(ReflectData.java:543)
	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)

From:

    private static <T> T type() {
        return null;
    }

        public static void main(String[] args) throws Exception {
            Method m = ReflectDataClassVisitor.class.getDeclaredMethod("type");
            Type type = m.getGenericReturnType();

            ReflectData data = new ReflectData();

            Method method = ReflectData.class.getDeclaredMethod("createNonStringMapSchema", Type.class, Type.class, Map.class);
            method.setAccessible(true);

            method.invoke(data, type, String.class, new HashMap<>());
        }

@ozangunalp
Copy link
Contributor

@alesj, I at last had some time to take a look at this. Take a look: main...ozangunalp:quarkus:grpc4_v2

First issue I've found is, for Pulsar channels the schema wasn't detected correctly. Because the sub-module did not include jandex indexes, the pulsar extension did not find the class in the index and do the schema detection, hence fallback to JSONSchema.

Here, there is something that caught me off guard: With the beans.xml files available in sub-modules, the combined index during normal build included classes coming from those modules. BUT the native build augmentation did not include those classes, unless I add the quarkus.index-dependency properties. This was unexpected...

Second was a native compilation issue. I had to put the grpc4 class JavaFeaturesProto as runtime init.

Third, there was an issue with the GreeterTestIT grpc test.

I also reverted the class visitor you've added.

@cescoffier, we may need more tests, but this does not look too bad. Meaning classes generated with proto3 and proto4 can actually co-exist.

@cescoffier
Copy link
Member

Awesome @ozangunalp

@ozangunalp ozangunalp mentioned this pull request Sep 29, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/dependencies Pull requests that update a dependency file area/grpc gRPC area/tracing triage/flaky-test
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants