Emit all datasets recalc in one consumer #2425

lampajr · 2025-06-25T12:37:46Z

I am afraid that by sending each dataset event in a separate TX callback we are emitting events completely asynchronously, and when there is a huge number of them we risk hitting a race condition that could cause a wrong credit calculation in the AMQP client:

2025-06-25 10:59:45,580 645a1466d4ef /usr/lib/jvm/java-17-openjdk-17.0.15.0.6-2.el9.x86_64/bin/java[7] ERROR [io.sma.rea.mes.amqp] (vert.x-eventloop-thread-1) SRMSG16225: Failure reported for channel `dataset-event-out`, closing client: java.lang.IllegalArgumentException: Invalid request number, must be greater than 0
	at io.smallrye.mutiny.helpers.Subscriptions.getInvalidRequestException(Subscriptions.java:24)
	at io.smallrye.mutiny.operators.multi.MultiOperatorProcessor.request(MultiOperatorProcessor.java:117)
	at io.smallrye.mutiny.operators.multi.MultiOnRequestInvoke$MultiOnRequestInvokeOperator.request(MultiOnRequestInvoke.java:34)
	at io.smallrye.reactive.messaging.amqp.AmqpCreditBasedSender.setCreditsAndRequest(AmqpCreditBasedSender.java:163)
	at io.smallrye.reactive.messaging.amqp.AmqpCreditBasedSender.lambda$onNoMoreCredit$7(AmqpCreditBasedSender.java:218)
	at io.vertx.mutiny.core.Context.lambda$runOnContext$1(Context.java:136)
	at io.vertx.core.impl.ContextInternal.dispatch(ContextInternal.java:270)
	at io.vertx.core.impl.ContextInternal.dispatch(ContextInternal.java:252)
	at io.vertx.core.impl.ContextInternal.lambda$runOnContext$0(ContextInternal.java:50)
	at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:173)
	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:166)
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:566)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:998)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Thread.java:840)

This is just a guess as I don't know how to prove it 🫤

Changes proposed

Instead of adding one TX callback for each dataset to change, I am proposing to add a single TX callback that will send all those events synchronously.

This way the messages are sent in order and sequentially, so there shouldn't be any race condition anymore.
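The difference between the two registration styles can be sketched in plain Java. This is only an illustration under stated assumptions: `FakeTx`, `registerAfterCommit`, `registerPerDataset` and `registerSingle` are hypothetical names invented here, not Horreum or Quarkus APIs, and the emitter is a plain `Consumer` rather than the real `dataset-event-out` channel.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class SingleCallbackSketch {

    /** Hypothetical stand-in for a transaction that runs callbacks after commit. */
    static final class FakeTx {
        private final List<Runnable> afterCommit = new ArrayList<>();

        void registerAfterCommit(Runnable callback) {
            afterCommit.add(callback);
        }

        int callbackCount() {
            return afterCommit.size();
        }

        void commit() {
            // Callbacks run only once the transaction outcome is known.
            for (Runnable callback : afterCommit) {
                callback.run();
            }
        }
    }

    /** Previous style (assumed): one callback per dataset, so n registrations. */
    static void registerPerDataset(FakeTx tx, List<Integer> datasetIds, Consumer<Integer> emitter) {
        for (Integer id : datasetIds) {
            tx.registerAfterCommit(() -> emitter.accept(id));
        }
    }

    /** Proposed style: a single callback that emits every event in list order. */
    static void registerSingle(FakeTx tx, List<Integer> datasetIds, Consumer<Integer> emitter) {
        List<Integer> snapshot = List.copyOf(datasetIds); // freeze the ids at registration time
        tx.registerAfterCommit(() -> snapshot.forEach(emitter));
    }

    public static void main(String[] args) {
        List<Integer> sent = new ArrayList<>();
        FakeTx tx = new FakeTx();
        registerSingle(tx, List.of(1, 2, 3), sent::add);
        tx.commit();
        System.out.println(tx.callbackCount() + " callback registered, sent " + sent);
    }
}
```

In both styles the same number of events reaches the emitter; what changes is that the single callback walks the list on one thread, in one pass.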

Check List (Check all the applicable boxes)

My code follows the code style of this project.
My change requires changes to the documentation.
I have updated the documentation accordingly.
All new and existing tests passed.

Signed-off-by: Andrea Lamparelli <[email protected]>

lampajr · 2025-06-25T12:39:49Z

Is there anything I am missing and for which the previous implementation is needed?

johnaohara · 2025-06-25T13:10:23Z

I think it makes sense to move emitting all events into one TX callback; I don't know how the callback mechanism scales.

I am not sure it will resolve the error you are seeing, though: that appears to be related to unavailable credits, and the number of messages pushed through the AMQ client will be the same.

lampajr · 2025-06-25T13:14:21Z

Thanks @johnaohara

> I am not sure it will resolve the error you are seeing, though: that appears to be related to unavailable credits, and the number of messages pushed through the AMQ client will be the same.

I agree, but the error seems to come from a wrong calculation: "Invalid request number, must be greater than 0" shouldn't happen even if I send more messages than there are credits; in that case the sender should just wait (if I understood correctly).

That's why I think the wrong calculation is caused by some race condition, but I cannot confirm it.
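For context, the message in the stack trace matches the Reactive Streams rule that a `request(n)` with `n <= 0` is invalid. The sketch below only illustrates that rule and the "just wait" expectation described above; `Demand` and `requestIfPositive` are hypothetical names, this is not the SmallRye `AmqpCreditBasedSender` code, and it throws directly where a real subscription would signal the failure downstream.

```java
import java.util.concurrent.atomic.AtomicLong;

public class RequestValidationSketch {

    /** Hypothetical demand tracker, not taken from SmallRye or Mutiny. */
    static final class Demand {
        private final AtomicLong requested = new AtomicLong();

        /** Rejects non-positive requests, mirroring the error text in the log. */
        void request(long n) {
            if (n <= 0) {
                throw new IllegalArgumentException("Invalid request number, must be greater than 0");
            }
            requested.addAndGet(n);
        }

        long requested() {
            return requested.get();
        }
    }

    /** Forwards a request only when credits are available; otherwise the caller waits. */
    static boolean requestIfPositive(Demand demand, long credits) {
        if (credits <= 0) {
            return false; // no credits yet: do not request, wait for the next credit update
        }
        demand.request(credits);
        return true;
    }

    public static void main(String[] args) {
        Demand demand = new Demand();
        System.out.println("requested with 0 credits: " + requestIfPositive(demand, 0));
        System.out.println("requested with 5 credits: " + requestIfPositive(demand, 5));
        try {
            demand.request(0); // an unguarded zero request reproduces the message
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If the credit count handed to the request call is ever zero or negative, the failure appears regardless of how many messages are queued, which is consistent with the suspicion of a miscalculation rather than simple credit exhaustion.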

johnaohara · 2025-06-25T13:27:09Z

> I agree, but the error seems to come from a wrong calculation: "Invalid request number, must be greater than 0" shouldn't happen even if I send more messages than there are credits; in that case the sender should just wait (if I understood correctly).
>
> That's why I think the wrong calculation is caused by some race condition, but I cannot confirm it.

This may change the timing and therefore not trigger the race condition, but if there is a race it looks like it is in SmallRye Reactive Messaging and would still be present.

I agree that this should apply back pressure on the client to slow down emitting messages; maybe the SmallRye Reactive Messaging team should take a look?

stalep · 2025-06-25T13:28:22Z

With emitting all events in the same TX, would there be a potential issue with timeouts?

+1 on looping in someone with some messaging background here :)

johnaohara · 2025-06-25T13:55:51Z

> With emitting all events in the same TX, would there be a potential issue with timeouts?

This change does not alter the TX semantics; it registers one callback instead of n callbacks to run when the TX is committed (or rolled back). The callback runs after the TX has succeeded or failed.

lampajr · 2025-06-25T14:10:40Z

> This change does not alter the TX semantics; it registers one callback instead of n callbacks to run when the TX is committed (or rolled back). The callback runs after the TX has succeeded or failed.

yeah exactly

> +1 on looping in someone with some messaging background here :)

anyone you know of? otherwise I can open an issue on quarkus directly, wdyt?

stalep · 2025-06-25T16:05:11Z

perhaps @gemmellr or @ozangunalp?

lampajr · 2025-07-07T16:50:47Z

Meanwhile, @stalep @johnaohara wdyt about this fix? Is it good to go as it is?

willr3

I think the change looks good but I am not familiar with this part of Horreum nor do I believe we have adequate testing to shore up my confidence. In which case I say let's merge it and see what happens :)

johnaohara

lgtm

Emit all datasets recalc in one consumer

0254f3b

Signed-off-by: Andrea Lamparelli <[email protected]>

lampajr requested review from johnaohara, stalep and willr3 June 25, 2025 12:39

lampajr marked this pull request as ready for review June 25, 2025 12:39

lampajr added the backport label Jun 25, 2025

lampajr self-assigned this Jun 25, 2025

lampajr mentioned this pull request Jun 26, 2025

Invalid request number, must be greater than 0 when sending message to outbound channel quarkusio/quarkus#48624

Open

willr3 reviewed Jul 14, 2025

willr3 approved these changes Jul 14, 2025

johnaohara approved these changes Jul 14, 2025

lampajr merged commit e51f1e3 into Hyperfoil:master Jul 14, 2025
4 checks passed

rh-appservices-perf mentioned this pull request Jul 14, 2025

[0.18] Emit all datasets recalc in one consumer #2441

Merged

4 tasks

lampajr deleted the change_send_dataset_label_recal branch July 14, 2025 16:47
