Member

@gsmet gsmet commented Aug 18, 2025

Note

This is part of my efforts on large monoliths but it should benefit all applications in the end.

This is a big patch, sorry, but I tried to have semantic commits, even if the first commit is quite large by itself.

My goal was to only implement parallel compression of jars using Commons Compress... but I ended up being unable to do it with JarResultBuildStep in its current state: the class was too big (1600+ lines) and it was too hard to actually comprehend the specificities of each format.

That is why my first step was to rewrite this class with a proper hierarchy, with each format split out. FWIW, it's not the first time I have struggled to adjust things there, so my personal opinion is that this rewrite was long overdue.

Then I extracted the creation of the archive to an interface and finally implemented parallel compression using Commons Compress.
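To make the "extract the creation of the archive to an interface" step concrete, here is a minimal sketch using only JDK classes. The names `ArchiveCreator`, `SequentialZipArchiveCreator` and `ArchiveCreatorDemo` are hypothetical and chosen for illustration; they are not claimed to match the types introduced by this PR.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical abstraction: callers add files, the implementation decides how to write them.
interface ArchiveCreator extends AutoCloseable {
    void addFile(String path, byte[] content) throws IOException;

    @Override
    void close() throws IOException;
}

// Baseline implementation: entries are compressed one after the other.
final class SequentialZipArchiveCreator implements ArchiveCreator {
    private final ZipOutputStream zip;

    SequentialZipArchiveCreator(OutputStream out) {
        this.zip = new ZipOutputStream(out);
    }

    @Override
    public void addFile(String path, byte[] content) throws IOException {
        zip.putNextEntry(new ZipEntry(path));
        zip.write(content);
        zip.closeEntry();
    }

    @Override
    public void close() throws IOException {
        zip.close();
    }
}

final class ArchiveCreatorDemo {
    // Writes two entries through the interface and returns the entry names read back.
    static List<String> roundTrip() {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ArchiveCreator creator = new SequentialZipArchiveCreator(buffer)) {
            creator.addFile("a.txt", "hello".getBytes(StandardCharsets.UTF_8));
            creator.addFile("dir/b.txt", "world".getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> names = new ArrayList<>();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
            for (ZipEntry e = in.getNextEntry(); e != null; e = in.getNextEntry()) {
                names.add(e.getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }
}
```

With a seam like this, a parallel implementation can be swapped in without touching the code that decides which files go into which jar.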

The last commit is just some final cleanup that was too entangled to actually be squashed in one of the existing commits.

For my test app with 37k Java source classes (+ all the ones we generate), it went from 6 seconds to 1.6 seconds, so a 3.75x speedup.
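For readers unfamiliar with the idea, the sketch below illustrates the general scatter/gather pattern behind parallel compression: each entry is deflated independently on a worker thread, and the results are gathered in submission order. It deliberately uses only `java.util.zip` and does not show the Commons Compress API; the class name `ParallelDeflateSketch` and the thread count are assumptions made for the example.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

final class ParallelDeflateSketch {

    // Compresses one payload with its own Deflater instance.
    static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            while (!deflater.finished()) {
                out.write(chunk, 0, deflater.deflate(chunk));
            }
            return out.toByteArray();
        } finally {
            deflater.end();
        }
    }

    // Inverse of deflate(), used here only to check the round trip.
    static byte[] inflate(byte[] compressed) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(chunk);
                if (n == 0 && inflater.needsInput()) {
                    break; // truncated input: stop instead of spinning
                }
                out.write(chunk, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalStateException(e);
        } finally {
            inflater.end();
        }
    }

    // Scatter: one compression task per payload. Gather: results in submission order.
    static List<byte[]> compressAll(List<byte[]> payloads, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<byte[]>> futures = new ArrayList<>();
            for (byte[] payload : payloads) {
                futures.add(pool.submit(() -> deflate(payload)));
            }
            List<byte[]> result = new ArrayList<>();
            for (Future<byte[]> future : futures) {
                result.add(future.get());
            }
            return result;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

A real jar writer additionally has to emit the zip headers and central directory around the compressed bytes, which is the part a library such as Commons Compress takes care of.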

For now I'm still using the ZipFileSystem approach for Uberjars. Uberjars are special: the Manifest is potentially updated at the end to include the multi-release bits, which comes with some subtleties since the Manifest has to be added first to the jar. Nothing insurmountable, but this work has already consumed a lot of bandwidth. I could be convinced to put in the extra work and get rid of it at some point. Not in this PR though.
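As an illustration of the "Manifest has to be added first" constraint, the following sketch uses the plain JDK jar API rather than the code in this PR. `JarOutputStream` writes the manifest ahead of every other entry when it is passed to the constructor, and `JarInputStream` only picks up a manifest located at the start of the archive. The class name `ManifestFirstSketch` and the entry path are made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Attributes;
import java.util.jar.JarEntry;
import java.util.jar.JarInputStream;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

final class ManifestFirstSketch {

    // Builds an in-memory jar whose first entry is META-INF/MANIFEST.MF.
    static byte[] buildJar(boolean multiRelease) {
        Manifest manifest = new Manifest();
        Attributes main = manifest.getMainAttributes();
        // Without Manifest-Version, the main attributes are not written out.
        main.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        if (multiRelease) {
            main.put(Attributes.Name.MULTI_RELEASE, "true");
        }
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // This constructor writes the manifest before any other entry.
        try (JarOutputStream jar = new JarOutputStream(buffer, manifest)) {
            jar.putNextEntry(new JarEntry("com/example/Hello.txt"));
            jar.write("hello".getBytes(StandardCharsets.UTF_8));
            jar.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Reads the Multi-Release attribute back; returns null when it is absent.
    static String readMultiRelease(byte[] jarBytes) {
        try (JarInputStream in = new JarInputStream(new ByteArrayInputStream(jarBytes))) {
            Manifest manifest = in.getManifest();
            return manifest == null ? null
                    : manifest.getMainAttributes().getValue(Attributes.Name.MULTI_RELEASE);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The difficulty described above follows from this ordering: whether the jar is multi-release is only known after the other entries have been processed, yet the manifest has to be written before them.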

@quarkus-bot quarkus-bot bot added area/core area/devtools Issues/PR related to maven, gradle, platform and cli tooling/plugins area/gradle Gradle area/kubernetes area/maven labels Aug 18, 2025
Comment on lines -218 to -219
<!-- let's avoid having commons-io crawling into quarkus-core -->
<exclude>commons-io:commons-io</exclude>
Member Author

I will ban it through ForbiddenAPIs instead but we have some unrelated code depending on it so I need to clean up this code before doing it.

Will open a follow-up PR once this one is in.

Member Author

Actually, this is fine, we already have a local forbidden-apis rule for core/deployment banning usage of Commons IO.

We use it in a lot of test utils everywhere, so it's hard to fully get rid of it. We could drop most of these usages, but the IOUtils helpers for getting the content of a URL are handy.

};
}

private static ExecutorService initExecutorService() {
Member

It feels quite wasteful to create a whole new Executor for each single jar?
What about reusing the build executor, and wrapping it in an adaptor which would ignore the shutdown request issued by commons-compress?
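A minimal sketch of the kind of adaptor suggested here, assuming a shared `ExecutorService` that must outlive each jar. The class `ShutdownIgnoringExecutor` is hypothetical and is not taken from Quarkus or Commons Compress; it only shows the delegation idea.

```java
import java.util.List;
import java.util.concurrent.AbstractExecutorService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.TimeUnit;

// Hypothetical adaptor: shutdown requests close this view only, never the shared delegate.
final class ShutdownIgnoringExecutor extends AbstractExecutorService {
    private final ExecutorService delegate;
    private volatile boolean shutdown;

    ShutdownIgnoringExecutor(ExecutorService delegate) {
        this.delegate = delegate;
    }

    @Override
    public void execute(Runnable command) {
        if (shutdown) {
            throw new RejectedExecutionException("adaptor already shut down");
        }
        delegate.execute(command);
    }

    @Override
    public void shutdown() {
        shutdown = true; // deliberately not forwarded to the delegate
    }

    @Override
    public List<Runnable> shutdownNow() {
        shutdown = true; // deliberately not forwarded; nothing is cancelled
        return List.of();
    }

    @Override
    public boolean isShutdown() {
        return shutdown;
    }

    @Override
    public boolean isTerminated() {
        return shutdown; // simplification: in-flight tasks are not tracked
    }

    @Override
    public boolean awaitTermination(long timeout, TimeUnit unit) {
        return shutdown; // simplification: does not wait for in-flight tasks
    }
}

final class ShutdownIgnoringExecutorDemo {
    // Returns true when shutting down the view leaves the shared pool usable.
    static boolean delegateSurvives() {
        ExecutorService shared = Executors.newFixedThreadPool(2);
        try {
            ExecutorService view = new ShutdownIgnoringExecutor(shared);
            int result = view.submit(() -> 21 * 2).get();
            view.shutdown();
            return result == 42 && view.isShutdown() && !shared.isShutdown();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            shared.shutdown();
        }
    }
}
```

Note the two simplifications marked in the comments: this view reports termination without tracking tasks still running on the delegate, so a caller that relies on `awaitTermination` as a completion barrier would not get one.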

Member Author

Oh, you are plainly right about that. I will create only one. BUT I prefer to avoid reusing the build executor, as I really don't want to make assumptions about what Commons Compress is doing with the executor.

I know it's a bit suboptimal but I think it's safer.

Member Author

I had a look and I think I will keep it as it is even if suboptimal. The code in Commons Compress has several comments saying they absolutely want the executor to be shut down.

I know it's probably being a bit too safe but I really prefer not breaking their contract.

Member

But that's a lot of threads? And they wouldn't coordinate well with other jobs running on the existing executors. It worries me that it's not "a little suboptimal": it might lead to serious problems such as hard-to-reproduce issues or unmanaged memory spikes.

Member Author

Raaah, I hesitate...

Member

@Sanne Sanne Aug 18, 2025

> has several comments saying they absolutely want the executor to be shut down.
>
> I know it's probably being a bit too safe but I really prefer not breaking their contract.

Ok, I see - they probably do some dodgy things then - I guess you're right in being cautious. What about at least reusing the ParallelScatterZipCreator across your various needs, would that be an option?

Member Author

So I ended up doing it... and the result is underwhelming. It's quite a bit slower.

I thought it could be due to the fact that we end up starting too many threads, so I tried your patch here: #49575.

And basically:

  • your patch there makes the global build faster at 2 * cores
  • but the specific jar compression is faster at 4 * cores (but the global build is slower even with the compression being faster)

So I think we are looking at something that specifically requires its own thread pool with specific configuration.
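For illustration, a dedicated pool sized as a multiple of the core count could look like the sketch below. The multiplier is a parameter because the numbers quoted above (2 and 4) are observations from this thread, not recommendations; the class name `CompressionPoolSketch` and the thread name prefix are assumptions.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class CompressionPoolSketch {

    // Pool size as a multiple of the available cores, never below one thread.
    static int poolSize(int cores, int multiplier) {
        return Math.max(1, cores * multiplier);
    }

    // Fixed-size pool with named daemon threads, separate from any build executor.
    static ThreadPoolExecutor newCompressionPool(int multiplier) {
        int size = poolSize(Runtime.getRuntime().availableProcessors(), multiplier);
        AtomicInteger counter = new AtomicInteger();
        ThreadFactory factory = runnable -> {
            Thread thread = new Thread(runnable, "jar-compression-" + counter.incrementAndGet());
            thread.setDaemon(true);
            return thread;
        };
        return new ThreadPoolExecutor(size, size, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<>(), factory);
    }
}
```

Keeping the multiplier configurable makes it possible to benchmark the jar step and the overall build separately, which is where the two measurements above diverged.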

I'm going to try with your patch + executor decoupling but only one Executor Service for all the jars.

Member Author

Mkay, I can't actually reproduce my numbers...

I'll push the patch reusing the common thread pool...

Member

Sanne commented Aug 18, 2025

It's interesting that you opted to parallelize the writing of a single jar - weren't there also some opportunities to write different jars in parallel? Would that be a different optimisation that we could do as well?

But all such tweaks would benefit the most from actually sharing executors.

Member Author

gsmet commented Aug 18, 2025

> It's interesting that you opted to parallelize the writing of a single jar - wasn't there also some opportunities to write different jars in parallel? Would that be a different optimisation that we could do as well?

We could at some point, but I needed to optimize the writing of a single jar anyway, as not all jars are created equal. Some of them are very large, some of them very small.

I'm not convinced adding parallelization of the whole jar operation after this work would bring some benefits.

Feel free to have fun with it though :).

Member

Sanne commented Aug 18, 2025

> Feel free to have fun with it though :)

It doesn't feel like a priority now that you went with the nuclear approach :) But I do wonder how efficient "parallel compression" in commons-compress is, compared to a "simple" zipstream for each jar.

Member Author

gsmet commented Aug 18, 2025

FWIW, the failures are due to some Maven tests using very old versions of Commons IO as a dependency, and not relying on our BOM.
I'm fixing the tests... but each rerun of the Maven tests takes ages...

As for Gradle, the problem was that a test tests the exact list of files in build/ and we had a directory from the compression still lurking around.
I added a line to drop it at the end.
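The cleanup mentioned here amounts to deleting a leftover working directory once the archive has been written. A minimal JDK-only sketch of such a recursive delete follows; the helper `TempDirCleanupSketch` and the directory and file names in the demo are hypothetical, and this is not the line that was added in the PR.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

final class TempDirCleanupSketch {

    // Deletes a directory tree; does nothing when the directory is already gone.
    static void deleteRecursively(Path dir) {
        if (!Files.exists(dir)) {
            return;
        }
        try (Stream<Path> paths = Files.walk(dir)) {
            // Reverse order so children are deleted before their parent directory.
            paths.sorted(Comparator.reverseOrder()).forEach(path -> {
                try {
                    Files.delete(path);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Creates a small scratch tree, removes it, and reports whether it is gone.
    static boolean demo() {
        try {
            Path root = Files.createTempDirectory("compression-scratch");
            Files.createDirectories(root.resolve("nested"));
            Files.writeString(root.resolve("nested").resolve("part.bin"), "data");
            deleteRecursively(root);
            return !Files.exists(root);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```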

Member Author

gsmet commented Aug 18, 2025

> It doesn't feel like a priority now that you went with the nuclear approach :) But I do wonder how efficient "parallel compression" in commons-compress is, compared to a "simple" zipstream for each jar.

In some cases, we build only one jar (native image and legacy thin jar, but only the first one is relevant these days). So just parallelizing the build of each jar won't help.

In the case where we build the most jars, there are about 3. One of the three, quarkus-run.jar, is tiny. For the two others (transformed and generated), it depends, but in the large monolith case one is double the size of the other.

So just parallelizing building each jar won't bring you much.
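The reasoning can be put into numbers. If each jar is written by a single thread and all jars run concurrently, the wall time cannot drop below the time of the largest jar, so the best-case speedup is the total cost divided by the largest cost. The sketch below computes that bound; the class name and the costs used in the usage note are made-up values that follow the "tiny, X, 2X" shape described above, not measurements.

```java
final class JarLevelParallelismBound {

    // Best-case speedup when jars are written concurrently, one thread per jar:
    // total sequential cost divided by the cost of the largest jar.
    static double bestCaseSpeedup(double... jarCosts) {
        double total = 0;
        double largest = 0;
        for (double cost : jarCosts) {
            total += cost;
            largest = Math.max(largest, cost);
        }
        return largest == 0 ? 1.0 : total / largest;
    }
}
```

With illustrative costs of 1, 50 and 100, the bound is 151 / 100, so about 1.5x at best, and with a single jar it is 1x. That is well below the 3.75x reported for parallelizing within a jar.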

Member Author

gsmet commented Aug 18, 2025

OK, it should be fine now.

Member Author

gsmet commented Aug 19, 2025

@geoand this is another one for you when you're back from PTO. Sorry :).

gsmet added 6 commits August 24, 2025 11:46
This class had become completely unmanageable due to its size.
Given I'm willing to invest some time to see if we can improve our zip
build time, this is a necessary step in preparation to the upcoming
improvements.
Including the native image source jar.
It is based on commons-compress.

I haven't implemented parallel jar build for uberjars for now. This is a
known limitation.
I would have squashed it but it's not easy to squash into initial commit
due to conflicts.
quarkus-bot bot commented Aug 24, 2025

Status for workflow Quarkus CI

This is the status report for running Quarkus CI on commit cfc8d90.

✅ The latest workflow run for the pull request has completed successfully.

It should be safe to merge provided you have a look at the other checks in the summary.

You can consult the Develocity build scans.


Flaky tests - Develocity

⚙️ Gradle Tests - JDK 17

📦 integration-tests/gradle

io.quarkus.gradle.BuildConfigurationTest.buildNoOverride - History

  • Multiple Failures (1 failure) -- failure 1 -- [sub project 'without-configuration', package type 'fast-jar'] Expecting path: - org.assertj.core.error.AssertJMultipleFailuresError
org.assertj.core.error.AssertJMultipleFailuresError: 

Multiple Failures (1 failure)
-- failure 1 --
[sub project 'without-configuration', package type 'fast-jar'] 
Expecting path:
  /home/runner/_work/quarkus/quarkus/integration-tests/gradle/target/classes/build-configuration/without-configuration/build/quarkus-app/quarkus-run.jar
to exist (symbolic links were followed).

Contributor

@geoand geoand left a comment

Very nice work!

Let's get it in so we can avoid conflicts

@gsmet gsmet merged commit 83438f8 into quarkusio:main Aug 25, 2025
57 checks passed
@quarkus-bot quarkus-bot bot added this to the 3.28 - main milestone Aug 25, 2025
Contributor

geoand commented Aug 27, 2025

@gsmet FYI, this looks like it caused the following issue in Quarkus LangChain4j:

2025-08-27T03:51:19.3505531Z [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.14.0:compile (default-compile) on project quarkus-langchain4j-jlama-deployment: Compilation failure
2025-08-27T03:51:19.3508229Z [ERROR] /home/runner/work/quarkus-langchain4j/quarkus-langchain4j/current-repo/model-providers/jlama/deployment/src/main/java/io/quarkiverse/langchain4j/jlama/deployment/JlamaProcessor.java:[269,32] cannot find symbol
2025-08-27T03:51:19.3510005Z [ERROR]   symbol:   variable QUARKUS_RUN_JAR
2025-08-27T03:51:19.3510703Z [ERROR]   location: class io.quarkus.deployment.pkg.steps.JarResultBuildStep
2025-08-27T03:51:19.3511384Z [ERROR] -> [Help 1]
2025-08-27T03:51:19.3513012Z org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.14.0:compile (default-compile) on project quarkus-langchain4j-jlama-deployment: Compilation failure
2025-08-27T03:51:19.3526289Z /home/runner/work/quarkus-langchain4j/quarkus-langchain4j/current-repo/model-providers/jlama/deployment/src/main/java/io/quarkiverse/langchain4j/jlama/deployment/JlamaProcessor.java:[269,32] cannot find symbol
2025-08-27T03:51:19.3528038Z   symbol:   variable QUARKUS_RUN_JAR
2025-08-27T03:51:19.3528644Z   location: class io.quarkus.deployment.pkg.steps.JarResultBuildStep
2025-08-27T03:51:19.3529140Z 
2025-08-27T03:51:19.3529596Z     at org.apache.maven.lifecycle.internal.MojoExecutor.doExecute2 (MojoExecutor.java:333)
2025-08-27T03:51:19.3530640Z     at org.apache.maven.lifecycle.internal.MojoExecutor.doExecute (MojoExecutor.java:316)
2025-08-27T03:51:19.3531872Z     at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:212)
2025-08-27T03:51:19.3532878Z     at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:174)
2025-08-27T03:51:19.3533850Z     at org.apache.maven.lifecycle.internal.MojoExecutor.access$000 (MojoExecutor.java:75)
2025-08-27T03:51:19.3534806Z     at org.apache.maven.lifecycle.internal.MojoExecutor$1.run (MojoExecutor.java:162)
2025-08-27T03:51:19.3537251Z     at org.apache.maven.plugin.DefaultMojosExecutionStrategy.execute (DefaultMojosExecutionStrategy.java:39)
2025-08-27T03:51:19.3538633Z     at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:159)
2025-08-27T03:51:19.3539837Z     at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:105)
2025-08-27T03:51:19.3541222Z     at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:73)
2025-08-27T03:51:19.3542697Z     at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build (SingleThreadedBuilder.java:53)
2025-08-27T03:51:19.3543993Z     at org.apache.maven.lifecycle.internal.LifecycleStarter.execute (LifecycleStarter.java:118)
2025-08-27T03:51:19.3544890Z     at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:261)
2025-08-27T03:51:19.3545652Z     at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:173)
2025-08-27T03:51:19.3546594Z     at org.apache.maven.DefaultMaven.execute (DefaultMaven.java:101)
2025-08-27T03:51:19.3547315Z     at org.apache.maven.cli.MavenCli.execute (MavenCli.java:906)
2025-08-27T03:51:19.3548001Z     at org.apache.maven.cli.MavenCli.doMain (MavenCli.java:283)
2025-08-27T03:51:19.3548670Z     at org.apache.maven.cli.MavenCli.main (MavenCli.java:206)
2025-08-27T03:51:19.3549621Z     at jdk.internal.reflect.DirectMethodHandleAccessor.invoke (DirectMethodHandleAccessor.java:103)
2025-08-27T03:51:19.3550566Z     at java.lang.reflect.Method.invoke (Method.java:580)
2025-08-27T03:51:19.3551415Z     at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced (Launcher.java:255)
2025-08-27T03:51:19.3552433Z     at org.codehaus.plexus.classworlds.launcher.Launcher.launch (Launcher.java:201)
2025-08-27T03:51:19.3553689Z     at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode (Launcher.java:361)
2025-08-27T03:51:19.3554823Z     at org.codehaus.plexus.classworlds.launcher.Launcher.main (Launcher.java:314)
2025-08-27T03:51:19.3555857Z Caused by: org.apache.maven.plugin.compiler.CompilationFailureException: Compilation failure
2025-08-27T03:51:19.3558019Z /home/runner/work/quarkus-langchain4j/quarkus-langchain4j/current-repo/model-providers/jlama/deployment/src/main/java/io/quarkiverse/langchain4j/jlama/deployment/JlamaProcessor.java:[269,32] cannot find symbol
2025-08-27T03:51:19.3559649Z   symbol:   variable QUARKUS_RUN_JAR
2025-08-27T03:51:19.3560251Z   location: class io.quarkus.deployment.pkg.steps.JarResultBuildStep

Not a problem, just wanted to raise awareness that there might be other extensions that could fail as well

Labels
area/core area/devtools Issues/PR related to maven, gradle, platform and cli tooling/plugins area/gradle Gradle area/kubernetes area/maven triage/flaky-test
3 participants