-
Notifications
You must be signed in to change notification settings - Fork 0
Support dependencies using tracked paths #5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
README.md
Outdated
# Copy files necessary to load package and perform the first initialization. | ||
COPY src ${JULIA_PROJECT}/src | ||
RUN julia -e 'using Pkg; name = Pkg.Types.EnvCache().project.name; Pkg.precompile(name; timing=true); Base.require(Main, Symbol(name))' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realized that we can just use the pkg-precompile.jl
precompilation and initialization if we copy the src
beforehand. This does mean that we invalidate the precompilation layer when the package changes but that doesn't matter much as we copy the .ji
files from the cache mount and will typically only precompile the changed source files.
The only downsides of this is that the cache mount is needed for this not to impact performance. Say if we built a Docker image in CI (a different host each time) without using the mount cache then the loss of Docker layer caching could be an issue.
Additionally, the curl
call could fail if there are network issues but that should also be a rare issue and could be mitigated by always having this script installed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does mean that we invalidate the precompilation layer when the package changes but that doesn't matter much as we copy the
.ji
files from the cache mount and will typically only precompile the changed source files.The only downsides of this is that the cache mount is needed for this not to impact performance. Say if we built a Docker image in CI (a different host each time) without using the mount cache then the loss of Docker layer caching could be an issue.
Yeah this is my main concern with this approach. Seems like a step backwards to give up good layer caching...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moving COPY src
isn't a requirement for this change but without this we need to either make a empty package so that precompilation can pass or skip precompilation for that package. I'll take another look into those options.
At worse you can review those options and can make an informed decision
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The alternative approaches where we create empty placeholder packages for tracked dependencies or skip precompilation are overall pretty similar approaches. Taking these approaches would allow us to precompile dependencies which are not tracked and do not depend on a tracked package in pkg-precompile.jl
. We would need to perform another RUN
statement which performs Pkg.precompile
after the tracked dependencies have been added into the image. In the worst case scenario where a tracked package is used by all other dependencies the pkg-precompile.jl
script would do nothing and we'd end up fully dependent on Docker layer caching.
I do think the simplicity of the Dockerfile provided by the current solution is worth it. The current solution is only an issue when cache mounting isn't used which isn't a problem for our CI setup anymore.
I'm considering some other options such as:
- Modifying the path stored in the
.ji
file to avoid invalidation - Avoiding using
set_distinct_active_project
for tracked dependencies. Probably not feasible due to issue with.ji
reuse across multiple containers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Modifying the path stored in the
.ji
file to avoid invalidation
I've gone with this approach. It's transparent for users and doesn't break Docker layer caching and allows us to keep using set_distinct_active_project
for avoiding slug collisions. It does mean having to read the .ji
binary file but luckily they do have a format version included so we should be able to abort if we encounter a new version we don't yet understand.
# # Listing all cached precompile files can be useful in debugging unexpected failures but | ||
# # it can be extremely verbose. | ||
# @debug let paths = String[] | ||
# for (root, dirs, files) in walkdir(joinpath(DEPOT_PATH[1], "compiled", "v$(VERSION.major).$(VERSION.minor)")) | ||
# for file in files | ||
# if endswith(file, ".ji") | ||
# push!(paths, joinpath(root, file)) | ||
# end | ||
# end | ||
# end | ||
# "All precompile files within the cache mount:\n$(join(paths, '\n'))" | ||
# end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was useful in determining the source of the problem but is far too verbose for typical debug logs
project_hash = "3b30178eead5751cd2e80845fb6da37f5f121d23" | ||
|
||
[[deps.Demo]] | ||
path = "Demo.jl" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We require a Manifest.toml
so we can add a path
here but the downside is we'll get warnings on other versions of Julia which want us to re-resolve
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Am I correct that we're trying to avoid precompiling path-tracking dependencies twice (because of the absolute path dependency on 1.10)?
I kinda feel like it may not be worth it if it means giving up layer caching for dependencies/precompilation...
run into issues with erroneous cache hits across image builds
seems like this can be accomplished by (like you said) just omitting path-tracking manifest entries from precompilation which wouldn't require installing the src/ext from the entrypoint project right? or am I missing something?
README.md
Outdated
# Copy files necessary to load package and perform the first initialization. | ||
COPY src ${JULIA_PROJECT}/src | ||
RUN julia -e 'using Pkg; name = Pkg.Types.EnvCache().project.name; Pkg.precompile(name; timing=true); Base.require(Main, Symbol(name))' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does mean that we invalidate the precompilation layer when the package changes but that doesn't matter much as we copy the
.ji
files from the cache mount and will typically only precompile the changed source files.The only downsides of this is that the cache mount is needed for this not to impact performance. Say if we built a Docker image in CI (a different host each time) without using the mount cache then the loss of Docker layer caching could be an issue.
Yeah this is my main concern with this approach. Seems like a step backwards to give up good layer caching...
I'm planning on getting back to this but I suspect it may be deferred until next week |
MWE example of the problem we're trying to solve: rm -rf /tmp/depot /tmp/project /tmp/project2
mkdir /tmp/depot /tmp/project
JULIA_DEPOT_PATH=/tmp/depot julia -e 'using Pkg; Pkg.Registry.add("General"); Pkg.precompile()'
JULIA_DEPOT_PATH=/tmp/depot julia --project=/tmp/project -e 'using Pkg; cd(dirname(Base.active_project())); Pkg.generate("Demo"); Pkg.develop(path="./Demo"); using Demo'
cat /tmp/project/Manifest.toml
find /tmp/depot/compiled/v1.11/Demo -name "*.ji"
mv /tmp/project /tmp/project2
JULIA_DEBUG=loading JULIA_DEPOT_PATH=/tmp/depot julia --project=/tmp/project2 -e 'using Demo' ❯ rm -rf /tmp/depot /tmp/project /tmp/project2
❯ mkdir /tmp/depot /tmp/project
❯ JULIA_DEPOT_PATH=/tmp/depot julia -e 'using Pkg; Pkg.Registry.add("General"); Pkg.precompile()'
Added `General` registry to /tmp/depot/registries
No Changes to `/private/tmp/depot/environments/v1.11/Project.toml`
No Changes to `/private/tmp/depot/environments/v1.11/Manifest.toml`
❯ JULIA_DEPOT_PATH=/tmp/depot julia --project=/tmp/project -e 'using Pkg; cd(dirname(Base.active_project())); Pkg.generate("Demo"); Pkg.develop(path="./Demo"); using Demo'
Generating project Demo:
Demo/Project.toml
Demo/src/Demo.jl
Resolving package versions...
Updating `/private/tmp/project/Project.toml`
[25faeaa9] + Demo v0.1.0 `Demo`
Updating `/private/tmp/project/Manifest.toml`
[25faeaa9] + Demo v0.1.0 `Demo`
Precompiling Demo...
1 dependency successfully precompiled in 1 seconds
❯ cat /tmp/project/Manifest.toml
# This file is machine-generated - editing it directly is not advised
julia_version = "1.11.5"
manifest_format = "2.0"
project_hash = "6ab95279a0e4da063010f90285aa5e8243990fad"
[[deps.Demo]]
path = "Demo"
uuid = "25faeaa9-fcc1-4249-8e23-c1744a6c1ff7"
version = "0.1.0"
❯ find /tmp/depot/compiled/v1.11/Demo -name "*.ji"
/tmp/depot/compiled/v1.11/Demo/lo7dN_3d9pM.ji
❯ mv /tmp/project /tmp/project2
❯ JULIA_DEBUG=loading JULIA_DEPOT_PATH=/tmp/depot julia --project=/tmp/project2 -e 'using Demo'
┌ Debug: Rejecting cache file /tmp/depot/compiled/v1.11/Demo/lo7dN_3d9pM.ji because it is for file /tmp/project/Demo/src/Demo.jl not file /tmp/project2/Demo/src/Demo.jl
└ @ Base loading.jl:3873
┌ Debug: Rejecting cache file /tmp/depot/compiled/v1.11/Demo/lo7dN_3d9pM.ji because it is for file /tmp/project/Demo/src/Demo.jl not file /tmp/project2/Demo/src/Demo.jl
└ @ Base loading.jl:3873
Precompiling Demo...
1 dependency successfully precompiled in 1 seconds
┌ Debug: Loading object cache file /tmp/depot/compiled/v1.11/Demo/lo7dN_khGfN.dylib for Demo [25faeaa9-fcc1-4249-8e23-c1744a6c1ff7]
└ @ Base loading.jl:1244 Using some custom code to re-write the
Note we didn't have to update the slug to match which means we should be able to still use |
The Julia 1.10.8 issue discovered here wasn't from these changes. I've separated that into #6 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is an impressive piece of work. I still hate it lol, it really feels like a bad idea to be messing around so much in undocumented and possibly unstable internals just to optimize our docker builds a little bit.
correct me if I'm wrong here, but IIUC the precompile cache that's invalidated is just for code that is tracked by path, and in our use cases that's generally just a single package.
unless and until that becomes really onerous, I'm inclined to leave this as a "well we tried to find a way to do this but everything we could come up with was so cursed that it doesn't really seem worthwhile".
I've been trying to use this script as a test bed for what we need to change in
You are correct here that it doesn't add much time to precompile the path tracked packages twice.
We do need to do something however as otherwise we run into:
I've reverted the code back to the old logic which just postpones the precompilation of path tracked packages to occur outside of the cache mount. |
I ran into a scenario where I wanted to use a tracked path dependency with
pkg-precompile.jl
and found out by usingJULIA_DEBUG=pkg-precompile,loading
that the distinct project environment interferes with tracked path dependencies:I considered some alternatives but if tracked path dependencies require the path to be unchanged and we would store these in our shared cache mount we could run into issues with erroneous cache hits across image builds. It seems best to exclude these entries from the cache entirely and precompile these everytime for now.
We can improve upon this further but it would require some changes to base Julia which considers the contents of the source code and not the path of the source code.