build: Optimize Dockerfile #3606
Conversation
Force-pushed from eaaa986 to fb04428.
# This port will be used by gunicorn.
EXPOSE 8080
ENV DOCKER=true
Where is this env used, and for what purpose? I didn't find it when searching the source.
hi, thanks, I would like to optimize the build process. To be honest, I don't really have dev and production dependencies well sorted; I just add them to Docker whenever something fails to build, and the mess has been growing for years. The vue-3 branch is the one where all development is happening right now, and it's probably best to optimize that one from the beginning, so testing can happen gradually while the alpha tests start and nothing in the existing pipeline breaks.

One issue I have had for years is building for the different architectures. With the vue-3 branch I dropped support for arm/v7 because it was a pain to maintain and would just break every few months. Still, there are architecture-related issues with the other arm builds as well, which sometimes cause the x86 build, the most important one, to fail. Properly separating these two would be great, but I think it might require different tags on Docker Hub, not sure. Feel free to play around and ask if you have any questions; it would be great if someone could sort this mess.
no worries, understood. I'll help get them sorted
Glad I picked the right branch to base from!
Where is the best place to have discussions around the existing pipelines and how they are used? I'll have some general questions on how/where things are used as I get up to speed, and I wouldn't want to pollute this PR thread; would something like an Issue or Discussion with a label be better for more QA/advice/back and forth?
It can be tough! Especially if you have any dependencies on libc, you start getting into glibc (the common default of Ubuntu, for example) vs musl libc (the default of Alpine). I'd be happy to help sort out the multi-arch building.

One question I did have: what is the process for getting the development environment set up from scratch? I wasn't sure if everything is supposed to be in the Dockerfile after building/running, or if the Dockerfile is just the backend and vue3 needs to be started separately, or if TandoorRecipes/open-tandoor-data needs to be pulled in as well, etc. If you have any sort of "minimum commands to run" to get started, it would be greatly appreciated.
Force-pushed from 18ebcd6 to 523a9de.
@@ -1,44 +1,94 @@
FROM python:3.13-alpine3.21
FROM node:20.19-alpine AS node-deps
What version(s) of node should this be built with?
I don't really care; I usually use some LTS version for years and upgrade once it runs out. I think I have tried everything with Node 22 on some machine, so that should work.
ENV PYTHONUNBUFFERED 1
# TODO: use --frozen-lockfile
# RUN yarn install --frozen-lockfile
RUN yarn install
Adding back --frozen-lockfile will be a requirement before merge, but I'm leaving it commented out for now while cleaning up and moving forward.
@@ -35,5 +35,6 @@
"vite": "^5.4.7",
"vite-plugin-vuetify": "^2.0.4",
"vue-tsc": "^2.0.26"
}
},
"packageManager": "[email protected]"
What version of yarn do you want to support? It feels like this should be a monorepo, since vue3 is referencing vue, which would require upwards of yarn v1 (current latest is [email protected]). Even if you don't want to use yarn workspaces, upgrading from v1 would give more reliability/performance around node modules.
Sure, feel free to upgrade. As with node, I have never thought about this version. The vue folder will disappear at some point; I just left it in because the old stuff is not yet cleaned out.
Clone the repo, install the yarn/pip requirements, start the yarn dev server, start the Django dev server, and run the migrations. That should be all that's needed to get a hot-reloading dev setup to work.
Feel free to open an issue; I get notified, so I will respond there. Alternatively, I could offer a quick Discord call to explain stuff. Generally speaking, the whole system is just "what worked" over the last years. I have tried a few times to optimize for different things, but overall there isn't much reason for any particular decision besides that it worked, so feel free to change whatever you want, it just needs to work :)
That is a very kind offer, I will happily take you up on this!
Also did so here (and added a link in the PR description), but a realtime chat may be more efficient upfront.
Great work @theProf <3

Did a quick review with some other newish Docker syntax; nice use of the new --link already! Feel free to ping for any questions or a final review when you are done. For now I just did a quick review in the GitHub interface, with no local checkout.

It would IMO also be nice to set up an unprivileged user and switch to that. Quick copy-paste to set up a 1000:1000 user and group without useradd or adduser (and remembering the different syntaxes, hehe):
FROM alpine as builder
RUN <<PASSWD cat > /etc_passwd && <<GROUP cat > /etc_group
app:x:1000:1000:app:/:
PASSWD
app:x:1000:app
GROUP
FROM alpine
COPY --from=builder /etc_group /etc/group
COPY --from=builder /etc_passwd /etc/passwd
USER app
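The heredoc trick above hand-writes one-line passwd and group entries, which makes it easy to get the field counts wrong. A quick sanity check in plain shell (the /tmp paths here are throwaway stand-ins for the files built in the snippet, not the real /etc files):

```shell
# Write the hand-rolled entries to throwaway files.
printf 'app:x:1000:1000:app:/:\n' > /tmp/etc_passwd
printf 'app:x:1000:app\n' > /tmp/etc_group

# A passwd(5) entry has 7 colon-separated fields; a group(5) entry has 4.
# awk exits non-zero if any line has the wrong count.
awk -F: 'NF != 7 { exit 1 }' /tmp/etc_passwd
awk -F: 'NF != 4 { exit 1 }' /tmp/etc_group
echo "passwd and group entries are well-formed"
```

Note the trailing colon in the passwd line: it leaves the shell field empty, which is fine for a service account that never logs in.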
ENV PYTHONUNBUFFERED=1

# Install all dependencies.
RUN apk add --no-cache \
Suggested change:
- RUN apk add --no-cache \
+ RUN --mount=type=cache,target=/etc/apk/cache \
+     apk add \
Make use of the 'newish' RUN cache mounts to cache the packages in between builds. Mostly an easy speedup for local dev.
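As a minimal sketch of what the cache-mounted variant looks like in context (cache mounts need BuildKit, the default builder in recent Docker; the package here is just an example):

```dockerfile
# syntax=docker/dockerfile:1
FROM alpine:3.21

# The cache mount keeps /etc/apk/cache on the build host between builds,
# so repeated local builds skip re-downloading packages. Note that
# --no-cache is dropped: it would make apk bypass the cache directory.
RUN --mount=type=cache,target=/etc/apk/cache \
    apk add curl
```

The cache directory never ends up in the image layers, so this does not bloat the final image.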
COPY --link requirements.txt ./

RUN <<EOF
Suggested change:
- RUN <<EOF
+ RUN --mount=type=cache,target=/etc/apk/cache --mount=type=cache,target=/root/.cache <<EOF
/root/.cache is where the pip cache lives.
# remove Development dependencies from requirements.txt
sed -i '/# Development/,$d' requirements.txt
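The sed range expression deletes every line from the first `# Development` marker through the end of the file, leaving only the production dependencies. A throwaway illustration (the package names are hypothetical, not Tandoor's actual requirements):

```shell
# Build a toy requirements.txt with a trailing "# Development" section.
printf '%s\n' \
  'django==5.0' \
  'gunicorn==22.0' \
  '# Development' \
  'pytest==8.0' > /tmp/requirements.txt

# Delete from the marker line through end-of-file.
sed -i '/# Development/,$d' /tmp/requirements.txt
cat /tmp/requirements.txt   # only django and gunicorn remain
```

This works with GNU sed and Alpine's busybox sed; BSD sed would need `-i ''` instead.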
apk add --no-cache --virtual .build-deps \
Suggested change:
- apk add --no-cache --virtual .build-deps \
+ apk add --virtual .build-deps \
related to the caching in the RUN command (proposed change)
curl https://sh.rustup.rs -sSf | sh -s -- -y
fi

venv/bin/pip install -r requirements.txt --no-cache-dir
Suggested change:
- venv/bin/pip install -r requirements.txt --no-cache-dir
+ venv/bin/pip install -r requirements.txt
related to the caching in the RUN command (proposed change)
RUN <<EOF
/opt/recipes/venv/bin/python version.py
# delete git repositories to reduce image size
find . -type d -name ".git" | xargs rm -rf
I would add .git to the .dockerignore file instead of removing it here. This rm is kinda meh on its own, since it only removes .git from the final image; the .git directory is already in the COPY layer, which would be avoided entirely by using the dockerignore.
I agree. Removing files added in an earlier layer doesn't actually remove them from the image. The files are still there even if they aren't visible, so it doesn't reduce the container image's size.
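A minimal sketch of the difference (the file names are hypothetical):

```dockerfile
# Anti-pattern: the file is baked into the COPY layer; the later rm only
# masks it in the filesystem view, so the image still ships the bytes.
FROM alpine:3.21 AS bloated
COPY big.bin /big.bin
RUN rm /big.bin

# Better: keep the file out of the build context entirely, e.g. with a
# .dockerignore containing lines like:
#   .git
#   big.bin
FROM alpine:3.21 AS slim
COPY src/ /app/
```

Both images show no /big.bin, but only the second one actually avoids carrying it.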
venv/bin/pip install setuptools_rust==1.10.2

if [ $(apk --print-arch) = "aarch64" ]; then
curl https://sh.rustup.rs -sSf | sh -s -- -y
Couldn't this be replaced by a simple apk add rust? The package is available:
https://pkgs.alpinelinux.org/packages?name=rust&branch=v3.21&repo=&arch=aarch64&origin=&flagged=&maintainer=

Then it is nicely within the virtual deps as well.
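An untested sketch of what that could look like, with placeholder build steps; rust and cargo come from the Alpine repos and go into their own virtual group so they can be removed afterwards:

```dockerfile
FROM python:3.13-alpine3.21

RUN apk add --no-cache --virtual .build-deps gcc musl-dev libffi-dev \
 && if [ "$(apk --print-arch)" = "aarch64" ]; then \
      # only aarch64 needs a Rust toolchain to build some wheels
      apk add --no-cache --virtual .build-deps-rust rust cargo; \
    fi \
 # ... install Python requirements / build wheels here ...
 && apk del .build-deps \
 && (apk del .build-deps-rust 2>/dev/null || true)
```

Compared with rustup, this avoids the curl-pipe-to-shell step and keeps the toolchain versions pinned to the Alpine release.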
Thanks for the review! I'm glad I kept this as a Draft PR, as I feel it's best to split it. I've realized there are 2 goals, and I want to try my best not to mix them (as it would end up being a large and difficult-to-review PR).

I mention this because I'm 100% on board with things like RUN mount caches (I have it in a local branch), but that falls under goal 2 and will have impacts on the build pipeline as well (to actually utilize the cache). The reason I want to focus on goal 1 is to gain clarity on things like: https://github.com/TandoorRecipes/recipes/pull/3606/files#r2033544477
I thought the same! I then found out there may be git submodules used by the pipelines, such as here. And that's the kind of thing I'd like to move into the Dockerfile as a source of truth for all dependencies and build steps. Once the Dockerfile is organized and a reliable source of truth, it will be much easier to add optimizations with confidence.
Yes, I think that is a good call; it might still be nice to keep the ... Then the Dockerfile is a given, and the second PR could focus fully on optimizing the pipelines. That would also give a nice insight into the performance gain.
IMO running the image as non-root would be part of this. At the moment you can't run this image on Kubernetes where Pod Security is enforced.
Interesting one; it seems that there previously were two tags (normal and open-data), and for the 2.0.0 release the open-data workflow was removed. Maybe @vabene1111 has something else in mind to import the open data (maybe in the new interface?) instead of releasing two containers. Otherwise I would make it an...
Thanks, I definitely already have no idea what you are talking about, but it seems you two understand each other 😂

Regarding the open data/submodules: Tandoor supports plugins. This is not really documented and not used by anyone other than me, to add the open data project to the hosted instance and to add some hosted-specific code to that. In the future I would like the ability to install these plugins at runtime, because it's annoying to keep multiple pipelines working for them; that's why I removed them for the Tandoor 2 branch. The problem is that I am not yet sure that I will be able to do that. Ideally there would be an interface where you add a git URL, the repo is cloned, yarn build runs in the background, and then the new modules are available. At the same time, during development I want them to be in the same structure to make it faster. I would say for now leave this out; I will need to check whether runtime installing of plugins works, and we will see. Obviously, if either of you is interested in building that support (or at least making sure the required dependencies will be available in the image), that would be great.

One more piece of information regarding the volumes: the only reason I use a mix of volumes and bind mounts is to provide the default nginx config. I have always kept nginx separate from the base image because I thought people should have the freedom to choose their webserver and because I thought bundling nginx might be a bad idea. That said, it has caused many people trouble over the years, and I think there are many instances running without the nginx, which is not good and can even break things from time to time. So I would not be opposed to including nginx directly in the image, depending on what you two think is best practice. I am happy to get your input on this matter.
Can we take the time to set a user in the Dockerfile so it won't run as root?
I'm in that situation too, trying (and so far failing) to run everything under the restricted Pod Security Standard, so I'd very much appreciate the non-root user. @wilmardo actually, a user account doesn't even need to exist in the container image.
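A minimal sketch of running as a purely numeric user with no account at all (the paths and command are placeholders, not Tandoor's real layout):

```dockerfile
FROM alpine:3.21

# Own the app dir by UID/GID 1000 at copy time; no account is created.
COPY --chown=1000:1000 app/ /app/

# A numeric USER satisfies Kubernetes' runAsNonRoot check even though no
# matching /etc/passwd entry exists. Tools that look the user up by name
# will see an unknown UID, which is usually harmless for services.
USER 1000:1000
CMD ["/app/run"]
```

The trade-off versus the passwd/group heredoc approach earlier in the thread: this is shorter, but anything that needs a resolvable username or home directory (some pip/yarn invocations, for example) will want the explicit entries.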
Thank you @theProf ! 😄 I think those are good improvements to the clarity of the container build process.
Sorry I haven't been responsive; I've had some personal life going on. Great to see all the activity and feedback! I'll take a crack at these (and create a checklist to split things up) tomorrow morning.
Force-pushed from 523a9de to 457d44d.
Took a bit longer, as I had to get reacquainted with the code. I've only pushed a rebase from feature/vue3, but I'm putting an update in the issue.
@theProf this has probably been closed by accident when the branch got deleted. Would you mind reopening?
Thanks for the nudge; I initially thought it was closed out intentionally. I've had some personal life events, so I haven't been so active. I'll open this back up and likely stack some changes with @wilmardo's PRs (ref).
Note: This PR has been moved to #3919.
Description
Refactor the Dockerfile for readability, while optimizing for a slimmer final image by reducing the number of layers and stripping the build dependencies from the production image.
Background
I wanted to start contributing to issues, but got sidetracked during setup. This is a work in progress, as it will require more testing. The image itself relies on external dependencies not included in the Dockerfile (looking at the CI pipelines), and since I haven't got the dev environment up and running yet, I'm unsure which dependencies are solely for building and which are required for production, but I erred on the side of caution rather than optimization.
Comparison of sizes
Questions
I didn't see anywhere tagged for contributions/discussion around DevOps, CI/CD, building, etc., but if I missed it, please feel free to point me in the right direction!
EDIT: Created an Issue for larger discussion that can be found here