Skip to content

Conversation

m-hemmings
Copy link
Member

@m-hemmings m-hemmings commented Jul 30, 2025

PURPOSE

Update Clang from version 16 to 18 to enable compatibility with newer toolchains and prepare for upcoming features.
STATUS

Working:

  • Dockerfile.dev builds successfully with Clang 18
  • All tests pass in the development image

Not working:

  • Dockerfile.e2e is not yet compatible with Clang 18

@m-hemmings m-hemmings requested a review from rennergade July 30, 2025 16:24
@m-hemmings m-hemmings linked an issue Jul 30, 2025 that may be closed by this pull request
@m-hemmings
Copy link
Member Author

m-hemmings commented Jul 30, 2025

The major changes to Dockerfile.e2e are:

  1. Changing LLVM and CLANG versions
  2. The installation of libtinfo5

The other changes are in make_glibc_and_sysroot.sh

  1. Update the CFLAGS
  2. Add manual compilation of elision-lock and eilision-unlock

When I do this in Dockerfile.dev, it works and all the tests passed, but the same is not true for Dockerfile.e2e, as you can see in this build log.

It looks to me like the addition of the compile steps for elision-lock and elision-unlock below

# Compile elision-lock.c
$CC $CFLAGS $WARNINGS $EXTRA_FLAGS \
    $INCLUDE_PATHS $SYS_INCLUDE $DEFINES $EXTRA_DEFINES \
    -o $GLIBC/build/nptl/elision-lock.o \
    -c $GLIBC/sysdeps/unix/sysv/linux/x86/elision-lock.c \
    -MD -MP -MF $GLIBC/build/nptl/elision-lock.o.dt \
    -MT $GLIBC/build/nptl/elision-lock.o

# Compile elision-unlock.c
$CC $CFLAGS $WARNINGS $EXTRA_FLAGS \
    $INCLUDE_PATHS $SYS_INCLUDE $DEFINES $EXTRA_DEFINES \
    -o $GLIBC/build/nptl/elision-unlock.o \
    -c $GLIBC/sysdeps/unix/sysv/linux/x86/elision-unlock.c \
    -MD -MP -MF $GLIBC/build/nptl/elision-unlock.o.dt \
    -MT $GLIBC/build/nptl/elision-unlock.o

either aren't compiling correctly, or the result isn't being passed further down the chain, because the test failures all seem to reference it like this:

"/tests/unit-tests/file_tests/non-deterministic/dup.c": {
#20 24.04                 "status": "Failure",
#20 24.04                 "error_type": "Lind_wasm_compiling",
#20 24.04                 "output": "\nwasm-ld: warning: function signature mismatch: __clone\n>>> defined as (i32, i32, i32, i32, 
i32) -> i32 in //src/glibc/sysroot/lib/wasm32-wasi/libc.a(clone-internal.o)\n>>> defined as () -> i32 in //src/glibc/sysroot/lib/
wasm32-wasi/libc.a(clone.o)\n\nwasm-ld: error: //src/glibc/sysroot/lib/wasm32-wasi/libc.a(pthread_mutex_lock.o): undefined 
symbol: __lll_lock_elision\nwasm-ld: error: //src/glibc/sysroot/lib/wasm32-wasi/libc.a(pthread_mutex_unlock.o): undefined 
symbol: __lll_unlock_elision\nclang: error: linker command failed with exit code 1 (use -v to see invocation)\n"
#20 24.04             },

@lukpueh I would appreciate any insight into this that you might offer

@lukpueh
Copy link
Contributor

lukpueh commented Aug 4, 2025

When I do this in Dockerfile.dev, it works and all the tests passed, but the same is not true for Dockerfile.e2e, as you can see in this build log.

Did you compare the logs from Dockerfile.dev and Dockerfile.e2e? The build log you attached has fatal error: 'stddef.h' file not found under RUN make sysroot. Is that expected? Maybe there's a missing dependency? (you seem to install different things in .dev and .e2e)

@rennergade rennergade requested a review from J7-7-7 August 5, 2025 16:41
@m-hemmings
Copy link
Member Author

@J7-7-7 I added 3 files here to see if you can see something. I'm trying to more closely mimic the successful building in .dev with .staged, and if you pull it and run like this:

docker buildx build   --file ./scripts/Dockerfile.staged   --target build-glibc   --tag staged-glibc   --platform=linux/amd64   --load   --build-arg BRANCH_NAME=301-redo-migration-to-clang .

It fails, but maybe you can see why with me

@m-hemmings
Copy link
Member Author

@J7-7-7 Could you try pulling this and building like this?

docker buildx build   --file ./scripts/Dockerfile.e2e   --tag clang-18-prod-test:2.0   --cache-from type=local,src=~/.cache/docker-buildx   --cache-to type=local,dest=~/.cache/docker-buildx   --platform=linux/amd64   --load

I got a pass on the build and all the tests, attached below:
builder-prod.log

J7-7-7
J7-7-7 previously approved these changes Aug 25, 2025
Copy link
Contributor

@J7-7-7 J7-7-7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It builds, takes time but it works. Great work @m-hemmings

@rennergade
Copy link
Contributor

It builds, takes time but it works. Great work @m-hemmings

does it take significantly more time?

@m-hemmings
Copy link
Member Author

m-hemmings commented Aug 25, 2025

It builds, takes time but it works. Great work @m-hemmings

does it take significantly more time?

I have to do some code cleanup on this anyway, I'll time them today and post results

@m-hemmings
Copy link
Member Author

@rennergade I timed them as follows on a system with 16 vCPUs, 64 GB Memory:

Timings (h:mm:ss):
  branch: 22:50.36
  main: 15:49.69

@m-hemmings m-hemmings force-pushed the 301-redo-migration-to-clang-18 branch from f39c4f8 to 787117f Compare August 25, 2025 21:15
@rennergade
Copy link
Contributor

@rennergade I timed them as follows on a system with 16 vCPUs, 64 GB Memory:

Timings (h:mm:ss):
  branch: 22:50.36
  main: 15:49.69

do we have an idea why its 7min longer?

@m-hemmings
Copy link
Member Author

@rennergade I timed them as follows on a system with 16 vCPUs, 64 GB Memory:

Timings (h:mm:ss):
  branch: 22:50.36
  main: 15:49.69

do we have an idea why its 7min longer?

No, honestly, that seems extreme considering all that got added was two manual compiles, a library install, and some padding around a curl that shouldn't even lengthen it, just protect it from a flaky network connection.
I did run them back to back on the same machine, so it's also possible that there was some caching that helped the second one be faster because of similarities in base layers

@m-hemmings m-hemmings marked this pull request as ready for review August 27, 2025 18:07
@m-hemmings m-hemmings force-pushed the 301-redo-migration-to-clang-18 branch from 6084c9c to 43a653a Compare August 27, 2025 19:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Redo Migration to Clang-18
4 participants