@rnorth rnorth commented May 21, 2020

by ensuring that find_gradle_jobs and check jobs always use different cache keys

#1874 surfaced a glitch in the GitHub Actions caching of our CI jobs. This manifested as PR tests being skipped.

For background/interest:

  1. We had two tests failing on a PR due to merge conflicts
  2. I pushed a commit that fixed one of them (I should have noticed there were two failures, but instead believed there was just one)
2020-05-21T15:11:02.8202572Z :postgresql:test (Thread[Daemon worker,5,main]) started.
2020-05-21T15:11:03.6192039Z Gradle Test Executor 1 started executing tests.
2020-05-21T15:11:04.8192531Z 
2020-05-21T15:11:04.8221625Z > Task :postgresql:test
2020-05-21T15:11:04.8223267Z Build cache key for task ':postgresql:test' is cbf31443f435cc983e30334fb9fbf5df
2020-05-21T15:11:04.8223770Z Task ':postgresql:test' is not up-to-date because:
2020-05-21T15:11:04.8223916Z   No history is available.
2020-05-21T15:11:04.8224363Z Did not find cache item 'cache/cbf31443f435cc983e30334fb9fbf5df' in S3 bucket
  3. That commit got built, but the test task skipped entirely due to caching:
2020-05-21T15:27:30.6515151Z > Task :postgresql:test FROM-CACHE
2020-05-21T15:27:30.6515923Z Build cache key for task ':postgresql:test' is c8ae3ed5e2e3e487194cf26b06391391
2020-05-21T15:27:30.6516556Z Task ':postgresql:test' is not up-to-date because:
2020-05-21T15:27:30.6516931Z   No history is available.
2020-05-21T15:27:30.6517546Z Loaded cache entry for task ':postgresql:test' with cache key c8ae3ed5e2e3e487194cf26b06391391
2020-05-21T15:27:30.6518183Z :postgresql:test (Thread[Execution worker for ':',5,main]) completed. Took 0.073 secs.
2020-05-21T15:27:30.6518575Z :postgresql:check (Thread[Daemon worker,5,main]) started.
  4. So, PR check state was all green and I merged it
  5. Then master branch rebuilt and the failing test re-emerged

We suspected this was related to the find_gradle_jobs CI job, which disables the gradle test executor so that it can quickly simulate a full check and find out which gradle tasks need to be executed.

We initially believed that we could be accidentally pushing the cached results of these no-op test tasks to the remote gradle cache in S3, but the configuration reliably prevents this.

We subsequently realised that the leakage actually occurs via GitHub Actions caching: the final restore_key for the check job would match the key written by the find_gradle_jobs job, so the local gradle cache could be shared between the two jobs.

Quite simply, the no-op test task was being put into the local gradle cache, and some of the time it was being used as a signal that tests had already been executed.
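For illustration, the collision can be reproduced with a toy model of how GitHub Actions resolves a cache: an exact key match first, then each restore key as a prefix, taking the most recently saved entry. This is only a sketch; the `ActionsCache` class and the key names (`gradle-home-check-…`, `gradle-home-find_gradle_jobs-…`) are hypothetical and not taken from the actual workflow files.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ActionsCache:
    """Toy model of cache lookup: exact key first, then prefix fallback."""

    # (key, contents) pairs, oldest first
    entries: List[Tuple[str, str]] = field(default_factory=list)

    def save(self, key: str, contents: str) -> None:
        self.entries.append((key, contents))

    def restore(self, key: str, restore_keys: List[str]) -> Optional[str]:
        # 1. An exact match on the primary key wins.
        for saved_key, contents in reversed(self.entries):
            if saved_key == key:
                return contents
        # 2. Otherwise try each restore key in order, as a prefix,
        #    preferring the most recently saved entry.
        for prefix in restore_keys:
            for saved_key, contents in reversed(self.entries):
                if saved_key.startswith(prefix):
                    return contents
        return None


cache = ActionsCache()
# Hypothetical key written by the find_gradle_jobs job.
cache.save("gradle-home-find_gradle_jobs-abc123", "no-op test results")

# A broad final restore key in the check job also matches that entry.
shared = cache.restore(
    "gradle-home-check-abc123",
    ["gradle-home-check-", "gradle-home-"],
)

# With job-specific keys only, the check job starts from a clean cache.
isolated = cache.restore("gradle-home-check-abc123", ["gradle-home-check-"])
```

In this model `shared` picks up the entry saved by the other job, while `isolated` finds nothing, which is the effect of giving the two jobs cache keys that can never match each other.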

@rnorth rnorth requested review from bsideup and kiview as code owners May 21, 2020 19:22
@rnorth rnorth merged commit 74a88ad into master May 22, 2020
@rnorth rnorth deleted the prevent-local-gradle-cache-collisions branch May 22, 2020 08:36
quincy pushed a commit to quincy/testcontainers-java that referenced this pull request May 28, 2020