@lakshmimsft lakshmimsft commented Aug 1, 2025

Description

This pull request changes the Terraform install to use a single, globally shared binary, which avoids the race conditions and state lock errors caused by concurrent operations. It also adds state lock timeout handling and error checks for Terraform binary availability.

Type of change

  • This pull request is a minor refactor, code cleanup, test improvement, or other maintenance task and doesn't change the functionality of Radius (issue link optional).

Fixes: #10179

Contributor checklist

Please verify that the PR meets the following requirements, where applicable:

  • An overview of proposed schema changes is included in a linked GitHub issue.
    • Yes
    • Not applicable
  • A design document PR is created in the design-notes repository, if new APIs are being introduced.
    • Yes
    • Not applicable
  • The design document has been reviewed and approved by Radius maintainers/approvers.
    • Yes
    • Not applicable
  • A PR for the samples repository is created, if existing samples are affected by the changes in this PR.
    • Yes
    • Not applicable
  • A PR for the documentation repository is created, if the changes in this PR affect the documentation or any user facing updates are made.
    • Yes
    • Not applicable
  • A PR for the recipes repository is created, if existing recipes are affected by the changes in this PR.
    • Yes
    • Not applicable

github-actions bot commented Aug 1, 2025

Unit Tests

4 094 tests  ±0   4 091 ✅ ±0   7m 23s ⏱️ +3s
  307 suites ±0       3 💤 ±0 
    1 files   ±0       0 ❌ ±0 

Results for commit 7624827. ± Comparison against base commit d5d62cb.

This pull request removes 4 and adds 4 tests. Note that renamed tests count towards both.
Removed tests:
github.com/radius-project/radius/pkg/recipes/terraform ‑ TestInstall_NoPreMountedBinary_Download
github.com/radius-project/radius/pkg/recipes/terraform ‑ TestInstall_PreMountedBinary
github.com/radius-project/radius/pkg/recipes/terraform ‑ TestInstall_PreMountedBinaryInvalid_FallbackToDownload
github.com/radius-project/radius/pkg/recipes/terraform ‑ TestInstall_PreMountedBinaryNotExecutable

Added tests:
github.com/radius-project/radius/pkg/recipes/terraform ‑ TestInstall_GlobalBinaryConcurrency
github.com/radius-project/radius/pkg/recipes/terraform ‑ TestInstall_GlobalBinaryReuse
github.com/radius-project/radius/pkg/recipes/terraform ‑ TestInstall_MultipleConcurrentCallsUseSameBinary
github.com/radius-project/radius/pkg/recipes/terraform ‑ TestInstall_SuccessfulDownload

♻️ This comment has been updated with latest results.

codecov bot commented Aug 1, 2025

Codecov Report

❌ Patch coverage is 42.22222% with 78 lines in your changes missing coverage. Please review.
✅ Project coverage is 49.83%. Comparing base (d5d62cb) to head (7624827).
⚠️ Report is 1 commit behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| pkg/recipes/terraform/install.go | 43.95% | 43 Missing and 8 partials ⚠️ |
| pkg/recipes/terraform/execute.go | 0.00% | 27 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #10147      +/-   ##
==========================================
+ Coverage   49.81%   49.83%   +0.01%     
==========================================
  Files         640      640              
  Lines       49690    49733      +43     
==========================================
+ Hits        24753    24782      +29     
- Misses      23050    23066      +16     
+ Partials     1887     1885       -2     

Signed-off-by: lakshmimsft <[email protected]>
@lakshmimsft lakshmimsft changed the title from "[WIP] Do Not Review TF Updates" to "Update TF install to shared binary approach" Aug 5, 2025

radius-functional-tests bot commented Aug 5, 2025

Radius functional test overview

| Name | Value |
|---|---|
| Repository | radius-project/radius |
| Commit ref | 7624827 |
| Unique ID | funcbd55d410da |
| Image tag | pr-funcbd55d410da |
Tools in the current test run:
  • gotestsum 1.12.0
  • KinD: v0.29.0
  • Dapr:
  • Azure KeyVault CSI driver: 1.4.2
  • Azure Workload identity webhook: 1.3.0
  • Bicep recipe location ghcr.io/radius-project/dev/test/testrecipes/test-bicep-recipes/<name>:pr-funcbd55d410da
  • Terraform recipe location http://tf-module-server.radius-test-tf-module-server.svc.cluster.local/<name>.zip (in cluster)
  • applications-rp test image location: ghcr.io/radius-project/dev/applications-rp:pr-funcbd55d410da
  • dynamic-rp test image location: ghcr.io/radius-project/dev/dynamic-rp:pr-funcbd55d410da
  • controller test image location: ghcr.io/radius-project/dev/controller:pr-funcbd55d410da
  • ucp test image location: ghcr.io/radius-project/dev/ucpd:pr-funcbd55d410da
  • deployment-engine test image location: ghcr.io/radius-project/deployment-engine:latest

Test Status

⌛ Building Radius and pushing container images for functional tests...
✅ Container images build succeeded
⌛ Publishing Bicep Recipes for functional tests...
✅ Recipe publishing succeeded
⌛ Starting corerp-cloud functional tests...
⌛ Starting ucp-cloud functional tests...
✅ ucp-cloud functional tests succeeded
✅ corerp-cloud functional tests succeeded

@lakshmimsft lakshmimsft marked this pull request as ready for review August 5, 2025 17:33
@lakshmimsft lakshmimsft requested review from a team as code owners August 5, 2025 17:33

-return fmt.Sprintf("%x", hash), nil
+return suffix, nil
Contributor

thanks for the cleanup

@@ -34,6 +34,9 @@ import (
 const (
 	executionSubDir                = "deploy"
 	workingDirFileMode fs.FileMode = 0700
+
+	// DefaultStateLockTimeout is the default timeout for acquiring Terraform state locks
+	DefaultStateLockTimeout = "10m"
Contributor

Is the app going to wait 10 minutes to acquire the lock? I think our worker will have already timed out by then, because it uses a 2-minute timeout.

Contributor Author

@lakshmimsft lakshmimsft Aug 5, 2025

No, it tries to acquire the lock immediately. It waits up to 10 minutes only if the lock is not available. https://developer.hashicorp.com/terraform/cli/commands/apply#lock-timeout-duration
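
To illustrate the semantics described in this reply, here is a minimal, self-contained sketch. It is not Radius or Terraform code: `acquireWithTimeout`, `freeAfter`, and `errLockTimeout` are hypothetical names, and the sketch only assumes that a lock timeout means "try immediately, then keep retrying until the timeout elapses", as the linked documentation describes for `-lock-timeout`.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errLockTimeout is returned when the lock stays held for the whole timeout.
var errLockTimeout = errors.New("timed out waiting for state lock")

// acquireWithTimeout is a hypothetical helper. It calls tryLock immediately
// and, only if that fails, retries every interval until timeout has elapsed.
// It returns the number of attempts made.
func acquireWithTimeout(tryLock func() bool, timeout, interval time.Duration) (int, error) {
	deadline := time.Now().Add(timeout)
	attempts := 0
	for {
		attempts++
		if tryLock() {
			return attempts, nil
		}
		// Give up if the next retry would start after the deadline.
		if time.Now().Add(interval).After(deadline) {
			return attempts, errLockTimeout
		}
		time.Sleep(interval)
	}
}

// freeAfter returns a tryLock function that fails n times and then succeeds.
func freeAfter(n int) func() bool {
	calls := 0
	return func() bool {
		calls++
		return calls > n
	}
}

func main() {
	// The lock is free: it is acquired on the first attempt, with no waiting.
	attempts, err := acquireWithTimeout(freeAfter(0), 50*time.Millisecond, 5*time.Millisecond)
	fmt.Println(attempts, err)

	// The lock is never released: the call gives up once the timeout is reached.
	_, err = acquireWithTimeout(func() bool { return false }, 20*time.Millisecond, 5*time.Millisecond)
	fmt.Println(err)
}
```

In this model the timeout bounds only the wait for the lock; a caller with its own shorter deadline would still need to be considered separately.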


// ensureGlobalTerraformBinary ensures a global shared Terraform binary is available.
// Uses mutex-based locking to prevent race conditions during concurrent access.
func ensureGlobalTerraformBinary(ctx context.Context, installer *install.Installer, logger logr.Logger) (string, error) {
Contributor

I believe this function can be split into smaller, more modular pieces. I see a few opportunities:

  1. If both os.Stat(globalBinary) and os.Stat(globalMarker) succeed, we can just check whether globalTerraformReady is true or false. Based on that, we can call another function, such as verifyBinaryWorks, which could be reused in other places too.
  2. Some logger.Info calls could be logger.Warn, if Warn exists for the logger.
  3. Downloading and installing Terraform could be split into another function.
  4. We call tfexec.NewTerraform and tf.Version in a few places. We can split that into its own function.

My main concern with this function is that it does a lot of things, which makes it difficult to test. We should simplify it as much as we can.
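
One possible shape for the split suggested above is sketched here. This is not the code in this PR: `sharedBinary`, `filesPresent`, `ensure`, and `countInstalls` are hypothetical names, and the install and verify steps are injected as plain functions (a fake installer writes a placeholder file) so each piece can be tested without downloading Terraform.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"sync"
)

// sharedBinary is a hypothetical type guarding one shared binary path.
type sharedBinary struct {
	mu      sync.Mutex
	ready   bool
	path    string                  // location of the shared binary
	marker  string                  // marker file written after a successful install
	install func(path string) error // downloads and installs the binary (injected)
	verify  func(path string) error // checks that the binary works (injected)
}

// filesPresent reports whether both the binary and its marker exist on disk.
func (s *sharedBinary) filesPresent() bool {
	if _, err := os.Stat(s.path); err != nil {
		return false
	}
	_, err := os.Stat(s.marker)
	return err == nil
}

// ensure returns the binary path, installing only when the files are missing
// or verification fails. The mutex serializes concurrent callers.
func (s *sharedBinary) ensure() (string, error) {
	s.mu.Lock()
	defer s.mu.Unlock()

	if s.ready && s.filesPresent() {
		return s.path, nil
	}
	if !s.filesPresent() || s.verify(s.path) != nil {
		if err := s.install(s.path); err != nil {
			return "", fmt.Errorf("install failed: %w", err)
		}
		if err := s.verify(s.path); err != nil {
			return "", fmt.Errorf("installed binary failed verification: %w", err)
		}
		if err := os.WriteFile(s.marker, []byte("ok"), 0o600); err != nil {
			return "", fmt.Errorf("writing marker: %w", err)
		}
	}
	s.ready = true
	return s.path, nil
}

// countInstalls is a demo helper: it calls ensure from n goroutines against a
// fake installer in a temp directory and returns how many installs ran.
func countInstalls(n int) (int, error) {
	dir, err := os.MkdirTemp("", "shared-binary-demo")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(dir)

	installs := 0
	s := &sharedBinary{
		path:   filepath.Join(dir, "terraform"),
		marker: filepath.Join(dir, ".ready"),
		install: func(p string) error {
			installs++ // only called while s.mu is held
			return os.WriteFile(p, []byte("fake binary"), 0o700)
		},
		verify: func(p string) error {
			_, err := os.Stat(p)
			return err
		},
	}

	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_, _ = s.ensure()
		}()
	}
	wg.Wait()
	return installs, nil
}

func main() {
	installs, err := countInstalls(8)
	fmt.Println(installs, err)
}
```

Injecting `install` and `verify` is a design choice made for the sketch: it keeps the locking logic testable on its own, which is the concern raised in this comment.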

Contributor Author

I will address this in a separate PR.

// initAndApply runs Terraform init and apply in the provided working directory.
-func initAndApply(ctx context.Context, tf *tfexec.Terraform) (*tfjson.State, error) {
+func initAndApply(ctx context.Context, tf *tfexec.Terraform, stateLockTimeout string) (*tfjson.State, error) {
Contributor

Instead of passing stateLockTimeout, we can pass in ...tfexec.ApplyOption (or a slice of the same), because in the future we may have more options and they would just bloat the signature of this function.
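
The idea behind this suggestion can be sketched with the functional-options pattern. The sketch below is standard-library Go only and does not use terraform-exec: `applyConfig`, `applyOption`, `withLockTimeout`, `withParallelism`, and `buildApplyArgs` are hypothetical stand-ins, chosen to show how a variadic parameter keeps the signature stable as options are added.

```go
package main

import (
	"fmt"
	"strings"
)

// applyConfig collects settings for one apply call (hypothetical).
type applyConfig struct {
	lockTimeout string
	parallelism int
}

// applyOption mutates an applyConfig; new options need no signature change.
type applyOption func(*applyConfig)

// withLockTimeout sets how long to wait for the state lock, e.g. "10m".
func withLockTimeout(d string) applyOption {
	return func(c *applyConfig) { c.lockTimeout = d }
}

// withParallelism limits the number of concurrent operations.
func withParallelism(n int) applyOption {
	return func(c *applyConfig) { c.parallelism = n }
}

// buildApplyArgs stands in for a function like initAndApply: it accepts any
// number of options and turns them into command-line arguments.
func buildApplyArgs(opts ...applyOption) []string {
	cfg := applyConfig{}
	for _, opt := range opts {
		opt(&cfg)
	}
	args := []string{"apply"}
	if cfg.lockTimeout != "" {
		args = append(args, "-lock-timeout="+cfg.lockTimeout)
	}
	if cfg.parallelism > 0 {
		args = append(args, fmt.Sprintf("-parallelism=%d", cfg.parallelism))
	}
	return args
}

func main() {
	// Callers pass only the options they need.
	fmt.Println(strings.Join(buildApplyArgs(withLockTimeout("10m")), " "))
	fmt.Println(strings.Join(buildApplyArgs(withLockTimeout("10m"), withParallelism(4)), " "))
}
```

The trade-off is a little indirection at the call site in exchange for a signature that does not grow with each new setting.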

Contributor Author

I think I might leave this as is for now. We can refactor when we need to.

Contributor

@sylvainsf sylvainsf left a comment

Looks great!

ResourceRecipe: &opts.Recipe,
EnvRecipe: &opts.Definition,
Secrets: opts.Secrets,
StateLockTimeout: terraform.DefaultStateLockTimeout,
Contributor

What happens if the deployment takes longer than StateLockTimeout?

Contributor Author

@lakshmimsft lakshmimsft Aug 5, 2025

This is the time to wait for the lock to become available, so that the deployment can then run; it does not limit the deployment itself. https://developer.hashicorp.com/terraform/cli/commands/apply#lock-timeout-duration

if execPath := tf.ExecPath(); execPath != "" {
if _, err := os.Stat(execPath); err != nil {
logger.Info(fmt.Sprintf("ERROR: Terraform binary missing at %s during state fetch: %s", execPath, err.Error()))
return nil, fmt.Errorf("terraform binary disappeared at %s: %w", execPath, err)
Contributor

I am not sure we want to include the exec path in the error output.
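
One way to act on this concern is to keep the path in the log while returning an error that does not contain it. The sketch below is an assumption about how that could look, not the change made in this PR: `checkBinary` and `errBinaryUnavailable` are hypothetical names. The `os.Stat` error is deliberately not wrapped, because its message includes the path.

```go
package main

import (
	"errors"
	"fmt"
	"log"
	"os"
	"path/filepath"
)

// errBinaryUnavailable is a path-free error that is safe to surface to callers.
var errBinaryUnavailable = errors.New("terraform binary is not available")

// checkBinary is a hypothetical helper. It logs the full path for operators
// and returns a fixed error without the path. The os.Stat error is not
// wrapped because its text would expose the path again.
func checkBinary(execPath string, logf func(format string, args ...any)) error {
	if _, err := os.Stat(execPath); err != nil {
		logf("terraform binary missing at %s: %v", execPath, err)
		return errBinaryUnavailable
	}
	return nil
}

func main() {
	// A path that is assumed not to exist on the machine running the demo.
	missing := filepath.Join(os.TempDir(), "no-such-dir", "terraform")
	err := checkBinary(missing, log.Printf)
	fmt.Println(err)
}
```

Callers can still branch on the failure with `errors.Is(err, errBinaryUnavailable)`, while the path stays in the logs only.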

Contributor Author

I will address this in a separate PR.

@lakshmimsft lakshmimsft merged commit ad7811f into main Aug 5, 2025
36 checks passed
@lakshmimsft lakshmimsft deleted the lakshmimsft/testtf branch August 5, 2025 20:03
Successfully merging this pull request may close these issues.

LRT/Non-Cloud Func tests: Terraform State Lock/409 Issues
4 participants