-
Notifications
You must be signed in to change notification settings - Fork 113
Update TF install to shared binary approach #10147
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Unit Tests4 094 tests ±0 4 091 ✅ ±0 7m 23s ⏱️ +3s Results for commit 7624827. ± Comparison against base commit d5d62cb. This pull request removes 4 and adds 4 tests. Note that renamed tests count towards both.
♻️ This comment has been updated with latest results. |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #10147 +/- ##
==========================================
+ Coverage 49.81% 49.83% +0.01%
==========================================
Files 640 640
Lines 49690 49733 +43
==========================================
+ Hits 24753 24782 +29
- Misses 23050 23066 +16
+ Partials 1887 1885 -2 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
80bf545
to
155c718
Compare
bdbc5a9
to
58ec7fc
Compare
58ec7fc
to
6b33054
Compare
6b33054
to
a55a451
Compare
a55a451
to
b417b00
Compare
Signed-off-by: lakshmimsft <[email protected]>
Signed-off-by: lakshmimsft <[email protected]>
b417b00
to
af097dd
Compare
86aec85
to
3b5ee35
Compare
35c4520
to
7c9d898
Compare
f947c9d
to
bc6353d
Compare
bc6353d
to
4a02b42
Compare
Signed-off-by: lakshmimsft <[email protected]>
4a02b42
to
7624827
Compare
Radius functional test overview
Click here to see the list of tools in the current test run
Test Status⌛ Building Radius and pushing container images for functional tests... |
|
||
return fmt.Sprintf("%x", hash), nil | ||
return suffix, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the cleanup
@@ -34,6 +34,9 @@ import ( | |||
const ( | |||
executionSubDir = "deploy" | |||
workingDirFileMode fs.FileMode = 0700 | |||
|
|||
// DefaultStateLockTimeout is the default timeout for acquiring Terraform state locks | |||
DefaultStateLockTimeout = "10m" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the app going to wait 10 minutes to acquire the lock? I think our worker will have already timed out by then because it uses 2 minutes in terms of timeout.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it tries to acquire the lock immediately. It will wait for 10min if the lock is not available. https://developer.hashicorp.com/terraform/cli/commands/apply#lock-timeout-duration
|
||
// ensureGlobalTerraformBinary ensures a global shared Terraform binary is available. | ||
// Uses mutex-based locking to prevent race conditions during concurrent access. | ||
func ensureGlobalTerraformBinary(ctx context.Context, installer *install.Installer, logger logr.Logger) (string, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that this function can be split into smaller pieces in terms of modularity. I see a few opportunities:
- If
os.Stat(globalBinary)
&&os.Stat(globalMarker)
is true then we can just check to see if globalTerraformReady is true or false. Based on that we can call another function likeverifyBinaryWorks
which can be used in other places too. - Some
logger.Info
s could belogger.Warn
ifWarn
exists for logger. - Download and Install Terraform could be split to another function.
- We do
tfexec.NewTerraform
andtf.Version
in a few places. We can split that to its own function.
My main concern with this function is that it does a lot of things and testing it would be difficult. We should simplify it as much as we can.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will address this in separate PR.
// initAndApply runs Terraform init and apply in the provided working directory. | ||
func initAndApply(ctx context.Context, tf *tfexec.Terraform) (*tfjson.State, error) { | ||
func initAndApply(ctx context.Context, tf *tfexec.Terraform, stateLockTimeout string) (*tfjson.State, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of passing stateLockTimeout, we can pass in ...tfexec.ApplyOption
(or an array of the same) because, in future, we can have more options and they are just going to float the signature of this function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I might leave this as is for now. We can refactor when we need to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great!
ResourceRecipe: &opts.Recipe, | ||
EnvRecipe: &opts.Definition, | ||
Secrets: opts.Secrets, | ||
StateLockTimeout: terraform.DefaultStateLockTimeout, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what happens if the deployment takes longer than StateLockTimeout
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is time taken to wait for lock to be available, so it can then run the deployment. https://developer.hashicorp.com/terraform/cli/commands/apply#lock-timeout-duration
if execPath := tf.ExecPath(); execPath != "" { | ||
if _, err := os.Stat(execPath); err != nil { | ||
logger.Info(fmt.Sprintf("ERROR: Terraform binary missing at %s during state fetch: %s", execPath, err.Error())) | ||
return nil, fmt.Errorf("terraform binary disappeared at %s: %w", execPath, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i am not sure if we wanna add exec path in the error output.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will address this in separate PR
Description
This pull request updates Terraform install to a global shared binary approach, which alleviates race conditions and state lock errors caused by concurrent operations. Additionally, the changes include state lock timeout handling and error checking around Terraform binary availability.
Type of change
Fixes: #10179
Contributor checklist
Please verify that the PR meets the following requirements, where applicable: