Add agent purge command #3982
Sorry for the delay in this review @guilhermocc
cmd/spire-server/cli/agent/clean.go
Outdated
"google.golang.org/protobuf/types/known/wrapperspb"
)

type cleanCommand struct {
I feel like the clean subcommand is kind of ambiguous ... we use the verb "prune" in bundle management, which means flushing out expired keys from the bundle. Perhaps that would be better? spire-server agent prune?
Sure! I was torn between clean and prune, but prune seems better!
-dryRun
Indicates that the command will not perform any action, but will print the agents that would be purged.
-expiredBefore string
Specifies the date before which all expired agents should be deleted. The value should be a date time string in the format "YYYY-MM-DD HH:MM:SS". Any agents that expired before this date will be deleted.
What other time format options do we have? I wonder what other projects do that require time input on a CLI param .. the first thing I noticed here is that no timezone is specified. The second thing is that there's a space in it 😂
You have a point here, this field is too complex. We have plenty of date and datetime formats to choose from; to simplify things, we could use just the date format YYYY-MM-DD. Another option is a duration value, which the user could provide in units of "ns", "us" (or "µs"), "ms", "s", "m", and "h". I would go with the date format on this, what do you think?
time.RFC3339 seems like a good choice for this.
A fixed time seems safer than a user trying to cobble together a duration that happens to line up with the time they want :)
time.RFC3339 would give the user full precision on the purge, it sounds good
cmd/spire-server/cli/agent/clean.go
Outdated
agents := resp.GetAgents()
expiredAgents := &ExpiredAgents{Agents: []*ExpiredAgent{}}

for _, agent := range agents {
I think this is fine for now, though my preference would be for the prune logic to happen in SPIRE core, and expose an RPC. In the future, I think we'll want to regularly call this prune logic against agents that we know are ~safe to flush out
I also thought about that, but wasn't sure if this would fit into this issue scope since it is a change in the agent's service api, right?
cmd/spire-server/cli/agent/clean.go
Outdated
result := &ExpiredAgent{AgentID: id}

if !c.dryRun {
	if _, err := agentClient.DeleteAgent(ctx, &agentv1.DeleteAgentRequest{Id: agent.Id}); err == nil {
We check this error, but we don't log it or otherwise notify the user .. they'll see that some agents weren't purged and won't know why. Maybe ExpiredAgent needs an error field?
Definitely!
cmd/spire-server/cli/agent/clean.go
Outdated
type ExpiredAgents struct {
	Agents []*ExpiredAgent `json:"expired_agents"`
}

type ExpiredAgent struct {
	AgentID spiffeid.ID `json:"agent_id"`
	Deleted bool `json:"deleted"`
}
Should these be exported or unexported?
unexported 🙈
1e30efd to d9e2ae7
As discussed in the last contributor sync, an |
-dryRun
Indicates that the command will not perform any action, but will print the agents that would be purged.
-expiredFor duration
Specifies the time since the agent's SVID has expired, used for filtering agents to purge. (default 24h0m0s)
Specifies the time since the agent's SVID has expired, used for filtering agents to purge. (default 24h0m0s)
Amount of time that has passed since the agent's SVID has expired. It is used to determine which agents to purge (default: 24h0m0s)
cmd/spire-server/cli/agent/purge.go
Outdated
}

func (*purgeCommand) Synopsis() string {
	return "Delete expired agents that attested using a non-TOFU security model based on a given time"
return "Delete expired agents that attested using a non-TOFU security model based on a given time"
return "Purge expired agents that were attested using a non-TOFU security model based on a given time"
cmd/spire-server/cli/agent/purge.go
Outdated
}

func (c *purgeCommand) AppendFlags(fs *flag.FlagSet) {
	fs.DurationVar(&c.expiredFor, "expiredFor", 24*time.Hour, "Specifies the time since the agent's SVID has expired, used for filtering agents to purge.")
fs.DurationVar(&c.expiredFor, "expiredFor", 24*time.Hour, "Specifies the time since the agent's SVID has expired, used for filtering agents to purge.")
fs.DurationVar(&c.expiredFor, "expiredFor", 24*time.Hour, "Amount of time that has passed since the agent's SVID has expired. It is used to determine which agents to purge.")
Signed-off-by: Guilherme Carvalho <[email protected]>
06c00a4 to f743c97
Thanks so much for the contrib and patience @guilhermocc and also for the review @maxlambrecht
Left a couple comments/nits here and there, but this is looking good to me
-dryRun
Indicates that the command will not perform any action, but will print the agents that would be purged.
-expiredFor duration
Amount of time that has passed since the agent's SVID has expired. It is used to determine which agents to purge. (default 24h0m0s)
nit: I feel it is pretty common for agents to go away and come back, judging by the volume of questions etc. in the Slack and GH issues. Based on that, and also a bit of "IMO", I wonder if this default should be closer to one month than one day. For what purposes would someone prune, and for what outcome? Feels like being conservative here is a good idea unless the use case(s) call for something more aggressive
Sure, it makes sense to be more conservative here 👍
}

msg := fmt.Sprintf("Found %d expired ", len(expAgents.Agents))
msg = util.Pluralizer(msg, "agent", "agents", len(expAgents.Agents))
❤️
small details like this make a big difference in user perception and experience. thank you 🙌
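For readers unfamiliar with the helper, here is a sketch of how a function with the same shape as util.Pluralizer might behave. The local pluralizer below is a stand-in written for this example, and its behavior (append the singular form only when the count is exactly 1) is an assumption, not a copy of the SPIRE implementation.

```go
package main

import "fmt"

// pluralizer is a hypothetical stand-in with the same parameter shape as
// the util.Pluralizer call in the diff. It appends the singular form when
// count is exactly 1 and the plural form otherwise.
func pluralizer(msg, singular, plural string, count int) string {
	if count == 1 {
		return msg + singular
	}
	return msg + plural
}

func main() {
	for _, n := range []int{0, 1, 2} {
		msg := fmt.Sprintf("Found %d expired ", n)
		fmt.Println(pluralizer(msg, "agent", "agents", n))
	}
}
```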
cmd/spire-server/cli/agent/purge.go
Outdated
c.env.Println("Agents not purged:")
for _, result := range agentsNotPurged {
	c.env.Printf("SPIFFE ID : %s\n", result.AgentID.String())
	if result.Error != "" {
In what case can we have an agent that wasn't purged and also no error string?
This case that you described is unreachable; an agent that wasn't purged (not using dryRun) will always come with an error
Signed-off-by: Guilherme Carvalho <[email protected]>
* Add agent clean command --------- Signed-off-by: Guilherme Carvalho <[email protected]> Co-authored-by: Evan Gilman <[email protected]> Signed-off-by: Dmitry Gorochovsky <[email protected]>
Pull Request check list
Affected functionality
Add new agent purge command for cleaning expired agents attested using a non-TOFU security model.
Description of change
Added the new command, including unit tests
Which issue this PR fixes
Fixes #1836