Skip to content

Conversation

guilhermocc
Copy link
Contributor

@guilhermocc guilhermocc commented Mar 15, 2023

Pull Request check list

  • Commit conforms to CONTRIBUTING.md?
  • Proper tests/regressions included?
  • Documentation updated?

Affected functionality
Add new agent purge command for cleaning expired agents attested using a non-TOFU security model.

Description of change
Added the new command, including unitary tests

Which issue this PR fixes
Fixes #1836

Copy link
Member

@evan2645 evan2645 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay in this review @guilhermocc

"google.golang.org/protobuf/types/known/wrapperspb"
)

type cleanCommand struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like the clean subcommand is kind of ambiguous ... we use the verb "prune" in bundle management, which means flushing out expired keys from the bundle. Perhaps that would be better? spire-server agent prune?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure! I was in doubt between clean and prune, but prune seems to be better!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-dryRun
Indicates that the command will not perform any action, but will print the agents that would be purged.
-expiredBefore string
Specifies the date before which all expired agents should be deleted. The value should be a date time string in the format "YYYY-MM-DD HH:MM:SS". Any agents that expired before this date will be deleted.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What other time format options do we have? I wonder what other projects do that require time input on a CLI param .. the first thing I noticed here is that no timezone is specified. The second thing is that there's a space in it 😂

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have a point here, this field is too complex. Well, we have plenty of different date or datetime formats that we can use, for simplifying things we could use just the date format YYYY-MM-DD. Another option that we can use is duration value, in which the user could provide the duration value in "ns", "us" (or "µs"), "ms", "s", "m" and "h". I would go with the date format on this, what do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

time.RFC3339 seems like a good choice for this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A fixed time seems safer than a user trying to cobble together a duration that happens to line up with the time they want :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

time.RFC3339 would give the user full precision on the purge, it sounds good

agents := resp.GetAgents()
expiredAgents := &ExpiredAgents{Agents: []*ExpiredAgent{}}

for _, agent := range agents {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is fine for now, though my preference would be for the prune logic to happen in SPIRE core, and expose an RPC. In the future, I think we'll want to regularly call this prune logic against agents that we know are ~safe to flush out

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also thought about that, but wasn't sure if this would fit into this issue scope since it is a change in the agent's service api, right?

result := &ExpiredAgent{AgentID: id}

if !c.dryRun {
if _, err := agentClient.DeleteAgent(ctx, &agentv1.DeleteAgentRequest{Id: agent.Id}); err == nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We check this error, but we don't log it or otherwise notify the user .. they'll see that some agents weren't purged and won't know why. Maybe ExpiredAgent needs an error field?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines 91 to 98
type ExpiredAgents struct {
Agents []*ExpiredAgent `json:"expired_agents"`
}

type ExpiredAgent struct {
AgentID spiffeid.ID `json:"agent_id"`
Deleted bool `json:"deleted"`
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should these be exported or unexported?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unexported 🙈

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@guilhermocc guilhermocc changed the title Add agent clean command Add agent purge command Mar 28, 2023
@guilhermocc guilhermocc requested a review from evan2645 March 28, 2023 18:15
@MarcosDY MarcosDY added this to the 1.6.2 milestone Mar 30, 2023
@amartinezfayo amartinezfayo modified the milestones: 1.6.2, 1.6.3, 1.6.4 Apr 5, 2023
@guilhermocc guilhermocc force-pushed the new-clean-agents-command branch from 1e30efd to d9e2ae7 Compare April 12, 2023 12:58
@guilhermocc
Copy link
Contributor Author

As discussed in the last contributor sync, an -expiredFor flag would improve the command usability, so I'm replacing the -expiredBefore with -expiredFor 👍

-dryRun
Indicates that the command will not perform any action, but will print the agents that would be purged.
-expiredFor duration
Specifies the time since the agent's SVID has expired, used for filtering agents to purge. (default 24h0m0s)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Specifies the time since the agent's SVID has expired, used for filtering agents to purge. (default 24h0m0s)
Amount of time that has passed since the agent's SVID has expired. It is used to determine which agents to purge (default: 24h0m0s)

Copy link
Contributor Author

@guilhermocc guilhermocc Apr 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-dryRun
Indicates that the command will not perform any action, but will print the agents that would be purged.
-expiredFor duration
Specifies the time since the agent's SVID has expired, used for filtering agents to purge. (default 24h0m0s)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Specifies the time since the agent's SVID has expired, used for filtering agents to purge. (default 24h0m0s)
Amount of time that has passed since the agent's SVID has expired. It is used to determine which agents to purge (default: 24h0m0s)

Copy link
Contributor Author

@guilhermocc guilhermocc Apr 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

}

func (*purgeCommand) Synopsis() string {
return "Delete expired agents that attested using a non-TOFU security model based on a given time"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return "Delete expired agents that attested using a non-TOFU security model based on a given time"
return "Purge expired agents that were attested using a non-TOFU security model based on a given time"

Copy link
Contributor Author

@guilhermocc guilhermocc Apr 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

}

func (c *purgeCommand) AppendFlags(fs *flag.FlagSet) {
fs.DurationVar(&c.expiredFor, "expiredFor", 24*time.Hour, "Specifies the time since the agent's SVID has expired, used for filtering agents to purge.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
fs.DurationVar(&c.expiredFor, "expiredFor", 24*time.Hour, "Specifies the time since the agent's SVID has expired, used for filtering agents to purge.")
fs.DurationVar(&c.expiredFor, "expiredFor", 24*time.Hour, "Amount of time that has passed since the agent's SVID has expired. It is used to determine which agents to purge.")

Copy link
Contributor Author

@guilhermocc guilhermocc Apr 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Signed-off-by: Guilherme Carvalho <[email protected]>
Signed-off-by: Guilherme Carvalho <[email protected]>
Signed-off-by: Guilherme Carvalho <[email protected]>
Signed-off-by: Guilherme Carvalho <[email protected]>
Signed-off-by: Guilherme Carvalho <[email protected]>
Signed-off-by: Guilherme Carvalho <[email protected]>
@guilhermocc guilhermocc force-pushed the new-clean-agents-command branch from 06c00a4 to f743c97 Compare April 19, 2023 18:25
evan2645
evan2645 previously approved these changes Apr 20, 2023
Copy link
Member

@evan2645 evan2645 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much for the contrib and patience @guilhermocc and also for the review @maxlambrecht

Left a couple comments/nits here and there, but this is looking good to me

-dryRun
Indicates that the command will not perform any action, but will print the agents that would be purged.
-expiredFor duration
Amount of time that has passed since the agent's SVID has expired. It is used to determine which agents to purge. (default 24h0m0s)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I feel it is pretty common for agents to go away and come back, judging by the volume of questions etc in the slack and GH issues. Based on that, and also a bit of "IMO" , I wonder if this default should be closer to one month than one day. For what purposes would someone prune, and for what outcome? Feels like being conservative here is a good idea unless the use case(s) call for something more aggressive

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, it makes sense to be more conservative here 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

}

msg := fmt.Sprintf("Found %d expired ", len(expAgents.Agents))
msg = util.Pluralizer(msg, "agent", "agents", len(expAgents.Agents))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

small details like this make a big difference in user perception and experience. thank you 🙌

c.env.Println("Agents not purged:")
for _, result := range agentsNotPurged {
c.env.Printf("SPIFFE ID : %s\n", result.AgentID.String())
if result.Error != "" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In what case can we have an agent that wasn't purged and also no error string?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This case that you described is unreachable; an agent that wasn't purged (not using dryRun) will always come with an error

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Signed-off-by: Guilherme Carvalho <[email protected]>
@evan2645 evan2645 merged commit ef96482 into spiffe:main Apr 25, 2023
d-goro pushed a commit to d-goro/spire that referenced this pull request May 18, 2023
* Add agent clean command

---------

Signed-off-by: Guilherme Carvalho <[email protected]>
Co-authored-by: Evan Gilman <[email protected]>
Signed-off-by: Dmitry Gorochovsky <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Expired Agent data left in SPIRE database indefinitely
6 participants