-
Notifications
You must be signed in to change notification settings - Fork 87
Use only checkpoint data for CD unprepare (do not GET claim) #342
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -12,14 +12,16 @@ type Checkpoint struct { | |||
} | |||
|
|||
type CheckpointV1 struct { | |||
PreparedClaims PreparedClaims `json:"preparedClaims,omitempty"` | |||
PreparedClaims PreparedClaims `json:"preparedClaims,omitempty"` | |||
PreparedRawClaims PreparedRawClaims `json:"preparedRawClaims,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I opted for introducing a new top-level key for storing the ResourceClaim API objects. The type/name PreparedClaims
was already taken.
Let's find a better name than PreparedRawClaims
. PreparedResourceClaims
? PreparedClaimObjects
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I now repurposed the old name PreparedClaims
.
What was previously called PreparedClaims
is now called PreparedDevicesByClaimUID
.
// device was never prepared (assume that Prepare+Checkpoint are done | ||
// transactionally). Note that claimRef.String() contains namespace, | ||
// name, UID. | ||
klog.Infof("unprepare noop: claim not found in checkpoint data: %v", claimRef.String()) | ||
return nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At first, I returned an error. Then I tested a fast creation/deletion cycle:
$ kubectl apply -f ./demo/specs/imex/channel-injection.yaml
followed a few seconds later by
$ kubectl delete -f ./demo/specs/imex/channel-injection.yaml
computedomain.resource.nvidia.com "imex-channel-injection" deleted
pod "imex-channel-injection" deleted
^C
The delete hang forever. ComputeDomain cleanup was stuck in
E0506 11:09:43.251572 1 workqueue.go:99] Failed to reconcile work item: error unpreparing devices for claim 'default/imex-c
hannel-injection-imex-channel-0-f2p49:cfcde21e-a350-496e-ba28-6a4b60aefc9c': unprepare failed: claim not found in checkpoint data
: default/imex-channel-injection-imex-channel-0-f2p49:cfcde21e-a350-496e-ba28-6a4b60aefc9c
I suppose I deleted objects so early that the node-local plugin never got to Prepare() the device.
In that state, the claim is expectedly not referred to in the checkpoint data.
Hence, this should not be treated as an error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hence, this should be treated as an error.
Do you mean that this should NOT be treated as an error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean that this should NOT be treated as an error?
Yes :) I live in an inverted world. Edited.
} | ||
|
||
if claim.Status.Allocation == nil { | ||
return true, fmt.Errorf("no allocation set in ResourceClaim %s in namespace %s", claim.Name, claim.Namespace) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was it possible for Unprepare to be invoked for a claim with claim.Status.Allocation == nil
?
As the patch currently stands, we only Unprepare actively when the claim has been previously checkpointed, and we only checkpoint a claim in Prepare, i.e. when Status.Allocation != nil
.
When the claim has not been checkpointed we don't error out on Unprepare but pretend success via noop-unprepare. Does that make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
update, after confirming with Kevin: removal of the if claim.Status.Allocation == nil
check makes sense.
About the noop-unprepare I am rather certain that's what we want but I want to check again with Kevin too.
ec03632
to
9b7ac3c
Compare
@@ -12,14 +12,16 @@ type Checkpoint struct { | |||
} | |||
|
|||
type CheckpointV1 struct { | |||
PreparedClaims PreparedClaims `json:"preparedClaims,omitempty"` | |||
PreparedDevicesByClaimUID PreparedDevicesByClaimUID `json:"PreparedDevicesByClaimUID,omitempty"` | |||
PreparedClaims PreparedClaims `json:"preparedClaims,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note to reviewers: watch out, I propose to use the old term PreparedClaims
for something new (it is a great name for what I want to introduce here: a collection of resourceapi.ResourceClaim
objects).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. This explains why the GPU plugin uses the NEW name -- since it doesn't have global claims like the compute domain driver.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a couple questions, LGTM
delete(preparedClaims, claimUID) | ||
|
||
if err := s.checkpointManager.CreateCheckpoint(DriverPluginCheckpointFile, checkpoint); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am curious, should this be maybe called UpdateCheckpoint
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see what you mean. However, we in fact create a new checkpoint here. A checkpoint is a self-contained snapshot of the entire state, atomically written.
That also means: persisting state after state mutation requires writing a new checkpoint.
The interface is given by https://github.com/kubernetes/kubernetes/blob/60a317eadfcb839692a68eab88b2096f4d708f4f/pkg/kubelet/checkpointmanager/checkpoint_manager.go :)
} | ||
preparedDevicesByClaimUID := checkpoint.V1.PreparedDevicesByClaimUID |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is used much later on, does it make sense to move this further down, closer to when it's used?
err := s.cdi.DeleteClaimSpecFile(claimUID) | ||
if err != nil { | ||
return fmt.Errorf("unable to delete CDI spec file for claim: %w", err) | ||
} | ||
|
||
delete(preparedDevicesByClaimUID, claimUID) | ||
delete(preparedClaims, claimUID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: Are preparedDevicesByClaimUID
and preparedClaims
references to the maps in the checkpoint
object? Meaning, does deleting them here affect the checkpoint
object so that creating it below writes the mutated state?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great question. In preparedClaims := checkpoint.V1.PreparedClaims
: is this a copy assignment operator (as we would call this in C++)? If it is then we mutate the copy and the mutation is not reflected in the checkpoint. I will test this, and also look this topic up in the Golang language spec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test program:
package main
import (
"encoding/json"
"fmt"
)
type Checkpoint struct {
V1 *CheckpointV1
}
type CheckpointV1 struct {
SomeMap SomeMap
}
type SomeMap map[string]string
func main() {
m := SomeMap{
"eggs": "yes",
"bacon": "no",
}
c := Checkpoint{V1: &CheckpointV1{SomeMap: m}}
out, _ := json.Marshal(c)
fmt.Printf("%s\n", out)
m2 := c.V1.SomeMap
delete(m2, "eggs")
out2, _ := json.Marshal(c)
fmt.Printf("%s\n", out2)
}
}
Output:
$ go run assignment-test.go
{"V1":{"SomeMap":{"bacon":"no","eggs":"yes"}}}
{"V1":{"SomeMap":{"bacon":"no"}}}
The mutation does apply to the checkpoint.
Why does it apply? I will read more.
Edit: I think I now understood that Go will not implicitly make a copy of your map payload when you don't very explicitly want to do so.
I looked at https://stackoverflow.com/a/31173039/145400 and also https://medium.com/@shanechristian03/lets-talk-about-reference-types-in-go-b44cf4938e2d.
It should be obvious by now that setting b[“four”] = 4 within function modify will also affect the original map b, because a map is essentially a pointer under the hood.
After that, we make a copy of a and assign it to b. Although b is a copy, both a and b still point to the same underlying array.
You will not find the term ‘reference types’ in the official Go language specification. The term is not officially supported in Go, most likely because it can cause confusion. However, it’s still widely used within the Go community.
if err := s.checkpointManager.CreateCheckpoint(DriverPluginCheckpointFile, checkpoint); err != nil { | ||
return nil, fmt.Errorf("unable to sync to checkpoint: %v", err) | ||
} | ||
|
||
return preparedClaims[claimUID].GetDevices(), nil | ||
return preparedDevicesByClaimUID[claimUID].GetDevices(), nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Has this changed between where we set it on line 153 and here? Is returning preparedDevices.GetDevices()
clearer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My edit here was rather ignorant. I see. I think we can do what you propose. Will report back.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm also OK to leave it as is if we feel it's out of scope for this PR.
@@ -153,42 +156,57 @@ func (s *DeviceState) Prepare(ctx context.Context, claim *resourceapi.ResourceCl | |||
return nil, fmt.Errorf("unable to create CDI spec file for claim: %w", err) | |||
} | |||
|
|||
preparedClaims[claimUID] = preparedDevices | |||
// Add ResourceClaim API object to checkpoint so that for Unprepare() only |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there situations when we should also query the API server (e.g. if the local state is not sufficient?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder the same. That's certainly a key question here.
There should not be such situations. Conceptually, we have to build proper cleanup w/o API server input (input other than ResourceClaim UID). I have edited the PR description a bit.
if preparedDevicesByClaimUID[claimUID] != nil { | ||
return preparedDevicesByClaimUID[claimUID].GetDevices(), nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: could also write this as:
if preparedDevicesByClaimUID[claimUID] != nil { | |
return preparedDevicesByClaimUID[claimUID].GetDevices(), nil | |
if preparedDevices, ok := preparedDevicesByClaimUID[claimUID]; ok { | |
return preparedDevices.GetDevices(), nil |
Not sure where I stand on readability here though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(also could be considered out of scope).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the comment anyway. Made me read a bit about "nil maps" and the zero value. I like the current readability, but it's important to understand of course what the current zero value is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done :) (we now return preparedDevices.GetDevices(), nil
)
|
||
// key: stringified claim UUID | ||
type PreparedDevicesByClaimUID map[string]PreparedDevices | ||
type PreparedClaims map[string]resourceapi.ResourceClaim |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO after sync code review with Kevin: store only claim.Status
field, not entire claim
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
if preparedClaims[claimUID] != nil { | ||
return preparedClaims[claimUID].GetDevices(), nil | ||
// Has previously been prepared by us. | ||
// (Under which circumstances does this get called again?) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
todo: adjust comment (this may happen)
|
||
// Rely on local checkpoint state for ability to clean up. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On reliance on only checkpointing data: of course, checkpointing data is vulnerable to node failures (data may be lost in rare cases). But device state can be assumed to get lost / reset 'in sync' (e.g., as part of recovery from a failure affecting the checkpoint data it is likely for a reboot to happen).
I updated the patch after discussion / review feedback. I tested the current state on Starwars. A bit of log output from there:
|
@@ -115,13 +115,13 @@ func NewDeviceState(ctx context.Context, config *Config) (*DeviceState, error) { | |||
} | |||
|
|||
for _, c := range checkpoints { | |||
if c == DriverPluginCheckpointFile { | |||
if c == DriverPluginCheckpointFileBasename { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unrelated change; just a variable rename. Was looking for the checkpoint file path by following code. Renamed this because a hard-learned lesson for me is to use the term "file" only for abstract file objects, and never for basenames or absolute paths. In this case, we store a basename.
@@ -147,7 +146,7 @@ func (d *driver) PrepareResourceClaims(ctx context.Context, claims []*resourceap | |||
} | |||
|
|||
func (d *driver) UnprepareResourceClaims(ctx context.Context, claimRefs []kubeletplugin.NamespacedObject) (map[types.UID]error, error) { | |||
klog.Infof("UnprepareResourceClaims called with %d claim(s)", len(claimRefs)) | |||
klog.V(6).Infof("UnprepareResourceClaims called with %d claim(s)", len(claimRefs)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that we typically see pairs of these log messages:
I0513 12:35:44.198425 1 driver.go:149] UnprepareResourceClaims called with 1 claim(s)
I0513 12:35:44.204757 1 driver.go:210] unprepared devices for claim 'nvidia-dra-driver-gpu/imex-channel-injection-rmr8b-zh5nx-compute-domain-daemon-xhlq2:c515582f-eb9d-40c4-b6f3-3729110b4755'
Instead of removing the first msg, I for now just wanted to mark it as more 'debuggy' (lower severity).
return fmt.Errorf("unable to sync to checkpoint: %v", err) | ||
// Write new checkpoint reflecting that all devices for this claim have been | ||
// unprepared (by virtue of removing its UID from all mappings). | ||
preparedDevicesByClaimUID := checkpoint.V1.PreparedDevicesByClaimUID |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:)
@@ -12,14 +12,16 @@ type Checkpoint struct { | |||
} | |||
|
|||
type CheckpointV1 struct { | |||
PreparedClaims PreparedClaims `json:"preparedClaims,omitempty"` | |||
PreparedDevicesByClaimUID PreparedDevicesByClaimUID `json:"PreparedDevicesByClaimUID,omitempty"` | |||
PreparedClaimStates PreparedClaimStates `json:"PreparedClaimStates,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(from meeting with Kevin) maybe just call this Statuses
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
json property: camelCase 🐫
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
json property: camelCase
done
status, found := preparedClaimStates[claimUID] | ||
if !found { | ||
// Not an error: if this claim UID is not in the checkpoint then this | ||
// device was never prepared (assume that Prepare+Checkpoint are done |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or has already been unprepared
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
// device was never prepared (assume that Prepare+Checkpoint are done | ||
// transactionally). Note that claimRef.String() contains namespace, | ||
// name, UID. | ||
klog.Infof("unprepare noop: claim not found in checkpoint data: %v", claimRef.String()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
review capitalization of log msgs
// unprepared (by virtue of removing its UID from all mappings). | ||
preparedDevicesByClaimUID := checkpoint.V1.PreparedDevicesByClaimUID | ||
delete(preparedDevicesByClaimUID, claimUID) | ||
delete(preparedClaimStates, claimUID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe instead have just one top-level map, and every value in that map would be an object with a .Status
and a .PreparedDevices
property/field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
cmd/gpu-kubelet-plugin/checkpoint.go
Outdated
} | ||
|
||
func newCheckpoint() *Checkpoint { | ||
pc := &Checkpoint{ | ||
Checksum: 0, | ||
V1: &CheckpointV1{ | ||
PreparedClaims: make(PreparedClaims), | ||
PreparedDevicesByClaimUID: make(PreparedDevicesByClaimUID), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for the future: make gpu plugin and CD plugin code paths symmetric w.r.t. which data to checkpoint, and which structures to use during unprepare
todo: restructure the checkpoint, to have one high-level map indexed by UID |
355cea1
to
f5f77a9
Compare
@@ -12,14 +13,22 @@ type Checkpoint struct { | |||
} | |||
|
|||
type CheckpointV1 struct { | |||
PreparedClaims PreparedClaims `json:"preparedClaims,omitempty"` | |||
PreparedClaimsByUID `json:"preparedClaims,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The json field name should match the go field name (modulo the casing)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for catching that. Of course.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Status resourceapi.ResourceClaimStatus | ||
PreparedDevices PreparedDevices |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you need json tags here to, otherwise the default to capitalized same as the go field name (this is true at every level of nesting)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah. Right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
f5aa817
to
a025dd1
Compare
@@ -12,14 +13,22 @@ type Checkpoint struct { | |||
} | |||
|
|||
type CheckpointV1 struct { | |||
PreparedClaims PreparedClaims `json:"preparedClaims,omitempty"` | |||
PreparedClaimsByUID `json:"preparedClaimsByUID,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, I forgot a PreparedClaimsByUID
type/name definition here (depending on how you look at this). But this compiles and works. Why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems I accidentally created an embedded field here.
A field declared with a type but no explicit field name is called an embedded field
The unqualified type name acts as the field name.
https://go.dev/ref/spec#Struct_types
This works, but we don't want this here.
a025dd1
to
a3391aa
Compare
Thanks for great feedback and discussion. I manually tested the last commit on this branch on Starwars (for both, GPUs and CDs, and involving prepare and unprepare). Log excerpt for CD:
(inconsistent capitalization of log msgs: will look into this in a separate patch). Also had a brief look at the JSON anatomy in the checkpoint file. Looks sane (for the record, two bad screenshots: |
@@ -12,14 +13,22 @@ type Checkpoint struct { | |||
} | |||
|
|||
type CheckpointV1 struct { | |||
PreparedClaims PreparedClaims `json:"preparedClaims,omitempty"` | |||
PreparedClaimsByUID PreparedClaimsByUID `json:"preparedClaimsByUID,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we drop the ByUID
in the field name (but keep it in the type)? If you feel strongly we can keep it, but it feels a bit awkward to have this detail exposed in the actual checkpointed file (where its contents will clearly be indexing here by uuid).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, done! I like the shorter name.
Signed-off-by: Dr. Jan-Philip Gehrcke <[email protected]>
a3391aa
to
35a4158
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the back-on-forth on this.
Looking great now.
/lgtm
/approve
Generally -- as discussed with @klueska and @pohly -- resource cleanup (i.e., device unprepare) must not rely on ResourceClaim objects being available in the API server. Cleanup must built so that the only external input parameter is the ResourceClaim UID.
Any state that has to be maintained between Prepare and Unprepare (to ensure proper teardown) can of course be persisted node-locally. Checkpointing logic lends itself for maintaining such state.
Hence, this patch persists resource claim data upon Prepare() so that it can be re-discovered (by claim UID) during Unpreapre().
As such, this patch addresses #340.
Notably, we also don't want to fetch the claim object from the API server as an optimization or so. We have to figure out how to reliably do resource teardown using purely local state.
Initial testing looks good: