refactor: use metadata store to check for blob existence #1380
Do you want to drop the s3 blobstore based existence check function?
if s.blobStore.CheckBlobExists(ctx, blobKey) {

// check if blob already exists
_, err = s.blobMetadataStore.GetBlobMetadata(ctx, blobKey)
GetBlobMetadata will actually fetch the data; we could make this more efficient by checking only whether the key exists, without fetching the data.
I think DynamoDB can do this by projecting just the PK, something like:
queryInput := &dynamodb.QueryInput{
TableName: aws.String(s.tableName),
KeyConditionExpression: ...
ExpressionAttributeValues: ...
Limit: aws.Int32(1), // Only check for existence
ProjectionExpression: aws.String("PK"), // Only fetch the key attribute
}
May need to add an interface to BlobMetadata store.
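A minimal sketch of what such an interface addition could look like, using an in-memory stand-in rather than DynamoDB. The names MetadataStore, memStore, and the simplified BlobKey and BlobMetadata types are assumptions made for illustration, not the repository's actual definitions:

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// ErrMetadataNotFound stands in for common.ErrMetadataNotFound from the diff.
var ErrMetadataNotFound = errors.New("metadata not found")

// BlobKey and BlobMetadata are simplified, hypothetical stand-ins.
type BlobKey [32]byte

type BlobMetadata struct {
	Payload []byte // potentially large; not needed for an existence check
}

// MetadataStore is a hypothetical interface with a dedicated existence check
// next to the full fetch.
type MetadataStore interface {
	GetBlobMetadata(ctx context.Context, key BlobKey) (*BlobMetadata, error)
	CheckBlobExists(ctx context.Context, key BlobKey) (bool, error)
}

// memStore is an in-memory stand-in for the DynamoDB-backed store.
type memStore struct {
	items map[BlobKey]*BlobMetadata
}

// GetBlobMetadata returns the full record, or ErrMetadataNotFound.
func (m *memStore) GetBlobMetadata(_ context.Context, key BlobKey) (*BlobMetadata, error) {
	md, ok := m.items[key]
	if !ok {
		return nil, ErrMetadataNotFound
	}
	return md, nil
}

// CheckBlobExists only looks up the key and never returns the payload.
// A missing key is reported as (false, nil), not as an error.
func (m *memStore) CheckBlobExists(_ context.Context, key BlobKey) (bool, error) {
	_, ok := m.items[key]
	return ok, nil
}

func main() {
	var store MetadataStore = &memStore{items: map[BlobKey]*BlobMetadata{
		BlobKey{1}: &BlobMetadata{Payload: []byte("large metadata")},
	}}
	ctx := context.Background()
	for _, key := range []BlobKey{{1}, {2}} {
		exists, err := store.CheckBlobExists(ctx, key)
		fmt.Println(exists, err)
	}
}
```

Returning (false, nil) for a missing key keeps "not found" out of the error path, so callers only see an error when the store itself fails.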
sounds good, I've added something like this!
return fmt.Errorf("blob already exists: %s", blobKey.Hex())
}

// Check if the error is NOT "metadata not found" - which would be a real error
if err != nil && !errors.Is(err, common.ErrMetadataNotFound) {
	return fmt.Errorf("failed to check blob existence: %w", err)
In this case, it's not an InvalidArgument error, but an Internal error. Note that all the errors from this validateDispersalRequest are treated as InvalidArgument.
Hmm... I moved the blob existence check out of validateDispersalRequest, and typed the errors with more granularity. lmk if that looks okay?
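For readers following the thread, here is a rough sketch of how the three outcomes of the existence check could be given distinct codes. The Code type, lookupFunc, and classifyExistenceCheck are hypothetical stand-ins (the real change presumably uses gRPC status codes), and mapping the duplicate case to InvalidArgument is an assumption based on the reviewer's note above:

```go
package main

import (
	"errors"
	"fmt"
)

// Code is a simplified, hypothetical stand-in for a gRPC status code.
type Code int

const (
	OK Code = iota
	InvalidArgument
	Internal
)

// ErrMetadataNotFound stands in for common.ErrMetadataNotFound from the diff.
var ErrMetadataNotFound = errors.New("metadata not found")

// errStoreUnavailable simulates a genuine store failure in this sketch.
var errStoreUnavailable = errors.New("store unavailable")

// lookupFunc is a hypothetical lookup: nil means the blob was found,
// ErrMetadataNotFound means it is absent, anything else is a store failure.
type lookupFunc func(key string) error

// classifyExistenceCheck maps the lookup result to a status code.
func classifyExistenceCheck(lookup lookupFunc, key string) (Code, error) {
	err := lookup(key)
	switch {
	case err == nil:
		// The blob is already stored: the request itself is the problem.
		return InvalidArgument, fmt.Errorf("blob already exists: %s", key)
	case errors.Is(err, ErrMetadataNotFound):
		// Not found is the expected outcome for a new blob.
		return OK, nil
	default:
		// Any other error is a server-side failure, not a bad request.
		return Internal, fmt.Errorf("failed to check blob existence: %w", err)
	}
}

func main() {
	cases := []struct {
		name   string
		lookup lookupFunc
	}{
		{"duplicate", func(string) error { return nil }},
		{"new blob", func(string) error { return ErrMetadataNotFound }},
		{"store failure", func(string) error { return errStoreUnavailable }},
	}
	for _, c := range cases {
		code, err := classifyExistenceCheck(c.lookup, "abc123")
		fmt.Println(c.name, code, err)
	}
}
```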
Yea that looks fine
Looks good!
@@ -265,6 +265,33 @@ func (s *BlobMetadataStore) GetBlobMetadata(ctx context.Context, blobKey corev2.
	return metadata, nil
}

// DoesBlobExist checks if a blob exists without fetching the entire metadata.
func (s *BlobMetadataStore) DoesBlobExist(ctx context.Context, blobKey corev2.BlobKey) (bool, error) {
nit: the previous name from the S3 store, CheckBlobExists, may fit well here.
	Limit: aws.Int32(1),
}

result, err := s.dynamoDBClient.QueryWithInput(ctx, queryInput)
Can we use GetItem? Might be slightly more efficient than Query.
// CheckBlobExists checks if a blob exists without fetching the entire metadata.
func (s *BlobMetadataStore) CheckBlobExists(ctx context.Context, blobKey corev2.BlobKey) (bool, error) {
	// Use GetItem with ProjectionExpression to minimize data transfer
	item, err := s.dynamoDBClient.GetItem(ctx, s.tableName, map[string]types.AttributeValue{
We can use projection as well for GetItem I think: https://pkg.go.dev/github.com/aws/aws-sdk-go-v2/service/dynamodb#GetItemInput
Our current DynamoDB client interface doesn't support projection, but I've added it :)
LG! Can we make it GetItemWithInput (similar to QueryWithInput) and pass in GetItemInput as a param, so that any future customization will be easy?
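To illustrate the design being suggested, here is a small sketch of an input-struct style method and an existence check built on it. GetItemInput below is a simplified, hypothetical stand-in for the SDK's dynamodb.GetItemInput, fakeClient replaces the real client wrapper, and the projection handling is reduced to a single attribute name:

```go
package main

import "fmt"

// GetItemInput is a simplified, hypothetical stand-in for dynamodb.GetItemInput.
type GetItemInput struct {
	TableName            string
	Key                  map[string]string
	ProjectionExpression string // empty means "return all attributes"
}

// Item is a simplified attribute map (string values only).
type Item map[string]string

// fakeClient is an in-memory stand-in for the DynamoDB client wrapper.
type fakeClient struct {
	rows map[string]Item // keyed by the item's "PK" value
}

// GetItemWithInput takes the whole input struct, so new options can be added
// later without changing the method signature. A missing key yields a nil Item.
func (c *fakeClient) GetItemWithInput(in *GetItemInput) (Item, error) {
	row, ok := c.rows[in.Key["PK"]]
	if !ok {
		return nil, nil
	}
	if in.ProjectionExpression == "" {
		return row, nil
	}
	// Simplification: treat the projection as one attribute name.
	out := Item{}
	if v, ok := row[in.ProjectionExpression]; ok {
		out[in.ProjectionExpression] = v
	}
	return out, nil
}

// checkBlobExists projects only the key attribute and tests for a non-empty item.
func checkBlobExists(c *fakeClient, table, pk string) (bool, error) {
	item, err := c.GetItemWithInput(&GetItemInput{
		TableName:            table,
		Key:                  map[string]string{"PK": pk},
		ProjectionExpression: "PK",
	})
	if err != nil {
		return false, err
	}
	return len(item) > 0, nil
}

func main() {
	c := &fakeClient{rows: map[string]Item{
		"blob-1": {"PK": "blob-1", "Metadata": "large payload"},
	}}
	for _, pk := range []string{"blob-1", "blob-2"} {
		exists, err := checkBlobExists(c, "blobs", pk)
		fmt.Println(pk, exists, err)
	}
}
```

Passing the input struct through keeps the wrapper thin: callers that need a projection, consistent reads, or other options set fields on the struct instead of requiring a new wrapper method for each combination.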
yes, makes sense! updated
Thanks!
Why are these changes needed?
https://eigenlabs.slack.com/archives/C06JZQHN5R7/p1741641849473049
Checks